hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-09 03:11:58 +00:00

Author	SHA1	Message	Date
Teknium	3601e20f47	fix(windows): use PortableGit (not MinGit), fix relaunch os.execvp crash, surface npm errors Three real bugs from teknium1's first Windows install run: 1. MinGit has no bash.exe. MinGit is the minimal-automation Git for Windows distribution — it ships git.exe but deliberately strips bash and the POSIX coreutils. Installer logged "Could not locate bash.exe" and Hermes would fail to run any shell command. Switched to PortableGit — the full Git for Windows minus the installer UI. PortableGit ships bash.exe at <root>\bin\bash.exe plus sh, awk, sed, grep, curl, ssh in usr\bin\. ARM64 variant is detected separately (PortableGit--arm64.7z.exe). 32-bit falls back to MinGit-32-bit with a warning (PortableGit is 64-bit only). PortableGit ships as a 7z self-extractor (56MB vs MinGit's 38MB). We invoke it with `-o<target> -y` to extract silently — no 7z install needed, it's self-contained. Updated tools/environments/local.py::_find_bash candidate order to prefer the PortableGit layout (<root>\bin\bash.exe) with the MinGit layout (<root>\usr\bin\bash.exe) as a fallback so existing installs keep working. 2. os.execvp "Exec format error" on Windows. Setup wizard's "Launch hermes chat now? Y" called `os.execvp(["hermes", "chat"])` which on Windows can only swap to real Win32 .exe files — chokes with OSError(8) on .cmd batch shims and Python console-script wrappers. Added a win32 branch in hermes_cli/relaunch.py::relaunch() that uses subprocess.run + sys.exit — functionally identical (user sees "hermes exited, then new hermes started") with one extra PID in play. POSIX path is UNCHANGED — still uses os.execvp for in-place replacement. Catches OSError in the Windows branch and surfaces a "open a new terminal so PATH picks up, then re-run hermes" hint instead of a cryptic traceback. 3. npm install failures silent on Windows. The install.ps1 was invoking `npm install --silent 2>&1 | Out-Null` inside a try/catch. 
PowerShell's try/catch does NOT trigger on non-zero process exit codes — only on unhandled .NET exceptions — so npm failing printed a generic "npm install failed" with zero information about WHY. The silent pipe ate the stderr. Rewrote Install-NodeDeps to: - Resolve npm.cmd via Get-Command (respects PATHEXT) instead of relying on bare `npm` name resolution. - Use Start-Process with -PassThru to capture the actual exit code. - Redirect stderr to a temp log and surface the first ~800 chars of the real npm error when install fails, plus the log path for the full text. - Fail loudly with the right exit code instead of a misleading success. - Bail cleanly with a helpful message when npm isn't on PATH at all. 4. "True" printing to console after Node check. `Test-Node` returns $true; installer called it as a bare statement (no assignment, no cast). PowerShell prints bare return values. Wrapped the call in `[void](Test-Node)`. ## Tests - Added 3 new tests in tests/hermes_cli/test_relaunch.py covering the Windows branch: subprocess is called (not execvp), child exit code propagates, OSError surfaces a helpful message. All 23 tests pass (20 existing + 3 new). - 77 Windows-compat tests still pass, POSIX behaviour unchanged.	2026-05-08 14:27:40 -07:00
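Illustration for 3601e20f47, item 2: a minimal sketch of the platform split described above, where POSIX keeps the in-place `os.execvp` replacement and Windows spawns a child and exits with its return code. The function signature, the injectable `platform`/`run`/`execvp` parameters, and the exact hint wording are assumptions made so the sketch can be exercised anywhere; they are not taken from hermes_cli/relaunch.py.

```python
import os
import subprocess
import sys
from typing import Callable, Sequence


def relaunch(
    argv: Sequence[str],
    platform: str = sys.platform,
    run: Callable = subprocess.run,   # injectable for tests (assumption)
    execvp: Callable = os.execvp,     # injectable for tests (assumption)
) -> None:
    """Replace (POSIX) or re-spawn (Windows) the current process with argv."""
    if platform == "win32":
        # Windows cannot exec into .cmd shims, so run a child and mirror its
        # exit code instead of replacing the current process image.
        try:
            completed = run(list(argv))
        except OSError as exc:
            print(
                f"Could not relaunch {argv[0]!r}: {exc}. "
                "Open a new terminal so PATH picks up, then re-run hermes.",
                file=sys.stderr,
            )
            sys.exit(1)
        sys.exit(completed.returncode)
    # POSIX: in-place replacement; this call does not return on success.
    execvp(argv[0], list(argv))
```

The Windows branch costs one extra PID, which matches the trade-off the commit message accepts.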
Teknium	e93bfc6c93	feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags Second pass on native Windows support, driven by a systematic audit across five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG, os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns (npm.cmd batch shims, start_new_session no-op on Windows), subsystem health (cron, gateway daemon, update flow), and module-level import guards. Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux, test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix. ## New module - hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command, windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs). All no-ops on non-Windows. ## CRITICAL fixes (would crash or silently break on Windows) - tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would AttributeError on import on Windows, breaking `hermes --tui` entirely (it spawns this module as a subprocess). Guard each signal.signal() call with hasattr() and add SIGBREAK as Windows' SIGHUP equivalent. - hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was unguarded. os.WNOHANG doesn't exist on Windows. Gate the whole reap loop behind `os.name != "nt"` — Windows has no zombies anyway. - tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on most Windows builds. Fall back to loopback TCP (AF_INET on 127.0.0.1:0 ephemeral port) when _IS_WINDOWS. HERMES_RPC_SOCKET env var now accepts either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows). Generated sandbox client parses both. - cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded. 
Use shutil.which("bash") so Windows (Git Bash via MinGit) works, with a readable error when bash is genuinely absent. - 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py (npm install + node version probe), browser_tool.py x2. On Windows npm is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...]) fails with WinError 193. shutil.which(...) returns the absolute .cmd path which CreateProcessW accepts because the extension routes through cmd.exe /c. POSIX behaviour unchanged (shutil.which still returns the same path subprocess would resolve itself). ## HIGH fixes (silent misbehaviour on Windows) - tools/environments/local.py get_temp_dir: hardcoded /tmp returned on Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote via MSYS2's virtual /tmp but native Python couldn't open. Result: cwd tracking silently broken — `cd` in terminal tool did nothing. Windows branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes (works in both bash and Python, guaranteed no spaces). - tools/environments/local.py _make_run_env PATH injection: `/usr/bin not in split(":")` heuristic mangles Windows PATH (";" separator). Gate the injection behind `not _IS_WINDOWS`. - hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer Popen + watcher-script Popen both used start_new_session=True, which Windows silently ignores. Watcher stayed attached to CLI's console, died when user closed terminal after `hermes update`, left gateway stale. Now branches through windows_detach_popen_kwargs() helper (CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on Windows, start_new_session=True on POSIX — identical to main). ## MEDIUM fixes - gateway/run.py /restart and /update handlers: hardcoded bash/setsid chain crashes on Windows when user triggers /update in-gateway. Now has sys.platform=="win32" branch using sys.executable + a tiny Python watcher with proper detach flags. POSIX path is unchanged. 
- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/... style paths that break subprocess.Popen(cwd=...) and Path().resolve(). Added _normalize_git_bash_path() helper that translates /c/Users, /cygdrive/c, /mnt/c variants to native C:\Users form. POSIX no-op. _git_repo_root() now routes every result through it. - cli.py worktree .worktreeinclude: os.symlink on directories failed hard on Windows (requires admin or Developer Mode). Falls back to shutil.copytree with a warning log. ## Tests - 29 new tests in tests/tools/test_windows_native_support.py covering: subprocess_compat helpers, TUI entry signal guards, kanban waitpid guard, code_execution TCP fallback source-level invariants, cron bash resolution, npm/npx bare-spawn lint per-file, local env Windows temp dir, PATH injection gating, git bash path normalization, symlink fallback, gateway detached watcher flags. - One existing test assertion adjusted in test_browser_homebrew_paths: it compared captured Popen argv to the BARE `"npx"` literal; after the shutil.which() change argv[0] is the absolute path. New assertion checks the shape (two items, second is `agent-browser`) rather than the exact first-item string. Behaviour unchanged; test was too strict. All 56 tests pass on Linux (30 from previous commits + 26 new). 267 tests from the affected files/dirs (browser, code_exec, local_env, process_registry, kanban_db, windows_compat) all pass — zero regressions. tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged; all pre-existing test failures confirmed unrelated via `git stash` re-run. ## What's still deferred (LOW priority) - Visible cmd-window flashes on short-lived console apps (~14 sites) — cosmetic, needs a follow-up pass once we have user reports. - agent/file_safety.py POSIX-only security deny patterns — separate hardening task. - tools/process_registry.py returning "/tmp" as fallback — theoretical; reachable only when all env-var candidates fail.	2026-05-08 14:27:40 -07:00
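Illustration for e93bfc6c93 (cli.py path normalization): a minimal sketch of translating the `/c/Users`, `/cygdrive/c`, and `/mnt/c` variants to the native `C:\Users` form, with a no-op on POSIX. The regex, the function name without its leading underscore, and the explicit `is_windows` parameter are assumptions for illustration; the real `_normalize_git_bash_path()` may differ.

```python
import re

# Matches /c/..., /cygdrive/c/..., /mnt/c/... (single drive letter only).
_DRIVE_PREFIX = re.compile(r"^/(?:cygdrive/|mnt/)?([A-Za-z])(?:/(.*))?$")


def normalize_git_bash_path(path: str, is_windows: bool) -> str:
    """Translate a Git Bash style path to a native Windows path."""
    if not is_windows:
        return path  # POSIX: leave the path untouched
    match = _DRIVE_PREFIX.match(path)
    if match is None:
        return path  # already native, or not a drive-style path
    drive = match.group(1).upper()
    rest = match.group(2) or ""
    return f"{drive}:\\" + rest.replace("/", "\\")


print(normalize_git_bash_path("/c/Users/me/repo", is_windows=True))
print(normalize_git_bash_path("/home/me/repo", is_windows=False))
```

Paths such as `/usr/bin` do not match, because the segment after the leading slash must be exactly one letter.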
Teknium	b53bd12fe4	fix(windows-editor): default EDITOR=notepad so /edit and Ctrl+X Ctrl+E work Pre-existing Windows bug surfaced while reviewing the portable-MinGit install: prompt_toolkit's Buffer.open_in_editor() falls back to POSIX absolute paths (/usr/bin/nano, /usr/bin/vi, /usr/bin/emacs) that don't exist on native Windows. When neither $EDITOR nor $VISUAL is set, Ctrl+X Ctrl+E ("open prompt in editor") and /edit both silently do nothing on Windows — the user hits the key, nothing happens, no error. This wasn't caused by MinGit (full Git for Windows doesn't fix it either, because the Windows Python subprocess call resolves `/usr/bin/nano` as `C:\usr\bin\nano`, which doesn't exist even with nano installed). Fixes: - hermes_cli/stdio.py::configure_windows_stdio now sets EDITOR=notepad on Windows if neither EDITOR nor VISUAL is set. notepad.exe is in every Windows install, works as a blocking editor (subprocess.call waits for the window to close), and writes back to the file. - hermes_cli/config.py (hermes config edit): reorder fallback list so Windows tries notepad first — previously nano led the list, which required Git Bash / WSL to be in PATH. - Users who want VSCode / Neovim / Notepad++ can still override via $env:EDITOR — that's checked before our default kicks in. Docstring spells out the common overrides. The Ink TUI (`hermes --tui`) already handled Windows correctly via ui-tui/src/lib/editor.ts falling back to notepad.exe on win32 — this commit brings the classic prompt_toolkit CLI into parity. 3 new tests in test_windows_native_support.py verify: - EDITOR=notepad gets set when unset on Windows - Explicit $EDITOR is respected - $VISUAL is respected (not overwritten by our default)	2026-05-08 14:27:40 -07:00
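Illustration for b53bd12fe4: a minimal sketch of the default-editor rule, which sets `EDITOR=notepad` only on Windows and only when neither `EDITOR` nor `VISUAL` is already set. The helper name and the explicit `env`/`platform` parameters are hypothetical; the commit places the real logic inside `configure_windows_stdio`.

```python
import os
import sys
from typing import MutableMapping


def apply_default_editor(env: MutableMapping[str, str], platform: str) -> None:
    """Set EDITOR=notepad on Windows unless the user already chose an editor."""
    if platform != "win32":
        return  # POSIX keeps prompt_toolkit's own fallbacks
    if env.get("EDITOR") or env.get("VISUAL"):
        return  # an explicit user choice always wins
    env["EDITOR"] = "notepad"


# Typical call site (sketch): mutate the real process environment.
apply_default_editor(os.environ, sys.platform)
```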
Teknium	9de893e3b0	feat(windows): close native-Windows install gaps — crash-free startup, UTF-8 stdio, tzdata dep, docs Native Windows (with Git for Windows installed) can now run the Hermes CLI and gateway end-to-end without crashing. install.ps1 already existed and the Git Bash terminal backend was already wired up — this PR fills the remaining gaps discovered by auditing every Windows-unsafe primitive (`signal.SIGKILL`, `os.kill(pid, 0)` probes, bare `fcntl`/`termios` imports) and by comparing hermes against how Claude Code, OpenCode, Codex, and Cline handle native Windows. ## What changed ### UTF-8 stdio (new module) - `hermes_cli/stdio.py` — single `configure_windows_stdio()` entry point. Flips the console code page to CP_UTF8 (65001), reconfigures `sys.stdout`/`stderr`/`stdin` to UTF-8, sets `PYTHONIOENCODING` + `PYTHONUTF8` for subprocesses. No-op on non-Windows. Opt out via `HERMES_DISABLE_WINDOWS_UTF8=1`. - Called early in `cli.py::main`, `hermes_cli/main.py::main`, and `gateway/run.py::main` so Unicode banners (box-drawing, geometric symbols, non-Latin chat text) don't `UnicodeEncodeError` on cp1252 consoles. ### Crash sites fixed - `hermes_cli/main.py:7970` (hermes update → stuck gateway sweep): raw `os.kill(pid, _signal.SIGKILL)` → `gateway.status.terminate_pid(pid, force=True)` which routes through `taskkill /T /F` on Windows. - `hermes_cli/profiles.py::_stop_gateway_process`: same fix — also converted SIGTERM path to `terminate_pid()` and widened OSError catch on the intermediate `os.kill(pid, 0)` probe. - `hermes_cli/kanban_db.py:2914, 3041`: raw `signal.SIGKILL` → `getattr(signal, "SIGKILL", signal.SIGTERM)` fallback (matches the pattern already used in `gateway/status.py`). ### OSError widening on `os.kill(pid, 0)` probes Windows raises `OSError` (WinError 87) for a gone PID instead of `ProcessLookupError`. 
Widened the catch at: - `gateway/run.py:15101` (`--replace` wait-for-exit loop — without this, the loop busy-spins the full 10s every Windows gateway start) - `hermes_cli/gateway.py:228, 460, 940` - `hermes_cli/profiles.py:777` - `tools/process_registry.py::_is_host_pid_alive` - `tools/browser_tool.py:1170, 1206` ### Dashboard PTY graceful degradation `hermes_cli/pty_bridge.py` depends on `fcntl`/`termios`/`ptyprocess`, none of which exist on native Windows. Previously a Windows dashboard would crash on `import hermes_cli.web_server` because of a top-level import. Now: - `hermes_cli/web_server.py` wraps the pty_bridge import in `try/except ImportError` and sets `_PTY_BRIDGE_AVAILABLE=False`. - The `/api/pty` WebSocket handler returns a friendly "use WSL2 for this tab" message instead of exploding. - Every other dashboard feature (sessions, jobs, metrics, config editor) runs natively on Windows. ### Dependency - `pyproject.toml`: add `tzdata>=2023.3; sys_platform == 'win32'` so Python's `zoneinfo` works on Windows (which has no IANA tzdata shipped with the OS). Credits @sprmn24 (PR #13182). ### Docs - README.md: removed "Native Windows is not supported"; added PowerShell one-liner and Git-for-Windows prerequisite note. - `website/docs/getting-started/installation.md`: new Windows section with capability matrix (everything native except the dashboard `/chat` PTY tab, which is WSL2-only). - `website/docs/user-guide/windows-wsl-quickstart.md`: reframed as "WSL2 as an alternative to native" rather than "the only way". - `website/docs/developer-guide/contributing.md`: updated cross-platform guidance with the `signal.SIGKILL` / `OSError` rules we enforce now. - `website/docs/user-guide/features/web-dashboard.md`: acknowledged native Windows works for everything except the embedded PTY pane. 
## Why this shape Pulled from a survey of how other agent codebases handle native Windows (Claude Code, OpenCode, Codex, Cline): - All four treat Git Bash as the canonical shell on Windows, same as hermes already does in `tools/environments/local.py::_find_bash()`. - None of them force `SetConsoleOutputCP` — but they don't have to, Node/Rust write UTF-16 to the Win32 console API. Python does not get that for free, so we flip CP_UTF8 via ctypes. - None of them ship PowerShell-as-primary-shell (Claude Code exposes PS as a secondary tool; scope creep for this PR). - All of them use `taskkill /T /F` for force-kill on Windows, which is exactly what `gateway.status.terminate_pid(force=True)` does. ## Non-goals (deliberate scope limits) - No PowerShell-as-a-second-shell tool — worth designing separately. - No terminal routing rewrite (#12317, #15461, #19800 cluster) — that's the hardest design call and needs a separate doc. - No wholesale `open()` → `open(..., encoding="utf-8")` sweep (Tianworld cluster) — will do as follow-up if users hit actual breakage; most modern code already specifies it. ## Validation - 28 new tests in `tests/tools/test_windows_native_support.py` — all platform-mocked, pass on Linux CI. Cover: - `configure_windows_stdio` idempotency, opt-out, env-preservation - `terminate_pid` taskkill routing, failure → OSError, FileNotFoundError fallback - `getattr(signal, "SIGKILL", …)` fallback shape - `_is_host_pid_alive` OSError widening (Windows-gone-PID behavior) - Source-level checks that all entry points call `configure_windows_stdio` - pty_bridge import-guard present in `web_server.py` - README no longer says "not supported" - 12 pre-existing tests in `tests/tools/test_windows_compat.py` still pass. - `tests/hermes_cli/` ran fully (3909 passed, 9 failures — all confirmed pre-existing on main by stash-test). - `tests/gateway/` ran fully (5021 passed, 1 pre-existing failure). - `tests/tools/test_process_registry.py` + `test_browser_*` pass. 
- Manual smoke: `import hermes_cli.stdio; import gateway.run; import hermes_cli.web_server` — all clean, `_PTY_BRIDGE_AVAILABLE=True` on Linux (as expected). ## Files - New: `hermes_cli/stdio.py`, `tests/tools/test_windows_native_support.py` - Modified: `cli.py`, `gateway/run.py`, `hermes_cli/main.py`, `hermes_cli/profiles.py`, `hermes_cli/gateway.py`, `hermes_cli/kanban_db.py`, `hermes_cli/pty_bridge.py`, `hermes_cli/web_server.py`, `tools/browser_tool.py`, `tools/process_registry.py`, `pyproject.toml`, `README.md`, and 4 docs pages. Credits to everyone whose prior PR work informed these fixes — see the co-author trailers. All of the PRs listed in `~/.hermes/plans/windows-support-prs.md` fixing `os.kill` / `signal.SIGKILL` / UTF-8 stdio / tzdata / README patterns found the same issues; this PR consolidates them. Co-authored-by: Philip D'Souza <9472774+PhilipAD@users.noreply.github.com> Co-authored-by: Arecanon <42595053+ArecaNon@users.noreply.github.com> Co-authored-by: XiaoXiao0221 <263113677+XiaoXiao0221@users.noreply.github.com> Co-authored-by: Lars Hagen <1360677+lars-hagen@users.noreply.github.com> Co-authored-by: Luan Dias <65574834+luandiasrj@users.noreply.github.com> Co-authored-by: Ruzzgar <ruzzgarcn@gmail.com> Co-authored-by: sprmn24 <oncuevtv@gmail.com> Co-authored-by: adybag14-cyber <252811164+adybag14-cyber@users.noreply.github.com> Co-authored-by: Prasanna28Devadiga <54196612+Prasanna28Devadiga@users.noreply.github.com>	2026-05-08 14:27:40 -07:00
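Illustration for 9de893e3b0 (OSError widening and the SIGKILL fallback): a minimal sketch of the two patterns named above. It shows only the exception handling and the `getattr` fallback shape; the helper name, the injectable `kill` parameter, and the treatment of `PermissionError` as "alive" are assumptions, and how a signal-0 probe behaves on native Windows is outside what this sketch demonstrates.

```python
import os
import signal
from typing import Callable

# signal.SIGKILL does not exist on Windows; fall back to SIGTERM there.
FORCE_KILL_SIGNAL = getattr(signal, "SIGKILL", signal.SIGTERM)


def is_pid_alive(pid: int, kill: Callable[[int, int], None] = os.kill) -> bool:
    """Probe a PID, treating any lookup failure as 'process is gone'."""
    try:
        kill(pid, 0)
    except ProcessLookupError:
        return False  # POSIX: no such process
    except PermissionError:
        return True   # process exists but belongs to another user (assumption)
    except OSError:
        # Per the commit, Windows raises a plain OSError (WinError 87)
        # for a gone PID instead of ProcessLookupError.
        return False
    return True
```

`PermissionError` and `ProcessLookupError` are both subclasses of `OSError`, so the broad clause has to come last.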
Dilee	729a659a3c	fix(teams-pipeline): add skill asset and fix async test env	2026-05-08 12:41:41 -07:00
Teknium	5e8dfc9f6d	fix(teams-pipeline): fill in missing delivery URL in adapter-reuse test test_build_pipeline_runtime_reuses_existing_teams_adapter_surface set delivery_mode='incoming_webhook' but omitted incoming_webhook_url. _teams_delivery_is_configured() requires the URL to mark delivery as enabled, so the guarded build_pipeline_runtime gate in runtime.py correctly left teams_sender=None and the assertion failed. The intent of the test — prove we reuse the existing TeamsSummaryWriter from plugins/platforms/teams/adapter.py rather than introducing a new adapter surface elsewhere — is unchanged. Added the URL so the gate passes and the architectural assertion holds.	2026-05-08 12:00:09 -07:00
Dilee	397f750bb4	feat(teams): add pipeline outbound delivery via existing adapter	2026-05-08 12:00:09 -07:00
Teknium	a99547740d	fix(teams-pipeline): drop-scheduler fallback + test wiring for enablement gate Two salvage follow-ups on top of @dlkakbs's plugin runtime. 1. Install a drop-scheduler when the runtime fails to build. Previously when ``build_pipeline_runtime()`` raised (e.g. missing Graph env vars, subscription store path unwritable), ``bind_gateway_runtime`` logged a warning and returned False, leaving the msgraph_webhook adapter with no scheduler at all. Incoming Graph notifications would then fall back to the adapter's default ``handle_message`` path, which produces a raw JSON dump as a user-role message — not useful and fires every time Graph retries. Now a no-op drop-scheduler is installed instead, so: - Graph notifications ack cleanly (202) so Graph stops retrying. - The failure is surfaced once in the log with the error. - No user-role messages get manufactured from raw change payloads. The adapter is still bindable later once the runtime becomes available (e.g. after the operator runs ``hermes teams-pipeline validate`` and fixes the config), since the gateway's ``_teams_pipeline_runtime`` sentinel wasn't set to a non-None value. 2. Test wiring for ``_teams_pipeline_plugin_enabled()`` gate. The happy-path runner-wiring tests monkeypatched ``bind_gateway_runtime`` but not ``_load_gateway_config``. In the hermetic test environment the real config read ran, saw no enabled plugins, and short-circuited the bind call before the test could observe it — so the test expected ``calls == [runner]`` but got ``calls == []``. Adds a ``_load_gateway_config`` monkeypatch with ``plugins.enabled = ["teams_pipeline"]`` to the happy-path tests. The explicit-disabled test ``test_gateway_runner_skips_wiring_when_teams_pipeline_plugin_disabled`` already patches the config correctly. 
Also renames ``test_bind_gateway_runtime_leaves_scheduler_unchanged_on_failure`` to ``test_bind_gateway_runtime_installs_drop_scheduler_on_failure`` and updates the assertion — this test contradicted the drop-scheduler test in ``tests/plugins/test_teams_pipeline_plugin.py`` which expected the scheduler to be installed. The plugin-test name (``test_bind_gateway_runtime_drops_notifications_when_unavailable``) clearly describes the intended behavior; fixing the wiring-test assertion aligns both tests. Validation: - ``scripts/run_tests.sh tests/plugins/test_teams_pipeline_plugin.py tests/gateway/test_teams_pipeline_runtime_wiring.py tests/hermes_cli/test_teams_pipeline_plugin_cli.py`` — 25/25 passed.	2026-05-08 11:18:14 -07:00
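Illustration for a99547740d, item 1: a minimal sketch of the drop-scheduler fallback, where a failed runtime build installs a no-op scheduler so notifications are acknowledged and discarded rather than falling through to a default handler. Every class, attribute, and function name here (`DropScheduler`, `Adapter`, `bind_runtime`, `schedule`) is hypothetical; the plugin's real interfaces are not shown in the commit message.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class DropScheduler:
    """Accepts notifications and discards them (hypothetical interface)."""

    def __init__(self) -> None:
        self.dropped = 0

    def schedule(self, notification: Any) -> None:
        self.dropped += 1  # count only; nothing reaches the agent


class Adapter:
    """Stand-in for the webhook adapter (hypothetical)."""

    def __init__(self) -> None:
        self.scheduler: Optional[Any] = None


def bind_runtime(adapter: Adapter, build_runtime: Callable[[], Any]) -> bool:
    """Bind the real runtime, or install a drop-scheduler if the build fails."""
    try:
        runtime = build_runtime()
    except Exception as exc:
        # Surface the failure once, then make sure the adapter still has
        # a scheduler so incoming notifications are dropped cleanly.
        logger.warning("pipeline runtime unavailable: %s", exc)
        adapter.scheduler = DropScheduler()
        return False
    adapter.scheduler = runtime
    return True
```

Returning `False` without caching a runtime leaves room for a later bind attempt once the configuration is fixed, which is the behaviour the commit describes.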
Dilee	07bbd93337	feat(teams-pipeline): add plugin runtime and operator cli Third slice of the Microsoft Teams meeting pipeline stack, salvaged onto current main. Adds the standalone teams_pipeline plugin that consumes Graph change notifications from the webhook listener, resolves meeting artifacts (transcript first, recording + STT fallback later), persists job state in a durable store, and exposes an operator CLI for inspection, replay, subscription management, and validation. Design choices follow maintainer review feedback on PR #19815: - Standalone plugin rather than bolted-on core surface (plugins/teams_pipeline/, kind: standalone in plugin.yaml). - Zero new model tools. The agent drives the pipeline by invoking the operator CLI via the terminal tool, guided by the skill that ships with a follow-up PR. - Reuses the existing msgraph_webhook gateway platform for Graph ingress. Pipeline runtime is wired in via bind_gateway_runtime and gated on plugins.enabled so gateways that don't run the plugin boot cleanly. Additions: - plugins/teams_pipeline/: runtime (gateway wiring + config builder), pipeline core, durable SQLite store, subscription maintenance helpers, Graph artifact resolution, operator CLI (list, show, run/replay, fetch dry-run, subscriptions list, subscribe, renew-subscription, delete-subscription, maintain-subscriptions, token-health, validate). - hermes_cli/main.py: second-pass plugin CLI discovery so any standalone plugin registered via ctx.register_cli_command() outside the memory-plugin convention path gets its subcommand wired into argparse without touching core. - gateway/run.py: _teams_pipeline_plugin_enabled() config gate, _wire_teams_pipeline_runtime() binding after adapter setup, and the two runner attributes used by the runtime. Credit to @dlkakbs for the entire plugin implementation.	2026-05-08 11:18:14 -07:00
Teknium	d0aad4b021	fix(computer-use): harden image-rejection fallback + AUTHOR_MAP Follow-up to #15328's vision-unsupported retry branch in run_agent.py. _strip_images_from_messages() previously deleted any message whose content was entirely images. That's fine for synthetic user messages injected for attachment delivery, but it breaks providers for tool-role messages — the paired tool_call_id on the preceding assistant message ends up unmatched, which OpenAI-compatible APIs reject with HTTP 400. Fix: tool-role messages whose content becomes empty are replaced with a plaintext placeholder that preserves the tool_call_id linkage. Only non-tool messages are dropped. Added 10 tests covering the role-alternation invariants + image-type coverage. Image-rejection detector: expanded phrase list (image content not supported / multimodal input / vision input / model does not support image) and gated on 4xx status so transient 5xx errors never get misinterpreted as 'server said no to images'. Detection is documented as best-effort English phrase matching. AUTHOR_MAP: mapped 3820588+ddupont808@users.noreply.github.com to ddupont808 so release notes attribute the salvage correctly.	2026-05-08 11:07:38 -07:00
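Illustration for d0aad4b021: a minimal sketch of the hardened stripping rule, in which a message left empty after image removal is dropped unless it is a tool-role message, and then it becomes a plaintext placeholder so its `tool_call_id` stays paired. The set of image part types, the placeholder text, and the function name without its leading underscore are assumptions; the real `_strip_images_from_messages()` may recognise other shapes.

```python
from typing import Any

# Part types treated as images in this sketch (assumption).
IMAGE_PART_TYPES = {"image_url", "image", "input_image"}
PLACEHOLDER = "[image removed: model does not accept image input]"


def strip_images(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Remove image parts while preserving tool_call_id linkage."""
    cleaned: list[dict[str, Any]] = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            cleaned.append(msg)  # plain-text message: nothing to strip
            continue
        kept = [
            part for part in content
            if not (isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES)
        ]
        if kept:
            cleaned.append({**msg, "content": kept})
        elif msg.get("role") == "tool":
            # Keep the message so the preceding assistant tool call
            # still has a matching tool result.
            cleaned.append({**msg, "content": PLACEHOLDER})
        # Non-tool messages that were entirely images are dropped.
    return cleaned
```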
Teknium	850413f120	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. 
- CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic- native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-05-08 11:07:38 -07:00
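Illustration for 850413f120 (image-aware token estimation): a minimal sketch of counting each image part as a flat 1500 tokens instead of measuring its base64 length. The 4-characters-per-token heuristic is an assumption inferred from the "~1MB would have registered as ~250K tokens" remark, and the function names and part shapes are hypothetical.

```python
from typing import Any, Union

IMAGE_TOKEN_COST = 1500   # flat per-image cost stated in the commit message
CHARS_PER_TOKEN = 4       # rough heuristic (assumption)

Content = Union[str, list[dict[str, Any]]]


def naive_estimate(content: Content) -> int:
    """Old behaviour (sketch): base64 image data is counted like text."""
    if isinstance(content, str):
        return len(content) // CHARS_PER_TOKEN
    chars = 0
    for part in content:
        if part.get("type") == "image_url":
            chars += len(part["image_url"]["url"])
        else:
            chars += len(part.get("text", ""))
    return chars // CHARS_PER_TOKEN


def image_aware_estimate(content: Content) -> int:
    """New behaviour (sketch): every image costs a flat IMAGE_TOKEN_COST."""
    if isinstance(content, str):
        return len(content) // CHARS_PER_TOKEN
    total = 0
    for part in content:
        if part.get("type") == "image_url":
            total += IMAGE_TOKEN_COST
        else:
            total += len(part.get("text", "")) // CHARS_PER_TOKEN
    return total
```

With a 1,000,000-character data URL and 400 characters of text, the naive estimate is 250,100 tokens while the image-aware estimate is 1,600.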
Teknium	b8d7e0e6d3	fix(msgraph_webhook): harden auth surface + IP allowlisting + response hygiene Defense-in-depth polish on top of the webhook listener before it becomes a real attack surface once the pipeline starts creating subscriptions and Graph starts POSTing to the configured public URL. - Timing-safe clientState comparison. Previously used `==` on strings; switches to hmac.compare_digest so a mismatch does not leak how many leading characters matched. client_state is documented as a strong shared secret (openssl rand -hex 32 in the setup docs), so a timing-safe primitive is the right call. - Split GET and POST handlers. Graph validates a subscription by sending GET with validationToken in the query; anything else on GET is now a 400 so the endpoint cannot be probed or mistakenly used for data exfil. Previously a bare GET fell through to the POST path and blew up on request.json() with a confusing 400. - Empty response bodies on success. 202 is returned with no body so internal counters (accepted / duplicates / scheduled) do not leak to any caller that can reach the endpoint; counters remain observable via /health for operators. 403 on every-item-bad-clientState batches (so forged POSTs stop retrying), 400 on malformed / unknown-resource batches (sender configuration issue). - Optional source-IP allowlist. New `allowed_source_cidrs` extra field (list or comma-separated string) and `MSGRAPH_WEBHOOK_ALLOWED_SOURCE_CIDRS` env var let operators restrict the webhook to Microsoft Graph's published webhook source ranges in production. Empty = allow all, preserving dev-tunnel / localhost workflows. Invalid CIDRs are logged and ignored rather than crashing. Also gates the handshake endpoint so disallowed IPs cannot probe it. 
- Tests updated for the new response contract (empty-body 202, auth-only 403, config-error 400) and extended to cover: bare GET rejection, POST-with-validationToken handshake tolerance, timing-safe compare actually invoked via hmac.compare_digest spy, malformed body / missing value array, IP allowlist accept/reject paths, handshake IP allowlist, invalid CIDR entries, comma-string CIDR list parsing. 52/52 passed (was 40). Full gateway suite: 5049 passed / 1 pre-existing failure in test_discord_free_response (unrelated, reproduces on clean origin/main).	2026-05-08 10:29:58 -07:00
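Illustration for b8d7e0e6d3: a minimal sketch of two of the hardening steps, the timing-safe `clientState` comparison via `hmac.compare_digest` and the optional source-IP allowlist, where an empty list allows everything and invalid CIDRs are logged and ignored. The function names and the log wording are hypothetical; only `hmac.compare_digest` and the list-or-comma-string input shape come from the commit message.

```python
import hmac
import ipaddress
import logging
from typing import Iterable, Union

logger = logging.getLogger(__name__)

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def client_state_matches(received: str, expected: str) -> bool:
    """Compare secrets without leaking how many leading characters matched."""
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))


def parse_allowed_cidrs(raw: Union[str, Iterable[str], None]) -> list[Network]:
    """Accept a list or a comma-separated string; skip invalid entries."""
    entries = raw.split(",") if isinstance(raw, str) else list(raw or [])
    networks: list[Network] = []
    for entry in entries:
        text = str(entry).strip()
        if not text:
            continue
        try:
            networks.append(ipaddress.ip_network(text, strict=False))
        except ValueError:
            logger.warning("ignoring invalid CIDR %r", text)
    return networks


def source_ip_allowed(ip: str, networks: list[Network]) -> bool:
    """An empty allowlist means allow all (dev-tunnel / localhost workflows)."""
    if not networks:
        return True
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in networks)
```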
Dilee	26a59e4f6c	fix(msgraph): normalize webhook dedupe and resource matching	2026-05-08 10:29:58 -07:00
Dilee	2a215de9af	fix(msgraph): bound webhook receipt dedupe cache	2026-05-08 10:29:58 -07:00
Dilee	46a6f39024	feat(msgraph): add webhook listener platform	2026-05-08 10:29:58 -07:00
Teknium	f209a35859	feat(profile): shareable profile distributions via git (#20831 ) * feat(profile): shareable profile distributions (pack/install/update/info) Closes #20456. Turns a profile into a portable, versioned artifact. Packs SOUL.md, config, skills, cron, and an env-var manifest into a tar.gz that others can install from a local path, URL, or git repo. Updates re-pull the distribution while preserving user data (memories, sessions, auth.json, .env) and the user's config.yaml overrides. New subcommands (under hermes profile, no parallel tree): hermes profile pack <name> [-o FILE] hermes profile install <source> [--name N] [--alias] [--force] [-y] hermes profile update <name> [--force-config] [-y] hermes profile info <name> Manifest (distribution.yaml at the profile root): name, version, hermes_requires, author, env_requires, distribution_owned. Security: - Installer shows manifest + env-var requirements before mutating disk; confirmation required unless -y. - auth.json and .env are never packed (same exclude set as profile export). - Cron jobs are packed but NOT auto-scheduled — user is pointed at 'hermes -p <name> cron list' to review. - Archive extraction rejects path traversal (../ members). - Alias creation is opt-in via --alias. Update semantics: - Distribution-owned paths (SOUL.md, skills/, cron/, mcp.json, manifest): replaced from the new archive. - config.yaml: preserved by default; --force-config to overwrite. - User-owned paths (memories/, sessions/, auth.json, .env, state.db, logs/, workspace/, plans/, home/, _cache/, local/): never touched. Version pin: hermes_requires accepts >=, <=, ==, !=, >, < or a bare version (treated as >=). Install fails with a clear error when the running Hermes version doesn't satisfy the spec. 
Sources supported by 'install': - Local .tar.gz / .tgz archive - Local directory - HTTP(S) URL pointing to a .tar.gz (uses httpx, already a dep) - Git URL (github.com/user/repo, https://..., git@..., ssh://, git://) Tests: 43 new unit tests (manifest parsing, version checks, env template, pack/install/update round-trip, config-preservation, security). E2E validated via real CLI invocations against an isolated HERMES_HOME covering pack, install with confirmation, update preservation, update --force-config, decline-preview, duplicate-install rejection, and version-requirement rejection. * refactor(profile-dist): git-only — drop tar.gz/HTTP transports and pack Scope-cut on top of the original distribution PR: a profile distribution is now exclusively a git repository (or a local directory during development). The tar.gz / HTTP archive transports and the matching `hermes profile pack` subcommand have been removed. Why: * GitHub tags, branches, and commits are already the right versioning primitive. Tag pushes do for us what 'pack + upload' did. * `hermes profile export` / `import` already cover local backup and restore; they are not a distribution format and stay untouched. * One transport means one install/update code path, one doc page, and one mental model. The extra source types doubled the surface for no real user win — GitHub auto-attaches release tarballs, and `git bundle` / `git clone --mirror` cover the airgap case. Changes: * hermes_cli/profile_distribution.py — removed pack_profile, _fetch_tar_archive (_http_fetch), _safe_extract, _archive_roots, _safe_parts, _find_dist_root, tarfile/io/urlparse imports. The new _stage_source has two arms: git URL → clone, local directory → use in place. * hermes_cli/main.py — removed the 'pack' subparser and action handler. Install help text updated to match the reduced source list. * tests/hermes_cli/test_profile_distribution.py — rewritten around a local-directory staging fixture. 
The install/update/describe suites now build a distribution tree on disk directly and install from it, which is what a real git clone produces after .git is stripped. Dropped TestPack, TestFindDistRoot, and the tar-specific security test. New tests cover _looks_like_git_url, env_example emission, hermes_requires enforcement, and 'installer does not import credentials if an author mistakenly leaks them in the staging tree'. * website/docs/reference/profile-commands.md — 'Distribution commands' section rewritten around git. Added a 'Publishing a distribution' section. export/import stay documented as local backup/restore. * website/docs/reference/cli-commands.md — dropped 'pack' from the profile subcommand table. * website/package.json — 'lint:diagrams' now passes --exclude-code-blocks to ascii-guard. Without it, markdown tables and box-drawing diagrams inside fenced code blocks were being misidentified as malformed ASCII boxes, blocking the PR's docs-site-checks CI with 8 false-positive errors. Validation: * Targeted suite: tests/hermes_cli/test_profile_distribution.py — 56/56 pass (down from 43 — reorganized to cover the new local-dir paths). * Regression: test_profiles.py + test_profile_export_credentials.py 102/102 still pass. export/import behaviour unchanged. * Docs lint: ascii-guard lint --exclude-code-blocks docs returns 0 errors (was 8 on the PR before the flag bump). * E2E: ran the real `hermes profile install`/`info` against a local staging dir under an isolated HERMES_HOME — install writes SOUL.md + skills to the target profile, info reads the manifest back, a bogus source produces a clear error, and `hermes profile pack` is now rejected by argparse as expected. * feat(profile-dist): distribution-aware list/show/delete + installed_at + env preview Polish pass on top of the git-only scope cut. Five additions, all small, wiring into existing commands rather than adding new surface. 1. 
`installed_at` timestamp on the manifest * Stamped automatically inside plan_install() on both fresh install and update — ISO-8601 UTC, seconds resolution. * Surfaced in `hermes profile info` as `Installed: <ts>`. * Lets users tell "installed 6 months ago, needs update" from "installed yesterday" without guessing from file mtimes. 2. `hermes profile list` grows a `Distribution` column * Plain profiles: "—" * Distribution profiles: "<name>@<version>" (e.g. `telemetry@1.2.3`) * ProfileInfo gains three optional fields — distribution_name, distribution_version, distribution_source — populated by a new _read_distribution_meta() helper that swallows manifest read errors so a broken distribution.yaml in one profile can't break `list` for the others. 3. `hermes profile show` and `hermes profile delete` surface distribution provenance * show: `Distribution: name@version` + `Installed from: <source>` plus a pointer to `hermes profile info <name>` for the full manifest. * delete: same lines in the pre-confirmation preview, so a user deleting "telemetry" can see it came from `github.com/kyle/telemetry-distribution` before they type `telemetry` to confirm. No change to the confirmation gate itself — deletion semantics are identical to plain profiles. 4. Install preview checks env vars against the current environment * Replaces the "Env vars you'll need to set:" header with a simpler "Env vars:" block. * Each required var is labeled: - `✓ set` — already in `os.environ` OR present as a key in the target profile's existing .env (update case). - `needs setting` — required but not found in either place. - `—` — optional. * Mirrors pip's "Requirement already satisfied" UX: no unnecessary nagging about keys the user already has configured. 5. 
Docs: private distributions * New "Private distributions" section in website/docs/reference/profile-commands.md explaining that we shell out to the user's `git` binary, so SSH keys / credential helpers / GitHub CLI stored creds all work transparently. One paragraph, two examples. * `hermes profile info` section updated to mention `Installed:`. Module-level hoist: * `from datetime import datetime, timezone` was previously lazy-imported inside plan_install(). Hoisted to module scope so tests can monkeypatch `hermes_cli.profile_distribution.datetime` to freeze time. Tests (+7): * TestInstalledAtStamp.test_install_stamps_installed_at — format check (4-digit year, 'T', +00:00 suffix). * TestInstalledAtStamp.test_update_refreshes_installed_at — freezes datetime.now() to 2099-01-01 and confirms update writes a new stamp. * TestProfileInfoDistribution.test_installed_distribution_shows_in_list — ProfileInfo.distribution_{name,version,source} populated after install. * TestProfileInfoDistribution.test_plain_profile_has_no_distribution_fields — plain profiles have None. * TestProfileInfoDistribution.test_malformed_manifest_does_not_break_list — broken distribution.yaml in one profile doesn't break list_profiles(). Validation: * 163/163 tests pass (56 distribution + 102 profile regression + 5 new from this commit — up from 158). * docs-lint: 0 errors. * E2E verified: install preview shows ✓/needs-setting per env var, `profile list` shows Distribution column, `profile show` + `delete` preview mentions source URL, `info` shows Installed: timestamp. * fix(profile-dist): clean errors + warn when overwriting plain profiles Two small polish fixes found during collision sweeps of the PR: 1. ValueError from validate_profile_name now caught cleanly * A distribution.yaml whose 'name' field can't be used as a profile identifier (spaces, path traversal, etc.) 
raises ValueError from hermes_cli.profiles.validate_profile_name, which was escaping as a raw Python traceback from 'hermes profile install/update/info'. * Broadened the except clause in all three handlers to catch (DistributionError, ValueError) — users now see: Error: Invalid profile name '../../etc/passwd'. Must match [a-z0-9][a-z0-9_-]{0,63} instead of a stack trace. 2. Install preview distinguishes plain profile overwrite from distribution re-install * When plan.target_dir exists and IS a distribution (has distribution.yaml), preview still shows the mild (profile exists — will overwrite distribution-owned files only) * When plan.target_dir exists but is a HAND-BUILT plain profile (no distribution.yaml), preview now shows a loud warning: ⚠ Profile exists but is NOT a distribution. Installing here will overwrite its SOUL.md, skills/, cron/, and mcp.json. Your memories, sessions, auth.json, and .env will be preserved, but any hand-edits to distribution-owned files will be lost. * Users who type 'hermes profile install foo --force' against a profile they hand-built now see what they're signing up for. User data is still safe (memories, sessions, auth, .env are in USER_OWNED_EXCLUDE), but custom SOUL/skills get stomped. Tests (+2): * TestErrorSurfaces.test_bad_profile_name_raises_valueerror_not_traceback * TestErrorSurfaces.test_path_traversal_name_rejected Validation: * 165/165 tests pass (was 163). * E2E: bad manifest names produce 'Error: Invalid profile name ...' with no traceback; installing over a plain profile shows the warning; re-installing over an existing distribution shows the normal overwrite message. * Bad HTTPS URLs still produce 'Error: git clone failed: ...' — git itself generates a clean enough message that no wrapper is needed. * 'install .' works correctly from any cwd. 
* fix(profiles): reject reserved names at validate time Before: `hermes profile create hermes` / `profile install` / `profile rename` all silently accepted reserved names like `hermes`, `test`, `tmp`, `root`, `sudo`. The profile directory was created; only alias creation failed (via check_alias_collision), leaving a confusingly-named profile on disk — e.g. `~/.hermes/profiles/hermes/` sitting next to `~/.hermes/` itself. The reserved set already exists (_RESERVED_NAMES, introduced alongside alias collision detection). This commit moves the check up one layer to validate_profile_name so every entry point — create, install, import, rename, dashboard web API — shares the same gate. The error message points the user at the cause without being cryptic: Error: Profile name 'hermes' is reserved — it collides with either the Hermes installation itself or a common system binary. Pick a different name. `default` continues to pass through (it's a special alias for ~/.hermes). _HERMES_SUBCOMMANDS (`chat`, `model`, `gateway`, etc.) stays at alias-collision time only — those are fine as bare profile names with `--no-alias`. Tests (+5): test_reserved_names_rejected parametrized over the full _RESERVED_NAMES set, matching the existing pattern in TestValidateProfileName. No existing test uses a reserved name as a profile identifier (grepped create_profile("hermes\|test\|tmp\|root\|sudo") — zero hits). Validation: * 170/170 tests pass in the profile suites. * E2E: `profile create hermes`, `profile install` with manifest name=hermes, and `profile install ... --name hermes` all produce the same clean `Error: Profile name 'hermes' is reserved ...` with rc=1 and no traceback. Normal names (`mybot`) still work.	2026-05-08 10:04:32 -07:00
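The name gate described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from hermes_cli/profiles.py: the pattern is the one quoted in the error message, while the contents of the reserved set are an assumption based on the five names the message lists.

```python
import re

# Pattern quoted in the "Invalid profile name" error message.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{0,63}")

# Assumption: only the five names the commit message lists; the real
# _RESERVED_NAMES set may differ.
RESERVED_NAMES = frozenset({"hermes", "test", "tmp", "root", "sudo"})


def validate_profile_name(name: str) -> None:
    """Raise ValueError for names that are malformed or reserved."""
    if name == "default":
        return  # special alias for ~/.hermes, always passes through
    if not _NAME_RE.fullmatch(name):
        raise ValueError(
            f"Invalid profile name {name!r}. Must match [a-z0-9][a-z0-9_-]{{0,63}}"
        )
    if name in RESERVED_NAMES:
        raise ValueError(f"Profile name {name!r} is reserved")
```

Putting both checks in one function is what lets create, install, import, and rename share the same gate, and a caller that catches ValueError can print a one-line error instead of a traceback.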
Teknium	45d860d424	fix(msgraph): stream download_to_file body instead of buffering The prior implementation routed download_to_file through the shared _request() path, which uses httpx.AsyncClient.request() inside a context manager that closes before aiter_bytes() iterates. The body was read into memory first and the chunked write loop replayed it from buffer. On small test payloads this was invisible; on real Teams meeting recordings (hundreds of MB) it would force the full artifact into RAM per download. Rewrites download_to_file to open its own AsyncClient and use client.stream(), keeping the context open across the aiter_bytes iteration so the body is actually streamed chunk-by-chunk to disk. Retry/token-refresh/Retry-After semantics are preserved by handling them inline on the stream path. Partial .part files are cleaned up on transport errors and on exhausted retries. Adds three tests: large-payload streaming verifies the chunk loop runs multiple times (discriminator: 512 KiB at chunk_size=65536 yields 8 chunks under streaming, 1 under buffering), transient-5xx retry recovers after a single retry, and exhausted-retry cleans up the partial file.	2026-05-08 09:27:26 -07:00
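The chunk-count discriminator in the commit above can be reproduced with a standard-library sketch. Here `fake_body` is a hypothetical stand-in for a streamed HTTP body, and `write_stream` with its `.part` handling is an illustrative shape, not the actual download_to_file implementation.

```python
import asyncio
import os
import tempfile
from typing import AsyncIterator


async def fake_body(total: int, chunk_size: int) -> AsyncIterator[bytes]:
    """Hypothetical streamed body: yields at most chunk_size bytes at a time."""
    sent = 0
    while sent < total:
        n = min(chunk_size, total - sent)
        yield b"\0" * n
        sent += n


async def write_stream(chunks: AsyncIterator[bytes], dest: str) -> int:
    """Write chunks to dest through a .part file; return the chunk count."""
    part = dest + ".part"
    count = 0
    try:
        with open(part, "wb") as fh:
            async for chunk in chunks:
                fh.write(chunk)
                count += 1
        os.replace(part, dest)  # publish only after a complete download
    except BaseException:
        # Clean up the partial file on any failure, then re-raise.
        if os.path.exists(part):
            os.remove(part)
        raise
    return count


def demo() -> int:
    """Stream 512 KiB in 64 KiB chunks into a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "recording.bin")
        return asyncio.run(write_stream(fake_body(512 * 1024, 65536), dest))
```

With 512 KiB at a chunk size of 65536 the write loop runs 8 times, which is the count the streaming test in the commit relies on. A buffered body replayed as a single block would run it once.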
Dilee	b878f89f66	test(msgraph): cover concurrent token cache reuse	2026-05-08 09:27:26 -07:00
Dilee	a152c706b7	feat(msgraph): add auth and client foundation	2026-05-08 09:27:26 -07:00
Teknium	839cdd1b05	fix(approval): cron jobs must not be treated as gateway context The new _is_gateway_approval_context() widened the gateway classification to any call with HERMES_SESSION_PLATFORM bound via contextvars. But cron/scheduler.py binds that same contextvar for delivery routing on cron jobs that originate from a gateway platform (telegram/discord/etc.), so those jobs were getting routed through submit_pending with no listener — blocking indefinitely instead of honoring approvals.cron_mode. Short-circuit on HERMES_CRON_SESSION before any gateway check. Cron is always governed by cron_mode config, regardless of where the job was scheduled from. Adds regression coverage in TestCronWithGatewayOrigin and records the contributor email mapping for scripts/release.py.	2026-05-08 07:30:14 -07:00
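The ordering fix above amounts to checking the cron marker before any gateway check. A minimal sketch follows; it assumes both markers are context variables (the commit message only says so for HERMES_SESSION_PLATFORM), and `approval_route` with its return labels is a hypothetical name used for illustration.

```python
from contextvars import ContextVar
from typing import Optional

# Hypothetical stand-ins for the two markers named in the commit message.
HERMES_CRON_SESSION: ContextVar[bool] = ContextVar(
    "HERMES_CRON_SESSION", default=False
)
HERMES_SESSION_PLATFORM: ContextVar[Optional[str]] = ContextVar(
    "HERMES_SESSION_PLATFORM", default=None
)


def approval_route() -> str:
    """Decide which approval policy governs the current call."""
    # Cron wins first: a cron job may also carry a platform binding that is
    # only there for delivery routing.
    if HERMES_CRON_SESSION.get():
        return "cron_mode"
    if HERMES_SESSION_PLATFORM.get():
        return "gateway"
    return "cli"
```

A cron job that originated from Telegram has both markers set. With the cron check first it is governed by the cron policy instead of waiting on a gateway listener that does not exist.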
Zhicheng Han	526c0e018a	feat(api-server): expose run approval events	2026-05-08 07:30:14 -07:00
Teknium	674fad1483	fix(goals): Ctrl+C during /goal loop auto-pauses the goal (#21888 ) Reported: Ctrl+C during an active /goal loop felt like it did nothing — the agent would interrupt the current turn, then immediately queue another continuation and keep going until the session ended or the 20-turn budget ran out. Root cause: cli.py's _maybe_continue_goal_after_turn() ran in the finally: block around self.chat(...) unconditionally. Whether the turn completed normally, got interrupted, or returned an empty string, the judge ran on whatever was in conversation_history and — because the judge is fail-open — a "continue" verdict pushed another CONTINUATION_PROMPT onto _pending_input. Ctrl+C was invisible to the hook. Fix: - chat() now captures result['interrupted'] onto self._last_turn_interrupted (resets to False at entry so early-returns don't leak prior state). - _maybe_continue_goal_after_turn() checks the flag first: on interrupt, auto-pause via mgr.pause(reason='user-interrupted (Ctrl+C)') and print a one-liner pointing the user at /goal resume or /goal clear. No judge call, no continuation enqueued. - Also added an empty-response guard that mirrors gateway/run.py's _handle_message logic (empty reply → transient failure → skip judging so we don't trip the consecutive-parse-failures backstop unnecessarily). The goal stays in the DB as paused, so /goal resume recovers it after the user has sorted out whatever made them cancel. /goal clear still works as before for a full stop. Tests: tests/cli/test_cli_goal_interrupt.py covers: - interrupted turn pauses + doesn't queue + judge is NOT called - paused goal is resumable - empty / whitespace / missing assistant reply skips judging - healthy turn still enqueues continuation / marks done - chat() resets _last_turn_interrupted at entry (anti-leak guard) All 55 existing goal tests still pass.	2026-05-08 06:53:13 -07:00
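The post-turn hook described above can be sketched as a small state machine. `GoalLoop` and its field names are hypothetical and only the order of the checks follows the commit message: interrupt first, then the empty-reply guard, then the judge.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class GoalLoop:
    """Hypothetical model of the continue-or-pause decision after a turn."""

    judge: Callable[[str], str]  # returns "done" or "continue"
    paused_reason: Optional[str] = None
    pending_input: List[str] = field(default_factory=list)
    last_turn_interrupted: bool = False

    def after_turn(self, reply: Optional[str]) -> str:
        # 1. Interrupt wins: pause, no judge call, nothing enqueued.
        if self.last_turn_interrupted:
            self.paused_reason = "user-interrupted (Ctrl+C)"
            return "paused"
        # 2. An empty or whitespace reply is treated as transient: skip judging.
        if not reply or not reply.strip():
            return "skipped"
        # 3. Healthy turn: ask the judge and maybe enqueue a continuation.
        if self.judge(reply) == "continue":
            self.pending_input.append("CONTINUATION_PROMPT")
            return "continued"
        return "done"
```

Because the interrupt flag is checked before the judge runs, a fail-open judge can no longer turn a Ctrl+C into another continuation.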
Shannon Sands	80775d7585	test(auth): assert Nous refresh rotation payload	2026-05-08 04:17:42 -07:00
Shannon Sands	b32461f6e8	fix(auth): send Nous refresh token via header	2026-05-08 04:17:42 -07:00
Teknium	486b14b423	feat(cron): routing intent — deliver=all fans out to every connected channel (#21495) Adds one reserved token to the cron `deliver` field: - `all` — expand to every platform with a configured home channel Resolves at fire time, not create time, so a job created before Telegram was wired up picks it up once `TELEGRAM_HOME_CHANNEL` is set. Composes with existing targets: `origin,all`, `all,telegram:-100:17`. Inspired by Vellum Assistant's reminder routing-intent system. ## Changes - cron/scheduler.py: _expand_routing_tokens + integrate into _resolve_delivery_targets - tools/cronjob_tools.py: schema description updated - tests/cron/test_scheduler.py: TestRoutingIntents (5 cases) - website/docs/user-guide/features/cron.md: docs + table rows ## Validation - tests/cron/test_scheduler.py -k 'Routing or Deliver' → 57 passed	2026-05-08 04:17:21 -07:00
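Token expansion at fire time might look like the sketch below. The function signature, the `home_channels` mapping, and the de-duplication of repeated targets are assumptions for illustration; only the `all` token and the comma-separated composition come from the commit message.

```python
from typing import Dict, List


def expand_routing_tokens(deliver: str, home_channels: Dict[str, str]) -> List[str]:
    """Expand the reserved `all` token into every platform with a home channel.

    `home_channels` maps platform name to its configured home channel and is
    read when the job fires, so platforms configured later are picked up.
    """
    targets: List[str] = []
    for token in (part.strip() for part in deliver.split(",")):
        if not token:
            continue
        expanded = sorted(home_channels) if token == "all" else [token]
        for target in expanded:
            if target not in targets:  # assumption: duplicates collapse
                targets.append(target)
    return targets
```

For example, `expand_routing_tokens("origin,all", {"telegram": "-100", "discord": "42"})` yields `["origin", "discord", "telegram"]`, and the same job with no home channels configured yields just `["origin"]`.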
kshitijk4poor	81928f03ab	refactor(gmi): move User-Agent to profile.default_headers The previous revision of this PR added six GMI-specific branches (`elif base_url_host_matches(..., 'api.gmi-serving.com')`) across run_agent.py and agent/auxiliary_client.py, plus a _HERMES_UA_HEADERS constant in auxiliary_client.py. ProviderProfile already has a `default_headers: dict[str, str]` field commented as 'Client-level quirks (set once at client construction)'. Other plugins (ai-gateway, kimi-coding) already use it. Two of the four auxiliary_client sites we previously patched already had a generic `else: profile.default_headers` fallback that picked it up (so did both run_agent sites). This revision: * Sets `default_headers={'User-Agent': 'HermesAgent/<ver>'}` on the GMI profile in plugins/model-providers/gmi/__init__.py. * Reverts all six GMI-specific branches in run_agent.py and auxiliary_client.py. * Adds the generic profile-fallback `else` block to the two auxiliary_client sites (`_to_async_client`, `resolve_provider_client`) that didn't have it yet. This benefits every provider whose profile declares default_headers, not just GMI — e.g. Vercel AI Gateway's HTTP-Referer/X-Title now flow through the async client path too. * Replaces the GMI-specific URL-branch tests with a profile-level assertion and keeps the run_agent integration test (with `provider='gmi'` so the fallback picks up the profile). Net diff vs main: +82/-0 across 5 files, touching only the GMI plugin, two generic fallback blocks in auxiliary_client.py, AUTHOR_MAP, and tests. No core files change. Based on #20907 by @isaachuangGMICLOUD.	2026-05-08 03:22:11 -07:00
Teknium	307c85e5c1	fix(goals): auto-pause when judge model returns unparseable output Weak judge models (e.g. deepseek-v4-flash) return empty strings or prose when asked for the strict {done, reason} JSON verdict. The old code failed-open to continue on every such turn, burning the entire turn budget with log lines like judge returned empty response judge reply was not JSON: "Let me analyze whether the goal..." and /goal clear could not stop it mid-loop without /stop. After N=3 consecutive parse failures (transport/API errors don't count — those are transient), the loop auto-pauses and prints: ⏸ Goal paused — the judge model (3 turns) isn't returning the required JSON verdict. Route the judge to a stricter model in ~/.hermes/config.yaml: auxiliary: goal_judge: provider: openrouter model: google/gemini-3-flash-preview Then /goal resume to continue. The counter resets on any usable reply (both "done"/"continue" and API errors) and persists across GoalManager reloads so cross-session resumes carry the correct state. Also fixes test_goal_verdict_send.py sharing a hardcoded session_id across tests — the shared id only worked because the previous _post_turn_goal_continuation was a never-awaited coroutine. Now that PR #19160 made it properly awaited, the xdist test-leakage bug surfaced. Each test gets a unique session_id via uuid suffix.	2026-05-07 17:33:09 -07:00
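The consecutive-failure backstop described above can be sketched as a small counter. `parse_verdict` and `JudgeFailureCounter` are hypothetical names; the threshold of 3 and the reset-on-API-error rule are taken from the commit message.

```python
import json
from typing import Optional

MAX_PARSE_FAILURES = 3  # N=3 in the commit message


def parse_verdict(reply: str) -> Optional[dict]:
    """Return the {done, reason} verdict, or None when the reply is unusable."""
    try:
        data = json.loads(reply)
    except (json.JSONDecodeError, TypeError):
        return None
    if isinstance(data, dict) and isinstance(data.get("done"), bool):
        return data
    return None


class JudgeFailureCounter:
    """Counts consecutive unparseable judge replies."""

    def __init__(self) -> None:
        self.consecutive = 0

    def record_reply(self, reply: str) -> bool:
        """Record one judge reply; return True when the loop should auto-pause."""
        if parse_verdict(reply) is None:
            self.consecutive += 1
        else:
            self.consecutive = 0
        return self.consecutive >= MAX_PARSE_FAILURES

    def record_api_error(self) -> None:
        # Transport/API errors are transient and reset the streak.
        self.consecutive = 0
```

Persisting `consecutive` alongside the goal state is what would let a resumed session carry the streak across reloads, as the commit describes.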
JC	03ddff8897	fix(gateway): defer goal status notices until after response delivery Route goal status notices through the platform adapter send API and register post-delivery callbacks so completed-goal notices appear after the final assistant response. Also cancel queued synthetic goal continuations on /goal pause and /goal clear while preserving normal queued user messages.	2026-05-07 17:33:09 -07:00
Austin Pickett	7f92e5506e	Merge pull request #20942 from NousResearch/austin/fix/personality fix(tui): preserve session when switching personality	2026-05-07 18:54:29 -04:00
teknium	292f468366	fix(mcp): unwrap platforms key in channels_list channels_list was iterating directory.items() directly, yielding ("updated_at", str) and ("platforms", dict) pairs — neither passed the isinstance(entries_list, list) check, so the inner loop never ran and every call returned count=0 even when channel_directory.json was populated. The writer (gateway/channel_directory.py) wraps the payload as {"updated_at": ..., "platforms": {...}}; every other reader in the codebase unwraps via directory.get("platforms", {}). This aligns channels_list with that convention. Also tightens the existing test_channels_with_directory test, which bypassed the bug by asserting against _load_channel_directory() directly instead of calling channels_list. It now calls the tool end-to-end and a new test_channels_with_directory_platform_filter covers the filter path. Both tests fail against the pre-fix code. Closes #21474 Co-authored-by: chrisworksai <262485129+chrisworksai@users.noreply.github.com>	2026-05-07 13:41:16 -07:00
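The bug and its fix are easy to see on a toy directory. The sketch below assumes each platform maps to a list of channel dicts; the return shape and the `platform` filter parameter are illustrative, not the tool's actual schema.

```python
from typing import Any, Dict, Optional


def channels_list(
    directory: Dict[str, Any], platform: Optional[str] = None
) -> Dict[str, Any]:
    """Flatten the channel directory into a list of channels."""
    # Unwrap the payload first: iterating directory.items() directly would
    # only see the "updated_at" and "platforms" keys.
    platforms = directory.get("platforms", {})
    channels = []
    for name, entries in platforms.items():
        if platform and name != platform:
            continue
        if not isinstance(entries, list):
            continue
        for entry in entries:
            channels.append({"platform": name, **entry})
    return {"count": len(channels), "channels": channels}
```

Calling the function end to end on a wrapped payload, as the tightened test does, is what catches a regression here. Asserting on the loaded directory alone would pass even with the unwrap missing.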
Blake Johnson	9076a2e74e	fix(agent): keep Nous GPT-5 fallback on chat completions	2026-05-07 13:04:42 -07:00
Teknium	24d48ffb82	feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks (#21435 ) * feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks The Triage column shipped with a placeholder 'a specifier will flesh out the spec', but the specifier itself was never built. This wires it up as a dedicated CLI verb. `hermes kanban specify <id>` calls the auxiliary LLM (configured under `auxiliary.triage_specifier`) to expand a rough one-liner into a concrete spec — tightened title plus a body with Goal / Approach / Acceptance criteria / Out-of-scope sections — then atomically flips `status: triage -> todo` and recomputes ready so parent-free tasks go straight to the dispatcher on the same tick. Surface: hermes kanban specify <task_id> # single task hermes kanban specify --all [--tenant T] # sweep triage column hermes kanban specify ... --author NAME # audit-comment author hermes kanban specify ... --json # one JSON line per task Design choices: - Parent gating is preserved. specify_triage_task flips to 'todo', then recompute_ready promotes to 'ready' only when parents are done — same rule as a normal parent-gated todo. - No daemon, no background watcher. Every invocation is explicit — keeps cost predictable and doesn't fight the dispatcher loop. - Response parse is lenient: strict JSON preferred, markdown-fence tolerated, raw-body fallback on malformed JSON so the LLM can't strand a task in triage. - All failure modes (no aux client, API error, task moved out of triage mid-call) return SpecifyOutcome(ok=False, reason=...) so --all continues past individual failures. 
Changes: hermes_cli/kanban_db.py + specify_triage_task() hermes_cli/kanban_specify.py NEW (~220 LOC — prompt, parse, call) hermes_cli/kanban.py + specify subcommand + _cmd_specify hermes_cli/config.py + auxiliary.triage_specifier task slot website/docs/user-guide/features/kanban.md specify + config notes website/docs/reference/cli-commands.md CLI reference entry tests/hermes_cli/test_kanban_specify_db.py NEW (10 tests) tests/hermes_cli/test_kanban_specify.py NEW (20 tests) Validation: 30/30 targeted tests pass. E2E: triage task -> specify -> ends in 'ready' with events [created, specified, promoted] and the audit comment recorded under the configured author. * feat(kanban): wire specifier into dashboard and gateway slash Follow-ups to the initial PR #21435 — closes the two gaps I'd left as post-merge: dashboard button and first-class gateway surface. Dashboard (plugins/kanban/dashboard/) - POST /tasks/:id/specify NEW endpoint. Thin wrapper around kanban_specify.specify_task(). Returns the CLI outcome shape ({ok, task_id, reason, new_title}); ok=false with a human reason is a 200, not a 4xx, so the UI can render it inline without treating 'no aux client configured' as a crash. - Runs sync in FastAPI's threadpool because the LLM call can take tens of seconds on reasoning models. - Pins HERMES_KANBAN_BOARD around the specify call so the module's argless kb.connect() lands on the right board. - dist/index.js: doSpecify callback threaded through the drawer → TaskDetail → StatusActions prop chain. ✨ Specify button appears ONLY when task.status === 'triage' (elsewhere the backend would reject anyway — hide the button to keep the action row clean). Busy state (Specifying…) + inline success/error banner under the button using the response.reason text. - dist/style.css: tiny hermes-kanban-msg-ok / -err classes using existing --color vars so themes reskin cleanly. 
Gateway slash (/kanban specify) - Already works via the existing run_slash → build_parser → kanban_command pipeline. No code change needed — slash commands inherit the argparse tree automatically. Added coverage: test_run_slash_specify_end_to_end (create --triage, specify, verify promotion + retitle) and test_run_slash_specify_help_is_reachable. Tests - tests/plugins/test_kanban_dashboard_plugin.py: 3 new tests for the REST endpoint — happy path, non-triage rejection as ok=false 200, missing aux client as ok=false 200. - tests/hermes_cli/test_kanban_cli.py: 2 new slash-surface tests. Docs - website/docs/user-guide/features/kanban.md: dashboard action row description mentions ✨ Specify + all three surfaces. REST table gains /tasks/:id/specify. Slash examples include /kanban specify. Validation: 340/340 targeted tests pass. E2E via TestClient: create a triage task over REST → POST /specify with mocked aux client → task moves to 'ready' column on /board with new title and body applied.	2026-05-07 13:04:41 -07:00
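The lenient response parse described for the specifier (strict JSON preferred, a markdown fence tolerated, raw body as fallback) could be sketched as below. The `title`/`body` keys and the `parse_spec` name are assumptions; the real module's response schema is not shown in the log.

```python
import json
import re
from typing import Tuple

FENCE = "`" * 3  # built at runtime to keep this example fence-safe
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_spec(raw: str, fallback_title: str) -> Tuple[str, str]:
    """Return (title, body) from an LLM reply, never raising on bad JSON."""
    candidates = [raw]
    match = _FENCE_RE.search(raw)
    if match:
        candidates.append(match.group(1))
    for text in candidates:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and "body" in data:
            return str(data.get("title") or fallback_title), str(data["body"])
    # Malformed JSON: keep the original title and use the raw text as the body
    # so the task is not stranded in triage.
    return fallback_title, raw.strip()
```

The fallback trades structure for progress: a prose reply still moves the task forward, with the original title kept.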
adybag14-cyber	732a6c45fa	feat: add termux doctor fallback guidance for blocked extras	2026-05-07 13:04:08 -07:00
adybag14-cyber	dc5ef1ac8e	fix: add termux-all install profile and safe fallbacks	2026-05-07 13:04:08 -07:00
adybag14-cyber	da18fd084a	fix: strengthen termux install network prerequisites	2026-05-07 13:04:08 -07:00
adybag14-cyber	54c0b10d14	fix(update): add heartbeat during dependency install	2026-05-07 13:04:08 -07:00
Abd0r	04193cf71c	feat(web): add Brave Search (free tier) and DDGS search providers Both implement WebSearchProvider via tools/web_providers/ — matching the existing SearXNG pattern (PR #`5c906d702`). Search-only; pair with any extract provider via web.extract_backend. - tools/web_providers/brave_free.py — Brave Search API (free tier, 2k queries/mo). Uses BRAVE_SEARCH_API_KEY as X-Subscription-Token. - tools/web_providers/ddgs.py — DuckDuckGo via the ddgs Python package. No API key; gated on package importability. - tools/web_tools.py: both backends added to _get_backend() config list and auto-detect chain (trails paid providers), _is_backend_available, web_search_tool dispatch, web_extract_tool + web_crawl_tool search-only refusals, check_web_api_key, and the __main__ diagnostic. Introduces _ddgs_package_importable() helper so tests can monkeypatch a single symbol for the ddgs availability check. - hermes_cli/tools_config.py: picker entries for both providers; ddgs gets a post_setup handler that runs `pip install ddgs`. - hermes_cli/config.py: BRAVE_SEARCH_API_KEY in OPTIONAL_ENV_VARS. - scripts/release.py: AUTHOR_MAP entry for @Abd0r. - tests: 14 new tests (brave-free) + 15 new tests (ddgs) covering provider unit behavior, backend wiring, and search-only refusals. Salvages the brave-free + ddgs portion of PR #19796. Not included: the in-line helpers in web_tools.py (replaced with provider modules to match the shipped architecture), the lynx-based extract path (these backends should refuse extract with a clear error — users pair with a real extract provider), and scripts/start-llama-server.sh (unrelated). Co-authored-by: Abd0r <223003280+Abd0r@users.noreply.github.com>	2026-05-07 09:59:17 -07:00
xxxigm	cdc0a47dd5	test(hermes_constants): cover parse_reasoning_effort()	2026-05-07 09:59:07 -07:00
Teknium	7e2af0c2e8	feat(acp): pass image file attachments through as image_url parts Extends PR #21400's resource inlining with image-specific handling: ACP resource_link and embedded blob resources with an image/* mime (or image file suffix when mime is missing) now emit an OpenAI image_url part with a base64 data URL, so vision models actually see the image instead of a [Binary file omitted] note. Non-image resources keep the existing text-inlining behavior. Adds 3 tests: local PNG via resource_link, JPEG mime inferred from suffix when client omits mimeType, and embedded blob PNG.	2026-05-07 09:24:32 -07:00
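Building the image part is mostly a matter of choosing the mime type and base64-encoding the bytes. A minimal sketch, assuming the resource bytes and file name are already in hand; `image_part` is a hypothetical helper name, while the `image_url` part shape follows the OpenAI chat format named in the commit message.

```python
import base64
import mimetypes
from typing import Optional


def image_part(data: bytes, name: str, mime: Optional[str] = None) -> Optional[dict]:
    """Return an OpenAI-style image_url part, or None for non-image resources."""
    # Fall back to the file suffix when the client omitted the mime type.
    mime = mime or mimetypes.guess_type(name)[0]
    if not mime or not mime.startswith("image/"):
        return None  # caller keeps the existing text-inlining behavior
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }
```

Returning None for non-image resources leaves the text-inlining path untouched, which matches the split the commit describes.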
HenkDz	733e297b8a	fix(acp): inline file attachment resources	2026-05-07 09:24:32 -07:00
Teknium	2564132a1f	fix(telegram): preserve thread_id=1 for forum General typing indicator (#21390 ) The May 5 refactor in `d5357f816` made _message_thread_id_for_typing() symmetric with _message_thread_id_for_send() by mapping the General topic (thread id "1") to None upfront for both. That's correct for sendMessage — Telegram rejects message_thread_id=1 on sends and the topic must be omitted — but it's wrong for sendChatAction. Observed behavior (confirmed via before/after Telegram wire traces): Before `d5357f816`: thread_id=1 → message_thread_id=1 → bubble visible in General After `d5357f816`: thread_id=1 → message_thread_id=None → no visible typing Omitting message_thread_id on sendChatAction does NOT fall back to the General topic's view in a forum-enabled supergroup; the bubble ends up hidden from the client's General-topic pane entirely. For any user on a forum-group, the typing indicator stopped appearing. Fix: drop the symmetric "1 → None" mapping from the typing resolver. sendMessage still maps 1 → None via _message_thread_id_for_send (that side was never broken). The asymmetry is real and required by Telegram's API — document it in the resolver docstring. Partial revert of `d5357f816`; restores the behavior from `0cf7d570e` ("fix(telegram): restore typing indicator and thread routing for forum General topic"). Does not re-introduce the retry-without-thread fallback that `41545f7ec` scoped down for DM topics — with the resolver fixed, the first call already hits the right wire shape. Test updated from test_send_typing_general_topic_uses_none_thread_id (which encoded the broken contract) to test_send_typing_preserves_general_topic_thread_id, asserting the single correct call with message_thread_id=1. 10 other tests in the file untouched and passing.	2026-05-07 08:39:21 -07:00
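The asymmetry between the two resolvers can be written down in a few lines. These are simplified stand-ins for _message_thread_id_for_send and _message_thread_id_for_typing; the real resolvers likely handle more cases (for example DM topics), which this sketch ignores.

```python
from typing import Optional

GENERAL_TOPIC_THREAD_ID = "1"


def thread_id_for_send(thread_id: Optional[str]) -> Optional[int]:
    """sendMessage: the General topic must be omitted (Telegram rejects 1)."""
    if thread_id is None or str(thread_id) == GENERAL_TOPIC_THREAD_ID:
        return None
    return int(thread_id)


def thread_id_for_typing(thread_id: Optional[str]) -> Optional[int]:
    """sendChatAction: keep thread id 1 so the bubble shows in General."""
    if thread_id is None:
        return None
    return int(thread_id)
```

Keeping the two functions separate, rather than sharing one mapping, makes the difference explicit at the call sites.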
Teknium	812ce0b987	fix(run_agent): break permanent empty-response loop from orphan tool-tail (#21385 ) When empty-response terminal scaffolding fires on a tool-result turn, _drop_trailing_empty_response_scaffolding left the live history ending at a bare 'tool' message. The next user input then landed as [...tool, user], a protocol-invalid sequence that OpenRouter/Opus and other providers silently fail on (returns empty content). That retriggered the empty-retry recovery every turn, and recovery flags never hit SQLite (no column for them), so history kept looking broken on every reload. Two fixes: 1. Scaffolding strip rewinds the orphan assistant(tool_calls)+tool pair after popping sentinels. Only fires when scaffolding flags were actually present, so mid-iteration tool loops are untouched. 2. _repair_message_sequence runs right before every API call as a defensive belt: drops stray tool messages with unknown tool_call_ids, merges consecutive user messages so no user input is lost. Does NOT rewind assistant(tool_calls)+tool+user — that pattern is valid when the user redirected before the model got its continuation turn. Repro: session 20260507_044111_fa7e65. Opus-4.7/OpenRouter returned content-less response after a 42KB execute_code output, nudge+retry chain exhausted (no fallback configured), terminal sentinel appended, scaffolding stripped leaving bare tool tail, user typed 'wtf happened..' and landed as tool→user violation. Every subsequent turn collapsed in <50ms with the same 3-retry empty chain because the API request itself was malformed. Verified live via HTTP mock: pre-fix reproduced 5 api_calls/0.15s exit 'empty_response_exhausted'; post-fix 1 api_call/0.10s exit 'text_response(finish_reason=stop)'. Three-turn session flows cleanly through the scenario. Full run_agent suite: 1242 passed (0 regressions, 2 pre-existing concurrent_interrupt failures unrelated).	2026-05-07 08:35:10 -07:00
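The defensive repair pass (fix 2 above) can be sketched as a single walk over the history. This is an illustration under simplifying assumptions, not the real _repair_message_sequence: message content is assumed to be plain strings, and merged user messages are joined with a blank line.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def repair_message_sequence(messages: List[Message]) -> List[Message]:
    """Drop orphan tool results and merge consecutive user messages."""
    known_ids = set()
    repaired: List[Message] = []
    for msg in messages:
        role = msg.get("role")
        if role == "assistant":
            for call in msg.get("tool_calls") or []:
                known_ids.add(call.get("id"))
        # A tool result whose tool_call_id was never issued is a stray: drop it.
        if role == "tool" and msg.get("tool_call_id") not in known_ids:
            continue
        # Merge back-to-back user messages so no user input is lost.
        if role == "user" and repaired and repaired[-1].get("role") == "user":
            merged = dict(repaired[-1])
            merged["content"] = f"{merged.get('content', '')}\n\n{msg.get('content', '')}"
            repaired[-1] = merged
            continue
        repaired.append(dict(msg))
    return repaired
```

A valid assistant(tool_calls) + tool + user run passes through unchanged, which matches the commit's note that this pattern must not be rewound.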
Teknium	1d2029b2b7	fix(update): reset-failed before every fallback restart so the gateway can't get stranded (#21371) cmd_update's auto-restart path could leave the gateway dead after a transient failure in systemd's own auto-restart window. Reproduced on Ubuntu 25.10 + systemd 257: after update, gateway drains and exits 75, systemd's first respawn 60s later fails (status=200/CHDIR with "No such file or directory" on a WorkingDirectory that demonstrably exists), the unit ends up in RestartMaxDelaySec=300 backoff, and cmd_update's fallback 'systemctl restart' never recovers it, leaving users with a permanently silent gateway until they manually run 'systemctl reset-failed'. The fix mirrors the recovery pattern 'hermes gateway restart' (systemd_restart) got in PR #20949: always reset-failed before restart, on both the initial fallback and the retry. Also rewrites the final failure message to tell the user to reset-failed + restart (not just restart, which is the step that already failed twice).	2026-05-07 08:34:12 -07:00
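A minimal sketch of the "reset-failed before every restart" pattern from 1d2029b2b7, assuming a plain `systemctl` invocation and a hypothetical unit name; the injectable `run` callable is an illustration device and not something the commit describes.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising."""
    return subprocess.run(list(cmd), check=False).returncode


def restart_with_reset(unit: str, run: Runner = _run, attempts: int = 2) -> bool:
    """Clear the failed/backoff state before each restart attempt."""
    for _ in range(attempts):
        # The exit code of reset-failed is ignored on purpose.
        run(["systemctl", "reset-failed", unit])
        if run(["systemctl", "restart", unit]) == 0:
            return True
    return False
```

Whether the real code passes `--user` or other flags is not stated in the commit message, so none are shown here.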
Teknium	04918345ea	fix(cron): initialize MCP servers before constructing the cron AIAgent (#21354) cron/scheduler.py:run_job() constructed AIAgent(...) without ever calling discover_mcp_tools(). The CLI and gateway paths do this at startup; cron jobs inherited none of it and the user's configured mcp_servers were invisible inside every cron run. Insert discover_mcp_tools() right before AIAgent(), wrapped in try/except so a broken MCP server can't kill an otherwise-working cron job. The call is idempotent: register_mcp_servers() short-circuits on already-connected servers, so subsequent ticks in the same scheduler process pay ~0ms. Scoped to the LLM path only; no_agent script jobs skip it entirely. Closes #4219.	2026-05-07 07:53:03 -07:00
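The guard described in 04918345ea amounts to a best-effort call ahead of agent construction. In the sketch below, `build_cron_agent` and its two callable parameters are hypothetical stand-ins for `discover_mcp_tools()` and `AIAgent(...)`, so the example stays self-contained.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def build_cron_agent(discover_mcp_tools: Callable[[], Any],
                     make_agent: Callable[[], Any]) -> Any:
    """Attempt MCP discovery first, but never let it block agent construction."""
    try:
        discover_mcp_tools()
    except Exception:
        # A broken MCP server should not kill an otherwise-working cron job.
        logger.warning("MCP discovery failed; continuing without MCP tools", exc_info=True)
    return make_agent()
```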
WideLee	4de3ef38b1	feat(qqbot): wire native tool-approval UX via inline keyboards Makes the in-tree QQ inline keyboards actually light up when the agent blocks on a dangerous-command approval. Matches the cross-adapter gateway contract already implemented by Discord, Telegram, Slack, Matrix, and Feishu. Gateway/run.py's _approval_notify_sync checks type(adapter).send_exec_approval and falls back to a text prompt when it's missing. Without this wiring, QQ users stared at plain '/approve' text even though the adapter shipped button primitives. ### send_exec_approval(chat_id, command, session_key, description, metadata) Matches the signature the gateway calls with. Builds an ApprovalRequest (command_preview, description, timeout) and delegates to send_approval_request. Uses the last inbound msg_id as reply_to so QQ accepts the passive message. The 'metadata' parameter is accepted for contract parity but intentionally unused: QQ doesn't have thread_id/DM-targeting overrides. ### send_update_prompt(chat_id, prompt, default, session_key, metadata) Signature updated to match the cross-adapter contract used by 'hermes update --gateway' watcher. Renders an 'Update Needs Your Input' prompt with the optional default hint and a Yes/No keyboard. Replaces the earlier 3-arg helper that wasn't wired anywhere. ### Default interaction dispatcher _default_interaction_dispatch() auto-registered as the adapter's interaction callback in __init__. Routes: - approve:<session_key>:<decision> → tools.approval.resolve_gateway_approval Button → choice mapping: allow-once → 'once' allow-always → 'always' deny → 'deny' (QQ's 3-button mobile layout deliberately collapses 'session' + 'always' into one button; /approve session text fallback remains available.) 
- update_prompt:<answer> → atomic write of y/n to ~/.hermes/.update_response (the detached 'hermes update --gateway' watcher polls this file) - anything else → logged and dropped Resolve exceptions are caught and logged — never propagate into the WS loop. Callers can override via set_interaction_callback() to route clicks elsewhere or pass None to drop them entirely. ### Net effect QQ users now get native tap-to-approve UX on dangerous-command prompts and update-confirmation prompts, without having to type /approve or /deny as text. The adapter hooks into tools.approval the same way every other button-capable platform does. ### Tests 14 new tests cover: - Default callback installed on __init__ - send_exec_approval / send_update_prompt exist as class methods (so the gateway's type-probe detects them) - allow-once/always/deny each map to the correct resolve choice - update_prompt:y / update_prompt:n each write atomically to the response file (via monkeypatched get_hermes_home) - Unknown button_data / empty button_data / resolve exceptions are harmless - send_exec_approval honours last_msg_id reply-to and accepts metadata - send_update_prompt delegates with correct content + keyboard Full qqbot suite: 144 passed (72 pre-existing + 72 from this salvage arc). Also ran tools/test_approval.py alongside — no regressions (276 passed combined). Co-authored-by: WideLee <limkuan24@gmail.com>	2026-05-07 07:48:15 -07:00
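The two routes of the default dispatcher in 4de3ef38b1 can be illustrated with the sketch below. The button-to-choice table and the `.update_response` file name come from the commit message; the helper names, the right-to-left split (which assumes session keys may contain colons), and the temp-file approach to the atomic write are assumptions.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Mapping taken from the commit message: button decision -> resolve choice.
BUTTON_TO_CHOICE = {"allow-once": "once", "allow-always": "always", "deny": "deny"}


def parse_approval(button_data: str) -> Optional[Tuple[str, str]]:
    """Return (session_key, choice) for 'approve:<session_key>:<decision>', else None."""
    prefix = "approve:"
    if not button_data.startswith(prefix):
        return None
    # Split from the right so a session key containing ':' stays intact.
    session_key, sep, decision = button_data[len(prefix):].rpartition(":")
    choice = BUTTON_TO_CHOICE.get(decision)
    if not sep or not session_key or choice is None:
        return None
    return session_key, choice


def write_update_response(home: Path, answer: str) -> Path:
    """Write the answer to <home>/.update_response via temp file + os.replace."""
    target = home / ".update_response"
    fd, tmp = tempfile.mkstemp(dir=str(home), prefix=".update_response.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(answer)
        os.replace(tmp, target)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return target
```

Writing to a sibling temp file and renaming means a polling watcher sees either the old file or the complete new one, never a partial write.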
Teknium	a1fe5f473d	fix(cron): scan assembled prompt including skill content (#3968) (#21350) _scan_cron_prompt ran at cron create/update time on the user-supplied prompt but skill content loaded inside _build_job_prompt at runtime was never scanned. Combined with non-interactive auto-approval, a malicious skill carrying an injection payload could execute with full tool access every tick. - cron/scheduler.py: new CronPromptInjectionBlocked exception and _scan_assembled_cron_prompt helper. _build_job_prompt now routes both return paths (with skills / without skills) through the helper, raising on match. run_job catches the exception and returns a clean (False, blocked_doc, "", error) tuple so the operator sees a BLOCKED delivery with the scanner result and an audit hint, rather than a scheduler crash or a silent skip. - tests/cron/test_cron_prompt_injection_skill.py: 10 regression tests. Unit coverage on _scan_assembled_cron_prompt (clean/injection/exfil/invisible-unicode). End-to-end coverage via _build_job_prompt with planted skills (injection payload, env exfil, zero-width space, clean control, missing-skill-doesn't-crash). Fixture patches tools.skills_tool.SKILLS_DIR / HERMES_HOME so planted skills are visible. Importantly uses the current cron.scheduler module object (not a top-level import) so tests don't break when other fixtures reload cron.scheduler: CronPromptInjectionBlocked identity depends on which module object defined it.	2026-05-07 07:44:10 -07:00
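The point of a1fe5f473d is that the scan must run on the fully assembled prompt, skills included. The sketch below reuses the exception name from the commit, but the two patterns are placeholder assumptions (the real scanner's rules are not given in the message), as is the way skills and prompt are joined.

```python
import re
from typing import List


class CronPromptInjectionBlocked(Exception):
    """Raised when the assembled cron prompt fails the scan."""


# Placeholder rules for illustration only; the project's scanner is not shown in the commit.
_INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
_SUSPICIOUS = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


def scan_assembled_cron_prompt(prompt: str) -> str:
    """Return the prompt unchanged, or raise if a rule matches."""
    if _INVISIBLE.search(prompt):
        raise CronPromptInjectionBlocked("invisible unicode in assembled prompt")
    if _SUSPICIOUS.search(prompt):
        raise CronPromptInjectionBlocked("injection phrase in assembled prompt")
    return prompt


def build_job_prompt(user_prompt: str, skill_texts: List[str]) -> str:
    """Assemble skills + user prompt, then scan the result rather than the parts."""
    assembled = "\n\n".join([*skill_texts, user_prompt])
    return scan_assembled_cron_prompt(assembled)
```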
maciekczech	162ad3dd16	fix(kanban): filter dashboard board by selected tenant	2026-05-07 07:39:57 -07:00
maciekczech	f4de3810ef	test(kanban): cover dashboard select filter wiring	2026-05-07 07:39:57 -07:00
Teknium	74c9c0eec9	fix(mcp): gate utility stubs on server-advertised capabilities (#21347) For every connected MCP server we register four "utility" tool schemas (mcp_<server>_list_resources, read_resource, list_prompts, get_prompt). The existing gate was `hasattr(server.session, method)`, but `mcp.ClientSession` defines all four methods on the class regardless of what the remote server supports, so the gate never filtered anything. Tools-only servers (e.g. @upstash/context7-mcp which advertises only `tools`) ended up with 4 dead stubs; every model call to them returned JSON-RPC -32601 Method not found, which made the model conclude the server was broken even when the real tools worked. Capture the `InitializeResult` returned by `await session.initialize()` on the `MCPServerTask`, then gate each utility schema on the corresponding `capabilities` sub-object (resources / prompts). A legacy `hasattr` fallback runs when `initialize_result` is missing (older test fixtures / not-yet-captured code paths) so pre-existing behavior is preserved. Verified against real `mcp.types.InitializeResult` pydantic models: - Context7 shape (tools only) → 0 utility stubs registered (was 4) - Resources-only server → 2 stubs (list_resources, read_resource) - Prompts-only server → 2 stubs (list_prompts, get_prompt) - Fully capable server → all 4 stubs Closes #18051. Co-authored-by: nikolay-bratanov <nikolay-bratanov@users.noreply.github.com>	2026-05-07 07:39:50 -07:00
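The capability gate from 74c9c0eec9 can be approximated as below. To stay self-contained the sketch uses `SimpleNamespace` objects in place of real `mcp.types.InitializeResult` models, the `utility_stubs` helper name is hypothetical, and the legacy `hasattr` fallback mentioned in the commit is left out.

```python
from types import SimpleNamespace
from typing import Any, List, Optional

# Capability sub-object -> utility methods it unlocks (names from the commit message).
UTILITY_TOOLS = {
    "resources": ("list_resources", "read_resource"),
    "prompts": ("list_prompts", "get_prompt"),
}


def utility_stubs(server_name: str, initialize_result: Optional[Any]) -> List[str]:
    """Return the utility tool names to register for the advertised capabilities."""
    caps = getattr(initialize_result, "capabilities", None)
    names: List[str] = []
    for capability, methods in UTILITY_TOOLS.items():
        # A capability counts as advertised when its sub-object is present (not None).
        if getattr(caps, capability, None) is not None:
            names.extend(f"mcp_{server_name}_{method}" for method in methods)
    return names


# Stand-in for a tools-only server such as the Context7 shape described above.
tools_only = SimpleNamespace(
    capabilities=SimpleNamespace(tools=SimpleNamespace(), resources=None, prompts=None)
)
```

Checking the advertised capabilities rather than the client class is what makes the gate meaningful, since the client object exposes every method regardless of the server.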
teknium1	898b6d7d55	fix(webhook): widen INSECURE_NO_AUTH loopback check + tests + docs Follow-up to the previous commit: - Add _is_loopback_host() helper covering 127.0.0.1, localhost, ::1, ip6-localhost, ip6-loopback (case-insensitive). Empty/None host is treated as non-loopback since unset usually means public default bind. - Fix mixed-indent comment in the safety rail (comment now aligned with the if-block) and collapse the nested-if into one condition. - Add TestInsecureNoAuthSafetyRail covering rejection on 0.0.0.0, a LAN IP, and empty host; allowance on 127.0.0.1/localhost; plus unit-level parametrized coverage of _is_loopback_host for spellings we can't bind in the hermetic test env (::1, ip6-localhost, ip6-loopback). - Pin test_connect_starts_server + test_webhook_deliver_only defaults to 127.0.0.1 so they keep passing under the new rail. - Document the behavior in website/docs/user-guide/messaging/webhooks.md.	2026-05-07 07:38:43 -07:00
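A sketch of the loopback helper described in 898b6d7d55, built from the host spellings the commit lists. The set name and exact body are assumptions; only the accepted spellings, the case-insensitivity, and the empty/None rule come from the message.

```python
from typing import Optional

# Spellings listed in the commit message.
_LOOPBACK_HOSTS = frozenset(
    {"127.0.0.1", "localhost", "::1", "ip6-localhost", "ip6-loopback"}
)


def is_loopback_host(host: Optional[str]) -> bool:
    """True only for known loopback spellings; empty/None counts as non-loopback."""
    if not host:
        # Unset usually means a public default bind, so fail closed.
        return False
    return host.lower() in _LOOPBACK_HOSTS
```

Treating an unset host as non-loopback keeps the INSECURE_NO_AUTH rail on the safe side when the bind address is left at its default.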

3400 commits