Commit graph

3400 commits

Author SHA1 Message Date
Teknium
3601e20f47 fix(windows): use PortableGit (not MinGit), fix relaunch os.execvp crash, surface npm errors
Three real bugs from teknium1's first Windows install run:

1. **MinGit has no bash.exe.**  MinGit is the minimal-automation Git for Windows
   distribution — it ships git.exe but deliberately strips bash and the POSIX
   coreutils.  Installer logged "Could not locate bash.exe" and Hermes would
   fail to run any shell command.  Switched to PortableGit — the full Git for
   Windows minus the installer UI.  PortableGit ships bash.exe at
   <root>\bin\bash.exe plus sh, awk, sed, grep, curl, ssh in usr\bin\.  ARM64
   variant is detected separately (PortableGit-*-arm64.7z.exe).  32-bit falls
   back to MinGit-32-bit with a warning (PortableGit is 64-bit only).

   PortableGit ships as a 7z self-extractor (56MB vs MinGit's 38MB).  We
   invoke it with `-o<target> -y` to extract silently — no 7z install needed,
   it's self-contained.

   Updated tools/environments/local.py::_find_bash candidate order to prefer
   the PortableGit layout (<root>\bin\bash.exe) with the MinGit layout
   (<root>\usr\bin\bash.exe) as a fallback so existing installs keep working.

2. **os.execvp "Exec format error" on Windows.**  Setup wizard's "Launch
   hermes chat now? Y" called `os.execvp(["hermes", "chat"])` which on
   Windows can only swap to real Win32 .exe files — chokes with OSError(8)
   on .cmd batch shims and Python console-script wrappers.  Added a
   win32 branch in hermes_cli/relaunch.py::relaunch() that uses
   subprocess.run + sys.exit — functionally identical (user sees "hermes
   exited, then new hermes started") with one extra PID in play.  POSIX
   path is UNCHANGED — still uses os.execvp for in-place replacement.
   Catches OSError in the Windows branch and surfaces a "open a new
   terminal so PATH picks up, then re-run hermes" hint instead of a
   cryptic traceback.

3. **npm install failures silent on Windows.**  The install.ps1 was invoking
   `npm install --silent 2>&1 | Out-Null` inside a try/catch.  PowerShell's
   try/catch does NOT trigger on non-zero process exit codes — only on
   unhandled .NET exceptions — so npm failing printed a generic "npm
   install failed" with zero information about WHY.  The silent pipe ate
   the stderr.

   Rewrote Install-NodeDeps to:
   - Resolve npm.cmd via Get-Command (respects PATHEXT) instead of
     relying on bare `npm` name resolution.
   - Use Start-Process with -PassThru to capture the actual exit code.
   - Redirect stderr to a temp log and surface the first ~800 chars of
     the real npm error when install fails, plus the log path for the
     full text.
   - Fail loudly with the right exit code instead of a misleading success.
   - Bail cleanly with a helpful message when npm isn't on PATH at all.

4. **"True" printing to console after Node check.**  `Test-Node` returns $true;
   installer called it as a bare statement (no assignment, no cast).  PowerShell
   prints bare return values.  Wrapped the call in `[void](Test-Node)`.

## Tests

- Added 3 new tests in tests/hermes_cli/test_relaunch.py covering the
  Windows branch: subprocess is called (not execvp), child exit code
  propagates, OSError surfaces a helpful message.  All 23 tests pass
  (20 existing + 3 new).
- 77 Windows-compat tests still pass, POSIX behaviour unchanged.
2026-05-08 14:27:40 -07:00
Teknium
e93bfc6c93 feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags
Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.

Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.

## New module

- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
  windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
  All no-ops on non-Windows.

## CRITICAL fixes (would crash or silently break on Windows)

- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
  AttributeError on import on Windows, breaking `hermes --tui` entirely (it
  spawns this module as a subprocess).  Guard each signal.signal() call with
  hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.

- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
  unguarded.  os.WNOHANG doesn't exist on Windows.  Gate the whole reap loop
  behind `os.name != "nt"` — Windows has no zombies anyway.

- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
  most Windows builds.  Fall back to loopback TCP (AF_INET on 127.0.0.1:0
  ephemeral port) when _IS_WINDOWS.  HERMES_RPC_SOCKET env var now accepts
  either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
  Generated sandbox client parses both.

- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded.  Use
  shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
  readable error when bash is genuinely absent.

- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
  (npm install + node version probe), browser_tool.py x2.  On Windows npm
  is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
  fails with WinError 193.  shutil.which(...) returns the absolute .cmd
  path which CreateProcessW accepts because the extension routes through
  cmd.exe /c.  POSIX behaviour unchanged (shutil.which still returns the
  same path subprocess would resolve itself).

## HIGH fixes (silent misbehaviour on Windows)

- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
  Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
  via MSYS2's virtual /tmp but native Python couldn't open.  Result: cwd
  tracking silently broken — `cd` in terminal tool did nothing.  Windows
  branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
  (works in both bash and Python, guaranteed no spaces).

- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
  in split(":")` heuristic mangles Windows PATH (";" separator).  Gate
  the injection behind `not _IS_WINDOWS`.

- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
  Popen + watcher-script Popen both used start_new_session=True, which
  Windows silently ignores.  Watcher stayed attached to CLI's console,
  died when user closed terminal after `hermes update`, left gateway
  stale.  Now branches through windows_detach_popen_kwargs() helper
  (CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
  Windows, start_new_session=True on POSIX — identical to main).

## MEDIUM fixes

- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
  chain crashes on Windows when user triggers /update in-gateway.  Now
  has sys.platform=="win32" branch using sys.executable + a tiny
  Python watcher with proper detach flags.  POSIX path is unchanged.

- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
  style paths that break subprocess.Popen(cwd=...) and Path().resolve().
  Added _normalize_git_bash_path() helper that translates /c/Users,
  /cygdrive/c, /mnt/c variants to native C:\Users form.  POSIX no-op.
  _git_repo_root() now routes every result through it.

- cli.py worktree .worktreeinclude: os.symlink on directories failed
  hard on Windows (requires admin or Developer Mode).  Falls back to
  shutil.copytree with a warning log.

## Tests

- 29 new tests in tests/tools/test_windows_native_support.py covering:
  subprocess_compat helpers, TUI entry signal guards, kanban waitpid
  guard, code_execution TCP fallback source-level invariants, cron bash
  resolution, npm/npx bare-spawn lint per-file, local env Windows temp
  dir, PATH injection gating, git bash path normalization, symlink
  fallback, gateway detached watcher flags.

- One existing test assertion adjusted in test_browser_homebrew_paths:
  it compared captured Popen argv to the BARE `"npx"` literal; after the
  shutil.which() change argv[0] is the absolute path.  New assertion
  checks the shape (two items, second is `agent-browser`) rather than
  the exact first-item string.  Behaviour unchanged; test was too strict.

All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.

## What's still deferred (LOW priority)

- Visible cmd-window flashes on short-lived console apps (~14 sites) —
  cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
  hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
  reachable only when all env-var candidates fail.
2026-05-08 14:27:40 -07:00
Teknium
b53bd12fe4 fix(windows-editor): default EDITOR=notepad so /edit and Ctrl+X Ctrl+E work
Pre-existing Windows bug surfaced while reviewing the portable-MinGit
install: prompt_toolkit's Buffer.open_in_editor() falls back to POSIX
absolute paths (/usr/bin/nano, /usr/bin/vi, /usr/bin/emacs) that don't
exist on native Windows.  When neither $EDITOR nor $VISUAL is set,
Ctrl+X Ctrl+E ("open prompt in editor") and /edit both silently do
nothing on Windows — the user hits the key, nothing happens, no error.

This wasn't caused by MinGit (full Git for Windows doesn't fix it either,
because the Windows Python subprocess call resolves `/usr/bin/nano` as
`C:\usr\bin\nano`, which doesn't exist even with nano installed).

Fixes:
- hermes_cli/stdio.py::configure_windows_stdio now sets EDITOR=notepad
  on Windows if neither EDITOR nor VISUAL is set.  notepad.exe is in
  every Windows install, works as a blocking editor (subprocess.call
  waits for the window to close), and writes back to the file.
- hermes_cli/config.py (hermes config edit): reorder fallback list so
  Windows tries notepad first — previously nano led the list, which
  required Git Bash / WSL to be in PATH.
- Users who want VSCode / Neovim / Notepad++ can still override via
  $env:EDITOR — that's checked before our default kicks in.  Docstring
  spells out the common overrides.

The Ink TUI (`hermes --tui`) already handled Windows correctly via
ui-tui/src/lib/editor.ts falling back to notepad.exe on win32 — this
commit brings the classic prompt_toolkit CLI into parity.

3 new tests in test_windows_native_support.py verify:
- EDITOR=notepad gets set when unset on Windows
- Explicit $EDITOR is respected
- $VISUAL is respected (not overwritten by our default)
2026-05-08 14:27:40 -07:00
Teknium
9de893e3b0 feat(windows): close native-Windows install gaps — crash-free startup, UTF-8 stdio, tzdata dep, docs
Native Windows (with Git for Windows installed) can now run the Hermes CLI
and gateway end-to-end without crashing.  install.ps1 already existed and
the Git Bash terminal backend was already wired up — this PR fills the
remaining gaps discovered by auditing every Windows-unsafe primitive
(`signal.SIGKILL`, `os.kill(pid, 0)` probes, bare `fcntl`/`termios`
imports) and by comparing hermes against how Claude Code, OpenCode, Codex,
and Cline handle native Windows.

## What changed

### UTF-8 stdio (new module)
- `hermes_cli/stdio.py` — single `configure_windows_stdio()` entry point.
  Flips the console code page to CP_UTF8 (65001), reconfigures
  `sys.stdout`/`stderr`/`stdin` to UTF-8, sets `PYTHONIOENCODING` + `PYTHONUTF8`
  for subprocesses.  No-op on non-Windows.  Opt out via `HERMES_DISABLE_WINDOWS_UTF8=1`.
- Called early in `cli.py::main`, `hermes_cli/main.py::main`, and
  `gateway/run.py::main` so Unicode banners (box-drawing, geometric
  symbols, non-Latin chat text) don't `UnicodeEncodeError` on cp1252
  consoles.

### Crash sites fixed
- `hermes_cli/main.py:7970` (hermes update → stuck gateway sweep): raw
  `os.kill(pid, _signal.SIGKILL)` → `gateway.status.terminate_pid(pid, force=True)`
  which routes through `taskkill /T /F` on Windows.
- `hermes_cli/profiles.py::_stop_gateway_process`: same fix — also
  converted SIGTERM path to `terminate_pid()` and widened OSError catch
  on the intermediate `os.kill(pid, 0)` probe.
- `hermes_cli/kanban_db.py:2914, 3041`: raw `signal.SIGKILL` →
  `getattr(signal, "SIGKILL", signal.SIGTERM)` fallback (matches the
  pattern already used in `gateway/status.py`).

### OSError widening on `os.kill(pid, 0)` probes
Windows raises `OSError` (WinError 87) for a gone PID instead of
`ProcessLookupError`.  Widened the catch at:
- `gateway/run.py:15101` (`--replace` wait-for-exit loop — without this,
  the loop busy-spins the full 10s every Windows gateway start)
- `hermes_cli/gateway.py:228, 460, 940`
- `hermes_cli/profiles.py:777`
- `tools/process_registry.py::_is_host_pid_alive`
- `tools/browser_tool.py:1170, 1206`

### Dashboard PTY graceful degradation
`hermes_cli/pty_bridge.py` depends on `fcntl`/`termios`/`ptyprocess`,
none of which exist on native Windows.  Previously a Windows dashboard
would crash on `import hermes_cli.web_server` because of a top-level
import.  Now:
- `hermes_cli/web_server.py` wraps the pty_bridge import in
  `try/except ImportError` and sets `_PTY_BRIDGE_AVAILABLE=False`.
- The `/api/pty` WebSocket handler returns a friendly "use WSL2 for
  this tab" message instead of exploding.
- Every other dashboard feature (sessions, jobs, metrics, config
  editor) runs natively on Windows.

### Dependency
- `pyproject.toml`: add `tzdata>=2023.3; sys_platform == 'win32'` so
  Python's `zoneinfo` works on Windows (which has no IANA tzdata
  shipped with the OS).  Credits @sprmn24 (PR #13182).

### Docs
- README.md: removed "Native Windows is not supported"; added
  PowerShell one-liner and Git-for-Windows prerequisite note.
- `website/docs/getting-started/installation.md`: new Windows section
  with capability matrix (everything native except the dashboard
  `/chat` PTY tab, which is WSL2-only).
- `website/docs/user-guide/windows-wsl-quickstart.md`: reframed as
  "WSL2 as an alternative to native" rather than "the only way".
- `website/docs/developer-guide/contributing.md`: updated
  cross-platform guidance with the `signal.SIGKILL` / `OSError`
  rules we enforce now.
- `website/docs/user-guide/features/web-dashboard.md`: acknowledged
  native Windows works for everything except the embedded PTY pane.

## Why this shape

Pulled from a survey of how other agent codebases handle native
Windows (Claude Code, OpenCode, Codex, Cline):

- All four treat Git Bash as the canonical shell on Windows, same as
  hermes already does in `tools/environments/local.py::_find_bash()`.
- None of them force `SetConsoleOutputCP` — but they don't have to,
  Node/Rust write UTF-16 to the Win32 console API.  Python does not get
  that for free, so we flip CP_UTF8 via ctypes.
- None of them ship PowerShell-as-primary-shell (Claude Code exposes
  PS as a secondary tool; scope creep for this PR).
- All of them use `taskkill /T /F` for force-kill on Windows, which
  is exactly what `gateway.status.terminate_pid(force=True)` does.

## Non-goals (deliberate scope limits)

- No PowerShell-as-a-second-shell tool — worth designing separately.
- No terminal routing rewrite (#12317, #15461, #19800 cluster) — that's
  the hardest design call and needs a separate doc.
- No wholesale `open()` → `open(..., encoding="utf-8")` sweep (Tianworld
  cluster) — will do as follow-up if users hit actual breakage; most
  modern code already specifies it.

## Validation

- 28 new tests in `tests/tools/test_windows_native_support.py` — all
  platform-mocked, pass on Linux CI.  Cover:
  - `configure_windows_stdio` idempotency, opt-out, env-preservation
  - `terminate_pid` taskkill routing, failure → OSError, FileNotFoundError fallback
  - `getattr(signal, "SIGKILL", …)` fallback shape
  - `_is_host_pid_alive` OSError widening (Windows-gone-PID behavior)
  - Source-level checks that all entry points call `configure_windows_stdio`
  - pty_bridge import-guard present in `web_server.py`
  - README no longer says "not supported"
- 12 pre-existing tests in `tests/tools/test_windows_compat.py` still pass.
- `tests/hermes_cli/` ran fully (3909 passed, 9 failures — all confirmed
  pre-existing on main by stash-test).
- `tests/gateway/` ran fully (5021 passed, 1 pre-existing failure).
- `tests/tools/test_process_registry.py` + `test_browser_*` pass.
- Manual smoke: `import hermes_cli.stdio; import gateway.run;
  import hermes_cli.web_server` — all clean, `_PTY_BRIDGE_AVAILABLE=True`
  on Linux (as expected).

## Files

- New: `hermes_cli/stdio.py`, `tests/tools/test_windows_native_support.py`
- Modified: `cli.py`, `gateway/run.py`, `hermes_cli/main.py`,
  `hermes_cli/profiles.py`, `hermes_cli/gateway.py`,
  `hermes_cli/kanban_db.py`, `hermes_cli/pty_bridge.py`,
  `hermes_cli/web_server.py`, `tools/browser_tool.py`,
  `tools/process_registry.py`, `pyproject.toml`, `README.md`, and 4
  docs pages.

Credits to everyone whose prior PR work informed these fixes — see
the co-author trailers.  All of the PRs listed in
`~/.hermes/plans/windows-support-prs.md` fixing `os.kill` / `signal.SIGKILL`
/ UTF-8 stdio / tzdata / README patterns found the same issues; this PR
consolidates them.

Co-authored-by: Philip D'Souza <9472774+PhilipAD@users.noreply.github.com>
Co-authored-by: Arecanon <42595053+ArecaNon@users.noreply.github.com>
Co-authored-by: XiaoXiao0221 <263113677+XiaoXiao0221@users.noreply.github.com>
Co-authored-by: Lars Hagen <1360677+lars-hagen@users.noreply.github.com>
Co-authored-by: Luan Dias <65574834+luandiasrj@users.noreply.github.com>
Co-authored-by: Ruzzgar <ruzzgarcn@gmail.com>
Co-authored-by: sprmn24 <oncuevtv@gmail.com>
Co-authored-by: adybag14-cyber <252811164+adybag14-cyber@users.noreply.github.com>
Co-authored-by: Prasanna28Devadiga <54196612+Prasanna28Devadiga@users.noreply.github.com>
2026-05-08 14:27:40 -07:00
Dilee
729a659a3c fix(teams-pipeline): add skill asset and fix async test env 2026-05-08 12:41:41 -07:00
Teknium
5e8dfc9f6d fix(teams-pipeline): fill in missing delivery URL in adapter-reuse test
test_build_pipeline_runtime_reuses_existing_teams_adapter_surface set
delivery_mode='incoming_webhook' but omitted incoming_webhook_url.
_teams_delivery_is_configured() requires the URL to mark delivery as
enabled, so the guarded build_pipeline_runtime gate in runtime.py
correctly left teams_sender=None and the assertion failed.

The intent of the test — prove we reuse the existing TeamsSummaryWriter
from plugins/platforms/teams/adapter.py rather than introducing a new
adapter surface elsewhere — is unchanged. Added the URL so the gate
passes and the architectural assertion holds.
2026-05-08 12:00:09 -07:00
Dilee
397f750bb4 feat(teams): add pipeline outbound delivery via existing adapter 2026-05-08 12:00:09 -07:00
Teknium
a99547740d fix(teams-pipeline): drop-scheduler fallback + test wiring for enablement gate
Two salvage follow-ups on top of @dlkakbs's plugin runtime.

1. Install a drop-scheduler when the runtime fails to build.

   Previously when ``build_pipeline_runtime()`` raised (e.g. missing
   Graph env vars, subscription store path unwritable), ``bind_gateway_runtime``
   logged a warning and returned False, leaving the msgraph_webhook
   adapter with no scheduler at all. Incoming Graph notifications
   would then fall back to the adapter's default ``handle_message``
   path, which produces a raw JSON dump as a user-role message — not
   useful and fires every time Graph retries.

   Now a no-op drop-scheduler is installed instead, so:
   - Graph notifications ack cleanly (202) so Graph stops retrying.
   - The failure is surfaced once in the log with the error.
   - No user-role messages get manufactured from raw change payloads.

   The adapter is still bindable later once the runtime becomes
   available (e.g. after the operator runs ``hermes teams-pipeline
   validate`` and fixes the config), since the gateway's
   ``_teams_pipeline_runtime`` sentinel wasn't set to a non-None value.

2. Test wiring for ``_teams_pipeline_plugin_enabled()`` gate.

   The happy-path runner-wiring tests monkeypatched ``bind_gateway_runtime``
   but not ``_load_gateway_config``. In the hermetic test environment
   the real config read ran, saw no enabled plugins, and short-circuited
   the bind call before the test could observe it — so the test
   expected ``calls == [runner]`` but got ``calls == []``.

   Adds a ``_load_gateway_config`` monkeypatch with
   ``plugins.enabled = ["teams_pipeline"]`` to the happy-path tests.
   The explicit-disabled test ``test_gateway_runner_skips_wiring_when_teams_pipeline_plugin_disabled``
   already patches the config correctly.

   Also renames ``test_bind_gateway_runtime_leaves_scheduler_unchanged_on_failure``
   to ``test_bind_gateway_runtime_installs_drop_scheduler_on_failure``
   and updates the assertion — this test contradicted the drop-scheduler
   test in ``tests/plugins/test_teams_pipeline_plugin.py`` which
   expected the scheduler to be installed. The plugin-test name
   (``test_bind_gateway_runtime_drops_notifications_when_unavailable``)
   clearly describes the intended behavior; fixing the wiring-test
   assertion aligns both tests.

Validation:
- ``scripts/run_tests.sh tests/plugins/test_teams_pipeline_plugin.py
  tests/gateway/test_teams_pipeline_runtime_wiring.py
  tests/hermes_cli/test_teams_pipeline_plugin_cli.py`` — 25/25 passed.
2026-05-08 11:18:14 -07:00
Dilee
07bbd93337 feat(teams-pipeline): add plugin runtime and operator cli
Third slice of the Microsoft Teams meeting pipeline stack, salvaged
onto current main. Adds the standalone teams_pipeline plugin that
consumes Graph change notifications from the webhook listener,
resolves meeting artifacts (transcript first, recording + STT fallback
later), persists job state in a durable store, and exposes an operator
CLI for inspection, replay, subscription management, and validation.

Design choices follow maintainer review feedback on PR #19815:

- Standalone plugin rather than bolted-on core surface
  (plugins/teams_pipeline/, kind: standalone in plugin.yaml).
- Zero new model tools. The agent drives the pipeline by invoking
  the operator CLI via the terminal tool, guided by the skill that
  ships with a follow-up PR.
- Reuses the existing msgraph_webhook gateway platform for Graph
  ingress. Pipeline runtime is wired in via bind_gateway_runtime and
  gated on plugins.enabled so gateways that don't run the plugin
  boot cleanly.

Additions:

- plugins/teams_pipeline/: runtime (gateway wiring + config builder),
  pipeline core, durable SQLite store, subscription maintenance
  helpers, Graph artifact resolution, operator CLI (list, show,
  run/replay, fetch dry-run, subscriptions list, subscribe,
  renew-subscription, delete-subscription, maintain-subscriptions,
  token-health, validate).
- hermes_cli/main.py: second-pass plugin CLI discovery so any
  standalone plugin registered via ctx.register_cli_command()
  outside the memory-plugin convention path gets its subcommand
  wired into argparse without touching core.
- gateway/run.py: _teams_pipeline_plugin_enabled() config gate,
  _wire_teams_pipeline_runtime() binding after adapter setup, and
  the two runner attributes used by the runtime.

Credit to @dlkakbs for the entire plugin implementation.
2026-05-08 11:18:14 -07:00
Teknium
d0aad4b021 fix(computer-use): harden image-rejection fallback + AUTHOR_MAP
Follow-up to #15328's vision-unsupported retry branch in run_agent.py.

_strip_images_from_messages() previously deleted any message whose content
was entirely images. That's fine for synthetic user messages injected for
attachment delivery, but it breaks providers for tool-role messages — the
paired tool_call_id on the preceding assistant message ends up unmatched,
which OpenAI-compatible APIs reject with HTTP 400.

Fix: tool-role messages whose content becomes empty are replaced with a
plaintext placeholder that preserves the tool_call_id linkage. Only
non-tool messages are dropped. Added 10 tests covering the role-alternation
invariants + image-type coverage.

Image-rejection detector: expanded phrase list (image content not
supported / multimodal input / vision input / model does not support
image) and gated on 4xx status so transient 5xx errors never get
misinterpreted as 'server said no to images'. Detection is documented as
best-effort English phrase matching.

AUTHOR_MAP: mapped 3820588+ddupont808@users.noreply.github.com to
ddupont808 so release notes attribute the salvage correctly.
2026-05-08 11:07:38 -07:00
Teknium
850413f120 feat(computer-use): cua-driver backend, universal any-model schema
Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.

Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.

- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
  CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
  Actions: capture (som/vision/ax), click, double_click, right_click,
  middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
  `content: [text, image_url]` parts) that flows through
  handle_function_call into the tool message. Anthropic adapter converts
  into native `tool_result` image blocks; OpenAI-compatible providers
  get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
  recent screenshots carry real image data; older ones become text
  placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
  their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
  instead of its base64 char length (~1MB would have registered as
  ~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
  is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
  and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
  through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
  (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
  force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
  workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
  entries.

44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.

469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.

- `model_tools.py` provider-gating: the tool is available to every
  provider. Providers without multi-part tool message support will see
  text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
  client-side eviction + compressor pruning cover the same cost ceiling
  without a beta header.

- macOS only. cua-driver uses private SkyLight SPIs
  (SLEventPostToPid, SLPSPostEventRecordTo,
  _AXObserverAddNotificationAndCheckRemote) that can break on any macOS
  update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
  prints the Settings path.

Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
2026-05-08 11:07:38 -07:00
Teknium
b8d7e0e6d3 fix(msgraph_webhook): harden auth surface + IP allowlisting + response hygiene
Defense-in-depth polish on top of the webhook listener before it becomes
a real attack surface once the pipeline starts creating subscriptions
and Graph starts POSTing to the configured public URL.

- Timing-safe clientState comparison. Previously used `==` on strings;
  switches to hmac.compare_digest so a mismatch does not leak how many
  leading characters matched. client_state is documented as a strong
  shared secret (openssl rand -hex 32 in the setup docs), so a
  timing-safe primitive is the right call.

- Split GET and POST handlers. Graph validates a subscription by sending
  GET with validationToken in the query; anything else on GET is now a
  400 so the endpoint cannot be probed or mistakenly used for data
  exfil. Previously a bare GET fell through to the POST path and blew
  up on request.json() with a confusing 400.

- Empty response bodies on success. 202 is returned with no body so
  internal counters (accepted / duplicates / scheduled) do not leak to
  any caller that can reach the endpoint; counters remain observable
  via /health for operators. 403 on every-item-bad-clientState batches
  (so forged POSTs stop retrying), 400 on malformed / unknown-resource
  batches (sender configuration issue).

- Optional source-IP allowlist. New `allowed_source_cidrs` extra field
  (list or comma-separated string) and `MSGRAPH_WEBHOOK_ALLOWED_SOURCE_CIDRS`
  env var let operators restrict the webhook to Microsoft Graph's
  published webhook source ranges in production. Empty = allow all,
  preserving dev-tunnel / localhost workflows. Invalid CIDRs are
  logged and ignored rather than crashing. Also gates the handshake
  endpoint so disallowed IPs cannot probe it.

- Tests updated for the new response contract (empty-body 202,
  auth-only 403, config-error 400) and extended to cover: bare GET
  rejection, POST-with-validationToken handshake tolerance,
  timing-safe compare actually invoked via hmac.compare_digest spy,
  malformed body / missing value array, IP allowlist accept/reject
  paths, handshake IP allowlist, invalid CIDR entries, comma-string
  CIDR list parsing. 52/52 passed (was 40).

Full gateway suite: 5049 passed / 1 pre-existing failure in
test_discord_free_response (unrelated, reproduces on clean origin/main).
2026-05-08 10:29:58 -07:00
Dilee
26a59e4f6c fix(msgraph): normalize webhook dedupe and resource matching 2026-05-08 10:29:58 -07:00
Dilee
2a215de9af fix(msgraph): bound webhook receipt dedupe cache 2026-05-08 10:29:58 -07:00
Dilee
46a6f39024 feat(msgraph): add webhook listener platform 2026-05-08 10:29:58 -07:00
Teknium
f209a35859
feat(profile): shareable profile distributions via git (#20831)
* feat(profile): shareable profile distributions (pack/install/update/info)

Closes #20456.

Turns a profile into a portable, versioned artifact. Packs SOUL.md, config,
skills, cron, and an env-var manifest into a tar.gz that others can install
from a local path, URL, or git repo. Updates re-pull the distribution while
preserving user data (memories, sessions, auth.json, .env) and the user's
config.yaml overrides.

New subcommands (under hermes profile, no parallel tree):
  hermes profile pack    <name> [-o FILE]
  hermes profile install <source> [--name N] [--alias] [--force] [-y]
  hermes profile update  <name> [--force-config] [-y]
  hermes profile info    <name>

Manifest (distribution.yaml at the profile root): name, version,
hermes_requires, author, env_requires, distribution_owned.

Security:
  - Installer shows manifest + env-var requirements before mutating disk;
    confirmation required unless -y.
  - auth.json and .env are never packed (same exclude set as profile export).
  - Cron jobs are packed but NOT auto-scheduled — user is pointed at
    'hermes -p <name> cron list' to review.
  - Archive extraction rejects path traversal (../ members).
  - Alias creation is opt-in via --alias.

Update semantics:
  - Distribution-owned paths (SOUL.md, skills/, cron/, mcp.json, manifest):
    replaced from the new archive.
  - config.yaml: preserved by default; --force-config to overwrite.
  - User-owned paths (memories/, sessions/, auth.json, .env, state.db*,
    logs/, workspace/, plans/, home/, *_cache/, local/): never touched.

Version pin:
  hermes_requires accepts >=, <=, ==, !=, >, < or a bare version (treated
  as >=). Install fails with a clear error when the running Hermes version
  doesn't satisfy the spec.

Sources supported by 'install':
  - Local .tar.gz / .tgz archive
  - Local directory
  - HTTP(S) URL pointing to a .tar.gz (uses httpx, already a dep)
  - Git URL (github.com/user/repo, https://..., git@..., ssh://, git://)

Tests: 43 new unit tests (manifest parsing, version checks, env template,
pack/install/update round-trip, config-preservation, security).
E2E validated via real CLI invocations against an isolated HERMES_HOME
covering pack, install with confirmation, update preservation, update
--force-config, decline-preview, duplicate-install rejection, and
version-requirement rejection.

* refactor(profile-dist): git-only — drop tar.gz/HTTP transports and pack

Scope-cut on top of the original distribution PR: a profile distribution
is now exclusively a git repository (or a local directory during
development). The tar.gz / HTTP archive transports and the matching
`hermes profile pack` subcommand have been removed.

Why:
* GitHub tags, branches, and commits are already the right versioning
  primitive. Tag pushes do for us what 'pack + upload' did.
* `hermes profile export` / `import` already cover local backup and
  restore; they are not a distribution format and stay untouched.
* One transport means one install/update code path, one doc page,
  and one mental model. The extra source types doubled the surface
  for no real user win — GitHub auto-attaches release tarballs, and
  `git bundle` / `git clone --mirror` cover the airgap case.

Changes:
* hermes_cli/profile_distribution.py — removed pack_profile,
  _fetch_tar_archive (_http_fetch), _safe_extract, _archive_roots,
  _safe_parts, _find_dist_root, tarfile/io/urlparse imports. The
  new _stage_source has two arms: git URL → clone, local directory
  → use in place.
* hermes_cli/main.py — removed the 'pack' subparser and action
  handler. Install help text updated to match the reduced source list.
* tests/hermes_cli/test_profile_distribution.py — rewritten around a
  local-directory staging fixture. The install/update/describe suites
  now build a distribution tree on disk directly and install from it,
  which is what a real git clone produces after .git is stripped.
  Dropped TestPack, TestFindDistRoot, and the tar-specific security
  test. New tests cover _looks_like_git_url, env_example emission,
  hermes_requires enforcement, and 'installer does not import
  credentials if an author mistakenly leaks them in the staging tree'.
* website/docs/reference/profile-commands.md — 'Distribution commands'
  section rewritten around git. Added a 'Publishing a distribution'
  section. export/import stay documented as local backup/restore.
* website/docs/reference/cli-commands.md — dropped 'pack' from the
  profile subcommand table.
* website/package.json — 'lint:diagrams' now passes
  --exclude-code-blocks to ascii-guard. Without it, markdown tables
  and box-drawing diagrams inside fenced code blocks were being
  misidentified as malformed ASCII boxes, blocking the PR's
  docs-site-checks CI with 8 false-positive errors.

Validation:
* Targeted suite: tests/hermes_cli/test_profile_distribution.py —
  56/56 pass (down from 43 — reorganized to cover the new
  local-dir paths).
* Regression: test_profiles.py + test_profile_export_credentials.py
  102/102 still pass. export/import behaviour unchanged.
* Docs lint: ascii-guard lint --exclude-code-blocks docs returns
  0 errors (was 8 on the PR before the flag bump).
* E2E: ran the real `hermes profile install`/`info` against a
  local staging dir under an isolated HERMES_HOME — install writes
  SOUL.md + skills to the target profile, info reads the manifest
  back, a bogus source produces a clear error, and `hermes profile
  pack` is now rejected by argparse as expected.

* feat(profile-dist): distribution-aware list/show/delete + installed_at + env preview

Polish pass on top of the git-only scope cut. Five additions, all small,
wiring into existing commands rather than adding new surface.

1. `installed_at` timestamp on the manifest
   * Stamped automatically inside plan_install() on both fresh install
     and update — ISO-8601 UTC, seconds resolution.
   * Surfaced in `hermes profile info` as `Installed:    <ts>`.
   * Lets users tell "installed 6 months ago, needs update" from
     "installed yesterday" without guessing from file mtimes.

2. `hermes profile list` grows a `Distribution` column
   * Plain profiles: "—"
   * Distribution profiles: "<name>@<version>" (e.g. `telemetry@1.2.3`)
   * ProfileInfo gains three optional fields — distribution_name,
     distribution_version, distribution_source — populated by a new
     _read_distribution_meta() helper that swallows manifest read errors
     so a broken distribution.yaml in one profile can't break `list`
     for the others.

3. `hermes profile show` and `hermes profile delete` surface
   distribution provenance
   * show: `Distribution: name@version` + `Installed from: <source>`
     plus a pointer to `hermes profile info <name>` for the full
     manifest.
   * delete: same lines in the pre-confirmation preview, so a user
     deleting "telemetry" can see it came from
     `github.com/kyle/telemetry-distribution` before they type
     `telemetry` to confirm. No change to the confirmation gate itself —
     deletion semantics are identical to plain profiles.

4. Install preview checks env vars against the current environment
   * Replaces the "Env vars you'll need to set:" header with a simpler
     "Env vars:" block.
   * Each required var is labeled:
     - `✓ set` — already in `os.environ` OR present as a key in the
       target profile's existing .env (update case).
     - `needs setting` — required but not found in either place.
     - `—` — optional.
   * Mirrors pip's "Requirement already satisfied" UX: no unnecessary
     nagging about keys the user already has configured.

5. Docs: private distributions
   * New "Private distributions" section in
     website/docs/reference/profile-commands.md explaining that we
     shell out to the user's `git` binary, so SSH keys / credential
     helpers / GitHub CLI stored creds all work transparently. One
     paragraph, two examples.
   * `hermes profile info` section updated to mention `Installed:`.

Module-level hoist:
* `from datetime import datetime, timezone` was previously lazy-imported
  inside plan_install(). Hoisted to module scope so tests can monkeypatch
  `hermes_cli.profile_distribution.datetime` to freeze time.

Tests (+7):
* TestInstalledAtStamp.test_install_stamps_installed_at — format check
  (4-digit year, 'T', +00:00 suffix).
* TestInstalledAtStamp.test_update_refreshes_installed_at — freezes
  datetime.now() to 2099-01-01 and confirms update writes a new stamp.
* TestProfileInfoDistribution.test_installed_distribution_shows_in_list
  — ProfileInfo.distribution_{name,version,source} populated after install.
* TestProfileInfoDistribution.test_plain_profile_has_no_distribution_fields
  — plain profiles have None.
* TestProfileInfoDistribution.test_malformed_manifest_does_not_break_list
  — broken distribution.yaml in one profile doesn't break list_profiles().

Validation:
* 163/163 tests pass (56 distribution + 102 profile regression +
  5 new from this commit — up from 158).
* docs-lint: 0 errors.
* E2E verified: install preview shows ✓/needs-setting per env var,
  `profile list` shows Distribution column, `profile show` + `delete`
  preview mentions source URL, `info` shows Installed: timestamp.

* fix(profile-dist): clean errors + warn when overwriting plain profiles

Two small polish fixes found during collision sweeps of the PR:

1. ValueError from validate_profile_name now caught cleanly
   * A distribution.yaml whose 'name' field can't be used as a profile
     identifier (spaces, path traversal, etc.) raises ValueError from
     hermes_cli.profiles.validate_profile_name, which was escaping as a
     raw Python traceback from 'hermes profile install/update/info'.
   * Broadened the except clause in all three handlers to catch
     (DistributionError, ValueError) — users now see:
       Error: Invalid profile name '../../etc/passwd'. Must match
              [a-z0-9][a-z0-9_-]{0,63}
     instead of a stack trace.

2. Install preview distinguishes plain profile overwrite from
   distribution re-install
   * When plan.target_dir exists and IS a distribution (has
     distribution.yaml), preview still shows the mild
       (profile exists — will overwrite distribution-owned files only)
   * When plan.target_dir exists but is a HAND-BUILT plain profile (no
     distribution.yaml), preview now shows a loud warning:
       ⚠ Profile exists but is NOT a distribution.  Installing here will
         overwrite its SOUL.md, skills/, cron/, and mcp.json.
         Your memories, sessions, auth.json, and .env will be preserved,
         but any hand-edits to distribution-owned files will be lost.
   * Users who type 'hermes profile install foo --force' against a
     profile they hand-built now see what they're signing up for. User
     data is still safe (memories, sessions, auth, .env are in
     USER_OWNED_EXCLUDE), but custom SOUL/skills get stomped.

Tests (+2):
* TestErrorSurfaces.test_bad_profile_name_raises_valueerror_not_traceback
* TestErrorSurfaces.test_path_traversal_name_rejected

Validation:
* 165/165 tests pass (was 163).
* E2E: bad manifest names produce 'Error: Invalid profile name ...'
  with no traceback; installing over a plain profile shows the warning;
  re-installing over an existing distribution shows the normal
  overwrite message.
* Bad HTTPS URLs still produce 'Error: git clone failed: ...' — git
  itself generates a clean enough message that no wrapper is needed.
* 'install .' works correctly from any cwd.

* fix(profiles): reject reserved names at validate time

Before: `hermes profile create hermes` / `profile install` / `profile rename`
all silently accepted reserved names like `hermes`, `test`, `tmp`, `root`,
`sudo`. The profile directory was created; only alias creation failed (via
check_alias_collision), leaving a confusingly-named profile on disk — e.g.
`~/.hermes/profiles/hermes/` sitting next to `~/.hermes/` itself.

The reserved set already exists (_RESERVED_NAMES, introduced alongside alias
collision detection). This commit moves the check up one layer to
validate_profile_name so every entry point — create, install, import,
rename, dashboard web API — shares the same gate.

The error message points the user at the cause without being cryptic:
  Error: Profile name 'hermes' is reserved — it collides with either the
  Hermes installation itself or a common system binary.  Pick a different
  name.

`default` continues to pass through (it's a special alias for ~/.hermes).
_HERMES_SUBCOMMANDS (`chat`, `model`, `gateway`, etc.) stays at
alias-collision time only — those are fine as bare profile names with
`--no-alias`.

Tests (+5): test_reserved_names_rejected parametrized over the full
_RESERVED_NAMES set, matching the existing pattern in TestValidateProfileName.

No existing test uses a reserved name as a profile identifier (greppped
create_profile("hermes|test|tmp|root|sudo") — zero hits).

Validation:
* 170/170 tests pass in the profile suites.
* E2E: `profile create hermes`, `profile install` with manifest
  name=hermes, and `profile install ... --name hermes` all produce the
  same clean `Error: Profile name 'hermes' is reserved ...` with rc=1
  and no traceback. Normal names (`mybot`) still work.
2026-05-08 10:04:32 -07:00
Teknium
45d860d424 fix(msgraph): stream download_to_file body instead of buffering
The prior implementation routed download_to_file through the shared
_request() path, which uses httpx.AsyncClient.request() inside a
context manager that closes before aiter_bytes() iterates. The body
was read into memory first and the chunked write loop replayed it
from buffer. On small test payloads this was invisible; on real
Teams meeting recordings (hundreds of MB) it would force the full
artifact into RAM per download.

Rewrites download_to_file to open its own AsyncClient and use
client.stream(), keeping the context open across the aiter_bytes
iteration so the body is actually streamed chunk-by-chunk to disk.
Retry/token-refresh/Retry-After semantics are preserved by handling
them inline on the stream path. Partial .part files are cleaned up
on transport errors and on exhausted retries.

Adds three tests: large-payload streaming verifies the chunk loop
runs multiple times (discriminator: 512 KiB at chunk_size=65536
yields 8 chunks under streaming, 1 under buffering), transient-5xx
retry recovers after a single retry, and exhausted-retry cleans up
the partial file.
2026-05-08 09:27:26 -07:00
Dilee
b878f89f66 test(msgraph): cover concurrent token cache reuse 2026-05-08 09:27:26 -07:00
Dilee
a152c706b7 feat(msgraph): add auth and client foundation 2026-05-08 09:27:26 -07:00
Teknium
839cdd1b05 fix(approval): cron jobs must not be treated as gateway context
The new _is_gateway_approval_context() widened the gateway classification
to any call with HERMES_SESSION_PLATFORM bound via contextvars. But
cron/scheduler.py binds that same contextvar for delivery routing on
cron jobs that originate from a gateway platform (telegram/discord/etc.),
so those jobs were getting routed through submit_pending with no
listener — blocking indefinitely instead of honoring approvals.cron_mode.

Short-circuit on HERMES_CRON_SESSION before any gateway check. Cron is
always governed by cron_mode config, regardless of where the job was
scheduled from.

Adds regression coverage in TestCronWithGatewayOrigin and records the
contributor email mapping for scripts/release.py.
2026-05-08 07:30:14 -07:00
Zhicheng Han
526c0e018a feat(api-server): expose run approval events 2026-05-08 07:30:14 -07:00
Teknium
674fad1483
fix(goals): Ctrl+C during /goal loop auto-pauses the goal (#21888)
Reported: Ctrl+C during an active /goal loop felt like it did nothing —
the agent would interrupt the current turn, then immediately queue another
continuation and keep going until the session ended or the 20-turn budget
ran out.

Root cause: cli.py's _maybe_continue_goal_after_turn() ran in the finally:
block around self.chat(...) unconditionally. Whether the turn completed
normally, got interrupted, or returned an empty string, the judge ran on
whatever was in conversation_history and — because the judge is fail-open
— a "continue" verdict pushed another CONTINUATION_PROMPT onto
_pending_input. Ctrl+C was invisible to the hook.

Fix:
- chat() now captures result['interrupted'] onto self._last_turn_interrupted
  (resets to False at entry so early-returns don't leak prior state).
- _maybe_continue_goal_after_turn() checks the flag first: on interrupt,
  auto-pause via mgr.pause(reason='user-interrupted (Ctrl+C)') and print
  a one-liner pointing the user at /goal resume or /goal clear. No judge
  call, no continuation enqueued.
- Also added an empty-response guard that mirrors gateway/run.py's
  _handle_message logic (empty reply → transient failure → skip judging
  so we don't trip the consecutive-parse-failures backstop unnecessarily).

The goal stays in the DB as paused, so /goal resume recovers it after
the user has sorted out whatever made them cancel. /goal clear still
works as before for a full stop.

Tests: tests/cli/test_cli_goal_interrupt.py covers:
  - interrupted turn pauses + doesn't queue + judge is NOT called
  - paused goal is resumable
  - empty / whitespace / missing assistant reply skips judging
  - healthy turn still enqueues continuation / marks done
  - chat() resets _last_turn_interrupted at entry (anti-leak guard)

All 55 existing goal tests still pass.
2026-05-08 06:53:13 -07:00
Shannon Sands
80775d7585 test(auth): assert Nous refresh rotation payload 2026-05-08 04:17:42 -07:00
Shannon Sands
b32461f6e8 fix(auth): send Nous refresh token via header 2026-05-08 04:17:42 -07:00
Teknium
486b14b423
feat(cron): routing intent — deliver=all fans out to every connected channel (#21495)
Adds one reserved token to the cron `deliver` field:

- `all` — expand to every platform with a configured home channel

Resolves at fire time, not create time, so a job created before Telegram
was wired up picks it up once `TELEGRAM_HOME_CHANNEL` is set. Composes
with existing targets: `origin,all`, `all,telegram:-100:17`.

Inspired by Vellum Assistant's reminder routing-intent system.

## Changes
- cron/scheduler.py: _expand_routing_tokens + integrate into _resolve_delivery_targets
- tools/cronjob_tools.py: schema description updated
- tests/cron/test_scheduler.py: TestRoutingIntents (5 cases)
- website/docs/user-guide/features/cron.md: docs + table rows

## Validation
- tests/cron/test_scheduler.py -k 'Routing or Deliver' → 57 passed
2026-05-08 04:17:21 -07:00
kshitijk4poor
81928f03ab refactor(gmi): move User-Agent to profile.default_headers
The previous revision of this PR added six GMI-specific branches
(`elif base_url_host_matches(..., 'api.gmi-serving.com')`) across
run_agent.py and agent/auxiliary_client.py, plus a _HERMES_UA_HEADERS
constant in auxiliary_client.py.

ProviderProfile already has a `default_headers: dict[str, str]` field
commented as 'Client-level quirks (set once at client construction)'.
Other plugins (ai-gateway, kimi-coding) already use it. Two of the four
auxiliary_client sites we previously patched already had a generic
`else: profile.default_headers` fallback that picked it up (so did
both run_agent sites).

This revision:

* Sets `default_headers={'User-Agent': 'HermesAgent/<ver>'}` on the
  GMI profile in plugins/model-providers/gmi/__init__.py.
* Reverts all six GMI-specific branches in run_agent.py and
  auxiliary_client.py.
* Adds the generic profile-fallback `else` block to the two
  auxiliary_client sites (`_to_async_client`, `resolve_provider_client`)
  that didn't have it yet. This benefits every provider whose profile
  declares default_headers, not just GMI — e.g. Vercel AI Gateway's
  HTTP-Referer/X-Title now flow through the async client path too.
* Replaces the GMI-specific URL-branch tests with a profile-level
  assertion and keeps the run_agent integration test (with
  `provider='gmi'` so the fallback picks up the profile).

Net diff vs main: +82/-0 across 5 files, touching only the GMI plugin,
two generic fallback blocks in auxiliary_client.py, AUTHOR_MAP, and
tests. No core files change.

Based on #20907 by @isaachuangGMICLOUD.
2026-05-08 03:22:11 -07:00
Teknium
307c85e5c1 fix(goals): auto-pause when judge model returns unparseable output
Weak judge models (e.g. deepseek-v4-flash) return empty strings or prose
when asked for the strict {done, reason} JSON verdict. The old code
failed-open to continue on every such turn, burning the entire turn
budget with log lines like

  judge returned empty response
  judge reply was not JSON: "Let me analyze whether the goal..."

and /goal clear could not stop it mid-loop without /stop.

After N=3 consecutive *parse* failures (transport/API errors don't
count — those are transient), the loop auto-pauses and prints:

  ⏸ Goal paused — the judge model (3 turns) isn't returning the
  required JSON verdict. Route the judge to a stricter model in
  ~/.hermes/config.yaml:
    auxiliary:
      goal_judge:
        provider: openrouter
        model: google/gemini-3-flash-preview
  Then /goal resume to continue.

The counter resets on any usable reply (both "done"/"continue" and
API errors) and persists across GoalManager reloads so cross-session
resumes carry the correct state.

Also fixes test_goal_verdict_send.py sharing a hardcoded session_id
across tests — the shared id only worked because the previous
_post_turn_goal_continuation was a never-awaited coroutine. Now that
PR #19160 made it properly awaited, the xdist test-leakage bug
surfaced. Each test gets a unique session_id via uuid suffix.
2026-05-07 17:33:09 -07:00
JC
03ddff8897 fix(gateway): defer goal status notices until after response delivery
Route goal status notices through the platform adapter send API and register post-delivery callbacks so completed-goal notices appear after the final assistant response. Also cancel queued synthetic goal continuations on /goal pause and /goal clear while preserving normal queued user messages.
2026-05-07 17:33:09 -07:00
Austin Pickett
7f92e5506e
Merge pull request #20942 from NousResearch/austin/fix/personality
fix(tui): preserve session when switching personality
2026-05-07 18:54:29 -04:00
teknium
292f468366 fix(mcp): unwrap platforms key in channels_list
channels_list was iterating directory.items() directly, yielding
("updated_at", str) and ("platforms", dict) pairs — neither passed
the isinstance(entries_list, list) check, so the inner loop never ran
and every call returned count=0 even when channel_directory.json was
populated.

The writer (gateway/channel_directory.py) wraps the payload as
{"updated_at": ..., "platforms": {...}}; every other reader in the
codebase unwraps via directory.get("platforms", {}). This aligns
channels_list with that convention.

Also tightens the existing test_channels_with_directory test, which
bypassed the bug by asserting against _load_channel_directory() directly
instead of calling channels_list. It now calls the tool end-to-end and
a new test_channels_with_directory_platform_filter covers the filter
path. Both tests fail against the pre-fix code.

Closes #21474

Co-authored-by: chrisworksai <262485129+chrisworksai@users.noreply.github.com>
2026-05-07 13:41:16 -07:00
Blake Johnson
9076a2e74e fix(agent): keep Nous GPT-5 fallback on chat completions 2026-05-07 13:04:42 -07:00
Teknium
24d48ffb82
feat(kanban): add specify — auxiliary LLM fleshes out triage tasks (#21435)
* feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks

The Triage column shipped with a placeholder 'a specifier will flesh
out the spec', but the specifier itself was never built. This wires
it up as a dedicated CLI verb.

`hermes kanban specify <id>` calls the auxiliary LLM (configured under
`auxiliary.triage_specifier`) to expand a rough one-liner into a
concrete spec — tightened title plus a body with Goal / Approach /
Acceptance criteria / Out-of-scope sections — then atomically flips
`status: triage -> todo` and recomputes ready so parent-free tasks
go straight to the dispatcher on the same tick.

Surface:

  hermes kanban specify <task_id>               # single task
  hermes kanban specify --all [--tenant T]      # sweep triage column
  hermes kanban specify ... --author NAME       # audit-comment author
  hermes kanban specify ... --json              # one JSON line per task

Design choices:

  - Parent gating is preserved. specify_triage_task flips to 'todo',
    then recompute_ready promotes to 'ready' only when parents are
    done — same rule as a normal parent-gated todo.
  - No daemon, no background watcher. Every invocation is explicit —
    keeps cost predictable and doesn't fight the dispatcher loop.
  - Response parse is lenient: strict JSON preferred, markdown-fence
    tolerated, raw-body fallback on malformed JSON so the LLM can't
    strand a task in triage.
  - All failure modes (no aux client, API error, task moved out of
    triage mid-call) return SpecifyOutcome(ok=False, reason=...) so
    --all continues past individual failures.

Changes:

  hermes_cli/kanban_db.py    + specify_triage_task()
  hermes_cli/kanban_specify.py  NEW (~220 LOC — prompt, parse, call)
  hermes_cli/kanban.py       + specify subcommand + _cmd_specify
  hermes_cli/config.py       + auxiliary.triage_specifier task slot
  website/docs/user-guide/features/kanban.md  specify + config notes
  website/docs/reference/cli-commands.md      CLI reference entry
  tests/hermes_cli/test_kanban_specify_db.py    NEW (10 tests)
  tests/hermes_cli/test_kanban_specify.py       NEW (20 tests)

Validation: 30/30 targeted tests pass. E2E: triage task -> specify ->
ends in 'ready' with events [created, specified, promoted] and the
audit comment recorded under the configured author.

* feat(kanban): wire specifier into dashboard and gateway slash

Follow-ups to the initial PR #21435 — closes the two gaps I'd left as
post-merge: dashboard button and first-class gateway surface.

Dashboard (plugins/kanban/dashboard/)
  - POST /tasks/:id/specify  NEW endpoint. Thin wrapper around
    kanban_specify.specify_task(). Returns the CLI outcome shape
    ({ok, task_id, reason, new_title}); ok=false with a human reason
    is a 200, not a 4xx, so the UI can render it inline without
    treating 'no aux client configured' as a crash.
  - Runs sync in FastAPI's threadpool because the LLM call can take
    tens of seconds on reasoning models.
  - Pins HERMES_KANBAN_BOARD around the specify call so the module's
    argless kb.connect() lands on the right board.
  - dist/index.js: doSpecify callback threaded through the drawer →
    TaskDetail → StatusActions prop chain.  Specify button appears
    ONLY when task.status === 'triage' (elsewhere the backend would
    reject anyway — hide the button to keep the action row clean).
    Busy state (Specifying…) + inline success/error banner under the
    button using the response.reason text.
  - dist/style.css: tiny hermes-kanban-msg-ok / -err classes using
    existing --color vars so themes reskin cleanly.

Gateway slash (/kanban specify)
  - Already works via the existing run_slash → build_parser →
    kanban_command pipeline. No code change needed — slash commands
    inherit the argparse tree automatically. Added coverage:
    test_run_slash_specify_end_to_end (create --triage, specify, verify
    promotion + retitle) and test_run_slash_specify_help_is_reachable.

Tests
  - tests/plugins/test_kanban_dashboard_plugin.py: 3 new tests for the
    REST endpoint — happy path, non-triage rejection as ok=false 200,
    missing aux client as ok=false 200.
  - tests/hermes_cli/test_kanban_cli.py: 2 new slash-surface tests.

Docs
  - website/docs/user-guide/features/kanban.md: dashboard action row
    description mentions  Specify + all three surfaces. REST table
    gains /tasks/:id/specify. Slash examples include /kanban specify.

Validation: 340/340 targeted tests pass. E2E via TestClient: create a
triage task over REST → POST /specify with mocked aux client → task
moves to 'ready' column on /board with new title and body applied.
2026-05-07 13:04:41 -07:00
adybag14-cyber
732a6c45fa feat: add termux doctor fallback guidance for blocked extras 2026-05-07 13:04:08 -07:00
adybag14-cyber
dc5ef1ac8e fix: add termux-all install profile and safe fallbacks 2026-05-07 13:04:08 -07:00
adybag14-cyber
da18fd084a fix: strengthen termux install network prerequisites 2026-05-07 13:04:08 -07:00
adybag14-cyber
54c0b10d14 fix(update): add heartbeat during dependency install 2026-05-07 13:04:08 -07:00
Abd0r
04193cf71c feat(web): add Brave Search (free tier) and DDGS search providers
Both implement WebSearchProvider via tools/web_providers/ — matching the
existing SearXNG pattern (PR #5c906d702). Search-only; pair with any
extract provider via web.extract_backend.

- tools/web_providers/brave_free.py — Brave Search API (free tier, 2k
  queries/mo). Uses BRAVE_SEARCH_API_KEY as X-Subscription-Token.
- tools/web_providers/ddgs.py — DuckDuckGo via the ddgs Python package.
  No API key; gated on package importability.
- tools/web_tools.py: both backends added to _get_backend() config list
  and auto-detect chain (trails paid providers), _is_backend_available,
  web_search_tool dispatch, web_extract_tool + web_crawl_tool search-only
  refusals, check_web_api_key, and the __main__ diagnostic. Introduces
  _ddgs_package_importable() helper so tests can monkeypatch a single
  symbol for the ddgs availability check.
- hermes_cli/tools_config.py: picker entries for both providers; ddgs
  gets a post_setup handler that runs `pip install ddgs`.
- hermes_cli/config.py: BRAVE_SEARCH_API_KEY in OPTIONAL_ENV_VARS.
- scripts/release.py: AUTHOR_MAP entry for @Abd0r.
- tests: 14 new tests (brave-free) + 15 new tests (ddgs) covering
  provider unit behavior, backend wiring, and search-only refusals.

Salvages the brave-free + ddgs portion of PR #19796. Not included: the
in-line helpers in web_tools.py (replaced with provider modules to match
the shipped architecture), the lynx-based extract path (these backends
should refuse extract with a clear error — users pair with a real
extract provider), and scripts/start-llama-server.sh (unrelated).

Co-authored-by: Abd0r <223003280+Abd0r@users.noreply.github.com>
2026-05-07 09:59:17 -07:00
xxxigm
cdc0a47dd5 test(hermes_constants): cover parse_reasoning_effort() 2026-05-07 09:59:07 -07:00
Teknium
7e2af0c2e8 feat(acp): pass image file attachments through as image_url parts
Extends PR #21400's resource inlining with image-specific handling: ACP
resource_link and embedded blob resources with an image/* mime (or image
file suffix when mime is missing) now emit an OpenAI image_url part
with a base64 data URL, so vision models actually see the image
instead of a [Binary file omitted] note. Non-image resources keep the
existing text-inlining behavior.

Adds 3 tests: local PNG via resource_link, JPEG mime inferred from
suffix when client omits mimeType, and embedded blob PNG.
2026-05-07 09:24:32 -07:00
HenkDz
733e297b8a fix(acp): inline file attachment resources 2026-05-07 09:24:32 -07:00
Teknium
2564132a1f
fix(telegram): preserve thread_id=1 for forum General typing indicator (#21390)
The May 5 refactor in d5357f816 made _message_thread_id_for_typing()
symmetric with _message_thread_id_for_send() by mapping the General
topic (thread id "1") to None upfront for both. That's correct for
sendMessage — Telegram rejects message_thread_id=1 on sends and the
topic must be omitted — but it's wrong for sendChatAction.

Observed behavior (confirmed via before/after Telegram wire traces):
  Before d5357f816: thread_id=1 → message_thread_id=1 → bubble visible in General
  After  d5357f816: thread_id=1 → message_thread_id=None → no visible typing

Omitting message_thread_id on sendChatAction does NOT fall back to
the General topic's view in a forum-enabled supergroup; the bubble
ends up hidden from the client's General-topic pane entirely. For
any user on a forum-group, the typing indicator stopped appearing.

Fix: drop the symmetric "1 → None" mapping from the typing resolver.
sendMessage still maps 1 → None via _message_thread_id_for_send (that
side was never broken). The asymmetry is real and required by
Telegram's API — document it in the resolver docstring.

Partial revert of d5357f816; restores the behavior from 0cf7d570e
("fix(telegram): restore typing indicator and thread routing for
forum General topic"). Does not re-introduce the retry-without-thread
fallback that 41545f7ec scoped down for DM topics — with the resolver
fixed, the first call already hits the right wire shape.

Test updated from test_send_typing_general_topic_uses_none_thread_id
(which encoded the broken contract) to
test_send_typing_preserves_general_topic_thread_id, asserting the
single correct call with message_thread_id=1. 10 other tests in the
file untouched and passing.
2026-05-07 08:39:21 -07:00
Teknium
812ce0b987
fix(run_agent): break permanent empty-response loop from orphan tool-tail (#21385)
When empty-response terminal scaffolding fires on a tool-result turn,
_drop_trailing_empty_response_scaffolding left the live history ending at
a bare 'tool' message. The next user input then landed as [...tool, user],
a protocol-invalid sequence that OpenRouter/Opus and other providers
silently fail on (returns empty content). That retriggered the empty-retry
recovery every turn, and recovery flags never hit SQLite (no column for
them), so history kept looking broken on every reload.

Two fixes:

1. Scaffolding strip rewinds the orphan assistant(tool_calls)+tool pair
   after popping sentinels. Only fires when scaffolding flags were
   actually present, so mid-iteration tool loops are untouched.

2. _repair_message_sequence runs right before every API call as a
   defensive belt: drops stray tool messages with unknown tool_call_ids,
   merges consecutive user messages so no user input is lost. Does NOT
   rewind assistant(tool_calls)+tool+user — that pattern is valid when
   the user redirected before the model got its continuation turn.

Repro: session 20260507_044111_fa7e65. Opus-4.7/OpenRouter returned
content-less response after a 42KB execute_code output, nudge+retry
chain exhausted (no fallback configured), terminal sentinel appended,
scaffolding stripped leaving bare tool tail, user typed 'wtf happened..'
and landed as tool→user violation. Every subsequent turn collapsed in
<50ms with the same 3-retry empty chain because the API request itself
was malformed.

Verified live via HTTP mock: pre-fix reproduced 5 api_calls/0.15s exit
'empty_response_exhausted'; post-fix 1 api_call/0.10s exit
'text_response(finish_reason=stop)'. Three-turn session flows cleanly
through the scenario. Full run_agent suite: 1242 passed (0 regressions,
2 pre-existing concurrent_interrupt failures unrelated).
2026-05-07 08:35:10 -07:00
Teknium
1d2029b2b7
fix(update): reset-failed before every fallback restart so the gateway can't get stranded (#21371)
cmd_update's auto-restart path could leave the gateway dead after a
transient failure in systemd's own auto-restart window.  Reproduced
on Ubuntu 25.10 + systemd 257: after update, gateway drains and exits 75,
systemd's first respawn 60s later fails (status=200/CHDIR with
"No such file or directory" on a WorkingDirectory that demonstrably
exists), the unit ends up in RestartMaxDelaySec=300 backoff, and
cmd_update's fallback 'systemctl restart' never recovers it — leaving
users with a permanently silent gateway until they manually run
'systemctl reset-failed'.

The fix mirrors the recovery pattern 'hermes gateway restart'
(systemd_restart) got in PR #20949: always reset-failed before
restart, on both the initial fallback and the retry.  Also rewrites
the final failure message to tell the user to reset-failed +
restart (not just restart, which is the step that already failed
twice).
2026-05-07 08:34:12 -07:00
Teknium
04918345ea
fix(cron): initialize MCP servers before constructing the cron AIAgent (#21354)
cron/scheduler.py:run_job() constructed AIAgent(...) without ever calling
discover_mcp_tools(). The CLI and gateway paths do this at startup; cron
jobs inherited none of it and the user's configured mcp_servers were
invisible inside every cron run.

Insert discover_mcp_tools() right before AIAgent(), wrapped in try/except
so a broken MCP server can't kill an otherwise-working cron job. The call
is idempotent: register_mcp_servers() short-circuits on already-connected
servers, so subsequent ticks in the same scheduler process pay ~0ms.
Scoped to the LLM path only; no_agent script jobs skip it entirely.

Closes #4219.
2026-05-07 07:53:03 -07:00
WideLee
4de3ef38b1 feat(qqbot): wire native tool-approval UX via inline keyboards
Makes the in-tree QQ inline keyboards actually light up when the agent
blocks on a dangerous-command approval. Matches the cross-adapter
gateway contract already implemented by Discord, Telegram, Slack,
Matrix, and Feishu.

Gateway/run.py's _approval_notify_sync checks type(adapter).send_exec_approval
and falls back to a text prompt when it's missing. Without this wiring,
QQ users stared at plain '/approve' text even though the adapter shipped
button primitives.

### send_exec_approval(chat_id, command, session_key, description, metadata)

Matches the signature the gateway calls with. Builds an ApprovalRequest
(command_preview, description, timeout) and delegates to send_approval_request.
Uses the last inbound msg_id as reply_to so QQ accepts the passive
message. The 'metadata' parameter is accepted for contract parity but
intentionally unused — QQ doesn't have thread_id/DM-targeting overrides.

### send_update_prompt(chat_id, prompt, default, session_key, metadata)

Signature updated to match the cross-adapter contract used by
'hermes update --gateway' watcher. Renders a 'Update Needs Your Input'
prompt with the optional default hint and a Yes/No keyboard. Replaces
the earlier 3-arg helper that wasn't wired anywhere.

### Default interaction dispatcher

_default_interaction_dispatch() auto-registered as the adapter's
interaction callback in __init__. Routes:

- approve:<session_key>:<decision> → tools.approval.resolve_gateway_approval
  Button → choice mapping:
    allow-once  → 'once'
    allow-always → 'always'
    deny        → 'deny'
  (QQ's 3-button mobile layout deliberately collapses 'session' + 'always'
  into one button; /approve session text fallback remains available.)
- update_prompt:<answer> → atomic write of y/n to ~/.hermes/.update_response
  (the detached 'hermes update --gateway' watcher polls this file)
- anything else → logged and dropped

Resolve exceptions are caught and logged — never propagate into the WS
loop. Callers can override via set_interaction_callback() to route
clicks elsewhere or pass None to drop them entirely.

### Net effect

QQ users now get native tap-to-approve UX on dangerous-command prompts
and update-confirmation prompts, without having to type /approve or /deny
as text. The adapter hooks into tools.approval the same way every other
button-capable platform does.

### Tests

14 new tests cover:
- Default callback installed on __init__
- send_exec_approval / send_update_prompt exist as class methods (so the
  gateway's type-probe detects them)
- allow-once/always/deny each map to the correct resolve choice
- update_prompt:y / update_prompt:n each write atomically to the response
  file (via monkeypatched get_hermes_home)
- Unknown button_data / empty button_data / resolve exceptions are harmless
- send_exec_approval honours last_msg_id reply-to and accepts metadata
- send_update_prompt delegates with correct content + keyboard

Full qqbot suite: 144 passed (72 pre-existing + 72 from this salvage arc).
Also ran tools/test_approval.py alongside — no regressions (276 passed
combined).

Co-authored-by: WideLee <limkuan24@gmail.com>
2026-05-07 07:48:15 -07:00
Teknium
a1fe5f473d
fix(cron): scan assembled prompt including skill content (#3968) (#21350)
_scan_cron_prompt ran at cron create/update time on the user-supplied
prompt but skill content loaded inside _build_job_prompt at runtime
was never scanned. Combined with non-interactive auto-approval, a
malicious skill carrying an injection payload could execute with full
tool access every tick.

- cron/scheduler.py: new CronPromptInjectionBlocked exception and
  _scan_assembled_cron_prompt helper. _build_job_prompt now routes
  both return paths (with skills / without skills) through the helper,
  raising on match. run_job catches the exception and returns a clean
  (False, blocked_doc, "", error) tuple so the operator sees a BLOCKED
  delivery with the scanner result and an audit hint, rather than a
  scheduler crash or a silent skip.
- tests/cron/test_cron_prompt_injection_skill.py: 10 regression tests.
  Unit coverage on _scan_assembled_cron_prompt (clean/injection/exfil/
  invisible-unicode). End-to-end coverage via _build_job_prompt with
  planted skills (injection payload, env exfil, zero-width space,
  clean control, missing-skill-doesn't-crash). Fixture patches
  tools.skills_tool.SKILLS_DIR / HERMES_HOME so planted skills are
  visible. Importantly uses the current cron.scheduler module object
  (not a top-level import) so tests don't break when other fixtures
  reload cron.scheduler — CronPromptInjectionBlocked identity depends
  on which module object defined it.
2026-05-07 07:44:10 -07:00
maciekczech
162ad3dd16 fix(kanban): filter dashboard board by selected tenant 2026-05-07 07:39:57 -07:00
maciekczech
f4de3810ef test(kanban): cover dashboard select filter wiring 2026-05-07 07:39:57 -07:00
Teknium
74c9c0eec9
fix(mcp): gate utility stubs on server-advertised capabilities (#21347)
For every connected MCP server we register four "utility" tool schemas
(mcp_<server>_list_resources, read_resource, list_prompts, get_prompt).
The existing gate was `hasattr(server.session, method)` — but
`mcp.ClientSession` defines all four methods on the class regardless of
what the remote server supports, so the gate never filtered anything.
Tools-only servers (e.g. @upstash/context7-mcp which advertises only
`tools`) ended up with 4 dead stubs; every model call to them returned
JSON-RPC -32601 Method not found, which made the model conclude the
server was broken even when the real tools worked.

Capture the `InitializeResult` returned by `await session.initialize()`
on the `MCPServerTask`, then gate each utility schema on the
corresponding `capabilities` sub-object (resources / prompts). A
legacy `hasattr` fallback runs when `initialize_result` is missing
(older test fixtures / not-yet-captured code paths) so pre-existing
behavior is preserved.

Verified against real `mcp.types.InitializeResult` pydantic models:
- Context7 shape (tools only) → 0 utility stubs registered (was 4)
- Resources-only server → 2 stubs (list_resources, read_resource)
- Prompts-only server → 2 stubs (list_prompts, get_prompt)
- Fully capable server → all 4 stubs

Closes #18051.

Co-authored-by: nikolay-bratanov <nikolay-bratanov@users.noreply.github.com>
2026-05-07 07:39:50 -07:00
teknium1
898b6d7d55 fix(webhook): widen INSECURE_NO_AUTH loopback check + tests + docs
Follow-up to the previous commit:
- Add _is_loopback_host() helper covering 127.0.0.1, localhost, ::1,
  ip6-localhost, ip6-loopback (case-insensitive). Empty/None host is
  treated as non-loopback since unset usually means public default bind.
- Fix mixed-indent comment in the safety rail (comment now aligned with
  the if-block) and collapse the nested-if into one condition.
- Add TestInsecureNoAuthSafetyRail covering rejection on 0.0.0.0, a LAN
  IP, and empty host; allowance on 127.0.0.1/localhost; plus unit-level
  parametrized coverage of _is_loopback_host for spellings we can't bind
  in the hermetic test env (::1, ip6-localhost, ip6-loopback).
- Pin test_connect_starts_server + test_webhook_deliver_only defaults
  to 127.0.0.1 so they keep passing under the new rail.
- Document the behavior in website/docs/user-guide/messaging/webhooks.md.
2026-05-07 07:38:43 -07:00