hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-03 12:23:08 +00:00

Author	SHA1	Message	Date
Austin Pickett	fc3fd6bb6b	fix(dashboard): UI polish — modals, layout, consistency, test fixes Dashboard UX polish pass — consolidates create forms into modals triggered from the page header, fixes layout inconsistencies, adds scroll-to navigation for the Keys page, and aligns the TokenBar with the design system. Changes: - App.tsx: add padding to sidebar header - resolve-page-title.ts: add missing routes, better fallback title - en.ts: fix nav labels (Profiles was 'profiles : multi agents') - ModelsPage: two-col layout, auxiliary tasks modal, TokenBar redesign - ProfilesPage: create button in header, form in modal, Checkbox component - CronPage: create button in header, form in modal - EnvPage: scroll-to sub-nav in header, fix text overflow Modal and dialog standardization: - Replace all native confirm()/window.confirm() with ConfirmDialog (OAuthProvidersCard, PluginsPage, ModelsPage, ConfigPage) - Add useModalBehavior hook (Escape-to-close, scroll lock, focus restore) - Apply hook to ProfilesPage, CronPage, AuxiliaryTasksModal Component fixes (from PR review): - Checkbox: fix controlled/uncontrolled mismatch, add focus-visible ring - TokenBar: add rounded-full to legend dots, remove dead code CI/test fixes: - Fix TS unused imports (noUnusedLocals), type-narrow PickerTarget union - Add windows-footgun suppression on platform-guarded os.killpg - Fix 19 stale unit tests + 9 e2e tests broken by recent main changes - Restore minimal example-dashboard plugin for plugin auth test	2026-05-12 13:59:22 -04:00
Teknium	c1eb2dcda7	feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220 ) * feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback Three coordinated mitigations for the Mini Shai-Hulud worm hitting mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package compromise that follows. # What this PR makes true 1. Users with the poisoned mistralai 2.4.6 in their venv get a loud detection banner with copy-pasteable remediation steps the moment they run hermes (and on every gateway startup). 2. One quarantined / yanked PyPI package can no longer silently demote a fresh install to 'core only' — the installer keeps every other extra and tells the user which tier landed. 3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can lazy-install on first use under a strict allowlist, instead of eagerly pulling everything at install time. # Detection: hermes_cli/security_advisories.py - ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for mistralai==2.4.6). Adding the next one is a single dataclass. - detect_compromised() uses importlib.metadata.version() — no pip dependency, works in uv venvs that lack pip. - Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits the startup banner to once per 24h per advisory. - Acks persisted to security.acked_advisories in config.yaml; never re-banner after ack. - Wired into: * hermes doctor — runs first, prints full remediation block * hermes doctor --ack <id> — dismisses an advisory * cli.py interactive run() and single-query branches — short stderr banner pointing at hermes doctor * gateway/run.py startup — operator-visible warning in gateway.log # Lazy-install framework: tools/lazy_deps.py - LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs, memory.honcho, provider.bedrock, etc.) to pip specs. - ensure(feature) installs missing deps in the active venv via the uv → pip → ensurepip ladder (matches tools_config._pip_install). - Strict spec safety regex rejects URLs, file paths, shell metas, pip flag injection, control chars — only PyPI-by-name accepted. - Gated on security.allow_lazy_installs (default true) plus the HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs. - Migrated three backends as proof of pattern: * tools/tts_tool.py — _import_elevenlabs() calls ensure first * plugins/memory/honcho/client.py — get_honcho_client lazy-installs * tts.mistral / stt.mistral entries pre-registered for when PyPI restores mistralai # Installer fallback tiers scripts/install.sh, scripts/install.ps1, setup-hermes.sh: - Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one array when a transitive breaks; users keep every other extra. - New 'all minus known-broken' tier between [all] and the existing PyPI-only-extras tier. Only kicks in when [all] fails resolve. - All three tiers explicit: every fallback announces which tier landed and prints a re-run hint when not on Tier 1. - install.ps1 and install.sh both regenerate their tier specs from the same _BROKEN_EXTRAS array so updates stay in sync. Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral' in its extra list — bug fixed by the refactor (mistral is filtered out). # Config hermes_cli/config.py — DEFAULT_CONFIG.security gains: - acked_advisories: [] (advisory IDs the user has dismissed) - allow_lazy_installs: True (security gate for ensure()) No config version bump needed — both keys nest under existing security: block, and load_config's deep-merge picks up DEFAULT_CONFIG defaults for users with older configs. # Tests tests/hermes_cli/test_security_advisories.py — 23 tests covering: - detect_compromised matches/non-matches, wildcard frozenset - ack persistence, idempotence, blank rejection, config-failure path - banner cache rate limiting + 24h re-banner + ack-stops-banner - short_banner_lines / full_remediation_text / render_doctor_section / gateway_log_message - shipped catalog well-formedness invariant tests/tools/test_lazy_deps.py — 40 tests covering: - spec safety: 11 safe parametrized + 18 unsafe parametrized - allowlist: unknown-feature rejection, namespace.name shape, every shipped spec passes the safety regex - security gating: config flag, env var, default, fail-open - ensure() happy/sad paths: already-satisfied, install success, pip stderr surfaced on failure, install-succeeds-but-still-missing - is_available, feature_install_command Combined: 63 new tests, all passing under scripts/run_tests.sh. # Validation - scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py tests/tools/test_lazy_deps.py → 63/63 passing - scripts/run_tests.sh tests/hermes_cli/test_doctor.py tests/hermes_cli/test_doctor_command_install.py tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing - scripts/run_tests.sh tests/hermes_cli/ tests/tools/ → 9191 passed, 8 pre-existing failures (verified on origin/main before this change) - bash -n on install.sh and setup-hermes.sh → OK - py_compile on all modified .py files → OK - End-to-end smoke test of detect_compromised + render_doctor_section + gateway_log_message with mocked installed version → produces copy-pasteable remediation output # Community Full advisory + remediation steps: website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md Short-form post drafts (Discord, GitHub pinned issue, README banner): scripts/community-announcement-shai-hulud.md Refs: PR #24205 (mistral disabled), Socket Security advisory <https://socket.dev/blog/mini-shai-hulud-worm-pypi> * build(deps): pin every direct dep to ==X.Y.Z (no ranges) Companion to the supply-chain advisory work: replace every >=/</~= range in pyproject.toml's [project.dependencies] and [project.optional-dependencies] with an exact ==X.Y.Z pin sourced from uv.lock. Why: ranges allow PyPI to ship a fresh version of any direct dep at any time without a code review on our side. With ranges, the malicious mistralai 2.4.6 release would have been pulled by every fresh 'pip install -e .[all]' for the hours between upload and PyPI's quarantine — exactly the install window we got hit on. Exact pins close that window: the only way a new package version reaches a user is via an intentional update on our end. What the user-facing change is: nothing, behavior-wise. Every package resolves to the same version it was already resolving to via uv.lock — the pins just remove the resolver's freedom to pick a different one. Cost: any user installing Hermes alongside another package that requires a newer pin gets a resolver conflict. Acceptable for our isolated-venv install path; documented in the new comment block. Build-system requires line (setuptools>=61.0) is intentionally left as a range — pinning the build backend would block fresh pip from bootstrapping the build on architectures where that exact wheel isn't available. mistral extra (mistralai==2.3.0) is pinned but stays out of [all] (per PR #24205). 'uv lock' regeneration will fail until PyPI restores mistralai; lockfile regeneration is gated behind that, NOT on every PR. LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy- install pathway can never resolve a different version than the one declared in pyproject.toml. Validation: - Cross-checked all 77 pinned direct deps in pyproject.toml against uv.lock — every pin matches the resolved version exactly. - Cross-checked all LAZY_DEPS specs against uv.lock — same. - 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly. - tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py → 63/63 passing (every shipped spec passes the safety regex). - Doctor + TTS + transcription targeted suite → 146/146 passing. * build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra You asked: 'what about the dependencies the dependencies rely on?' — correctly noting that exact-pinning direct deps in pyproject.toml does NOT cover the transitive graph. `pip install` and `uv pip install` both re-resolve transitives fresh from PyPI at install time, so a compromised transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would still hit our users even with every direct dep exact-pinned. # What this commit fixes 1. Both real installer scripts now prefer `uv sync --locked` as Tier 0. uv.lock records SHA256 hashes for every transitive — a compromised package with a different hash gets REJECTED. Falls through to the existing `uv pip install` cascade if the lockfile is missing or stale, with a loud warning that the fallback path does NOT hash-verify transitives. Previously only `setup-hermes.sh` (the dev path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1` (the paths fresh users actually run) skipped it. 2. Removed the `[mistral]` extra entirely. The `mistralai` PyPI project is fully quarantined right now — every version returns 404, so any pin we wrote was unresolvable, which broke `uv lock --check` in CI. Restoration is documented in pyproject.toml as a 5-step checklist (verify, re-add extra, re-enable in 4 modules, regenerate lock, optionally re-add to [all]). 3. Regenerated uv.lock. 262 packages, mistralai/eval-type-backport/ jsonpath-python pruned. `uv lock --check` now passes. # Defense-in-depth view \| Layer \| Where \| Protects against \| \|----------------------------\|-------------------\|-------------------------------------------\| \| Exact pins in pyproject \| direct deps \| new mistralai 2.4.6-style direct compromise \| \| uv.lock + `--locked` install \| transitive graph \| transitive worm injection \| \| Tier-0 hash-verified path \| install.sh / .ps1 \| actually USE the lockfile in fresh installs \| \| `uv lock --check` CI gate \| every PR \| drift between pyproject and lockfile \| \| `hermes_cli/security_advisories.py` \| runtime \| cleanup for users who already got hit \| The exact pinning + hash verification together close the supply-chain gap. Without the lockfile path, exact pins alone are theater. # Validation - `uv lock --check` → passes (262 packages resolved, no drift). - `bash -n` on install.sh + setup-hermes.sh → OK. - 209/209 tests passing across new + adjacent test files (test_lazy_deps.py, test_security_advisories.py, test_doctor.py, test_tts_mistral.py, test_transcription_tools.py). - TOML parse OK. * chore: remove community announcement drafts (PR body covers it) * build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard) Extends the lazy-install framework to cover everything that's not used by every hermes session. Base install drops from ~60 packages to 45. Moved out of core dependencies = []: - anthropic (only when provider=anthropic native, not via aggregators) - exa-py, firecrawl-py, parallel-web (search backends; only when picked) - fal-client (image gen; only when picked) - edge-tts (default TTS but still optional) New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web] [fal] [edge-tts]. All added to [all]. New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel}, tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix}, terminal.{modal,daytona,vercel}, tool.dashboard. Each import site now calls ensure() before importing the SDK. Where the module had a top-level try/except (telegram, discord, fastapi), the graceful-fallback pattern was extended to lazy-install on first check_*_requirements() call and re-bind module globals. Updated test_windows_native_support.py tzdata check from snapshot (>=2023.3 literal) to invariant (any version + win32 marker). Validation: - Base install: 45 packages (was ~60); 6 newly-extracted packages absent - uv lock --check: passes (262 packages, no drift) - 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing - py_compile clean on all 12 modified modules	2026-05-12 01:02:25 -07:00
Teknium	99ad2d1372	fix(deps): unbreak [all] install — drop mistralai while PyPI quarantined (#24205 ) The `mistralai` PyPI package was quarantined on 2026-05-12 after a malicious 2.4.6 release. Every fresh resolve (AUR makepkg, Docker build, CI run, install.sh first-run) currently fails on `mistralai>=2.3.0,<3` because PyPI returns zero candidates. Existing users running `hermes update` mostly didn't notice — `hermes update` falls back from `.[all]` to per-extra retries and silently skips mistral with a warning that scrolls past. But fresh installs hard-fail or lose every other extra. Changes: - pyproject.toml: drop `hermes-agent[mistral]` from `[all]` and `[termux-all]`. The `mistral` extra itself is preserved so users can opt back in once PyPI un-quarantines. - hermes_cli/tools_config.py: hide Mistral Voxtral TTS from the `hermes tools` provider picker until restored. - hermes_cli/web_server.py: drop "mistral" from dashboard STT options. - tools/transcription_tools.py: explicit `provider: mistral` returns "none" with a clear status message; auto-detect skips mistral. - tools/tts_tool.py: dispatcher returns a clear "temporarily disabled" error before any SDK import attempt (avoids cached-stale-package surprises). - tests/tools/: update three test files to assert the new disabled behavior. Each test docstring records why and points at the rollback trigger (PyPI un-quarantines mistralai). Restore plan: revert this commit once the package is available on PyPI again. The behavior change is intentional and documented in code comments + test docstrings to make the rollback trivial. Validation: - scripts/run_tests.sh tests/tools/ -k 'mistral or stt or tts' → 425/425 passing. Refs: https://pypi.org/simple/mistralai/ (currently "pypi:project-status: quarantined").	2026-05-11 23:02:15 -07:00
Siddharth Balyan	271883447e	feat: expose HERMES_SESSION_ID to agent tools via ContextVar + env (#23847 ) Set HERMES_SESSION_ID using the existing session_context.py ContextVar system for concurrency safety (multiple gateway sessions in one process won't cross-talk). Also writes os.environ as fallback for CLI mode. Touchpoints: - gateway/session_context.py: Add _SESSION_ID ContextVar + _VAR_MAP entry - run_agent.py: Set both ContextVar and os.environ at init and on context-compression rotation - tools/environments/local.py: Bridge ContextVars into subprocess env in _make_run_env() (ContextVars don't propagate to child processes) - tests/run_agent/test_session_id_env.py: 3 tests covering env, provided ID, and ContextVar paths execute_code subprocess already passes HERMES_* prefixed vars through _scrub_child_env (line 82: _SAFE_ENV_PREFIXES includes 'HERMES_'). Primary use case: webhook-triggered agents that need to include a `--resume <session_id>` takeover command in their output.	2026-05-12 00:16:45 +05:30
kshitij	ce0f529cde	chore: ruff auto-fix C401, C416, C408, PLR1722 (#23940 ) C401: set(x for x in y) -> {x for x in y} (set comprehension) C416: [(k,v) for k,v in d] -> list(d.items()) (unnecessary listcomp) C408: tuple()/dict() -> ()/{} (unnecessary collection call) PLR1722: exit() -> sys.exit() (adds import sys where needed) 21 instances fixed, 0 remaining. 19 files, +40/-36.	2026-05-11 11:20:58 -07:00
kshitij	2ec8d2b42f	chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937 ) Replace with for all literal-tuple membership tests. Set lookup is O(1) vs O(n) for tuple — consistent micro-optimization across the codebase. 608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining. 133 files, +626/-626 (net zero).	2026-05-11 11:13:25 -07:00
kshitij	657874460f	chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926 ) PLR5501 (collapsible-else-if): 28 instances — else: if: → elif: PLR1730 (if-stmt-min-max): 15 instances — if x<y: x=y → x=max(x,y) C420 (dict.fromkeys): 2 instances — dictcomp → dict.fromkeys PLR1704 (redefined-argument): 1 instance — reason → err_msg (shadow fix) C414 (unnecessary-list): 1 instance — sorted(list(x)) → sorted(x) 28 files, -44 net lines. All mechanical, zero logic changes. 17,211 tests pass, zero regressions.	2026-05-11 11:03:29 -07:00
fr33d3m0n	976d8e27ad	fix(approval): catch sudo with stdin/askpass/shell privilege flags Adds the only #17873 category not covered by the in-flight PRs #17962 (briandevans, reverse shell + download-execute) and #7993 (SHL0MS, credential reads + curl/wget exfiltration): sudo invocations that an LLM-driven agent can drive without TTY interaction. The agent has no TTY, so the sudo forms that succeed without human involvement are those reading the password from stdin (`-S` / `--stdin`) or via an askpass helper (`-A` / `--askpass`). The shell-launch (`-s`) and list-privileges (`-a`) flags are also gated since they are privilege-relevant invocations the agent can chain after acquiring the password (e.g. read SUDO_PASSWORD from .env -> sudo -S -s -> root shell). Plain `sudo cmd` (no flag) is TTY-bound and excluded. Two patterns: 1. Direct flag: `\bsudo\b[^;\|&\n]?\s+(?:-s\b\|--stdin\b\|-a\b\|--askpass\b)` The lazy `[^;\|&\n]?` consumes flag-arguments without spanning command separators, so `sudo -u root -S whoami` matches (a textbook offensive form that a strict `(?:\s+-[^\s]+)` "leading flags only" pattern would have missed because `root` is a flag-value not a flag). 2. Combined short flags: `\bsudo\b[^;\|&\n]?\s+-[a-z][sa][a-z]\b` Catches packed forms like `sudo -nS id` where multiple flags share a single `-X` token. `_normalize_command_for_detection` lowercases input before pattern matching (tools/approval.py:340), so case variants of S/s and A/a collapse — both letter-pairs are gated since each is a privilege- relevant invocation. Tests: 21 new cases in TestDetectSudoStdin (12 positive covering all flag-order permutations including herestring source and printf-piped forms; 9 negative including TTY-bound `sudo whoami`, interactive `sudo -i`, env-var reference `$SUDO_USER`, doc lookup `man sudo`, package install, and the `pseudosudo` word-boundary edge case). Empirical coverage: 11/11 attacks matched, 0/10 false positives. Refs: #17873 category 4. Adjacent: #17962 (reverse shell + download- execute), #7993 (credential reads + curl/wget exfiltration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 06:56:30 -07:00
OpenClaw Agent	9520a1ccdf	fix(terminal): block sudo -S password guessing when SUDO_PASSWORD is not set Fixes #9590: Block explicit sudo -S (stdin password mode) commands when the SUDO_PASSWORD environment variable is not configured. The attack vector: the LLM constructs 'echo guessedpass \| sudo -S cmd' to brute-force sudo passwords, iterates based on sudo's error output ('Sorry, try again'). The existing _transform_sudo_command only injects -S when SUDO_PASSWORD exists; without it, the LLM's explicit sudo -S must be treated as a guessing attempt. Changes: - Add _check_sudo_stdin_guard() in approval.py: detects sudo -S when SUDO_PASSWORD is absent, anchored to command-start positions (^ ; && \|\| \| etc.) to avoid false positives on literal text - Integrate into check_all_command_guards() above yolo/mode=off so the block is unconditional (like the hardline floor) - Add 6 tests covering: detection, allow-list, SUDO_PASSWORD bypass, integration with check_all_command_guards, yolo non-bypass, container backend bypass	2026-05-11 06:56:30 -07:00
Dominikh	8ac998cb0c	fix(send_message): allow kanban workers to call send_message The kanban dispatcher sets HERMES_KANBAN_TASK on every spawned worker but launches it with the assignee profile's HERMES_HOME (e.g. ~/.hermes/profiles/<name>/), which has no gateway.pid file. The existing _check_send_message therefore returned False from the is_gateway_running() fallback, even though the parent gateway is alive and reachable. Net effect: workers could call kanban_* tools (gated on HERMES_KANBAN_TASK in _check_kanban_mode) but not send_message. This breaks the natural pattern of "worker does the job, calls send_message to deliver rich content to the originating chat, then calls kanban_complete with a one-line summary" because the kanban notifier's payload_summary is hard-truncated to the first line (~200 chars) at gateway/run.py:3963 — anything richer has to ship via send_message. Honoring HERMES_KANBAN_TASK in _check_send_message — symmetric with _check_kanban_mode in kanban_tools.py:42 — closes the gap. No new state, no new env var, no profile-config changes required.	2026-05-11 06:44:58 -07:00
Mibayy	ebf2ea584a	feat(terminal,cli): docker_extra_args + display.timestamps Two independent opt-in QoL toggles, both off by default. terminal.docker_extra_args: - List of extra flags appended verbatim to docker run after security defaults. Useful for adding capabilities (e.g. --cap-add SETUID) or other docker run options not exposed by existing config keys. - Non-string entries are logged and skipped. - Also available via TERMINAL_DOCKER_EXTRA_ARGS='[...]' env var. display.timestamps: - Appends [HH:MM] to user input bullet and the assistant response box header. Single hub in _format_submitted_user_message_preview() covers both single-line and multi-line user previews; assistant response label gets the timestamp at box-open time. Closes #1569 (timestamps). Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>	2026-05-10 22:43:39 -07:00
konsisumer	62cfe79e93	fix(tools): clarify kanban_complete phantom-card retry guidance When kanban_complete rejects a created_cards list as hallucinated, the task is intentionally left in-flight (the gate runs before the write txn) so the worker can retry with a corrected list or pass created_cards=[] to skip the check. The retry path already worked, but the previous error wording read like a terminal failure and workers were observed abandoning the run instead of trying again. Spell out the recovery path explicitly in the tool_error response ("Your task is still in-flight ... Retry kanban_complete with ...") and add regression coverage at both the kernel and tool layers so the retry contract — and the wording the worker depends on to discover it — is pinned. Fixes #22923	2026-05-10 16:14:43 -07:00
LeonSGP43	673418dfa1	fix(kanban): reject toolset names in task skills	2026-05-10 08:41:28 -07:00
Teknium	d4b26df897	perf(browser): route browser_console eval through supervisor's persistent CDP WS (180x faster) (#23226 ) Adds CDPSupervisor.evaluate_runtime() and wires it into _browser_eval as a fast path when a supervisor is alive for the current task_id. Replaces the ~180ms agent-browser subprocess fork+exec+Node-startup hop with a ~1ms Runtime.evaluate over the supervisor's already-connected WebSocket. Falls through to the existing agent-browser CLI path when no supervisor is running (e.g. backends without CDP, or before the first browser_navigate attaches one), so behaviour is unchanged where it can't apply. JS-side exceptions surface directly without falling through to the subprocess (the subprocess would just re-raise the same error, slower); supervisor-side failures (loop down, no session) fall through cleanly. Benchmark — 30 iterations of `1 + 1` against headless Chrome: supervisor WS mean= 0.96ms median= 0.91ms agent-browser subprocess mean=179.35ms median=167.73ms → 187x speedup mean Tests: 14 unit tests (mocked supervisor + response-shape coverage), 5 real-Chrome e2e tests in test_browser_supervisor.py (gated on Chrome being installed). Browser test suite: 355 passed, 1 skipped.	2026-05-10 07:37:55 -07:00
Teknium	2704e7b67e	fix(kanban): restrict board routing tools to orchestrators Adapted from PR #20568 commit `ce3518578` (Eric Litovsky / @kallidean). Adds two-tier gating for the kanban tool surface so dispatcher-spawned workers see only task-lifecycle tools (show/complete/block/heartbeat/ comment/create/link) while orchestrator profiles with `toolsets: [kanban]` also see board-routing tools (kanban_list, kanban_unblock). Workers shouldn't be enumerating or unblocking the board — they should close their own task via the lifecycle tools. Hiding board-routing tools from worker schemas keeps the worker focused and the toolset-isolation contract honest. Plus inherited from the same upstream commit: - 50/200 row bound on kanban_list with `truncated` + `next_limit` metadata. - Belt-and-suspenders runtime guard `_require_orchestrator_tool()` inside the orchestrator handlers in case a stale registration ever routes a worker to one of them. - Tests for the new gate, the stricter bound, and the fact that even a worker with `toolsets: [kanban]` in config still doesn't see board routing. Co-authored-by: Eric Litovsky <elitovsky@zenproject.net>	2026-05-10 05:58:44 -07:00
Eric Litovsky	50d281495e	fix(kanban): parse triage flag explicitly	2026-05-10 05:58:44 -07:00
Eric Litovsky	26bf45f8c5	fix(kanban): parse include_archived explicitly	2026-05-10 05:58:44 -07:00
Eric Litovsky	236cbe16b6	feat(kanban): add orchestrator board tools	2026-05-10 05:58:44 -07:00
Teknium	3800972dd0	feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955 ) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).	2026-05-09 21:06:19 -07:00
Teknium	08ec602770	fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913 ) Linux's MAX_ARG_STRLEN caps any single argv element at 128 KB (32 * PAGE_SIZE). The previous heredoc-in-the-command-string approach in _write_to_sandbox put the entire tool result inside the 'bash -c' arg, so any result over ~128 KB raised OSError [Errno 7] 'Argument list too long' before the heredoc ever ran. The caller logged a warning, but quiet_mode (CLI default) sets tools.* to ERROR — so the warning never reached agent.log either, and the agent saw a 1.5 KB preview tagged 'Full output could not be saved to sandbox'. Hits delegate_task with 3+ subagent outputs routinely now. Switch to passing content via env.execute(stdin_data=...). cmd is now just 'mkdir -p X && cat > Y' (under 1 KB), and the heavyweight payload travels through stdin where there is no argv-element limit. E2E reproduced the user's exact 144,778-char delegate_task envelope: old code OSError'd, new code round-trips cleanly to disk with all three task summaries intact.	2026-05-09 18:44:58 -07:00
Wesley Simplicio	116a1446a4	fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV Problem: terminal.docker_env set in config.yaml was silently ignored. Docker containers never received the user-specified env vars. Root cause: docker_env was missing from all three config→env bridging maps (cli.py env_mappings, gateway/run.py _terminal_env_map, hermes_cli/config.py _config_to_env_sync) and from the terminal_tool _get_env_config() reader. _create_environment() consumed the key from container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV was never set. Also extend the list-serialisation branches in cli.py and gateway/run.py to handle dict values via json.dumps (lists already used json.dumps; plain str() on a dict produces undecodable output). Fix: - cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings; serialise dict values with json.dumps alongside existing list path - gateway/run.py: same additions to _terminal_env_map and serialisation - hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV" to _config_to_env_sync so `hermes config set terminal.docker_env …` persists to .env correctly - tools/terminal_tool.py: add docker_env key to _get_env_config() reading TERMINAL_DOCKER_ENV via _parse_env_var with default "{}" Tests: add test_docker_env_is_bridged_everywhere to tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on origin/main, passes with fix. Fixes #20537	2026-05-09 17:53:35 -07:00
Wesley Simplicio	53ec32819c	fix(process_registry): kill orphaned Popen on post-spawn setup failure After Popen succeeds with os.setsid (detached process group), 5 things happen with no try/except: Thread construction, reader.start(), lock acquisition, prune+register, checkpoint write. If any raises, the Popen object goes unregistered and the detached process group leaks indefinitely. Wrap the post-spawn setup in try/except. On failure: - os.killpg(getpgid(pid), SIGKILL) takes down the entire process group (not just the shell - important because of detached PG + -lic shell wrapper that may have spawned children) - proc.kill() fallback for ProcessLookupError/PermissionError/OSError - proc.wait(timeout=5) reaps with a bound - re-raise to preserve original traceback Nested try/except around cleanup so a secondary failure can't mask the original. Closes #2749.	2026-05-09 17:53:24 -07:00
Wesley Simplicio	2245879af0	fix(checkpoint): guard _touch_project against non-dict project metadata Problem ======= `tools.checkpoint_manager._touch_project` reads the project metadata file with `json.loads(meta_path.read_text(...))`, then immediately does: meta["workdir"] = str(_normalize_path(working_dir)) The `except` block only catches `(OSError, ValueError)`. When the file parses successfully but returns a non-dict value (a list `[]`, `null`, or a scalar from a corrupted or hand-truncated write), `json.loads` succeeds without error and `meta` is set to, e.g., `[]`. The subsequent subscript assignment then raises `TypeError: list indices must be integers or slices, not str`, which is NOT caught by the narrow except clause. This TypeError propagates up through `_take` to `ensure_checkpoint`, where the broad `except Exception` safety net swallows it. The effect is that `ensure_checkpoint` silently returns False for the entire session — all checkpoints are skipped for the affected working directory without any user-visible error. Root cause ========== Missing `isinstance(meta, dict)` guard after `json.loads`, identical in pattern to bugs fixed in `cron/jobs.py` (#22569) and `tools/process_registry.py` (#22544). The same guard is already present one function below in `_list_projects` (line 506), but was inadvertently omitted in `_touch_project`. Fix === Add two lines after the try/except: ```python if not isinstance(meta, dict): meta = {} ``` This matches the existing guard in `_list_projects` and ensures a fresh empty dict is used whenever the persisted value is not a mapping — preserving the `created_at` semantics via `setdefault` on the next line. Tests ===== `TestTouchProjectMalformedMeta` covers four non-dict root values (`[]`, `null`, `42`, `"oops"`). Each writes a corrupted metadata file, calls `_touch_project`, and asserts: (a) no exception raised, (b) the metadata file is rewritten as a valid dict containing `last_touch` and `workdir`. All four fail on main with `TypeError`, pass with fix. Full `tests/tools/test_checkpoint_manager.py` regression: 77 passed.	2026-05-09 17:53:13 -07:00
heathley	0c5c4d1b8d	fix(skills-hub): cover remaining SSRF fetch paths after #10029	2026-05-09 17:52:12 -07:00
Teknium	c1cc3d4ea6	perf(image_gen): defer fal_client import to first generation request (#22859 ) `tools/image_generation_tool.py` did `import fal_client` at module top, which pulled the entire fal_client + httpx + rich stack on every process that ran `discover_builtin_tools()` — every `hermes` cold start, even ones that never touch image generation. Make the import lazy: replace the eager import with a placeholder (`fal_client: Any = None`) and add an idempotent `_load_fal_client()` that rebinds the module global on first use. Call it from the two runtime entry points (`_ManagedFalSyncClient.__init__` and `_submit_fal_request`) and from the SDK-presence check in `check_image_generation_requirements`. The loader short-circuits if the global is already truthy, which preserves the test pattern of monkeypatching `fal_client` to install a mock — the `monkeypatch.setattr(image_tool, "fal_client", ...)` calls in test_image_generation.py keep working unchanged. Measured impact (15-run min times, 9950X3D): tools.image_generation_tool alone: 77 → 20 ms (-74%) 36 → 20 MB (-44%) import cli (full): 734 → 720 ms (-2%) import model_tools: 372 → 366 ms (-2%) The microbench is dramatic but the full-CLI win is small — fal_client shares its httpx + rich dependencies with the rest of the agent, so on a real cold start most of the 16 MB / 64 ms is already paid by other imports. The win matters mostly for processes that touch this tool without otherwise loading httpx (rare) and for architectural consistency with the previous lazy-load PRs (#22681 google_chat, #22831 teams). Tests: 55/55 `tests/tools/test_image_generation.py` pass, including the cases that monkeypatch the module global to install a mock fal_client. End-to-end verification confirms `import model_tools` no longer pulls `fal_client` into `sys.modules`.	2026-05-09 17:45:09 -07:00
Teknium	c7f0aab949	feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838 ) Pick openrouter/pareto-code as your model and OpenRouter auto-routes each request to the cheapest model meeting your coding-quality bar (ranked by Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0, default 0.65) tunes the floor. - hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so it shows up in the picker with a description - hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands on a mid-tier coder on the current Pareto frontier) - plugins/model-providers/openrouter: emit extra_body.plugins = [{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code AND the score is a valid float in [0.0, 1.0] - agent/transports/chat_completions.py: same emission on the legacy flag path (when no provider profile is loaded) - run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into both build_kwargs() invocations and the context-summary extra_body path - cli.py: read openrouter.min_coding_score once at init, validate float in [0,1], pass to AIAgent constructions (CLI + background-task paths) - cron/scheduler.py, batch_runner.py, tools/delegate_tool.py, tui_gateway/server.py: propagate the kwarg (mirrors providers_order plumbing — subagents inherit, cron/batch read from config) - tests: profile-level + transport-level coverage of the model gating, unset/empty/out-of-range handling, and the legacy flag path - docs: new 'OpenRouter Pareto Code Router' section in providers.md Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a mid-tier coder, at omission we get the strongest. Score is silently dropped on any model other than openrouter/pareto-code, so it's safe to leave set.	2026-05-09 14:47:00 -07:00
Henkey	b349ae1e4c	fix(acp): honor task cwd for foreground terminal commands	2026-05-09 14:46:34 -07:00
HenkDz	840ebe063e	fix: make session search initialize session db	2026-05-09 14:36:58 -07:00
Wesley Simplicio	ca13993217	fix(delegate): add explicit do-not-use guidance to acp_command/acp_args schema (carve-out of #22680 ) acp_command / acp_args descriptions previously primed the model to populate them — "Per-task ACP command override (e.g. 'copilot')" — even when no ACP CLI was installed. Models with weaker schema-following discipline would set them and the spawn would fail. Add explicit "Do NOT set unless the user has explicitly told you" guidance at both the top-level acp_command and the per-task override. Strengthen acp_args to mention it's empty unless acp_command is set. Adds 2 tests pinning the descriptions. Note: this is a cosmetic prompt-engineering fix — the params remain exposed in the schema. The fully-correct fix is to gate them behind a config flag or runtime ACP-CLI detection so the schema only emits them when an ACP harness is available. Tracked as a follow-up; this PR ships the low-cost stopgap. Salvage of #22680 (delegate schema only). The original PR also bundled unrelated fixes for #22548, #21944, #22150 — those need separate PRs since #22548 and #21944 are already addressed on main (#22780 + #22798 in flight) and #22150 deserves its own review. Closes #22013.	2026-05-09 13:37:30 -07:00
Wesley Simplicio	48bf0ea249	fix(browser_tool): fall through to autodetect on config read failure	2026-05-09 13:35:39 -07:00
Wesley Simplicio	3170c8d448	fix(browser_tool): do not cache transient None cloud provider resolution Problem: `_get_cloud_provider()` set `_cloud_provider_resolved = True` before resolution. If credentials were briefly unavailable on the first call (e.g. a managed Nous Portal token mid-refresh), the resolver pinned the entire process to local mode forever, even after credentials self-healed seconds later. Root cause: bookkeeping was set up-front, so any code path that fell through to `return _cached_cloud_provider` (config read failure, no credentials yet, explicit-provider instantiation failure) committed the transient `None` to the cache permanently. Fix: invert the bookkeeping. `_cloud_provider_resolved = True` is now set only when (a) the user explicitly chose `cloud_provider: local`, or (b) a provider was successfully resolved. All transient `None` paths return without poisoning the cache, so the next call retries. Explicit provider instantiation failures now log at warning level with stack trace so operators can diagnose them. Tests: 5 new cases in tests/tools/test_browser_cloud_provider_cache.py covering explicit local, successful resolution, no-credentials-yet, config read failure, and explicit provider instantiation failure. Stash-verify confirmed the 3 transient-None tests fail without the fix. All 320 existing browser tests still green. Closes #22324	2026-05-09 13:35:39 -07:00
Teknium	8f711f79a4	fix(tools): install cua-driver when Computer Use is enabled via 'hermes tools' (#22765 ) Returning users who enabled '🖱️ Computer Use (macOS)' via 'hermes tools' saw '✓ Saved configuration' but no install — cua-driver was never on PATH and the toolset failed at first use. Two compounding causes: 1. _toolset_needs_configuration_prompt fell through to _toolset_has_keys, which returned True for any provider with empty env_vars. cua-driver has no env vars, so the gate skipped _configure_toolset entirely and _run_post_setup('cua_driver') never ran. 2. No stable CLI entry-point existed for re-running the install when the picker no-op'd it (e.g. when toggling the toolset off+on inside one picker session, where 'added' is empty). Changes: - hermes_cli/tools_config.py: add _POST_SETUP_INSTALLED registry mapping post_setup keys to installed-state predicates. The gate now returns True when any visible provider has a registered post_setup whose predicate fails. cua_driver is the only opt-in for now; other post_setup hooks keep their existing behaviour. - hermes_cli/main.py: add 'hermes computer-use install' and 'hermes computer-use status' as a stable docs target. install reuses the same _run_post_setup('cua_driver') path that the picker invokes; status reports whether cua-driver is on PATH. - tools/computer_use/cua_backend.py: install hint now points users at 'hermes computer-use install' first. - website/docs/user-guide/features/computer-use.md: document the new command as the primary install path. - website/docs/reference/cli-commands.md: catalog 'hermes computer-use' alongside 'hermes tools'. - tests/hermes_cli/test_post_setup_gating.py: regression coverage for the gate predicate (missing -> setup forced, installed -> setup skipped, broken predicate -> non-blocking, unregistered keys -> behaviour unchanged). Fixes #22737. Reported by @f-trycua.	2026-05-09 13:02:25 -07:00
Teknium	b6ff96c057	fix(cron): allow quoted URL in github auth-header allowlist The github-pr-workflow skill wraps the URL in double-quotes ('curl -H ... "https://api.github.com/..."'), which the original allowlist regex (\s+https://api...) did not match. Without this, the bundled github-pr-workflow skill is still blocked at every cron tick despite #22605's fix landing for the bare-URL form. Make the leading quote optional and add a regression test pinning both single- and double-quoted forms.	2026-05-09 11:11:45 -07:00
qWaitCrypto	691778a08b	fix(cron): keep auth-header exfiltration blocked	2026-05-09 11:11:45 -07:00
qWaitCrypto	783d11717a	fix(cron): avoid github skill false positives in scanner	2026-05-09 11:11:45 -07:00
Teknium	1f4200debf	feat(delegate): show user's actual concurrency / spawn-depth limits in tool description (#22694 ) The delegate_task tool description hardcoded 'default 3' / 'default 2' for max_concurrent_children / max_spawn_depth, which misled the model on any install that raised these limits — the schema text said 'default 3' even when the user had set max_concurrent_children=15 / max_spawn_depth=3, so the model would self-cap at 3 and never use the headroom. Make the description dynamic. ToolEntry gains an optional dynamic_schema_overrides callable; registry.get_definitions() merges its output on top of the static schema before returning it. delegate_tool registers a builder that reads the current delegation.* config and emits: - 'up to N items concurrently for this user' (N = max_concurrent_children) - 'Nested delegation IS enabled / OFF for this user (max_spawn_depth=N)' - 'orchestrator children can themselves delegate up to M more level(s)' - 'orchestrator_enabled=false' when the kill switch is set The model_tools cache key already includes config.yaml mtime+size, so edits to delegation.* in config invalidate the cached tool definitions without an explicit hook. CLI_CONFIG staleness within a process is a pre-existing limitation of _load_config and out of scope here. Static description / tasks.description / role.description in DELEGATE_TASK_SCHEMA are placeholders so module import doesn't trigger cli.CLI_CONFIG load before the test conftest can redirect HERMES_HOME.	2026-05-09 11:07:53 -07:00
GodsBoy	93e25ceb13	feat(plugins): add standalone_sender_fn for out-of-process cron delivery Plugin platforms (IRC, Teams, Google Chat) currently fail with `No live adapter for platform '<name>'` when a `deliver=<plugin>` cron job runs in a separate process from the gateway, even though the platforms are eligible cron targets via `cron_deliver_env_var` (added in #21306). Built-in platforms (Telegram, Discord, Slack, etc.) use direct REST helpers in `tools/send_message_tool.py` so cron can deliver without holding the gateway in the same process; plugin platforms historically depended on `_gateway_runner_ref()` which returns `None` out of process. This change adds an optional `standalone_sender_fn` field to `PlatformEntry` so plugins can register an ephemeral send path that opens its own connection, sends, and closes without needing the live adapter. The dispatch site in `_send_via_adapter` falls through to the hook when the gateway runner is unavailable, with a descriptive error when neither path applies. The hook is optional, so existing plugins are unaffected. Reference migrations land in the same change for IRC, Teams, and Google Chat, exercising the hook across stdlib (asyncio + IRC protocol), Bot Framework OAuth client_credentials, and Google service-account flows respectively. Security hardening on the new code paths: * IRC: control-character stripping on chat_id and message body to block CRLF command injection; bounded nick-collision retries; JOIN before PRIVMSG so channels with the default `+n` mode accept the delivery. * Teams: TEAMS_SERVICE_URL validated against an allowlist of known Bot Framework hosts (`smba.trafficmanager.net`, `smba.infra.gov.teams.microsoft.us`) to block SSRF; chat_id and tenant_id constrained to the documented Bot Framework character set; per-request timeouts so a slow STS endpoint cannot starve the activity POST. * Google Chat: chat_id and thread_id validated against strict resource-name regexes; service-account refresh wrapped in `asyncio.wait_for` so a hung token endpoint cannot stall the scheduler. Test coverage: 20 new tests covering happy path, missing-config errors, network failure modes, and each defensive validation. Existing tests unchanged. `bash scripts/run_tests.sh tests/tools/test_send_message_tool.py tests/gateway/test_irc_adapter.py tests/gateway/test_teams.py tests/gateway/test_google_chat.py` reports 341 passed, 0 regressions. Documentation: new "Out-of-process cron delivery" section in website/docs/developer-guide/adding-platform-adapters.md and an entry in gateway/platforms/ADDING_A_PLATFORM.md naming the hook.	2026-05-09 02:56:29 -07:00
Zhekinmaksim	4a1840e683	fix(async): replace get_event_loop() with get_running_loop() in async contexts Follow-up to PR #21293 (cli.py), which fixed the same anti-pattern. `asyncio.get_event_loop()` is documented as effectively "always returns the running loop when called from a coroutine" and emits DeprecationWarning/RuntimeWarning in some interpreter configurations. The Python docs explicitly recommend get_running_loop() inside coroutines. Replaces the remaining 9 call sites that are unconditionally inside async def bodies: - tools/browser_cdp_tool.py — _cdp_call() (4 sites): deadline + remaining computations inside the async websockets.connect context manager. - hermes_cli/web_server.py — get_status, _start_device_code_flow, submit_oauth_code (3 sites): all FastAPI async endpoints offloading blocking httpx / PKCE work to run_in_executor. - environments/agent_loop.py — HermesAgentLoop (1 site): tool dispatch inside the async rollout loop. - environments/benchmarks/terminalbench_2/terminalbench2_env.py — rollout_and_score_eval (1 site): test verification thread offload. All 9 sites are unconditionally inside async def bodies, so a running loop is guaranteed and no try/except RuntimeError fallback is needed (unlike the cli.py case in #21293, which ran from a background thread). Behavior is identical on supported Python versions; aligns the codebase with the post-#21293 idiom and avoids future warnings as the deprecation hardens. Salvaged from PR #21930 by @Zhekinmaksim onto current main (the original branch was 109 commits behind and carried unintended stale-branch reverts of unrelated landed changes — _tail_lines encoding=utf-8 and the Windows PTY bridge guard). Only the 9 swaps from the PR's intended scope are applied here.	2026-05-09 02:34:19 -07:00
memosr	9bbad3cc10	fix(security): drop caller-controlled author override in kanban_comment Comments are injected into the next worker's system prompt by build_worker_context() as '{author} (timestamp): {body}'. The previous code accepted args['author'] as a free-form override and exposed it on KANBAN_COMMENT_SCHEMA, which let a worker: 1. Receive a prompt-injection in a malicious task body. 2. Call kanban_comment with author='hermes-system' (or any other authoritative-looking name) on a sibling task. 3. The next worker assigned to that sibling task sees the forged comment in its boot context as what reads like a system-authored directive. Always derive author from HERMES_PROFILE (the dispatcher already sets this per worker at hermes_cli/kanban_db.py:3718), and remove the 'author' property from the tool schema so the LLM can't see the override surface. Cross-task commenting itself remains unrestricted (see #19713) — comments are the deliberate handoff channel between tasks; only the author-override surface is closed. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-05-09 02:32:16 -07:00
Bartok	326ca754ad	fix(delegate): accept JSON string batch tasks Recover delegate_task batch inputs when open-weight models emit tasks as a JSON-encoded array string, and return clear errors for malformed task lists. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-09 02:18:57 -07:00
kshitij	2a7047c2ed	fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043 ) SQLite's WAL mode requires shared-memory (mmap) coordination and fcntl byte-range locks that don't reliably work on network filesystems. Upstream documents this explicitly: https://www.sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode On NFS / SMB / some FUSE mounts / WSL1, 'PRAGMA journal_mode=WAL' raises 'sqlite3.OperationalError: locking protocol' (SQLITE_PROTOCOL). Before this change, every feature backed by state.db or kanban.db broke silently: - /resume, /title, /history, /branch returned 'Session database not available.' with no cause - gateway logged the init failure at DEBUG (invisible in errors.log) - kanban dispatcher crashed every 60s, driving the known migration race (duplicate column name: consecutive_failures, #21708 / #21374) Changes: - hermes_state.apply_wal_with_fallback(): shared helper that tries WAL and falls back to DELETE on SQLITE_PROTOCOL-style errors with one WARNING explaining why - hermes_state.get_last_init_error() + format_session_db_unavailable(): capture the init failure cause and surface it in user-facing strings (with an NFS/SMB pointer for 'locking protocol') - hermes_cli/kanban_db.connect(): use the shared helper - gateway/run.py: bump SessionDB init failure log DEBUG -> WARNING (matches cli.py's existing correct behavior) - cli.py (4 sites) + gateway/run.py (5 sites): replace bare 'Session database not available.' with format_session_db_unavailable() Tests: 12 new tests in tests/test_hermes_state_wal_fallback.py + 1 new test in tests/hermes_cli/test_kanban_db.py. Existing suites (state, kanban, gateway, cli) remain green for all tests unrelated to pre-existing failures on main. Evidence: real-world user on NFSv3 mount (172.26.224.200:d2dfac12/home, local_lock=none) reporting 'Session database not available.' on /resume; 'locking protocol' appears in 4 distinct log entries across backup, kanban, TUI, and CLI paths in the same session. closes #22032	2026-05-09 02:09:35 -07:00
kshitij	ae005ec588	fix(send_message): map Telegram General topic id to None for forum groups (#22423 ) Telegram forum supergroups address the General topic as `message_thread_id="1"` on incoming updates, but the Bot API rejects sends with `message_thread_id=1` ("Message thread not found"). The gateway adapter has a `_message_thread_id_for_send` helper that maps "1" to None for that reason; the standalone `_send_telegram` helper used by the `send_message` tool never got the same mapping, so any `send_message` call to a Topics-enabled group's General topic (target shape `telegram:<chat_id>:1`) failed with "Message thread not found." Reuse the adapter's helper when available, with an explicit fallback to the same mapping for environments where the adapter import path fails (e.g. python-telegram-bot missing in this venv). Fixes #22267	2026-05-09 01:58:33 -07:00
helix4u	e407376c50	fix(cron): normalize partial job records	2026-05-09 01:11:41 -07:00
briandevans	3adcc64419	fix(patch-tool): advertise per-mode required params in schema descriptions Models that enforce required-only constraints (e.g. kimi-k2.x) were omitting old_string/new_string for replace mode and patch for patch mode because the schema only declared required: ["mode"]. Add explicit "REQUIRED when mode='X'" markers to each conditionally-required property description and a top-level "REQUIRED PARAMETERS: ..." summary for each mode. Avoids anyOf/oneOf which break Anthropic, Fireworks, and Kimi/Moonshot providers. Add TestPatchSchemaShape to lock the shape. Fixes #15524 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-08 16:59:24 -07:00
Teknium	0ec052ca24	perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138 ) Interactive `hermes` launch drops from ~21s to ~2.5s. Three independent fixes, each targets a distinct hot spot in the banner / tool-registration path that fires on every CLI invocation. 1. `get_external_skills_dirs()` in-process mtime cache (~10s saved) The function re-read + YAML-parsed the full ~/.hermes/config.yaml on every call. Banner build invokes it once per skill to resolve the category column, which on a 120-skill install meant ~120 reparses of a 15 KB config (~85 ms each). Added a `(config_path, mtime_ns) -> list[Path]` memo; stat() is ~2 us vs ~85 ms for the parse. Edits to config.yaml invalidate the cache on the next call via mtime. 2. Feishu availability probe uses `importlib.util.find_spec` (~5.2s saved) `tools/feishu_doc_tool.py::_check_feishu` and the identical helper in `feishu_drive_tool.py` were calling `import lark_oapi` purely to detect whether the SDK was installed. Executing the real import pulls in websockets + dispatcher + every v2 API model — ~5 seconds of work that fires at every tool-registry bootstrap. `find_spec` answers the same question ("is lark_oapi importable?") without executing the module. The actual tool handlers still do the real import on invoke, so runtime behavior is unchanged. 3. `_web_requires_env` no longer triggers Nous portal refresh (~800ms saved) `tools/web_tools.py::_web_requires_env` used `managed_nous_tools_enabled()` to gate four gateway env-var names in the returned list. The gate called `get_nous_auth_status()` -> `resolve_nous_runtime_credentials()` -> live HTTP POST to the portal on every tool-registry bootstrap. But the list is pure metadata — if the env var is set at runtime, the tool lights up; otherwise it doesn't. Including the four names unconditionally is harmless for unsubscribed users (vars just aren't set) and eliminates the sync HTTP round trip from startup. Test: - tests/agent/test_external_skills_dirs_cache.py (new, 6 cases): returns config'd dir, caches on second call (yaml_load patched to raise — never invoked), invalidates on mtime bump, empty when config missing, returned list is a defensive copy, per-HERMES_HOME cache key isolation. - Existing tests/agent/test_external_skills.py and tests/tools/ continue to pass modulo pre-existing flakes on main (test_delegate, test_send_message — unrelated, pass in isolation). Measured: bare `hermes` (cold → REPL ready) 21,519ms -> 2,618ms on Teknium's install (119 skills, 15 KB config.yaml, Nous auth logged in, lark_oapi installed). 8x faster.	2026-05-08 16:39:32 -07:00
Teknium	cc38282b04	feat(cross-platform): psutil for PID/process management + Windows footgun checker ## Why Hermes supports Linux, macOS, and native Windows, but the codebase grew up POSIX-first and has accumulated patterns that silently break (or worse, silently kill!) on Windows: - `os.kill(pid, 0)` as a liveness probe — on Windows this maps to CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console process group (bpo-14484, open since 2012). - `os.killpg` — doesn't exist on Windows at all (AttributeError). - `os.setsid` / `os.getuid` / `os.geteuid` — same. - `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr errors at runtime on Windows. - `open(path)` / `open(path, "r")` without explicit encoding= — inherits the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX), causing mojibake round-tripping between hosts. - `wmic` — removed from Windows 10 21H1+. This commit does three things: 1. Makes `psutil` a core dependency and migrates critical callsites to it. 2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that blocks new instances of any of the above patterns. 3. Fixes every existing instance in the codebase so the baseline is clean. ## What changed ### 1. psutil as a core dependency (pyproject.toml) Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical cross-platform answer for "is this PID alive" and "kill this process tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on Windows (NOT a signal call), and its `Process.children(recursive=True)` + `.kill()` combo replaces `os.killpg()` portably. ### 2. `gateway/status.py::_pid_exists` Rewrote to call `psutil.pid_exists()` first, falling back to the hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows (and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing — e.g. during the scaffold phase of a fresh install before pip finishes. ### 3. `os.killpg` migration to psutil (7 callsites, 5 files) - `tools/code_execution_tool.py` - `tools/process_registry.py` - `tools/tts_tool.py` - `tools/environments/local.py` (3 sites kept as-is, suppressed with `# windows-footgun: ok` — the pgid semantics psutil can't replicate, and the calls are already Windows-guarded at the outer branch) - `gateway/platforms/whatsapp.py` ### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines) Grep-based checker with 11 rules covering every Windows cross-platform footgun we've hit so far: 1. `os.kill(pid, 0)` — the silent killer 2. `os.setsid` without guard 3. `os.killpg` (recommends psutil) 4. `os.getuid` / `os.geteuid` / `os.getgid` 5. `os.fork` 6. `signal.SIGKILL` 7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT` 8. `subprocess` shebang script invocation 9. `wmic` without `shutil.which` guard 10. Hardcoded `~/Desktop` (OneDrive trap) 11. `asyncio.add_signal_handler` without try/except 12. `open()` without `encoding=` on text mode Features: - Triple-quoted-docstring aware (won't flag prose inside docstrings) - Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments) - Guard-hint aware (skips lines with `hasattr(os, ...)`, `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.) - Inline suppression with `# windows-footgun: ok — <reason>` - `--list` to print all rules with fixes - `--all` / `--diff <ref>` / staged-files (default) modes - Scans 380 files in under 2 seconds ### 5. CI integration A GitHub Actions workflow that runs the checker on every PR and push is staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this commit because the GH token on the push machine lacks `workflow` scope. A maintainer with `workflow` permissions should add it as `.github/workflows/windows-footguns.yml` in a follow-up. Content: ```yaml name: Windows footgun check on: push: branches: [main] pull_request: branches: [main] jobs: check: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-python@v5 with: {python-version: "3.11"} - run: python scripts/check-windows-footguns.py --all ``` ### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion Expanded from 5 to 16 rules, each with message, example, and fix. Recommends psutil as the preferred API for PID / process-tree operations. ### 7. Baseline cleanup (91 → 0 findings) - 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or `encoding='utf-8-sig'` (user-editable files that Notepad may BOM) - 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin tool subprocess management → annotated with `# windows-footgun: ok — <reason>` - 7 `os.killpg` sites → migrated to psutil (see §3 above) ## Verification ``` $ python scripts/check-windows-footguns.py --all ✓ No Windows footguns found (380 file(s) scanned). $ python -c "from gateway.status import _pid_exists; import os > print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))" self: True bogus: False ``` Proof-of-repro that `os.kill(pid, 0)` was actually killing processes before this fix — see commit ``1cbe39914`` and bpo-14484. This commit removes the last hand-rolled ctypes path from the hot liveness-check path and defers to the best-maintained cross-platform answer.	2026-05-08 14:27:40 -07:00
Teknium	324567c936	fix(windows): os.kill(pid, 0) is NOT a no-op on Windows — route through new _pid_exists helper On Windows, Python's ``os.kill(pid, 0)`` is NOT a no-op. CPython's implementation (``Modules/posixmodule.c::os_kill_impl``) treats sig=0 as ``CTRL_C_EVENT`` because the two integer values collide at the C layer, and routes it through ``GenerateConsoleCtrlEvent(0, pid)`` — which sends a Ctrl+C to the ENTIRE console process group containing the target PID, not just the PID itself. Any caller that wanted to check "is PID X alive" via the classic POSIX ``os.kill(pid, 0)`` idiom was silently killing that process (and often unrelated processes in the same console group) on Windows. Long-standing Python Windows quirk; see bpo-14484 (open since 2012). This manifested in Hermes as: every ``hermes gateway status`` invocation would read the gateway's PID from the PID file, call ``os.kill(pid, 0)`` via ``gateway.status.get_running_pid()`` as a "liveness check", and instantly terminate the gateway it was trying to report on. No shutdown log, no traceback, no atexit hook fire, no exit-diag entry — just silent termination of the detached pythonw process. "Bot answered one message then stopped typing" was the characteristic end-user symptom because `os.kill(pid, 0)` fires mid-response-send and kills the gateway between logs. Reproduction (verified in this branch before the fix): $ hermes gateway start # gateway alive, PID 37520 $ hermes gateway status # reports "No gateway process detected" $ tasklist /FI "PID eq 37520" # INFO: No tasks are running # — gateway terminated silently Root-cause fix is a new ``gateway.status._pid_exists(pid)`` helper: - On Windows: Win32 ``OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION \| SYNCHRONIZE, False, pid)`` + ``WaitForSingleObject(handle, 0)`` via ctypes. Zero signal delivery, zero console-group side effects. Pins ctypes return types to avoid DWORD-vs-signed-int parse bugs on WAIT_TIMEOUT (0x102). Distinguishes ERROR_INVALID_PARAMETER (PID gone) from ERROR_ACCESS_DENIED (alive but another user). - On POSIX: the canonical ``os.kill(pid, 0)`` idiom that actually is a no-op there. Then patch every ``os.kill(pid, 0)`` liveness-check callsite to route through ``_pid_exists`` instead. Total 14 callsites across 11 files; every single one was a latent silent-kill on Windows: gateway/run.py:2810 — /restart watcher (inline subprocess) gateway/run.py:15195 — --replace wait loop gateway/status.py:572 — acquire_gateway_runtime_lock stale check gateway/status.py:828 — get_running_pid (THE killer for status) gateway/platforms/whatsapp.py:111 hermes_cli/gateway.py:228, 522, 1012 — gateway-related drain loops hermes_cli/kanban_db.py:2826 — _pid_alive was claiming to be cross-platform but used os.kill(pid, 0) on Windows hermes_cli/main.py:5792 — CLI process-kill polling hermes_cli/profiles.py:782 — profile stop wait loop plugins/google_meet/process_manager.py:74 tools/browser_tool.py:1215, 1255 — browser daemon ownership probes tools/mcp_tool.py:1255, 3374 — MCP stdio orphan tracking The watcher source in gateway/run.py:2810 is a multi-line string that gets spawned as an inline ``python -c "..."`` subprocess, so it can't import gateway.status. The fix for that callsite inlines the same ctypes probe directly into the watcher source. Tested on Windows 10 with the hermes gateway + Telegram bot: - gateway start → alive - 5 consecutive ``hermes gateway status`` invocations → gateway alive after every one, same PID reported each time (37520, 21952) - gateway.log shows uninterrupted operation; no spurious shutdown entries; cron ticker and kanban dispatcher still running on their 60-second cadence - bot continues answering Telegram messages throughout Ships alongside an exit-path diagnostic wrapper in ``hermes_cli/gateway.py::run_gateway()`` that captures every way ``asyncio.run(start_gateway(...))`` can return (success, SystemExit, KeyboardInterrupt, BaseException, atexit) with full traceback to ``logs/gateway-exit-diag.log``. This was used to prove the gateway was being hard-killed externally (no exit event fired) and should be kept for future Windows debugging. Refs: https://bugs.python.org/issue14484 See also: references/windows-subprocess-sigint-storm.md in the hermes-agent skill.	2026-05-08 14:27:40 -07:00
Teknium	0ba1e12abc	fix(windows): browser tool + spurious SIGINT from subprocess spawning Three related Windows-only fixes that together make the browser toolset actually usable on Windows. Symptom chain: user invokes browser_navigate -> tool returns {"success": false, "error": "Daemon process exited during startup with no error output"} and the CLI exits mid-turn with the session summary. Root cause (3 layers): 1. tools/browser_tool.py::_find_agent_browser() resolved node_modules/.bin/agent-browser to the extensionless POSIX shell shim via Path.exists(). On Windows, CreateProcessW cannot execute that script (WinError 193 "not a valid Win32 application"). Fix: delegate to shutil.which with path=node_modules/.bin so PATHEXT picks up agent-browser.CMD on Windows and the extensionless shim stays correct on POSIX. 2. Windows Terminal / Win32 delivers a spurious CTRL_C_EVENT to the parent hermes.exe whenever a background thread spawns a .cmd subprocess. Python 3.11's default SIGINT handler raises KeyboardInterrupt in MainThread, which unwinds prompt_toolkit's app.run() -> cli.py::run()'s finally block calls _run_cleanup() -> _emergency_cleanup_all_sessions -> spawns a concurrent _run_browser_command("close", ...) on the same session the agent thread just opened. Two agent-browser processes race on the same --session name, the daemon startup loses, and the tool returns the "Daemon process exited during startup" error. Fix: install a Windows-only SIGINT handler that absorbs the signal silently. Real user Ctrl+C still routes through prompt_toolkit's own c-c keybinding at the TUI layer, which is how Claude Code handles the same quirk (driving cancellation via the TUI key handler, not signals). 3. In tools/browser_tool.py, both Popen sites now pass creationflags=CREATE_NO_WINDOW \| STARTF_USESTDHANDLES with close_fds=True on Windows. CREATE_NO_WINDOW suppresses the .cmd console flash; STARTF_USESTDHANDLES + close_fds ensures the child inherits only our three chosen handles (DEVNULL stdin, temp-file stdout/stderr) and no leaked parent console handles that could confuse agent-browser's native daemon spawn. Notably we do NOT add CREATE_NEW_PROCESS_GROUP - on Python 3.11 Windows the flag interacts badly with asyncio's ProactorEventLoop and makes things worse. Verified end-to-end on Windows 10 / Windows Terminal / PowerShell: browser_navigate to https://example.com returns {"success": true, "title": "Example Domain"} and the CLI stays alive for follow-up tool calls and assistant turns. Refs: earlier Windows quirks commits `1cebb3bad` (Ctrl+Enter newline), `26f5af52a` (environment hints), `aefd1a37f` (Playwright Chromium).	2026-05-08 14:27:40 -07:00
Teknium	cbce5e93fc	codebase: add encoding='utf-8' to all bare open() calls (PLW1514) Closes the last Python-on-Windows UTF-8 exposure by making every text-mode open() call explicit about its encoding. Before: on Windows, bare open(path, 'r') defaults to the system locale encoding (cp1252 on US-locale installs). That means reading any config/yaml/markdown/json file with non-ASCII content either crashes with UnicodeDecodeError or silently mis-decodes bytes. After: all 89 affected call sites in production code now pass encoding='utf-8' explicitly. Works identically on every platform and every locale, no surprise behavior. Mechanical sweep via: ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix --exclude 'tests,venv,.venv,node_modules,website,optional-skills, skills,tinker-atropos,plugins' . All 89 fixes have the same shape: open(x) or open(x, mode) became open(x, encoding='utf-8') or open(x, mode, encoding='utf-8'). Nothing else changed. Every modified file still parses and the Windows/sandbox test suite is still green (85 passed, 14 skipped, 0 failed across tests/tools/test_code_execution_windows_env.py + tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py + tests/test_hermes_bootstrap.py). Scope notes: - tests/ excluded: test fixtures can use locale encoding intentionally (exercising edge cases). If we want to tighten tests later that's a separate PR. - plugins/ excluded: plugin-specific conventions may differ; plugin authors own their code. - optional-skills/ and skills/ excluded: skill scripts are user-authored and we don't want to mass-edit them. - website/ and tinker-atropos/ excluded: vendored / generated content. 46 files touched, 89 +/- lines (symmetric replacement). No behavior change on POSIX or on Windows when the file is ASCII; bug fix on Windows when the file contains non-ASCII.	2026-05-08 14:27:40 -07:00
Teknium	107de0321d	execute_code: set PYTHONIOENCODING=utf-8 + PYTHONUTF8=1 in child env Third Windows-specific sandbox bug (after WinError 10106 and the UTF-8 file-write bug): user scripts that print non-ASCII to stdout crash with UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position N: character maps to <undefined> Root cause: Python's sys.stdout on Windows is bound to the console code page (cp1252 on US-locale installs) when the process is attached to a pipe without PYTHONIOENCODING set. LLM-generated scripts routinely print em-dashes, arrows, accented chars, and emoji — all of which cp1252 can't encode. Fix: spawn the sandbox child with: PYTHONIOENCODING=utf-8 # sys.stdin/stdout/stderr all UTF-8 PYTHONUTF8=1 # PEP 540 UTF-8 mode — open() defaults to UTF-8 too PYTHONUTF8 is the belt-and-suspenders half: LLM scripts that call open(path, 'w') without encoding= in user code will now produce UTF-8 files by default, matching what the sandbox already does for its own staging files. The parent side already decodes child stdout/stderr as UTF-8 with errors='replace' (lines 1345-1347) so the end-to-end chain is clean. On POSIX these values usually match the locale default already, so setting them is harmless belt-and-suspenders for C/POSIX-locale containers and minimal base images. Tests added (4) — total file now at 28 passed, 1 skipped on Windows: - test_popen_env_sets_pythonioencoding_utf8 (source grep) - test_popen_env_sets_pythonutf8_mode (source grep) - test_live_child_can_print_non_ascii (cross-platform live test) - test_windows_child_without_utf8_env_would_fail (Windows negative control — actually reproduces the bug without our env overrides, proving the fix is load-bearing on this system)	2026-05-08 14:27:40 -07:00

1 2 3 4 5 ...

1303 commits