Commit graph

12599 commits

Author SHA1 Message Date
Brooklyn Nicholson
2dfcead683 feat(computer-use): make the preflight cross-platform (win/linux)
The card was macOS-only. cua-driver also runs on Windows and Linux, so
fold `cua-driver doctor` (cross-platform binary/health probes) into a
single OS-aware `ready` signal:

- macOS: ready == both TCC grants; keeps the permission rows + grant flow.
- Windows/Linux: no TCC toggles, so ready == driver health, with a
  per-OS note (SmartScreen/UIAccess on Windows; X11/XWayland on Linux).

`computer_use_status()` replaces the macOS-only `permissions_status()` and
surfaces `platform`, `ready`, `can_grant`, and the doctor `checks` (non-ok
ones render as warnings). CLI `permissions status`, the REST endpoint, and
the desktop card all key off the one payload. Grant stays macOS-only (400
elsewhere — nothing to grant).
2026-06-22 17:48:43 -05:00
Brooklyn Nicholson
0223ea5f59 feat(computer-use): surface macOS permission preflight in the desktop
Computer Use already worked through the desktop backend (the cua-driver
toolset enables + installs via Settings -> Skills & Tools), but there was
no in-app way to see or grant the two macOS permissions it needs, so "give
a model my Mac" was tribal knowledge.

The grants attach to cua-driver's OWN TCC identity (com.trycua.driver /
the installed CuaDriver.app), not Hermes -- so no app entitlement is
involved. cua-driver 0.5+ exposes `permissions status/grant`, which we wrap:

- tools/computer_use/permissions.py: thin client over the two subcommands
- hermes computer-use permissions {status,grant}: CLI parity
- GET /api/tools/computer-use/status, POST .../permissions/grant: desktop REST
- ComputerUsePanel: live Accessibility + Screen Recording state with a
  Grant button (dialog attributed to CuaDriver), shown in the expanded
  Computer Use toolset row. Binary install stays in the existing provider
  post-setup runner.

Follow-ups: i18n the card copy; a "Stop driver" control (cua-driver stop)
for the runaway-`serve` case.
2026-06-22 17:33:52 -05:00
kshitijk4poor
c080b2dc3e fix(gateway): redact credentials from TUI approval prompts (#48456)
Follow-up to #50767, which redacted the chat-platform (_approval_notify_sync)
and SSE/API (_approval_notify) approval transports. The TUI JSON-RPC transport
is the third egress and was missed: three register_gateway_notify callbacks in
tui_gateway/server.py emitted the raw approval_data — including the unredacted
command Tirith flagged — straight to the TUI client via _emit.

Route all three registrations through a new module-level _emit_approval_request()
helper that redacts payload['command'] via the shared
gateway.run._redact_approval_command seam before emitting, matching the pattern
used for the other two transports. Completes the whole-bug-class fix for #48456.

Tests: assert the helper emits a redacted command (real credential pattern),
handles missing/None command, and a wiring guard that no registration emits the
raw payload directly (only the helper may). Both mutation-checked.

The #48456 fix series originated from @liuhao1024's #48462 — credit to them for
the original report and chat-platform fix; this completes the remaining transport.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-06-23 03:14:18 +05:30
kshitijk4poor
0e69cd4b37 fix(memory): honor configured char limits in the no-agent on-disk store
Follow-up to the /memory approve fresh-store fix. Both the CLI fallback and
the messaging-gateway handler built a bare MemoryStore() with the hardcoded
default char limits (2200/1375), ignoring the user's configured
memory.memory_char_limit / user_char_limit. A live agent honors those
overrides (agent/agent_init.py), so an approval applied without a live agent
could accept a write the user's lower cap would reject, or vice versa.

Extract a shared tools.memory_tool.load_on_disk_store() factory that reads
the configured limits (falling back to defaults if config can't load) and
wire both the CLI and gateway handlers to it, closing the gap on both
surfaces and de-duplicating the construction block.
2026-06-23 03:10:53 +05:30
Max Hsu
3147cbb136 fix(memory): apply /memory approve against a fresh store when no live agent
The CLI /memory slash handler (cli_commands_mixin._handle_memory_command)
passed self.agent._memory_store straight through, which is None when the
command runs without a live agent — e.g. /memory approve from the Desktop
GUI. The shared write-approval handler then returns "memory store
unavailable" and applies nothing, even with built-in memory enabled and
pending writes present.

Fall back to a freshly loaded on-disk MemoryStore when no live store is
available, mirroring the gateway path (gateway/slash_commands.py). It
persists to the same MEMORY/USER.md and creates MEMORY.md on the first
approved write.

Fixes #46783

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 03:10:53 +05:30
kshitijk4poor
100e7be20e fix(security): deny root-level credential stores in media delivery
The media-delivery denylist in gateway/platforms/base.py enumerated only
.env/auth.json/credentials/config.yaml under HERMES_HOME, so other
credential stores that live at the root fell through and could be
auto-attached to chat replies. The reported case: the Google Workspace
skill's google_token.json refreshes every turn, bumping its mtime to
'now', which kept passing the strict-mode recency window and re-sent the
OAuth token on every reply.

Extend the explicit per-file denylist to mirror the canonical credential
set already enforced by the read/write guards in agent/file_safety.py:
google_token.json, google_oauth_pending.json, auth/google_oauth.json,
.anthropic_oauth.json, webhook_subscriptions.json, cache/bws_cache.json,
auth.lock, and the pairing/ token directory.

Targeted per-file additions (not a blanket ~/.hermes deny, which was
declined in #32090/#34425 because it would block skills/, logs/, and
ad-hoc agent-written deliverables). mcp-tokens/ (#37222) and
state.db/kanban.db (#41071) are left to their sibling targeted PRs.

Reported-by: xxxigm (#50912)
2026-06-23 02:56:48 +05:30
Teknium
e9b86f352f fix(discord): delete obsolete slash commands before creating new ones
Discord enforces a hard 100-command limit per app and rejects an upsert that would push the live total over 100 (error 30032), which silently breaks ALL slash commands. The sync deleted obsolete commands AFTER creating new ones, so an app already at the cap momentarily exceeded it and the whole sync failed.

Reorder: delete no-longer-desired commands up front, then create/update. Removes the now-redundant trailing delete loop. Adapts @infinitycrew39 PR #50890 to current main (the original adapter diff no longer applied after the platform refactor); test commit cherry-picked with authorship preserved.
2026-06-22 13:58:33 -07:00
infinitycrew39
91c465f6e7 test(discord): add regression test for 100-command sync limit
Add a test to verify that _safe_sync_slash_commands deletes obsolete
commands before creating new ones. This ensures we never temporarily
exceed Discord's 100-command limit during sync, which would trigger
error 30032 and break all slash commands.

This test guards against the regression where sync could fail even though
the registration cap was properly enforced.
2026-06-22 13:58:33 -07:00
helix4u
ae7e857420 fix(cron): deliver max-iteration fallback reports 2026-06-22 13:57:59 -07:00
helix4u
3972701424 fix(agent): complete final text on last turn 2026-06-22 13:57:59 -07:00
Teknium
0f741cef28 fix(tests): update cua install tests for cross-platform support
f-trycua's #50855 test file predated the cross-platform PR (#50552) and
reintroduced two stale tests asserting Linux is unsupported
(test_*_non_macos_*, patching platform.system="Linux" and expecting a
no-op/warn). Linux + Windows are supported now, so install proceeds on
those platforms. Restore main's cross-platform-correct versions:
test_*_on_unsupported_platform_* using FreeBSD as the genuinely
unsupported case.
2026-06-22 13:41:03 -07:00
Francesco Bonacci
5f1d23cfb2 fix(computer-use): delete broken pre-install asset probe; trust the upstream installer
`hermes computer-use install` refused to install on Linux, Windows, and
macOS x86_64 because the pre-install asset probe was hitting the wrong
GitHub endpoint AND duplicating tag-resolution logic the upstream
installer already does correctly.

`_check_cua_driver_asset_for_arch()` queried
`https://api.github.com/repos/trycua/cua/releases/latest`. On trycua/cua:

- cua-driver-rs releases (the binary the installer fetches) are marked
  **prerelease** on every cut. GitHub's `/releases/latest` explicitly
  skips prereleases.
- The Python package releases (`cua-agent`, `cua-computer`, `cua-train`)
  are non-prerelease and end up as the "latest" instead.

Live API check today:

  $ curl -sf https://api.github.com/repos/trycua/cua/releases/latest \
      | jq '{tag:.tag_name, asset_count: (.assets|length)}'
  { "tag": "agent-v0.8.3", "asset_count": 0 }

The probe sees zero assets, prints "Latest CUA release has no Linux
x86_64 asset", and skips install on every Linux / Windows / macOS-x86_64
host — even though the cua-driver-rs-v0.6.0 release ships 19 binary
assets covering all those platforms.

Filtering `/releases?per_page=N` for the `cua-driver-rs-v*` prefix
fixes the bug, but it duplicates tag-resolution logic the upstream
`_install-rust.sh` already does correctly via `CUA_DRIVER_RS_BAKED_VERSION`
(auto-baked by CD on every release, with a `/releases?per_page=N` API
fallback for dev checkouts). The right answer is to trust that
contract instead of mirroring it in Python where it can drift.

Two paths get the same outcome without the probe:

1. **Fresh install**: run `install.sh` directly. It has the baked
   release tag, fetches the right asset, and errors with a clear
   message on missing-arch downloads. No preflight needed.
2. **Upgrade path**: `cua_driver_update_check()` (separately added)
   shells `cua-driver check-update --json` against the installed
   binary, which returns the canonical update answer from the same
   source the installer uses.

- `hermes_cli/tools_config.py`: delete `_check_cua_driver_asset_for_arch`
  and its two call sites in `install_cua_driver`. Replace with an
  inline comment near the top of the module explaining the rationale.
- `tests/hermes_cli/test_install_cua_driver.py`: drop the
  `TestCheckCuaDriverAssetForArch` block. Add `TestArchProbeRemoval`
  with three regressions:

  - `test_probe_function_is_gone` — asserts the deleted helpers stay
    deleted.
  - `test_fresh_install_does_not_call_github_api` — asserts the
    install path doesn't hit GitHub directly from Python anymore.
  - `test_upgrade_with_binary_does_not_call_github_api_directly` —
    same for the upgrade path.

All 9 `test_install_cua_driver` tests pass.

Reported by @teknium1 while testing on a headed Ubuntu host.
2026-06-22 13:41:03 -07:00
Teknium
f721d2cda9
fix(image/video gen): make schema delivery instruction platform-neutral (#51031)
* chore: re-trigger CI (workflows did not dispatch on prior head)

* fix(image/video gen): make schema delivery instruction platform-neutral

The image_generate and video_generate tool schema descriptions hardcoded
a gateway-only delivery instruction ('display it with markdown
![description](url-or-path) and the gateway will deliver it'). That schema
is sent on every platform, so on CLI it directly contradicted the CLI
platform hint ('Do NOT emit MEDIA:/path tags ... state its absolute path
in plain text'), and on messaging platforms it was also wrong about the
mechanism (local file paths are delivered via MEDIA: tags, not markdown
image syntax — markdown ![]() only works for URLs).

The per-platform file-delivery convention is already owned correctly by
the platform hints in prompt_builder.py. The tool schema now just
describes the result shape (URL or absolute path in the image/video field)
and defers 'how to deliver' to the active platform's guidance.

Provider/model injection already works via _build_dynamic_image_schema()
(the 'Active backend: <provider> · model: <model>' line); no change there.
2026-06-22 13:40:42 -07:00
Austin Pickett
2a58fee1a1
fix(api): allow dashboard updates for git checkouts in containers (#51005)
Salvages #50469 by @libre-7.

_dashboard_local_update_managed_externally() previously blocked every containerized dashboard from the local update API, even when the running install was a bind-mounted git checkout that can be updated with hermes update.

Allow the dashboard updater only for git installs inside containers, while keeping hosted /opt/data, docker, and pip installs managed externally. Pip remains blocked because its apply path mutates the running container filesystem and is not the self-managed checkout case.

Adds regression coverage for docker, git, and pip install-method handling inside containers, and maps the contributor email for release attribution.

Co-authored-by: libre-7 <libre-7@users.noreply.github.com>
2026-06-22 15:55:33 -04:00
Teknium
6681f28d5b fix(telegram): disable DM topic mode when last binding is pruned
Follow-up to #31501. When the send-fallback prune removes a chat's
final telegram_dm_topic_bindings row, also flip
telegram_dm_topic_mode.enabled to 0 in the same transaction.

Without this, a user who turns topics off in the Telegram client
(rather than via /topic off) leaves enabled=1 with zero lanes:
_recover_telegram_topic_thread_id keeps treating the chat as
topic-enabled and lobby messages keep hunting for bindings that no
longer exist. Clearing the flag makes recovery fully stand down once
the dead topics are gone.

Adds 3 regression tests covering the last-binding clear, the
multi-binding no-op, and the unmatched-prune no-op.
2026-06-22 12:29:05 -07:00
xxxigm
11246dbe21 tests: regression coverage for stale topic-binding prune (#31501)
Thirteen tests across four layers:

* ``SessionDB.delete_telegram_topic_binding`` — pin the new
  helper's contract: removes only the (chat_id, thread_id) row
  it was asked about, leaves siblings alone, returns 0 silently
  when the row never existed, and is a no-op on a pristine
  database whose topic-mode tables haven't been migrated yet.
* ``TelegramAdapter._prune_stale_dm_topic_binding`` — the glue
  must drop the binding when ``self._session_store._db``
  exposes the helper, swallow exceptions so a failed cleanup
  never breaks the user-facing send, and refuse to issue a
  DELETE for ``chat_id=None`` / ``thread_id=None`` so a
  bookkeeping miss can't accidentally null-match every row.
* Source-level guards on ``TelegramAdapter.send`` and
  ``_send_message_with_thread_fallback`` — the prune call must
  sit beside the two existing "Thread X not found, retrying
  without message_thread_id" warnings, before the retry runs,
  so a future refactor can't silently drop the cleanup wire.
* End-to-end semantic — once a topic is pruned, the
  ``GatewayRunner._recover_telegram_topic_thread_id`` walk
  steers future inbound messages to the surviving binding
  instead of the dead one.  This is the exact behaviour change
  the bug report's reproduction asks for: no more landings in
  the wrong topic until the operator hand-edits ``state.db``.

Refs #31501
2026-06-22 12:29:05 -07:00
xxxigm
142a5751a2 gateway/telegram: prune stale DM topic binding on Thread-not-found (#31501)
Both fallback sites that currently log "Thread X not found,
retrying without message_thread_id" now also drop the
``telegram_dm_topic_bindings`` row keyed on
``(chat_id, thread_id)``:

* The streaming send loop (``send`` body) — fires on the
  second failure, after the same-thread one-shot retry confirms
  the thread really is gone (the first attempt is left alone
  because Bot API has been observed to return a transient
  "Thread not found" that recovers on immediate retry).
* The control-message helper ``_send_message_with_thread_fallback``
  (approval prompts, model picker, update prompts) — single-shot
  retry, prune unconditionally on the BadRequest match.

Without this prune, a user who deletes a Telegram DM topic in
the client keeps getting their next inbound message recovered
back to the dead thread by
``_recover_telegram_topic_thread_id`` in ``gateway/run.py``,
which walks the per-user binding list newest-first and treats
the deleted thread as authoritative.  The reproduction in the
bug report is exactly this: tool progress, approvals, activity
messages and replies all land in the wrong place until the user
manually runs DELETE on state.db.

Cleanup is best-effort — we log at INFO when it succeeds, swallow
any exception from the SessionDB call, and the user-facing send
proceeds either way.

Refs #31501
2026-06-22 12:29:05 -07:00
xxxigm
4849a8e555 hermes_state: add SessionDB.delete_telegram_topic_binding (#31501)
Targeted ``(chat_id, thread_id)`` prune for the
``telegram_dm_topic_bindings`` table — the missing piece for
#31501, where the Telegram adapter detects a topic the user
deleted out-of-band but the binding row keeps living in
state.db.  The recovery logic in
``gateway.run._recover_telegram_topic_thread_id`` then steers
every future inbound message back to the dead topic, dropping
tool progress, approvals and replies into the wrong place.

Returns the number of rows deleted; silently no-ops when the
topic-mode tables haven't been migrated yet (read-only / pristine
profile) so the helper is safe to call from a send-fallback
hot path before the schema has run.
2026-06-22 12:29:05 -07:00
Teknium
2ba1cfeb2e
feat(goals): completion contracts for /goal — evidence-based judging (#50501)
Adds an optional structured completion contract to the standing-goal loop,
adapted from OpenAI Codex's /goal guidance (a durable objective works best
when it names what done means, how to prove it, what not to break, what's in
scope, and when to stop).

A contract has five optional fields — outcome, verification, constraints,
boundaries, stop_when. When set, the continuation prompt tells the agent to
target the verification surface and respect constraints, and the judge marks
the goal done only when the verification criterion is met with concrete
evidence (command result, file excerpt, test output) instead of a loose
"looks done" claim. This tightens the most common /goal failure mode:
premature completion / endless over-continuation on an underspecified goal.

Two ways to set a contract, both backward compatible (bare /goal <text>
behaves exactly as before):
- /goal draft <objective>  — expands plain text into a full contract via the
  goal_judge aux model (cache-safe side call), falls back to a free-form goal
  if the model is unavailable.
- /goal <text> with inline 'field: value' lines (verify:, constraints:,
  boundaries:, stop when:, ...). Plain goals with an incidental colon are not
  mangled — only known field prefixes are pulled out.
- /goal show prints the active contract.

Contracts persist in SessionDB.state_meta alongside the goal (survive /resume),
compose with /subgoal criteria, and old goal rows load unchanged. CLI + every
gateway platform via the shared GoalManager engine; zero new model tools.

Tests: +18 in tests/hermes_cli/test_goals.py (parse/serialize/judge-prompt/
draft/fallback), 73/73 green; 42/42 across the broader goal test surface;
live E2E roundtrip (set -> persist -> reload -> contract-aware prompts) green.
2026-06-22 12:20:09 -07:00
Teknium
ff08e60c63
feat(skills): add cloudflare-temporary-deploy optional skill (#50849)
* chore: re-trigger CI (workflows did not dispatch on prior head)

* feat(skills): add cloudflare-temporary-deploy optional skill

Optional web-development skill teaching the agent to deploy a Worker to a
live workers.dev URL with no Cloudflare account via 'wrangler deploy
--temporary' (Wrangler 4.102.0+). Cloudflare provisions a throwaway,
claimable account valid for 60 minutes — ideal for an autonomous
write->deploy->verify loop with no OAuth/signup hard stop.

- SKILL.md: when/when-not, prereqs (unauth requirement, version floor),
  step-by-step deploy + verify flow, product limits table, pitfalls
  (hidden flag, stale global wrangler, auth-present error, rate limits,
  workers.dev edge cache), verification.
- scripts/parse_deploy_output.py: stdlib-only parser extracting live URL,
  claim URL, account name/state, expiry, deploy status from wrangler output.
- tests/skills/test_cloudflare_temporary_deploy_skill.py: 16 tests incl.
  a real-output regression case.

Verified live end-to-end: temporary account created with no creds,
deployed to a live URL, curl confirmed body, redeploy reused the account.
2026-06-22 12:14:30 -07:00
brooklyn!
7dece1d933
Merge pull request #50977 from NousResearch/bb/composer-fixed-portal
fix(desktop): keep floating composer on-screen, scoped to the thread area
2026-06-22 14:12:02 -05:00
Brooklyn Nicholson
de7ad8b78e fix(desktop): guarantee out-of-bounds composer is reclamped on load
Re-clamp once more on the next frame after pop-out so layout (sidebar widths,
fonts) has settled, and treat a degenerate pre-layout bounds rect as "unknown"
(fall back to the window) so we never clamp the box into a collapsed area. Net:
anyone who loads in with a stranded position is pulled back on-screen and the
fix is persisted, even if the first measure was premature.
2026-06-22 13:59:26 -05:00
Brooklyn Nicholson
ea5fa505d9 fix(desktop): clamp floating composer to the thread area, not the whole window
Now that the popped-out composer is fixed to the viewport, clamping against the
window let it slide under a pinned sidebar. Confine it to the thread region
(data-slot="composer-bounds") instead — its rect already excludes a pinned
sidebar and the header — falling back to the full window before it's measured.
This subsumes the old titlebar top-margin (the thread rect starts below the
header).
2026-06-22 13:57:53 -05:00
Brooklyn Nicholson
aff5ae692f fix(desktop): move composer out of contain wrapper instead of portaling
Replaces the body-portal approach: render ChatBar as a sibling of the
contain:[layout paint] chat wrapper (inside the same runtime boundary) rather
than portaling the floating instance to <body>. The wrapper is a containing
block for — and clips — position:fixed descendants, which is what stranded the
popped-out composer off-screen. As a sibling it anchors to the outer relative
container: docked stays absolute (identical placement), floating resolves
against the viewport. Both states stay mounted, so dock<->float no longer
remounts the editor (the portal toggle did).
2026-06-22 13:41:53 -05:00
Brooklyn Nicholson
79f270f549 fix(desktop): portal floating composer to body so it can't be clipped off-screen
The popped-out composer is position:fixed, but the chat content wrapper sets
`contain: layout paint`, which makes it a containing block for — and clips —
fixed descendants. Inline, the floating composer was positioned/clipped relative
to the chat column (which shifts with the sidebars), not the viewport, so the
viewport-based bounds clamp from #50466 couldn't keep it reachable: users still
lost it off-screen. Portal it to <body> when popped out so fixed positioning and
the clamp finally share the viewport as their reference. Docked stays inline
(it's absolute within the chat column by design).
2026-06-22 13:37:31 -05:00
kshitij
5937b95192
Merge pull request #50773 from NousResearch/salvage/43719-dashboard-plugin-rce
fix(security): restrict dashboard plugin backend auto-import to bundled plugins — defense-in-depth (#43719)
2026-06-22 22:57:33 +05:30
kshitijk4poor
e2bea0abe6 refactor(security): centralize non-bundled plugin sources in one constant
/simplify-code (LOW, flagged by two reviewers): the source tags 'user' /
'project' / 'bundled' were bare string literals scattered across the discovery
scrub and the two mount-time refuse guards. A typo in any one site (e.g.
'users') would SILENTLY disable a security gate with no error — the exact
failure mode this RCE boundary must not have.

Introduce a shared module-level _NON_BUNDLED_PLUGIN_SOURCES frozenset referenced
by both the discovery scrub and the (now single) mount guard, so the
auto-import policy lives in one place. The two mount guards collapse into one
gate that still emits the distinct per-source operator message via a map (no
loss of guidance). Behavior unchanged: 39 RCE-bypass tests pass, and the
constant is mutation-checked (typo'ing it fails the bypass tests).

Defence-in-depth (discovery scrub + mount refuse) is retained intentionally.
2026-06-22 22:48:37 +05:30
Teknium
f1e6d39a74
feat(computer_use): disable cua-driver telemetry by default, add opt-in (#50842)
* feat(computer_use): disable cua-driver telemetry by default, add opt-in

cua-driver ships anonymous PostHog usage telemetry ENABLED by default
upstream (fires cua_driver_install / cua_driver_doctor events to
eu.i.posthog.com). Hermes now disables it for our users unless they
explicitly opt in.

- New config key `computer_use.cua_telemetry` (default false) in
  DEFAULT_CONFIG.
- `cua_backend.cua_driver_child_env()` injects
  `CUA_DRIVER_RS_TELEMETRY_ENABLED=0` into the child env when telemetry is
  disabled (the default); leaves the var untouched on opt-in so the driver
  uses its own default. Reads config fail-safe — any error defaults to
  telemetry off.
- Routed every cua-driver spawn site through the policy: MCP backend
  (StdioServerParameters env), `cua_driver_update_check`, doctor's
  health_report Popen, the install.sh/install.ps1 runner, and the
  `--version` / status probes.
- Docs: new Telemetry subsection in computer-use.md (EN).
- Tests: tests/computer_use/test_cua_telemetry.py — default disables,
  explicit-false disables, opt-in leaves var untouched, config-failure
  fails safe, inherited-enabled is overridden off.

Verified live on Linux against the real cua-driver-rs 0.6.0 binary: with
the var=0 the driver reports "telemetry: disabled via
CUA_DRIVER_RS_TELEMETRY_ENABLED" and sends no event; with it unset it logs
"sending event: cua_driver_doctor". 213 computer_use + install tests green.

* fix(dashboard): fold computer_use config category into agent tab

The new computer_use.cua_telemetry key created a single-field dashboard
config category, tripping test_no_single_field_categories (web_server's
invariant that categories with <2 fields must be merged to avoid tab
sprawl). Add computer_use -> agent to _CATEGORY_MERGE, matching the
existing onboarding/telegram single-field folds.
2026-06-22 09:57:16 -07:00
Teknium
ed711e1c2c chore: add iaji to AUTHOR_MAP for salvaged Slack mention_patterns fix 2026-06-22 09:44:52 -07:00
iaji
441bd6d8db fix(slack): split csv mention pattern fallback 2026-06-22 09:44:52 -07:00
devorun
4966268764 fix(slack): honor documented mention_patterns wake words
The Slack docs document `slack.mention_patterns` as custom wake words that
trigger the bot alongside `@mention`, and the config layer bridges the key into
the Slack adapter's `config.extra` — but the adapter never read it. With
`require_mention` on, a channel message containing a configured wake word (and
no literal `<@BOTUID>`) was silently ignored. Every other adapter that
documents `mention_patterns` (Telegram, DingTalk, Mattermost, WhatsApp,
BlueBubbles, Photon) implements it; Slack was the odd one out.

Add `_slack_mention_patterns()` (compiled, cached; reads `slack.mention_patterns`
as a list/string or `SLACK_MENTION_PATTERNS` as a JSON/CSV/newline list, invalid
regexes warned and skipped) and `_slack_message_matches_mention_patterns()`,
mirroring the existing adapters. Channel mention detection now also triggers on
a wake-word match, so the documented field works as described.

Adds tests for pattern compilation (list/string/env/invalid-regex) and for the
channel-trigger gating with a wake word under require_mention.
2026-06-22 09:44:52 -07:00
Teknium
2617946397
fix(delegation): emit high-concurrency cost warning once per process (#50848)
* chore: re-trigger CI (workflows did not dispatch on prior head)

* fix(delegation): emit high-concurrency cost warning once per process

_get_max_concurrent_children() runs on every get_definitions() schema
rebuild (via _build_top_level_description / _build_tasks_param_description),
not just on actual delegate_task calls. With max_concurrent_children>10 the
cost advisory fired on every turn / agent spawn across every session, spamming
the log even when delegate_task was never used. Gate it behind a module-level
_HIGH_CONCURRENCY_WARNED flag so it warns at most once per process.
2026-06-22 09:44:30 -07:00
Teknium
b1b20270c4 refactor(memory): move write-mirror gating behind MemoryManager interface
The success/staged gating and op-expansion for mirroring built-in memory
writes to external providers lived in a standalone agent/memory_write_bridge.py
helper called inline from two core call sites (tool_executor.py,
agent_runtime_helpers.py). That left the mirror decision-making in the agent
loop, outside the memory-provider interface.

Fold it into a new MemoryManager.notify_memory_tool_write() entry point: the
loop now hands over the raw tool result + args and a metadata callback, and the
manager decides whether/what to mirror. Both core call sites collapse to a
single call; the orphan module is removed. No MemoryProvider ABC change.

Tests rewritten as behavior tests against the manager method.
2026-06-22 07:00:42 -07:00
Hao Zhe
027cb649ef fix(memory): fail closed on unclear write results 2026-06-22 07:00:42 -07:00
Hao Zhe
c7e0501e9b fix(openviking): drain memory mirror workers on shutdown 2026-06-22 07:00:42 -07:00
Hao Zhe
70e7132e2f fix(openviking): gate memory writes and add viking_forget
Mirror built-in memory writes to external providers only after the native memory tool succeeds and is not staged for approval. Keep OpenViking's built-in memory mirroring add-only, since Hermes native memory entries do not yet have stable OpenViking file URIs for replace/remove.

Add a narrow viking_forget tool for exact user memory file deletion and document the current OpenViking write/delete behavior.
2026-06-22 07:00:42 -07:00
teknium1
38c56a1e86 fix(computer_use): probe cua-driver-rs release tag, not monorepo releases/latest
The install pre-flight asset probe queried trycua/cua's `releases/latest`,
which floats across the monorepo's components (agent-*, computer-*, lume-*,
train-*) — most ship zero binary assets. So the probe false-negatived and
hard-blocked `install_cua_driver` (line 770: `if not probe: return False`)
BEFORE the upstream installer ran, on Linux, Windows, and Intel macOS — even
though the installer it gates resolves the right tag and would have succeeded.

Net effect: the normal enable path (`hermes tools` → Computer Use post-setup,
and `hermes computer-use install`) refused to install on every platform this
PR claims to support.

Fix: list `/releases?per_page=100`, pick the newest `cua-driver-rs-v*` tag,
and match its assets on OS-token + arch — mirroring what the upstream
`install.sh` already does. Fail open if no driver release surfaces (installer
remains the source of truth). Adds an OS-token gate so a darwin asset can't
satisfy a Linux probe.

Tests: updated the install-probe fixtures to the list-of-releases shape with
`cua-driver-rs-v*` tags + OS-token asset names; added a regression guard
(`test_releases_latest_tag_ignored_picks_driver_rs_tag`) for the monorepo
floating-latest case. 25/25 install + 192 computer_use tests green.

Verified live: probe returns True for all six platform/arch combos against
the real GitHub releases API.
2026-06-22 06:42:30 -07:00
teknium1
e3505c7f73 fix(computer_use): reconcile Linux gate with stale "gated off" comments
The runtime gate (check_computer_use_requirements) and the hermes tools
platform_gate both enable linux alongside darwin/win32, but several
docstrings/comments still described Linux as "alpha, gated off until it
flips upstream" — contradicting the code that ships it. Bring the prose in
line with the gate that's actually live:

- tool.py / cua_backend.py module docstrings: Linux is enabled (X11 today,
  Wayland via XWayland), not gated off.
- toolsets.py description and hermes tools display name: (macOS/Windows) ->
  (macOS/Windows/Linux).

No behavior change — the gate already allowed all three platforms.
2026-06-22 06:42:30 -07:00
Francesco Bonacci
f2e37549c6 feat(computer_use): cross-platform cua-driver (macOS/Windows/Linux)
Make the computer_use toolset platform-agnostic by driving cua-driver on
macOS, Windows, and Linux. Consumes the 8 cua-driver decoupling surfaces
(capability discovery, structuredContent AX tree, opaque element_token,
click button enum, explicit mimeType, machine-readable manifest,
structured list_windows, structured health_report), each degrading
gracefully on older drivers.

Adds `hermes computer-use doctor` (drives cua-driver health_report with a
per-OS check matrix and an exit 0/1/2 ok/degraded/blocked contract), full
typed wrappers for the previously-uncovered cua-driver tools plus a generic
call_tool escape hatch, per-session agent-cursor lifecycle, platform-aware
system-prompt guidance (host-deterministic, cache-safe), and honors
HERMES_CUA_DRIVER_CMD end-to-end.

Replaces the macOS-only skills/apple/macos-computer-use skill with a
cross-platform skills/computer-use skill, and refreshes the EN + zh-Hans
docs.

Supersedes #44221 (Windows-enablement salvage of #30660).

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-22 06:42:30 -07:00
Teknium
17dfc6bec4
fix(desktop): set AppUserModelID on Windows so notifications fire (#50808)
Windows toast notifications silently no-op unless the app sets an
AppUserModelID — new Notification().show() returns without error and
nothing appears. The desktop's native-notification system (approval,
turn-done, input, etc.) was therefore dead on Windows while working on
macOS/Linux.

Set the AUMID to the build appId (com.nousresearch.hermes) on Windows
right after app.setName, so toasts route to the installed Start Menu
shortcut. No-op on macOS/Linux, which don't require it.
2026-06-22 06:31:39 -07:00
Teknium
ff85af3fc7
feat(goals): /goal wait <pid> — park the loop on a background process (#50503)
* feat(goals): add /goal wait <pid> barrier to park the loop on a background process

The /goal loop re-pokes the agent every turn via the post-turn judge. When a
goal is gated on a long-running background process (CI poller, build, test
matrix, deploy) that produces nothing to judge yet, this spins the agent into
'is it done?' busy-work and burns the turn budget.

/goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is
skipped, no turn is consumed, no continuation fires, and /goal status shows a
parked indicator. The barrier auto-clears the moment the process exits (the
agent's notify_on_complete watcher is the natural wake signal), then the next
turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear
drop it; a dead/stale PID can never wedge the loop.

Wired across CLI, gateway, and the mid-run command guard for parity. Barrier
persists in SessionDB.state_meta (survives /resume); GoalState gains
backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new
tests; docs updated.

* fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0)

The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive — on
Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the
target's console process group (bpo-14484). Delegate to the canonical
footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback)
instead, with a direct-psutil last resort.

* feat(goals): judge-driven auto-wait — the loop parks itself, no manual /goal wait

Makes the wait barrier automatic. Every turn the judge is shown the agent's
live background processes (pid, command, uptime, output tail from the
process_registry) alongside the goal + response, and can return a new 'wait'
verdict instead of continue:
  {"verdict":"wait","wait_on_pid":N}      → park until that process exits
  {"verdict":"wait","wait_for_seconds":N} → park until the deadline passes
evaluate_after_turn acts on the directive (sets the barrier, parks the loop)
so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a
time-based waiting_until barrier alongside the pid barrier; both auto-clear and
can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live
registry in via gather_background_processes(). Manual /goal wait stays as an
override. Judge verdict contract widened to (verdict, reason, parse_failed,
wait_directive); legacy {"done":bool} shape still accepted.

* test(goals): update kanban _fake_judge to the 4-tuple judge contract

CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the
3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the
4-tuple (+ wait_directive). Update the fake to return None for the directive
and accept the background_processes kwarg.

* feat(goals): trigger-based wait — park on a process's own signal, not just exit

Addresses two gaps in the judge-driven wait: (1) the judge could only express
'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that
fires a trigger MID-RUN (and may never exit) couldn't be waited on; (2) the
process's own watch_patterns/notify_on_complete trigger was invisible to the judge.

Adds a session-based barrier (waiting_on_session) that releases on the process's
OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if
started with watch_patterns) its pattern matches — even while the process keeps
running. list_sessions() now surfaces session_id + watch_patterns/watch_hit/
notify_on_complete so the judge sees the trigger and is told to prefer
wait_on_session for trigger processes. Judge verdict gains a {wait_on_session}
directive (preferred over pid). Backward-compatible GoalState field; pid + time
barriers unchanged.

Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive,
release on exit, unknown-session, full park→trigger→resume, parse, validation,
backcompat load). 105 goal-surface + 85 process_registry tests green.
2026-06-22 06:27:29 -07:00
teknium1
d4fa2db1c5 fix(desktop): show all of a provider's models when searching the composer picker
The composer model picker capped each provider's search matches at 12
(PER_PROVIDER_SEARCH). A provider serving more than 12 models (e.g.
opencode-go with 19) showed only a truncated subset when the user typed
its name to find it — exactly the models they were searching for got
cut. Edit Models showed the full list because it never applied this cap.

A search is already a narrowing action, so capping a single provider's
own matches is wrong. Remove the slice; search now lists every matching
model for the provider. The no-search default still shows the curated
top-N per provider via the visibility set.

Follow-up to #47077 (the backend dedup fix); this closes the remaining
frontend truncation users saw in the composer.
2026-06-22 06:23:49 -07:00
teknium1
a6ce9b2fbb fix(picker): keep flat-namespace reseller first-party models in desktop picker
OpenCode Go (and OpenCode Zen) showed only a subset of the models they
serve in the desktop/CLI model picker — e.g. opencode-go rendered 13 of
19, silently dropping minimax-m3/m2.7/m2.5, glm-5/5.1, deepseek-v4-flash.

Root cause: the picker dedup in build_models_payload strips any model
from an aggregator row that overlaps a user-defined provider's catalog
(so a local proxy isn't shadowed by OpenRouter). It gated on
is_aggregator(), which is True for opencode-go/zen because their flat
/v1/models returns bare IDs the model-switch resolver searches. But
those are flat-namespace RESELLERS, not routing aggregators — every
model they list is first-party, so deduping them against a user proxy
that happens to serve a same-named model guts their own catalog.

Fix: add is_routing_aggregator() (True only for true routers like
OpenRouter and custom:* proxies; False for opencode-go/zen) and gate the
picker dedup on it. is_aggregator() is unchanged so model-switch flat
catalog resolution keeps working. Both desktop entry points
(model.options JSON-RPC and /api/model/options REST) and hermes model
share build_models_payload, so all surfaces get the full list.

Fixes #47077
2026-06-22 06:09:08 -07:00
Teknium
ef6492b648
fix(gateway): cold-start installed Windows gateway after update when none was running (#50804)
The post-update gateway resume path (`_resume_windows_gateways_after_update`)
only relaunched gateways that were *running* when the update began — it
enumerates live PIDs in `_pause_windows_gateways_for_update` and respawns
exactly those. A gateway that had already died between updates (e.g. it was
launched attached to a terminal/TUI that later closed, taking the child with
it) was never brought back: the Startup-folder / Scheduled-Task autostart
entry only fires on the next login, not after an in-place update.

So a Desktop-GUI update (which runs `hermes update --yes --gateway`) on a box
whose gateway had quietly died would complete with no gateway running, and the
user had no indication anything should have come up.

Fix: when no gateway is running at pause time but an autostart entry is
installed (`gateway_windows.is_installed()` — an explicit "I want a gateway"
signal), return a `cold_start_if_installed` token. The resume step then does a
fresh detached spawn via `gateway_windows._spawn_detached()` — the same
windowless `pythonw` + `CREATE_BREAKAWAY_FROM_JOB` path `hermes gateway start`
uses. It re-checks liveness immediately before spawning so a concurrent start
(autostart entry firing) can't produce a duplicate.

Gateway-less users (no autostart entry) get nothing forced on them — the
pause step still returns None for them. POSIX is unaffected: enabled systemd
units already restart via `Restart=always`.

Windows-only; best-effort throughout (logs at debug and no-ops on any error).

Tests: pause returns the cold-start token only when installed, returns None
when not installed, resume cold-starts on the token, and resume skips the
cold-start when a gateway is already running.
2026-06-22 06:02:31 -07:00
teknium
da498ed99b chore(release): map ScotterMonk for PR #50145 salvage 2026-06-22 05:41:22 -07:00
teknium
e9cd8c5bf3 fix(delivery): drop env-var knob, flag all chunking adapters
Follow-up to ScotterMonk's cron-truncation fix:

- Remove HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var. Behavioral config
  belongs in config.yaml, not a new HERMES_* env var (.env is secrets
  only). The actual bug is fixed entirely by the adapter-aware skip; the
  configurable cap was unneeded scope. MAX_PLATFORM_OUTPUT is a constant
  again, collapsing the max_output=0 disable branch and the
  audit-vs-truncation threshold divergence.
- Flag the remaining verified-chunking adapters (slack, matrix, feishu,
  mattermost, teams, whatsapp, whatsapp_cloud, weixin, bluebubbles,
  yuanbao) with splits_long_messages=True so the fix covers the whole
  bug class, not just Discord/Telegram. Each verified to chunk in its
  own send() via truncate_message().
- SMS deliberately left False: it chunks for normal replies but a
  multi-segment cron blast is cost-bearing; the 4000-cap + file save is
  the safer default there.
- Update tests: drop the two env-override tests, add a test asserting a
  save failure during truncation (non-chunking) propagates.
2026-06-22 05:41:22 -07:00
ScotterMonk
86e4521cb1 fix(delivery): make cron output truncation configurable + adapter-aware
Gateway-level truncation (MAX_PLATFORM_OUTPUT=4000) was pre-empting
adapter-side message splitting. Discord and Telegram both chunk long
content natively in their send() via truncate_message(), but the
delivery router truncated to 3800 chars + footer before the adapter
ever saw the full payload — so long cron output was cut short instead
of being delivered as multiple messages (issue #50126).

Changes:
- HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var makes the cap configurable
  (default 4000, backward compatible). Set to 0 to disable truncation.
- TRUNCATED_VISIBLE (3800) removed — visible portion now derived
  dynamically from max_output minus the actual footer length.
- New BasePlatformAdapter.splits_long_messages capability flag (default
  False). Adapters that chunk in send() set True; delivery skips
  truncation for them but still saves full output to disk as audit.
- Flagged Discord and Telegram (both verified to chunk in send()).

Fixes #50126
2026-06-22 05:41:22 -07:00
Teknium
eecb5b9dd1
fix(update): don't count across shallow-clone boundary (bogus '12492 commits behind') (#50784)
* chore: re-trigger CI (workflows did not dispatch on prior head)

* fix(update): don't count across shallow-clone boundary (bogus '12492 commits behind')

Installer checkouts are shallow (git clone --depth 1). The CLI banner and
hermes update --check both did a plain git fetch (silently unshallowing the
repo) then git rev-list --count HEAD..origin/main, which counts across the
shallow boundary and prints a huge nonsense number like '12492 commits behind'.

Detect shallow up front, fetch with --depth 1 to preserve the boundary, and
compare tip SHAs instead of counting:
- banner _check_via_local_git: returns UPDATE_AVAILABLE_NO_COUNT when behind
  (renders as 'update available') instead of the bogus count.
- _cmd_update_check: reports presence-only on shallow clones.
Full clones keep the exact count path unchanged. Mirrors the desktop fix in
apps/desktop/electron/main.cjs (commit 2950c6fa2).
2026-06-22 05:39:11 -07:00
Kartik
2e779d11a0
feat(mem0): v3 API, OSS mode, update/delete tools, telemetry & review fixes (#15624)
* fix: update to version 3 endpoints and adding update and delete tool

* chore: removing the test md file

* fix: prevent circuit breaker on client errors in Mem0 provider

* chore: add telemetry for platform version

* feat: add OSS mode support to Mem0 memory provider

* chore: bump mem0ai dependency to >=2.0.1 in memory plugin

* refactor: enhance dependency checks and embedder config in mem0 backend

* refactor: adjust fact storage message for OSS mode

* refactor: expand user paths, add collection recreation on dimension change for Qdrant

* fix(mem0): make MEM0_USER_ID override gateway-native ids and tag writes with channel

When MEM0_USER_ID was configured (env or mem0.json), the gateway-native id
from kwargs (Telegram numeric id, Discord snowflake, ...) still won, so the
same human ended up under different user_ids per channel and memories never
merged across CLI / Telegram / Slack / Discord. Mirrors openclaw's cfg.userId
pattern: configured override wins, gateway-native id is the fallback.

The legacy "hermes-user" placeholder default written by the setup wizard is
treated as unset to avoid silently bucketing every gateway user together.

Also tag every write with metadata.channel (cli/telegram/discord/...) so the
dashboard can offer per-channel filtered views without coupling identity to
the channel; document the read/write filter asymmetry as intentional
(reads scope to user_id only for cross-agent recall).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: improve Mem0 memory provider backend, pagination, config, and error handling

* refactor: update mem0 telemetry code, docs, and bump version

* fix(mem0): make get_config_schema() return unified schema with mode-aware required flag

Schema always includes api_key field so picker shows "API key / local" for
both modes. In OSS mode api_key.required=False so status won't mislead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: improve mem0 telemetry, add env var key and OSS mode detection

* chore: bump mem0ai lower bound to 2.0.4 (latest SDK release)

* refactor: set telemetry sample rate to 1.0 and update docs for opt‑out

* fix(mem0): resolve 15 correctness, thread-safety, and resource bugs

Thread safety:
- Protect circuit breaker counters with _breaker_lock (race between
  prefetch/sync daemon threads and main thread)
- Wrap sync_turn thread creation in _sync_lock; skip if previous sync
  is still alive after 5 s join to prevent duplicate memory ingestion
- Guard _schedule_flush timer creation under _queue_lock (TOCTOU race)
- Capture local `backend` reference in prefetch/sync closures so
  shutdown() nulling self._backend cannot crash in-flight threads

Correctness:
- Fix bool("false")==True for rerank param; parse string values explicitly
- Guard page/top_k with max(1,...) and move int() inside try blocks
- Fix fact_count=0 always in OSS mode (Memory.add returns list, not dict)
- Fix prefetch() not clearing result when thread still alive after timeout
- Fix atexit.register accumulating on repeated initialize() calls

Backend / setup:
- Handle Qdrant named-vector collections in _recreate_collection_if_dims_changed
  (vectors is a dict; .size access raised AttributeError, swallowed silently)
- Wrap QdrantClient and psycopg2 conn/cursor in try/finally to prevent leaks
- Resolve ollama_bin at top of _ensure_ollama; use it for ollama pull
- Fix embedder key lookup when LLM provider has no env_var (e.g. ollama)

Also: remove _telemetry_enabled cache (env var check is cheap), bump
required mem0ai to >=2.0.7, minor README wording fix.

* fix(mem0): fix brittle qdrant path test + add telemetry sample-rate docs

- Replace generator-throw lambda with a proper def in
  test_qdrant_path_not_writable; use tmp_path instead of a hardcoded
  /nonexistent path so the test is root-safe
- Add MEM0_TELEMETRY_SAMPLE_RATE to memory-providers.md (was only
  in the plugin README, not the user-guide docs)

* revert: remove MEM0_TELEMETRY_SAMPLE_RATE from user-guide docs

* refactor: remove telemetry from mem0 plugin and update documentation

* fix(mem0): set stdin=DEVNULL on setup subprocess calls

The TUI stdin guard (scripts/check_subprocess_stdin.py) requires every
subprocess call in plugin code to set stdin= so it can't inherit the
gateway's JSON-RPC stdin fd. Muzzle the docker/ollama calls in the OSS
setup wizard with stdin=subprocess.DEVNULL (none need interactive input).
Also covers the docker-inspect call the linter's regex misses.

---------

Co-authored-by: chaithanyak42 <chaithanya.kumar42a@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-22 12:30:47 +00:00
Eugeniusz Gilewski
8845f3316c fix(security): restrict dashboard plugin backend import to bundled plugins (#43719)
Defense-in-depth for the dashboard plugin auto-import path. The web server
auto-imports and mounts the Python backend (dashboard/manifest.json -> api file)
of plugins found in ~/.hermes/plugins/ (user) and ./.hermes/plugins/ (project),
not just bundled plugins. So any plugin that reaches one of those dirs gets
arbitrary Python executed on the next dashboard start.

NOTE ON THREAT MODEL: #43719's originally-documented delivery chain (a public
--insecure dashboard + open API used to git clone a malicious repo into
~/.hermes/plugins/) is ALREADY mitigated on main — since the June 2026
hermes-0day hardening, a non-loopback bind ALWAYS requires an auth provider and
--insecure no longer bypasses the auth gate. This change is therefore NOT
closing that (now-authenticated) network path; it removes the residual
'arbitrary code executes merely because a plugin is on disk' hazard, which still
applies when a plugin arrives by other means: a socially-engineered git clone,
a supply-chain drop, an authenticated-but-malicious actor, or a future
regression in the auth gate. Untrusted on-disk code should not auto-execute.

Restrict dashboard backend Python auto-import to BUNDLED plugins only. User and
project plugins may still extend the dashboard UI via static JS/CSS, but their
api Python file is never auto-imported. Two layers: _discover_dashboard_plugins
scrubs api/_api_file for user/project sources (and bundled wins name conflicts
so a non-bundled plugin cannot shadow a trusted backend route);
_mount_plugin_api_routes re-refuses user/project at mount time. Tightens the
prior GHSA-5qr3-c538-wm9j / #29156 hardening (bundled+user) to bundled-only.

Salvaged from #44472 (@egilewski) onto current main.
2026-06-22 17:51:37 +05:30