Commit graph

6006 commits

Author SHA1 Message Date
kshitij
5937b95192
Merge pull request #50773 from NousResearch/salvage/43719-dashboard-plugin-rce
fix(security): restrict dashboard plugin backend auto-import to bundled plugins — defense-in-depth (#43719)
2026-06-22 22:57:33 +05:30
Teknium
f1e6d39a74
feat(computer_use): disable cua-driver telemetry by default, add opt-in (#50842)
* feat(computer_use): disable cua-driver telemetry by default, add opt-in

cua-driver ships anonymous PostHog usage telemetry ENABLED by default
upstream (fires cua_driver_install / cua_driver_doctor events to
eu.i.posthog.com). Hermes now disables it for our users unless they
explicitly opt in.

- New config key `computer_use.cua_telemetry` (default false) in
  DEFAULT_CONFIG.
- `cua_backend.cua_driver_child_env()` injects
  `CUA_DRIVER_RS_TELEMETRY_ENABLED=0` into the child env when telemetry is
  disabled (the default); leaves the var untouched on opt-in so the driver
  uses its own default. Reads config fail-safe — any error defaults to
  telemetry off.
- Routed every cua-driver spawn site through the policy: MCP backend
  (StdioServerParameters env), `cua_driver_update_check`, doctor's
  health_report Popen, the install.sh/install.ps1 runner, and the
  `--version` / status probes.
- Docs: new Telemetry subsection in computer-use.md (EN).
- Tests: tests/computer_use/test_cua_telemetry.py — default disables,
  explicit-false disables, opt-in leaves var untouched, config-failure
  fails safe, inherited-enabled is overridden off.

Verified live on Linux against the real cua-driver-rs 0.6.0 binary: with
the var=0 the driver reports "telemetry: disabled via
CUA_DRIVER_RS_TELEMETRY_ENABLED" and sends no event; with it unset it logs
"sending event: cua_driver_doctor". 213 computer_use + install tests green.

* fix(dashboard): fold computer_use config category into agent tab

The new computer_use.cua_telemetry key created a single-field dashboard
config category, tripping test_no_single_field_categories (web_server's
invariant that categories with <2 fields must be merged to avoid tab
sprawl). Add computer_use -> agent to _CATEGORY_MERGE, matching the
existing onboarding/telegram single-field folds.
2026-06-22 09:57:16 -07:00
iaji
441bd6d8db fix(slack): split csv mention pattern fallback 2026-06-22 09:44:52 -07:00
devorun
4966268764 fix(slack): honor documented mention_patterns wake words
The Slack docs document `slack.mention_patterns` as custom wake words that
trigger the bot alongside `@mention`, and the config layer bridges the key into
the Slack adapter's `config.extra` — but the adapter never read it. With
`require_mention` on, a channel message containing a configured wake word (and
no literal `<@BOTUID>`) was silently ignored. Every other adapter that
documents `mention_patterns` (Telegram, DingTalk, Mattermost, WhatsApp,
BlueBubbles, Photon) implements it; Slack was the odd one out.

Add `_slack_mention_patterns()` (compiled, cached; reads `slack.mention_patterns`
as a list/string or `SLACK_MENTION_PATTERNS` as a JSON/CSV/newline list, invalid
regexes warned and skipped) and `_slack_message_matches_mention_patterns()`,
mirroring the existing adapters. Channel mention detection now also triggers on
a wake-word match, so the documented field works as described.

Adds tests for pattern compilation (list/string/env/invalid-regex) and for the
channel-trigger gating with a wake word under require_mention.
2026-06-22 09:44:52 -07:00
Teknium
b1b20270c4 refactor(memory): move write-mirror gating behind MemoryManager interface
The success/staged gating and op-expansion for mirroring built-in memory
writes to external providers lived in a standalone agent/memory_write_bridge.py
helper called inline from two core call sites (tool_executor.py,
agent_runtime_helpers.py). That left the mirror decision-making in the agent
loop, outside the memory-provider interface.

Fold it into a new MemoryManager.notify_memory_tool_write() entry point: the
loop now hands over the raw tool result + args and a metadata callback, and the
manager decides whether/what to mirror. Both core call sites collapse to a
single call; the orphan module is removed. No MemoryProvider ABC change.

Tests rewritten as behavior tests against the manager method.
2026-06-22 07:00:42 -07:00
Hao Zhe
027cb649ef fix(memory): fail closed on unclear write results 2026-06-22 07:00:42 -07:00
Hao Zhe
c7e0501e9b fix(openviking): drain memory mirror workers on shutdown 2026-06-22 07:00:42 -07:00
Hao Zhe
70e7132e2f fix(openviking): gate memory writes and add viking_forget
Mirror built-in memory writes to external providers only after the native memory tool succeeds and is not staged for approval. Keep OpenViking's built-in memory mirroring add-only, since Hermes native memory entries do not yet have stable OpenViking file URIs for replace/remove.

Add a narrow viking_forget tool for exact user memory file deletion and document the current OpenViking write/delete behavior.
2026-06-22 07:00:42 -07:00
teknium1
38c56a1e86 fix(computer_use): probe cua-driver-rs release tag, not monorepo releases/latest
The install pre-flight asset probe queried trycua/cua's `releases/latest`,
which floats across the monorepo's components (agent-*, computer-*, lume-*,
train-*) — most ship zero binary assets. So the probe false-negatived and
hard-blocked `install_cua_driver` (line 770: `if not probe: return False`)
BEFORE the upstream installer ran, on Linux, Windows, and Intel macOS — even
though the installer it gates resolves the right tag and would have succeeded.

Net effect: the normal enable path (`hermes tools` → Computer Use post-setup,
and `hermes computer-use install`) refused to install on every platform this
PR claims to support.

Fix: list `/releases?per_page=100`, pick the newest `cua-driver-rs-v*` tag,
and match its assets on OS-token + arch — mirroring what the upstream
`install.sh` already does. Fail open if no driver release surfaces (installer
remains the source of truth). Adds an OS-token gate so a darwin asset can't
satisfy a Linux probe.

Tests: updated the install-probe fixtures to the list-of-releases shape with
`cua-driver-rs-v*` tags + OS-token asset names; added a regression guard
(`test_releases_latest_tag_ignored_picks_driver_rs_tag`) for the monorepo
floating-latest case. 25/25 install + 192 computer_use tests green.

Verified live: probe returns True for all six platform/arch combos against
the real GitHub releases API.
2026-06-22 06:42:30 -07:00
Francesco Bonacci
f2e37549c6 feat(computer_use): cross-platform cua-driver (macOS/Windows/Linux)
Make the computer_use toolset platform-agnostic by driving cua-driver on
macOS, Windows, and Linux. Consumes the 8 cua-driver decoupling surfaces
(capability discovery, structuredContent AX tree, opaque element_token,
click button enum, explicit mimeType, machine-readable manifest,
structured list_windows, structured health_report), each degrading
gracefully on older drivers.

Adds `hermes computer-use doctor` (drives cua-driver health_report with a
per-OS check matrix and an exit 0/1/2 ok/degraded/blocked contract), full
typed wrappers for the previously-uncovered cua-driver tools plus a generic
call_tool escape hatch, per-session agent-cursor lifecycle, platform-aware
system-prompt guidance (host-deterministic, cache-safe), and honors
HERMES_CUA_DRIVER_CMD end-to-end.

Replaces the macOS-only skills/apple/macos-computer-use skill with a
cross-platform skills/computer-use skill, and refreshes the EN + zh-Hans
docs.

Supersedes #44221 (Windows-enablement salvage of #30660).

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-22 06:42:30 -07:00
Teknium
ff85af3fc7
feat(goals): /goal wait <pid> — park the loop on a background process (#50503)
* feat(goals): add /goal wait <pid> barrier to park the loop on a background process

The /goal loop re-pokes the agent every turn via the post-turn judge. When a
goal is gated on a long-running background process (CI poller, build, test
matrix, deploy) that produces nothing to judge yet, this spins the agent into
'is it done?' busy-work and burns the turn budget.

/goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is
skipped, no turn is consumed, no continuation fires, and /goal status shows a
parked indicator. The barrier auto-clears the moment the process exits (the
agent's notify_on_complete watcher is the natural wake signal), then the next
turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear
drop it; a dead/stale PID can never wedge the loop.

Wired across CLI, gateway, and the mid-run command guard for parity. Barrier
persists in SessionDB.state_meta (survives /resume); GoalState gains
backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new
tests; docs updated.

* fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0)

The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive — on
Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the
target's console process group (bpo-14484). Delegate to the canonical
footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback)
instead, with a direct-psutil last resort.

* feat(goals): judge-driven auto-wait — the loop parks itself, no manual /goal wait

Makes the wait barrier automatic. Every turn the judge is shown the agent's
live background processes (pid, command, uptime, output tail from the
process_registry) alongside the goal + response, and can return a new 'wait'
verdict instead of continue:
  {"verdict":"wait","wait_on_pid":N}      → park until that process exits
  {"verdict":"wait","wait_for_seconds":N} → park until the deadline passes
evaluate_after_turn acts on the directive (sets the barrier, parks the loop)
so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a
time-based waiting_until barrier alongside the pid barrier; both auto-clear and
can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live
registry in via gather_background_processes(). Manual /goal wait stays as an
override. Judge verdict contract widened to (verdict, reason, parse_failed,
wait_directive); legacy {"done":bool} shape still accepted.

* test(goals): update kanban _fake_judge to the 4-tuple judge contract

CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the
3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the
4-tuple (+ wait_directive). Update the fake to return None for the directive
and accept the background_processes kwarg.

* feat(goals): trigger-based wait — park on a process's own signal, not just exit

Addresses two gaps in the judge-driven wait: (1) the judge could only express
'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that
fires a trigger MID-RUN (and may never exit) couldn't be waited on; (2) the
process's own watch_patterns/notify_on_complete trigger was invisible to the judge.

Adds a session-based barrier (waiting_on_session) that releases on the process's
OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if
started with watch_patterns) its pattern matches — even while the process keeps
running. list_sessions() now surfaces session_id + watch_patterns/watch_hit/
notify_on_complete so the judge sees the trigger and is told to prefer
wait_on_session for trigger processes. Judge verdict gains a {wait_on_session}
directive (preferred over pid). Backward-compatible GoalState field; pid + time
barriers unchanged.

Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive,
release on exit, unknown-session, full park→trigger→resume, parse, validation,
backcompat load). 105 goal-surface + 85 process_registry tests green.
2026-06-22 06:27:29 -07:00
teknium1
a6ce9b2fbb fix(picker): keep flat-namespace reseller first-party models in desktop picker
OpenCode Go (and OpenCode Zen) showed only a subset of the models they
serve in the desktop/CLI model picker — e.g. opencode-go rendered 13 of
19, silently dropping minimax-m3/m2.7/m2.5, glm-5/5.1, deepseek-v4-flash.

Root cause: the picker dedup in build_models_payload strips any model
from an aggregator row that overlaps a user-defined provider's catalog
(so a local proxy isn't shadowed by OpenRouter). It gated on
is_aggregator(), which is True for opencode-go/zen because their flat
/v1/models returns bare IDs the model-switch resolver searches. But
those are flat-namespace RESELLERS, not routing aggregators — every
model they list is first-party, so deduping them against a user proxy
that happens to serve a same-named model guts their own catalog.

Fix: add is_routing_aggregator() (True only for true routers like
OpenRouter and custom:* proxies; False for opencode-go/zen) and gate the
picker dedup on it. is_aggregator() is unchanged so model-switch flat
catalog resolution keeps working. Both desktop entry points
(model.options JSON-RPC and /api/model/options REST) and hermes model
share build_models_payload, so all surfaces get the full list.

Fixes #47077
2026-06-22 06:09:08 -07:00
Teknium
ef6492b648
fix(gateway): cold-start installed Windows gateway after update when none was running (#50804)
The post-update gateway resume path (`_resume_windows_gateways_after_update`)
only relaunched gateways that were *running* when the update began — it
enumerates live PIDs in `_pause_windows_gateways_for_update` and respawns
exactly those. A gateway that had already died between updates (e.g. it was
launched attached to a terminal/TUI that later closed, taking the child with
it) was never brought back: the Startup-folder / Scheduled-Task autostart
entry only fires on the next login, not after an in-place update.

So a Desktop-GUI update (which runs `hermes update --yes --gateway`) on a box
whose gateway had quietly died would complete with no gateway running, and the
user had no indication anything should have come up.

Fix: when no gateway is running at pause time but an autostart entry is
installed (`gateway_windows.is_installed()` — an explicit "I want a gateway"
signal), return a `cold_start_if_installed` token. The resume step then does a
fresh detached spawn via `gateway_windows._spawn_detached()` — the same
windowless `pythonw` + `CREATE_BREAKAWAY_FROM_JOB` path `hermes gateway start`
uses. It re-checks liveness immediately before spawning so a concurrent start
(autostart entry firing) can't produce a duplicate.

Gateway-less users (no autostart entry) get nothing forced on them — the
pause step still returns None for them. POSIX is unaffected: enabled systemd
units already restart via `Restart=always`.

Windows-only; best-effort throughout (logs at debug and no-ops on any error).

Tests: pause returns the cold-start token only when installed, returns None
when not installed, resume cold-starts on the token, and resume skips the
cold-start when a gateway is already running.
2026-06-22 06:02:31 -07:00
teknium
e9cd8c5bf3 fix(delivery): drop env-var knob, flag all chunking adapters
Follow-up to ScotterMonk's cron-truncation fix:

- Remove HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var. Behavioral config
  belongs in config.yaml, not a new HERMES_* env var (.env is secrets
  only). The actual bug is fixed entirely by the adapter-aware skip; the
  configurable cap was unneeded scope. MAX_PLATFORM_OUTPUT is a constant
  again, collapsing the max_output=0 disable branch and the
  audit-vs-truncation threshold divergence.
- Flag the remaining verified-chunking adapters (slack, matrix, feishu,
  mattermost, teams, whatsapp, whatsapp_cloud, weixin, bluebubbles,
  yuanbao) with splits_long_messages=True so the fix covers the whole
  bug class, not just Discord/Telegram. Each verified to chunk in its
  own send() via truncate_message().
- SMS deliberately left False: it chunks for normal replies but a
  multi-segment cron blast is cost-bearing; the 4000-cap + file save is
  the safer default there.
- Update tests: drop the two env-override tests, add a test asserting a
  save failure during truncation (non-chunking) propagates.
2026-06-22 05:41:22 -07:00
ScotterMonk
86e4521cb1 fix(delivery): make cron output truncation configurable + adapter-aware
Gateway-level truncation (MAX_PLATFORM_OUTPUT=4000) was pre-empting
adapter-side message splitting. Discord and Telegram both chunk long
content natively in their send() via truncate_message(), but the
delivery router truncated to 3800 chars + footer before the adapter
ever saw the full payload — so long cron output was cut short instead
of being delivered as multiple messages (issue #50126).

Changes:
- HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var makes the cap configurable
  (default 4000, backward compatible). Set to 0 to disable truncation.
- TRUNCATED_VISIBLE (3800) removed — visible portion now derived
  dynamically from max_output minus the actual footer length.
- New BasePlatformAdapter.splits_long_messages capability flag (default
  False). Adapters that chunk in send() set True; delivery skips
  truncation for them but still saves full output to disk as audit.
- Flagged Discord and Telegram (both verified to chunk in send()).

Fixes #50126
2026-06-22 05:41:22 -07:00
Teknium
eecb5b9dd1
fix(update): don't count across shallow-clone boundary (bogus '12492 commits behind') (#50784)
* chore: re-trigger CI (workflows did not dispatch on prior head)

* fix(update): don't count across shallow-clone boundary (bogus '12492 commits behind')

Installer checkouts are shallow (git clone --depth 1). The CLI banner and
hermes update --check both did a plain git fetch (silently unshallowing the
repo) then git rev-list --count HEAD..origin/main, which counts across the
shallow boundary and prints a huge nonsense number like '12492 commits behind'.

Detect shallow up front, fetch with --depth 1 to preserve the boundary, and
compare tip SHAs instead of counting:
- banner _check_via_local_git: returns UPDATE_AVAILABLE_NO_COUNT when behind
  (renders as 'update available') instead of the bogus count.
- _cmd_update_check: reports presence-only on shallow clones.
Full clones keep the exact count path unchanged. Mirrors the desktop fix in
apps/desktop/electron/main.cjs (commit 2950c6fa2).
2026-06-22 05:39:11 -07:00
Kartik
2e779d11a0
feat(mem0): v3 API, OSS mode, update/delete tools, telemetry & review fixes (#15624)
* fix: update to version 3 endpoints and adding update and delete tool

* chore: removing the test md file

* fix: prevent circuit breaker on client errors in Mem0 provider

* chore: add telemetry for platform version

* feat: add OSS mode support to Mem0 memory provider

* chore: bump mem0ai dependency to >=2.0.1 in memory plugin

* refactor: enhance dependency checks and embedder config in mem0 backend

* refactor: adjust fact storage message for OSS mode

* refactor: expand user paths, add collection recreation on dimension change for Qdrant

* fix(mem0): make MEM0_USER_ID override gateway-native ids and tag writes with channel

When MEM0_USER_ID was configured (env or mem0.json), the gateway-native id
from kwargs (Telegram numeric id, Discord snowflake, ...) still won, so the
same human ended up under different user_ids per channel and memories never
merged across CLI / Telegram / Slack / Discord. Mirrors openclaw's cfg.userId
pattern: configured override wins, gateway-native id is the fallback.

The legacy "hermes-user" placeholder default written by the setup wizard is
treated as unset to avoid silently bucketing every gateway user together.

Also tag every write with metadata.channel (cli/telegram/discord/...) so the
dashboard can offer per-channel filtered views without coupling identity to
the channel; document the read/write filter asymmetry as intentional
(reads scope to user_id only for cross-agent recall).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: improve Mem0 memory provider backend, pagination, config, and error handling

* refactor: update mem0 telemetry code, docs, and bump version

* fix(mem0): make get_config_schema() return unified schema with mode-aware required flag

Schema always includes api_key field so picker shows "API key / local" for
both modes. In OSS mode api_key.required=False so status won't mislead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: improve mem0 telemetry, add env var key and OSS mode detection

* chore: bump mem0ai lower bound to 2.0.4 (latest SDK release)

* refactor: set telemetry sample rate to 1.0 and update docs for opt‑out

* fix(mem0): resolve 15 correctness, thread-safety, and resource bugs

Thread safety:
- Protect circuit breaker counters with _breaker_lock (race between
  prefetch/sync daemon threads and main thread)
- Wrap sync_turn thread creation in _sync_lock; skip if previous sync
  is still alive after 5 s join to prevent duplicate memory ingestion
- Guard _schedule_flush timer creation under _queue_lock (TOCTOU race)
- Capture local `backend` reference in prefetch/sync closures so
  shutdown() nulling self._backend cannot crash in-flight threads

Correctness:
- Fix bool("false")==True for rerank param; parse string values explicitly
- Guard page/top_k with max(1,...) and move int() inside try blocks
- Fix fact_count=0 always in OSS mode (Memory.add returns list, not dict)
- Fix prefetch() not clearing result when thread still alive after timeout
- Fix atexit.register accumulating on repeated initialize() calls

Backend / setup:
- Handle Qdrant named-vector collections in _recreate_collection_if_dims_changed
  (vectors is a dict; .size access raised AttributeError, swallowed silently)
- Wrap QdrantClient and psycopg2 conn/cursor in try/finally to prevent leaks
- Resolve ollama_bin at top of _ensure_ollama; use it for ollama pull
- Fix embedder key lookup when LLM provider has no env_var (e.g. ollama)

Also: remove _telemetry_enabled cache (env var check is cheap), bump
required mem0ai to >=2.0.7, minor README wording fix.

* fix(mem0): fix brittle qdrant path test + add telemetry sample-rate docs

- Replace generator-throw lambda with a proper def in
  test_qdrant_path_not_writable; use tmp_path instead of a hardcoded
  /nonexistent path so the test is root-safe
- Add MEM0_TELEMETRY_SAMPLE_RATE to memory-providers.md (was only
  in the plugin README, not the user-guide docs)

* revert: remove MEM0_TELEMETRY_SAMPLE_RATE from user-guide docs

* refactor: remove telemetry from mem0 plugin and update documentation

* fix(mem0): set stdin=DEVNULL on setup subprocess calls

The TUI stdin guard (scripts/check_subprocess_stdin.py) requires every
subprocess call in plugin code to set stdin= so it can't inherit the
gateway's JSON-RPC stdin fd. Muzzle the docker/ollama calls in the OSS
setup wizard with stdin=subprocess.DEVNULL (none need interactive input).
Also covers the docker-inspect call the linter's regex misses.

---------

Co-authored-by: chaithanyak42 <chaithanya.kumar42a@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-22 12:30:47 +00:00
Eugeniusz Gilewski
8845f3316c fix(security): restrict dashboard plugin backend import to bundled plugins (#43719)
Defense-in-depth for the dashboard plugin auto-import path. The web server
auto-imports and mounts the Python backend (dashboard/manifest.json -> api file)
of plugins found in ~/.hermes/plugins/ (user) and ./.hermes/plugins/ (project),
not just bundled plugins. So any plugin that reaches one of those dirs gets
arbitrary Python executed on the next dashboard start.

NOTE ON THREAT MODEL: #43719's originally-documented delivery chain (a public
--insecure dashboard + open API used to git clone a malicious repo into
~/.hermes/plugins/) is ALREADY mitigated on main — since the June 2026
hermes-0day hardening, a non-loopback bind ALWAYS requires an auth provider and
--insecure no longer bypasses the auth gate. This change is therefore NOT
closing that (now-authenticated) network path; it removes the residual
'arbitrary code executes merely because a plugin is on disk' hazard, which still
applies when a plugin arrives by other means: a socially-engineered git clone,
a supply-chain drop, an authenticated-but-malicious actor, or a future
regression in the auth gate. Untrusted on-disk code should not auto-execute.

Restrict dashboard backend Python auto-import to BUNDLED plugins only. User and
project plugins may still extend the dashboard UI via static JS/CSS, but their
api Python file is never auto-imported. Two layers: _discover_dashboard_plugins
scrubs api/_api_file for user/project sources (and bundled wins name conflicts
so a non-bundled plugin cannot shadow a trusted backend route);
_mount_plugin_api_routes re-refuses user/project at mount time. Tightens the
prior GHSA-5qr3-c538-wm9j / #29156 hardening (bundled+user) to bundled-only.

Salvaged from #44472 (@egilewski) onto current main.
2026-06-22 17:51:37 +05:30
kshitij
a904ff1724
Merge pull request #50781 from NousResearch/salvage/output-token-reservation-threshold
fix(compress): reserve output tokens in the compaction threshold (#23767, #43547)
2026-06-22 17:33:09 +05:30
kshitijk4poor
623b21bf24 fix(compress): reserve output tokens in the compaction threshold (#23767, #43547)
The compaction trigger compared estimated input against context_length *
threshold, but the provider reserves max_tokens of OUTPUT out of the same
window. With a large max_tokens (e.g. 65536 on a custom provider) the usable
input budget is materially smaller than the raw window, so sessions hit a
provider 400 before compaction ever fired.

_compute_threshold_tokens now subtracts the output reservation
(context_length - max_tokens) before applying the percentage and the
small-window 85% guard. max_tokens is stored on the compressor (threaded from
agent.max_tokens at construction) and reused across update_model() switches;
None = provider default = no reservation (full-window behavior, unchanged).

Reimplemented on the current _compute_threshold_tokens surface (the inline
threshold calc the original PR targeted was since refactored for the
small-window #14690 fix); composes with that 85% guard on the effective budget.

Credit: @kyssta-exe (#43651) — original design for the output-token
reservation in the compaction threshold.

Closes #43547.
2026-06-22 17:26:17 +05:30
Ben Barclay
75a70d98f3
feat(relay): forward a stable instance id at self-provision (Phase 6 Unit α) (#50772)
Add relay_instance_id() (env GATEWAY_RELAY_INSTANCE_ID first, then
gateway.relay_instance_id in config.yaml, mirroring the other relay readers) and
forward it in the /relay/provision body so the connector can bind
gatewayId -> instanceId and route inbound per-instance once Phase 6 delivery
lands.

The value is gateway-asserted but safely scoped: the org/tenant stays
NAS-token-verified at the connector, so a dishonest gateway can only bind its
OWN tenant's instance — same posture as relay_endpoint(). instanceId is only
added to the body when present, so omitting it lets the connector store null
(back-compat: self-hosted / pre-Phase-6 gateways simply have no binding yet).

For a managed (NAS-hosted) agent the id is NAS's AgentInstance.id, stamped into
the container env beside GATEWAY_RELAY_URL.

Tests: reader (env/config/absent), self_provision_relay forwards the id (set +
absent), and the real _post_provision body includes instanceId ONLY when set.

Refs: ~/nous/specs/gateway-gateway plan.md Phase 6 Unit α; decisions.md Q11.
2026-06-22 21:46:59 +10:00
kshitij
065946d84f
Merge pull request #50762 from NousResearch/salvage/defer-preflight-after-compaction
fix(agent): defer preflight compaction until real usage after a compaction (#23767, #36718)
2026-06-22 17:10:03 +05:30
kshitij
1f28b1a9b9
fix(gateway): redact credentials from approval prompts before sending to clients (#48456) (#50767)
Tirith redacts its own findings, but the approval-request callbacks built the
operator prompt from the RAW command string, so a credential-shaped value
Tirith flagged was sent verbatim to clients, undoing the redaction one layer up.

Two egress transports carried the leak; both are fixed via a shared
module-level seam _redact_approval_command() (redact_sensitive_text force=True):
  1. chat platforms — _approval_notify_sync (gateway/run.py): redact before
     both the button path (send_exec_approval) and the plain-text /approve
     fallback.
  2. SSE/API stream — _approval_notify (gateway/platforms/api_server.py):
     redact event['command'] before it is enqueued to API/desktop clients.
     (whole-bug-class: sibling call path on a separate transport.)

force=True so the prompt — a hard secret-egress boundary — honors redaction
even when security.redact_secrets is off. Clean commands pass through unchanged.

Tests bind the seam (synthetic credential-format fixtures, force-when-disabled) AND assert
BOTH callbacks ASSIGN the redacted result before the send/enqueue sink, via an
AST contract that rejects a discarded-result call. All mutation-checked.
2026-06-22 11:39:45 +00:00
kshitijk4poor
b2c84a1626 fix(agent): defer preflight compaction until real usage after a compaction (#23767, #36718)
After a compaction, the post-compression path parks last_prompt_tokens=-1 and
sets awaiting_real_usage_after_compression=True, but last_real_prompt_tokens
still holds the stale pre-compression value (above threshold). should_defer_
preflight_to_real_usage() hit the 'last_real_prompt_tokens >= threshold => False'
short-circuit and let preflight fire a SECOND compaction before the provider
reported real post-compaction usage. Add an early-return on the awaiting flag so
deferral holds for exactly one turn; update_from_response() clears it.

The flag-setting half (#36718) already landed on main via the in-place
compaction path (conversation_compression.py); this adds the missing
should_defer guard that consumes it.

Credit:
- @ashishpatel26 (#38133) — diagnosis + the should_defer early-return design
- @Tranquil-Flow (#36769) — same #36718 fix, identical guard placement

Closes #36718.
2026-06-22 16:33:18 +05:30
Basil Al Shukaili
72f75f8456 fix(compressor): count tool_call envelope in tail-budget token estimate (#28053)
The tail-protection budget walks estimated an assistant message's tokens from content + function.arguments only, dropping each tool_call's id, type and function.name (plus JSON structure). Assistant turns that fan out into parallel tool calls were undercounted by 2-15x (a 4-tool-call turn measures ~73 vs ~1,090 real tokens), so the protected tail overshot tail_token_budget and compression ran far below its intended ratio — context kept growing.

Consolidate the three duplicated budget walks (_prune_old_tool_results and the two passes in _find_tail_cut_by_tokens) into a single _estimate_msg_budget_tokens() helper that counts the full tool_call envelope via len(str(tc)), consistent with how _estimate_message_chars estimates message size elsewhere.

Tested on Windows: new tests/agent/test_compressor_tool_call_budget.py plus the existing compression suite (test_context_compressor, compressor_image_tokens, cross_session_guard, infinite_compaction_loop) — 209 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 16:26:56 +05:30
kshitij
0e87c0a41b
Merge pull request #50117 from NousResearch/salvage/f5-cron-mcp-per-job
fix(cron): layer enabled MCP servers onto per-job enabled_toolsets (#23997)
2026-06-22 16:03:54 +05:30
kshitij
aa83213c53
Merge pull request #50740 from NousResearch/salvage/preflight-token-progress
Some checks failed
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Tests / test (1) (push) Waiting to run
Tests / test (2) (push) Waiting to run
Tests / test (3) (push) Waiting to run
Tests / test (4) (push) Waiting to run
Tests / test (5) (push) Waiting to run
Tests / test (6) (push) Waiting to run
Tests / save-durations (push) Blocked by required conditions
Tests / e2e (push) Waiting to run
Typecheck / typecheck (apps/bootstrap-installer) (push) Waiting to run
Typecheck / typecheck (apps/desktop) (push) Waiting to run
Typecheck / typecheck (apps/shared) (push) Waiting to run
Typecheck / typecheck (ui-tui) (push) Waiting to run
Typecheck / typecheck (web) (push) Waiting to run
Typecheck / desktop-build (push) Waiting to run
Docker / shell lint / Lint Dockerfile (hadolint) (push) Has been cancelled
Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Has been cancelled
fix(agent): count tokens, not just rows, as preflight compression progress (#23767, #39548)
2026-06-22 15:58:58 +05:30
kshitij
21541ce6e9
Merge pull request #50108 from NousResearch/salvage/f4m1-anthropic-pool
fix(auth): consult credential_pool in resolve_anthropic_token (#26344)
2026-06-22 15:58:01 +05:30
sherman-yang
74a5905aea fix(cron): layer enabled MCP servers onto per-job enabled_toolsets
A cron job that sets `enabled_toolsets` to a list of *native* toolsets (e.g.
`["web", "terminal"]`) silently got ZERO MCP tools, while a job with no
per-job list got every globally-enabled MCP server. `_resolve_cron_enabled_
toolsets` returned the per-job list verbatim, bypassing the MCP-merge that the
platform-fallback branch performs via `_get_platform_tools`. So
`discover_mcp_tools()` registered the MCP tools into the registry, but
`get_tool_definitions(enabled_toolsets=...)` kept only the named native
toolsets — the agent then rejected every `mcp_*` call as "Unknown tool". (R2
of #23997.)

Fix: `_merge_mcp_into_per_job_toolsets` layers MCP membership onto a per-job
allowlist with the SAME semantics as `_get_platform_tools`:
  * `no_mcp` sentinel present -> no MCP servers (sentinel stripped)
  * one or more MCP server names already listed -> treat as an allowlist
  * otherwise -> union in every globally-enabled MCP server

To avoid duplicating the "which MCP servers are enabled" computation (it
already existed inline in `_get_platform_tools`), this extracts a shared
`enabled_mcp_server_names(config)` helper in `hermes_cli.tools_config` and has
BOTH the gateway/CLI platform resolver and the cron per-job resolver call it —
so every path agrees on MCP membership (extend, don't duplicate).

Note: the issue's *headline* — bare MCP server names rejected, registry never
includes them — was already fixed on main (commits c10fea8d2 + 04918345e,
both before the issue was filed). This PR closes the remaining cron-specific
gap (R2). The `server:*` / `mcp:server` alias-notation rejection (R1) and the
quiet-mode silent-drop (R3) are tracked separately.

Salvaged from #32788 by sherman-yang (credited below). Reworked to reuse the
shared `enabled_mcp_server_names` helper instead of re-implementing the MCP
membership set in cron/scheduler.py.

Fixes #23997

Co-authored-by: sherman-yang <58446328+sherman-yang@users.noreply.github.com>
2026-06-22 15:52:58 +05:30
kshitij
b9f302441f
Merge pull request #50112 from NousResearch/salvage/f5-cron-storage-root
fix(cron): anchor cron storage at the default root home (#32091)
2026-06-22 15:51:59 +05:30
kshitijk4poor
69de0360a1 fix(agent): align preflight token-progress floor to 5% (#23767, #39548)
Follow-up to the salvaged preflight token-progress fix: require a material
(>5%) token reduction to count as progress, matching the overflow-handler
retry path (conversation_loop.py, #39550), so a sub-5% wobble can't keep the
3-pass preflight loop spinning. Adds boundary + zero-token regression tests.
2026-06-22 15:51:52 +05:30
kshitij
f509d65336
Merge pull request #50109 from NousResearch/salvage/f5-disabled-bundle-core
fix(tools): preserve core tools when a platform bundle is disabled
2026-06-22 15:51:50 +05:30
kshitij
2649f7360c
Merge pull request #50062 from NousResearch/salvage/cron-missed-grace-runonce
fix(cron): run missed-grace jobs once instead of deferring forever
2026-06-22 15:50:54 +05:30
kshitijk4poor
3545d29422 refactor(auth): drop dead select() fallback in anthropic pool resolver
/simplify-code QUALITY finding: the `if callable(_available_entries): ... else:
pool.select()` ladder was dead for the real CredentialPool type (`_available_entries`
is always a bound method) AND the select() fallback violated the helper's read-only
contract — select() -> _select_unlocked() runs _available_entries(clear_expired=True,
refresh=True), which persists to auth.json and triggers a network refresh. Call
_available_entries(clear_expired=False, refresh=False) directly inside the existing
try/except instead.

Also drops the now-dead `select=` stubs from the 6 pool tests (they only existed to
satisfy the removed fallback branch). Behavior unchanged; 6 pool tests pass and the
read-only / null-token contract tests were mutation-checked (flipping the flags /
removing the None-guard fails the respective test).
2026-06-22 15:50:26 +05:30
JackJin
b08ee8ad04 fix(agent): count tokens, not just rows, as preflight compression progress
Rebased onto god-file Phase 1 refactor — preflight compression has moved
from agent/conversation_loop.py to agent/turn_context.py (no semantic
change in the refactor itself; the bug below was carried over verbatim).

The preflight compression loop in ``turn_context.py`` uses
``len(messages) >= _orig_len`` to decide whether a compression pass has
made progress. That conflates two different conditions: a true no-op
(transcript materially unchanged) and effective token compression that
summarises message contents but keeps the same number of rows. The
second case is misread as "Cannot compress further" — the session then
surfaces ``Context length exceeded`` and auto-resets even when the
post-compression estimate is far below the model context window.

Observed example from #39548: a Telegram session on GPT-5.5 with a 1M
context dropped from ~288k → ~183k tokens (a 36% reduction) while
preserving 220 messages. The loop treats that as exhaustion and the
gateway auto-resets the session.

Fix
---
Add ``_compression_made_progress(orig_len, new_len, orig_tokens, new_tokens)``
and call it after the post-pass ``estimate_request_tokens_rough`` (which
is moved up to run *before* the progress check instead of after it).
Either a row-count reduction OR a token-count reduction now counts as
progress; only when neither moves do we break out as "stuck".

Fixes #39548
2026-06-22 15:49:19 +05:30
kshitij
33efff0d8c
Merge pull request #50726 from NousResearch/salvage/compression-token-progress
fix(agent): count tokens, not just message rows, as compression progress (#23767, #39550)
2026-06-22 15:44:38 +05:30
Ben Barclay
64a507da44
feat(relay): handle passthrough_forward over the WS (Phase 5 §5.1, gateway half) (#50702)
The connector half (gateway-gateway) moves the passthrough plane's post-ACK
forward off the HTTP gatewayEndpoint onto the gateway's outbound /relay WS via
a new passthrough_forward frame. This is the gateway side: the relay adapter
now RECEIVES and handles that frame, so a hosted gateway (no public IP) can
process forwarded Class-2/3 traffic (Discord interactions, Twilio) over the
socket it already holds — closing the "passthrough inbound doesn't work for
hosted gateways" gap.

- ws_transport.py: decode the passthrough_forward frame; PassthroughForward
  dataclass + _passthrough_from_wire (base64 body -> exact bytes, byte parity
  with the connector's toPassthroughForward); set_passthrough_handler mirrors
  set_interrupt_inbound_handler.
- transport.py: PassthroughHandler type + set_passthrough_handler on the
  RelayTransport protocol.
- adapter.py: connect() wires the passthrough handler; _on_passthrough decodes
  the (already-sanitized, token-free) forward and, for a Discord interaction,
  converts it to a MessageEvent routed through the normal agent path
  (handle_message) — the reply egresses over the outbound / token-less
  follow_up path, so the gateway never holds the interaction credential. Never
  raises (a bad forward can't kill the read loop). Non-discord forwards (Twilio)
  are logged + dropped for now.
- docs/relay-connector-contract.md: document the passthrough_forward frame +
  PassthroughForward shape + §3.1.

The interaction -> MessageEvent CONVERSION semantics (slash-command vs button
UX, option rendering) are the open sub-design flagged in the spec; the TRANSPORT
+ receive mechanism (this) is settled per Ben's Gate-2 decision: "the relay
adapter handles receiving these events over the WS."

Tests (tests/gateway/relay/test_relay_passthrough.py): byte-preservation
round-trip (+ malformed-body tolerance), connect() wiring, application-command
and message-component interactions route through handle_message with correct
session source + scope capture, malformed/non-discord forwards dropped cleanly.
100 relay tests green. Pairs with the connector PR (gateway-gateway).
2026-06-22 20:10:57 +10:00
kshitijk4poor
ebd38e1280 test(agent): regression for token-only compression progress (#39550, #23767)
Adds test_413_retries_on_token_only_compression: same message count but
materially fewer tokens after compaction must count as progress and retry,
not abort. Fails on main without the salvaged fix, passes with it.
2026-06-22 15:26:29 +05:30
Teknium
5ff11a689b
feat(cli): /timestamps command + timestamps in /history (#50506)
display.timestamps already drove the [HH:MM] suffix on live submitted and
streamed message labels, but there was no runtime command to toggle it and
/history ignored the setting entirely. Add /timestamps [on|off|status]
(alias /ts) and render [HH:MM] in /history for turns that carry a stored
unix timestamp (resumed sessions). Live unsaved turns without a stored time
are never given a fabricated one. Uses the existing sanctioned non-wire
'timestamp' message key (stripped before the API call in chat_completions),
so message-alternation and prompt-cache invariants are untouched.
2026-06-21 22:44:25 -07:00
Shannon Sands
b9b4756ab4 fix dashboard chat session titles 2026-06-21 22:44:02 -07:00
Shannon Sands
5dae502b86 Address email pairing review feedback 2026-06-21 22:43:57 -07:00
Shannon Sands
2455e1801b Make email pairing opt-in 2026-06-21 22:43:57 -07:00
Teknium
74f0dd62e8
feat(cli): Ctrl+G submits the edited draft on save (TUI parity) (#50560)
Ctrl+G already opened $EDITOR with the current draft, but used
open_in_editor(validate_and_handle=False), which only loaded the saved text
back into the input area — the user still had to press Enter. The TUI's
Ctrl+G (openEditor) submits the draft on a clean exit. Since CLI submission
is driven by the custom Enter keybinding (not the buffer accept_handler),
validate_and_handle can't route through it; instead chain a done-callback on
the editor Task that calls the new _submit_editor_buffer(), which mirrors the
Enter handler's idle/queue/slash branches and drops an empty save.
2026-06-21 22:43:55 -07:00
Shannon Sands
4b09903de5 fix Nous auth refresh for idle agents 2026-06-21 22:43:48 -07:00
teknium1
b5bd66eac9 fix(telegram): observed/replied group docs of any type are cached too
Follow-up to the accept-any-file-type change. The observe-unmentioned and
replied-media paths relied on cache_media_bytes() returning None for
unsupported document types to emit an 'unsupported, not cached' note. Now
that any file type is always cached, those docs are cached and surfaced with
a path-pointing note — consistent with the main document path. The
remaining cached-is-None branch is image-validation-failure only; its note
is reworded accordingly. Updates the group-gating test to the new contract.
2026-06-21 22:43:45 -07:00
teknium1
4314d451ca fix(gateway): accept any inbound file type across all messaging platforms
Authorization to message the agent is the gate, not the file extension.
Previously the inbound-attachment allowlist (SUPPORTED_DOCUMENT_TYPES) was
opt-OUT on Discord (allow_any_attachment defaulted false) and had no bypass
at all on Telegram/Slack — so an .html (or any non-allowlisted type) was
dropped or hard-rejected before the agent saw it.

Now every authorized upload is cached and surfaced to the agent regardless
of type:
- base.cache_media_bytes(): unknown types cache as octet-stream (or the
  caller-supplied MIME) instead of returning None — fixes the chokepoint
  that Teams/Telegram-media route through.
- discord/telegram/slack adapters: removed the allowlist reject/skip; any
  non-media attachment is typed DOCUMENT and cached. Known types keep their
  precise MIME.
- Text inlining now gates on a shared _TEXT_INJECT_EXTENSIONS set (text +
  code + config + markup) instead of a blind UTF-8 decode, so binary formats
  (PDF/zip/docx) with ASCII headers are never inlined.
- gateway/run.py emits the path-pointing context note for every DOCUMENT,
  including non text/application MIME types.
- discord.allow_any_attachment is now a documented no-op kept for config
  back-compat.

Validation: 357 gateway tests pass; E2E confirms .html/.bin/custom types
cache, known types stay precise, PDFs are not inlined.
2026-06-21 22:43:45 -07:00
Ben Barclay
de6b3ae377
fix(terminal): bridge docker_extra_args to TERMINAL_DOCKER_EXTRA_ARGS in CLI + gateway (#50631)
terminal.docker_extra_args passes flags verbatim to `docker run` (e.g.
--gpus=all, --shm-size=16g). It was wired into DEFAULT_CONFIG,
TERMINAL_CONFIG_ENV_MAP (so `hermes config set` bridged it),
terminal_tool._get_env_config (reads TERMINAL_DOCKER_EXTRA_ARGS), and
DockerEnvironment (applies extra_args) -- but it was MISSING from cli.py's
env_mappings and gateway/run.py's _terminal_env_map.

Consequence: a user who hand-edits config.yaml (rather than running
`hermes config set`) has docker_extra_args silently dropped on the CLI and
gateway/desktop startup paths, while docker_image / docker_volumes (which
ARE in those maps) bridge correctly -- producing the reported 'Hermes
partially reads the Docker config' symptom where --gpus=all and
--shm-size=16g never reach docker run.

This is the same bridge-coverage bug class that shipped before for
docker_run_as_host_user (cli + gateway) and docker_mount_cwd_to_workspace
(gateway). Fix by adding the key to both maps, plus a dedicated regression
pin in test_terminal_config_env_sync.py mirroring the existing
test_docker_*_is_bridged_everywhere guards.
2026-06-22 15:41:23 +10:00
Ben Barclay
6202fdfc35
fix(container): detect dashboard role under s6-overlay v3 (#49196) (#50600)
* fix(gateway): walk /proc/*/cmdline to find main-wrapper.sh under s6-overlay v3 (#49196)

(cherry picked from commit 3a108c2df0)

* fix(container): peel s6-v3 rc.init prefix so dashboard role is detected

kyssta-exe's preceding commit (#49238) fixed _read_container_argv() to
locate the rc.init-launched main-wrapper.sh process under s6-overlay v3,
but the skip still never fired: _strip_container_argv_prefix() only peeled
a prefix when args[0] was init/main-wrapper.sh/hermes. Under s6 v3 the
matched argv is

    /bin/sh -e /run/s6/basedir/scripts/rc.init top
        /opt/hermes/docker/main-wrapper.sh dashboard ...

so args[0] stayed /bin/sh, _is_dashboard_container() returned False, and
the dashboard container reconciled + started its own gateway-default —
the exact dual Telegram getUpdates 409 in issue #49196.

Fix: strip everything up to and including the main-wrapper.sh token (the
stable boundary the image owns), covering both the v2 (/init ...) and v3
(/bin/sh ... rc.init top ...) shapes with one rule, instead of matching
launcher tokens positionally. This also repairs _is_legacy_gateway_run_request()
under v3, which shares the same strip helper (the issue called this out).

Tests: extend the dashboard true/false parametrize sets with the s6-v3
argv shape, and add test_main_skips_reconcile_in_dashboard_container_s6v3
exercising main() end-to-end with the v3 argv. Verified via mutation that
both new v3 assertions fail under the old positional strip and pass with
the fix.

---------

Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
2026-06-22 15:35:38 +10:00
Teknium
9e96e70995
feat(cli): /prompt — compose your next prompt in $EDITOR (#50509)
* feat(cli): /prompt — compose your next prompt in $EDITOR

Adds /prompt (alias /compose): opens $VISUAL/$EDITOR on a temp markdown
file so you can hand-edit a multi-line prompt, then sends the saved buffer
as the next agent turn. Text after the command pre-seeds the buffer; an
empty save cancels. Reuses the one-shot _pending_agent_seed the interactive
loop already consumes (same mechanism as /blueprint), so no changes to the
input event loop or message pipeline. CLI-only.

* feat(tui): /prompt slash command opens $EDITOR (parity with CLI)

The TUI already opens $EDITOR via Ctrl+G (openEditor), but had no /prompt
slash command like the classic CLI. Wire openEditor into the slash handler
context and register /prompt (alias /compose) to call it; inline text after
the command is dropped into the composer first so it carries into the editor,
matching the CLI's /prompt <text>.
2026-06-21 20:21:33 -07:00
Teknium
95d53c3bcb
feat(cli): /reasoning full — show complete thinking, not 10-line clamp (#50499)
* feat(cli): /reasoning full to show complete thinking, not 10-line clamp

The post-response Reasoning recap box hard-clamped long thinking to the
first 10 lines, so there was no way to see the full reasoning trace after
a turn (live streaming already shows it in full). Add display.reasoning_full
(default off) plus /reasoning full|clamp to toggle it at runtime; the clamp
truncation note now points at the command. Addresses repeated user requests
to show all thinking tokens.

* test(gateway): de-snapshot /reasoning help assertion

The test froze the exact args-hint literal '/reasoning [level|show|hide]',
which the new full/clamp args change to '[level|show|hide|full|clamp]'.
Convert to an invariant: assert /reasoning is in help and carries its core
args, not the exact hint string.

* feat(tui): /reasoning full|clamp parity in tui_gateway

The classic-CLI reasoning_full toggle had no TUI equivalent — typing
/reasoning full in the TUI fell through to parse_reasoning_effort and
errored. The TUI renders thinking as an expand/collapse section (no fixed
10-line recap), so map full -> sections.thinking=expanded (raw, uncapped
via thinkingPreview mode='full') and clamp -> collapsed, persisting
display.reasoning_full for cross-surface config consistency.
2026-06-21 20:21:11 -07:00