The Slack docs document `slack.mention_patterns` as custom wake words that
trigger the bot alongside `@mention`, and the config layer bridges the key into
the Slack adapter's `config.extra` — but the adapter never read it. With
`require_mention` on, a channel message containing a configured wake word (and
no literal `<@BOTUID>`) was silently ignored. Every other adapter that
documents `mention_patterns` (Telegram, DingTalk, Mattermost, WhatsApp,
BlueBubbles, Photon) implements it; Slack was the odd one out.
Add `_slack_mention_patterns()` (compiled, cached; reads `slack.mention_patterns`
as a list/string or `SLACK_MENTION_PATTERNS` as a JSON/CSV/newline list, invalid
regexes warned and skipped) and `_slack_message_matches_mention_patterns()`,
mirroring the existing adapters. Channel mention detection now also triggers on
a wake-word match, so the documented field works as described.
Adds tests for pattern compilation (list/string/env/invalid-regex) and for the
channel-trigger gating with a wake word under require_mention.
* chore: re-trigger CI (workflows did not dispatch on prior head)
* fix(delegation): emit high-concurrency cost warning once per process
_get_max_concurrent_children() runs on every get_definitions() schema
rebuild (via _build_top_level_description / _build_tasks_param_description),
not just on actual delegate_task calls. With max_concurrent_children>10 the
cost advisory fired on every turn / agent spawn across every session, spamming
the log even when delegate_task was never used. Gate it behind a module-level
_HIGH_CONCURRENCY_WARNED flag so it warns at most once per process.
The success/staged gating and op-expansion for mirroring built-in memory
writes to external providers lived in a standalone agent/memory_write_bridge.py
helper called inline from two core call sites (tool_executor.py,
agent_runtime_helpers.py). That left the mirror decision-making in the agent
loop, outside the memory-provider interface.
Fold it into a new MemoryManager.notify_memory_tool_write() entry point: the
loop now hands over the raw tool result + args and a metadata callback, and the
manager decides whether/what to mirror. Both core call sites collapse to a
single call; the orphan module is removed. No MemoryProvider ABC change.
Tests rewritten as behavior tests against the manager method.
Mirror built-in memory writes to external providers only after the native memory tool succeeds and is not staged for approval. Keep OpenViking's built-in memory mirroring add-only, since Hermes native memory entries do not yet have stable OpenViking file URIs for replace/remove.
Add a narrow viking_forget tool for exact user memory file deletion and document the current OpenViking write/delete behavior.
The install pre-flight asset probe queried trycua/cua's `releases/latest`,
which floats across the monorepo's components (agent-*, computer-*, lume-*,
train-*) — most ship zero binary assets. So the probe false-negatived and
hard-blocked `install_cua_driver` (line 770: `if not probe: return False`)
BEFORE the upstream installer ran, on Linux, Windows, and Intel macOS — even
though the installer it gates resolves the right tag and would have succeeded.
Net effect: the normal enable path (`hermes tools` → Computer Use post-setup,
and `hermes computer-use install`) refused to install on every platform this
PR claims to support.
Fix: list `/releases?per_page=100`, pick the newest `cua-driver-rs-v*` tag,
and match its assets on OS-token + arch — mirroring what the upstream
`install.sh` already does. Fail open if no driver release surfaces (installer
remains the source of truth). Adds an OS-token gate so a darwin asset can't
satisfy a Linux probe.
Tests: updated the install-probe fixtures to the list-of-releases shape with
`cua-driver-rs-v*` tags + OS-token asset names; added a regression guard
(`test_releases_latest_tag_ignored_picks_driver_rs_tag`) for the monorepo
floating-latest case. 25/25 install + 192 computer_use tests green.
Verified live: probe returns True for all six platform/arch combos against
the real GitHub releases API.
The runtime gate (check_computer_use_requirements) and the hermes tools
platform_gate both enable linux alongside darwin/win32, but several
docstrings/comments still described Linux as "alpha, gated off until it
flips upstream" — contradicting the code that ships it. Bring the prose in
line with the gate that's actually live:
- tool.py / cua_backend.py module docstrings: Linux is enabled (X11 today,
Wayland via XWayland), not gated off.
- toolsets.py description and hermes tools display name: (macOS/Windows) ->
(macOS/Windows/Linux).
No behavior change — the gate already allowed all three platforms.
Make the computer_use toolset platform-agnostic by driving cua-driver on
macOS, Windows, and Linux. Consumes the 8 cua-driver decoupling surfaces
(capability discovery, structuredContent AX tree, opaque element_token,
click button enum, explicit mimeType, machine-readable manifest,
structured list_windows, structured health_report), each degrading
gracefully on older drivers.
Adds `hermes computer-use doctor` (drives cua-driver health_report with a
per-OS check matrix and an exit 0/1/2 ok/degraded/blocked contract), full
typed wrappers for the previously-uncovered cua-driver tools plus a generic
call_tool escape hatch, per-session agent-cursor lifecycle, platform-aware
system-prompt guidance (host-deterministic, cache-safe), and honors
HERMES_CUA_DRIVER_CMD end-to-end.
Replaces the macOS-only skills/apple/macos-computer-use skill with a
cross-platform skills/computer-use skill, and refreshes the EN + zh-Hans
docs.
Supersedes #44221 (Windows-enablement salvage of #30660).
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Windows toast notifications silently no-op unless the app sets an
AppUserModelID — new Notification().show() returns without error and
nothing appears. The desktop's native-notification system (approval,
turn-done, input, etc.) was therefore dead on Windows while working on
macOS/Linux.
Set the AUMID to the build appId (com.nousresearch.hermes) on Windows
right after app.setName, so toasts route to the installed Start Menu
shortcut. No-op on macOS/Linux, which don't require it.
* feat(goals): add /goal wait <pid> barrier to park the loop on a background process
The /goal loop re-pokes the agent every turn via the post-turn judge. When a
goal is gated on a long-running background process (CI poller, build, test
matrix, deploy) that produces nothing to judge yet, this spins the agent into
'is it done?' busy-work and burns the turn budget.
/goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is
skipped, no turn is consumed, no continuation fires, and /goal status shows a
parked indicator. The barrier auto-clears the moment the process exits (the
agent's notify_on_complete watcher is the natural wake signal), then the next
turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear
drop it; a dead/stale PID can never wedge the loop.
Wired across CLI, gateway, and the mid-run command guard for parity. Barrier
persists in SessionDB.state_meta (survives /resume); GoalState gains
backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new
tests; docs updated.
* fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0)
The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive — on
Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the
target's console process group (bpo-14484). Delegate to the canonical
footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback)
instead, with a direct-psutil last resort.
* feat(goals): judge-driven auto-wait — the loop parks itself, no manual /goal wait
Makes the wait barrier automatic. Every turn the judge is shown the agent's
live background processes (pid, command, uptime, output tail from the
process_registry) alongside the goal + response, and can return a new 'wait'
verdict instead of continue:
{"verdict":"wait","wait_on_pid":N} → park until that process exits
{"verdict":"wait","wait_for_seconds":N} → park until the deadline passes
evaluate_after_turn acts on the directive (sets the barrier, parks the loop)
so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a
time-based waiting_until barrier alongside the pid barrier; both auto-clear and
can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live
registry in via gather_background_processes(). Manual /goal wait stays as an
override. Judge verdict contract widened to (verdict, reason, parse_failed,
wait_directive); legacy {"done":bool} shape still accepted.
* test(goals): update kanban _fake_judge to the 4-tuple judge contract
CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the
3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the
4-tuple (+ wait_directive). Update the fake to return None for the directive
and accept the background_processes kwarg.
* feat(goals): trigger-based wait — park on a process's own signal, not just exit
Addresses two gaps in the judge-driven wait: (1) the judge could only express
'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that
fires a trigger MID-RUN (and may never exit) couldn't be waited on; (2) the
process's own watch_patterns/notify_on_complete trigger was invisible to the judge.
Adds a session-based barrier (waiting_on_session) that releases on the process's
OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if
started with watch_patterns) its pattern matches — even while the process keeps
running. list_sessions() now surfaces session_id + watch_patterns/watch_hit/
notify_on_complete so the judge sees the trigger and is told to prefer
wait_on_session for trigger processes. Judge verdict gains a {wait_on_session}
directive (preferred over pid). Backward-compatible GoalState field; pid + time
barriers unchanged.
Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive,
release on exit, unknown-session, full park→trigger→resume, parse, validation,
backcompat load). 105 goal-surface + 85 process_registry tests green.
The composer model picker capped each provider's search matches at 12
(PER_PROVIDER_SEARCH). A provider serving more than 12 models (e.g.
opencode-go with 19) showed only a truncated subset when the user typed
its name to find it — exactly the models they were searching for got
cut. Edit Models showed the full list because it never applied this cap.
A search is already a narrowing action, so capping a single provider's
own matches is wrong. Remove the slice; search now lists every matching
model for the provider. The no-search default still shows the curated
top-N per provider via the visibility set.
Follow-up to #47077 (the backend dedup fix); this closes the remaining
frontend truncation users saw in the composer.
OpenCode Go (and OpenCode Zen) showed only a subset of the models they
serve in the desktop/CLI model picker — e.g. opencode-go rendered 13 of
19, silently dropping minimax-m3/m2.7/m2.5, glm-5/5.1, deepseek-v4-flash.
Root cause: the picker dedup in build_models_payload strips any model
from an aggregator row that overlaps a user-defined provider's catalog
(so a local proxy isn't shadowed by OpenRouter). It gated on
is_aggregator(), which is True for opencode-go/zen because their flat
/v1/models returns bare IDs the model-switch resolver searches. But
those are flat-namespace RESELLERS, not routing aggregators — every
model they list is first-party, so deduping them against a user proxy
that happens to serve a same-named model guts their own catalog.
Fix: add is_routing_aggregator() (True only for true routers like
OpenRouter and custom:* proxies; False for opencode-go/zen) and gate the
picker dedup on it. is_aggregator() is unchanged so model-switch flat
catalog resolution keeps working. Both desktop entry points
(model.options JSON-RPC and /api/model/options REST) and hermes model
share build_models_payload, so all surfaces get the full list.
Fixes#47077
The post-update gateway resume path (`_resume_windows_gateways_after_update`)
only relaunched gateways that were *running* when the update began — it
enumerates live PIDs in `_pause_windows_gateways_for_update` and respawns
exactly those. A gateway that had already died between updates (e.g. it was
launched attached to a terminal/TUI that later closed, taking the child with
it) was never brought back: the Startup-folder / Scheduled-Task autostart
entry only fires on the next login, not after an in-place update.
So a Desktop-GUI update (which runs `hermes update --yes --gateway`) on a box
whose gateway had quietly died would complete with no gateway running, and the
user had no indication anything should have come up.
Fix: when no gateway is running at pause time but an autostart entry is
installed (`gateway_windows.is_installed()` — an explicit "I want a gateway"
signal), return a `cold_start_if_installed` token. The resume step then does a
fresh detached spawn via `gateway_windows._spawn_detached()` — the same
windowless `pythonw` + `CREATE_BREAKAWAY_FROM_JOB` path `hermes gateway start`
uses. It re-checks liveness immediately before spawning so a concurrent start
(autostart entry firing) can't produce a duplicate.
Gateway-less users (no autostart entry) get nothing forced on them — the
pause step still returns None for them. POSIX is unaffected: enabled systemd
units already restart via `Restart=always`.
Windows-only; best-effort throughout (logs at debug and no-ops on any error).
Tests: pause returns the cold-start token only when installed, returns None
when not installed, resume cold-starts on the token, and resume skips the
cold-start when a gateway is already running.
Follow-up to ScotterMonk's cron-truncation fix:
- Remove HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var. Behavioral config
belongs in config.yaml, not a new HERMES_* env var (.env is secrets
only). The actual bug is fixed entirely by the adapter-aware skip; the
configurable cap was unneeded scope. MAX_PLATFORM_OUTPUT is a constant
again, collapsing the max_output=0 disable branch and the
audit-vs-truncation threshold divergence.
- Flag the remaining verified-chunking adapters (slack, matrix, feishu,
mattermost, teams, whatsapp, whatsapp_cloud, weixin, bluebubbles,
yuanbao) with splits_long_messages=True so the fix covers the whole
bug class, not just Discord/Telegram. Each verified to chunk in its
own send() via truncate_message().
- SMS deliberately left False: it chunks for normal replies but a
multi-segment cron blast is cost-bearing; the 4000-cap + file save is
the safer default there.
- Update tests: drop the two env-override tests, add a test asserting a
save failure during truncation (non-chunking) propagates.
Gateway-level truncation (MAX_PLATFORM_OUTPUT=4000) was pre-empting
adapter-side message splitting. Discord and Telegram both chunk long
content natively in their send() via truncate_message(), but the
delivery router truncated to 3800 chars + footer before the adapter
ever saw the full payload — so long cron output was cut short instead
of being delivered as multiple messages (issue #50126).
Changes:
- HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var makes the cap configurable
(default 4000, backward compatible). Set to 0 to disable truncation.
- TRUNCATED_VISIBLE (3800) removed — visible portion now derived
dynamically from max_output minus the actual footer length.
- New BasePlatformAdapter.splits_long_messages capability flag (default
False). Adapters that chunk in send() set True; delivery skips
truncation for them but still saves full output to disk as audit.
- Flagged Discord and Telegram (both verified to chunk in send()).
Fixes#50126
* chore: re-trigger CI (workflows did not dispatch on prior head)
* fix(update): don't count across shallow-clone boundary (bogus '12492 commits behind')
Installer checkouts are shallow (git clone --depth 1). The CLI banner and
hermes update --check both did a plain git fetch (silently unshallowing the
repo) then git rev-list --count HEAD..origin/main, which counts across the
shallow boundary and prints a huge nonsense number like '12492 commits behind'.
Detect shallow up front, fetch with --depth 1 to preserve the boundary, and
compare tip SHAs instead of counting:
- banner _check_via_local_git: returns UPDATE_AVAILABLE_NO_COUNT when behind
(renders as 'update available') instead of the bogus count.
- _cmd_update_check: reports presence-only on shallow clones.
Full clones keep the exact count path unchanged. Mirrors the desktop fix in
apps/desktop/electron/main.cjs (commit 2950c6fa2).
* fix: update to version 3 endpoints and adding update and delete tool
* chore: removing the test md file
* fix: prevent circuit breaker on client errors in Mem0 provider
* chore: add telemetry for platform version
* feat: add OSS mode support to Mem0 memory provider
* chore: bump mem0ai dependency to >=2.0.1 in memory plugin
* refactor: enhance dependency checks and embedder config in mem0 backend
* refactor: adjust fact storage message for OSS mode
* refactor: expand user paths, add collection recreation on dimension change for Qdrant
* fix(mem0): make MEM0_USER_ID override gateway-native ids and tag writes with channel
When MEM0_USER_ID was configured (env or mem0.json), the gateway-native id
from kwargs (Telegram numeric id, Discord snowflake, ...) still won, so the
same human ended up under different user_ids per channel and memories never
merged across CLI / Telegram / Slack / Discord. Mirrors openclaw's cfg.userId
pattern: configured override wins, gateway-native id is the fallback.
The legacy "hermes-user" placeholder default written by the setup wizard is
treated as unset to avoid silently bucketing every gateway user together.
Also tag every write with metadata.channel (cli/telegram/discord/...) so the
dashboard can offer per-channel filtered views without coupling identity to
the channel; document the read/write filter asymmetry as intentional
(reads scope to user_id only for cross-agent recall).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: improve Mem0 memory provider backend, pagination, config, and error handling
* refactor: update mem0 telemetry code, docs, and bump version
* fix(mem0): make get_config_schema() return unified schema with mode-aware required flag
Schema always includes api_key field so picker shows "API key / local" for
both modes. In OSS mode api_key.required=False so status won't mislead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: improve mem0 telemetry, add env var key and OSS mode detection
* chore: bump mem0ai lower bound to 2.0.4 (latest SDK release)
* refactor: set telemetry sample rate to 1.0 and update docs for opt‑out
* fix(mem0): resolve 15 correctness, thread-safety, and resource bugs
Thread safety:
- Protect circuit breaker counters with _breaker_lock (race between
prefetch/sync daemon threads and main thread)
- Wrap sync_turn thread creation in _sync_lock; skip if previous sync
is still alive after 5 s join to prevent duplicate memory ingestion
- Guard _schedule_flush timer creation under _queue_lock (TOCTOU race)
- Capture local `backend` reference in prefetch/sync closures so
shutdown() nulling self._backend cannot crash in-flight threads
Correctness:
- Fix bool("false")==True for rerank param; parse string values explicitly
- Guard page/top_k with max(1,...) and move int() inside try blocks
- Fix fact_count=0 always in OSS mode (Memory.add returns list, not dict)
- Fix prefetch() not clearing result when thread still alive after timeout
- Fix atexit.register accumulating on repeated initialize() calls
Backend / setup:
- Handle Qdrant named-vector collections in _recreate_collection_if_dims_changed
(vectors is a dict; .size access raised AttributeError, swallowed silently)
- Wrap QdrantClient and psycopg2 conn/cursor in try/finally to prevent leaks
- Resolve ollama_bin at top of _ensure_ollama; use it for ollama pull
- Fix embedder key lookup when LLM provider has no env_var (e.g. ollama)
Also: remove _telemetry_enabled cache (env var check is cheap), bump
required mem0ai to >=2.0.7, minor README wording fix.
* fix(mem0): fix brittle qdrant path test + add telemetry sample-rate docs
- Replace generator-throw lambda with a proper def in
test_qdrant_path_not_writable; use tmp_path instead of a hardcoded
/nonexistent path so the test is root-safe
- Add MEM0_TELEMETRY_SAMPLE_RATE to memory-providers.md (was only
in the plugin README, not the user-guide docs)
* revert: remove MEM0_TELEMETRY_SAMPLE_RATE from user-guide docs
* refactor: remove telemetry from mem0 plugin and update documentation
* fix(mem0): set stdin=DEVNULL on setup subprocess calls
The TUI stdin guard (scripts/check_subprocess_stdin.py) requires every
subprocess call in plugin code to set stdin= so it can't inherit the
gateway's JSON-RPC stdin fd. Muzzle the docker/ollama calls in the OSS
setup wizard with stdin=subprocess.DEVNULL (none need interactive input).
Also covers the docker-inspect call the linter's regex misses.
---------
Co-authored-by: chaithanyak42 <chaithanya.kumar42a@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The compaction trigger compared estimated input against context_length *
threshold, but the provider reserves max_tokens of OUTPUT out of the same
window. With a large max_tokens (e.g. 65536 on a custom provider) the usable
input budget is materially smaller than the raw window, so sessions hit a
provider 400 before compaction ever fired.
_compute_threshold_tokens now subtracts the output reservation
(context_length - max_tokens) before applying the percentage and the
small-window 85% guard. max_tokens is stored on the compressor (threaded from
agent.max_tokens at construction) and reused across update_model() switches;
None = provider default = no reservation (full-window behavior, unchanged).
Reimplemented on the current _compute_threshold_tokens surface (the inline
threshold calc the original PR targeted was since refactored for the
small-window #14690 fix); composes with that 85% guard on the effective budget.
Credit: @kyssta-exe (#43651) — original design for the output-token
reservation in the compaction threshold.
Closes#43547.
Add relay_instance_id() (env GATEWAY_RELAY_INSTANCE_ID first, then
gateway.relay_instance_id in config.yaml, mirroring the other relay readers) and
forward it in the /relay/provision body so the connector can bind
gatewayId -> instanceId and route inbound per-instance once Phase 6 delivery
lands.
The value is gateway-asserted but safely scoped: the org/tenant stays
NAS-token-verified at the connector, so a dishonest gateway can only bind its
OWN tenant's instance — same posture as relay_endpoint(). instanceId is only
added to the body when present, so omitting it lets the connector store null
(back-compat: self-hosted / pre-Phase-6 gateways simply have no binding yet).
For a managed (NAS-hosted) agent the id is NAS's AgentInstance.id, stamped into
the container env beside GATEWAY_RELAY_URL.
Tests: reader (env/config/absent), self_provision_relay forwards the id (set +
absent), and the real _post_provision body includes instanceId ONLY when set.
Refs: ~/nous/specs/gateway-gateway plan.md Phase 6 Unit α; decisions.md Q11.
Tirith redacts its own findings, but the approval-request callbacks built the
operator prompt from the RAW command string, so a credential-shaped value
Tirith flagged was sent verbatim to clients, undoing the redaction one layer up.
Two egress transports carried the leak; both are fixed via a shared
module-level seam _redact_approval_command() (redact_sensitive_text force=True):
1. chat platforms — _approval_notify_sync (gateway/run.py): redact before
both the button path (send_exec_approval) and the plain-text /approve
fallback.
2. SSE/API stream — _approval_notify (gateway/platforms/api_server.py):
redact event['command'] before it is enqueued to API/desktop clients.
(whole-bug-class: sibling call path on a separate transport.)
force=True so the prompt — a hard secret-egress boundary — honors redaction
even when security.redact_secrets is off. Clean commands pass through unchanged.
Tests bind the seam (synthetic credential-format fixtures, force-when-disabled) AND assert
BOTH callbacks ASSIGN the redacted result before the send/enqueue sink, via an
AST contract that rejects a discarded-result call. All mutation-checked.
After a compaction, the post-compression path parks last_prompt_tokens=-1 and
sets awaiting_real_usage_after_compression=True, but last_real_prompt_tokens
still holds the stale pre-compression value (above threshold). should_defer_
preflight_to_real_usage() hit the 'last_real_prompt_tokens >= threshold => False'
short-circuit and let preflight fire a SECOND compaction before the provider
reported real post-compaction usage. Add an early-return on the awaiting flag so
deferral holds for exactly one turn; update_from_response() clears it.
The flag-setting half (#36718) already landed on main via the in-place
compaction path (conversation_compression.py); this adds the missing
should_defer guard that consumes it.
Credit:
- @ashishpatel26 (#38133) — diagnosis + the should_defer early-return design
- @Tranquil-Flow (#36769) — same #36718 fix, identical guard placement
Closes#36718.
The tail-protection budget walks estimated an assistant message's tokens from content + function.arguments only, dropping each tool_call's id, type and function.name (plus JSON structure). Assistant turns that fan out into parallel tool calls were undercounted by 2-15x (a 4-tool-call turn measures ~73 vs ~1,090 real tokens), so the protected tail overshot tail_token_budget and compression ran far below its intended ratio — context kept growing.
Consolidate the three duplicated budget walks (_prune_old_tool_results and the two passes in _find_tail_cut_by_tokens) into a single _estimate_msg_budget_tokens() helper that counts the full tool_call envelope via len(str(tc)), consistent with how _estimate_message_chars estimates message size elsewhere.
Tested on Windows: new tests/agent/test_compressor_tool_call_budget.py plus the existing compression suite (test_context_compressor, compressor_image_tokens, cross_session_guard, infinite_compaction_loop) — 209 passed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A cron job that sets `enabled_toolsets` to a list of *native* toolsets (e.g.
`["web", "terminal"]`) silently got ZERO MCP tools, while a job with no
per-job list got every globally-enabled MCP server. `_resolve_cron_enabled_
toolsets` returned the per-job list verbatim, bypassing the MCP-merge that the
platform-fallback branch performs via `_get_platform_tools`. So
`discover_mcp_tools()` registered the MCP tools into the registry, but
`get_tool_definitions(enabled_toolsets=...)` kept only the named native
toolsets — the agent then rejected every `mcp_*` call as "Unknown tool". (R2
of #23997.)
Fix: `_merge_mcp_into_per_job_toolsets` layers MCP membership onto a per-job
allowlist with the SAME semantics as `_get_platform_tools`:
* `no_mcp` sentinel present -> no MCP servers (sentinel stripped)
* one or more MCP server names already listed -> treat as an allowlist
* otherwise -> union in every globally-enabled MCP server
To avoid duplicating the "which MCP servers are enabled" computation (it
already existed inline in `_get_platform_tools`), this extracts a shared
`enabled_mcp_server_names(config)` helper in `hermes_cli.tools_config` and has
BOTH the gateway/CLI platform resolver and the cron per-job resolver call it —
so every path agrees on MCP membership (extend, don't duplicate).
Note: the issue's *headline* — bare MCP server names rejected, registry never
includes them — was already fixed on main (commits c10fea8d2 + 04918345e,
both before the issue was filed). This PR closes the remaining cron-specific
gap (R2). The `server:*` / `mcp:server` alias-notation rejection (R1) and the
quiet-mode silent-drop (R3) are tracked separately.
Salvaged from #32788 by sherman-yang (credited below). Reworked to reuse the
shared `enabled_mcp_server_names` helper instead of re-implementing the MCP
membership set in cron/scheduler.py.
Fixes#23997
Co-authored-by: sherman-yang <58446328+sherman-yang@users.noreply.github.com>
* feat(desktop): add Update now button to About panel
The About > Updates panel only surfaced "See what's new" when an update
was available, which just opens the changelog overlay — there was no way
to start the install directly from About. Add an "Update now" primary
button that opens the updates overlay (for apply progress) and kicks off
the install for the active target (backend in remote mode, else client).
* feat(desktop): PR-style file diffs in chat
Render write_file/edit_file/patch as a reviewable diff instead of raw
result JSON, closer to a Cursor/T3 per-edit review.
- Unified diff via FileDiffPanel: strip git file-header + @@ hunk noise,
drop the +/- gutter, color by line with a 2px gutter accent, full-bleed
to the card, transparent context lines, compact scroll height.
- Header shows filename + language icon + +N/-N stats; full path moves to
a hover tooltip (no Edited verb, no ms).
- Treat the three file-edit tools uniformly (isFileEditTool); read diff
from inline_diff or patch's diff field; suppress raw-arg detail.
- Reusable FileTypeIcon primitive sharing the code-block icon mapping
(codiconForFilename), codicon fallback.
- Per-row scaffolding fade (not the group wrapper, which trapped child
opacity); expanded edits stay full, collapsed fade; keyboard-only focus
lift. Hide diff-less rehydrated creates that read as dupes.
* style(desktop): lead --dt-font-mono with bundled JetBrains Mono
Code/diff blocks preferred a system Cascadia Code before the bundled
JetBrains Mono, so they drifted from the terminal (which leads with
JetBrains Mono) on machines where Cascadia is installed. Reorder so every
mono surface uses the face we actually ship.
* feat(desktop): syntax-highlight inline diffs via Shiki
Unify the diff renderer onto the same Shiki path as code blocks: highlight
the marker-stripped change content in the file's language, then a per-line
transformer layers the add/remove tint + gutter accent on top. Falls back
to the plain color-only renderer when the language is unknown, over budget,
or while Shiki loads.
- shikiLanguageForFilename(): extension → bundled-language id (shared
filename-token helper with codiconForFilename).
- code display:grid so full-width line tints don't double with newline
nodes; theme surface stripped so context lines stay transparent.
* style(desktop): use github-dark-dimmed for inline diffs
The vivid github-dark-default tokens read harsh behind the add/remove
tint in dark mode; switch the diff's dark theme to GitHub's lower-contrast
dimmed palette. Light mode and code blocks are unchanged.
* style(desktop): dim code-block syntax theme + share with diffs
Apply github-dark-dimmed to code blocks too (not just inline diffs) and
export one shared SHIKI_THEME so the two highlighters can't drift. Lower
contrast reads easier at our small code size in dark mode.
* style(desktop): soften shiki token contrast in dark mode
github-dark-dimmed only dims the background, which the diff/code surfaces
strip — so the bright token foregrounds were unchanged. Pull saturation +
brightness back a touch (hues preserved) on .shiki in dark mode for both
code blocks and inline diffs.
Follow-up to the salvaged preflight token-progress fix: require a material
(>5%) token reduction to count as progress, matching the overflow-handler
retry path (conversation_loop.py, #39550), so a sub-5% wobble can't keep the
3-pass preflight loop spinning. Adds boundary + zero-token regression tests.
/simplify-code QUALITY finding: the `if callable(_available_entries): ... else:
pool.select()` ladder was dead for the real CredentialPool type (`_available_entries`
is always a bound method) AND the select() fallback violated the helper's read-only
contract — select() -> _select_unlocked() runs _available_entries(clear_expired=True,
refresh=True), which persists to auth.json and triggers a network refresh. Call
_available_entries(clear_expired=False, refresh=False) directly inside the existing
try/except instead.
Also drops the now-dead `select=` stubs from the 6 pool tests (they only existed to
satisfy the removed fallback branch). Behavior unchanged; 6 pool tests pass and the
read-only / null-token contract tests were mutation-checked (flipping the flags /
removing the None-guard fails the respective test).
Rebased onto god-file Phase 1 refactor — preflight compression has moved
from agent/conversation_loop.py to agent/turn_context.py (no semantic
change in the refactor itself; the bug below was carried over verbatim).
The preflight compression loop in ``turn_context.py`` uses
``len(messages) >= _orig_len`` to decide whether a compression pass has
made progress. That conflates two different conditions: a true no-op
(transcript materially unchanged) and effective token compression that
summarises message contents but keeps the same number of rows. The
second case is misread as "Cannot compress further" — the session then
surfaces ``Context length exceeded`` and auto-resets even when the
post-compression estimate is far below the model context window.
Observed example from #39548: a Telegram session on GPT-5.5 with a 1M
context dropped from ~288k → ~183k tokens (a 36% reduction) while
preserving 220 messages. The loop treats that as exhaustion and the
gateway auto-resets the session.
Fix
---
Add ``_compression_made_progress(orig_len, new_len, orig_tokens, new_tokens)``
and call it after the post-pass ``estimate_request_tokens_rough`` (which
is moved up to run *before* the progress check instead of after it).
Either a row-count reduction OR a token-count reduction now counts as
progress; only when neither moves do we break out as "stuck".
Fixes#39548
Share one SHIKI_THEME (github-dark-dimmed) across code blocks and inline
diffs so they can't drift, and pull token saturation/brightness back via a
`.shiki` dark-mode filter. The dimmed theme alone only changes the
background — which both surfaces strip — so the bright foregrounds needed
the filter to actually calm down.
The connector half (gateway-gateway) moves the passthrough plane's post-ACK
forward off the HTTP gatewayEndpoint onto the gateway's outbound /relay WS via
a new passthrough_forward frame. This is the gateway side: the relay adapter
now RECEIVES and handles that frame, so a hosted gateway (no public IP) can
process forwarded Class-2/3 traffic (Discord interactions, Twilio) over the
socket it already holds — closing the "passthrough inbound doesn't work for
hosted gateways" gap.
- ws_transport.py: decode the passthrough_forward frame; PassthroughForward
dataclass + _passthrough_from_wire (base64 body -> exact bytes, byte parity
with the connector's toPassthroughForward); set_passthrough_handler mirrors
set_interrupt_inbound_handler.
- transport.py: PassthroughHandler type + set_passthrough_handler on the
RelayTransport protocol.
- adapter.py: connect() wires the passthrough handler; _on_passthrough decodes
the (already-sanitized, token-free) forward and, for a Discord interaction,
converts it to a MessageEvent routed through the normal agent path
(handle_message) — the reply egresses over the outbound / token-less
follow_up path, so the gateway never holds the interaction credential. Never
raises (a bad forward can't kill the read loop). Non-discord forwards (Twilio)
are logged + dropped for now.
- docs/relay-connector-contract.md: document the passthrough_forward frame +
PassthroughForward shape + §3.1.
The interaction -> MessageEvent CONVERSION semantics (slash-command vs button
UX, option rendering) are the open sub-design flagged in the spec; the TRANSPORT
+ receive mechanism (this) is settled per Ben's Gate-2 decision: "the relay
adapter handles receiving these events over the WS."
Tests (tests/gateway/relay/test_relay_passthrough.py): byte-preservation
round-trip (+ malformed-body tolerance), connect() wiring, application-command
and message-component interactions route through handle_message with correct
session source + scope capture, malformed/non-discord forwards dropped cleanly.
100 relay tests green. Pairs with the connector PR (gateway-gateway).
Unify the diff renderer onto the same Shiki path as code blocks: highlight
the marker-stripped change content in the file's language, then a per-line
transformer layers the add/remove tint + gutter accent on top. Falls back
to the plain color-only renderer when the language is unknown, over budget,
or while Shiki loads.
- shikiLanguageForFilename(): extension → bundled-language id (shared
filename-token helper with codiconForFilename).
- code display:grid so full-width line tints don't double with newline
nodes; theme surface stripped so context lines stay transparent.
Code/diff blocks preferred a system Cascadia Code before the bundled
JetBrains Mono, so they drifted from the terminal (which leads with
JetBrains Mono) on machines where Cascadia is installed. Reorder so every
mono surface uses the face we actually ship.
Render write_file/edit_file/patch as a reviewable diff instead of raw
result JSON, closer to a Cursor/T3 per-edit review.
- Unified diff via FileDiffPanel: strip git file-header + @@ hunk noise,
drop the +/- gutter, color by line with a 2px gutter accent, full-bleed
to the card, transparent context lines, compact scroll height.
- Header shows filename + language icon + +N/-N stats; full path moves to
a hover tooltip (no Edited verb, no ms).
- Treat the three file-edit tools uniformly (isFileEditTool); read diff
from inline_diff or patch's diff field; suppress raw-arg detail.
- Reusable FileTypeIcon primitive sharing the code-block icon mapping
(codiconForFilename), codicon fallback.
- Per-row scaffolding fade (not the group wrapper, which trapped child
opacity); expanded edits stay full, collapsed fade; keyboard-only focus
lift. Hide diff-less rehydrated creates that read as dupes.
Adds test_413_retries_on_token_only_compression: same message count but
materially fewer tokens after compaction must count as progress and retry,
not abort. Fails on main without the salvaged fix, passes with it.
Compression can materially reduce request size (tool-result pruning,
in-place summarization) without reducing message count. The two
compression-success checks in conversation_loop.py (413 handler and
context-overflow handler) only compared len(messages) to detect
success, missing token-only compression.
Now re-estimates tokens after compress_context() returns and treats
any >=5% reduction as a successful compression pass. Error logs
also use the post-compression token count instead of the stale
pre-compression estimate.
Fixes: #39550