Commit graph

1790 commits

Author SHA1 Message Date
Teknium
f721d2cda9
fix(image/video gen): make schema delivery instruction platform-neutral (#51031)
* chore: re-trigger CI (workflows did not dispatch on prior head)

* fix(image/video gen): make schema delivery instruction platform-neutral

The image_generate and video_generate tool schema descriptions hardcoded
a gateway-only delivery instruction ('display it with markdown
![description](url-or-path) and the gateway will deliver it'). That schema
is sent on every platform, so on CLI it directly contradicted the CLI
platform hint ('Do NOT emit MEDIA:/path tags ... state its absolute path
in plain text'), and on messaging platforms it was also wrong about the
mechanism (local file paths are delivered via MEDIA: tags, not markdown
image syntax — markdown ![]() only works for URLs).

The per-platform file-delivery convention is already owned correctly by
the platform hints in prompt_builder.py. The tool schema now just
describes the result shape (URL or absolute path in the image/video field)
and defers 'how to deliver' to the active platform's guidance.

Provider/model injection already works via _build_dynamic_image_schema()
(the 'Active backend: <provider> · model: <model>' line); no change there.
2026-06-22 13:40:42 -07:00
Teknium
f1e6d39a74
feat(computer_use): disable cua-driver telemetry by default, add opt-in (#50842)
* feat(computer_use): disable cua-driver telemetry by default, add opt-in

cua-driver ships anonymous PostHog usage telemetry ENABLED by default
upstream (fires cua_driver_install / cua_driver_doctor events to
eu.i.posthog.com). Hermes now disables it for our users unless they
explicitly opt in.

- New config key `computer_use.cua_telemetry` (default false) in
  DEFAULT_CONFIG.
- `cua_backend.cua_driver_child_env()` injects
  `CUA_DRIVER_RS_TELEMETRY_ENABLED=0` into the child env when telemetry is
  disabled (the default); leaves the var untouched on opt-in so the driver
  uses its own default. Reads config fail-safe — any error defaults to
  telemetry off.
- Routed every cua-driver spawn site through the policy: MCP backend
  (StdioServerParameters env), `cua_driver_update_check`, doctor's
  health_report Popen, the install.sh/install.ps1 runner, and the
  `--version` / status probes.
- Docs: new Telemetry subsection in computer-use.md (EN).
- Tests: tests/computer_use/test_cua_telemetry.py — default disables,
  explicit-false disables, opt-in leaves var untouched, config-failure
  fails safe, inherited-enabled is overridden off.
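
A minimal sketch of the child-env policy described in the list above. It is not the actual `cua_backend.cua_driver_child_env()`; the names `child_env` and `load_config` and the nested config-dict shape are assumptions made for illustration only.

```python
import os

TELEMETRY_VAR = "CUA_DRIVER_RS_TELEMETRY_ENABLED"


def child_env(load_config, base_env=None):
    """Build the environment for a cua-driver child process (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    try:
        # Assumed config shape: {"computer_use": {"cua_telemetry": bool}}
        opted_in = bool(load_config().get("computer_use", {}).get("cua_telemetry", False))
    except Exception:
        opted_in = False  # fail safe: any config error means telemetry off
    if not opted_in:
        env[TELEMETRY_VAR] = "0"  # also overrides an inherited "1"
    # On opt-in the variable is left untouched so the driver uses its own default.
    return env
```

Passing the result as `env=` at every spawn site is what keeps the policy in one place.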

Verified live on Linux against the real cua-driver-rs 0.6.0 binary: with
the var=0 the driver reports "telemetry: disabled via
CUA_DRIVER_RS_TELEMETRY_ENABLED" and sends no event; with it unset it logs
"sending event: cua_driver_doctor". 213 computer_use + install tests green.

* fix(dashboard): fold computer_use config category into agent tab

The new computer_use.cua_telemetry key created a single-field dashboard
config category, tripping test_no_single_field_categories (web_server's
invariant that categories with <2 fields must be merged to avoid tab
sprawl). Add computer_use -> agent to _CATEGORY_MERGE, matching the
existing onboarding/telegram single-field folds.
2026-06-22 09:57:16 -07:00
Teknium
2617946397
fix(delegation): emit high-concurrency cost warning once per process (#50848)
* chore: re-trigger CI (workflows did not dispatch on prior head)

* fix(delegation): emit high-concurrency cost warning once per process

_get_max_concurrent_children() runs on every get_definitions() schema
rebuild (via _build_top_level_description / _build_tasks_param_description),
not just on actual delegate_task calls. With max_concurrent_children>10 the
cost advisory fired on every turn / agent spawn across every session, spamming
the log even when delegate_task was never used. Gate it behind a module-level
_HIGH_CONCURRENCY_WARNED flag so it warns at most once per process.
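
The once-per-process gate can be sketched roughly as follows. Only the `_HIGH_CONCURRENCY_WARNED` name comes from the commit; the function signature, the threshold parameter, and the log wording are assumptions.

```python
import logging

logger = logging.getLogger("delegation_sketch")

# Module-level flag: flipped the first time the advisory fires.
_HIGH_CONCURRENCY_WARNED = False


def get_max_concurrent_children(configured: int, threshold: int = 10) -> int:
    """Return the configured value, warning at most once per process."""
    global _HIGH_CONCURRENCY_WARNED
    if configured > threshold and not _HIGH_CONCURRENCY_WARNED:
        _HIGH_CONCURRENCY_WARNED = True
        logger.warning("max_concurrent_children=%d may be costly", configured)
    return configured
```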
2026-06-22 09:44:30 -07:00
teknium1
e3505c7f73 fix(computer_use): reconcile Linux gate with stale "gated off" comments
The runtime gate (check_computer_use_requirements) and the hermes tools
platform_gate both enable linux alongside darwin/win32, but several
docstrings/comments still described Linux as "alpha, gated off until it
flips upstream" — contradicting the code that ships it. Bring the prose in
line with the gate that's actually live:

- tool.py / cua_backend.py module docstrings: Linux is enabled (X11 today,
  Wayland via XWayland), not gated off.
- toolsets.py description and hermes tools display name: (macOS/Windows) ->
  (macOS/Windows/Linux).

No behavior change — the gate already allowed all three platforms.
2026-06-22 06:42:30 -07:00
Francesco Bonacci
f2e37549c6 feat(computer_use): cross-platform cua-driver (macOS/Windows/Linux)
Make the computer_use toolset platform-agnostic by driving cua-driver on
macOS, Windows, and Linux. Consumes the 8 cua-driver decoupling surfaces
(capability discovery, structuredContent AX tree, opaque element_token,
click button enum, explicit mimeType, machine-readable manifest,
structured list_windows, structured health_report), each degrading
gracefully on older drivers.

Adds `hermes computer-use doctor` (drives cua-driver health_report with a
per-OS check matrix and an exit 0/1/2 ok/degraded/blocked contract), full
typed wrappers for the previously-uncovered cua-driver tools plus a generic
call_tool escape hatch, per-session agent-cursor lifecycle, platform-aware
system-prompt guidance (host-deterministic, cache-safe), and honors
HERMES_CUA_DRIVER_CMD end-to-end.

Replaces the macOS-only skills/apple/macos-computer-use skill with a
cross-platform skills/computer-use skill, and refreshes the EN + zh-Hans
docs.

Supersedes #44221 (Windows-enablement salvage of #30660).

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-22 06:42:30 -07:00
Teknium
ff85af3fc7
feat(goals): /goal wait <pid> — park the loop on a background process (#50503)
* feat(goals): add /goal wait <pid> barrier to park the loop on a background process

The /goal loop re-pokes the agent every turn via the post-turn judge. When a
goal is gated on a long-running background process (CI poller, build, test
matrix, deploy) that produces nothing to judge yet, this spins the agent into
'is it done?' busy-work and burns the turn budget.

/goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is
skipped, no turn is consumed, no continuation fires, and /goal status shows a
parked indicator. The barrier auto-clears the moment the process exits (the
agent's notify_on_complete watcher is the natural wake signal), then the next
turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear
drop it; a dead/stale PID can never wedge the loop.

Wired across CLI, gateway, and the mid-run command guard for parity. Barrier
persists in SessionDB.state_meta (survives /resume); GoalState gains
backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new
tests; docs updated.

* fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0)

The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive — on
Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the
target's console process group (bpo-14484). Delegate to the canonical
footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback)
instead, with a direct-psutil last resort.

* feat(goals): judge-driven auto-wait — the loop parks itself, no manual /goal wait

Makes the wait barrier automatic. Every turn the judge is shown the agent's
live background processes (pid, command, uptime, output tail from the
process_registry) alongside the goal + response, and can return a new 'wait'
verdict instead of continue:
  {"verdict":"wait","wait_on_pid":N}      → park until that process exits
  {"verdict":"wait","wait_for_seconds":N} → park until the deadline passes
evaluate_after_turn acts on the directive (sets the barrier, parks the loop)
so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a
time-based waiting_until barrier alongside the pid barrier; both auto-clear and
can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live
registry in via gather_background_processes(). Manual /goal wait stays as an
override. Judge verdict contract widened to (verdict, reason, parse_failed,
wait_directive); legacy {"done":bool} shape still accepted.
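
A rough sketch of parsing the widened verdict contract into the 4-tuple, assuming the judge replies with a single JSON object. The function name `parse_verdict`, the `"done"` verdict label, and the fall-back-to-continue behavior on bad input are assumptions, not the project's actual parser.

```python
import json


def parse_verdict(raw: str):
    """Return (verdict, reason, parse_failed, wait_directive)."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("verdict must be a JSON object")
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return "continue", "unparseable judge reply", True, None

    reason = str(data.get("reason", ""))

    # Legacy shape: {"done": bool}
    if "verdict" not in data and isinstance(data.get("done"), bool):
        return ("done" if data["done"] else "continue"), reason, False, None

    verdict = data.get("verdict")
    if verdict == "wait":
        for key in ("wait_on_pid", "wait_for_seconds"):
            value = data.get(key)
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and value > 0:
                return "wait", reason, False, {key: value}
        # A wait verdict with no usable target must not park the loop.
        return "continue", "wait verdict without a valid target", False, None
    if verdict in ("done", "continue"):
        return verdict, reason, False, None
    return "continue", "unknown verdict", True, None
```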

* test(goals): update kanban _fake_judge to the 4-tuple judge contract

CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the
3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the
4-tuple (+ wait_directive). Update the fake to return None for the directive
and accept the background_processes kwarg.

* feat(goals): trigger-based wait — park on a process's own signal, not just exit

Addresses two gaps in the judge-driven wait: (1) the judge could only express
'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that
fires a trigger MID-RUN (and may never exit) couldn't be waited on; (2) the
process's own watch_patterns/notify_on_complete trigger was invisible to the judge.

Adds a session-based barrier (waiting_on_session) that releases on the process's
OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if
started with watch_patterns) its pattern matches — even while the process keeps
running. list_sessions() now surfaces session_id + watch_patterns/watch_hit/
notify_on_complete so the judge sees the trigger and is told to prefer
wait_on_session for trigger processes. Judge verdict gains a {wait_on_session}
directive (preferred over pid). Backward-compatible GoalState field; pid + time
barriers unchanged.

Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive,
release on exit, unknown-session, full park→trigger→resume, parse, validation,
backcompat load). 105 goal-surface + 85 process_registry tests green.
2026-06-22 06:27:29 -07:00
Teknium
b0a25980f8
fix(terminal): make hermes install dir reachable in subshell PATH (#50534)
Plugins shelling out to bare `hermes` via the terminal tool hit
`command not found` (exit 127) when the gateway was launched without the
hermes install dir on PATH (systemd, service managers, cron, desktop
launchers) — even though `hermes` works in the user's own interactive
terminal, which sources the shell rc that exports that dir.

The terminal tool's subshell PATH was the agent process PATH plus a
static set of system dirs (_SANE_PATH); it never included wherever the
hermes console-script actually lives (~/.local/bin, the venv bin/Scripts,
pipx, nix). Resolve that dir once (which/argv0/sys.executable) and
prepend-if-missing it so bare `hermes` resolves regardless of launch
method.
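
The prepend-if-missing step might look like the sketch below. The helper name is hypothetical, and resolving the install dir itself (which/argv0/sys.executable) is left out.

```python
import os


def prepend_if_missing(path_value: str, directory: str) -> str:
    """Return path_value with directory in front, unless it is already listed."""
    entries = [p for p in path_value.split(os.pathsep) if p]
    wanted = os.path.normcase(os.path.normpath(directory))
    for entry in entries:
        # Compare normalized forms so "/usr/bin/" and "/usr/bin" count as equal.
        if os.path.normcase(os.path.normpath(entry)) == wanted:
            return path_value
    return os.pathsep.join([directory, *entries])
```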
2026-06-21 20:00:06 -07:00
teknium1
8cfcbd327d fix(process): SIGKILL the whole tree on escalation, not just wait_procs survivors
Live testing against a real SIGTERM-ignoring process TREE (parent + children,
the agent-browser daemon + renderer shape) revealed psutil.wait_procs's
gone/alive partition mis-handles a parent/child tree: it reaps via
Process.wait() and could mark targets gone/alive inconsistently across the
tree, leaving survivors un-killed (flaky — sometimes the parent lived,
sometimes a child). Replace it with: sleep out the grace window, then
directly re-probe every captured target (_proc_alive, treating zombies as
dead) and SIGKILL any that are still running. Add a multi-child-tree regression
test. 6/6 escalation tests green across repeated runs; the real-tree E2E now
kills the full tree 6/6 runs.
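
The replacement strategy (signal, sleep out the grace window, re-probe every captured target) can be sketched with injected callables so it stays testable without real processes. `terminate_tree` and its parameters are illustrative assumptions, not the project's `_terminate_host_pid`.

```python
import time


def terminate_tree(pids, send_term, send_kill, is_alive,
                   grace_seconds=2.0, sleep=time.sleep):
    """SIGTERM every target, wait, then SIGKILL whatever is still alive."""
    targets = list(pids)  # snapshot the tree before signalling anything
    for pid in targets:
        send_term(pid)
    if grace_seconds <= 0:
        return []  # a grace of 0 disables escalation
    sleep(grace_seconds)
    killed = []
    for pid in targets:
        # Probe each target directly instead of trusting a gone/alive partition.
        if is_alive(pid):
            send_kill(pid)
            killed.append(pid)
    return killed
```

In a real caller `is_alive` would need to treat zombies as dead, as the commit notes.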
2026-06-21 19:08:52 -07:00
teknium1
8cecaf0b29 feat(process): escalate SIGTERM->SIGKILL on host-pid termination after grace
A daemon that ignores or stalls in its SIGTERM handler currently survives the
process-registry reap and leaks until reboot (observed as agent-browser
daemons accumulating to EMFILE on long-running gateways). _terminate_host_pid
now snapshots the tree, SIGTERMs it, waits a bounded grace window
(terminal.daemon_term_grace_seconds, default 2.0s, 0 disables), then SIGKILLs
any survivor. The recycled-PID identity guard still gates the whole path, so
escalation never reaches a stranger; Windows is unchanged (taskkill /F is
already a hard kill).

Config lives in config.yaml (terminal.daemon_term_grace_seconds), NOT an env
var, per the .env-secrets-only policy.

Implements the SIGKILL-escalation idea from @tkwong's #15008, reworked onto the
current _terminate_host_pid tree-kill path (the original predated it) and
config-gated instead of env-var-gated.

Co-authored-by: Benjamin Wong <tkwong@inspiresynergy.com>
2026-06-21 19:08:52 -07:00
valentt
e447723149 fix(process-registry): re-validate PID identity before killing host processes
The background-process registry signalled host PIDs (recovery adoption,
detached-session kill, tree-kill) using a number captured at spawn, guarded
only by a bare liveness check. Once a session's process exits and is reaped the
kernel recycles that PID onto an unrelated process, so an alive-but-different
PID passed the check and got tree-killed.

Observed in the wild: a recycled background-session PID landed on Firefox's
session leader; a later kill/refresh walked its process tree and SIGTERMed
every tab — Firefox "closing" at irregular intervals with no crash/coredump.

This is the same PID/PGID-recycling class fixed for the MCP orphan reaper in
7bd1f8a2d, but the process_registry subsystem was never guarded — so the bug
persisted.

Fix: record each host process's kernel start time (/proc/<pid>/stat field 22)
at spawn, persist it in the checkpoint, and re-validate it before every signal
via `_host_pid_is_ours`. A PID whose start time no longer matches — or that is
gone — is never signalled:
  - recover_from_checkpoint: a recycled PID is not adopted as a session.
  - _refresh_detached_session: a recycled detached PID is marked exited.
  - kill_process / _terminate_host_pid: refuse to tree-kill a stranger.
Legacy checkpoints and platforms without /proc (no baseline) degrade to the
prior best-effort liveness behaviour, so nothing else changes.
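
For illustration, reading field 22 of `/proc/<pid>/stat` and comparing it with a recorded baseline might look like this. The function names are hypothetical (the commit's real helper is `_host_pid_is_ours`), and the sketch only applies where `/proc` exists.

```python
def parse_start_time(stat_line: str) -> int:
    """Extract starttime (field 22) from a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so split after the LAST ')' rather than on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 22 sits at index 22 - 3 = 19.
    return int(rest[19])


def read_start_time(pid: int):
    """Return the kernel start time for pid, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/stat", "r") as fh:
            return parse_start_time(fh.read())
    except (OSError, IndexError, ValueError):
        return None


def host_pid_is_ours(pid: int, recorded_start: int, reader=read_start_time) -> bool:
    """True only if pid is alive and still has the start time recorded at spawn."""
    current = reader(pid)
    return current is not None and current == recorded_start
```

Callers with no recorded baseline (legacy checkpoints, no `/proc`) would skip this check and fall back to plain liveness.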

Adds TestPidReuseGuard: real-process tests proving a mismatched start time
refuses termination while a matching one still kills, plus recovery/refresh
recycling paths. 74 registry + 22 MCP-stability tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 17:23:33 -07:00
Teknium
84e1d31e54
refactor(kanban): fold worker/orchestrator skills into injected guidance (#50473)
The kanban-worker and kanban-orchestrator bundled skills existed only to
be force-loaded into dispatcher-spawned workers, gated by
environments:[kanban] so they wouldn't leak into normal CLI listings.
That gating was fragile (the leak that #50443 patched) and the
--skills auto-load was already best-effort — most workers ran without it
because the bundled skill isn't present in profile-scoped skills dirs.

Remove the skills entirely and promote their load-bearing content
(workspace kinds, deliverable artifacts, created-card integrity, profile
discovery) into KANBAN_GUIDANCE, which is already injected into every
kanban worker's system prompt. Net result: every worker reliably gets
the guidance, nothing can leak into a CLI/blank-slate session, and the
gating machinery is gone.

- agent/prompt_builder.py: promote the 4 load-bearing rules into KANBAN_GUIDANCE
- hermes_cli/kanban_db.py: drop --skills kanban-worker auto-injection + _kanban_worker_skill_available probe
- hermes_cli/kanban_swarm.py: drop skills=[kanban-orchestrator] on the root card
- hermes_cli/kanban.py: drop kanban-init skill seeding; fix help text
- delete skills/devops/kanban-{worker,orchestrator}
- docs: delete the two skill pages (EN+zh), fix sidebars/catalog/kanban.md/kanban-worker-lanes.md and the video-orchestrator + codex-lane references
- tests: update spawn-argv expectations; re-bound the guidance-size guard

Supersedes the skill-leak half of #50443 (credit @helix4u for flagging the area).
2026-06-21 17:06:48 -07:00
Dusk1e
84fcbbf6a9 fix(security): quote HERMES_TIMEZONE in remote code execution to prevent shell injection 2026-06-21 16:55:12 -07:00
Dusk1e
8fcb8136bb fix(security): harden smart approval guard against prompt injection
2026-06-21 16:39:48 -07:00
teknium1
624580e836 fix(browser): verify daemon identity before orphan reaper kills a PID (#14073)
The browser orphan reaper reads a daemon PID from a `.pid` file in a
world-writable, predictably-named temp dir (`/tmp/agent-browser-h_*`) it
does not write itself, then tree-kills that PID via `_terminate_host_pid`
after only a liveness check. A same-user actor could plant a fake socket
dir whose `.pid` points at an arbitrary victim process, and OS PID reuse
after the real daemon exits could land the recorded PID on an unrelated
process — either way an arbitrary same-user process (and its whole tree)
gets SIGTERMed. Local DoS.

Add `_verify_reapable_browser_daemon()`, gated before the kill: via psutil
(a hard dep, fine cross-platform for the same-user processes the reaper can
signal) require both (1) identity — `agent-browser` in the process
name/cmdline — and (2) binding — the live process references *this* session's
socket dir in its cmdline or `AGENT_BROWSER_SOCKET_DIR`. The binding check is
the real spoof defense: a planted/recycled PID won't embed our exact socket
path. Fail-closed on any ambiguity (unreadable cmdline, no match), leaving the
process and its socket dir untouched for a later sweep.
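
The two-part check (identity plus binding, fail-closed) can be sketched as a pure function over values a caller would already have read from psutil. This is not `_verify_reapable_browser_daemon()` itself; the function name and argument shapes are assumptions.

```python
def is_reapable_daemon(name, cmdline, environ, socket_dir):
    """Return True only when both identity and binding are positively confirmed."""
    if not cmdline:
        return False  # unreadable or empty cmdline: fail closed
    # (1) identity: the process looks like an agent-browser daemon
    identity = "agent-browser" in (name or "") or any(
        "agent-browser" in arg for arg in cmdline
    )
    # (2) binding: the process references THIS session's socket dir
    binding = any(socket_dir in arg for arg in cmdline) or (
        (environ or {}).get("AGENT_BROWSER_SOCKET_DIR") == socket_dir
    )
    return identity and binding
```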

Builds on @sgaofen's fix in #14394 (cmdline identity check); rewritten to use
psutil instead of `/proc`+`ps` (cross-platform, Windows-covered) and to add
the session-socket-dir binding check for recycled-PID / spoof resistance.

Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
2026-06-21 15:23:47 -07:00
sprmn24
ed966696eb fix(security): handle IPv6 scope IDs in URL safety checks to prevent bypass
ipaddress.ip_address() raises ValueError on IPv6 addresses with scope
IDs (e.g. 'fe80::1%eth0'). Both is_always_blocked_url() and is_safe_url()
silently skipped these via `except ValueError: continue`.

If ALL resolved addresses for a hostname carry scope IDs, every address
is skipped and the URL passes all safety checks — a potential SSRF
bypass vector against link-local or metadata endpoints.

Fix:
- Strip the scope ID (%eth0) before parsing in both functions
- is_safe_url(): fail closed (return False) with a warning log if still
  unparseable after stripping
- is_always_blocked_url(): use continue (not return False) to preserve
  multi-address scanning, with a warning log

Affected: tools/url_safety.py — is_always_blocked_url(), is_safe_url()
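
A small sketch of the strip-then-parse approach with fail-closed handling. The helper names and the exact set of blocked address classes are assumptions, not the contents of `tools/url_safety.py`.

```python
import ipaddress


def parse_ip(text: str):
    """Parse an address, dropping any IPv6 scope ID such as '%eth0' first."""
    host = text.split("%", 1)[0]
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        return None


def is_safe_ip(text: str) -> bool:
    """Fail closed: anything unparseable is treated as unsafe."""
    ip = parse_ip(text)
    if ip is None:
        return False
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )
```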
2026-06-21 13:56:35 -07:00
panghuer023
a9c8025984 fix(approval): honor interrupt in blocking gateway approval wait (#8697)
A dangerous-command gateway approval blocks the agent's execution thread
inside _await_gateway_decision() on threading.Event.wait() until the user
responds or the 5-minute approval timeout fires. The poll loop never checked
is_interrupted(), so /stop (which flags the agent's execution thread via
AIAgent.interrupt()) was silently ignored — the session stayed wedged until
timeout, even though /stop reported the session unlocked.

Check is_interrupted() at the top of the poll loop. The wait runs on the
agent's execution thread, the exact thread interrupt() flags, so the check
sees the signal and resolves the pending approval as deny — the agent loop
receives a normal denial and unwinds cleanly. Covers /stop, /new, and the
gateway inactivity-timeout interrupt through the single shared wait loop used
by both the terminal and execute_code guards.
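
The shape of an interruptible wait loop is sketched below. The function and parameter names are illustrative, and returning `"timeout"` when the deadline passes is an assumption; the commit only specifies that an interrupt resolves as a deny.

```python
import threading
import time


def await_decision(responded: threading.Event, read_choice, is_interrupted,
                   timeout: float = 300.0, poll: float = 0.1) -> str:
    """Block until the user responds, the wait is interrupted, or it times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Check the interrupt flag at the top of every iteration.
        if is_interrupted():
            return "deny"
        # Short waits keep the loop responsive to the interrupt flag.
        if responded.wait(poll):
            return read_choice()
    return "timeout"
```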
2026-06-21 13:33:48 -07:00
Eugeniusz Gilewski
def3f6388f fix(file): anchor device symlink guard to task cwd
The read_file device guard now walks symlink hops before the file operation
layer, but that hop walk still interpreted relative paths against the Python
process cwd. In sessions where TERMINAL_CWD points at the task workspace, a
relative workspace symlink to a blocked alias such as /dev/../dev/stdin could
therefore miss the intermediate device target before later task-cwd resolution.

Anchor relative device checks to the task base before symlink-hop inspection so
the pre-I/O guard sees the same workspace path that read_file would otherwise
read. Absolute device paths and the existing final realpath fallback remain
unchanged.

Refs #10141
Refs #29158
2026-06-21 12:16:10 -07:00
Teknium
7a131f7f40
fix(api-server): stop silently promising async delivery on stateless HTTP path (#50319)
* fix(api-server): stop silently promising async delivery on stateless HTTP path

terminal(notify_on_complete=True / watch_patterns) and delegate_task(background=True)
silently no-op'd on the API server / WebUI path (#10760): the watcher / detached
child registered, but every API-server route (OpenAI-spec /v1/chat/completions
and /v1/responses, plus the proprietary /v1/runs SSE stream) tears down its
channel when the turn ends, and APIServerAdapter.send() is a no-op stub. A
completion that fires after the response closed had nowhere to go — from the
agent side, indistinguishable from a hang.

There is no spec-compliant surface to wake the agent later on a stateless HTTP
client, so make the no-op honest instead of silent:

- Add a per-adapter capability flag supports_async_delivery (default True;
  APIServerAdapter = False), propagated into a HERMES_SESSION_ASYNC_DELIVERY
  contextvar via async_delivery_supported(). Toggle on the adapter, not a
  hardcoded platform string — a future stateless adapter is correct-by-default.
- terminal: when delivery is unsupported, skip watcher registration, force
  notify_on_complete off, and return a notify_unsupported note telling the
  agent to process(action='poll').
- delegate_task: when delivery is unsupported, fall back to SYNCHRONOUS
  execution (work runs and returns in the same response) with a note, instead
  of handing out a handle that never resolves.

CLI (in-process completion_queue) and the real gateway platforms are unchanged.

Fixes #10760

* refactor(api-server): route session binding through a single no-delivery chokepoint

Add APIServerAdapter._bind_api_server_session() and route both agent-entry
paths (_run_agent for /v1/chat/completions + /v1/responses, and the /v1/runs
_run_sync path) through it. The helper hardwires platform="api_server" and
async_delivery=False with no async_delivery parameter to pass, so a future
route added to the API server physically cannot reintroduce the silent
no-op (#10760) by forgetting to mark the channel as non-delivering.

The binding stays request-scoped (cleared per turn), so a session resumed
later on a delivering interface (CLI / gateway platform) re-binds fresh and
is NOT blocked — the no-delivery decision tracks the interface handling the
current turn, never the session.
2026-06-21 12:15:14 -07:00
Stephen Chin
3b56d3a29a fix(security): redact secrets in kanban tool payloads before persistence 2026-06-21 12:02:30 -07:00
Brandon Zarnitz
71274f264b fix(file): reject read_file line-numbered writeback 2026-06-21 11:55:59 -07:00
Teknium
41ba90f814 fix(process): keep CLI drain dedup after poll goes read-only (#10156)
Follow-up to @de1tydev's poll-read-only fix. Removing the
_completion_consumed.add() from poll() fixes the gateway/tui watcher
suppression (#10156) but reintroduces the CLI duplicate that #8228 fixed:
a notify_on_complete process always enqueues a completion event, and the
CLI idle/post-turn drain would re-inject it as a [SYSTEM: ...] message
even though the agent already saw the exit inline in its poll result.

Add a separate _poll_observed set that poll() populates on an observed
exit. drain_notifications() (CLI only) skips poll-observed sessions; the
gateway/tui watchers keep checking only is_completion_consumed, so a
read-only poll never suppresses their autonomous delivery turn.

- _poll_observed pruned alongside _completion_consumed in _prune_if_needed
- 4 tests: CLI drain dedup after poll, gateway gate untouched, running
  poll doesn't mark observed, wait/log still skip CLI drain
2026-06-21 11:11:23 -07:00
Liao Shiwu
6f5f58e34b fix: keep poll read-only for notify_on_complete watcher 2026-06-21 11:11:23 -07:00
Eugeniusz Gilewski
9078b4bbdf fix(file): harden read_file device alias blocking
Security-hardening fix for the read_file device guard, not a new sandbox
boundary. The guard already rejects direct device paths and upstream now
has a resolved-path pass for workspace symlinks to blocked devices, but
its concrete-path helper still compared the expanded path before
normalization. That leaves residual alias cases where the dangerous path
is visible before final terminal-specific resolution, for example:

  1. /dev/../dev/zero and /dev/./urandom should match the blocked-device
     list as concrete paths, not only after final realpath;
  2. /dev/stdin-style aliases can disappear once realpath follows them
     to /proc/self/fd/0 and then to a tty path;
  3. a user symlink to /dev/../dev/stdin exposes the dangerous
     intermediate target before final resolution, but not necessarily
     after it.

Normalize expanded paths before matching and inspect each symlink hop
before falling back to realpath. This preserves the existing /proc fd and
/proc pseudo-file guards while enforcing the intended security invariant:
model-supplied read paths must not reach blocking or infinite device
streams through spelling, normalization, or symlink-hop tricks.
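
A sketch of normalize-then-walk-hops, assuming a small illustrative blocklist and a caller-supplied task base directory. The function name, the blocklist contents, and the hop limit are assumptions, and the real guard also covers `/proc` paths that this sketch omits.

```python
import os

# Illustrative subset only; the real blocklist is not reproduced here.
BLOCKED = {"/dev/zero", "/dev/urandom", "/dev/random", "/dev/stdin"}


def is_blocked_device(path: str, base_dir: str, max_hops: int = 40) -> bool:
    """Check every symlink hop, in normalized form, against the blocklist."""
    current = os.path.expanduser(path)
    for _ in range(max_hops):
        if not os.path.isabs(current):
            # Anchor relative paths to the task workspace, not the process cwd.
            current = os.path.join(base_dir, current)
        current = os.path.normpath(current)  # /dev/../dev/zero -> /dev/zero
        if current in BLOCKED:
            return True
        if not os.path.islink(current):
            break
        target = os.readlink(current)
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(current), target)
        current = target  # inspect the intermediate target on the next pass
    # Final fallback: whatever the path ultimately resolves to.
    return os.path.realpath(current) in BLOCKED
```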

Classification: security hardening / residual bypass fix for the
read_file device blocklist. This is defensive code at the file-tool
boundary, but it fixes a concrete denial-of-service class tracked as
security in #10141 and #29158.

Tests:
  - normalized /dev/../dev/zero and /dev/./urandom aliases
  - symlink to /dev/../dev/stdin blocked before realpath
  - existing symlink-to-device and regular-symlink guards still pass

Fixes #10141
Fixes #29158
2026-06-21 11:11:19 -07:00
kshitij
ed8f7898b9
Merge pull request #50136 from NousResearch/fix/context-aware-tool-budget
fix(agent): scale tool-output budget to the model context window (#23767)
2026-06-21 20:01:32 +05:30
liuhao1024
6984026f12 fix(browser): enable SSRF guard when terminal runs in container
When terminal.backend is docker/modal/daytona/ssh/singularity, the
terminal runs in a sandboxed container with network isolation, but the
browser still runs on the host.  The SSRF guard was skipped because
_is_local_backend() only checked browser.cloud_provider, not the
terminal backend.

Now _is_local_backend() also checks TERMINAL_ENV — when the terminal
is containerized, the browser is treated as non-local and SSRF
protection is enabled.

Fixes #38690
2026-06-21 07:26:18 -07:00
kshitijk4poor
1965d56219 fix(agent): scale tool-output budget to the model context window (#23767)
The tool-result persistence budget was a fixed 100K chars/result and 200K
chars/turn regardless of the active model. On a small-context model (e.g. a
65K-token local model switched to mid-session) a single large tool result
(reporter: a 279K-char search result) or a full 200K-char turn (~50K tokens)
could by itself approach or exceed the window, forcing an oversized request
that the provider rejects as "Prompt too long".

- budget_config.budget_for_context_window() scales per-result/per-turn char
  caps to a fraction of the model window, clamped to the historical 100K/200K
  defaults (large models unchanged) and floored so small models stay usable.
- resolve_threshold() now caps the per-tool registry value at default_result_size
  so tools that register a fixed 100K cap (web/terminal/x_search) don't re-inflate
  a scaled-down budget. No-op for the default budget (both 100K).
- tool_executor wires the agent's live context_length (recomputed on model
  switch) into all four persist/turn-budget call sites.

read_file stays inf-pinned (no persist loop). Verified E2E: a 279K-char result
against a 65K model collapses to a ~1.6K preview; a 200K model is byte-identical
to today.
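
To make the scaling concrete, here is a hedged sketch. The 100K/200K defaults and the function names come from the commit; the fractions, the 4-chars-per-token estimate, the floor, and the signatures are assumptions picked for illustration, not the values in `budget_config`.

```python
DEFAULT_RESULT_CHARS = 100_000
DEFAULT_TURN_CHARS = 200_000


def budget_for_context_window(context_tokens: int,
                              chars_per_token: float = 4.0,
                              result_fraction: float = 0.1,
                              turn_fraction: float = 0.2,
                              floor_chars: int = 2_000):
    """Return (per_result_chars, per_turn_chars) scaled to the model window."""
    window_chars = context_tokens * chars_per_token
    # Clamp to the historical default above, and to a usable floor below.
    result = int(min(DEFAULT_RESULT_CHARS, max(floor_chars, window_chars * result_fraction)))
    turn = int(min(DEFAULT_TURN_CHARS, max(result, window_chars * turn_fraction)))
    return result, turn


def resolve_threshold(tool_cap: int, default_result_size: int) -> int:
    """A tool's fixed registry cap must not re-inflate a scaled-down budget."""
    return min(tool_cap, default_result_size)
```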
2026-06-21 17:46:38 +05:30
xxxigm
472c068159 fix(mcp): detect 'unknown method' phrasing in ping keepalive fallback
A server that doesn't implement the optional 'ping' utility answers a
keepalive ping with JSON-RPC method-not-found. _is_method_not_found_error
latches that condition so the probe falls back to list_tools instead of
reconnect-looping.

The substring fallback only matched 'method not found' / '-32601' /
'not found: ping'. Servers that surface method-not-found as the common
'Unknown method: <name>' phrasing without a structural -32601 code (e.g.
agentmemory's MCP server) slipped through, so the fallback never latched
and the keepalive reconnect-looped every cycle.

Add 'unknown method' to the substring fallback so the ping->list_tools
keepalive fallback latches for these servers too.
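
The substring fallback might be sketched as follows. The marker strings are the ones listed in the commit; the function body and the assumption that a structured error exposes a numeric `code` attribute are illustrative.

```python
_METHOD_NOT_FOUND_MARKERS = (
    "method not found",
    "-32601",
    "not found: ping",
    "unknown method",  # phrasing used by servers without a structural code
)


def is_method_not_found_error(exc: Exception) -> bool:
    """Detect JSON-RPC method-not-found, structurally first, then by message."""
    if getattr(exc, "code", None) == -32601:  # assumed attribute name
        return True
    text = str(exc).lower()
    return any(marker in text for marker in _METHOD_NOT_FOUND_MARKERS)
```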

Fixes #50028.
2026-06-21 16:02:56 +05:30
kyssta-exe
65d7c7fafd fix(cron): execute job immediately on action='run'
`cronjob(action='run')` (and `hermes cron run`) only set `next_run_at = now`
and returned success, relying on the scheduler ticker to actually execute the
job on its next tick. When no gateway/ticker is running — a CLI-only setup, or
the Windows case in #41037 — the job never executed: `run` reported success,
but `last_run_at` stayed null forever, no output, no delivery.

A manual `run` should actually run. `_execute_job_now` now:

- **claims the job via `claim_job_for_fire`** — the same at-most-once CAS the
  scheduler/external-provider fire path uses. This both advances `next_run_at`
  for recurring jobs and blocks a concurrently-running gateway ticker from
  double-firing the same job; if the claim is lost, the run is skipped (the
  tool reports `execution_skipped`). This closes the double-fire race that a
  bare `advance_next_run` left open (a tick whose `get_due_jobs` already
  captured the job between trigger and advance would still fire it).
- **delegates firing to `run_one_job`** — the single shared
  execute→save→deliver→mark body the ticker and external providers use — so
  failure delivery, `[SILENT]` handling, and live-adapter delivery stay
  identical across paths and can't drift. (The original salvage re-implemented
  this sequence inline and had already dropped failure delivery + `[SILENT]`.)

The tool response carries `executed`, `execution_success`, and either
`execution_error` or `execution_skipped`. The `hermes cron run` CLI message no
longer claims "It will run on the next scheduler tick" — it reports the actual
"Ran now: succeeded/failed" outcome (or the skip).

Salvaged from #41130 by @kyssta-exe (authorship preserved); reworked to reuse
`claim_job_for_fire` + `run_one_job` per review rather than re-implementing the
fire sequence inline. Adds tests for the claim-then-fire path, claim-lost skip,
failure reporting, and exception capture.

Fixes #41037

Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
2026-06-21 13:28:04 +05:30
Teknium
c6bf6bda90
fix(memory): recover from missing old_text on single-op replace/remove (#49997)
Single-op replace/remove failed with a dead-end 'old_text is required'
error when a structured-output client omitted the optional old_text field
(it can't be schema-required without a top-level if/then combinator that
OpenAI's Codex backend 400s on). The model couldn't recover.

Now a missing old_text returns the current entry inventory plus a retry
instruction (mirroring the batch path's _batch_error), so the model can
reissue the call with old_text set. Also sharpens the old_text schema
description to state it's required for replace/remove.

Fixes #49466, #43412.
2026-06-20 23:46:52 -07:00
Sworntech-dev
9f507a0aa3 docs: remove file tools TBD placeholder 2026-06-20 23:23:47 -07:00
Andres Sommerhoff
97563ab821 fix: warn on line-oriented newline search patterns 2026-06-20 23:23:47 -07:00
Andres Sommerhoff
eb9a002284 docs: clarify search_files newline regex behavior 2026-06-20 23:23:47 -07:00
lkz-de
6403ed06b3 docs(session-search): document source-first retrieval limits
Clarify that session_search is secondary context and direct source identifiers must be inspected first when accessible. Add regression coverage for the tool description.
2026-06-20 23:23:47 -07:00
Teknium
ea8a8b4af8
feat(delegation): background fan-out — parallel subagents, one consolidated return (#49734)
* feat(delegation): single-task delegate_task always runs in the background

The model no longer decides whether a subagent runs in the background — a
single-task delegate_task from the top-level agent is now always dispatched
async, so the parent turn returns immediately and the subagent's result
re-enters the conversation when it finishes.

- run_agent._dispatch_delegate_task (the live model path) forces
  background=True for top-level single-task calls; the schema-level
  `background` param is ignored.
- A batch (tasks with >1 item) stays synchronous (fan-out can't go async).
- A delegation from an orchestrator subagent (depth > 0) stays synchronous —
  it needs its workers' results within its own turn.
- The function-level default is unchanged, so direct Python callers/tests keep
  the historical synchronous behavior.
- On async-pool capacity rejection, single-task now falls through to a
  synchronous run instead of erroring (the child stays attached for interrupt
  propagation; detach happens only on a successful dispatch).
- Schema `background` param marked deprecated/ignored; tool description
  updated to state the always-background single-task rule.

* feat(delegation): all delegate_task fan-out runs in the background

Extend the always-background behavior to the full fan-out. A batch is now
dispatched as N independent async subagents (one handle each), instead of
running synchronously. Single task and batch both return immediately; each
subagent's result re-enters the conversation as its own message when it
finishes.

- delegate_task: when background is set, loop over ALL built children and
  dispatch each via dispatch_async_delegation; return a combined handle block
  (count + per-task delegation_ids). Children the async pool rejects (at
  capacity) run synchronously inline and are reported alongside the dispatched
  handles, so nothing is silently dropped.
- run_agent._dispatch_delegate_task + registry handler: force background for
  any top-level model delegation (single OR batch); orchestrator subagents
  (depth > 0) still run synchronously since they need workers' results within
  their own turn.
- Removed the v1 'batch async not supported' rejection.
- Tool description updated: BOTH MODES RUN IN THE BACKGROUND.
- Tests updated to assert batch fan-out dispatches each task async (verified
  E2E: 3-task batch -> 3 independent completion-queue events).

* fix(delegation): background fan-out joins and returns one consolidated block

Correct the fan-out semantics: a backgrounded batch is dispatched as ONE
async unit (one handle, one async-pool slot), not N independent dispatches.
The unit runs all children in parallel, waits on every one, and emits a
SINGLE completion event carrying the consolidated per-task results. The chat
is never blocked; when all subagents finish, their full summaries re-enter
the conversation together as one message.

- async_delegation.dispatch_async_delegation_batch + _finalize_batch: a batch
  occupies one slot; its runner returns the combined {results:[...]} dict and
  one event with the full results list is pushed to the completion queue.
- delegate_tool: extract the sync execution+aggregation into
  _execute_and_aggregate(); background dispatches it via the batch unit and
  returns one handle; on pool-capacity rejection it runs the batch inline.
- process_registry._format_async_delegation: render a consolidated multi-task
  block (TASK i/N + per-task summary) when the event carries is_batch/results.
- Tests updated; E2E verified: 3-task batch -> immediate return -> one combined
  completion block with all three summaries.
2026-06-20 11:27:12 -07:00
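The final fan-out semantics in the commit above (ea8a8b4af8) are one background unit, all children in parallel, and a single consolidated completion event. They can be sketched with the standard library as follows. `dispatch_batch` and `run_child` are hypothetical names, and the real async pool, handles, and capacity fallback are not modelled here.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def dispatch_batch(tasks: list[str], run_child: Callable[[str], str],
                   completion_queue: "queue.Queue[dict]") -> threading.Thread:
    """Run all tasks in parallel in the background; push ONE event at the end."""

    def _runner() -> None:
        with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
            # map() keeps task order and blocks until every child is done.
            summaries = list(pool.map(run_child, tasks))
        completion_queue.put({
            "is_batch": len(tasks) > 1,
            "results": [{"task": t, "summary": s}
                        for t, s in zip(tasks, summaries)],
        })

    thread = threading.Thread(target=_runner, daemon=True)
    thread.start()  # the caller gets control back immediately
    return thread
```

A consumer would read exactly one item from `completion_queue` per batch and render the per-task summaries from its `results` list.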
Teknium
5600105478 refactor(gateway): migrate slack/dingtalk/whatsapp/matrix/feishu/telegram/wecom/email/sms adapters to bundled plugins
Salvage of PR #41284 onto current main. Relocates the last 9 inline messaging
adapters (+ satellites: telegram_network, feishu_comment/_rules/meeting_invite,
wecom_crypto, wecom_callback) from gateway/platforms/ into self-contained
bundled plugins under plugins/platforms/<x>/, discovered via the platform
registry. Strips the per-platform core touchpoints from gateway/run.py,
gateway/config.py, hermes_cli/gateway.py, hermes_cli/setup.py, and
tools/send_message_tool.py.

Carries forward the migration fixes (explicit enabled:false honored,
get_connected_platforms forces discovery, plugin is_connected via
gateway.get_env_value, logs --component gateway matches plugins.platforms.*,
matrix hidden on Windows).

Additionally ports config keys main added since the PR base: the matrix
plugin's _apply_yaml_config now also covers allowed_users,
ignore_user_patterns, process_notices, and session_scope (the inline
gateway/config.py matrix block gained these in the 1340 commits the PR sat
open; they would otherwise have been silently dropped on deletion).
2026-06-20 10:26:45 -07:00
lkz-de
905820b59f fix(signal): share markdown formatting across send paths
Route Signal send paths through shared markdown formatting helpers and render markdown bullets consistently as Unicode bullets. Add coverage for Signal formatting and send_message integration.
2026-06-20 13:47:14 +05:30
kshitijk4poor
a6f08ff0c8 docs(delegate): clarify subagent model is config-level, not per-call
delegate_task has never exposed a per-call model parameter (removed
intentionally in fb0f579b1). The tool description gave no hint about how
subagent model is actually controlled, so users kept expecting a model
arg and filing it as a dropped/ignored param (e.g. #49332, #23467).

Add one bullet to the dynamically-built tool description stating that
children inherit the parent model + fallback chain, and that pinning all
subagents to a specific model is done via delegation.provider /
delegation.model in config.yaml. No behavior change.
2026-06-20 12:13:39 +05:30
hakanpak
d45addc2f1 fix(tools): never let a model whitelist strip the prompt / source images
_build_fal_payload and _build_fal_edit_payload assemble the request and then
filter it down to the model's supports / edit_supports whitelist. That filter
also covers prompt (and image_urls for edits), which every FAL endpoint
requires. Today all model configs happen to list those keys, but a single
config that omits one would silently produce a request with no prompt or no
source images — a broken generation with no error.

Always keep the mandatory keys regardless of the whitelist so a missing
whitelist entry can only drop optional knobs, never the prompt or the images.
2026-06-19 16:59:54 -07:00
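The rule in the commit above (d45addc2f1) is that the whitelist may drop optional knobs but never mandatory keys. In isolation it could be written like this. It is an illustrative sketch: `filter_payload` and its parameters are assumed names, not the actual `_build_fal_payload` code.

```python
def filter_payload(payload: dict, supported: set[str],
                   mandatory: tuple[str, ...] = ("prompt",)) -> dict:
    """Keep whitelisted keys plus the mandatory ones, whatever the whitelist says."""
    allowed = set(supported) | set(mandatory)
    return {key: value for key, value in payload.items() if key in allowed}
```

For an edit request the caller would pass `mandatory=("prompt", "image_urls")`, so a model config that forgets to list either key can no longer produce an empty request.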
KeyArgo
1e40b21b2e
docs: clean up three stale comments from the #32848 audit (#45638)
* docs: clean up three stale comments from the #32848 audit

- tools/memory_tool.py:20 — 'read' action was intentionally removed
  but the docstring still listed it. Now matches the schema.
- tools/fuzzy_match.py:9 — unicode_normalized was added but the
  chain-count docstring still said '8-strategy'. Now says '9'.
- run_agent.py:1485 — 'See #<TBD>.' placeholder was never filled in.
  Replaced with a backfill note.

Fixes #32848 (parts 3, 4, and 12)

* docs(memory): also remove stray memory(action=read) references in lines 144 and 201

The original #32848 audit fix (in 6fd661d6) only addressed line 20
(the action list in the module docstring), but the action was
referenced in two other places:

- tools/memory_tool.py:144 — in a class docstring, claimed
  'memory(action=read)' was a way to SEE poisoned entries
- tools/memory_tool.py:201 — in a user-facing warning message,
  told the user to 'use memory(action=read) to inspect'

Since the schema on line 683 only allows add/replace/remove, both
references were misleading: the first claimed a way to inspect
poisoned entries that doesn't exist, the second would error out
when the user followed the warning.

This commit removes both references:
- Line 144: '...keep the original text so the user can still SEE
  poisoned entries by inspecting the source files directly, and
  remove them — silently dropping them would hide the attack
  from the user.'
- Line 201: '...use memory(action=remove) to delete the
  original. (drop the read-action reference)'

Followup to the previous commit on this branch.

---------

Co-authored-by: KeyArgo <keyargo@argobox.com>
2026-06-19 16:09:30 -07:00
emozilla
40722058e5 fix(mcp): keep short-TTL HTTP sessions alive with configurable ping keepalive
MCP Streamable HTTP servers that garbage-collect idle sessions on a short
TTL (e.g. Unreal Engine's editor MCP, ~15s) were unusable: the keepalive
was hardcoded at 180s, so the session was always dead by the time it ran,
and every idle tool call then landed on an expired session and paid the
full reconnect path (observed hangs of 113-143s until interrupt, bounded
only by the 300s tool_timeout).

Two coordinated, backward-compatible changes:

- Add per-server `keepalive_interval` (config.yaml, not an env var per the
  contribution rubric). Default 180s — byte-identical to the old hardcoded
  value when unset — floored at 5s. Servers with short session TTLs set it
  below their TTL so the session stays warm.

- Switch the keepalive probe from `list_tools()` to `ping` (the MCP base
  protocol liveness primitive). On large servers `list_tools` pulled ~1 MB
  every cycle (830 tools = 1,068,041 bytes); `ping` is ~55 bytes and works
  uniformly across tool/prompt/resource servers. Tool-list changes still
  arrive out-of-band via notifications/tools/list_changed -> _refresh_tools.

`ping` is an OPTIONAL utility, so to guarantee zero regression for a
tool-capable server that doesn't implement it: the first -32601 latches
`_ping_unsupported` and the probe falls back to the pre-ping `list_tools`
path for that connection (no reconnect loop). The latch resets on each
fresh connection (_discover_tools, all transport paths) so a server that
gains ping support after a reconnect is re-probed with the cheap path.
Non-(-32601) ping errors propagate as genuine liveness failures.

Verified end-to-end against a live Unreal MCP server (idle 22s past the
~15s TTL -> post-idle tool call returns in 0.31s, no teardown) and with a
simulated ping-less tool server driving the real keepalive loop (ping once,
list_tools thereafter, no reconnect). 25/25 unit tests pass.

Note: a separate upstream defect (modelcontextprotocol/python-sdk#2604)
still tears down the whole session when one tool-call POST returns 4xx;
that is not addressed here.
2026-06-19 12:16:33 -07:00
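The ping-with-fallback latch and the interval floor from the commit above (40722058e5) can be sketched as below. This is a simplified, synchronous illustration: `RpcError`, `KeepaliveProbe`, and `effective_interval` are hypothetical names, and the `session` object is only assumed to expose `ping()` and `list_tools()`.

```python
from typing import Optional

METHOD_NOT_FOUND = -32601  # JSON-RPC "method not found"


class RpcError(Exception):
    """Hypothetical stand-in for the client library's JSON-RPC error type."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(message or f"rpc error {code}")
        self.code = code


class KeepaliveProbe:
    """One probe object per connection, so the latch resets on reconnect."""

    def __init__(self, session) -> None:
        self.session = session
        self.ping_unsupported = False

    def probe(self) -> str:
        """Return the path used; raise on a genuine liveness failure."""
        if not self.ping_unsupported:
            try:
                self.session.ping()
                return "ping"
            except RpcError as exc:
                if exc.code != METHOD_NOT_FOUND:
                    raise  # any other error means the session is unhealthy
                self.ping_unsupported = True  # latch for this connection
        self.session.list_tools()
        return "list_tools"


def effective_interval(configured: Optional[float],
                       default: float = 180.0, floor: float = 5.0) -> float:
    """Unset keeps the old 180s default; configured values are floored at 5s."""
    return max(floor, default if configured is None else configured)
```

With this shape a server without `ping` pays for exactly one failed ping per connection and then stays on the `list_tools` path.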
alt-glitch
16642e2769 fix(mcp): revert ACP rebuild to original; harden generation guard
CI caught 3 ACP test failures (tests/acp/test_server.py,
tests/acp/test_mcp_e2e.py). Root cause: routing ACP's tool-surface rebuild
through the shared refresh_agent_mcp_tools helper (added in the round-2 pass)
broke a deliberate, pre-existing ACP contract:

- the ACP tests assert `agent.tools is <get_tool_definitions return>` (object
  identity) and an exact get_tool_definitions(enabled_toolsets=[...],
  disabled_toolsets=..., quiet_mode=True) call signature; the shared helper
  list()-copies and re-derives differently, breaking identity; and
- the tests use a MagicMock agent whose _tool_snapshot_generation is a mock, so
  the new `int < published_gen` generation guard raised TypeError and the whole
  ACP refresh silently failed.

ACP already preserves memory-provider tools (its own inject call) and excludes
context_engine, so there was no bug to fix there — only over-reach. Reverted ACP
to its original rebuild. (Same lesson as the gateway path: leave call sites that
carry their own tested contract alone; a reviewer's "inert today, fragile" note
meant leave-it, not change-it.)

Also hardened the generation guard defensively: tolerate a non-int
_tool_snapshot_generation (mock / partially-built agent) instead of throwing
TypeError and silently failing the refresh.
2026-06-19 11:57:43 -07:00
alt-glitch
88d523220f fix(mcp): address adversarial review round 2 (stale-publish race, parity holes)
Second review pass (Codex + Hermes subagent). Codex reproduced a real race with
a two-thread harness; both converged on the remaining issues.

- Generation-aware publish (fixes a lost-update race): two refresh callers (the
  late-refresh daemon and the between-turns prologue around turn 1) could each
  compute a snapshot outside the lock; a SLOWER caller holding an OLDER registry
  generation could acquire the publish lock after a newer caller and clobber it,
  deleting just-landed tools. refresh_agent_mcp_tools now captures
  registry._generation before computing and refuses to publish a stale set;
  agent._tool_snapshot_generation tracks the published generation.
- Context-engine routing names (_context_engine_tool_names) are now staged on a
  local and published atomically with the snapshot, and only claimed when this
  rebuild actually appended the schema — matching agent_init's dedup so a
  registry/plugin tool of the same name keeps its own dispatch. (Previously
  mutated live, before the publish lock, and on no-change refreshes.)
- CLI /reload-mcp: self.enabled_toolsets is resolved once at startup, so a
  server newly ENABLED in config mid-session wasn't picked up (TUI already
  re-resolved). Merge now-connected MCP server names into the override (unless
  the user pinned all/*), mirroring startup, and keep self.enabled_toolsets in
  sync. Closes the CLI/TUI parity hole.
- ACP (acp_adapter/server.py) routed through the shared helper — it was a 5th
  sibling rebuild that re-injected memory tools but NOT context-engine tools and
  bypassed the atomic/name-diff path (inert today, fragile).
- mcp_startup._resolve_discovery_timeout pulls its default from DEFAULT_CONFIG
  (single source of truth) instead of a stale hardcoded 5.0 literal.
- Tests: stale-generation-no-clobber, _skip_mcp_refresh honored, timeout
  fallback uses DEFAULT_CONFIG.
2026-06-19 11:57:43 -07:00
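The generation-aware publish from the commit above (88d523220f) amounts to a compare-under-lock before swapping the snapshot. A minimal sketch follows, assuming a simplified `Agent` with hypothetical attribute names (`snapshot_generation`, `publish_lock`); the real helper also rebuilds and diffs the tool list.

```python
import threading


class Agent:
    """Simplified stand-in holding only the fields the guard needs."""

    def __init__(self) -> None:
        self.tools: list[str] = []
        self.snapshot_generation = 0
        self.publish_lock = threading.Lock()


def publish_snapshot(agent: Agent, captured_generation: int,
                     tools: list[str]) -> bool:
    """Publish tools unless a newer registry generation is already published."""
    with agent.publish_lock:
        if captured_generation < agent.snapshot_generation:
            return False  # stale: a faster caller already published newer state
        agent.tools = list(tools)
        agent.snapshot_generation = captured_generation
        return True
```

The caller captures the registry generation before computing its snapshot, so a slow caller that computed against an older registry loses the race harmlessly instead of deleting tools that just landed.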
alt-glitch
b6e2a54a94 fix(mcp): address adversarial review round 1 (cache parity, gates, races)
Consolidated findings from three independent reviewers (Codex, Claude Code, a
Hermes subagent w/ the hermes-agent-dev skill):

- BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently
  dropping post-build-injected memory-provider (mem0/honcho/…) and context-
  engine (lcm_*) tools on every refresh. Now additive-preserving: re-applies
  the same injectors agent_init uses, staged on locals and published atomically.
- Re-injection now honors the #5544 enabled_toolsets gate for context-engine
  tools, so a restricted-toolset platform can't get lcm_* leaked back in.
- Atomic read-diff-publish under one lock: the returned `added` set and the
  (tools, valid_tool_names) pair are consistent even under concurrent callers
  (no half-swap, no TOCTOU).
- background_review fork opts out (_skip_mcp_refresh) so its byte-identical
  tools[] cache parity with the parent is preserved.
- CLI /reload-mcp routed through the shared helper (was a 4th divergent copy
  with the same clobber bug + missing disabled_toolsets).
- Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user
  just enabled in config this session is picked up; automatic paths reuse the
  agent's build-time selection.
- mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the
  between-turns refresh, so the startup wait is only a small turn-1 UX bump
  rather than a heavy dead-server latency penalty.
- has_registered_mcp_tools checks registered TOOLS (not connected servers) so a
  zero-tool/prompt-only server doesn't make the per-turn hook fire forever.
- Tests: rewrote the thread-safety test to actually exercise the write path
  (alternating tool sets), added the #5544-gate regression, the memory/context
  preservation regression, and a "callable next turn via valid_tool_names"
  contract; removed a dead monkeypatch line.
2026-06-19 11:57:43 -07:00
alt-glitch
3713483874 fix(mcp): refresh agent tool snapshot between turns (cache-safe late-binding)
A slow MCP server (HTTP/OAuth, 2-6s cold connect) that finishes connecting
after the agent's one-time tool snapshot was uncallable for the rest of the
session. The merged pre-first-turn late-refresh only helps during the dead air
before the user's first keystroke; once a turn starts it bails to protect the
prompt cache, so a user who types before the server connects never gets the
tools without a manual /reload-mcp.

Refresh the snapshot in the per-turn prologue (build_turn_context), before this
turn's first API call assembles tools=. This is cache-safe by construction: the
refresh only ever extends a fresh request prefix at a turn boundary, never
mutates the cached prefix of an in-flight turn. So late tools become callable on
the user's NEXT turn automatically, with no /reload-mcp and no cache cost.

- tools/mcp_tool.py: has_registered_mcp_tools() — cheap guard so sessions with
  no MCP servers (the common case) skip the rebuild entirely.
- agent/turn_context.py: call the shared refresh_agent_mcp_tools() helper at the
  top of the prologue when MCP servers are registered.
- tests: 3 contract tests through the real build_turn_context (adds late tool;
  skipped when no servers; no snapshot churn when unchanged).

.hermes/plans/: SPEC + PLAN documenting the root cause, the cache-safety
constraint, and why the existing fixes (#48403/#41630/#42802) don't close it.
2026-06-19 11:57:43 -07:00
alt-glitch
93d6e73028 fix(mcp): expose late-connecting MCP tools to the agent (TUI/CLI/gateway)
MCP servers that connect after the agent's one-time tool snapshot were
invisible for the whole session. Two root causes, fixed together:

1. The startup discovery wait was a flat 0.75s. HTTP/OAuth servers
   commonly take 2-6s on a cold connect, so they missed the window and
   their tools never entered the agent's snapshot. `thread.join(timeout)`
   already returns the instant discovery completes, so raising the bound
   costs ~0s for the common case (no MCP / fast servers) and only ever
   blocks for a genuinely-pending server, capped so a dead server can't
   freeze startup. The bound is now configurable via
   `mcp_discovery_timeout` (config.yaml, default 5.0s).

2. Three call sites duplicated the agent tool-snapshot rebuild (the TUI
   `reload.mcp` RPC, the gateway reload, and the TUI late-binding refresh
   thread), and the late-refresh detected changes by tool COUNT — missing
   an equal-size add/remove swap. Consolidated into one shared
   `tools.mcp_tool.refresh_agent_mcp_tools(agent)` helper that diffs by
   tool NAME, mutates the agent under a lock (thread-safe), and respects
   the agent's own enabled/disabled toolsets.

The late-binding refresh keeps its pre-first-turn cache-safety guard:
it never rebuilds the tool list once a turn has started, so the cached
prompt prefix is never invalidated mid-conversation.

Tests: new tests/tools/test_refresh_agent_mcp_tools.py covers the
name-based diff, in-place mutation, agent-scoped filtering, thread
safety, and the config-driven discovery bound (incl. instant-return
when nothing is pending). 75 passed across the touched areas.
2026-06-19 11:57:43 -07:00
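Two details from the commit above (93d6e73028) are easy to show in isolation: a count comparison misses an equal-size swap, and `thread.join(timeout)` only waits as long as the thread is actually alive. The function names below are hypothetical; this is a sketch of the ideas, not the `refresh_agent_mcp_tools` implementation.

```python
import threading


def diff_tool_names(old_names, new_names):
    """Return (added, removed) by NAME, so an equal-size swap is still detected."""
    old, new = set(old_names), set(new_names)
    return new - old, old - new


def wait_for_discovery(thread: threading.Thread, timeout: float) -> bool:
    """Block at most `timeout` seconds; return True if discovery finished."""
    thread.join(timeout)  # returns as soon as the thread ends
    return not thread.is_alive()
```

For example, going from `["a", "b"]` to `["a", "c"]` keeps the count at two, yet the name diff reports one tool added and one removed.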
Ludo Galabru
239740a19e feat(tools): MCP elicitation handler with gateway-aware approval routing
Wires support for the MCP `elicitation/create` request (Python SDK 1.11+)
so MCP servers can ask the user to confirm sensitive operations
mid-tool-call (payment authorization, OAuth confirmation, etc.) instead
of failing closed or requiring out-of-band biometrics.

Behavior:

- `tools/mcp_tool.py` adds `ElicitationHandler`, attached per server task
  and passed to `ClientSession` as `elicitation_callback`. Form-mode
  requests route through the existing approval system; URL-mode requests
  decline cleanly (out of scope for this pass).
- `tools/approval.py` adds `request_elicitation_consent()`, which dispatches
  to whichever surface owns the active session — `_await_gateway_decision`
  for Telegram / Slack / etc. (so the approval prompt lands on the right
  platform), `prompt_dangerous_approval` for CLI / TUI. Fails closed on
  timeout, missing notify_cb, or exception.
- The MCP tool wrapper snapshots `contextvars.copy_context()` into
  `MCPServerTask._pending_call_context` before each `session.call_tool`
  and clears it after. The recv-loop task that dispatches incoming
  `elicitation/create` requests does not inherit the agent task's
  contextvars (HERMES_SESSION_PLATFORM and friends), so without the
  bridge `_is_gateway_approval_context()` returns False on every
  gateway session and the elicitation falls through to a CLI prompt
  that has no TTY → fail-closed decline. The handler now reads the
  snapshot via its `owner` back-reference and replays it through
  `Context.copy().run(...)` so attribution survives the task hop.

Tests (`tests/tools/test_mcp_elicitation.py`):

- form-mode accept / decline / cancel
- URL-mode declined without prompting
- exception in approval system → decline
- timeout in approval → cancel
- context-bridge regression tests (replay observed in consent call,
  missing-context fallback, multiple-replay safety, owner with
  cleared `_pending_call_context`)

Verified end-to-end against pay's MCP server on macOS: agent message
arrives via Telegram, agent calls `mcp_pay_curl` against a paid endpoint,
pay returns 402, ElicitationHandler routes the approval prompt back to
the originating Telegram chat, user replies in TG, the curl tool signs
and completes.

Platforms tested: macOS 14 (darwin/arm64). No Unix-only syscalls
introduced; Windows footgun checker passes on the touched files.
2026-06-19 11:46:25 -07:00
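The context bridge described in the commit above (239740a19e) relies on standard `contextvars` behaviour: a callback that runs in a different context does not see the caller's variables unless a snapshot is replayed. The sketch below uses hypothetical names (`SESSION_PLATFORM`, `ServerTask`, `handle_elicitation`) and simulates the task hop with a fresh, empty `Context`.

```python
import contextvars

# Hypothetical stand-in for the session-attribution variables.
SESSION_PLATFORM = contextvars.ContextVar("session_platform", default="cli")


class ServerTask:
    def __init__(self) -> None:
        self.pending_call_context = None

    def call_tool(self, do_call):
        # Snapshot the caller's context before the call, clear it afterwards.
        self.pending_call_context = contextvars.copy_context()
        try:
            return do_call()
        finally:
            self.pending_call_context = None


def handle_elicitation(owner: ServerTask) -> str:
    """Resolve the platform, replaying the caller's snapshot when one exists."""
    ctx = owner.pending_call_context
    if ctx is None:
        return SESSION_PLATFORM.get()  # missing-context fallback
    # Run in a copy so the same snapshot can be replayed more than once.
    return ctx.copy().run(SESSION_PLATFORM.get)


def agent_turn(owner: ServerTask) -> str:
    SESSION_PLATFORM.set("telegram")
    # Context().run(...) simulates a handler that did not inherit our context.
    return owner.call_tool(
        lambda: contextvars.Context().run(handle_elicitation, owner))
```

Calling `contextvars.copy_context().run(agent_turn, ServerTask())` yields the platform set by the agent turn, because the handler reads it from the replayed snapshot and not from its own empty context.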
teknium1
a58287afcb
Merge remote-tracking branch 'origin/main' into pr48275-rebase
# Conflicts:
#	cron/scheduler.py
2026-06-19 07:40:29 -07:00
Sahil Saghir
a5e06078b2 fix(cron): compact cron failure messages + repair bare repo dirs after git gc
Two small, focused fixes for the cron scheduler and checkpoint manager.

1. _summarize_cron_failure_for_delivery (cron/scheduler.py):
   Replaces the raw error dump in _process_job with a compact
   pattern-matched summary. Provider rate limits, timeouts, and
   authentication errors now produce a short human-readable message
   instead of dumping multi-KB provider JSON into the delivery channel.

2. _repair_bare_repo_dirs (tools/checkpoint_manager.py):
   Recreates refs/heads/ and branches/ directories after git gc
   --prune=now, which can remove empty dirs from bare repos and cause
   subsequent git add -A to fail with 'fatal: not a git repository'.
   Called after all four git gc call sites.

Both fixes use only standard library imports and plug into existing
call sites with no architectural changes.
2026-06-19 07:35:29 -07:00
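A pattern-matched failure summary of the kind the commit above (a5e06078b2) describes might look like this. The patterns, messages, and the `summarize_failure` name are assumptions made for illustration; the actual `_summarize_cron_failure_for_delivery` rules are not shown in the log.

```python
import re

# Ordered (pattern, summary) pairs; the first match wins.
_FAILURE_PATTERNS = [
    (re.compile(r"rate.?limit|\b429\b", re.I), "Provider rate limit hit"),
    (re.compile(r"timed? ?out", re.I), "Request timed out"),
    (re.compile(r"unauthorized|invalid api key|\b401\b|authentication", re.I),
     "Authentication failed"),
]


def summarize_failure(raw_error: str, max_len: int = 200) -> str:
    """Collapse a raw provider error into one short, human-readable line."""
    for pattern, summary in _FAILURE_PATTERNS:
        if pattern.search(raw_error):
            return summary
    stripped = raw_error.strip()
    if not stripped:
        return "Unknown error"
    # Unrecognised errors fall back to a truncated first line.
    return stripped.splitlines()[0][:max_len]
```

Keeping the fallback to a truncated first line means unrecognised errors are still delivered, just without the multi-KB payload.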
Ben Barclay
f538470cf4 feat(gateway): multiplex phase 2 — fail-closed profile credential isolation (Workstream A)
The credential gate. When multiplexing is active, a profile's secrets resolve
from a context-local scope, never the process-global os.environ (which in a
multiplexer may hold another profile's keys, and is inherited by every
subprocess spawned with env=dict(os.environ)).

- agent/secret_scope.py: get_secret() backed by a secret-scope contextvar.
  FAIL-CLOSED: when multiplex is active and no scope is installed, an unscoped
  read RAISES UnscopedSecretError instead of falling back to os.environ — a
  missed/new call site crashes loudly at that line rather than leaking a
  cross-profile value. Genuinely-global vars (HERMES_*, PATH, kanban paths,
  …) keep reading os.environ via an allowlist.
  load_env_file/build_profile_secret_scope parse a profile .env into an
  isolated dict WITHOUT mutating os.environ. Off by default => transparent
  os.getenv behavior.
- hermes_cli/runtime_provider.py: all credential/provider/base-url reads go
  through _getenv -> get_secret.
- agent/credential_pool.py: env fallbacks route through get_secret (the
  ~/.hermes/.env-first preference is preserved and already profile-correct via
  the home override).
- tools/mcp_tool.py: MCP config  interpolation resolves through
  get_secret, so a server's  picks up the routed profile's value.
- gateway/run.py: set_multiplex_active() at GatewayRunner init; per-turn .env
  reload is a no-op for credentials in multiplex mode (secrets come from the
  scope, not global env); _profile_runtime_scope context manager combines the
  HERMES_HOME override + secret scope; _run_agent wraps _run_agent_inner in
  that scope (resolved via _resolve_profile_home_for_source) when multiplexing.

Propagates into the agent worker thread for free via the existing
copy_context() in _run_in_executor_with_context.

Tests: 13 unit (fail-closed, scope isolation, global allowlist, .env parsing
without environ mutation) + 7 E2E (runtime_provider + MCP interpolation prove
two profiles isolated, unscoped read raises, globals still read environ).
2026-06-19 07:34:15 -07:00
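The fail-closed lookup described in the commit above (f538470cf4) can be sketched as follows. `get_secret`, `set_multiplex_active`, and `UnscopedSecretError` are names taken from the commit message; everything else (the allowlist contents, the `secret_scope` context manager, the module-level flag) is an assumption for illustration and not the contents of agent/secret_scope.py.

```python
import contextvars
import os
from contextlib import contextmanager


class UnscopedSecretError(RuntimeError):
    """Raised when a secret is read without a scope while multiplexing."""


_GLOBAL_NAMES = {"PATH"}          # assumed allowlist entries
_GLOBAL_PREFIXES = ("HERMES_",)   # assumed allowlist prefixes
_scope = contextvars.ContextVar("secret_scope", default=None)
_multiplex_active = False


def set_multiplex_active(active: bool = True) -> None:
    global _multiplex_active
    _multiplex_active = active


@contextmanager
def secret_scope(values: dict):
    """Install a profile's secrets for the duration of the block."""
    token = _scope.set(dict(values))
    try:
        yield
    finally:
        _scope.reset(token)


def get_secret(name: str, default=None):
    if name in _GLOBAL_NAMES or name.startswith(_GLOBAL_PREFIXES):
        return os.environ.get(name, default)  # genuinely global variables
    scope = _scope.get()
    if scope is not None:
        return scope.get(name, default)  # never falls through to os.environ
    if _multiplex_active:
        # Fail closed: crash at the call site, do not leak another profile's key.
        raise UnscopedSecretError(f"unscoped read of {name!r}")
    return os.environ.get(name, default)  # multiplexing off: plain getenv
```

Because the scope lives in a `ContextVar`, it follows the work into any thread or task started with a copied context, which is what the commit relies on for the agent worker thread.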
alt-glitch
9e1f616136 fix(clarify): docstring — put options in choices[] only, never enumerate in question text
The model was enumerating options inside the question string (dead prose the UI
can't render as pickable rows). Schema description now spells out: choices[] is
REQUIRED for selectable options; question holds ONLY the question.
2026-06-19 07:34:02 -07:00