egilewski found the prior sink gate was transient: it only applied while
PluginManager executed register(ctx). A plugin could defer a direct
registry.register(..., override=True) to a post-load callback/thread, after
the scope was cleared, and still replace a built-in.
Make authorization durable by binding it to where the handler is DEFINED
(handler.__globals__['__name__']) rather than to call timing. At load, each
plugin's module namespace is mapped to its allow_tool_override opt-in in a
table that is never cleared. The sink resolves the handler's owning plugin
module and rejects an override from any plugin namespace without opt-in,
regardless of when or on which thread the call happens. Plugin namespaces
with no recorded policy are treated as not-opted-in (fail-closed). Built-in
and MCP handlers live outside the plugin namespace and are unaffected.
Adds a regression test for the delayed/post-load direct-registry override.
The opt-in gate lived only in PluginContext.register_tool, so a plugin
could bypass it by importing tools.registry and calling
registry.register(..., override=True) directly. Enforce the same gate at
the sink: during plugin load, the registry rejects an override from a
plugin without operator opt-in regardless of the path taken. Built-in and
MCP registrations (no active plugin scope) are unaffected.
Adds a regression test covering the direct-registry bypass.
The standalone send path (_send_telegram, used by the send_message tool,
cron delivery, and out-of-process callers) chunked the *raw* message on
UTF-16 length, then formatted and sent the result un-rechunked. MarkdownV2
escaping inflates the text (`!`/`.`/`-` -> `\!`/`\.`/`\-`), so a
4096 UTF-16-unit raw message can become ~8192 units once formatted and gets
rejected by Telegram as 'Message is too long'.
Move all text chunking into _send_telegram, after formatting: split the
formatted MarkdownV2/HTML text on UTF-16 length so every send is <=4096,
with per-chunk plain-text fallback and thread-not-found retry preserved.
Media attaches after all text chunks. (#28557)
Defense-in-depth follow-up to the runtime guard added in the previous commit.
Models on headless hosts (Railway / Fly / Docker / fresh VPS) without any ACP
CLI installed occasionally hallucinate ``acp_command="copilot"`` from the
schema description, despite the explicit "Do NOT set" instruction. The runtime
guard prevented the crash but the model still wasted a tool turn and got an
opaque silent fallback.
This commit removes the temptation at its source: ``_build_dynamic_schema_overrides``
now strips ``acp_command`` and ``acp_args`` from both the top-level and per-task
schemas when none of the known ACP CLIs (``copilot``, ``claude``, ``codex``) are
detectable on PATH. The model literally never sees the fields, so it cannot
pass them.
The runtime guard from the previous commit stays in place as defense-in-depth
for internal callers, tests, and any future code path that bypasses the schema.
``_acp_binary_available`` is intentionally NOT cached: ``shutil.which`` is
cheap, and avoiding the cache means the schema reacts to mid-session installs
without requiring a process restart.
Tests:
- ``test_schema_prunes_acp_command_when_no_acp_binary``
- ``test_schema_keeps_acp_command_when_binary_available``
- ``test_acp_binary_available_checks_known_clis``
Full ``test_delegate.py`` suite: 136/136 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a model passes `acp_command="copilot"` (or any other binary name) in a
`delegate_task` tool call, `_build_child_agent` unconditionally sets
`effective_provider = "copilot-acp"`, which routes the subagent through
`CopilotACPClient`. That client spawns the named binary via subprocess; if it
isn't on PATH, every retry raises RuntimeError and an asyncio cleanup race
during error delivery can take the entire gateway down.
This is a real failure mode on headless deploys (Railway / Fly / VPS / Docker)
where `copilot` / `claude` / etc. aren't installed. The schema does say
"Do NOT set unless the user explicitly told you an ACP CLI is installed,"
but models occasionally pass it anyway — particularly for X (Twitter) search
prompts where Grok seems to associate ACP with "search assistance."
Reproduction:
- Headless install (no `copilot` binary on PATH)
- Set provider to xai-oauth + model grok-4.3
- Telegram prompt: "Search X for crypto twitter trends"
- Grok decides to delegate and passes `acp_command="copilot"`
- Subagent crashes 3x, gateway crashes on the 3rd retry teardown
Fix: validate the binary exists on PATH via `shutil.which` before honoring
the override. If missing, log a warning and fall through to the parent's
default transport. No behavior change when the binary IS present (covered
by `test_build_child_agent_honors_acp_command_when_binary_present`).
Tests:
- `test_build_child_agent_ignores_acp_command_when_binary_missing`
- `test_build_child_agent_honors_acp_command_when_binary_present`
Verified on Python 3.11 (macOS) and 3.12 (Debian 13 container).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Concurrent ACP sessions run on a shared ThreadPoolExecutor (max_workers=4).
Each _run_agent mutated the process-global os.environ["HERMES_INTERACTIVE"]
and restored it in finally, so one session's restore could clobber another's
set mid-run — dropping the second session onto the non-interactive
auto-approve path, executing a dangerous command without the approval
callback firing (GHSA-96vc-wcxf-jjff).
Replace the env-var flag with a thread/task-local contextvar in
tools.approval. The two HERMES_INTERACTIVE read sites in approval.py now go
through _is_interactive_cli() (contextvar-first, env fallback for legacy
single-threaded CLI callers). The ACP executor sets the contextvar instead
of os.environ; the existing contextvars.copy_context() wrapper isolates each
session's write.
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
delegate_task's per-task completion display emitted lines like
"✓ [1/3] Research done (17.92s)" via a bare print(). Under ACP (and any
headless JSON-RPC stdio host where AIAgent routes human output to stderr
via a custom _print_fn), these landed on stdout and corrupted the
protocol frame stream, surfacing as "Failed to parse JSON message: ✓
[3/3] …" in the ACP adapter.
Add _emit_parent_console() which prefers parent_agent._safe_print (the
same hook AIAgent uses for every other user-facing print) and falls back
to print() only when no router is wired up or it raises. CLI behavior is
unchanged.
The PR's other fix (preset toolset expansion) is already covered on main
by _expand_parent_toolsets(), so only the stdio-safe printing change is
salvaged here.
Batch delegation returned each subagent's full final_response verbatim
into the parent's context. A fan-out of N children could dump 60k+ tokens
at once, blowing the parent's context window and — on rate-limited
providers — triggering a compression/429 death spiral (429 misread as
context-too-large -> window step-down -> retry loop -> conversation dies).
Cap each summary against the parent's *remaining* context headroom split
across the batch (not a magic char count). When trimming, mirror the
web_extract convention: spill the full text to cache/delegation (mounted
into remote backends via credential_files._CACHE_DIRS) and return a
head+tail window (75/25, line-snapped) plus a footer with the exact
read_file offset to page the omitted middle. Both the subagent's opening
AND its closing (outcomes / files-changed / issues, which live at the end)
survive in-context, and nothing is lost — the parent can read_file the
full version on any backend.
delegation.max_summary_chars (default 24000) is a static ceiling layered
on top as belt-and-suspenders for models that ignore 'be concise'; 0
disables it. Child prompt tightened to lead with outcomes / bullets.
Co-authored-by: rc-int <rcint@klaith.com>
Three independent security-scanner hardenings, re-homed onto the current
shared threat-pattern architecture (tools/threat_patterns.py):
- approval.py: add bash/sh/zsh/ksh heredoc to DANGEROUS_PATTERNS. The
existing heredoc pattern only covered python/perl/ruby/node, so
`bash <<'EOF' ... EOF` ran arbitrary shell — including exfil pipelines
whose inner commands don't individually match a pattern — with no prompt.
- threat_patterns.py: apply unicodedata.normalize("NFKC", ...) before
pattern matching so full-width / compatibility homographs (e.g.
`cat ~/.hermes/.env`) are folded to ASCII and no longer bypass the
keyword scanners. Invisible-char detection still runs on the raw content
first (NFKC can strip those codepoints).
- code_execution_tool.py: add CREDS/BEARER/APIKEY to _SECRET_SUBSTRINGS so
vars like HERMES_LLM_CREDS, API_BEARER, MY_APIKEY are scrubbed from the
sandbox env. PASS was intentionally dropped from the original proposal —
it false-positives on BYPASS_CACHE / COMPASS_DIR / PASSENGER_HOST while
PASSWORD/PASSWD already cover the credential cases.
The original PR also proposed a 'synonym' injection pattern block
(overlook/forget/set aside/bypass/discard + developer-mode); dropped here
because it false-positives on ordinary AGENTS.md/SOUL.md prose ("don't
forget to follow the rules", "run in developer mode"), exactly the
bossy-English class threat_patterns.py is documented to avoid.
Salvaged from #9028.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
patch_tool extracts V4A patch paths so _check_sensitive_path can refuse
writes to /etc/*, /boot/*, etc. before they reach the low-level file ops.
The extraction regex had two gaps:
1. `*** Move File: src -> dst` was never extracted (regex only matched
Update/Add/Delete), so a Move targeting /etc/crontab skipped the
pre-check and fell back on the narrower file_operations deny list.
2. The regex required `\\s+` after `***` but patch_parser uses `\\s*`, so
`***Update File: /etc/hosts` (no space) parsed + applied while
skipping the check.
Loosen the leading whitespace to \\s* and add a Move regex that checks
both endpoints. Move endpoints also run through the same '..' traversal
rejection as the other V4A headers (closes the sibling gap on current
main, which gained that traversal guard after this PR was opened).
The gateway-lifecycle guard's hermes-CLI pattern required `hermes`
and `gateway` to be adjacent, so a profile flag slipped the agent
past it: `hermes -p ade gateway restart` was not flagged. That is the
exact form from the 2026-04-11 ade-profile self-kill loop. Allow an
optional run of global flags (`-p ade`, `--profile ade`, multiple
flags) between `hermes` and the gateway subcommand.
launchctl self-termination is already covered on main by #33071; this
narrows the only remaining real gap.
Ensure Windows desktop and local terminal teardown kill full process trees so Git Bash descendants cannot survive wrapper exits and accumulate across retries.
Two robustness gaps from the #54843 truncate-store path:
- _store_full_text wrote the full clean page to cache/web with no upper
bound (path.write_text(content)); a multi-MB page → unbounded per-extract
disk write. Cap at MAX_STORED_TEXT_CHARS (2MB, the pre-truncate-store
refusal ceiling) with a marker when capped.
- The truncation footer told the model 'read_file ... offset=<line>' — a
literal placeholder it had to guess. Compute the real starting line of the
omitted middle (head line count + 1) so the first read_file lands in the gap.
Follow-up to the judge gate. judge_goal() is fail-open at the source:
when no auxiliary model is reachable it returns a "continue" verdict
that is indistinguishable from a real "not done yet" judgment. The gate
treated any non-"done" verdict as a rejection, so an unconfigured or
degraded auxiliary model would wedge every goal_mode worker — it could
never close its own task. That contradicted the gate's own "fail-open"
comment.
Probe judge availability before enforcing (the same auxiliary client
lookup judge_goal performs) and only gate when a judge is actually
reachable. When none is, completion proceeds.
Also fix the rejection guidance: kanban_create takes parents=[...], not
parent=.
Add test_complete_goal_mode_allows_when_judge_unavailable covering the
fail-open path; update the rejection test to force the availability probe.
Apply naqerl's review comments on PR #38388:
- Hoist `from hermes_cli.goals import judge_goal` to module-level
imports so an import failure surfaces at module init, not lazily
on the first goal-mode completion (no circular import: hermes_cli
package init is trivial and does not load tools.kanban_tools).
- Narrow the fail-open `try` to wrap only the judge_goal() call.
The verdict check and its rejection `return tool_error(...)` now
live outside the handler, so a failure there can no longer be
swallowed by the broad except.
- Pass `exc_info=True` to the logger.warning call per CONTRIBUTING.md.
Update the test mock target to tools.kanban_tools.judge_goal, since
the hoisted import rebinds the name into this module's namespace.
Prevents workers in goal_mode from bypassing the auxiliary judge by
calling kanban_complete before acceptance criteria are met. The tool
handler now synchronously invokes the goal judge against the task's
title/body and the completion summary. If the verdict is not "done",
the completion is rejected with actionable guidance for the agent.
This keeps kanban_db.py as a pure SQLite wrapper while intercepting
the bypass exactly at the agent tool-call boundary, aligning with
Hermes separation of concerns.
Fixes#38367
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
Add durable public-URL output and URL-based chaining to xAI Grok Imagine:
- Store generated media on files-cdn with permanent public HTTPS URLs
(public_url: true, no expiry by default).
- Chain by URL: generate -> edit -> extend each take a prior result's
public HTTPS URL (or a data URI / local file for inputs).
- Add provider-specific xai_video_edit and xai_video_extend tools.
- Image generation: public-URL/storage output, multi-reference edits,
and ~/ local-path support for image edits.
Credentials use xAI Grok device-code OAuth (separate PR).
* feat(web_extract): truncate-and-store instead of LLM summarization
web_extract no longer runs an auxiliary LLM over scraped pages. The extract
backends (Firecrawl/Tavily/Exa/Parallel) already return clean, boilerplate-
stripped markdown, so we return it directly: pages within a char budget
(default 15000, web.extract_char_limit) come back whole; larger pages get a
head+tail window plus an explicit footer giving the stored full-text path and
the read_file call to page through the omitted middle. The full clean text is
written to cache/web (mounted read-only into remote backends like the other
cache dirs), so nothing is lost.
Inline base64 images are converted to [IMAGE: alt] placeholders (token bombs
dropped) while real http(s) image URLs are preserved as links so the agent can
still web_extract/vision_analyze them.
Removes process_content_with_llm + the chunked summarizer + check_auxiliary_model
+ _resolve_web_extract_auxiliary. context_references._default_url_fetcher is
updated to the truncate path and its stale data.documents shape read is fixed
to results (it was silently returning empty).
Live before/after eval (firecrawl, 4 URLs): 11.7x faster overall (176.6s ->
15.1s); 10-60x on large pages. Quality identical; findability 4/4 (answer
recoverable from stored full text on every truncated page). web_search is
unchanged.
No own scraper added; no changes to web_search.
* fix(web_extract): add char_limit to execute_code web_extract stub
The new web_extract char_limit param must appear in the code_execution_tool
_TOOL_STUBS signature (and doc line) or test_stubs_cover_all_schema_params
fails — the stub schema must cover every real schema param.
Wire the salvaged _safe_command_timeout() guard into the surviving
open-timeout call site. _get_open_command_timeout() feeds the
browser_navigate 'open' path; this closes the last call site that
could observe a None timeout from a torn cache (#14331), since the
original PR's max(_get_command_timeout(), 60) site no longer exists
on main (now routed through _get_open_command_timeout).
The original cap held a process-global slot across the WHOLE vision
analysis (image load + encode + LLM call) with a default of min(CPUs, 4).
That serialized legitimate multi-image workflows — "compare these 6
screenshots", "read this 10-page scan", "analyze every frame" — behind a
4-wide gate, and on the native fast path it even throttled calls that make
no LLM request at all. Excess calls queued (blocking acquire, nothing
dropped), but the latency hit on real fan-out was the wrong tradeoff.
The incident was CPU exhaustion, not call count: concurrent base64/resize
bursts saturated every core and left none to service the shared event loop
serving /api/status. So cap ONLY that:
- A dedicated, bounded ThreadPoolExecutor (_vision_cpu_executor) runs the
encode/resize/dimension-check off the caller's loop, sized to the host's
usable core count with NO fixed ceiling — the cap tracks the actual
exhausted resource (cores), not a magic number. Excess encodes queue on
the executor; cores stay free for the loop.
- The LLM call is deliberately OUTSIDE the executor, so multi-image
workflows keep full request concurrency.
- Override via auxiliary.vision.max_concurrency / HERMES_VISION_MAX_CONCURRENCY
(honored verbatim, including above core count); sub-1 ignored.
- _vision_concurrency_slot() is now a no-op shim for back-compat.
Tests assert: resolver defaults to host cores with no ceiling; env/config
override (incl. above cores); sub-1 rejection; the executor is dedicated and
core-sized; encode runs on a vision-encode thread; and crucially that encode
bursts are bounded to the cap while the analyses themselves stay fully
concurrent (calls_peak > cap).
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.
Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).
It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.
Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency,
or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can
never be disabled into an unbounded fan-out.
Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
When a Camofox browser tab is garbage collected (idle timeout, browser
recycle), the held tab_id becomes stale. The next browser_navigate call
hits /tabs/{stale_id}/navigate -> HTTP 404 -> unhandled HTTPError.
Catch the 404 in camofox_navigate, clear the stale tab_id, and create a
fresh tab via _ensure_tab. The agent recovers transparently without
requiring a session restart.
Other tab operations (snapshot, click, type, etc.) use the same pattern
but only fail if the tab dies between successful calls — much rarer.
The navigate fix covers 95%+ of cases since navigate is always the entry
point.
The Camofox browser backend hardcoded a 30s HTTP timeout via
_DEFAULT_TIMEOUT, ignoring the user's browser.command_timeout config.
The main browser_tool path already reads this config via
_get_command_timeout().
This commit adds an equivalent _get_command_timeout() to
browser_camofox.py that reads browser.command_timeout from config
with caching, and switches all HTTP helper methods (_post, _get,
_get_raw, _delete) to use it as the default timeout.
Fixes#40843
The five HTTP call sites in browser_camofox.py (_ensure_tab, _post,
_get, _get_raw, _delete) did not include Authorization headers, causing
403 Forbidden when the Camofox server has API key auth enabled.
Added _auth_headers() helper and wired it into all five call sites.
The health check endpoint (/health) is left without auth since it is
a connectivity probe, not a browser operation.
Regression test covers: header present when key set, absent when unset,
blank key produces empty headers.
Fixes#20476
The Camoufox REST API server expects `listItemId` in the `POST /tabs`
body, but `_ensure_tab` was sending `sessionKey`. This caused a 400
Bad Request on every `browser_navigate` call.
The parameter name mismatch is visible in the same file: line 283
already reads `tab.get("listItemId")` when adopting existing tabs,
confirming the server-side field name.
Fixes#37960
The supermemory and mem0 memory providers shipped third-party SDKs
(supermemory / mem0ai) that are not core dependencies, but — unlike the
honcho and hindsight providers — they imported those SDKs directly with
no tools.lazy_deps.ensure() preflight and had no LAZY_DEPS allowlist
entry. On the published Docker image the agent venv is sealed
(HERMES_DISABLE_LAZY_INSTALLS=1) and lazy installs are redirected to a
writable durable target (HERMES_LAZY_INSTALL_TARGET). honcho/hindsight
route through ensure() and install fine there; supermemory/mem0 never
called it, so their SDK was never installed on a hosted instance and the
provider silently reported itself unavailable even with the API key set.
Fixes:
- Add memory.supermemory + memory.mem0 to the LAZY_DEPS allowlist
(tools/lazy_deps.py), pinned to current PyPI releases.
- Call ensure('memory.<x>', prompt=False) at each SDK-import chokepoint
(_SupermemoryClient.__init__; Mem0MemoryProvider._create_backend),
mirroring honcho's wrapped try/except shape.
- Drop the SDK-import gate from supermemory's is_available() — it was a
chicken-and-egg trap (provider never loaded on a sealed venv, so
ensure() never ran). Now key-presence only, like honcho/mem0.
- Add matching pyproject extras [supermemory]/[mem0]; update the
lazy-covered-extras contract test (excluded from [all] by policy).
Tests prove each path fails without the fix and the real sealed-venv
durable-target gate accepts both features.
Make the read-only agent terminal mirrors stream in real time and give
the agent a desktop-only way to dismiss its own tabs.
- Stream background output live: the local reader used a blocking
read(4096) that buffered small periodic output until EOF, so agent
tabs only "filled in" at process exit. Switch to buffer.read1(4096)
(decoded) for incremental chunks.
- Route agent.terminal.output / terminal.close to the window that owns
the process (its gateway session) instead of an empty session id, so
events actually reach the desktop renderer.
- Add close_terminal: a HERMES_DESKTOP-gated tool (sibling of
read_terminal) that drops a process's read-only tab WITHOUT killing it
via process_registry.on_close; output keeps buffering and the user can
reopen from the status stack.
- ⌘W now closes a focused agent tab: mark the agent instance
data-terminal and focus it on activation so isFocusWithin routes there.
- ensureTerminal() no longer spawns an extra user shell when a tab
already exists (e.g. opening a background task from the status stack).
_normalize_approval_mode() previously accepted any string, so an unknown
value like 'auto' fell through every downstream mode check (off/smart) and
silently behaved like manual with no signal. Validate against the known
modes (manual/smart/off), emit a warning for anything else, and default to
manual to match the config default and the rest of the function.
Bug 1 from the original PR (/approve & /deny bypassing the running-agent
guard) already landed on main independently, so only the mode-validation
fix is salvaged here.
Fixes#4261
Co-authored-by: Hermes Agent <agent@nousresearch.com>
All three .env parsers use `line.partition("=")` without stripping the
bash-compatible `export ` prefix first. A line like `export API_KEY=sk-...`
produces key `"export API_KEY"` instead of `"API_KEY"`, silently ignoring
the variable and causing auth failures for users who copy-paste from
bash profiles or follow tutorials that include `export`.
- tools/skills_tool.py: `load_env()` for skill environment
- hermes_cli/config.py: `load_env()` for core config
- hermes_cli/main.py: `_has_any_provider_configured()` inline parser
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix(terminal): require approval for host-bound Docker commands
The Docker terminal backend blanket-skips dangerous-command approval on
the assumption that the container is isolated from the host. That holds
only when nothing is bind-mounted in. Once a host path is exposed (via
TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE or a host-path entry in
TERMINAL_DOCKER_VOLUMES), a command like `rm -rf /workspace` reaches
real host files but is still auto-approved.
Detect host bind mounts and route those sessions through the normal
approval flow. Isolated Docker keeps the fast path. The same gating is
applied to the execute_code guard, which had the identical blanket skip.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
* chore: add AUTHOR_MAP entry for PR #6436 salvage (Kolektori)
* test: accept has_host_access kwarg in _check_all_guards mocks
The host-bound Docker approval fix adds a has_host_access kwarg to the
_check_all_guards wrapper. Six pre-existing tests monkeypatch it with a
fixed (command, env_type) / (cmd, env) lambda signature, which now
raises TypeError when terminal_tool passes the new kwarg. Widen those
mock signatures to accept **kwargs.
---------
Co-authored-by: Kolektori <256073454+Kolektori@users.noreply.github.com>
Co-authored-by: Hermes Agent <agent@nousresearch.com>
On hosts where the cgroup v2 cpu/memory/pids controllers are not delegated
to the docker/podman process (unprivileged Proxmox LXCs, some rootless and
nested setups), --pids-limit/--cpus/--memory cause every container start to
fail with OCI runtime error / exit 126, breaking terminal + execute_code.
- Add _cgroup_limits_available(image): one-shot, host-wide cached probe that
spawns a throwaway container from the sandbox image itself (sleep 0) with
all three flags together, mirroring the existing _storage_opt_supported
probe-and-degrade pattern.
- Remove --pids-limit from static _BASE_SECURITY_ARGS; apply it (default 256
via _DEFAULT_PIDS_LIMIT) in resource_args gated on the probe.
- Gate --cpus and --memory on the same probe.
Behavior unchanged on cgroup-capable hosts; graceful degradation with a
one-time warning where controllers aren't delegated.
Fixes#6568.
(cherry picked from commit c933880b7e)
Co-authored-by: angelos <angelos@oikos.lan.home.malaiwah.com>
Replace the 5s output_tail poll (which often showed nothing) with a real push
stream. The process registry gains an on_output sink called from its reader
threads with each chunk; the tui_gateway wires it to emit agent.terminal.output
{process_id, chunk} (write_json is _stdout_lock-guarded, so emitting from the
reader thread is safe). The desktop routes chunks by process id straight into
the read-only agent xterm via a small writer registry, with a capped backlog so
a tab opened mid-stream (or reopened) replays what it missed.
Drops the fragile poll/tail path: no session-key matching, no truncation, no
lag — full-fidelity ANSI, env-agnostic (local/docker/ssh).
windows_hide_flags() already returns 0 on POSIX (and creationflags=0 is
the no-op default there, exactly how server.py::_list_repo_files does it),
so drop the IS_WINDOWS import + ternary/one-use-dict gating and just pass
creationflags=windows_hide_flags() directly. Tests lose the now-pointless
IS_WINDOWS monkeypatch.
The #54236/#54417 backend git/gh sweep routed git_probe, the repo-file
picker, coding_context, context_references, copilot_auth, and the gateway
process scans through CREATE_NO_WINDOW, but two sibling spawn legs that
also run inside the console-less desktop/gateway backend were missed:
- tools/checkpoint_manager.py `_run_git` (and the one-shot `git init
--bare` in `_init_store`) — when checkpoints are enabled, every
file-mutating turn fires multiple bare `git` calls (status, add,
write-tree/commit-tree, update-ref). Spawned from a parent with no
console (Electron spawns the backend with windowsHide → CREATE_NO_WINDOW),
each one allocates its own conhost window → a flurry of terminal popups.
- tools/skills_hub.py `GitHubAuth._try_gh_cli` — `gh auth token`, the same
bug class as the already-fixed copilot_auth gh probe.
Route both through `windows_hide_flags()` (no-op on POSIX), matching the
established per-site pattern. Tests added to
tests/test_windows_subprocess_no_window_flags.py.
* fix(daytona): quote single-upload mkdir parent path
The single-file _daytona_upload() path shelled out 'mkdir -p {parent}'
with the remote parent interpolated unquoted, so shell metacharacters in
the path could break the command or inject arbitrary commands into the
sandbox. The bulk-upload, bulk-download, and delete paths were already
hardened with shlex-quoting helpers; this single-upload path was missed.
Route it through the existing quoted_mkdir_command() helper and add a
regression test covering a path with shell metacharacters.
Reported by @Gutslabs (#3960); the original branch predated the
file_sync refactor, so the fix is re-applied to the current code path.
* docs(infographic): daytona quote-sync fix
The SSRF cluster (7a6fe9bb, 48f5c425, 7ef04ae7) sealed
browser_snapshot, browser_vision, and _browser_eval against
eval-navigated private pages, but browser_get_images bypasses
_browser_eval and calls _run_browser_command("eval", ...) directly.
An eval-driven navigation to a private address followed by
browser_get_images would leak image src URLs and alt text from the
private page.
Add the same _eval_ssrf_guard_active + _current_page_private_url
recheck before returning image data, matching the pattern established
by the sibling guards.
5 new tests cover: block on private page, allow on public page, skip
for local backend, skip when private URLs allowed, no guard needed on
failed eval.
When a local browser_navigate (or any browser command) fails fast because
Chromium isn't on disk, attempt a one-shot binary download via
`agent-browser install` and retry instead of only printing a hint.
Scope is narrow on purpose:
- binary only, never `--with-deps` (that shells apt/needs root, so missing
system libraries stay a user action)
- gated by `security.allow_lazy_installs` (same opt-out as every lazy install)
- skipped in Docker (Chromium ships in the image)
- attempted once per process
Follow-up to #54353, which made the cold-start failure legible; this closes
the "doesn't actually install the missing browser" gap for the common case.
Local browser_navigate cold-starts the agent-browser daemon and Chromium;
60s was too short on slow Linux hosts and timeouts discarded stderr,
leaving users with a generic failure. Use a 120s floor on first open,
inject --no-sandbox in Docker, include captured daemon output plus install
hints when commands time out, and show "Failed to open" in the desktop
tool chip when navigation returns success=false.
main (cb982ad99) wired windows_hide_flags() into the auxiliary git/gh/wmic/
bash/powershell/taskkill legs but left two it didn't reach, plus the Electron
backend-launch leg it explicitly deferred. Cover them the same way:
- apps/desktop/electron/main.cjs: getNoConsoleVenvPython resolves the BASE
pythonw.exe instead of the venv Scripts\pythonw.exe shim, which re-execs a
console python.exe and flashes a conhost the desktop backend can't suppress.
Both backend creators put the venv site-packages on PYTHONPATH so imports
still resolve under the base interpreter. (main's commit said this Electron
leg "needs a Windows-tested change of its own".)
- tools/tts_tool.py, tools/transcription_tools.py, plugins/platforms/discord:
ffmpeg conversions (voice notes / TTS / STT) via windows_hide_flags().
- plugins/platforms/whatsapp: netstat + taskkill bridge-port cleanup via
windows_hide_flags().
All no-ops on POSIX. Tests assert the base-pythonw preference and the ffmpeg
legs pass CREATE_NO_WINDOW.
OAuth-protected MCP servers (e.g. Hospitable) return 200 text/html on an
unauthenticated HEAD probe — a login/landing page the server cannot substitute
for a real MCP response without a Bearer token. The preflight cannot
distinguish this from a misconfigured URL, so it raises NonMcpEndpointError
before the OAuth browser flow has a chance to run.
Add `and self._auth_type != "oauth"` to the preflight condition in
MCPServerTask.run(). The probe is inapplicable to OAuth servers: their URL
legitimacy is established by .well-known/oauth-protected-resource during the
OAuth handshake, not by a GET content-type check.
Concrete repro: Hospitable (https://mcp.hospitable.com/mcp) returns
`200 text/html` to an unauthenticated httpx HEAD. Without the guard:
✗ NonMcpEndpointError at `hermes mcp test`
With the guard:
✓ Connected (1487ms) — 63 tools discovered
Relation to open PRs:
- #37598 adds a POST probe fallback for POST-only non-OAuth servers (e.g.
DocuSeal), but only passes when POST returns 2xx + MCP content-type.
Hospitable returns 401 on the POST probe (Bearer challenge), so #37598
does not cover this case.
- #49463 extends the POST probe to also pass on non-2xx auth challenges
(making it OAuth-aware), but is labeled duplicate of #37598 and may not
land independently.
This fix is complementary: it handles OAuth servers with zero extra
round-trips rather than adding a POST probe step.
Tests:
- test_oauth_server_html_response_raises_without_skip: documents that
_preflight_content_type raises NonMcpEndpointError for 200 text/html
(the underlying issue), with an OAuth-server docstring.
- test_run_skips_preflight_for_oauth: verifies that run() does NOT invoke
_preflight_content_type when auth_type=="oauth", using class-level
monkeypatching so the gate is exercised without a live MCP transport.
23 passed tests/tools/test_mcp_preflight_content_type.py
When security.redact_secrets is on (default), read_file/search_files/cat
applied redact_sensitive_text(code_file=True) to file content, which still
ran prefix masking. An API key in config.yaml (ghp_..., sk-..., xai-..., etc.)
came back as a head/tail mask like `ghp_S1...Pn2T` — a plausible-looking
truncated key. When an agent read that and wrote it back to config, the masked
value replaced the real credential, silently breaking auth (401). Production
evidence: a config.yaml found containing the exact 13-char masked GitHub PAT.
The two community PRs (#35529, #35534) fixed the corruption by NOT redacting
prefixes for config reads — but that exposes the user's real keys to the agent
context, model, and logs (a security regression). This takes the safer route:
keep redacting, but for file content emit a NON-REUSABLE sentinel.
- New `_mask_token_nonreusable`: prefix secrets -> `«redacted:ghp_…»` (vendor
label preserved for debuggability; zero secret bytes; angle-bracket/ellipsis
wrapper is syntactically invalid as a token so it can't be mistaken for or
written back as a usable key).
- New `redact_sensitive_text(file_read=True)` routes prefix matches through it
(implies code_file=True). Default/log/display mode is UNCHANGED — `_mask_token`
still keeps head/tail (fine for logs, never written back).
- Wired the 3 file_tools.py call sites (read_file / search_files / cat) to
file_read=True.
Fixes both the corruption AND avoids the secret-exposure of the un-redact
approach. 6 new tests (sentinel shape, no-leak, not-a-plausible-key, default
mode unchanged, file_read implies code_file, sk- prefix); 88 redact tests pass;
mutation-verified (reverting to the old mask fails the sentinel/leak tests).
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: adammatski1972 <289282750+adammatski1972@users.noreply.github.com>
Closes#35519. Supersedes #35529, #35534.
* fix(security): redact secrets in background process + foreground env-dump output
Terminal-output redaction was incomplete (#43025):
- Gap 1: process(action=poll/log/wait) returned background stdout verbatim —
no redaction at all. A background printenv/server/test emitting a key leaked
raw to the model, session.db, and CLI display. Same for the gateway
background-process watcher's completion/progress notifications.
- Gap 2: the foreground terminal path hardcoded code_file=True, which skips the
ENV-assignment pass, so an opaque token (no vendor prefix) from env/printenv
leaked even there.
Adds agent.redact.redact_terminal_output(output, command) as the single policy
for ALL terminal-output surfaces: env-dump commands (env/printenv/set/export/
declare) get the ENV-assignment pass (code_file=False) to mask opaque tokens;
other commands stay on code_file=True to avoid false positives on source dumps.
Wired into terminal_tool, process_registry (_handle_process boundary), and the
gateway watcher. Respects security.redact_secrets (no force) — opt-out preserved.
* docs: add infographic for #43025 terminal-output redaction fix