* docs: clean up three stale comments from the #32848 audit
- tools/memory_tool.py:20 — 'read' action was intentionally removed
but the docstring still listed it. Now matches the schema.
- tools/fuzzy_match.py:9 — unicode_normalized was added but the
chain-count docstring still said '8-strategy'. Now says '9'.
- run_agent.py:1485 — 'See #<TBD>.' placeholder was never filled in.
Replaced with a backfill note.
Fixes#32848 (parts 3, 4, and 12)
* docs(memory): also remove stray memory(action=read) references in lines 144 and 201
The original #32848 audit fix (in 6fd661d6) only addressed line 20
(the action list in the module docstring), but the action was
referenced in two other places:
- tools/memory_tool.py:144 — in a class docstring, claimed
'memory(action=read)' was a way to SEE poisoned entries
- tools/memory_tool.py:201 — in a user-facing warning message,
told the user to 'use memory(action=read) to inspect'
Since the schema on line 683 only allows add/replace/remove, both
references were misleading: the first claimed a way to inspect
poisoned entries that doesn't exist, the second would error out
when the user followed the warning.
This commit removes both references:
- Line 144: '...keep the original text so the user can still SEE
poisoned entries by inspecting the source files directly, and
remove them — silently dropping them would hide the attack
from the user.'
- Line 201: '...use memory(action=remove) to delete the
original. (drop the read-action reference)'
Followup to the previous commit on this branch.
---------
Co-authored-by: KeyArgo <keyargo@argobox.com>
MCP Streamable HTTP servers that garbage-collect idle sessions on a short
TTL (e.g. Unreal Engine's editor MCP, ~15s) were unusable: the keepalive
was hardcoded at 180s, so the session was always dead by the time it ran,
and every idle tool call then landed on an expired session and paid the
full reconnect path (observed hangs of 113-143s until interrupt, bounded
only by the 300s tool_timeout).
Two coordinated, backward-compatible changes:
- Add per-server `keepalive_interval` (config.yaml, not an env var per the
contribution rubric). Default 180s — byte-identical to the old hardcoded
value when unset — floored at 5s. Servers with short session TTLs set it
below their TTL so the session stays warm.
- Switch the keepalive probe from `list_tools()` to `ping` (the MCP base
protocol liveness primitive). On large servers `list_tools` pulled ~1 MB
every cycle (830 tools = 1,068,041 bytes); `ping` is ~55 bytes and works
uniformly across tool/prompt/resource servers. Tool-list changes still
arrive out-of-band via notifications/tools/list_changed -> _refresh_tools.
`ping` is an OPTIONAL utility, so to guarantee zero regression for a
tool-capable server that doesn't implement it: the first -32601 latches
`_ping_unsupported` and the probe falls back to the pre-ping `list_tools`
path for that connection (no reconnect loop). The latch resets on each
fresh connection (_discover_tools, all transport paths) so a server that
gains ping support after a reconnect is re-probed with the cheap path.
Non-(-32601) ping errors propagate as genuine liveness failures.
Verified end-to-end against a live Unreal MCP server (idle 22s past the
~15s TTL -> post-idle tool call returns in 0.31s, no teardown) and with a
simulated ping-less tool server driving the real keepalive loop (ping once,
list_tools thereafter, no reconnect). 25/25 unit tests pass.
Note: a separate upstream defect (modelcontextprotocol/python-sdk#2604)
still tears down the whole session when one tool-call POST returns 4xx;
that is not addressed here.
CI caught 3 ACP test failures (tests/acp/test_server.py,
tests/acp/test_mcp_e2e.py). Root cause: routing ACP's tool-surface rebuild
through the shared refresh_agent_mcp_tools helper (added in the round-2 pass)
broke a deliberate, pre-existing ACP contract:
- the ACP tests assert `agent.tools is <get_tool_definitions return>` (object
identity) and an exact get_tool_definitions(enabled_toolsets=[...],
disabled_toolsets=..., quiet_mode=True) call signature; the shared helper
list()-copies and re-derives differently, breaking identity; and
- the tests use a MagicMock agent whose _tool_snapshot_generation is a mock, so
the new `int < published_gen` generation guard raised TypeError and the whole
ACP refresh silently failed.
ACP already preserves memory-provider tools (its own inject call) and excludes
context_engine, so there was no bug to fix there — only over-reach. Reverted ACP
to its original rebuild. (Same lesson as the gateway path: leave call sites that
carry their own tested contract alone; a reviewer's "inert today, fragile" note
meant leave-it, not change-it.)
Also hardened the generation guard defensively: tolerate a non-int
_tool_snapshot_generation (mock / partially-built agent) instead of throwing
TypeError and silently failing the refresh.
Second review pass (Codex + Hermes subagent). Codex reproduced a real race with
a two-thread harness; both converged on the remaining issues.
- Generation-aware publish (fixes a lost-update race): two refresh callers (the
late-refresh daemon and the between-turns prologue around turn 1) could each
compute a snapshot outside the lock; a SLOWER caller holding an OLDER registry
generation could acquire the publish lock after a newer caller and clobber it,
deleting just-landed tools. refresh_agent_mcp_tools now captures
registry._generation before computing and refuses to publish a stale set;
agent._tool_snapshot_generation tracks the published generation.
- Context-engine routing names (_context_engine_tool_names) are now staged on a
local and published atomically with the snapshot, and only claimed when this
rebuild actually appended the schema — matching agent_init's dedup so a
registry/plugin tool of the same name keeps its own dispatch. (Previously
mutated live, before the publish lock, and on no-change refreshes.)
- CLI /reload-mcp: self.enabled_toolsets is resolved once at startup, so a
server newly ENABLED in config mid-session wasn't picked up (TUI already
re-resolved). Merge now-connected MCP server names into the override (unless
the user pinned all/*), mirroring startup, and keep self.enabled_toolsets in
sync. Closes the CLI/TUI parity hole.
- ACP (acp_adapter/server.py) routed through the shared helper — it was a 5th
sibling rebuild that re-injected memory tools but NOT context-engine tools and
bypassed the atomic/name-diff path (inert today, fragile).
- mcp_startup._resolve_discovery_timeout pulls its default from DEFAULT_CONFIG
(single source of truth) instead of a stale hardcoded 5.0 literal.
- Tests: stale-generation-no-clobber, _skip_mcp_refresh honored, timeout
fallback uses DEFAULT_CONFIG.
Consolidated findings from three independent reviewers (Codex, Claude Code, a
Hermes subagent w/ the hermes-agent-dev skill):
- BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently
dropping post-build-injected memory-provider (mem0/honcho/…) and context-
engine (lcm_*) tools on every refresh. Now additive-preserving: re-applies
the same injectors agent_init uses, staged on locals and published atomically.
- Re-injection now honors the #5544 enabled_toolsets gate for context-engine
tools, so a restricted-toolset platform can't get lcm_* leaked back in.
- Atomic read-diff-publish under one lock: the returned `added` set and the
(tools, valid_tool_names) pair are consistent even under concurrent callers
(no half-swap, no TOCTOU).
- background_review fork opts out (_skip_mcp_refresh) so its byte-identical
tools[] cache parity with the parent is preserved.
- CLI /reload-mcp routed through the shared helper (was a 4th divergent copy
with the same clobber bug + missing disabled_toolsets).
- Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user
just enabled in config this session is picked up; automatic paths reuse the
agent's build-time selection.
- mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the
between-turns refresh, so the startup wait is only a small turn-1 UX bump
rather than a heavy dead-server latency penalty.
- has_registered_mcp_tools checks registered TOOLS (not connected servers) so a
zero-tool/prompt-only server doesn't make the per-turn hook fire forever.
- Tests: rewrote the thread-safety test to actually exercise the write path
(alternating tool sets), added the #5544-gate regression, the memory/context
preservation regression, and a "callable next turn via valid_tool_names"
contract; removed a dead monkeypatch line.
A slow MCP server (HTTP/OAuth, 2-6s cold connect) that finishes connecting
after the agent's one-time tool snapshot was uncallable for the rest of the
session. The merged pre-first-turn late-refresh only helps during the dead air
before the user's first keystroke; once a turn starts it bails to protect the
prompt cache, so a user who types before the server connects never gets the
tools without a manual /reload-mcp.
Refresh the snapshot in the per-turn prologue (build_turn_context), before this
turn's first API call assembles tools=. This is cache-safe by construction: the
refresh only ever extends a fresh request prefix at a turn boundary, never
mutates the cached prefix of an in-flight turn. So late tools become callable on
the user's NEXT turn automatically, with no /reload-mcp and no cache cost.
- tools/mcp_tool.py: has_registered_mcp_tools() — cheap guard so sessions with
no MCP servers (the common case) skip the rebuild entirely.
- agent/turn_context.py: call the shared refresh_agent_mcp_tools() helper at the
top of the prologue when MCP servers are registered.
- tests: 3 contract tests through the real build_turn_context (adds late tool;
skipped when no servers; no snapshot churn when unchanged).
.hermes/plans/: SPEC + PLAN documenting the root cause, the cache-safety
constraint, and why the existing fixes (#48403/#41630/#42802) don't close it.
MCP servers that connect after the agent's one-time tool snapshot were
invisible for the whole session. Two root causes, fixed together:
1. The startup discovery wait was a flat 0.75s. HTTP/OAuth servers
commonly take 2-6s on a cold connect, so they missed the window and
their tools never entered the agent's snapshot. `thread.join(timeout)`
already returns the instant discovery completes, so raising the bound
costs ~0s for the common case (no MCP / fast servers) and only ever
blocks for a genuinely-pending server, capped so a dead server can't
freeze startup. The bound is now configurable via
`mcp_discovery_timeout` (config.yaml, default 5.0s).
2. Three call sites duplicated the agent tool-snapshot rebuild (the TUI
`reload.mcp` RPC, the gateway reload, and the TUI late-binding refresh
thread), and the late-refresh detected changes by tool COUNT — missing
an equal-size add/remove swap. Consolidated into one shared
`tools.mcp_tool.refresh_agent_mcp_tools(agent)` helper that diffs by
tool NAME, mutates the agent under a lock (thread-safe), and respects
the agent's own enabled/disabled toolsets.
The late-binding refresh keeps its pre-first-turn cache-safety guard:
it never rebuilds the tool list once a turn has started, so the cached
prompt prefix is never invalidated mid-conversation.
Tests: new tests/tools/test_refresh_agent_mcp_tools.py covers the
name-based diff, in-place mutation, agent-scoped filtering, thread
safety, and the config-driven discovery bound (incl. instant-return
when nothing is pending). 75 passed across the touched areas.
Wires support for the MCP `elicitation/create` request (Python SDK 1.11+)
so MCP servers can ask the user to confirm sensitive operations
mid-tool-call (payment authorization, OAuth confirmation, etc.) instead
of failing closed or requiring out-of-band biometrics.
Behavior:
- `tools/mcp_tool.py` adds `ElicitationHandler`, attached per server task
and passed to `ClientSession` as `elicitation_callback`. Form-mode
requests route through the existing approval system; URL-mode requests
decline cleanly (out of scope for this pass).
- `tools/approval.py` adds `request_elicitation_consent()`, which dispatches
to whichever surface owns the active session — `_await_gateway_decision`
for Telegram / Slack / etc. (so the approval prompt lands on the right
platform), `prompt_dangerous_approval` for CLI / TUI. Fails closed on
timeout, missing notify_cb, or exception.
- The MCP tool wrapper snapshots `contextvars.copy_context()` into
`MCPServerTask._pending_call_context` before each `session.call_tool`
and clears it after. The recv-loop task that dispatches incoming
`elicitation/create` requests does not inherit the agent task's
contextvars (HERMES_SESSION_PLATFORM and friends), so without the
bridge `_is_gateway_approval_context()` returns False on every
gateway session and the elicitation falls through to a CLI prompt
that has no TTY → fail-closed decline. The handler now reads the
snapshot via its `owner` back-reference and replays it through
`Context.copy().run(...)` so attribution survives the task hop.
Tests (`tests/tools/test_mcp_elicitation.py`):
- form-mode accept / decline / cancel
- URL-mode declined without prompting
- exception in approval system → decline
- timeout in approval → cancel
- context-bridge regression tests (replay observed in consent call,
missing-context fallback, multiple-replay safety, owner with
cleared `_pending_call_context`)
Verified end-to-end against pay's MCP server on macOS: agent message
arrives via Telegram, agent calls `mcp_pay_curl` against a paid endpoint,
pay returns 402, ElicitationHandler routes the approval prompt back to
the originating Telegram chat, user replies in TG, the curl tool signs
and completes.
Platforms tested: macOS 14 (darwin/arm64). No Unix-only syscalls
introduced; Windows footgun checker passes on the touched files.
Two small, focused fixes for the cron scheduler and checkpoint manager.
1. _summarize_cron_failure_for_delivery (cron/scheduler.py):
Replaces the raw error dump in _process_job with a compact
pattern-matched summary. Provider rate limits, timeouts, and
authentication errors now produce a short human-readable message
instead of dumping multi-KB provider JSON into the delivery channel.
2. _repair_bare_repo_dirs (tools/checkpoint_manager.py):
Recreates refs/heads/ and branches/ directories after git gc
--prune=now, which can remove empty dirs from bare repos and cause
subsequent git add -A to fail with 'fatal: not a git repository'.
Called after all four git gc call sites.
Both fixes use only standard library imports and plug into existing
call sites with no architectural changes.
The credential gate. When multiplexing is active, a profile's secrets resolve
from a context-local scope, never the process-global os.environ (which in a
multiplexer may hold another profile's keys, and is inherited by every
subprocess spawned with env=dict(os.environ)).
- agent/secret_scope.py: get_secret() backed by a secret-scope contextvar.
FAIL-CLOSED: when multiplex is active and no scope is installed, an unscoped
read RAISES UnscopedSecretError instead of falling back to os.environ — a
missed/new call site crashes loudly at that line rather than leaking a
cross-profile value. Genuinely-global vars (HERMES_*, PATH, kanban paths,
…) keep reading os.environ via an allowlist. load_env_file/build_profile_
secret_scope parse a profile .env into an isolated dict WITHOUT mutating
os.environ. Off by default => transparent os.getenv behavior.
- hermes_cli/runtime_provider.py: all credential/provider/base-url reads go
through _getenv -> get_secret.
- agent/credential_pool.py: env fallbacks route through get_secret (the
~/.hermes/.env-first preference is preserved and already profile-correct via
the home override).
- tools/mcp_tool.py: MCP config interpolation resolves through
get_secret, so a server's picks up the routed profile's value.
- gateway/run.py: set_multiplex_active() at GatewayRunner init; per-turn .env
reload is a no-op for credentials in multiplex mode (secrets come from the
scope, not global env); _profile_runtime_scope context manager combines the
HERMES_HOME override + secret scope; _run_agent wraps _run_agent_inner in
that scope (resolved via _resolve_profile_home_for_source) when multiplexing.
Propagates into the agent worker thread for free via the existing
copy_context() in _run_in_executor_with_context.
Tests: 13 unit (fail-closed, scope isolation, global allowlist, .env parsing
without environ mutation) + 7 E2E (runtime_provider + MCP interpolation prove
two profiles isolated, unscoped read raises, globals still read environ).
The model was enumerating options inside the question string (dead prose the UI
can't render as pickable rows). Schema description now spells out: choices[] is
REQUIRED for selectable options; question holds ONLY the question.
The xAI TTS REST endpoint (POST /v1/tts) accepts 'speed' (0.7-1.5)
and 'optimize_streaming_latency' (0/1/2) parameters, but the Hermes
built-in xAI provider was reading neither from config nor sending
either in the request body. Add them as tts.xai.speed and
tts.xai.optimize_streaming_latency config knobs (with global
tts.speed / tts.optimize_streaming_latency fallbacks).
- speed: float, clamped to 0.7-1.5. 1.0 (the API default) is omitted
from the request body to preserve the existing minimal-payload
contract.
- optimize_streaming_latency: int, clamped to 0-2. 0 (best quality,
the API default) is omitted from the request body.
Resolver order: tts.xai.<knob> overrides the global tts.<knob>.
Mirrors the existing Gemini TTS audio-tag rewrite path. When the input
has no explicit user/model speech tags, ask the configured auxiliary
model to insert a richer set of xAI-supported tags (laughs, sighs,
whispers, soft/loud, slow/fast, etc.) so voice-mode replies sound more
expressive. Falls back to the local conservative [pause]-only transform
on any auxiliary-model failure.
The built-in Piper provider (tts.provider: piper, Python piper-tts
package) already constructs piper.SynthesisConfig for the advanced
tuning knobs, but did not forward speaker_id from the user config.
This wires tts.piper.speaker_id through to SynthesisConfig.speaker_id
so multi-speaker ONNX models (e.g. libritts_r) can be addressed via
config without dropping to the command-provider path.
Changes:
- Add speaker_id to the has_advanced tuple so setting it triggers
SynthesisConfig construction (same gating as the other knobs).
- Pass speaker_id=speaker_id to SynthesisConfig. Defaults to 0
(Piper's own default; single-speaker models ignore the field).
- Tolerant parse: bad input (non-int strings, lists, dicts) is
dropped to 0 instead of raising. Booleans are rejected outright
(True/False would silently coerce to 1/0 and hide a config
mistake). Mirrors the same shape as the command-provider's
_resolve_command_tts_optional_number helper.
speaker_id is applied per-call via syn_config.speaker_id, so the
PiperVoice cache key is intentionally left as just (model, cuda) --
the same loaded model serves all speakers. Tests cover the
config knob, the tolerant parse, and the no-reload invariant.
sentence_silence is intentionally not added here: the Python
piper-tts SynthesisConfig does not expose that field (CLI-only).
The Discord fix (previous commit) handles dict-shaped clarify choices at the
Discord adapter only. The same dict-repr leak originates upstream at
tools/clarify_tool.py's str(c).strip() normalization — the single
platform-agnostic point both the CLI and every gateway adapter flow through.
When an LLM emits [{"description": "..."}] instead of bare strings, str(c)
produced {'description': '...'} which leaked onto the CLI panel
(cli.py:13048/13081), was returned verbatim as the user's answer
(cli.py:11945), and hit Telegram's numbered list too.
Add _flatten_choice (same label->description->text->title unwrap as the
Discord adapter, name/value excluded, keyless dicts dropped) and apply it at
the normalization line. Fixes CLI + Telegram + all platforms at the root;
the Discord smart-truncation now operates on already-clean text.
Adds johnjacobkenny to AUTHOR_MAP for the salvaged commit.
systemctl --user restart hermes-gateway run via the terminal tool is a
child of the gateway itself. When systemd delivers SIGTERM the gateway
kills this subprocess before it can complete, so the service may never
restart — reproducing issue #37453.
The hermes gateway restart/stop guard (hermes_cli/gateway.py) and the
cron-path guard (hermes_cli/cron.py) already block equivalent commands
in their respective paths but the terminal tool had no such defense.
Add a hard-block before command execution in terminal_tool: when
_HERMES_GATEWAY=1 and the command matches _contains_gateway_lifecycle_command,
return an error immediately. force=True cannot bypass it — unlike the
normal dangerous-command approval flow, here even a user-approved restart
would fail because the SIGTERM propagates to child processes.
Also extend _GATEWAY_LIFECYCLE_PATTERNS to match systemctl with flags
(e.g. systemctl --user restart) — the previous regex required the
action word immediately after systemctl with no flags in between.
Adds 9 regression tests: 6 blocked variants (parametrized), force bypass
attempt, safe systemctl passthrough, and guard-inactive-outside-gateway.
* feat(image-gen): add image-to-image / editing to image_generate
Brings image generation to parity with video generation: the unified
image_generate tool now edits/transforms a source image (image-to-image)
when given image_url / reference_image_urls, routing to each backend's
edit endpoint, exactly as video_generate routes to image-to-video.
- ImageGenProvider ABC: generate() gains keyword-only image_url +
reference_image_urls; new capabilities() declares modalities +
max_reference_images (defaults to text-only, backward compatible).
success_response gains a modality field; adds normalize_reference_images.
- image_generate tool: schema exposes image_url + reference_image_urls;
dynamic schema reflects the active model's actual edit capability so the
agent knows when image_url is honored. Handler + plugin dispatch forward
the new inputs; legacy/text-only providers get a clear modality_unsupported
error instead of silently dropping the source image.
- In-tree FAL: 7 models gain edit endpoints (flux-2-klein, flux-2-pro,
nano-banana-pro, gpt-image-1.5, gpt-image-2, ideogram/v3, qwen-image)
with per-model edit_supports whitelists + reference caps; routes to the
/edit endpoint and skips the upscaler for edits.
- Plugins: openai (images.edit, 16 refs), xai (/v1/images/edits via
grok-imagine-image-quality, JSON body per xAI docs), krea
(image_style_references, 10 refs). openai-codex stays text-only and
rejects edits with an actionable error.
- Tests: 15 new (payload, routing, dispatch forwarding, dynamic schema,
capabilities); updated 2 change-detector/lambda tests for the new schema.
- Docs: image-generation feature page, image-gen provider plugin guide,
tools reference.
* fix(image-gen): preserve legacy passthrough in fal/krea plugin tests
Two existing plugin tests asserted pre-image-to-image behavior:
- fal: forward image_url/reference_image_urls only when supplied, so a
text-to-image delegation stays byte-identical (no None kwargs).
- krea: keep dict-shaped image_style_references refs verbatim (the unified
string refs go through normalize_reference_images; legacy non-string ref
objects pass through unchanged) — fixes KeyError when callers pass the
richer Krea ref-object shape.
* fix(image-gen): clearer not-capable message for text-to-image-only models
When a text-to-image-only model (incl. gpt-image-2 on the Codex OAuth path,
which can't do editing through the Responses image_generation tool) gets a
source image, say 'this model is not capable of image-to-image / editing —
provide a text-only prompt' rather than sending the user shopping for other
backends. Applies to the openai-codex guard, the in-tree FAL no-edit-endpoint
error, and the dynamic tool-schema text-only line.
When a worker calls kanban_create from inside a session that has a
persistent delivery channel, the originating session is now subscribed
to the new task's completion/block events automatically. The agent
that dispatched the task gets notified instead of having to poll.
- Gateway sessions (telegram/discord/slack): HERMES_SESSION_PLATFORM +
HERMES_SESSION_CHAT_ID ContextVars, set by the messaging gateway.
- TUI / desktop sessions: HERMES_SESSION_KEY in the subprocess env.
The TUI notification poller keys on platform='tui' + chat_id=<key>.
- CLI / cron / test: no persistent channel, no subscription.
Gated by kanban.auto_subscribe_on_create in config.yaml (default True).
Disable to mirror pre-feature behaviour — users who want explicit
kanban_notify-subscribe calls per task can set it to false. This
config gate addresses the design concern that got PR #19718 reverted
upstream (unconditional implicit auto-subscribe on tool-driven
kanban_create was too aggressive for orchestrator users).
HERMES_SESSION_ID is intentionally not a fallback channel — it is
set by ACP/agent subprocess telemetry for every invocation, not just
TUI, so treating it as a notification target would auto-subscribe
every CLI session and re-introduce the over-eager behaviour.
The kanban_create response now includes a 'subscribed' bool so
orchestrators can react if subscription failed (e.g. by falling
back to explicit kanban_notify-subscribe or to polling).
Includes 6 tests covering the gateway / TUI / CLI / partial-context /
gated / add_notify_sub-failure paths. All 90 tests in
test_kanban_tools.py pass; 509 broader kanban tests pass.
The memory tool was strictly one-op-per-call. With the store running near
its char limit by design, a new add that would overflow gets rejected with
'consolidate now, then retry' -- but the model could not consolidate and add
in one call. It had to remove/replace across several turns, then retry the
add, each turn re-sending the whole conversation context. Expensive thrash.
Add an 'operations' array: a list of add/replace/remove ops applied
atomically against the FINAL char budget. The model frees space and adds new
entries in ONE call, even when an add alone would overflow. All-or-nothing:
any bad op aborts the whole batch, nothing written.
Root-cause note: the two agent-level memory interception sites
(agent_runtime_helpers.py, tool_executor.py) silently dropped any param not
in their explicit kwarg list, so 'operations' never reached the handler and
batch calls failed with 'Unknown action None'. Both now pass it through and
bridge each add/replace op to external memory providers.
Also: success response is now terminal (done=true + 'do not repeat' note,
no full-entries echo that invited re-edits); schema rewritten to lead with
the batch mechanism and an explicit one-shot stop rule (2138 -> 1476 chars).
Live-verified: near-full consolidate-and-add went 7 calls -> 1 call,
stable across 3 reps. 103 memory/approval tests + 398 background-review/
run_agent tests green; 6 new batch tests added.
The salvaged guard allowed _rmtree_writable(SKILLS_DIR) itself. No call
site ever passes the root — every site passes a skill subdir or its .bak
sibling — so allowing the root only preserves the #48200 footgun (a dest
that collapses to the root wipes every installed skill). Require a strict
strict-child relationship and update the test that documented the
nonexistent 'full reset' capability.
Defense-in-depth fix for the silent wipe of ~/.hermes/ documented in
#48200. A `hermes update --yes` run silently destroyed a user's
.env, MEMORY.md, kanban.db, custom skills, and scripts. Two changes:
1. `_rmtree_writable` in tools/skills_sync.py now refuses to rmtree
anything outside SKILLS_DIR (the HERMES_HOME/skills/ root).
All five call sites pass paths under SKILLS_DIR, so the guard is
a no-op for current code and a loud, recoverable failure for
any future regression (bad path join, malicious bundled
manifest, stale path in scope after an exception).
2. The default `updates.pre_update_backup` flips from false to
true in hermes_cli/config.py. A few minutes of zip per update
is negligible compared to silent total data loss. Still
overridable; --no-backup still works for one-off opt-out.
Five new tests in TestRmtreeWritableScopeGuard (root path,
hermes home, sibling dir, skills root itself, subdir) plus a
flipped `test_default_enabled_creates_backup` in test_backup.py.
178/178 tests pass in the two affected files. Public method
signatures unchanged, no test-stub blast radius.
Closes#48200
- Use _sanitize_subprocess_env() to filter Hermes-managed credentials
from the cua-driver subprocess environment (issue #37878)
- Prevents credential exfiltration to the third-party cua-driver binary
- Aligns with existing pattern used by browser-tool and other tools
- Add regression test to verify environment sanitization
The cua-driver is a lower-trust MCP subprocess per SECURITY.md §2.3.
Its inherited environment is now scrubbed by default, removing provider
API keys, gateway tokens, and platform credentials that should not leak
to third-party binaries.
Fixes#37878
Cleanup pass on the salvage (behavior-preserving):
- diff_bundled_skill now uses the existing _skill_file_list() helper
instead of reimplementing the rglob/is_file/relative_to file-set
enumeration inline (twice).
- Extract _is_tracked_user_modification(origin_hash, user_hash) and use
it in BOTH the sync loop and list_user_modified_bundled_skills() so the
'kept user edit' rule can't drift between the two sites.
- _read_text_for_diff -> _read_for_diff returns (bytes, text); the binary
branch now compares the bytes it already read instead of re-reading
both files from disk.
- Drop the unused 'user_present' key from diff_bundled_skill's return
contract (no consumer or test ever read it).
- test_update_modified_notice: drop the brittle '>= 2 sites' count-floor
so consolidating the two print paths into a shared helper stays a
welcome refactor; keep the per-site 'count notice => discovery hint'
invariant (still mutation-tested).
`hermes update` keeps (won't overwrite) bundled skills the user edited
locally, but only printed a count — "~ N user-modified (kept)" — with no way
to learn which skills, or see what changed. Reverting already existed
(`hermes skills reset <name> [--restore]`); discovery and inspection did not.
Add two CLI commands (zero model-tool footprint), reusing the manifest
origin-hash that sync already maintains:
- `hermes skills list-modified [--json]` — list the bundled skills whose
on-disk copy diverges from the last-synced origin hash (the exact test the
sync loop uses to decide what to skip).
- `hermes skills diff <name>` — unified diff between the user's copy and the
current bundled (stock) version, so the user can confirm what changed
before reverting.
Both are mirrored as `/skills list-modified` and `/skills diff`. The
`hermes update` notice now points at `hermes skills list-modified`. Core
helpers `list_user_modified_bundled_skills()` and `diff_bundled_skill()` live
in tools/skills_sync.py alongside the existing reset logic.
Follow-up to #47663 (streaming multipart upload), fixing two issues that
landed with it.
1. Temp file leaked on client disconnect. The streaming upload endpoint's
except chain caught only HTTPException / PermissionError / OSError — all
Exception subclasses. asyncio.CancelledError, raised when a browser aborts
a large upload mid-stream (the exact NS-501 scenario), is a BaseException,
so it bypassed every except clause and reached a finally that only closed
the file handle and never unlinked the temp file. Every aborted large
upload orphaned a partial `.{name}.*.upload` file (up to ~100 MB) in the
target directory. Cleanup now lives in finally, keyed on a `renamed`
success flag, so the temp file is removed on every non-success exit
including BaseException paths. Added test_stream_upload_cleans_temp_on_cancellation,
which fails on the pre-fix code (leaks the temp file) and passes with the fix.
2. python-multipart pinned to ==0.0.27 instead of ==0.0.20. The package was
already resolved at 0.0.27 transitively (via daytona) before #47663; the
explicit ==0.0.20 pin in the [web] extra and the tool.dashboard lazy-install
set downgraded it. Bumped both to ==0.0.27 and regenerated with `uv lock`,
keeping the lockfile coherent. The base dependency stays >=0.0.9,<1.
* fix(dashboard): stream file uploads via multipart instead of base64 JSON
The dashboard file manager uploaded files (including backup/restore zip
archives) by reading them client-side with FileReader.readAsDataURL and
POSTing a base64 data URL inside a JSON body to /api/files/upload. For a
large backup this (a) inflates the payload ~33%, (b) buffers the whole
file plus its decoded copy in memory, and (c) reliably trips an upstream
proxy body-size/timeout limit, surfacing as a 502 with the upload
appearing to hang indefinitely (NS-501). Dashboard-only hosted users have
no shell fallback to place the archive, so backup restore was unusable.
Add a streaming multipart endpoint POST /api/files/upload-stream
(UploadFile + Form) that reads the request body in 1 MiB chunks straight
to a sibling temp file, enforces the existing 100 MB size cap as it
streams (413 on overflow, before buffering the whole file), and
atomically renames into place so a partial/aborted/over-limit upload
never clobbers an existing file. The frontend api.uploadFile now sends
multipart/form-data (raw bytes, no base64, browser-set boundary) and
FilesPage passes the File object directly; the dead readAsDataUrl helper
is removed. The legacy base64 JSON endpoint stays for backward compat.
FastAPI's UploadFile/Form require python-multipart, which is NOT pulled in
by fastapi itself, so it is added to the base deps, the [web] extra, and
the tool.dashboard lazy-install set (kept in sync).
Validated: 5 new endpoint tests (roundtrip, multi-chunk >1 MiB,
over-limit 413 without clobbering + no temp-file leak, overwrite=false
conflict, forced-root traversal containment); existing base64 tests still
pass; web typecheck + vite build clean; and a real uvicorn server E2E
(5 MB multipart upload -> HTTP 200 in 0.21s, exact byte match) plus a
30 MB TestClient roundtrip confirm constant-memory streaming end to end.
Reported via beta (NS-501).
* build(deps): regenerate uv.lock for python-multipart (NS-501)
CI ran uv lock --check / uv sync --locked which failed because the
python-multipart dependency add was not reflected in uv.lock. Regenerate
the lockfile (resolves to 0.0.20, matching the [web] extra pin) after
merging current main.
Phase 4F (F.1 + F.2 + F.3, agent side). F.4 is the operator-run live smoke
(needs a NAS deployment); recorded in the PR, not code.
F.1 — on_jobs_changed wiring:
- cron/scheduler.py: _notify_provider_jobs_changed() — resolve the active
provider, call on_jobs_changed(), swallow errors. Lives in scheduler.py (not
jobs.py) so the store stays free of provider imports (no import cycle).
- Wired at the consumer surfaces AFTER a successful mutation: the cronjob model
tool (tools/cronjob_tools.py, create/update/remove/pause/resume) — which the
`hermes cron` CLI also routes through — and the REST handlers
(gateway/platforms/api_server.py, same five). Built-in's no-op default = zero
behavior change on the default path. Sleeping-agent direct jobs.json writes
(no tool/CLI/REST) are covered by reconcile-on-wake in start().
F.2 — config: cron.chronos.{portal_url,callback_url,expected_audience,
nas_jwks_url}. All non-secret; the agent holds no scheduler creds and the
outbound provision call reuses the existing Nous token (no token key). Additive
deep-merge key, no version literal.
F.3 — docs:
- docs/chronos-managed-cron-contract.md: authoritative agent↔NAS wire contract
(the three agent-cron endpoints + inbound /api/cron/fire + the 3-hop trust
model + at-most-once/re-arm semantics). This is what the NAS-side agent builds
against.
- cron-internals.md: "Managed cron (Chronos) for scale-to-zero" section.
- cli-commands.md: cron.provider accepts chronos + the cron.chronos.* keys.
- User docs name no scheduler vendor (QStash is a NAS-internal detail).
INVARIANT re-verified: zero qstash/upstash hits across plugins/cron, gateway,
hermes_cli, tools, website/docs (the one remaining repo hit is an unrelated
Context7 MCP comment in tools/mcp_tool.py).
Tests: test_jobs_changed_notify (5) — notify calls provider hook, swallows
errors, built-in harmless, tool create/remove notify. Full cron + chronos +
webhook + config + api_server_jobs suites green (504 in the cron+chronos+webhook
run).
* feat(search_files): path-grouped lossless densification of content matches
Content-mode search_files results repeat the {path,line,content} JSON keys
and the full path string for every match. Group consecutive same-path matches
under one path header with indented '<line>: <content>' rows — lossless (every
path/line/content byte preserved), self-describing (matches_format key), and
readable by the model with no decode step.
57.8% mean token reduction on real search_files content outputs (422-output
corpus), fires on 97% of them. Gated at >=5 matches; below that the verbose
array is left untouched. Default to_dict(densify=False) is unchanged, so no
other caller is affected.
ripgrep emits matches path-ordered, so consecutive grouping never reorders
results.
* test: accept densify kwarg in _FakeSearchResult.to_dict
The search loop-detection tests stub SearchResult with a fake whose
to_dict() must mirror the real signature now that it takes densify=.
* test(search_files): edge-case losslessness battery for densification
Adversarial single-line content (colons, indentation, unicode/emoji, empty,
trailing whitespace, quotes+commas), paths with spaces, and an explicit
one-line-per-match invariant documenting the ripgrep contract the format
relies on (0/6775 real match contents contained a newline).
* feat(mcp): raise default tool-call timeout 120s -> 300s
Port from openai/codex#28234. Long-running MCP tools (web fetches,
sandboxed builds, deep-research servers) routinely exceed 120s, causing
spurious timeout failures. Codex bumped its default MCP tool timeout from
120 to 300 for the same reason.
- _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server
'timeout' config override unchanged)
- update test_default_timeout assertion
- document the default in mcp-config-reference.md
* refactor: remove agent-callable send_message tool
The agent should not decide on its own to fire off cross-platform
messages or reactions. Outbound platform messaging is handled outside
the agent loop — cron delivery, the gateway kanban notifier
(dashboard-toggled), and the `hermes send` CLI.
Removes the model-tool registration only; the send engine in
send_message_tool.py (_send_to_platform, _send_via_adapter,
_parse_target_ref, per-platform _send_* helpers) is kept intact for
those non-agent callers. Drops the now-empty 'messaging' toolset and
its `hermes tools` toggle. Yuanbao DM guidance now points at the
native yb_send_dm tool.
restore_skill() falls back to p.name.startswith(f"{skill_name}-") when no
archive directory matches the requested name exactly. That fallback is meant
to catch the timestamped duplicate archive_skill() writes on a name collision
(<skill>-YYYYMMDDHHMMSS), but the bare prefix also matches any unrelated
archived skill named <name>-something. So restoring "git" can pull an archived
"git-helpers" out of .archive/, rename it to "git", and report success: the
requested skill is not restored and the sibling is gone from the archive.
Constrain the fallback to the exact suffix archive_skill() produces, a 14 digit
timestamp. The exact-name match and the recursive nested-archive walk are
unchanged, so nested and timestamped restores still work; unrelated siblings no
longer match.
Fixes#47647
Support files under references/, templates/, assets/, and scripts/ are progressive-disclosure data loaded through skill_view(..., file_path=...). They should not be treated as standalone skills during discovery or collision checks.
This prevents archived skill packages or support markdown files inside a real skill from shadowing active skills with the same name while still allowing top-level categories named scripts/templates/assets/references.
Tests cover:
- pruning nested SKILL.md files inside skill support directories
- preserving support-named top-level categories
- avoiding skill_view collisions from support markdown
- keeping archived package SKILL.md files accessible only through file_path
* feat(desktop): stream subagent replies into watch windows
A desktop watch window resumes a child session lazily (no full agent) and
mirrors the parent-relayed `subagent.*` events into native child-session
stream events. The child's streamed reply text was never relayed, so the
window sat blank while the subagent "talked".
- delegate_tool: forward the child's `run_conversation` stream tokens up the
progress relay as `subagent.text` (inert under CLI/TUI — their progress
handlers ignore non-tool event types; only a gateway watch window mirrors it).
- server: mirror `subagent.text` -> `message.delta` on the child sid only, and
skip the parent emit (per-token frames are meaningless on the parent session,
which shows the child via the spawn tree). Demote `subagent.start` to a
one-time goal header and drop the noisy `subagent.progress` mirror — tools
already mirror natively.
- server: guard `_start_agent_build` so a lazy watch session spectating an
in-flight child stays lazy; incidental RPCs were upgrading it to a full
agent mid-stream and silently killing the mirror.
* fix(desktop): keep watch-window chat clear of titlebar chrome
Secondary windows (new-session scratch, subagent watch, cmd-click pop-out)
hide the titlebar tool cluster + session header, so the transcript ran to the
window's top edge and streamed text slid up under the OS traffic lights.
- Gate the hidden chrome on `isSecondaryWindow()` everywhere (app-shell,
chat header, thread list) instead of the narrower new-session flag.
- Add a fixed opaque drag-strip at the top of the secondary-window transcript:
content padding alone scrolls away with the text, so the strip masks
anything behind it and keeps the window draggable like the main header.
* fix: WSL subagent window
* fix: subagent window top padding
---------
Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
When display.memory_notifications is set to 'verbose', skill_manage
notifications now show meaningful change details instead of just the
generic tool message.
Before (verbose mode):
💾📝 Patched SKILL.md in skill 'gogcli' (1 replacement).
After (verbose mode):
💾📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..."
Changes:
- skill_manager_tool.py: _patch_skill() now includes old/new string
previews (truncated to 200 chars) in the result via '_change' key.
_create_skill() and _edit_skill() include skill description from
frontmatter for verbose create/edit notifications.
- run_agent.py: Background review notification builder now reads the
'_change' dict from skill tool results and formats descriptive
notifications per action type (patch → old→new diff, create/edit →
description preview). Falls back to generic message when _change
data is unavailable (backwards compatible).
This is especially useful when subagents patch skills, since neither
the user nor the parent agent can see what the subagent changed.
Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire
working directory: a built-in-skill sentinel location resolved to the server
cwd and the skill-removal endpoint ran a recursive delete on it.
Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the
agent-facing skill_manage(action='delete') path did a bare
shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target():
refuse to rmtree a path that (1) isn't strictly inside a known skills root,
(2) is a skills root itself, or (3) is reached via a symlink/junction.
Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree
all refused). E2E verified with real symlink + file I/O.
* feat(delegation): async background subagents via delegate_task(background=true)
delegate_task(background=true) dispatches a subagent that runs in the
background and returns a handle immediately, so the user and model keep
working while it runs. The full result — plus the original task source —
re-enters the conversation as a new turn when the subagent finishes,
riding the same completion-queue rail as terminal background processes.
- tools/async_delegation.py: daemon-executor registry, capacity cap,
rich self-contained completion event pushed onto the shared
process_registry.completion_queue (type='async_delegation').
- delegate_tool.py: background param + single-task dispatch branch;
batch async rejected (v1).
- process_registry.py: format_process_notification renders the rich
task-source block (goal/context/toolsets/model/status/result).
- gateway/run.py: dedicated _async_delegation_watcher drains + injects
results into the originating session (idle + post-turn), session_key
routing enrichment, shutdown interrupt of dangling delegations.
- config: delegation.max_async_children (default 3).
Reuses the existing idle-drain wiring rather than mutating a running
agent loop, preserving message-role alternation and prompt-cache
invariants. 13 targeted tests; CLI + gateway paths E2E-verified.
* test(delegation): make async non-blocking tests environment-independent
CI 'test (5)' flaked on a cold, 8-worker runner: the first
delegate_task(background=true) call measured 2.27s of one-time setup
(config load + child-agent construction + imports), tripping the
elapsed < 1.0 wall-clock assertion. That assertion was testing setup
overhead, not blocking.
Replace the wall-clock thresholds with the real invariant: dispatch
returns while the child is still gated (active_count == 1, completion
queue empty), which a synchronous impl could not do. Keep only a loose
4s sanity backstop well under the runner's 5s gate.
* fix(delegation): harden async background delegation
Follow-up review fixes:
- Detach background child from parent._active_children at dispatch —
otherwise parent-turn interrupts (Ctrl+C, mid-turn steering), cache
evicts (release_clients), and session close (/new) kill/close the
detached subagent mid-run, defeating the point of background mode.
Lifecycle is owned by the async registry's interrupt_fn.
- Make the capacity check atomic with the record insert (TOCTOU: two
concurrent dispatches could both pass active_count() and exceed the cap).
- TUI dedup: key async_delegation events by delegation_id — the
fallthrough keyed them all as ("", type), suppressing every completion
after the first in the desktop/TUI status feed.
- CLI /stop now interrupts running background delegations and /agents
lists them (they live outside the process registry and were invisible).
- Drop stray unbalanced ']' line from the re-injection block and the
unused _ASYNC_DEFAULT import.
Tests: detach-at-dispatch + concurrent-capacity race added (15 total in
test_async_delegation.py); 137 delegate + 140 process-registry/notify/watch
+ 7 TUI dedup tests pass.
* fix(delegation): harden async background completion drains
* fix(teams): package Microsoft Teams SDK as an installable extra
The Teams adapter imports the microsoft-teams-apps SDK, but it was never
declared as a dependency, so source/local installs hit ImportError and the
adapter silently reported the SDK as unavailable. Add a 'teams' extra
(microsoft-teams-apps==2.0.13.4 + aiohttp) and document 'uv sync --extra teams'.
Per the 2026-05-12 [all] policy, opt-in messaging-platform SDKs are NOT added
to [all] (they would break every fresh install on a quarantined release); the
teams extra is installed on demand like the other platform backends.
Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>
* chore: map rio-jeong contributor email for attribution (#43945)
* feat(teams): lazy-install the Teams SDK on demand (parity with other channels)
The teams extra alone left Teams as the only messaging platform that wouldn't
auto-install its SDK — every other channel (telegram, discord, slack, matrix,
dingtalk, feishu) lazy-installs via tools.lazy_deps on first connect. Bring
Teams to parity:
- Add 'platform.teams' to LAZY_DEPS (microsoft-teams-apps + aiohttp).
- Replace the passive 'check_teams_requirements = check_requirements' alias with
a real lazy-installer that calls ensure_and_bind('platform.teams', ...),
rebinding all Teams SDK globals on success (mirrors check_slack_requirements).
- Call check_teams_requirements() at the top of TeamsAdapter.connect() so
enabling Teams installs the SDK on demand.
- Keep the passive check_requirements() as the registry check_fn so 'gateway
status' probes never trigger a pip install.
The 'teams' extra remains for packagers / explicit 'uv sync --extra teams'.
Tests: rework the alias test into shortcircuit + lazy-install assertions, and
update test_connect_fails_without_sdk to simulate an uninstallable SDK.
---------
Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
* fix: declare websockets as a core dependency
* fix(deps): relax dev setuptools pin 82.0.1 -> 81.0.0 (torch caps setuptools<82)
torch >= 2.11 publishes Requires-Dist: setuptools<82, so any environment
that resolves the dev extra together with torch is unsatisfiable:
$ uv pip install --dry-run ".[dev]" "torch==2.12.0"
x No solution found when resolving dependencies:
... torch==2.12.0 and all versions of hermes-agent[dev] are incompatible.
81.0.0 is the latest release under the cap and stays inside the declared
build-system window (setuptools>=77.0,<83). uv.lock regenerated with
'uv lock'; diff is scoped to the setuptools entry.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* chore: map salvaged contributor emails for attribution
Add AUTHOR_MAP entries for the two cherry-picked contributors so the
check-attribution CI gate passes:
- yehaotian@xuanshudeMac-mini.local -> ArcanePivot (#45486)
- dbeyer7@gmail.com -> benegessarit (#44693)
---------
Co-authored-by: 玄枢 <yehaotian@xuanshudeMac-mini.local>
Co-authored-by: David Beyer <dbeyer7@gmail.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Track why a background process finished and include that source in notify-on-complete messages so SIGTERM from process.kill, kill_all, backend loss, and ordinary exits are distinguishable.
send_message(target="whatsapp:<group-jid>") silently delivered to the
configured home DM instead of the requested group. Two gaps:
1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us),
user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and
broadcast/newsletter JIDs matched no pattern and fell through to
`return None, None, False`, so the caller treated them as
unresolvable and used the home channel. The bridge's /send endpoint
accepts any chatId, so only the tool-side target parsing was at fault.
Add a whatsapp branch that recognizes native JIDs as explicit targets.
The pre-existing '+'-prefixed E.164 path is preserved.
2. WhatsApp groups have no human-friendly name — the channel directory
is regenerated from session data on a timer, so a group shows up as
its raw 18-digit JID and any hand-edit to channel_directory.json is
clobbered on the next rebuild. Add a user-maintained alias overlay
(~/.hermes/channel_aliases.json) re-applied on every build AND every
load, giving durable friendly names and letting a freshly-created
group be pre-named before its first message.
Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser;
TestChannelAliases (7 cases) for the overlay, plus an autouse fixture
isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the
existing directory tests.
_configured_terminal_cwd and _registered_task_cwd_override carried a
byte-identical sentinel + expanduser + isabs validation tail. Extract it
into _sentinel_free_abs_cwd(raw) so the relative/sentinel rejection rule
lives in one place. Behaviour unchanged (the str() coercion the override
path relied on is preserved in the helper).
The session-cwd fix inserted a registered task/session cwd override step
between the live-cwd and $TERMINAL_CWD fallbacks, but three docstrings still
described the old two-step order — _resolve_base_dir's numbered list was
outright wrong. Update _authoritative_workspace_root, _resolve_base_dir, and
_path_resolution_warning to reflect the actual four-step resolution order.
No behaviour change.