Commit graph

1169 commits

Author SHA1 Message Date
hakanpak
d45addc2f1 fix(tools): never let a model whitelist strip the prompt / source images
_build_fal_payload and _build_fal_edit_payload assemble the request and then
filter it down to the model's supports / edit_supports whitelist. That filter
also covers prompt (and image_urls for edits), which every FAL endpoint
requires. Today all model configs happen to list those keys, but a single
config that omits one would silently produce a request with no prompt or no
source images — a broken generation with no error.

Always keep the mandatory keys regardless of the whitelist so a missing
whitelist entry can only drop optional knobs, never the prompt or the images.
2026-06-19 16:59:54 -07:00
emozilla
40722058e5 fix(mcp): keep short-TTL HTTP sessions alive with configurable ping keepalive
MCP Streamable HTTP servers that garbage-collect idle sessions on a short
TTL (e.g. Unreal Engine's editor MCP, ~15s) were unusable: the keepalive
was hardcoded at 180s, so the session was always dead by the time it ran,
and every idle tool call then landed on an expired session and paid the
full reconnect path (observed hangs of 113-143s until interrupt, bounded
only by the 300s tool_timeout).

Two coordinated, backward-compatible changes:

- Add per-server `keepalive_interval` (config.yaml, not an env var per the
  contribution rubric). Default 180s — byte-identical to the old hardcoded
  value when unset — floored at 5s. Servers with short session TTLs set it
  below their TTL so the session stays warm.

- Switch the keepalive probe from `list_tools()` to `ping` (the MCP base
  protocol liveness primitive). On large servers `list_tools` pulled ~1 MB
  every cycle (830 tools = 1,068,041 bytes); `ping` is ~55 bytes and works
  uniformly across tool/prompt/resource servers. Tool-list changes still
  arrive out-of-band via notifications/tools/list_changed -> _refresh_tools.

`ping` is an OPTIONAL utility, so to guarantee zero regression for a
tool-capable server that doesn't implement it: the first -32601 latches
`_ping_unsupported` and the probe falls back to the pre-ping `list_tools`
path for that connection (no reconnect loop). The latch resets on each
fresh connection (_discover_tools, all transport paths) so a server that
gains ping support after a reconnect is re-probed with the cheap path.
Non-(-32601) ping errors propagate as genuine liveness failures.

Verified end-to-end against a live Unreal MCP server (idle 22s past the
~15s TTL -> post-idle tool call returns in 0.31s, no teardown) and with a
simulated ping-less tool server driving the real keepalive loop (ping once,
list_tools thereafter, no reconnect). 25/25 unit tests pass.

Note: a separate upstream defect (modelcontextprotocol/python-sdk#2604)
still tears down the whole session when one tool-call POST returns 4xx;
that is not addressed here.
2026-06-19 12:16:33 -07:00
alt-glitch
88d523220f fix(mcp): address adversarial review round 2 (stale-publish race, parity holes)
Second review pass (Codex + Hermes subagent). Codex reproduced a real race with
a two-thread harness; both converged on the remaining issues.

- Generation-aware publish (fixes a lost-update race): two refresh callers (the
  late-refresh daemon and the between-turns prologue around turn 1) could each
  compute a snapshot outside the lock; a SLOWER caller holding an OLDER registry
  generation could acquire the publish lock after a newer caller and clobber it,
  deleting just-landed tools. refresh_agent_mcp_tools now captures
  registry._generation before computing and refuses to publish a stale set;
  agent._tool_snapshot_generation tracks the published generation.
- Context-engine routing names (_context_engine_tool_names) are now staged on a
  local and published atomically with the snapshot, and only claimed when this
  rebuild actually appended the schema — matching agent_init's dedup so a
  registry/plugin tool of the same name keeps its own dispatch. (Previously
  mutated live, before the publish lock, and on no-change refreshes.)
- CLI /reload-mcp: self.enabled_toolsets is resolved once at startup, so a
  server newly ENABLED in config mid-session wasn't picked up (TUI already
  re-resolved). Merge now-connected MCP server names into the override (unless
  the user pinned all/*), mirroring startup, and keep self.enabled_toolsets in
  sync. Closes the CLI/TUI parity hole.
- ACP (acp_adapter/server.py) routed through the shared helper — it was a 5th
  sibling rebuild that re-injected memory tools but NOT context-engine tools and
  bypassed the atomic/name-diff path (inert today, fragile).
- mcp_startup._resolve_discovery_timeout pulls its default from DEFAULT_CONFIG
  (single source of truth) instead of a stale hardcoded 5.0 literal.
- Tests: stale-generation-no-clobber, _skip_mcp_refresh honored, timeout
  fallback uses DEFAULT_CONFIG.
2026-06-19 11:57:43 -07:00
alt-glitch
b6e2a54a94 fix(mcp): address adversarial review round 1 (cache parity, gates, races)
Consolidated findings from three independent reviewers (Codex, Claude Code, a
Hermes subagent w/ the hermes-agent-dev skill):

- BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently
  dropping post-build-injected memory-provider (mem0/honcho/…) and context-
  engine (lcm_*) tools on every refresh. Now additive-preserving: re-applies
  the same injectors agent_init uses, staged on locals and published atomically.
- Re-injection now honors the #5544 enabled_toolsets gate for context-engine
  tools, so a restricted-toolset platform can't get lcm_* leaked back in.
- Atomic read-diff-publish under one lock: the returned `added` set and the
  (tools, valid_tool_names) pair are consistent even under concurrent callers
  (no half-swap, no TOCTOU).
- background_review fork opts out (_skip_mcp_refresh) so its byte-identical
  tools[] cache parity with the parent is preserved.
- CLI /reload-mcp routed through the shared helper (was a 4th divergent copy
  with the same clobber bug + missing disabled_toolsets).
- Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user
  just enabled in config this session is picked up; automatic paths reuse the
  agent's build-time selection.
- mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the
  between-turns refresh, so the startup wait is only a small turn-1 UX bump
  rather than a heavy dead-server latency penalty.
- has_registered_mcp_tools checks registered TOOLS (not connected servers) so a
  zero-tool/prompt-only server doesn't make the per-turn hook fire forever.
- Tests: rewrote the thread-safety test to actually exercise the write path
  (alternating tool sets), added the #5544-gate regression, the memory/context
  preservation regression, and a "callable next turn via valid_tool_names"
  contract; removed a dead monkeypatch line.
2026-06-19 11:57:43 -07:00
alt-glitch
93d6e73028 fix(mcp): expose late-connecting MCP tools to the agent (TUI/CLI/gateway)
MCP servers that connect after the agent's one-time tool snapshot were
invisible for the whole session. Two root causes, fixed together:

1. The startup discovery wait was a flat 0.75s. HTTP/OAuth servers
   commonly take 2-6s on a cold connect, so they missed the window and
   their tools never entered the agent's snapshot. `thread.join(timeout)`
   already returns the instant discovery completes, so raising the bound
   costs ~0s for the common case (no MCP / fast servers) and only ever
   blocks for a genuinely-pending server, capped so a dead server can't
   freeze startup. The bound is now configurable via
   `mcp_discovery_timeout` (config.yaml, default 5.0s).

2. Three call sites duplicated the agent tool-snapshot rebuild (the TUI
   `reload.mcp` RPC, the gateway reload, and the TUI late-binding refresh
   thread), and the late-refresh detected changes by tool COUNT — missing
   an equal-size add/remove swap. Consolidated into one shared
   `tools.mcp_tool.refresh_agent_mcp_tools(agent)` helper that diffs by
   tool NAME, mutates the agent under a lock (thread-safe), and respects
   the agent's own enabled/disabled toolsets.

The late-binding refresh keeps its pre-first-turn cache-safety guard:
it never rebuilds the tool list once a turn has started, so the cached
prompt prefix is never invalidated mid-conversation.

Tests: new tests/tools/test_refresh_agent_mcp_tools.py covers the
name-based diff, in-place mutation, agent-scoped filtering, thread
safety, and the config-driven discovery bound (incl. instant-return
when nothing is pending). 75 passed across the touched areas.
2026-06-19 11:57:43 -07:00
Ludo Galabru
239740a19e feat(tools): MCP elicitation handler with gateway-aware approval routing
Wires support for the MCP `elicitation/create` request (Python SDK 1.11+)
so MCP servers can ask the user to confirm sensitive operations
mid-tool-call (payment authorization, OAuth confirmation, etc.) instead
of failing closed or requiring out-of-band biometrics.

Behavior:

- `tools/mcp_tool.py` adds `ElicitationHandler`, attached per server task
  and passed to `ClientSession` as `elicitation_callback`. Form-mode
  requests route through the existing approval system; URL-mode requests
  decline cleanly (out of scope for this pass).
- `tools/approval.py` adds `request_elicitation_consent()`, which dispatches
  to whichever surface owns the active session — `_await_gateway_decision`
  for Telegram / Slack / etc. (so the approval prompt lands on the right
  platform), `prompt_dangerous_approval` for CLI / TUI. Fails closed on
  timeout, missing notify_cb, or exception.
- The MCP tool wrapper snapshots `contextvars.copy_context()` into
  `MCPServerTask._pending_call_context` before each `session.call_tool`
  and clears it after. The recv-loop task that dispatches incoming
  `elicitation/create` requests does not inherit the agent task's
  contextvars (HERMES_SESSION_PLATFORM and friends), so without the
  bridge `_is_gateway_approval_context()` returns False on every
  gateway session and the elicitation falls through to a CLI prompt
  that has no TTY → fail-closed decline. The handler now reads the
  snapshot via its `owner` back-reference and replays it through
  `Context.copy().run(...)` so attribution survives the task hop.

Tests (`tests/tools/test_mcp_elicitation.py`):

- form-mode accept / decline / cancel
- URL-mode declined without prompting
- exception in approval system → decline
- timeout in approval → cancel
- context-bridge regression tests (replay observed in consent call,
  missing-context fallback, multiple-replay safety, owner with
  cleared `_pending_call_context`)

Verified end-to-end against pay's MCP server on macOS: agent message
arrives via Telegram, agent calls `mcp_pay_curl` against a paid endpoint,
pay returns 402, ElicitationHandler routes the approval prompt back to
the originating Telegram chat, user replies in TG, the curl tool signs
and completes.

Platforms tested: macOS 14 (darwin/arm64). No Unix-only syscalls
introduced; Windows footgun checker passes on the touched files.
2026-06-19 11:46:25 -07:00
Carlos Diosdado
e00b965406 feat(tts): add xAI TTS speed and optimize_streaming_latency config knobs
The xAI TTS REST endpoint (POST /v1/tts) accepts 'speed' (0.7-1.5)
and 'optimize_streaming_latency' (0/1/2) parameters, but the Hermes
built-in xAI provider was reading neither from config nor sending
either in the request body. Add them as tts.xai.speed and
tts.xai.optimize_streaming_latency config knobs (with global
tts.speed / tts.optimize_streaming_latency fallbacks).

- speed: float, clamped to 0.7-1.5. 1.0 (the API default) is omitted
  from the request body to preserve the existing minimal-payload
  contract.
- optimize_streaming_latency: int, clamped to 0-2. 0 (best quality,
  the API default) is omitted from the request body.

Resolver order: tts.xai.<knob> overrides the global tts.<knob>.
2026-06-19 07:26:56 -07:00
Carlos Diosdado
8ae6bd0823 test(tts): cover xAI auto speech-tags auxiliary rewrite path
The previous xAI auto-speech-tag tests asserted on the local
pause-only fallback and only passed because call_llm silently
returns None in the test environment. They gave zero coverage of
the new auxiliary-rewrite path added in the previous commit.

Add tests that:
- mock agent.auxiliary_client.call_llm and pin down the new contract
  (auxiliary rewriter output wins over the local fallback)
- verify the system prompt lists every documented inline + wrapping
  tag and uses BBCode-style [/tag] closing syntax
- cover markdown-fence stripping (with and without language hint)
- exercise the local fallback on rewriter exception, empty response,
  None response, and missing-choices response
- confirm call_llm is NOT invoked when the input already has
  explicit speech tags, or is empty / whitespace-only
- replace the end-to-end test that asserted on the silent-fallback
  output with one that mocks the rewriter and asserts the
  rewriter's tagged text is what reaches the xAI TTS API
2026-06-19 07:16:57 -07:00
Cdddo
160bb565b4 feat(tts): expose speaker_id on built-in Piper provider
The built-in Piper provider (tts.provider: piper, Python piper-tts
package) already constructs piper.SynthesisConfig for the advanced
tuning knobs, but did not forward speaker_id from the user config.

This wires tts.piper.speaker_id through to SynthesisConfig.speaker_id
so multi-speaker ONNX models (e.g. libritts_r) can be addressed via
config without dropping to the command-provider path.

Changes:
- Add speaker_id to the has_advanced tuple so setting it triggers
  SynthesisConfig construction (same gating as the other knobs).
- Pass speaker_id=speaker_id to SynthesisConfig. Defaults to 0
  (Piper's own default; single-speaker models ignore the field).
- Tolerant parse: bad input (non-int strings, lists, dicts) is
  dropped to 0 instead of raising. Booleans are rejected outright
  (True/False would silently coerce to 1/0 and hide a config
  mistake). Mirrors the same shape as the command-provider's
  _resolve_command_tts_optional_number helper.

speaker_id is applied per-call via syn_config.speaker_id, so the
PiperVoice cache key is intentionally left as just (model, cuda) --
the same loaded model serves all speakers. Tests cover the
config knob, the tolerant parse, and the no-reload invariant.

sentence_silence is intentionally not added here: the Python
piper-tts SynthesisConfig does not expose that field (CLI-only).
2026-06-19 07:04:58 -07:00
teknium1
2c3aebcadc fix(clarify): unwrap dict choices at the source so every surface gets clean text
The Discord fix (previous commit) handles dict-shaped clarify choices at the
Discord adapter only. The same dict-repr leak originates upstream at
tools/clarify_tool.py's str(c).strip() normalization — the single
platform-agnostic point both the CLI and every gateway adapter flow through.

When an LLM emits [{"description": "..."}] instead of bare strings, str(c)
produced {'description': '...'} which leaked onto the CLI panel
(cli.py:13048/13081), was returned verbatim as the user's answer
(cli.py:11945), and hit Telegram's numbered list too.

Add _flatten_choice (same label->description->text->title unwrap as the
Discord adapter, name/value excluded, keyless dicts dropped) and apply it at
the normalization line. Fixes CLI + Telegram + all platforms at the root;
the Discord smart-truncation now operates on already-clean text.

Adds johnjacobkenny to AUTHOR_MAP for the salvaged commit.
2026-06-19 06:31:08 -07:00
Teknium
c02192ff6a
feat(image-gen): add image-to-image / editing to image_generate (#48705)
* feat(image-gen): add image-to-image / editing to image_generate

Brings image generation to parity with video generation: the unified
image_generate tool now edits/transforms a source image (image-to-image)
when given image_url / reference_image_urls, routing to each backend's
edit endpoint, exactly as video_generate routes to image-to-video.

- ImageGenProvider ABC: generate() gains keyword-only image_url +
  reference_image_urls; new capabilities() declares modalities +
  max_reference_images (defaults to text-only, backward compatible).
  success_response gains a modality field; adds normalize_reference_images.
- image_generate tool: schema exposes image_url + reference_image_urls;
  dynamic schema reflects the active model's actual edit capability so the
  agent knows when image_url is honored. Handler + plugin dispatch forward
  the new inputs; legacy/text-only providers get a clear modality_unsupported
  error instead of silently dropping the source image.
- In-tree FAL: 7 models gain edit endpoints (flux-2-klein, flux-2-pro,
  nano-banana-pro, gpt-image-1.5, gpt-image-2, ideogram/v3, qwen-image)
  with per-model edit_supports whitelists + reference caps; routes to the
  /edit endpoint and skips the upscaler for edits.
- Plugins: openai (images.edit, 16 refs), xai (/v1/images/edits via
  grok-imagine-image-quality, JSON body per xAI docs), krea
  (image_style_references, 10 refs). openai-codex stays text-only and
  rejects edits with an actionable error.
- Tests: 15 new (payload, routing, dispatch forwarding, dynamic schema,
  capabilities); updated 2 change-detector/lambda tests for the new schema.
- Docs: image-generation feature page, image-gen provider plugin guide,
  tools reference.

* fix(image-gen): preserve legacy passthrough in fal/krea plugin tests

Two existing plugin tests asserted pre-image-to-image behavior:
- fal: forward image_url/reference_image_urls only when supplied, so a
  text-to-image delegation stays byte-identical (no None kwargs).
- krea: keep dict-shaped image_style_references refs verbatim (the unified
  string refs go through normalize_reference_images; legacy non-string ref
  objects pass through unchanged) — fixes KeyError when callers pass the
  richer Krea ref-object shape.

* fix(image-gen): clearer not-capable message for text-to-image-only models

When a text-to-image-only model (incl. gpt-image-2 on the Codex OAuth path,
which can't do editing through the Responses image_generation tool) gets a
source image, say 'this model is not capable of image-to-image / editing —
provide a text-only prompt' rather than sending the user shopping for other
backends. Applies to the openai-codex guard, the in-tree FAL no-edit-endpoint
error, and the dynamic tool-schema text-only line.
2026-06-18 22:13:07 -07:00
flooryyyy
f8d8f045fa feat(kanban): auto-subscribe calling session on kanban_create
When a worker calls kanban_create from inside a session that has a
persistent delivery channel, the originating session is now subscribed
to the new task's completion/block events automatically. The agent
that dispatched the task gets notified instead of having to poll.

- Gateway sessions (telegram/discord/slack): HERMES_SESSION_PLATFORM +
  HERMES_SESSION_CHAT_ID ContextVars, set by the messaging gateway.
- TUI / desktop sessions: HERMES_SESSION_KEY in the subprocess env.
  The TUI notification poller keys on platform='tui' + chat_id=<key>.
- CLI / cron / test: no persistent channel, no subscription.

Gated by kanban.auto_subscribe_on_create in config.yaml (default True).
Disable to mirror pre-feature behaviour — users who want explicit
kanban_notify-subscribe calls per task can set it to false. This
config gate addresses the design concern that got PR #19718 reverted
upstream (unconditional implicit auto-subscribe on tool-driven
kanban_create was too aggressive for orchestrator users).

HERMES_SESSION_ID is intentionally not a fallback channel — it is
set by ACP/agent subprocess telemetry for every invocation, not just
TUI, so treating it as a notification target would auto-subscribe
every CLI session and re-introduce the over-eager behaviour.

The kanban_create response now includes a 'subscribed' bool so
orchestrators can react if subscription failed (e.g. by falling
back to explicit kanban_notify-subscribe or to polling).

Includes 6 tests covering the gateway / TUI / CLI / partial-context /
gated / add_notify_sub-failure paths. All 90 tests in
test_kanban_tools.py pass; 509 broader kanban tests pass.
2026-06-18 14:10:51 -07:00
Teknium
38c8a9c10f
feat(memory): batch operations for single-turn memory updates (#48507)
The memory tool was strictly one-op-per-call. With the store running near
its char limit by design, a new add that would overflow gets rejected with
'consolidate now, then retry' -- but the model could not consolidate and add
in one call. It had to remove/replace across several turns, then retry the
add, each turn re-sending the whole conversation context. Expensive thrash.

Add an 'operations' array: a list of add/replace/remove ops applied
atomically against the FINAL char budget. The model frees space and adds new
entries in ONE call, even when an add alone would overflow. All-or-nothing:
any bad op aborts the whole batch, nothing written.

Root-cause note: the two agent-level memory interception sites
(agent_runtime_helpers.py, tool_executor.py) silently dropped any param not
in their explicit kwarg list, so 'operations' never reached the handler and
batch calls failed with 'Unknown action None'. Both now pass it through and
bridge each add/replace op to external memory providers.

Also: success response is now terminal (done=true + 'do not repeat' note,
no full-entries echo that invited re-edits); schema rewritten to lead with
the batch mechanism and an explicit one-shot stop rule (2138 -> 1476 chars).

Live-verified: near-full consolidate-and-add went 7 calls -> 1 call,
stable across 3 reps. 103 memory/approval tests + 398 background-review/
run_agent tests green; 6 new batch tests added.
2026-06-18 10:19:33 -07:00
Teknium
25c590ccd0 fix(skills): refuse SKILLS_DIR root in rmtree guard, not just outside-tree
The salvaged guard allowed _rmtree_writable(SKILLS_DIR) itself. No call
site ever passes the root — every site passes a skill subdir or its .bak
sibling — so allowing the root only preserves the #48200 footgun (a dest
that collapses to the root wipes every installed skill). Require a strict
strict-child relationship and update the test that documented the
nonexistent 'full reset' capability.
2026-06-18 08:53:35 -07:00
Kewe63
f1254c8eaf fix(skills): rmtree scope guard + default pre_update_backup to true (#48200)
Defense-in-depth fix for the silent wipe of ~/.hermes/ documented in
#48200. A `hermes update --yes` run silently destroyed a user's
.env, MEMORY.md, kanban.db, custom skills, and scripts. Two changes:

1. `_rmtree_writable` in tools/skills_sync.py now refuses to rmtree
   anything outside SKILLS_DIR (the HERMES_HOME/skills/ root).
   All five call sites pass paths under SKILLS_DIR, so the guard is
   a no-op for current code and a loud, recoverable failure for
   any future regression (bad path join, malicious bundled
   manifest, stale path in scope after an exception).

2. The default `updates.pre_update_backup` flips from false to
   true in hermes_cli/config.py. A few minutes of zip per update
   is negligible compared to silent total data loss. Still
   overridable; --no-backup still works for one-off opt-out.

Five new tests in TestRmtreeWritableScopeGuard (root path,
hermes home, sibling dir, skills root itself, subdir) plus a
flipped `test_default_enabled_creates_backup` in test_backup.py.
178/178 tests pass in the two affected files. Public method
signatures unchanged, no test-stub blast radius.

Closes #48200
2026-06-18 08:53:35 -07:00
Luke The Dev
3c3ac19d9c fix(#37878): Address review feedback — fix trailing whitespace and add ANTHROPIC_API_KEY test
Review feedback from egilewski:
1. Remove trailing whitespace from test docstring and mock patches (lines 1430, 1469, 1476, 1482)
2. Expand test coverage: also verify ANTHROPIC_API_KEY is stripped (not just OPENAI_API_KEY)

Changes:
- Remove trailing whitespace from test file
- Add ANTHROPIC_API_KEY to test environment
- Add assertion verifying ANTHROPIC_API_KEY is stripped from cua-driver subprocess env
- Syntax verified: python3 -m py_compile tests/tools/test_computer_use.py ✓
2026-06-18 08:53:31 -07:00
Luke The Dev
2e5c04aaf7 fix(#37878): scrub operator environment before launching cua-driver MCP
- Use _sanitize_subprocess_env() to filter Hermes-managed credentials
  from the cua-driver subprocess environment (issue #37878)
- Prevents credential exfiltration to the third-party cua-driver binary
- Aligns with existing pattern used by browser-tool and other tools
- Add regression test to verify environment sanitization

The cua-driver is a lower-trust MCP subprocess per SECURITY.md §2.3.
Its inherited environment is now scrubbed by default, removing provider
API keys, gateway tokens, and platform credentials that should not leak
to third-party binaries.

Fixes #37878
2026-06-18 08:53:31 -07:00
xxxigm
481f0417d8 test(skills): cover list-modified + diff for bundled skills
Exercises the real sync pipeline (no mocked comparison logic): a pristine
synced skill is not flagged; an edited one is listed and diffed (modified +
added files); an unknown skill returns not-ok; and `reset --restore` clears
the modified state so revert and discovery stay consistent.
2026-06-18 12:26:20 +05:30
Ben Barclay
4440d77bf3
fix(update): scope install-method stamp to the code tree, not $HERMES_HOME (#48188)
The install method (docker/git/pip/...) describes the *running binary*, but
detect_install_method() read it from $HERMES_HOME/.install_method — a shared
DATA directory. The Docker docs deliberately bind-mount $HERMES_HOME
(~/.hermes:/opt/data) so config/sessions/memory persist and can be shared with
a host-side Desktop/CLI install.

When a containerized gateway and a host install share one $HERMES_HOME, the
home-scoped stamp is a single slot describing two installs: the published image
stamps 'docker' on every boot, the host install then reads 'docker' and the
in-app updater refuses to run 'hermes update' ("doesn't apply inside the Docker
container"). Reinstalling the Desktop app from the DMG doesn't help because the
contaminated stamp is re-read every time.

Fix (option 1 — code-scoped stamp):
- detect_install_method() reads <install tree>/.install_method first (next to
  the running code, immune to the shared data dir). It falls back to the legacy
  $HERMES_HOME stamp for back-compat, but IGNORES a 'docker' home stamp when
  not actually containerized — so already-poisoned shared homes self-heal.
- stamp_install_method() writes the code-scoped stamp.
- install.sh stamps $INSTALL_DIR instead of $HERMES_HOME.
- Dockerfile bakes 'docker' into /opt/hermes/.install_method at build time
  (inside the immutable block); stage2-hook.sh no longer writes the home stamp
  and proactively removes a stale 'docker' one to heal existing shared homes.

Genuine containers still resolve to 'docker' (baked stamp, or legacy home stamp
honored when containerized). Unstamped installs in generic containers still fall
through to git/pip (preserves the #34397 fix).
2026-06-18 14:14:41 +10:00
Gille
3769dff5dd
fix(approval): honor glob command allowlist entries (#43051)
* fix(approval): honor glob command allowlist entries

* fix(approval): guard allowlist globs from shell chaining
2026-06-18 12:48:36 +10:00
shannonsands
6092be413d
Harden hosted Docker install tree against self-modification (#47490)
* Harden hosted Docker install tree

* Document hosted Docker immutable install tree
2026-06-18 09:09:21 +10:00
Teknium
22b6942fc2
feat(search_files): headroom compression evaluation report + lossless densification (#47866)
* feat(search_files): path-grouped lossless densification of content matches

Content-mode search_files results repeat the {path,line,content} JSON keys
and the full path string for every match. Group consecutive same-path matches
under one path header with indented '<line>: <content>' rows — lossless (every
path/line/content byte preserved), self-describing (matches_format key), and
readable by the model with no decode step.

57.8% mean token reduction on real search_files content outputs (422-output
corpus), fires on 97% of them. Gated at >=5 matches; below that the verbose
array is left untouched. Default to_dict(densify=False) is unchanged, so no
other caller is affected.

ripgrep emits matches path-ordered, so consecutive grouping never reorders
results.

* test: accept densify kwarg in _FakeSearchResult.to_dict

The search loop-detection tests stub SearchResult with a fake whose
to_dict() must mirror the real signature now that it takes densify=.

* test(search_files): edge-case losslessness battery for densification

Adversarial single-line content (colons, indentation, unicode/emoji, empty,
trailing whitespace, quotes+commas), paths with spaces, and an explicit
one-line-per-match invariant documenting the ripgrep contract the format
relies on (0/6775 real match contents contained a newline).
2026-06-17 13:45:25 -07:00
Teknium
c6c8abbadb
refactor: remove agent-callable send_message tool (#47856)
* feat(mcp): raise default tool-call timeout 120s -> 300s

Port from openai/codex#28234. Long-running MCP tools (web fetches,
sandboxed builds, deep-research servers) routinely exceed 120s, causing
spurious timeout failures. Codex bumped its default MCP tool timeout from
120 to 300 for the same reason.

- _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server
  'timeout' config override unchanged)
- update test_default_timeout assertion
- document the default in mcp-config-reference.md

* refactor: remove agent-callable send_message tool

The agent should not decide on its own to fire off cross-platform
messages or reactions. Outbound platform messaging is handled outside
the agent loop — cron delivery, the gateway kanban notifier
(dashboard-toggled), and the `hermes send` CLI.

Removes the model-tool registration only; the send engine in
send_message_tool.py (_send_to_platform, _send_via_adapter,
_parse_target_ref, per-platform _send_* helpers) is kept intact for
those non-agent callers. Drops the now-empty 'messaging' toolset and
its `hermes tools` toggle. Yuanbao DM guidance now points at the
native yb_send_dm tool.
2026-06-17 07:11:23 -07:00
Max Freedom Pollard
992b922389 fix(curator): stop restore from matching unrelated skills by name prefix
restore_skill() falls back to p.name.startswith(f"{skill_name}-") when no
archive directory matches the requested name exactly. That fallback is meant
to catch the timestamped duplicate archive_skill() writes on a name collision
(<skill>-YYYYMMDDHHMMSS), but the bare prefix also matches any unrelated
archived skill named <name>-something. So restoring "git" can pull an archived
"git-helpers" out of .archive/, rename it to "git", and report success: the
requested skill is not restored and the sibling is gone from the archive.

Constrain the fallback to the exact suffix archive_skill() produces, a 14 digit
timestamp. The exact-name match and the recursive nested-archive walk are
unchanged, so nested and timestamped restores still work; unrelated siblings no
longer match.

Fixes #47647
2026-06-17 06:04:03 -07:00
Wolfram Ravenwolf
9137b86a52 fix(skills): ignore support docs in skill discovery
Support files under references/, templates/, assets/, and scripts/ are progressive-disclosure data loaded through skill_view(..., file_path=...). They should not be treated as standalone skills during discovery or collision checks.

This prevents archived skill packages or support markdown files inside a real skill from shadowing active skills with the same name while still allowing top-level categories named scripts/templates/assets/references.

Tests cover:
- pruning nested SKILL.md files inside skill support directories
- preserving support-named top-level categories
- avoiding skill_view collisions from support markdown
- keeping archived package SKILL.md files accessible only through file_path
2026-06-16 13:08:34 -07:00
brooklyn!
44e5848e74
feat(desktop): stream subagent activity into watch windows (#47060)
* feat(desktop): stream subagent replies into watch windows

A desktop watch window resumes a child session lazily (no full agent) and
mirrors the parent-relayed `subagent.*` events into native child-session
stream events. The child's streamed reply text was never relayed, so the
window sat blank while the subagent "talked".

- delegate_tool: forward the child's `run_conversation` stream tokens up the
  progress relay as `subagent.text` (inert under CLI/TUI — their progress
  handlers ignore non-tool event types; only a gateway watch window mirrors it).
- server: mirror `subagent.text` -> `message.delta` on the child sid only, and
  skip the parent emit (per-token frames are meaningless on the parent session,
  which shows the child via the spawn tree). Demote `subagent.start` to a
  one-time goal header and drop the noisy `subagent.progress` mirror — tools
  already mirror natively.
- server: guard `_start_agent_build` so a lazy watch session spectating an
  in-flight child stays lazy; incidental RPCs were upgrading it to a full
  agent mid-stream and silently killing the mirror.

* fix(desktop): keep watch-window chat clear of titlebar chrome

Secondary windows (new-session scratch, subagent watch, cmd-click pop-out)
hide the titlebar tool cluster + session header, so the transcript ran to the
window's top edge and streamed text slid up under the OS traffic lights.

- Gate the hidden chrome on `isSecondaryWindow()` everywhere (app-shell,
  chat header, thread list) instead of the narrower new-session flag.
- Add a fixed opaque drag-strip at the top of the secondary-window transcript:
  content padding alone scrolls away with the text, so the strip masks
  anything behind it and keeps the window draggable like the main header.

* fix: WSL subagent window

* fix: subagent window top padding

---------

Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-16 14:30:11 -04:00
Teknium
2dbc3bd937
fix(skills): guard recursive skill delete against tree-escape (#46929)
Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire
working directory: a built-in-skill sentinel location resolved to the server
cwd and the skill-removal endpoint ran a recursive delete on it.

Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the
agent-facing skill_manage(action='delete') path did a bare
shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target():
refuse to rmtree a path that (1) isn't strictly inside a known skills root,
(2) is a skills root itself, or (3) is reached via a symlink/junction.

Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree
all refused). E2E verified with real symlink + file I/O.
2026-06-15 17:14:59 -07:00
Teknium
c66ecf0bc3
feat(delegation): async background subagents via delegate_task(background=true) (#40946)
* feat(delegation): async background subagents via delegate_task(background=true)

delegate_task(background=true) dispatches a subagent that runs in the
background and returns a handle immediately, so the user and model keep
working while it runs. The full result — plus the original task source —
re-enters the conversation as a new turn when the subagent finishes,
riding the same completion-queue rail as terminal background processes.

- tools/async_delegation.py: daemon-executor registry, capacity cap,
  rich self-contained completion event pushed onto the shared
  process_registry.completion_queue (type='async_delegation').
- delegate_tool.py: background param + single-task dispatch branch;
  batch async rejected (v1).
- process_registry.py: format_process_notification renders the rich
  task-source block (goal/context/toolsets/model/status/result).
- gateway/run.py: dedicated _async_delegation_watcher drains + injects
  results into the originating session (idle + post-turn), session_key
  routing enrichment, shutdown interrupt of dangling delegations.
- config: delegation.max_async_children (default 3).

Reuses the existing idle-drain wiring rather than mutating a running
agent loop, preserving message-role alternation and prompt-cache
invariants. 13 targeted tests; CLI + gateway paths E2E-verified.

* test(delegation): make async non-blocking tests environment-independent

CI 'test (5)' flaked on a cold, 8-worker runner: the first
delegate_task(background=true) call measured 2.27s of one-time setup
(config load + child-agent construction + imports), tripping the
elapsed < 1.0 wall-clock assertion. That assertion was testing setup
overhead, not blocking.

Replace the wall-clock thresholds with the real invariant: dispatch
returns while the child is still gated (active_count == 1, completion
queue empty), which a synchronous impl could not do. Keep only a loose
4s sanity backstop well under the runner's 5s gate.

* fix(delegation): harden async background delegation

Follow-up review fixes:
- Detach background child from parent._active_children at dispatch —
  otherwise parent-turn interrupts (Ctrl+C, mid-turn steering), cache
  evicts (release_clients), and session close (/new) kill/close the
  detached subagent mid-run, defeating the point of background mode.
  Lifecycle is owned by the async registry's interrupt_fn.
- Make the capacity check atomic with the record insert (TOCTOU: two
  concurrent dispatches could both pass active_count() and exceed the cap).
- TUI dedup: key async_delegation events by delegation_id — the
  fallthrough keyed them all as ("", type), suppressing every completion
  after the first in the desktop/TUI status feed.
- CLI /stop now interrupts running background delegations and /agents
  lists them (they live outside the process registry and were invisible).
- Drop stray unbalanced ']' line from the re-injection block and the
  unused _ASYNC_DEFAULT import.

Tests: detach-at-dispatch + concurrent-capacity race added (15 total in
test_async_delegation.py); 137 delegate + 140 process-registry/notify/watch
+ 7 TUI dedup tests pass.

* fix(delegation): harden async background completion drains
2026-06-15 13:33:12 -07:00
Teknium
3e7e9b24d4 fix: harden salvaged session and browser improvements
Polish salvaged contributor work before PR review:
- read browser inactivity timeout from config with documented fallback
- skip redundant v10 trigram backfill before v11 FTS rebuild
- show delegate_task goals safely in progress previews
- show gateway status model/context without redundant token wording
- wire gateway /sessions to shared session-listing helpers
- map Ravenwolf author emails for release attribution

Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de>
Co-authored-by: Amy Ravenwolf <amy@ravenwolf.de>
2026-06-15 07:46:34 -07:00
Amy Ravenwolf
2f2e3616b4 fix(config): read browser inactivity timeout from config 2026-06-15 07:46:34 -07:00
Teknium
be7c919bf9
fix(process): label background completion causes (#46659)
Track why a background process finished and include that source in notify-on-complete messages so SIGTERM from process.kill, kill_all, backend loss, and ordinary exits are distinguishable.
2026-06-15 07:08:24 -07:00
Teknium
0d82060c74 fix: harden WhatsApp target alias salvage
Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.
2026-06-15 05:51:47 -07:00
Keiron McCammon
ea49a79633 fix(messaging): route WhatsApp group JIDs to the target, not the home DM
send_message(target="whatsapp:<group-jid>") silently delivered to the
configured home DM instead of the requested group. Two gaps:

1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us),
   user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and
   broadcast/newsletter JIDs matched no pattern and fell through to
   `return None, None, False`, so the caller treated them as
   unresolvable and used the home channel. The bridge's /send endpoint
   accepts any chatId, so only the tool-side target parsing was at fault.
   Add a whatsapp branch that recognizes native JIDs as explicit targets.
   The pre-existing '+'-prefixed E.164 path is preserved.

2. WhatsApp groups have no human-friendly name — the channel directory
   is regenerated from session data on a timer, so a group shows up as
   its raw 18-digit JID and any hand-edit to channel_directory.json is
   clobbered on the next rebuild. Add a user-maintained alias overlay
   (~/.hermes/channel_aliases.json) re-applied on every build AND every
   load, giving durable friendly names and letting a freshly-created
   group be pre-named before its first message.

Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser;
TestChannelAliases (7 cases) for the overlay, plus an autouse fixture
isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the
existing directory tests.
2026-06-15 05:51:47 -07:00
helix4u
dcc3216955 fix(mcp): fail fast for noninteractive oauth without tokens 2026-06-15 04:22:07 -07:00
Gille
d6a8d9dcab fix(tools): respect session cwd in file tools 2026-06-15 14:00:42 +05:30
Teknium
61ee2dbfdb
fix(s6): make profile gateway log parent writable (#46291)
* fix(gateway): chown logs/gateways parent so late-added profiles can log

The per-profile log service script created $HERMES_HOME/logs/gateways/
via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When
the first log service boots in root context, the gateways/ parent stays
root:root; every profile registered later runs its log service as the
dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a
sub-second fatal crash-loop flooding the container log. The stage2
recursive heal does not catch it either: it is gated on needs_chown,
which is false when the top-level $HERMES_HOME is already hermes-owned.

Two complementary fixes:

- service_manager._render_log_run: chown the gateways/ parent
  (non-recursively) before the leaf chown. Runs on every root-context
  boot, so it also heals volumes already poisoned by older images.
- docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p
  block; cont-init runs before any service starts, so the parent
  already exists hermes-owned when the first log/run does 'mkdir -p'.

The needs_chown repair loop needs no twin entry: it already chowns
logs/ recursively, which covers logs/gateways.

Fixes #45258

* chore(release): map salvaged contributor

---------

Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>
2026-06-15 13:47:05 +10:00
Teknium
f3fe99863d
revert(web): remove keyless Parallel search fallback (#46350)
Remove the free Parallel Search MCP path and restore the keyed Parallel backend behavior from before it was introduced.

Also drops the keyless fallback registration/display labeling tests and returns the Parallel SDK pin to the prior version.
2026-06-14 16:47:57 -07:00
helix4u
4936a49a0c fix(mcp): preserve loop during probes
Some checks failed
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
OSV-Scanner / Scan lockfiles (push) Waiting to run
Tests / test (1) (push) Waiting to run
Tests / test (2) (push) Waiting to run
Tests / test (3) (push) Waiting to run
Tests / test (4) (push) Waiting to run
Tests / test (5) (push) Waiting to run
Tests / test (6) (push) Waiting to run
Tests / save-durations (push) Blocked by required conditions
Tests / e2e (push) Waiting to run
Typecheck / typecheck (apps/bootstrap-installer) (push) Waiting to run
Typecheck / typecheck (apps/desktop) (push) Waiting to run
Typecheck / typecheck (apps/shared) (push) Waiting to run
Typecheck / typecheck (ui-tui) (push) Waiting to run
Typecheck / typecheck (web) (push) Waiting to run
uv.lock check / uv lock --check (push) Waiting to run
Nix Lockfile Fix / auto-fix-main (push) Has been cancelled
Nix Lockfile Fix / fix (push) Has been cancelled
Build Skills Index / build-index (push) Has been cancelled
Build Skills Index / trigger-deploy (push) Has been cancelled
2026-06-14 02:09:45 -07:00
Teknium
81e42335a1
fix(file-safety): relax user-write deny policy (#45947)
Allow file tools to edit shell startup files, user package-manager configs, and Hermes control files that the user can already modify directly. Keep hard blocks for SSH keys, .env/OAuth token stores, mcp-tokens, pairing files, and system privilege files.
2026-06-14 02:07:32 -07:00
Teknium
8f278403d1
perf(execute-code): stop waiting on idle RPC accept (#45948) 2026-06-13 21:57:15 -07:00
Teknium
1106879147
perf(process): wake waiters on background completion (#45831) 2026-06-13 21:11:19 -07:00
Max Pollard
9a2b976326 test(skills): add regression tests for bundled-update backup recovery
Three tests covering: a stale .bak poisoning a failed update's move/restore, an orphaned .bak misread as a user deletion, and a partially written dest blocking restore-on-failure. All three fail on current main without the fix.

Refs #44942
2026-06-13 15:01:42 -07:00
Teknium
817f392311
feat(read): extract notebook and office documents (#37082)
Add stdlib-only extraction for `.ipynb`, `.docx`, and `.xlsx` in read_file with lazy integration and malformed-document fallback.
2026-06-13 14:42:51 -07:00
Teknium
2b67e96aec fix(approval): gate in-place edits to sensitive user files
Cover sed, perl, and ruby in-place mutations against shell rc, SSH, and credential files so terminal approvals pair the redirection and copy guards.
2026-06-13 14:35:27 -07:00
helix4u
abd69b8117 fix(approval): detect absolute home shell rc writes 2026-06-13 14:35:27 -07:00
briandevans
da28d5d113 fix(security): gate cp/mv/install into ~/.ssh, credential, and shell-rc files
tools/approval.py already denies tee/redirection writes to every
_SENSITIVE_WRITE_TARGET (~/.ssh/*, ~/.netrc/.pgpass/.npmrc/.pypirc, shell
rc files, ~/.hermes/config.yaml/.env) via the DANGEROUS_PATTERNS tee/`>`
rules, but cp/mv/install were only paired for _SYSTEM_CONFIG_PATH (/etc) and
the project-relative env/config target. So `cp evil ~/.ssh/authorized_keys`
(SSH-key implant / persistence), `cp creds ~/.netrc`, and `cp evil ~/.bashrc`
(login-time command injection) auto-approved while the equivalent tee/`>`
forms were denied — an unpaired write deny is theater (same rationale as
#14639 / commit 4e9d886d, which paired the terminal side for
~/.hermes/config.yaml writes but did not touch these cp/mv/install verbs on
the broader sensitive set).

Add one (cp|mv|install) DANGEROUS_PATTERNS entry reusing the existing
_SENSITIVE_WRITE_TARGET fragment, anchored via _COMMAND_TAIL so it fires on
the destination (last arg) only: reading OUT of a sensitive path
(`cp ~/.ssh/config /tmp/x`) stays auto-approved. Description differs from the
system-config cp entry so the two keep distinct approval keys (no silent
cross-approval). Additive — does not subsume the /etc or project-config rules.

Adds TestSensitiveCopyMovePattern: 5 positive cases (ssh authorized_keys,
ssh private key via mv, netrc via install, bashrc, ~/.hermes/config.yaml) +
2 negative guards (copy FROM ssh, unrelated copy). The ssh/netrc/bashrc
positives fail on main and pass on this branch; the negatives stay green
both ways.
2026-06-13 14:35:27 -07:00
Teknium
1fa761f8de
fix(search): keep partial results on search timeout (#36142)
Treat search command budget timeouts as soft truncation so partial results survive, while real search failures still return structured errors.
2026-06-13 14:35:21 -07:00
Teknium
2a5dc0ef3d
fix(slack): make video attachments available to agents (#45512) 2026-06-13 03:33:27 -07:00
teknium1
05470aa1b6 feat(messaging): expose action='unreact' in send_message + react dispatch tests
Follow-up for salvaged PR #44486: the adapter shipped remove_reaction but
the tool only exposed 'react'. Generalize _handle_react(remove=) and add
tool-level dispatch tests for react/unreact (missing from the original PR).
2026-06-12 01:07:38 -07:00
Brooklyn Nicholson
b2d151abe2 fix(tools): strip default from $ref nodes in tool schemas
Fireworks-hosted Kimi rejects tool requests when nullable MCP/Pydantic
schemas collapse to {"$ref": "...", "default": null}. Strip that sibling
during global schema sanitization so gateway and CLI calls succeed again.
2026-06-12 00:30:51 -05:00