Commit graph

11937 commits

Author SHA1 Message Date
teknium1
c5eb64b9f7 fix(xai): scope native web_search to swap-only + reconcile composer ctx to 200k
Salvage corrections on top of @XVVH's #44341:
- Make native web_search injection a 1:1 swap for an already-present client
  web_search function, NOT an additive grant. The original unconditionally
  appended {"type":"web_search"} on every is_xai_responses turn with any
  tools, force-enabling Grok server-side search even when the user never
  enabled the web toolset (bypassing Hermes web-provider config + tool-trace
  plumbing). Now gated on a client web_search actually being present.
- Reconcile grok-composer context to 200000 (merged in #47908) rather than
  262144; 200k is xAI's published usable context window for Composer 2.5,
  262144 is the /v1/responses input+output budget.
- Update tests to match scoped behavior + add a no-web-toolset guard test.
- AUTHOR_MAP entry for #44341 salvage.

Incomplete-guard (server-side *_call items at in_progress no longer flip
has_incomplete_items) and preflight built-in-tool allowlist kept as-is.
2026-06-17 17:33:32 -07:00
XVVH
6f89e17a33 fix(xai): OAuth Responses native web_search, incomplete guard, grok-composer context
- model_metadata: grok-composer-2.5-fast → 262144 (OAuth slug not in /v1/models)
- codex transport: inject native {"type":"web_search"} for is_xai_responses;
  drop client web_search to avoid duplicate-name 400s
- codex adapter: do not treat in-progress server-side *_call items as incomplete
- tests: adapter, transport build_kwargs, model_metadata, oauth recovery
2026-06-17 17:33:32 -07:00
brooklyn!
4b7a186003
fix(desktop): retry the self-update rebuild once so the app relaunches (#48122)
The desktop self-update runs `hermes update` then `hermes desktop
--build-only`, and only relaunches if the rebuild returns 0. The first
`--build-only` can exit nonzero on a still-settling post-update tree or a
network-blocked Electron fetch that the installer's self-heal repaired
mid-run — so both updaters (the Tauri setup binary and the in-app POSIX
path) bailed before the relaunch step. The update landed but the app
never restarted; a manual launch worked because the heal had completed.

Retry `--build-only` once in both paths before failing, mirroring the
retry-once `hermes update` already does (and the CLI `hermes update`'s
own desktop rebuild). A second run builds clean off the healed dist and
is a near-no-op when the first actually succeeded (content-hash stamp).

- update.rs: retry stage 2; add rebuild_needs_retry() + test
- main.cjs: retry via new update-rebuild.cjs helper (behavior-tested)
2026-06-17 19:33:27 -05:00
Teknium
020e59d3cf
fix(agent): dampen empty-name phantom tool-call loop (#47967) (#48109)
Weak open models (mimo, nemotron-class) that see tool-call XML/JSON sitting in
file contents or tool output get primed and emit their own structured tool
calls mimicking the payload — usually with an empty/whitespace name. Those
calls can't be fuzzy-repaired toward a real tool, so the dispatch loop returns
an error and the model retries. Before this fix, every empty-name error dumped
the full tool catalog back to the model, which fed the priming loop more names
to mimic and inflated context 3-4x across the retry budget.

A blank/whitespace-only tool name now gets a terse anti-priming error that
tells the model in-context tool-call syntax is DATA, with no catalog dump. A
genuinely-wrong-but-nonempty name (a real typo) still gets the full catalog so
the model can self-correct.

Not a sandbox/auth boundary issue: Hermes never parses tool-call text from
content into executable calls (structured tool_calls only; the lone text->call
parser is the Copilot ACP transport and it also rejects empty names). The
reporter's own debug dump confirms the injection never executed.

Behavior-contract test added: empty-name -> terse error, no catalog; nonempty
unknown -> catalog preserved. Exercised end-to-end via run_conversation against
an in-process mock provider.
2026-06-17 17:32:14 -07:00
Ben Barclay
86f2946fbe
fix(dashboard): recover the Chat tab when the agent session ends (NS-504) (#47674)
* fix(dashboard): recover the Chat tab when the agent session ends (NS-504)

In the dashboard Chat tab, when the agent process exits — the user types
`/exit`, or starts a new session that ends the current PTY child — the
`/api/pty` WebSocket closes with a normal code (not one of the
4401/4403/4404/4408/1011 rejection codes the server emits). The frontend
handled only those rejection codes; the normal-exit fallback just printed
"[session ended]" into the dead terminal and stopped, with `wsRef` nulled
and no respawn path. The only recovery was a full page refresh — exactly
the beta report ("typing /exit breaks functionality, no way to restart
without refreshing"; "starting a new session completely breaks the
agent").

On a clean/normal close the Chat tab now flips `sessionEnded` and renders
an in-place "Start new session" overlay (mirroring ChatSidebar's existing
reconnect affordance). Clicking it bumps a `reconnectNonce` that is a
dependency of the connect effect, so the effect tears down and re-runs,
spawning a fresh PTY in place — no page refresh. `onopen` clears the
flag so a successful reconnect dismisses the overlay.

An explicit button (rather than auto-respawn) is deliberate: if the agent
is crash-looping, auto-respawn would hide the failure and spin; the user
stays in control.

Verified against a live uvicorn `/api/pty` socket: a child that exits
closes with a non-rejection code (client sees close_code None / 1000-class),
which is precisely the branch that now sets sessionEnded=true. web
typecheck + vite build clean.

Reported via beta (NS-504).

* docs(assets): add NS-504 chat session recovery infographic
2026-06-18 10:05:26 +10:00
Teknium
9ba4615db2
fix(dump): show commit date instead of release date in hermes debug (#48104)
* feat(mcp): raise default tool-call timeout 120s -> 300s

Port from openai/codex#28234. Long-running MCP tools (web fetches,
sandboxed builds, deep-research servers) routinely exceed 120s, causing
spurious timeout failures. Codex bumped its default MCP tool timeout from
120 to 300 for the same reason.

- _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server
  'timeout' config override unchanged)
- update test_default_timeout assertion
- document the default in mcp-config-reference.md

* fix(dump): show commit date instead of release date in hermes dump

The version line in `hermes dump` (the top of the /debug report) appended
the package release date in parentheses, which reads like a wall-clock
"generated at" timestamp and confuses support triage. Replace it with the
date the HEAD commit was actually made, resolved live via
`git log -1 --format=%cd --date=short`, kept next to the commit SHA.

On Docker/wheel installs with no .git the date resolves to '' and the
suffix is simply omitted (the baked SHA still identifies the build).
2026-06-17 16:53:42 -07:00
brooklyn!
c1f9eb0ec4
fix(desktop): resolve electronDist dynamically + self-heal blocked installs (supersedes #48081/#48082) (#48091)
* fix(desktop): resolve electronDist dynamically + self-heal blocked installs

Supersedes the static-path approach (#48081) and the install-step self-heal
(#48082) with a fix that removes the whole failure class instead of chasing each
symptom. Three distinct faults converged into the June desktop-build outage; this
closes all three.

Root cause (the part #48081 left open — "Gap B"):
  build.electronDist was a static relative path in apps/desktop/package.json, but
  npm workspace hoisting is NOT deterministic — depending on the npm version and
  what else is installed, npm nests the workspace-only electron devDep under
  apps/desktop/node_modules/electron OR hoists it to the repo root. A static path
  matches only one layout, so a clean install intermittently fails with "The
  specified electronDist does not exist". #48081 re-pointed the path at the
  nested layout (correct today) but electron-builder reads electronDist
  STATICALLY, so any future hoist change silently breaks it again — only caught
  by a CI invariant, never self-corrected.

Fix:
- scripts/run-electron-builder.cjs: resolve electron the way Node's runtime does
  — require.resolve("electron/package.json") walks node_modules from the desktop
  project upward and finds electron wherever npm actually put it. The path can
  never drift out of sync with the install layout again, on any OS/npm version.
    * dist present -> pass -c.electronDist=<abs>/dist so electron-builder reuses
      the unpacked runtime (keeps the #38673 fast path that dodges the 26.8.x
      missing-binary re-unpack bug).
    * dist absent  -> omit electronDist; electron-builder fetches Electron itself
      via @electron/get honoring electronVersion + ELECTRON_MIRROR.
  package.json: builder script now runs the wrapper; the static build.electronDist
  is removed (the resolver owns it).
- main.py / install.sh / install.ps1: on a dependency-install failure where the
  electron package staged but its dist is missing (electron's install.js
  process.exit(1) on a blocked/throttled binary download — #47266/#47917/#48021),
  repopulate the dist via electron's downloader (canonical, then npmmirror.com)
  and CONTINUE to the build instead of aborting. npm runs postinstall LAST, so
  the only casualty is electron/dist; bailing here is what made the pack-time
  mirror self-heal unreachable on a blocked network. Hard-fail only when electron
  never staged at all (a genuine dependency error).
- The pack-time mirror fallback now retries the build even when the pre-fetch
  can't populate the dist: the wrapper lets electron-builder download Electron
  itself via the mirror, so the retry is no longer a no-op (it was, when
  electronDist was a static path).

The exact 40.10.2 pin (already on main) keeps the third mode — the native
@electron-internal/extract-zip win32 binding that 40.10.3/40.10.4 ship without a
published prebuild — from recurring.

Tests:
- test_desktop_electron_pin.py: replace the static-path-matches-lockfile
  invariant with contracts that there is no hardcoded electronDist to drift, the
  builder script routes through the resolver, and the resolver uses Node module
  resolution + injects -c.electronDist.
- test_gui_command.py: install-failure self-heal continues to build; genuine
  (electron-never-staged) install failure still hard-fails; pack retries under
  the mirror even when the pre-fetch is blocked.

Salvages/supersedes the overlapping community work in #48003 (sitkarev),
#48012 (omegazheng), #48033 (james47kjv), and #48082.

Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>

* fix(desktop): narrow Electron self-heal to real missing-dist failures

Follow-up on #48091 to remove the remaining misdiagnosis risk from the
installer/build fallback path (#46785 concern): only take the Electron
repair/retry path when Electron's package files are staged and dist is actually
missing/corrupt.

- main.py: add _electron_pkg_staged_missing_dist() and use it to gate install
  failure recovery; fail fast for unrelated npm install errors.
- main.py/install.sh/install.ps1: run cache purge + retry only when dist is
  missing; do not retry unrelated tsc/vite/build failures under an
  Electron-specific narrative.
- install.sh/install.ps1: tighten install-stage self-heal guard to require both
  package.json + install.js and missing dist.
- tests: add coverage that install failure hard-fails when Electron dist already
  exists, and update retry test to reflect the tightened recovery condition.

Validation:
- Python tests: 64 passed
- install.sh-related tests included in the run
- Real mac build on this machine:
  - npm ci at repo root: success
  - cd apps/desktop && npm run pack: success
  - electron-builder packaged darwin arm64 and used custom unpacked Electron dist

* refactor(desktop): trim electron self-heal helpers and comments

Deduplicate mirror-retry into _try_redownload_electron_dist / shell
counterparts; shorten wrapper and install-script commentary without
changing recovery semantics.

---------

Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>
2026-06-17 18:48:35 -05:00
Ben
acc8916ac7 test(gateway): live ws-transport round-trip + config-driven registration
- test_ws_transport.py: drives WebSocketRelayTransport against a REAL in-process
  websockets server (not a mock socket): handshake (hello->descriptor), inbound
  frame -> handler, outbound request/response correlation, follow_up routing,
  and clean disconnect failing pending waiters. Skips if websockets is absent.
- test_relay_registration.py: rewritten for the config-driven gate — registers
  when GATEWAY_RELAY_URL is set / an explicit url is passed / force=True; no-op
  without a URL; trailing slash stripped; adapter constructs through the registry.

Full relay suite: 57 passed.
2026-06-17 16:37:45 -07:00
Ben
237fa7d29c feat(gateway): register relay adapter from config; drop HERMES_GATEWAY_RELAY gate
Wire the relay adapter into gateway startup and make activation config-driven
instead of a dark-launch flag.

- gateway/relay/__init__.py: replace relay_enabled()/HERMES_GATEWAY_RELAY with
  relay_url() (GATEWAY_RELAY_URL env or gateway.relay_url in config.yaml) — the
  same shape as gateway.proxy_url. register_relay_adapter() registers when a URL
  is configured and builds a live WebSocketRelayTransport; with no URL it's a
  no-op (direct/single-tenant deployments unaffected). force=True keeps the
  transport-less adapter for unit tests. relay_platform_identity() reads the
  hello platform/botId from GATEWAY_RELAY_PLATFORM/GATEWAY_RELAY_BOT_ID.
- gateway/run.py: call register_relay_adapter() during GatewayRunner.start(),
  right after plugin discovery, so a configured connector relay is registered
  on every boot. Failures are logged, never block startup.

This removes the dark-launch posture: the relay is on whenever it's configured,
shipping the production end state rather than hiding it behind a flag.
2026-06-17 16:37:45 -07:00
Ben
6b03874d07 feat(gateway): production WebSocketRelayTransport + descriptor negotiation
Adds the concrete transport behind the RelayTransport Protocol — the missing
'later-phase work' the relay scaffold deferred. The gateway dials OUT to the
connector over a WebSocket and speaks the newline-delimited JSON frame protocol
(docs/relay-connector-contract.md; connector src/relay/protocol.ts):

- connect(): opens the ws, sends hello{platform,botId}, starts a background
  read loop, and resolves handshake() when the connector's descriptor frame
  arrives.
- inbound frames -> the registered InboundHandler (rebuilt into a MessageEvent
  via _event_from_wire, mapping the snake_case SessionSource wire form back
  onto the gateway dataclasses).
- send_outbound / send_follow_up / get_chat_info: request/response correlated
  by a uuid requestId against a per-request future, with a timeout so a caller
  never hangs; send_interrupt is fire-and-forget.
- disconnect(): cancels the reader, closes the ws, and fails any in-flight
  outbound waiters with a structured error.

RelayAdapter.connect() now negotiates the real CapabilityDescriptor from the
transport and adopts it (_apply_descriptor updates MAX_MESSAGE_LENGTH +
markdown surface), replacing the construction-time placeholder. Lazy
'import websockets' mirrors gateway/platforms/feishu.py; WEBSOCKETS_AVAILABLE
gates construction.
2026-06-17 16:37:45 -07:00
Ben
6e20c1992f docs(gateway): rewrite contract §6 to the A2 trust-boundary model
The contract's §6 still said the connector 'forwards the signed body
byte-for-byte so the gateway's existing crypto validates against unmodified
bytes.' That model is incoherent under an untrusted, disposable tenant
gateway on a shared bot:

- re-validating Twilio HMAC / WeCom crypto needs the shared signing secret
  (handing it over IS the cross-tenant leak),
- WeCom payloads are encrypted with that secret (the connector must decrypt
  at the edge just to route),
- a Discord interaction token lives inside the signed body — you can't both
  preserve the bytes and strip the credential.

Rewrites §6 to the actual model: the connector is the SOLE crypto/identity
boundary — verifies/decrypts at the edge, normalizes to a tenant-scoped
MessageEvent, strips shared-identity capabilities into its vault, and
forwards only the sanitized event. The gateway re-validates nothing (the
invariant test from the crypto-shed commit enforces this). Notes that this
unifies the passthrough + relay planes and points to the connector repo's
capability-trust-boundary.md.

Also documents the follow_up op in §4 (token-less capability action added
in the previous commit). The conformance test (§2/§3 tables) stays green;
contract is unpublished/EXPERIMENTAL so no version-bump ceremony. 55 passed.
2026-06-17 16:37:45 -07:00
Ben
3db9b3e616 feat(gateway): token-less follow_up outbound op (A2 capability action)
The relay outbound surface had send/edit/typing but no way to act on a
SHARED-identity capability (e.g. a Discord interaction follow-up token,
~15min) that the connector captured + stripped at the edge. Under A2 that
credential never reaches the gateway, so the gateway can't just 'send with
the token' — it needs a semantic op naming the session it's already in.

Adds the follow_up op end to end on the gateway side:
- RelayTransport.send_follow_up(action): protocol method. Action carries
  op='follow_up' + session_key + kind + content (+ metadata) and NO token.
- RelayAdapter.send_follow_up(session_key, kind, content, metadata): builds
  that action and returns a SendResult. The connector resolves the real
  capability (its resolveOutboundCapability), enforces the tenant match so
  tenant B can't wield tenant A's capability, and egresses; success=False
  when the capability is absent/expired/mismatched (nothing to retry — a
  leaked gateway holds zero capability material).
- StubConnector records follow_ups + a canned next_follow_up_result.

Tests: round-trips without a token; the wire action carries only session
refs (no credential value field — the 'kind' string is a type ref, not the
secret); failure surfaces when the connector can't resolve; no-transport
fails cleanly. 55 passed. §4 doc entry follows in the contract-rewrite commit.
2026-06-17 16:37:45 -07:00
Ben
c28a02b49d test(gateway): shed platform crypto from the relay path (A2 invariant)
Under the A2 trust model the connector is the SOLE crypto/identity
boundary: it verifies/decrypts every inbound platform payload at the edge
(it holds the tenant secrets), normalizes to a tenant-scoped MessageEvent,
and forwards only the sanitized event. The gateway re-validates nothing —
it cannot without being handed the shared signing secret, which on a
shared bot is itself the cross-tenant leak.

The relay path already imports no platform-crypto today; this locks that
in as an enforced invariant so nobody bolts re-validation (Discord
ed25519, Twilio HMAC, WeCom BizMsgCrypt, generic webhook signature checks)
onto the relay later and silently re-couples the gateway to platform
secrets it must never hold. Verification stays in the direct platform
adapters (gateway/platforms/*) which serve non-relay deployments.

- test_relay_package_imports_no_platform_crypto: AST-walks gateway/relay/*
  and fails on any import of a platform-crypto/verification module.
- test_relay_package_calls_no_signature_verification: fails on any
  verification-symbol reference (ed25519/hmac/bizmsg/verify_*).

Invariants (assert the relation 'relay re-validates nothing'), not frozen
snapshots. Verified the guard bites: injecting a wecom_crypto import makes
it fail, removing it goes green. docs §6 rewrite follows in a later commit.
2026-06-17 16:37:45 -07:00
Ben
e74577ed0f test(gateway): Telegram relay round-trip (Phase 1 generalization proof)
The Phase 1 exit gate requires BOTH Discord and Telegram to round-trip
through the relay stub, but test_relay_roundtrip.py only covered Discord.
Add the Telegram companion exercising its distinct discriminator profile:

- no guild_id — two chats isolate on chat_id alone
- forum topics share one chat_id and isolate by thread_id (the Telegram
  analog of Discord per-guild isolation), shared across participants by
  default (thread_sessions_per_user=False)
- DM isolation by chat_id
- utf16 len_unit + markdown_v2 dialect round-trip and configure the adapter
- outbound send round-trips through the stub

Proves the CapabilityDescriptor + build_session_key generalize beyond
Discord, not just the struct (which the descriptor unit tests already
covered).
2026-06-17 16:37:45 -07:00
Ben
5feec8b4cf test(gateway): enforce relay contract-doc ⟷ Python conformance
Add an invariant test pinning docs/relay-connector-contract.md to the
Python source of truth so the doc (which the connector repo mirrors by
hand) cannot silently drift:

- CapabilityDescriptor §2 table ⟷ dataclass fields + required/optional
- SessionSource wire keys (to_dict output) ⟷ §3 documented fields
- per-platform discriminator columns exist as real SessionSource fields
- guard that is_bot stays off the wire until deliberately promoted

Writing the test surfaced a real gap: §3 only enumerated 5 discriminators
in its per-platform table while to_dict() emits 12 keys. Seven wire keys
the connector must populate (chat_name, chat_topic, user_id_alt,
chat_id_alt, parent_chat_id, message_id, user_name) were undocumented —
a connector author reading the doc would never know to set them. Added a
complete SessionSource wire-field table to §3. The connector's existing
contract.ts already carries all 12, so no connector change is needed; the
doc was the lagging artifact.
2026-06-17 16:37:45 -07:00
Ben
c803661cec fix(gateway): register relay connection checker
The platform-connected-checker invariant test requires every built-in
Platform enum member to have either a generic token path or a bespoke
entry in _PLATFORM_CONNECTED_CHECKERS. Platform.RELAY was added without
one, so test_all_builtins_have_checker_or_generic_token_path failed.

Relay dials OUT to a connector and is 'connected' once an endpoint URL
is configured (extra['relay_url'] or extra['url']); the capability
descriptor is negotiated at handshake time, so the URL is the only
config-level signal in the experimental phase. Add the checker plus a
synthetic-config case exercising its True path.
2026-06-17 16:37:45 -07:00
Ben
c366466d70 test(relay): assert connector stub never leaks into production paths
CI guard: fails if gateway/ or plugins/ ever imports the test-only stub
connector or defines StubConnector. Matches code leaks (imports / class defs),
not prose mentions, so the transport.py docstring reference to the stub's path
is allowed.

Phase 1 complete. Task 1.6 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
ab1a42fcea docs: relay<->connector cross-repo contract (v1, experimental)
Formal interface between the Hermes gateway (RelayAdapter) and the Node
connector repo: handshake, CapabilityDescriptor field table, MessageEvent
inbound envelope with per-platform SessionSource discriminators (Discord
guild_id is REQUIRED for server isolation), outbound action set, /stop
interrupt routing, signed-body verify-at-edge/byte-preserving rule, and the
additive-only contract_version policy. Documents bot-identity-vs-tenant
separation so single-bot consolidation (Phase 6) stays open. Read-first
artifact for the connector implementer.

Phase 1, Task 1.5 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
a3cdd8c39d feat(relay): route mid-turn /stop over relay interrupt channel
RelayAdapter.on_interrupt(session_key, chat_id) bridges a connector-delivered
mid-turn /stop into the existing interrupt_session_activity path, setting the
per-session _active_sessions Event and clearing typing — cancelling exactly the
targeted session's turn without touching siblings (mirrors test_stop_thread_
sibling isolation). Transport.send_interrupt carries the gateway-side egress to
the connector for socket-owner routing.

Phase 1, Task 1.4 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
d0133fd8e4 feat(relay): register RelayAdapter through platform registry (flagged off by default)
register_relay_adapter() registers the generic 'relay' platform via the same
PlatformRegistry path as plugin adapters — no core dispatch changes. OFF by
default (dark-launch): only registers when HERMES_GATEWAY_RELAY is truthy (or
force=True for tests), so existing single-tenant/direct deployments are
unaffected. Factory builds a transport-less RelayAdapter with a placeholder
descriptor; the real descriptor is negotiated at handshake.

Phase 1, Task 1.3 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
259e78e175 feat(relay): transport protocol + test-only stub connector
Defines RelayTransport (lifecycle/handshake/inbound/outbound/interrupt) as the
gateway<->connector wire contract; RelayAdapter.connect now registers an inbound
handler that bridges connector-delivered MessageEvents into handle_message.
Adds an in-memory StubConnector under tests/ and an E2E round-trip proving:
connect registers the handler, inbound events reach the adapter, guild_id drives
build_session_key isolation (two guilds -> two keys; same guild/channel/user ->
one), outbound send round-trips, get_chat_info is proxied.

Phase 1, Task 1.2 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
b0999c82f3 feat(relay): generic RelayAdapter advertising negotiated capabilities
One BasePlatformAdapter subclass that reads its capability profile from a
CapabilityDescriptor: MAX_MESSAGE_LENGTH attribute, message_len_fn (table-driven
by len_unit: chars=len, utf16=Telegram-style code units), supports_draft_streaming.
Implements the four abstract methods (connect/disconnect/send/get_chat_info) by
delegating to an injected RelayTransport (full protocol lands in Task 1.2). Adds
Platform.RELAY enum member. No per-platform gateway code.

Phase 1, Task 1.1 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
3db49381d6 feat(relay): derive descriptor from PlatformEntry
CapabilityDescriptor.from_platform_entry() projects an existing PlatformEntry
(label, max_message_length, emoji, platform_hint, pii_safe, name) into a
descriptor, proving the descriptor is a projection of existing config rather
than a parallel concept. Runtime-only capabilities (len_unit, draft/edit/
thread/markdown) are caller-supplied. max_message_length==0 ('no limit') maps
to the stream_consumer 4096 default.

Phase 0 complete. Task 0.3 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
53d9b98305 feat(relay): experimental CapabilityDescriptor schema
Frozen, JSON-serializable handshake payload the connector hands the future
RelayAdapter: char limit, draft-streaming/edit/threading flags, markdown
dialect, len_unit. Mostly a wire projection of PlatformEntry + the adapter
capability methods. contract_version gates additive-only evolution; declared
EXPERIMENTAL until >=2 Class-1 platforms validate it. from_json ignores
unknown keys (forward-compat) and fills optional defaults.

Phase 0, Task 0.2 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
e9a2ce6585 test: lock gateway adapter capability surface (relay phase 0)
Behavioral regression harness locking the capability surface that the future
RelayAdapter must reproduce: the abstract-method set (connect/disconnect/send/
get_chat_info), message_len_fn default, supports_draft_streaming default, and
the stream_consumer MAX_MESSAGE_LENGTH attribute read. Passes on main before
any RelayAdapter exists.

Phase 0, Task 0.1 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
shannonsands
6092be413d
Harden hosted Docker install tree against self-modification (#47490)
* Harden hosted Docker install tree

* Document hosted Docker immutable install tree
2026-06-18 09:09:21 +10:00
Teknium
f8098c6b6f
fix(desktop): resolve electronDist to the actual electron install location (#48081)
After the June lockfile regeneration (#46652) floated electron and reshuffled
npm workspace hoisting, the desktop pack fails with "The specified electronDist
does not exist". apps/desktop/package.json pointed electronDist at the repo
root (../../node_modules/electron/dist) while npm now installs electron nested
under apps/desktop/node_modules/electron. The two contradict, so a clean
install can never package the app (Windows + macOS).

- electronDist -> node_modules/electron/dist (resolved relative to apps/desktop,
  i.e. the workspace-local install npm actually produces).
- hermes_cli/main.py, scripts/install.sh, scripts/install.ps1: add a runtime
  electron-dir resolver that prefers apps/desktop/node_modules/electron and
  falls back to the root hoist, so dist checks + the mirror re-download work
  under either npm layout.
- patch-electron-builder-mac-binary.cjs: try the workspace-local Electron.app
  before the root hoist in the macOS binary-restore fallback (sibling site no
  PR touched).
- test: assert build.electronDist resolves to where the lockfile installs
  electron, so a future hoist change (root <-> nested) can't silently break it.

Salvages the overlapping work in #48003 (sitkarev), #48012 (omegazheng), and
#48033 (james47kjv).

Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>
2026-06-17 18:08:01 -05:00
Austin Pickett
016bce1a09
fix(desktop): recover stranded session windows when resume fails (#47655)
* fix(desktop): recover stranded session windows when resume fails

Opening a session in a new window (or any routed resume) could latch the
thread loader on "session" forever — the reported "stays stuck loading,
even after a nap" bug. Two compounding causes:

1. use-session-actions.resumeSession's catch ran the REST transcript
   fallback OUTSIDE its own try. When session.resume rejected AND the
   fallback also threw (the common case on a wedged/unreachable backend),
   the throw skipped setMessages and left activeSessionId null with an
   empty transcript — exactly the state the loader gates on
   (messagesEmpty && !activeSessionId), with no terminal/error state.

2. use-route-resume's self-heal could never re-fire: resumeSession sets
   selectedStoredSessionIdRef synchronously at entry (before failing), so
   stuckOnRoutedSession stays false, and on an already-open idle window
   neither pathnameChanged nor gatewayBecameOpen fire again. The window
   never retried — naps, focus, nothing recovered it.

Fix:
- Wrap the REST fallback in its own try so a fallback failure can't strand
  the loader.
- Add $resumeFailedSessionId: armed on terminal resume failure, cleared at
  the next resume's entry (and left clear on success).
- use-route-resume gains a bounded backoff auto-retry (4 attempts, 1s→8s)
  that re-resumes while the routed session matches the failure flag, with a
  fire-time liveness recheck so a recovered session isn't double-resumed.

Regression tests cover: fallback-wrap arming the flag without throwing,
flag cleared on success, retry fires on backoff, no retry for a
non-routed/recovered session, and the retry cap.

* feat(desktop): show error + manual Retry when resume retries exhaust

When a stranded session window's bounded auto-retry gives up (gateway
resume RPC + REST fallback fail through all MAX_RESUME_RETRIES attempts),
the loader latched forever. Add a $resumeExhaustedSessionId atom armed at
the give-up point so the chat view swaps the perpetual spinner for an
explicit error state + manual Retry button. Retry / reconnect / reselect
clears the latch and resets the auto-retry counter for a fresh cycle; a
route-change away from the stranded session also clears it.

Distinct from $resumeFailedSessionId (armed during the backoff window) so
the error UI only appears once auto-recovery has actually given up, not
mid-retry. Adds i18n strings across en/ja/zh/zh-hant and 3 tests covering
latch-arms-on-exhaustion, stays-clear-while-retries-remain, and
clears-on-route-change.

* fix(desktop): address review on stranded-resume recovery layer

Follow-up to review on #47655 (PR head 253bfc0e3). Four issues on the
recovery layer:

1. (blocking) Arm $resumeFailedSessionId only when the transcript is still
   empty after the REST fallback ($messages.get().length === 0), matching the
   atom's documented contract and the loader's messagesEmpty gate. Previously
   armed on any resume-RPC reject regardless of fallback outcome, so a window
   that recovered its history via REST still auto-retried and, on exhaustion,
   blanked the visible transcript behind the error overlay.

2. Reset the bounded-retry attempt counter on the $resumeExhaustedSessionId
   armed->cleared edge so a manual Retry / reconnect / reselect on the SAME
   stranded session gets a fresh backoff cycle, not a single one-shot attempt
   that immediately re-arms the error. (Keyed on the exhausted latch rather
   than the resumeFailedSessionId null->value transition the review suggested:
   the auto-retry loop itself toggles resumeFailedSessionId every cycle, so
   keying the reset there would defeat the MAX_RESUME_RETRIES cap. Only
   resumeSession clears the exhausted latch, making its clear edge the
   unambiguous manual-retry signal.)

3. Advance retryAttemptRef only when the timer actually dispatches a resume,
   not at schedule time. Prevents unrelated dep changes during the 1s-8s
   backoff window (transient gatewayState flip, non-stable resumeSession) from
   burning attempts and hitting MAX with fewer than 4 real resume attempts.

4. Drop unrelated blank-line-only insertions in store/session.ts and
   use-session-actions.ts to keep the diff tight.

Tests: +3 (RPC-fails-REST-succeeds-no-arm; manual-retry-fresh-cycle;
no-attempts-burned-on-dep-churn). All 19 resume tests + full session-hook
suite (65) pass; tsc --noEmit clean.

---------

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-17 17:33:53 -04:00
Austin Pickett
fd674af47f
fix(photon): preserve text in mixed iMessage attachments (salvage #46513) (#46818)
* fix(photon): preserve text in mixed iMessage attachments

When an iMessage bubble carried both text and an attachment, spectrum-ts'
inbound mapper returned only buildAttachmentMessage(...), dropping the user's
typed text before Hermes could see it. The Photon adapter then had no 'group'
content path, so the text was lost entirely.

- adapter.py: handle a new 'group' content type that flattens text + attachment
  items, preserving the typed text alongside cached media (extracted shared
  _normalize_binary_payload helper).
- sidecar: emit 'group' content in normalizeContent, and ship
  patch-spectrum-mixed-attachments.mjs which patches spectrum-ts' pinned mapper
  (at npm postinstall AND at sidecar startup, so existing installs self-heal).

Windows robustness fixes on top of the original PR:
- The patcher's CLI guard used 'import.meta.url === file://${argv[1]}', which
  never matches on Windows (file:/// + drive letter) — it silently no-opped.
  Switched to pathToFileURL(argv[1]).href.
- The patcher matched \n-joined strings, so a CRLF checkout (Windows git
  autocrlf) defeated every replacement. It now normalizes CRLF->LF for matching
  and restores the original EOL style on write.

Co-authored-by: Yuhang Lin <yuhanglin@YuhangdeMac-mini.local>

* chore: map YuhangLin contributor email for attribution (#46513)

---------

Co-authored-by: Yuhang Lin <yuhanglin@YuhangdeMac-mini.local>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-17 16:14:24 -05:00
kshitij
7fbb8c9df5
Merge pull request #48042 from kshitijk4poor/salvage-47662
fix(openviking): implement on_session_switch hook + harden session writes (salvage #47662)
2026-06-18 02:34:27 +05:30
Austin Pickett
ee41aa0c1a
feat(desktop): add dismiss control to chat error banners (#47985)
A failed turn leaves a red error banner inline in the transcript. These
errors are renderer-local state (never persisted) and stay pinned to the
message until the session is reloaded, so a stale, no-longer-relevant
error (e.g. a transient provider/inference error) lingers with no way to
clear it.

Add an 'x' dismiss button inside the existing MessagePrimitive.Error
block. Clicking it clears the error from BOTH the live $messages view
and the per-runtime session cache — the view first, because
preserveLocalAssistantErrors re-grafts any still-errored message it finds
in the view onto the next session.info flush, so clearing only the cache
would let the heartbeat resurrect the banner. A bare error placeholder
(no streamed content) is dropped entirely; a turn that streamed partial
output before failing keeps its text and just sheds the error.

The control only renders when an onDismissError handler is wired, so
secondary/embedded Thread usages are unaffected. Adds the dismissError
string to all four locales (en/ja/zh/zh-hant) and two behavior tests.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-17 16:46:43 -04:00
Austin Pickett
5a00bd1518
fix(desktop): persist /title set before the first message instead of queuing (#47987)
A /title typed before any message in a fresh desktop chat could be silently
lost: the session DB row is deferred to the first prompt, so session.title
found no row, only stashed pending_title, and returned pending:true. It then
relied on a post-turn apply block to write the title. When that turn never
landed under the same session_key (or the apply path didn't fire), the title
was dropped and the sidebar fell back to the first-message preview — e.g.
"/title my-custom-name" then "hello" left the session titled "hello".

Mirror the messaging gateway's _handle_title_command: an explicit /title is
clear user intent, not an abandoned draft, so create the row up front
(_ensure_session_db_row) and set the title immediately via the profile-aware
_session_db handle, returning pending:false. This also fixes the frontend
symptom for free — the desktop handler's immediate refreshSessions() now pulls
the correct persisted title instead of clobbering the optimistic value with a
still-NULL row.

If row creation can't take (DB unavailable / racing writer), fall back to the
existing pending_title queue so the post-turn apply block remains a recovery
path. The sidebar's min-messages filter keeps a titled 0-message row hidden, so
a /title'd-but-never-used draft still doesn't clutter the list.

Updates the test that asserted the old queue-on-missing-row behavior and adds a
fallback-to-queue regression test.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-17 16:46:21 -04:00
Teknium
22b6942fc2
feat(search_files): headroom compression evaluation report + lossless densification (#47866)
* feat(search_files): path-grouped lossless densification of content matches

Content-mode search_files results repeat the {path,line,content} JSON keys
and the full path string for every match. Group consecutive same-path matches
under one path header with indented '<line>: <content>' rows — lossless (every
path/line/content byte preserved), self-describing (matches_format key), and
readable by the model with no decode step.

57.8% mean token reduction on real search_files content outputs (422-output
corpus), fires on 97% of them. Gated at >=5 matches; below that the verbose
array is left untouched. Default to_dict(densify=False) is unchanged, so no
other caller is affected.

ripgrep emits matches path-ordered, so consecutive grouping never reorders
results.

* test: accept densify kwarg in _FakeSearchResult.to_dict

The search loop-detection tests stub SearchResult with a fake whose
to_dict() must mirror the real signature now that it takes densify=.

* test(search_files): edge-case losslessness battery for densification

Adversarial single-line content (colons, indentation, unicode/emoji, empty,
trailing whitespace, quotes+commas), paths with spaces, and an explicit
one-line-per-match invariant documenting the ripgrep contract the format
relies on (0/6775 real match contents contained a newline).
2026-06-17 13:45:25 -07:00
Austin Pickett
394cdf48ce
fix(logging): alias RotatingFileHandler to concurrent-log-handler (salvage #44921) (#46794)
* fix(logging): alias RotatingFileHandler to concurrent-log-handler

On Windows, stdlib RotatingFileHandler.doRollover() uses os.rename(), which
fails with PermissionError [WinError 32] whenever another process holds an
append-mode handle on agent.log — essentially always in Hermes (TUI, gateway,
hy_memory server, MCP servers, and on-demand CLI commands all log from separate
processes). This pinned agent.log at the 5 MiB threshold and spammed stderr
with a traceback on every emit (#44873).

Add concurrent-log-handler==0.9.29 as a core dep and alias its
ConcurrentRotatingFileHandler as RotatingFileHandler in hermes_logging.py. It
wraps the rename in a cross-process file lock (via portalocker: pywin32 on
Windows, fcntl on POSIX) so only one process rotates at a time. Aliasing keeps
every existing isinstance/class-declaration reference working unchanged.

Co-authored-by: tuancookiez-hub <tuancookiez@gmail.com>

* fix(logging): gate concurrent-log-handler swap to Windows only

The initial salvage aliased RotatingFileHandler -> ConcurrentRotatingFileHandler
unconditionally, which regressed POSIX: CLH opens lazily and rotates via its own
lock path, breaking managed-mode (NixOS) group-writable perms and eager file
creation that _ManagedRotatingFileHandler depends on. CI caught it as 2 failures
in test_managed_mode_*_group_writable on Linux.

The WinError 32 bug (#44873) is Windows-specific — POSIX renames an open file
fine, so stdlib already works on Linux/macOS. Gate the swap behind
sys.platform == 'win32': Windows uses CLH, POSIX keeps stdlib RotatingFileHandler.

- hermes_logging.py: platform-conditional import.
- tests/test_hermes_logging.py: import RotatingFileHandler from hermes_logging
  (single source of truth) so the autouse fixture's isinstance checks match the
  real handler class on both platforms.
- pyproject.toml/uv.lock: mark the dep 'sys_platform == "win32"' so portalocker
  /pywin32 only ship where used.

---------

Co-authored-by: tuancookiez-hub <tuancookiez@gmail.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-17 15:39:04 -05:00
kshitijk4poor
c835448908 fix(openviking): don't block the command thread on session switch; lock turn state
Follow-up hardening on @ehz0ah / @harshitAgr's session-switch work (#28296):

- on_session_switch no longer runs the old-session writer-drain + pending-token
  GET + commit POST inline on the caller's command thread. /new, /branch,
  /resume, /undo call it synchronously, so a slow drain (up to 10s) or wedged
  commit blocked the user-facing command — the same hazard #41945 fixed for
  end-of-turn sync. State now rotates synchronously (cheap) and the old-session
  commit is offloaded to a daemon finalizer (generalized _finalize_session_async).
- Guard the (_session_id, _turn_count) pair with _session_state_lock: sync_turn
  runs on the memory-manager executor thread while the session hooks run on the
  command thread, so the snapshot+reset vs increment was a cross-thread race.
- _session_needs_commit checks the committed-session guard BEFORE the
  turn_count>0 shortcut, closing a double-commit window when a racing sync_turn
  re-increments after commit+reset.
- Add a _shutting_down flag so deferred finalizers stop POSTing against a
  torn-down client; track all prefetch threads in a set so invalidate/shutdown
  join every one, not just the latest slot.

Tests: regression for the non-blocking switch (asserts the caller returns while
a slow drain is parked off-thread) and the committed-guard ordering; updated the
deferred-commit test to the unified finalizer contract.
2026-06-18 00:21:21 +05:30
xxxigm
33b1d14459
fix(desktop): pin Electron below the broken native extract-zip install (#47792)
* fix(desktop): pin Electron below the broken native extract-zip install

The Windows desktop install fails at "Building desktop app": Electron's
postinstall aborts with `ERR_DLOPEN_FAILED loading
index.win32-x64-msvc.node` / "Cannot find native binding" from
`@electron-internal/extract-zip`.

Root cause is a dependency drift, not the user's machine. Electron changed
its install mechanism mid-patch-series:

  electron 40.9.3 .. 40.10.2  -> @electron/get@^2 + extract-zip@^2 (pure JS)
  electron 40.10.3 / 40.10.4  -> @electron/get@^5 + @electron-internal/extract-zip@^1 (native napi)

apps/desktop declares `electronVersion: 40.9.3` (the tested, JS-extract
build) but pinned the dependency as `electron: ^40.9.3`, so `npm ci`/`npm
install` silently resolved 40.10.3/40.10.4 — onto the brand-new native
extract-zip whose win32-x64 binding fails to dlopen on some Windows hosts.
The committed lockfile already carried 40.10.3, and the installer's mirror
fallback can't help (it re-runs Electron's own `install.js`, which uses the
same broken native module).

Fix:
- Pin `electron` to an exact `40.10.2` — the newest build before the native
  extract-zip switch — and align `build.electronVersion` to match (Electron
  Builder needs electronVersion/electronDist to match the installed binary).
- Add a root `yauzl: ^3.3.1` override so the (re-introduced) JS extract-zip
  path also works on Node >= 24.16 / >= 26.1, where the old yauzl hangs.
  This is the same workaround the wider Electron ecosystem adopted.
- Regenerate package-lock.json: drops @electron-internal/extract-zip and
  @electron/get@5, restores @electron/get@2 + extract-zip@2 + yauzl@3.4.0.

* test(desktop): lock the Electron pin/version/lockfile consistency contract

Guards against the dependency drift that broke the Windows desktop install:
the Electron dependency must be an exact version, must equal
build.electronVersion, and the lockfile must resolve to that same version so
`npm ci` installs exactly what electron-builder packages. Asserts the
relationships, not a specific version number.
2026-06-17 14:42:30 -04:00
xxxigm
b07b7894ec
fix(desktop): keep streaming painting in unfocused secondary chat windows (#47919)
* fix(desktop): keep streaming painting in unfocused secondary chat windows

The chat transcript streams to screen through a requestAnimationFrame-gated
flush, which Chromium pauses for blurred/occluded windows. The primary window
opted out with `backgroundThrottling: false`, but the secondary "session
windows" (cmd-click pop-out, new-session, subagent-watch) hand-copied their
webPreferences and silently lost that flag — so a streamed answer in one of them
stalled until the window regained focus (reported on Windows 11). The primary
window's own comment even claimed it was "matching the secondary windows," which
was no longer true.

Hoist the chat-window webPreferences into a single shared factory
(`chatWindowWebPreferences`) in session-windows.cjs and use it for BOTH windows,
so they can never drift on this flag again.

* test(desktop): assert chat windows disable background throttling

Cover chatWindowWebPreferences: it must set backgroundThrottling=false (so the
streaming transcript paints while the window is blurred) and pass the preload
path through while keeping the hardened defaults (contextIsolation, sandbox,
nodeIntegration=false).
2026-06-17 14:40:13 -04:00
kshitijk4poor
0c1e8d0ba9 Merge remote-tracking branch 'upstream/main' into salvage-47662
# Conflicts:
#	tests/openviking_plugin/test_openviking.py
2026-06-17 23:59:24 +05:30
kshitij
1e6c4ba74f
Merge pull request #47973 from kshitijk4poor/fix/ov-skill-scaffolding
fix(tests): type-correct OpenViking skill-scaffolding test sentinels
2026-06-17 23:49:25 +05:30
kshitijk4poor
4de4a4e2da fix(tests): type-correct OpenViking skill-scaffolding test sentinels 2026-06-17 23:44:31 +05:30
kshitij
49d7481dfb
Merge pull request #47706 from NousResearch/fix/cli-login-deprecation-graceful
fix(cli): deprecated `hermes login` fails gracefully for any provider
2026-06-17 23:02:32 +05:30
teknium1
aa6f77596b chore: add AUTHOR_MAP entry for #47904 salvage 2026-06-17 09:49:46 -07:00
definitelynotguru
eaddeaf2e6 feat(xai): add grok-composer-2.5-fast to xAI OAuth model picker
The model is callable via xAI OAuth but omitted from models.dev and
/v1/models listings. Merge it into the curated xAI catalog so it appears
in `hermes model` without requiring a custom model name.
2026-06-17 09:49:46 -07:00
teknium1
cc9f37e77c chore: map Rivuza to AUTHOR_MAP for #44249 salvage 2026-06-17 09:49:39 -07:00
Reiji Kisaragi
3d21666b2f fix: preserve multimodal user content during persistence
Avoid applying text-only persist_user_message overrides to multimodal current-turn user messages. Early crash-resilience persistence mutates the same messages list later used for the API call, so clobbering list content drops ACP image blocks before model dispatch.\n\nAdd regression coverage for both text override behavior and multimodal preservation.\n\nCloses #44242
2026-06-17 09:49:39 -07:00
xxxigm
c2fa302e93
Merge pull request #47913 from xxxigm/fix/desktop-backend-skew-toast-nag
fix(desktop): stop the "Backend out of date" toast nagging on every session open
2026-06-17 10:04:34 -05:00
Teknium
c6c8abbadb
refactor: remove agent-callable send_message tool (#47856)
* feat(mcp): raise default tool-call timeout 120s -> 300s

Port from openai/codex#28234. Long-running MCP tools (web fetches,
sandboxed builds, deep-research servers) routinely exceed 120s, causing
spurious timeout failures. Codex bumped its default MCP tool timeout from
120 to 300 for the same reason.

- _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server
  'timeout' config override unchanged)
- update test_default_timeout assertion
- document the default in mcp-config-reference.md

* refactor: remove agent-callable send_message tool

The agent should not decide on its own to fire off cross-platform
messages or reactions. Outbound platform messaging is handled outside
the agent loop — cron delivery, the gateway kanban notifier
(dashboard-toggled), and the `hermes send` CLI.

Removes the model-tool registration only; the send engine in
send_message_tool.py (_send_to_platform, _send_via_adapter,
_parse_target_ref, per-platform _send_* helpers) is kept intact for
those non-agent callers. Drops the now-empty 'messaging' toolset and
its `hermes tools` toggle. Yuanbao DM guidance now points at the
native yb_send_dm tool.
2026-06-17 07:11:23 -07:00
brooklyn!
f10f7114f9
Merge pull request #47664 from NousResearch/bb/desktop-markdown-spread-overflow
fix(desktop): stop a single message from crashing or freezing the chat
2026-06-17 08:37:06 -05:00
Brooklyn Nicholson
0138282f97 perf(desktop): keep oversized messages from freezing the chat
A multi-MB message (logged bundle, huge tool dump) froze the renderer
before any paint: Streamdown runs `preprocess` + `marked` lex over the
whole string synchronously in a useMemo, an uninterruptible long task
that no try/catch or content-visibility can help (our JS runs before the
browser ever skips layout). Tiered fix:

- Message gate: past 200KB, bypass markdown entirely and render the raw
  text in `content-visibility:auto` line-chunks — synchronous work is
  bounded to a string split, the browser virtualizes layout natively,
  and every line stays in the DOM (selectable, find-in-page).
- Code-block budget: past 3k lines / 150KB, skip Shiki (which emits a
  span per token) and render plain, chunked the same way.
- Collapse/expand: a reusable ExpandableBlock clamps code blocks and the
  huge-text fallback to a 120px preview with a gradient + chevron,
  expanding to 300px. The inner element is always a scroll container so
  the content-visibility chunks stay lazily laid out in both states.

No content is ever dropped; the copy button (card header) always yields
the full block.
2026-06-17 08:25:52 -05:00
Max Freedom Pollard
992b922389 fix(curator): stop restore from matching unrelated skills by name prefix
restore_skill() falls back to p.name.startswith(f"{skill_name}-") when no
archive directory matches the requested name exactly. That fallback is meant
to catch the timestamped duplicate archive_skill() writes on a name collision
(<skill>-YYYYMMDDHHMMSS), but the bare prefix also matches any unrelated
archived skill named <name>-something. So restoring "git" can pull an archived
"git-helpers" out of .archive/, rename it to "git", and report success: the
requested skill is not restored and the sibling is gone from the archive.

Constrain the fallback to the exact suffix archive_skill() produces, a 14 digit
timestamp. The exact-name match and the recursive nested-archive walk are
unchanged, so nested and timestamped restores still work; unrelated siblings no
longer match.

Fixes #47647
2026-06-17 06:04:03 -07:00