#48879 closed the tool-call sequence on interrupt inside finalize_turn so a
/stop after a tool no longer persists a `tool` tail that the next user message
turns into a `tool -> user` role-alternation violation (which strict providers
like Gemini/Claude react to by hallucinating a continuation and ignoring prior
context — what users see as "lost context after stop").
But the retry-wait, error-handling, and post-error retry-wait interrupt aborts
in conversation_loop return early and never reach finalize_turn, so they still
persisted and returned a raw `tool` tail. Interrupting during provider
backoff/rate-limiting (common under heavy work) hit exactly this path.
Extract the close into a shared close_interrupted_tool_sequence helper and apply
it at every interrupt abort (finalize_turn + the three early returns) so the
whole bug class is fixed, not just the one site.
fixes stacked PRs no-checks bug where
main < a < b
a merges into main
b is retargeted to main
but b doesn't run checks since it's not considered a new pr to main
now b will simply already have passing ci :)
Block scale-to-zero suspend while background async delegations are active, and restore runtime status to running on real inbound after a dormant wake.\n\nAdd regression coverage for both review findings.
The /learn authoring prompt taught a subset of the HARDLINE skill rules,
and stated the <=60-char description rule without making the model enforce
it — so generated descriptions overshot (up to 202 chars), which the
60-char system-prompt skill index then silently truncates.
- description: add the index-truncation rationale, a count-and-trim
self-check, and a good/bad length example so the model actually hits <=60.
- add platforms-gating rule (OS-bound primitives -> declare platforms:).
- add author-credits-human-first rule.
- round out the Hermes-tool framing with the full wrapped-tool mapping and
references/templates layout.
Closes#52367.
The host-allowlist hardening (#30611) plus the refresh heal (#49735) left
the documented NOUS_INFERENCE_BASE_URL dev/staging escape hatch unreachable
for OAuth sessions, despite three code comments asserting it still works.
Root cause — resolution precedence in resolve_nous_runtime_credentials:
inference_base_url = (
_optional_base_url(state.get("inference_base_url")) # stored — wins
or os.getenv("NOUS_INFERENCE_BASE_URL") # env — unreachable
or DEFAULT_NOUS_INFERENCE_URL
)
A staging OAuth login persists its inference_base_url, but the allowlist
rejects the staging host and the refresh heal rewrites the stored value to
the production default. The stored (now prod) value is then read BEFORE the
env var, so the override never takes effect — every request 401s against
prod or is pinned to prod, and setting the env var does nothing.
Fix: the user-set env override is the most-trusted source, so consult it
FIRST for the URL used to build the client / returned to callers — while
keeping the PERSISTED value the validated, network-provenance one (the
override is a runtime overlay, never written to auth.json, so unsetting it
cleanly reverts to prod). Applied at both chokepoints:
- resolve_nous_runtime_credentials (no-refresh read path AND refresh path)
- the nous_portal proxy adapter, which re-validates the resolver's returned
base_url against the prod allowlist as defense-in-depth and would
otherwise reject a legitimate staging override at the forward boundary.
New _nous_inference_env_override() / split of stored-vs-effective URL keep
the threat model intact: Portal-returned URLs are still allowlist-validated
at every network site, and the env path stays ungated (trusted OS user).
Also folds in the no-refresh read-path heal (supersedes the approach in
the open #50265): a poisoned stored staging host now heals to the prod
default on read even when no refresh fires.
Tests: TestEnvOverrideWins (env wins on read + refresh paths; override never
persisted; poisoned stored heals) and TestProxyAdapterEnvOverride. Verified
the 4 behavioral tests fail against pre-fix code and pass with the fix; full
inference-validation + nous-provider suites green (85 passed). E2E-validated
against a real temp HERMES_HOME exercising the real resolver + proxy adapter:
resolver→staging, persisted→prod, proxy→staging, unset→reverts to prod.
- Remove dead `chosen_base or effective_base` fallback; _select_zai_endpoint
always returns a non-empty base URL (returns current_base on cancel).
- Add .rstrip("/") to official-endpoint return for symmetry with custom-proxy
path (both now return normalized URLs).
- Replace magic index 4 with len(ZAI_ENDPOINTS) in custom-proxy tests so they
don't break if a 5th endpoint is added to ZAI_ENDPOINTS.
Z.AI now uses a curses picker instead of plain text input for base URL,
so the existing TestBaseUrlValidation tests (which used zai as their test
subject) are migrated to MiniMax, which still uses the text input path.
Add TestZaiEndpointPicker covering:
- Selecting each official endpoint (Global, China, Coding Plan Global,
Coding Plan China) saves the correct base URL to config
- Custom proxy URL entry (valid + invalid rejection)
- Cancel keeps the existing base URL
- Current endpoint is the default choice in the picker
- Non-standard URL defaults to the Custom proxy option
When provider_id == 'zai', replace the plain text Base URL input with
_select_zai_endpoint, which presents a curses picker offering Global,
China, Coding Plan Global, Coding Plan China, and custom proxy options.
Other API-key providers (MiniMax, DeepSeek, etc.) keep the text input.
Presents a curses-based picker (via _prompt_provider_choice) offering the
four official Z.AI endpoints — Global, China, Coding Plan Global, Coding
Plan China — plus a custom-proxy option. Sourced from ZAI_ENDPOINTS in
auth.py so it stays in sync with the probe list.
Not yet wired into the setup flow; that comes in the next commit.
Add a hover/focus "Remix" action on each completed draft card in the
generation grid. It re-runs generation with the chosen draft fed back in
as the reference image, keeping the same prompt and staying on step 2 so
the user can explore variations without starting over.
Because regenerating is slow and replaces the current drafts, the first
remix shows a one-time confirmation; the acknowledgement is persisted so
subsequent remixes fire immediately.
Move terminal/execute_code/read_file preview compaction into agent.display so CLI, gateway, and Ink TUI all inherit the same labels that desktop introduced in #52321.
The shared preview keeps raw args intact while trimming display-only shell plumbing (`cd`, pipe tails, banner/status echoes) and read_file line ranges. Desktop now prefers backend `context` for live rows and keeps its TypeScript fallback only for hydrated history.
The pet generation image-processing suite is deterministic but expensive enough
to blow the per-file CI timeout on Linux (140s), and it is not relevant to the
fast timeout PR's normal signal. Keep it available for manual validation, but do
not run it by default.
Set HERMES_RUN_SLOW_PET_TESTS=1 to enable the suite. The canonical test wrapper
now preserves that opt-in variable through its hermetic env.
The fixed "up to 5 minutes" wording undersells the slow quality-first path
(OpenAI image via OpenRouter), where a full hatch can run far longer. Use an
open-ended "several minutes" instead so the banner stays honest across the
fast and slow providers.
The quality-first default (OpenAI image via OpenRouter) is slow, and a full
hatch fans out ~8 rows with up to 3 retries each (300s/call) across 2 parallel
waves, so the absolute backend worst case is ~30 min. The old ceilings fired
mid-run:
- per-image HTTP call: 180s -> 300s (a single cold row can exceed 3 min)
- drafts RPC: 240s -> 420s (single wave, no retries — 7 min is ample)
- hatch RPC: 420s -> 1hr (sits above the ~30 min backend worst case)
The hatch ceiling is intentionally well above the realistic max so the frontend
never throws "request timed out" before the backend has exhausted its own
retries. The background-resumable notification path remains the real UX safety
net — the user can close the modal and get pinged on completion.
Make completed desktop tool rows read like useful activity labels instead of raw plumbing: terminal rows use a dispatch-style shell summarizer for agent wrappers, and read_file rows keep the action plus filename and requested line range.
The shell cleanup follows condensed-milk-pi's shape: split command compounds on real separators, strip pipe tails inside each segment, clean redirects/env prefixes, then classify setup/banner/status segments. Multi-command probes render as `first command + N commands`; the full command remains available in copy/detail.
Read rows now render as `Read package.json` or `Read main.ts L25-34`, using requested positive offset/limit and returned line numbers only as fallback for negative/unknown offsets.
Recurring cron jobs were prompt-cache-cold on every fire. session_id is
built as cron_<job_id>_<timestamp>, and the Codex/Responses transport used
session_id directly as prompt_cache_key — so the timestamp changed the cache
key on every run and the static prefix (agent identity + tool schemas) was
re-paid each tick.
Derive prompt_cache_key from a SHA-256 of the static prefix (instructions +
sorted tool schemas) instead. Repeated fires of the same job share one
content-addressed key (pck_<hash>) and reuse the warm prefix within the
provider's cache TTL. The key changes exactly when the prefix changes —
edit the job's prompt or toolset and it re-keys; leave it alone and it stays
stable.
session_id is left untouched for transcript isolation, log correlation, and
the Codex/xAI session-scope routing headers (session_id, x-client-request-id,
x-grok-conv-id) — those are the per-fire identity, not the cache key. Only the
prompt_cache_key body field (standard OpenAI/Codex path and the xAI extra_body
field) is content-addressed.
Closes#51395.
Co-authored-by: spiky02plateau <spiky02plateau@users.noreply.github.com>
Co-authored-by: JoaoMarcos44 <JoaoMarcos44@users.noreply.github.com>
* fix(relay): authorize relay-delivered events by delivery, not source.platform
The #52190 upstream-authz fix keyed _is_user_authorized off
source.platform via _adapter_authorization_is_upstream(source.platform).
But a relay *message* inbound carries the UNDERLYING platform
(source.platform == discord/telegram/...), NOT Platform.RELAY, because
ws_transport._event_from_wire maps the connector's wire payload
(platform="discord") straight onto SessionSource for session-keying and
egress. The relay adapter is registered only under Platform.RELAY, so
adapters.get(Platform.DISCORD) misses, the trusted-upstream branch is
skipped, and the user hits the env-allowlist default-deny:
WARNING gateway.run: Unauthorized user: <id> (<name>) on discord
(Live staging bug: alpha tester linked successfully, then every
follow-up DM was silently dropped.)
Fix: the authentic trust signal is that the event was delivered over the
per-instance-authenticated relay WS, not which platform it underlies. Add
a wire-INVISIBLE SessionSource.delivered_via_upstream_relay flag, stamped
by the relay transport in _event_from_wire, and authorize on it. The flag
is excluded from to_dict/from_dict so a peer can neither forge it across
the wire nor have it restored from persistence. The existing adapter-flag
check is retained for events whose source.platform IS Platform.RELAY
(interaction-passthrough). A direct Discord event on a multiplexing
gateway (direct + relay adapters) is unmarked and still default-denies.
* fix(relay): use identity check on delivery marker to avoid MagicMock fail-open
A MagicMock() source (used by test_signal.py and other gateway tests) auto-
vivifies source.delivered_via_upstream_relay as a truthy Mock, which a bare
truthiness check would treat as authorized — flipping
test_signal_in_allowlist_maps from False to True. The marker is a real bool on
SessionSource, so check 'is True' explicitly: refuses to authorize any non-bool
stand-in, defensive against accidental fail-open.
OpenRouter/Nous image gen now runs a quality-first model chain by default:
attempt the highest-fidelity OpenAI image model first, then fall back to
Gemini 3 Pro Image when it's access-gated/unavailable/times out. An explicit
OPENROUTER_IMAGE_MODEL / config model override pins one model with no fallback.
Atlas validation rejects malformed model output instead of shipping it: adds a
per-state collapse guard (a single sliver/fragment row no longer passes because
other rows are healthy), on top of the existing postage-stamp + multi-pose
checks.
Desktop: pet-gen native notifications are now "global" (not tied to a chat
session), so a background generation started from the command center fires an
OS notification when the user is away even with no active session. Adds a
neutral "This can take up to 5 minutes." banner on step 1, and lets the
provider picker auto-size.
Tests updated/added for the OpenRouter fallback chain, the collapse guard, and
the global notification path.
Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.
Addresses review on #51077 (kxee). The continuable-cron mirror reused
gateway.mirror.mirror_to_session, which writes role=assistant — re-
introducing the exact alternation violation #2313 (37a997945)
deliberately removed: a cron brief landing as assistant after the
agent's last turn yields assistant->assistant, which breaks strict-
alternation providers (OpenAI/OpenRouter) per issue #2221. The mirror/
mirror_source metadata is also dropped at the SQLite boundary, so the
[Delivered from cron] label is lost on replay.
This is an intentional, opt-in (default OFF) reversal of #2313's
'cron output does not belong in interactive history' for the reply-to-
cron use case — gated behind cron.mirror_delivery / attach_to_session.
Fixes:
- mirror_to_session gains a role param (default 'assistant' — interactive
send_message mirror unchanged, it IS the agent speaking). Cron paths
pass role='user' with a '[Cron delivery: <task>]' prefix so the brief
collapses via repair_message_sequence's consecutive-user merge on every
provider, and stays distinguishable on replay despite the metadata drop.
- thread_seeded: defer seeding + the flag until delivery into the new
thread actually succeeds. Previously set pre-delivery, so an open-
succeeds / deliver-fails case both stranded a seeded-but-unseen brief
AND suppressed the DM-fallback mirror.
- seed mirror now passes user_id='system:cron' to resolve the exact
thread-keyed session row it just created.
- dedupe the duplicate BasePlatformAdapter import in _deliver_result.
- trim oversized docstrings to non-obvious WHY (AGENTS.md).
- docs: document cron.mirror_delivery / attach_to_session in
website/docs/user-guide/features/cron.md.
- test: assert the cron mirror writes role='user' with the label prefix.
204 cron+mirror tests pass.
Continuable cron jobs (attach_to_session / cron.mirror_delivery, default
OFF) now prefer a dedicated thread on thread-capable platforms, falling
back to origin-DM mirroring where threads don't exist.
- Thread-capable (Telegram topics, Discord/Slack threads): open a fresh
thread for the job via the shipped adapter.create_handoff_thread,
route the brief into it, and seed the thread-keyed session so the
user's in-thread reply continues with full context. This is the
'continuable cron opens its own thread' interface.
- DM-only (WhatsApp/Signal/SMS): create_handoff_thread returns None ->
fall back to mirroring into the origin DM session (existing behaviour).
Reuses existing infrastructure end-to-end — no new adapter surface, no
provider-chain signature change:
- adapter.create_handoff_thread (already implemented per-platform,
returns None on unsupported platforms = the fallback signal)
- the live SessionStore via adapter._session_store (already set on every
adapter), reached without threading a new param through the frozen
CronScheduler.start() contract
- gateway.mirror.mirror_to_session for the seed/append
- existing per-target delivery routing carries the new thread_id for free
Mirrors GatewayRunner._process_handoff's open-thread-or-fallback +
seed pattern, standalone for the cron delivery path. thread_seeded
guards against a double-mirror after seeding. Scoped to the origin
target only; fan-out/broadcast targets are never threaded or mirrored.
Config docs updated (cron.mirror_delivery) + cronjob tool
attach_to_session description reframed around continuable/thread-preferred.
Tests: +5 (thread id returned on thread platform; None on DM platform;
None without capability/loop; seed creates thread session + mirrors;
seed no-op on empty). 22/22 in TestCronDeliveryMirror; 532 cron tests
pass (4 failures pre-existing: croniter-not-installed + TZ).
Multi-participant parity with interactive send_message, which passes
HERMES_SESSION_USER_ID to gateway.mirror.mirror_to_session so the mirror
lands in the exact participant's session.
- cronjob_tools._origin_from_env now captures user_id from the session
context at job-create time (alongside platform/chat_id/thread_id).
- _maybe_mirror_cron_delivery forwards user_id to mirror_to_session.
- _deliver_result threads origin.user_id through for the origin target.
Effect: in a per-user-isolated group chat (group_sessions_per_user=True,
the default), the mirror resolves to the member who scheduled the job
instead of conservatively no-op'ing on ambiguous candidates. DMs and
shared group/thread sessions are unaffected (single candidate). Default
still OFF.
Tests: helper forwards user_id; E2E _deliver_result forwards origin
user_id. 17/17 in TestCronDeliveryMirror; 527 cron tests pass (4 failures
pre-existing: croniter-not-installed + TZ, identical on baseline).
The cron->session mirror now fires ONLY for the delivery target that
equals the job's origin (platform+chat_id[+thread_id]). A job created
from a live gateway chat stamps that chat as origin, and that session is
guaranteed to exist (it is the conversation the user scheduled the job
in). Fan-out / broadcast / home-channel-fallback targets are never
mirrored: they are not a continuation of a conversation and may have no
session at all.
This makes the prior 'cold-start session seeding' concern a non-case by
construction: when the mirror semantically applies the session exists;
when none exists the target was never the origin, so we no-op.
Adds _target_matches_origin() + origin-scoping tests (exact match,
other-chat/other-platform/no-origin rejection, thread scoping, fan-out
mirrors only the origin target).
Adds an opt-in path so a cron job's delivered output is also appended to
the TARGET chat's gateway session transcript (as an assistant turn), so a
user reply to a recurring delivery (daily brief, reminder) is answered with
the delivery in context instead of 'what is that?' amnesia.
- Reuses the shipped gateway.mirror.mirror_to_session — the same primitive
interactive send_message mirroring already uses. No messaging-toolset
change (cron still can't call send_message; this rides delivery).
- Gated: per-job attach_to_session overrides global cron.mirror_delivery
(config.yaml). Default OFF — historical isolation preserved byte-for-byte.
- Mirrors the CLEAN agent output, not the cron header/footer wrapper.
- Alternation/cache-safe: append lands at a turn boundary, never mid-loop,
never mutates the cached system prompt. Cold-start (no target session)
is a silent no-op; mirror errors never fail a successful delivery.
- Surfaced on the cronjob tool (attach_to_session) + config schema.
Driven by enterprise cron-as-control-plane use case. 10 new tests; full
cron + cronjob-tool suites pass (600).
ToolFallback rebuilt the `part` wrapper every render, defeating the
buildToolView memo and re-running a full JSON.stringify of the result on
every ~33ms stream delta. A /learn over a large directory (many ~100KB
tool results) saturated the renderer main thread (hang/throttle) and
spiked memory until it OOMd (crash).
- Re-derive a stable `part` from the referentially-stable args/result so
the view/copy memos hold across deltas.
- Clamp every inline-painted payload (detail, stdout/stderr, rawResult,
technical trace) to MAX_TOOL_RENDER_CHARS; the row's Copy button still
reads the uncapped view.detail for the full output.
A DM reply carries no guild_id, so the connector's egress guard cannot
resolve the owning tenant from metadata.guild_id and declines the send
with "discord egress declined: target not routed to an onboarded tenant"
— the bug behind "the bot never replies in DMs". Guild replies are
unaffected (they carry guild_id), which is why the guild path worked
end-to-end while DMs looked broken.
The connector now resolves a DM reply's tenant from the recipient's
author binding (gateway-gateway #67, resolveByUser keyed on
metadata.user_id) — the outbound counterpart to inbound Phase 7a
author-first resolution. But it needs the recipient user_id ON the
outbound action, and the adapter only re-attached guild_id
(_capture_scope/_with_scope), no-op for DMs (the docstring even said so).
This extends the adapter's inbound-scope capture: for a DM (no guild_id)
remember chat_id -> the authentic author user_id we observed, and
re-attach it as metadata.user_id on outbound. Guild capture is unchanged
and wins when present; user_id is the DM-only fallback. The id is the one
the connector observed inbound (never gateway-asserted), so the trust
invariant holds.
+4 unit tests (DM reply re-attaches user_id + no guild_id; unknown chat
invents nothing; explicit user_id preserved; guild reply never carries
user_id). Proved load-bearing (reverting the re-attach fails the DM
test). 144 relay tests pass, ruff clean.
Pairs with gateway-gateway #67 (the connector-side resolver). Together
they close the DM-reply egress gap end-to-end.
terminal_tool() resolves a per-task cwd override that WINS over config["cwd"]:
cwd = overrides.get("cwd") or config["cwd"]
config["cwd"] is sanitized for container backends in _get_env_config() (host
prefixes /Users//home//C:\\/C:/ and relative paths are replaced with the
backend default /root). But the override was applied RAW — it was never run
through that guard. The gateway/TUI registers the host launch dir as a cwd
override for workspace tracking (tui_gateway/server.py _register_session_cwd
-> _terminal_task_cwd -> _session_cwd -> os.getcwd()), so on a container
backend a host path leaked straight to `docker run -w <host-path>`:
- Windows desktop: -w C:\Users\<user> -> container fails to start (exit 125)
- POSIX: -w /home/<user> -> same
The ACP adapter translates its override cwd (acp_adapter/session.py
_translate_acp_cwd), but the gateway path did neither translation nor
sanitization, so the override bypassed the one guard that would have caught it.
Fix: extract the host/relative-path predicate into a shared
_is_unusable_container_cwd() helper (so the existing _get_env_config()
sanitizer and the new guard can't drift), and re-apply it to the *resolved*
cwd at the override-resolution site. Valid in-container override paths
(RL/benchmark sandboxes that set cwd to /workspace, /root, ...) are absolute
non-host paths and pass through untouched.
Tests: unit-pin the predicate (Windows backslash/forwardslash, POSIX home,
macOS /Users, relative, valid container paths) AND an E2E call-site pin that
drives terminal_tool() with a host-path override registered and asserts the
cwd reaching _create_environment is sanitized. Mutation-verified: reverting
the call-site guard makes the two host-path E2E tests fail (showing the raw
host path leaking) while the valid-/workspace-override test stays green.
The desktop bootstrap (and curl/PowerShell/docker installs) seeded
~/.hermes/SOUL.md with a comment-only scaffold that contained no persona
text. That shadowed the runtime default (_ensure_default_soul_md ->
DEFAULT_SOUL_MD), since seeding is guarded by 'if SOUL.md doesn't exist'.
Result: every fresh installer install got the empty template instead of
the documented Hermes persona; desktop just made it visible in onboarding.
- install.sh / install.ps1 / docker/SOUL.md now write DEFAULT_SOUL_MD.
- _ensure_default_soul_md() upgrades a SOUL.md still matching the known
legacy scaffold in place; customized files (any deviation, incl. a
persona appended below the comment) are never touched.
- Detection normalizes CRLF/BOM so Windows-installer drift still matches.
The Desktop GUI (tui_gateway) slash worker subprocess has no reader for
the CLI's _pending_input queue. /learn's CLI handler prints the ack and
puts the built prompt onto that queue, so in the TUI the prompt was
silently dropped — ack shown, no LLM turn, no skill created (#51829).
command.dispatch already handles 'learn' correctly (returns
{type: send, message: build_learn_prompt(arg)}), but 'learn' was missing
from _PENDING_INPUT_COMMANDS, so slash.exec fell through to the worker
instead of routing to command.dispatch. Add it to the frozenset, matching
the existing goal/queue/steer/plan pattern.
The gateway-side BEHAVIOUR layer that consumes the relay scale-to-zero
primitives (gateway-gateway Phase 5): the gateway decides it is idle and
drives the relay transport dormant so the platform (Fly autostop:"suspend")
can suspend the now-traffic-idle machine, which wakes on the connector's
wakeUrl poke (decisions.md Q3=C', D1-D13).
- gateway/scale_to_zero.py: pure helpers — scale_to_zero_enabled (the NAS
Labs HERMES_SCALE_TO_ZERO stamp, D11/Q8=A), parse_idle_timeout_seconds
(config.yaml gateway.scale_to_zero.idle_timeout_minutes, D2),
messaging_is_relay_only_or_absent (F6/D1), should_arm (D1/D11/§3.4(1)),
is_idle (D2/D3/F7).
- gateway/run.py: _last_inbound_at clock stamped on user inbound in
_handle_message (F13); the arm-gate + idle predicate + the
_scale_to_zero_watcher dormant sequence (mark draining -> adapter
go_dormant() -> cooldown), started only when armed. Deliberately NOT the
stop path and NOT mark_resume_pending (F12/D13).
- tools/process_registry.py: has_any_active() for the bg-work guard (D3/F7).
- hermes_cli/config.py: gateway.scale_to_zero.idle_timeout_minutes default 5.
Tests: 38 pure-logic + 6 watcher (incl. bg-work regression guard proven RED).
Full relay + scale-to-zero suites: 184 passed. The 20 unrelated failures in
the broader run are PRE-EXISTING on origin/main (custom-provider/tools tests),
confirmed via a pristine baseline worktree.
Net-new WebSocketRelayTransport.go_dormant() + RelayAdapter.go_dormant() —
the third transport mode the scale-to-zero behaviour layer needs, distinct
from both disconnect() and an unexpected close (decisions.md D12/F14):
- disconnect() sets _closing=True and CANCELS the reconnect supervisor
(terminal "shutting down for good") -> a suspended machine never re-dials
on wake, stranding its buffered backlog.
- an unexpected close re-dials IMMEDIATELY -> the socket never stays down,
so the platform proxy never suspends the machine.
go_dormant(): going_idle->ack (reuse go_idle), then close the socket WITHOUT
setting _closing, so the reader's fall-through still arms the reconnect
supervisor (wake path stays live) but on the longer _dormant_redial_s
cadence so it doesn't fight the platform suspend window. A successful re-dial
clears _dormant. Honors the §3.4 wake->reconnect->drain contract.
Tests: 6 new in test_relay_going_idle.py incl. the F14 regression guard
(routing dormancy through disconnect() fails exactly the 4 wake-path tests).
Full relay suite 140 passed.