Defense-in-depth follow-up to the runtime guard added in the previous commit.
Models on headless hosts (Railway / Fly / Docker / fresh VPS) without any ACP
CLI installed occasionally hallucinate ``acp_command="copilot"`` from the
schema description, despite the explicit "Do NOT set" instruction. The runtime
guard prevented the crash but the model still wasted a tool turn and got an
opaque silent fallback.
This commit removes the temptation at its source: ``_build_dynamic_schema_overrides``
now strips ``acp_command`` and ``acp_args`` from both the top-level and per-task
schemas when none of the known ACP CLIs (``copilot``, ``claude``, ``codex``) are
detectable on PATH. The model literally never sees the fields, so it cannot
pass them.
The runtime guard from the previous commit stays in place as defense-in-depth
for internal callers, tests, and any future code path that bypasses the schema.
``_acp_binary_available`` is intentionally NOT cached: ``shutil.which`` is
cheap, and avoiding the cache means the schema reacts to mid-session installs
without requiring a process restart.
Tests:
- ``test_schema_prunes_acp_command_when_no_acp_binary``
- ``test_schema_keeps_acp_command_when_binary_available``
- ``test_acp_binary_available_checks_known_clis``
Full ``test_delegate.py`` suite: 136/136 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a model passes `acp_command="copilot"` (or any other binary name) in a
`delegate_task` tool call, `_build_child_agent` unconditionally sets
`effective_provider = "copilot-acp"`, which routes the subagent through
`CopilotACPClient`. That client spawns the named binary via subprocess; if it
isn't on PATH, every retry raises RuntimeError and an asyncio cleanup race
during error delivery can take the entire gateway down.
This is a real failure mode on headless deploys (Railway / Fly / VPS / Docker)
where `copilot` / `claude` / etc. aren't installed. The schema does say
"Do NOT set unless the user explicitly told you an ACP CLI is installed,"
but models occasionally pass it anyway — particularly for X (Twitter) search
prompts where Grok seems to associate ACP with "search assistance."
Reproduction:
- Headless install (no `copilot` binary on PATH)
- Set provider to xai-oauth + model grok-4.3
- Telegram prompt: "Search X for crypto twitter trends"
- Grok decides to delegate and passes `acp_command="copilot"`
- Subagent crashes 3x, gateway crashes on the 3rd retry teardown
Fix: validate the binary exists on PATH via `shutil.which` before honoring
the override. If missing, log a warning and fall through to the parent's
default transport. No behavior change when the binary IS present (covered
by `test_build_child_agent_honors_acp_command_when_binary_present`).
Tests:
- `test_build_child_agent_ignores_acp_command_when_binary_missing`
- `test_build_child_agent_honors_acp_command_when_binary_present`
Verified on Python 3.11 (macOS) and 3.12 (Debian 13 container).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WhatsAppAdapter lives under plugins/platforms/whatsapp/adapter.py on
current upstream; the owner-forward test still imported the removed
gateway.platforms.whatsapp module.
The opt-in WHATSAPP_FORWARD_OWNER_MESSAGES path in bot mode marks
fromMe inbound messages as fromOwner: true and forwards them to the
Python adapter so plugins can detect "owner just typed in this chat"
and trigger handover / sliding TTL flows. The previous implementation
bypassed the allowlist for that path: the existing allowlist gate at
the bottom of the dispatch loop is guarded by !msg.key.fromMe, so any
chat the operator happened to reply to was forwarded — even ones not
on WHATSAPP_ALLOWED_USERS.
Concretely, on a deployment with a single allowlisted customer, an
owner reply in any other chat would still wake Hermes and let the
gateway-policy plugin's owner-implicit branch create a stray handover
row keyed by the non-allowlisted chatId.
Fix: extract the bot-mode fromMe gate into a small pure helper
(`owner_message_gate.js`) that returns one of
{drop_echo, drop_disabled, drop_allowlist, forward_owner, pass} so the
new allowlist branch can be unit-tested without spinning up Baileys.
The check runs against the customer chatId (not senderId, which is
the owner's own number/LID and won't be on the allowlist by
construction). matchesAllowedUser already short-circuits true on an
empty allowlist or "*", so deployments without an allowlist see no
behavior change.
Self-chat mode is untouched — its existing isSelfChat pin is the
correct guard there.
Tests: scripts/whatsapp-bridge/owner_message_gate.test.mjs covers
echo drop, disabled drop, the new allowlist drop, the forward path,
the open-allowlist short-circuit, and the precedence of echo/disabled
checks over the allowlist check (so logs stay honest).
When WHATSAPP_FORWARD_OWNER_MESSAGES is enabled and the bridge marks an
inbound message with fromOwner=true, also prefix MessageEvent.text with
"[owner reply] " at construction time. This makes the disambiguation
survive any downstream plugin failure (e.g. handover-rule errors that
bypass silent_ingest), so transcripts never misattribute owner-typed
text to the customer.
Idempotent: re-applies are guarded so a future producer that pre-tags
text won't be double-prefixed.
In `WHATSAPP_MODE=bot` the bridge currently drops every fromMe inbound
message — they are all assumed to be echoes of our own /send calls.
That makes it impossible for plugins / agents to detect when a human
owner has typed directly into a customer chat from the same WhatsApp
Business account (e.g. via a linked phone or WhatsApp Web).
This adds an opt-in `WHATSAPP_FORWARD_OWNER_MESSAGES` env var. When
true, the bridge classifies fromMe inbound by looking up `key.id` in a
bounded LRU of recently-sent message IDs (the existing 50-entry echo
suppressor, bumped to 512 and extracted to a testable
`outbound_ids.js` helper). Hits in the LRU are still dropped (echoes);
misses are forwarded to the Python adapter with `fromOwner: true`.
The Python adapter lifts that flag onto
`MessageEvent.metadata["whatsapp_from_owner"]`. `metadata` is a new
free-form dict on the event so future per-platform signals don't each
need their own field. Default behaviour is unchanged: with the env
flag unset, bot mode still drops every fromMe message exactly as
before.
Use cases for downstream consumers:
- Implicit handover activation when the owner replies manually
- Sliding TTL on owner activity (keep an active session alive while
the owner is engaged)
- Audit trails of owner interventions
- Analytics on human-vs-bot reply ratios
Heuristic limitation (documented in code): the LRU is in-memory. After
a bridge restart, in-flight delivery receipts of pre-restart sends will
briefly look like owner-typed for a few seconds until the set is
repopulated. Persisting isn't worth the disk churn — downstream
consumers should treat the flag as best-effort.
Tests:
- tests/gateway/test_whatsapp_from_owner.py (new): adapter sets the
metadata flag iff the bridge payload has `fromOwner: true`; absent
otherwise.
- scripts/whatsapp-bridge/outbound_ids.test.mjs (new): LRU bounds,
eviction order, falsy-id handling.
Backwards compatibility: with the env flag unset, every code path is
identical to before. No existing deployment is affected.
The gateway/CLI /model switch path (switch_model in agent_runtime_helpers)
built the MoAClient facade but left agent.api_mode at the value
determine_api_mode / the resolved aggregator transport produced (e.g.
codex_responses or anthropic_messages). The conversation loop dispatches on
agent.api_mode, so a non-chat_completions value made the primary/acting call
go through client.responses.create — which the MoAClient facade has no
.responses for — and fall through to the moa://local placeholder, 404 three
times, then fall back to a reference model (issues #54259, #54669).
agent_init.py already pins api_mode=chat_completions for provider==moa; mirror
that in the live switch so the primary call always routes through
MoAClient.chat.completions. The aggregator's real transport is resolved and
applied inside the reference/aggregator fan-out, not on the outer call.
Slot_runtime resolved the provider's real API surface (including api_mode)
but only forwarded base_url and api_key to call_llm, dropping api_mode.
This caused Copilot GPT-5.x reference slots to hit /chat/completions
instead of the Responses API, returning 400 unsupported_api_for_model.
- _slot_runtime: forward api_mode from resolve_runtime_provider
- call_llm: accept explicit api_mode param, override task config
- 4 regression tests for propagation, omission, and signature
If redact_sensitive_text() raises or fails to import, stdout/stderr
were silently left unredacted and could leak API keys or tokens into
cron job delivery messages and logs.
Replace bare with a warning log and replace
both outputs with '[REDACTED - redaction failed]' to prevent leaks.
Root cause: silent exception swallow in _run_job_script()
Impact: potential secrets leak in cron job output delivery
get_copilot_api_token now returns (api_token, base_url); the auth-remove
suppression test still mocked it as a bare string, mis-unpacking into the
credential-pool seed path and failing with 'No credential #1'.
Folds @trevorgordon981's #50590 into difujia's #15139:
- exchange_copilot_token now prefers the authoritative endpoints.api from
the token-exchange response, falling back to the proxy-ep-derived host
- resolve_api_key_provider_credentials gains a copilot branch that resolves
the account-specific base URL and a non-empty last-resort guard, so chat
inference never wedges on an empty base URL (#50252)
Co-authored-by: Trevor Gordon <trevorbgordon@gmail.com>
The earlier enterprise base URL change (proxy-ep parsing) gave us URLs
like `api.enterprise.githubcopilot.com`, but ~15 host-matching call
sites still hard-coded `api.githubcopilot.com`. Enterprise users would
therefore drop the `Copilot-Integration-Id: vscode-chat` header at
client-build time, and upstream rejected requests with:
The requested model is not available for integrator "zed"
(or "copilot-language-server") — verify the correct
Copilot-Integration-Id header is being sent.
The header was correct in copilot_default_headers(); it just never
made it into default_headers for non-default hostnames because every
detector compared against the exact string "api.githubcopilot.com".
This commit broadens all those checks to "githubcopilot.com" via
base_url_host_matches (which already does proper subdomain matching),
so api.enterprise.githubcopilot.com, api.business.githubcopilot.com,
etc. all share the same headers, vision routing, max_completion_tokens
selection, and reasoning-effort detection as the default endpoint.
Also adds ".githubcopilot.com" to _URL_TO_PROVIDER so context-window
resolution via models.dev works for enterprise base URLs, and tightens
_is_github_copilot_url to use suffix matching instead of strict equality.
Tests:
- New: enterprise Copilot endpoint preserves Copilot-Integration-Id
- New: enterprise endpoint returns max_completion_tokens (not max_tokens)
- Existing 333 base_url / copilot / aux-client / credential-pool tests pass
Parts 5 of #7731.
Two changes that complete the Copilot auth story (#7731 parts 3 and 4):
1. Switch OAuth client ID from opencode (Ov23li8tweQw6odWQebz) to VS Code
(Iv1.b507a08c87ecfe98). The old ID produces gho_* tokens that return
404 on /copilot_internal/v2/token, making token exchange non-functional.
The new ID produces ghu_* tokens that support exchange.
2. Derive enterprise API base URL from the proxy-ep field in the exchanged
token. Enterprise accounts get tokens containing e.g.
"proxy-ep=proxy.enterprise.githubcopilot.com" which is converted to
"https://api.enterprise.githubcopilot.com" and stored in the credential
pool. Individual accounts (no proxy-ep) continue using the default URL.
The COPILOT_API_BASE_URL env var remains as a user escape hatch.
Tested on both Individual and Enterprise Copilot accounts:
- Individual: device flow works, exchange succeeds, base_url=None (default)
- Enterprise: device flow works, exchange succeeds, 39 models returned
including claude-opus-4.6-1m (936K), enterprise base URL derived
Parts 3 and 4 of #7731.
Adds tests/agent/test_model_extra_type_guard.py exercising the real
ChatCompletionsTransport.normalize_response path with string/list/None/dict
model_extra; adds the AUTHOR_MAP entry for the contributor.
Some OpenAI-compatible providers (NVIDIA NIM + qwen3.5) return a string
for model_extra instead of a dict. The falsy fallback (x or {}) treats a
truthy non-empty string as the value and calls .get() on it, raising
AttributeError and turning every tool call into [error].
Replace the falsy fallback with an explicit isinstance(.., dict) guard at
both extra_content extraction sites (non-streaming normalize_response and
the streaming delta accumulator).
An over-cap model.max_tokens produces a provider 400 that mentions
max_tokens, which trips _CONTEXT_OVERFLOW_PATTERNS and is classified as
context_overflow. On providers whose wording isn't recognized by
parse_available_output_tokens_from_error() (e.g. DashScope/Qwen:
"Range of max_tokens should be [1, 65536]") the smart-retry is skipped
and the error falls into the compression fallback, which re-sends the
same oversized max_tokens, fails identically, and loops until
"cannot compress further" on a tiny conversation (#55546).
Root-cause fix for the whole class, not just DashScope:
- parse_available_output_tokens_from_error(): recognize the DashScope
"Range of max_tokens should be [1, N]" form and return N (smart-retry
then caps output and retries WITHOUT compressing).
- new is_output_cap_error(): broader yes/no gate for output-cap 400s.
In the loop, when the error is output-cap-shaped but unparseable, fail
fast with an actionable message (lower model.max_tokens) instead of
routing into compression. Mirrors the existing GPT-5 max_tokens guard.
Real input overflows and GPT-5 unsupported-param 400s are unchanged.
Concurrent ACP sessions run on a shared ThreadPoolExecutor (max_workers=4).
Each _run_agent mutated the process-global os.environ["HERMES_INTERACTIVE"]
and restored it in finally, so one session's restore could clobber another's
set mid-run — dropping the second session onto the non-interactive
auto-approve path, executing a dangerous command without the approval
callback firing (GHSA-96vc-wcxf-jjff).
Replace the env-var flag with a thread/task-local contextvar in
tools.approval. The two HERMES_INTERACTIVE read sites in approval.py now go
through _is_interactive_cli() (contextvar-first, env fallback for legacy
single-threaded CLI callers). The ACP executor sets the contextvar instead
of os.environ; the existing contextvars.copy_context() wrapper isolates each
session's write.
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
The Matrix adapter's _upload_file fell back to sending
"(file not found: {file_path})" directly into the room — the same
host-path leak class fixed for the base adapter and Slack in the
previous commit. Replace it with a friendly notice, log the path at
WARN for operators, and preserve any caller-supplied caption.
The base BasePlatformAdapter implementations of send_voice, send_video,
send_document, and send_image_file forwarded their *_path argument
verbatim into the chat text (e.g. "🎬 Video: /home/.../hermes/cache/...").
Telegram, Discord, and Slack adapters all fall back to those base methods
when their native send raises — so a rejected video on Telegram surfaced
the host filesystem layout to the user instead of a useful message.
Replace the path-echo with a friendly notice, log the path for operator
diagnostics, and keep the user-supplied caption intact. The Slack adapter
had three identical sites that fell through to the same path-echo on its
own native upload failures; fix those too. send_document still surfaces
the caller-provided file_name (or the basename derived from it) since
that is the user-facing filename, not a host path.
Add regression tests asserting the *_path argument never appears in the
fallback content while caption text and explicit file_name still do.
The salvaged test double predated two main changes:
- start() now connects via _connect_adapter_with_timeout, which forwards
is_reconnect to adapter.connect(); the StartupRaceAdapter double didn't
accept the kwarg.
- stop() now awaits _finalize_shutdown_agents (async on main); the fixture
stubbed it as a plain MagicMock.
Accept is_reconnect in the double and use AsyncMock for the finalize stub.
Vision requests routed through the OpenAI-compat API server forward the
raw multi-part content list ([{type:"text"}, {type:"image_url"}, ...])
straight through as user_message. The codex intermediate-ack detector
flattened it with (user_message or "").strip(), so a truthy list survived
and .strip() raised AttributeError — killing any Codex-routed vision turn
that took the require_workspace path.
Route through the existing _summarize_user_message_for_log helper (which
already backs the logging/banner previews on main), and widen the param
type hint from str to Any to match how the function is actually called.
The two logging-preview sites the original PR also touched were fixed
independently on main by the conversation-loop refactor.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
delegate_task's per-task completion display emitted lines like
"✓ [1/3] Research done (17.92s)" via a bare print(). Under ACP (and any
headless JSON-RPC stdio host where AIAgent routes human output to stderr
via a custom _print_fn), these landed on stdout and corrupted the
protocol frame stream, surfacing as "Failed to parse JSON message: ✓
[3/3] …" in the ACP adapter.
Add _emit_parent_console() which prefers parent_agent._safe_print (the
same hook AIAgent uses for every other user-facing print) and falls back
to print() only when no router is wired up or it raises. CLI behavior is
unchanged.
The PR's other fix (preset toolset expansion) is already covered on main
by _expand_parent_toolsets(), so only the stdio-safe printing change is
salvaged here.
The loopback ws-ping window is now 30s/60s (#48445/#50005), so the
hardcoded == 20.0 assertion was a change-detector that broke the moment
the loopback tuning landed. Assert the behavioral contract instead: ping
stays enabled (positive) and timeout >= interval.
session.info is only ever an emitted event (_emit), never a dispatched
@method RPC, so listing it in _LONG_HANDLERS is dead weight that can
never match a dispatched method name. Remove it from the set and the
test's frontend-polled list to keep _LONG_HANDLERS to real RPCs.
Frontend-polled read-only RPCs (session.list, pet.info, process.list)
ran inline in the WS read loop. Under GIL pressure from concurrent agent
turns they block the loop, timing out frontend polls and surfacing as a
false "needs setup" / dropped session (#50005, #48445). Route them
through _LONG_HANDLERS so dispatch() returns immediately, and raise the
default RPC pool to 8 workers so the added long handlers don't queue.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
Three targeted fixes for Desktop GUI WebSocket stability when agent
turns starve the uvicorn event loop of CPU (GIL contention):
1. Loosen ws_ping_timeout for loopback binds (QW-1)
- Loopback (Desktop): ping 30s interval / 60s timeout
- Non-loopback (Cloudflare Tunnel): unchanged 20/20
- A GIL-heavy agent turn can stall the event loop past 20s;
uvicorn's keepalive ping runs on that same starved loop, so a
20s timeout kills an otherwise-healthy local connection over a
recoverable stall. 60s rides out the stall without affecting
half-open detection on public binds.
2. Coalesce streaming token frames in WSTransport (CF-2)
- Buffer high-frequency delta frames (message.delta, reasoning.delta,
thinking.delta) and flush as a batch every ~33ms (~30fps)
- Non-streaming frames (RPC responses, control/tool/completion events)
flush pending tokens first — wire ordering preserved
- Thread-safe via threading.Lock; worker threads return immediately
instead of blocking on per-token loop wakeups
- Reduces event-loop wakeup churn by orders of magnitude during model
streaming, directly cutting GIL pressure
3. Loop heartbeat watchdog (CF-1)
- Self-rearming call_later tick (2s) measures drift between expected
and actual fire time using loop.time() (monotonic)
- Logs 'event loop stalled Ns (GIL pressure suspected)' when drift >5s
- Turns mysterious WS drops into diagnosable log entries
- Uses call_later chain (not a task) — dies with the loop, nothing
to cancel on shutdown
Root cause: uvicorn's ws keepalive ping (20/20s) runs on the same
starved event loop as agent turns. Under GIL pressure from heavy agent
turns or delegation, the loop can't service the ping within 20s, so
the websockets protocol declares the connection dead. Reconnects fail
with ready_send_failed because the old process's loop is still wedged.
None of these fixes touch the model-facing message array, prompt
caching, message role alternation, or the wire protocol — they are
strictly display-transport improvements plus a config tweak and a
diagnostic log.
Tests: 762 passed, 17 skipped (0 failures) across test_tui_gateway_ws,
test_tui_gateway_server, test_web_server, and tui_gateway/ suites.
Batch delegation returned each subagent's full final_response verbatim
into the parent's context. A fan-out of N children could dump 60k+ tokens
at once, blowing the parent's context window and — on rate-limited
providers — triggering a compression/429 death spiral (429 misread as
context-too-large -> window step-down -> retry loop -> conversation dies).
Cap each summary against the parent's *remaining* context headroom split
across the batch (not a magic char count). When trimming, mirror the
web_extract convention: spill the full text to cache/delegation (mounted
into remote backends via credential_files._CACHE_DIRS) and return a
head+tail window (75/25, line-snapped) plus a footer with the exact
read_file offset to page the omitted middle. Both the subagent's opening
AND its closing (outcomes / files-changed / issues, which live at the end)
survive in-context, and nothing is lost — the parent can read_file the
full version on any backend.
delegation.max_summary_chars (default 24000) is a static ceiling layered
on top as belt-and-suspenders for models that ignore 'be concise'; 0
disables it. Child prompt tightened to lead with outcomes / bullets.
Co-authored-by: rc-int <rcint@klaith.com>
Three independent security-scanner hardenings, re-homed onto the current
shared threat-pattern architecture (tools/threat_patterns.py):
- approval.py: add bash/sh/zsh/ksh heredoc to DANGEROUS_PATTERNS. The
existing heredoc pattern only covered python/perl/ruby/node, so
`bash <<'EOF' ... EOF` ran arbitrary shell — including exfil pipelines
whose inner commands don't individually match a pattern — with no prompt.
- threat_patterns.py: apply unicodedata.normalize("NFKC", ...) before
pattern matching so full-width / compatibility homographs (e.g.
`cat ~/.hermes/.env`) are folded to ASCII and no longer bypass the
keyword scanners. Invisible-char detection still runs on the raw content
first (NFKC can strip those codepoints).
- code_execution_tool.py: add CREDS/BEARER/APIKEY to _SECRET_SUBSTRINGS so
vars like HERMES_LLM_CREDS, API_BEARER, MY_APIKEY are scrubbed from the
sandbox env. PASS was intentionally dropped from the original proposal —
it false-positives on BYPASS_CACHE / COMPASS_DIR / PASSENGER_HOST while
PASSWORD/PASSWD already cover the credential cases.
The original PR also proposed a 'synonym' injection pattern block
(overlook/forget/set aside/bypass/discard + developer-mode); dropped here
because it false-positives on ordinary AGENTS.md/SOUL.md prose ("don't
forget to follow the rules", "run in developer mode"), exactly the
bossy-English class threat_patterns.py is documented to avoid.
Salvaged from #9028.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
The auxiliary OpenAI clients were built without overriding the SDK's
default max_retries=2, so every aux call silently made up to 3 attempts
against a slow/hung endpoint — a 120s timeout could stall ~360s before
Hermes saw a single failure. On the critical compression preflight path,
Hermes then added its own same-provider timeout retry on top, roughly
doubling the user-visible stall again before fallback.
- Build both the sync (_create_openai_client) and async (_to_async_client)
aux clients with max_retries=0 (setdefault, so explicit callers still
override). Hermes already owns retry + provider/model fallback policy.
- For task == compression, skip the same-provider transient retry on a
full-budget timeout and fall straight through to fallback. Fast blips
(streaming-close, 5xx) still retry, since those are cheap.
- Add _is_timeout_error to distinguish a full-budget timeout from a fast
connection drop.
Addresses the retry-multiplication root cause of #54465 (the resume-wedge
persistence half landed in #55499).
patch_tool extracts V4A patch paths so _check_sensitive_path can refuse
writes to /etc/*, /boot/*, etc. before they reach the low-level file ops.
The extraction regex had two gaps:
1. `*** Move File: src -> dst` was never extracted (regex only matched
Update/Add/Delete), so a Move targeting /etc/crontab skipped the
pre-check and fell back on the narrower file_operations deny list.
2. The regex required `\\s+` after `***` but patch_parser uses `\\s*`, so
`***Update File: /etc/hosts` (no space) parsed + applied while
skipping the check.
Loosen the leading whitespace to \\s* and add a Move regex that checks
both endpoints. Move endpoints also run through the same '..' traversal
rejection as the other V4A headers (closes the sibling gap on current
main, which gained that traversal guard after this PR was opened).
`_resolve_api_key_provider_secret` resolved API keys via `get_env_value`,
which returns the `os.environ` value first and only falls back to
`~/.hermes/.env`. After a user rotates a key in `.env`, a stale value still
exported in the parent shell (Codex CLI, test runner, login profile) shadows
the fresh key on every request, producing persistent 401s.
The credential-pool seeding path was already fixed to prefer `.env`
(#18254/#18755), but the live request-time resolution path was not — so the
pool re-seeded with the fresh key while `_resolve_api_key_provider_secret`
kept returning the stale shell export. This closes that remaining path.
- config: add `get_env_value_prefer_dotenv()` — checks `~/.hermes/.env`
first, then `os.environ`. Distinct from `get_env_value()` (unchanged,
os.environ-first) so only Hermes-managed credential resolution flips
precedence; the generic helper's many callers are unaffected.
- auth: `_resolve_api_key_provider_secret` resolves through the new helper.
- tests: regression coverage for both the pool-seeding path and the
auth resolution path (a rotated `.env` key must beat a stale shell export).
Closes#20591.
Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Follow-up to the salvaged #8008 fix:
- Sibling-site fix: _evaluate_slash_authorization gated DISCORD_ALLOWED_CHANNELS /
DISCORD_IGNORED_CHANNELS on numeric IDs only, so name/#name config that now works
for on_message still silently failed for slash-command interactions. Refactor the
channel-key helper to _discord_channel_keys_from_channel(channel, parent) and reuse
it at the interaction gate. Fail-closed on missing channel id is preserved.
- The contributor's hardcoded 8s flush deadline could be hard-cancelled mid-flush:
_teardown_adapter already wraps cancel_background_tasks() in the per-adapter
disconnect budget (HERMES_GATEWAY_ADAPTER_DISCONNECT_TIMEOUT, default 5s). The flush
deadline now derives from that budget with headroom so it always completes inside it.
- AUTHOR_MAP: map cypher@augmentl.com -> Nickperillo for CI.
- Tests: slash-auth name/#name allow + name ignore matching.
Two related fixes to the Discord gateway adapter:
1. Channel name matching (free-response, allowed, ignored, no-thread channels)
Previously these config values only matched against numeric channel IDs.
If a user configured free_response_channels: cypher (by name), the adapter
would silently ignore it because it only intersected against channel_ids.
Now the adapter builds a channel_keys set that includes the channel ID,
channel name, and #channel-name form, and checks all three for each gate.
2. Flush pending text-batch tasks before shutdown
The Discord adapter uses _pending_text_batch_tasks (its own dict) for
merging rapid successive message chunks. These tasks were NOT added to
self._background_tasks (the base class list), so the base
cancel_background_tasks() never awaited them on restart/shutdown.
This caused a race: in-flight response deliveries were cancelled before
Discord had a chance to send them, resulting in silent dropped messages
visible to users as tool-log-only replies with no text body.
Fix: override cancel_background_tasks() in DiscordAdapter to await all
pending text-batch tasks (8s deadline) before delegating to the base class.
The gateway-lifecycle guard's hermes-CLI pattern required `hermes`
and `gateway` to be adjacent, so a profile flag slipped the agent
past it: `hermes -p ade gateway restart` was not flagged. That is the
exact form from the 2026-04-11 ade-profile self-kill loop. Allow an
optional run of global flags (`-p ade`, `--profile ade`, multiple
flags) between `hermes` and the gateway subcommand.
launchctl self-termination is already covered on main by #33071; this
narrows the only remaining real gap.
Update the shared queued-edit ref synchronously with React state so draft persistence sees the correct edit mode while loading and restoring queued prompts. Also drop the accidental node_modules symlink from the PR.
Ensure Windows desktop and local terminal teardown kill full process trees so Git Bash descendants cannot survive wrapper exits and accumulate across retries.
Audit follow-up. ChatBar subscribed to the whole `$statusItemsBySession` (a
computed that rebuilds the entire map) + `$previewStatusBySession` maps just to
derive a boolean, so every per-item status mutation (a subagent tick, the 5s
background poll) and every OTHER session's change re-rendered the ~1.4k
component. The queue hook likewise subscribed to the whole `$queuedPromptsBySession`
map.
- Add `useSessionStatusPresence` — a coarse edge (useSyncExternalStore) that
flips only when the stack shows/hides; ChatBar uses it for the styling
data-attr instead of the two map subscriptions.
- Add generic `useSessionSlice(store, key)` — subscribes to one session's array,
bailing out when other sessions churn (the plain atom keeps per-key refs
stable). The queue hook now reads its slice through it.
Result: ChatBar re-renders only when the stack's presence flips or this session's
queue changes — not on background/subagent status streaming or other sessions.
Verified: typecheck clean, 0 lint errors, composer tests 39/40 (pre-existing
attachments failure unrelated).
Two composer fixes:
- **Paste/input lag** — `flushEditorToDraft` serializes the whole editor
(`composerPlainText` is O(n)); running it on every event during a burst
(holding a key, or holding Cmd+V into a growing editor) was O(n²). Coalesce
the input/paste path to one flush per animation frame. Lossless: the
contentEditable DOM is the source of truth and submit + the compositionend /
keydown paths re-read it synchronously (those stay immediate).
- **Detached-composer dock glow** — was `fixed inset-x-0` (full viewport, spilled
under the sessions sidebar). Switched to `absolute inset-x-0`, so it anchors to
the chat-column root the docked composer centers in — the glow now spans only
the thread area, matching the actual dock target.
Verified: typecheck clean, 0 lint errors, composer DOM repro tests pass.
Lift the submit orchestration out of ChatBar into
composer/hooks/use-composer-submit.ts: `submitDraft` (the one decision tree —
queue-edit save · slash-now-while-busy · queue · drain · send · stop),
`dispatchSubmit` (the shared send-with-restore primitive + the external-submit
listener), and `steerDraft`.
This is the seam where the draft and queue engines meet; it now reads both clean
APIs as explicit inputs instead of closing over inline state. ChatBar is left as
a thin coordinator that owns the shared `queueEditRef` and wires the four engines
(draft · queue · submit · metrics/voice/drop) into render.
Behaviour-identical (verbatim move). Verified: typecheck clean, composer DOM
repro tests (enter-submit, IME, slash-now, steer, drain) pass.