Narrow plaintext shortcut that rewrites a tiny set of admin phrases
("restart gateway", "restart the gateway", "restart hermes") into the
/restart slash command, but only in DMs. Scope is intentionally tight:
- DM text messages only — group chats keep natural-language semantics
- Exact restart-style phrases only
- Skips anything already starting with "/"
Without this, the LLM can receive "restart gateway" as a user turn and
try to satisfy it via the terminal tool (systemctl restart ...). That
kills the gateway while the originating agent is still running, which
leaves systemd in "draining" state waiting on a process it's about to
kill. Routing the phrase to the slash-command dispatcher bypasses the
agent loop and uses the existing restart machinery (request_restart).
Called once, at the adapter level in BasePlatformAdapter.handle_message,
so every platform gets it for free and pending-message reinjection is
covered by the same call site.
Adds 2 Telegram-parametrized e2e tests: DM routes to request_restart,
group chats fall through to the normal agent path.
Runtime already supports list-form fallback_model (run_agent.py:1459
iterates fallback_chain; fallback_cmd.py migrates legacy single-dict
configs to list format). The config validator and save_config comment
gate still assumed single-dict form and flagged list-form configs as
errors. Fix both:
- validate_config_structure: when fallback_model is a list, validate
each entry has provider+model; keep the existing single-dict path.
- save_config: suppress the "add fallback_model" comment when any list
entry is well-formed.
Adds 4 list-form validator tests.
PR #16858's session-scoped interactive sudo password cache falls back to
a thread-identity scope when no HERMES_SESSION_KEY is bound. ACP never
set that contextvar, so two ACP sessions landing on the same reused
ThreadPoolExecutor thread still shared the cache — the exact scenario
the PR headlined.
acp_adapter/server.py now:
- binds HERMES_SESSION_KEY=<session_id> via gateway.session_context
inside _run_agent() (and clears on exit)
- wraps the loop.run_in_executor(_executor, _run_agent) call in a fresh
contextvars.copy_context() so concurrent ACP sessions don't stomp on
each other's ContextVar writes (executor pool threads would otherwise
share a context).
Adds tests/acp/test_approval_isolation.py::
test_sudo_password_cache_isolated_across_acp_sessions_on_same_pool_thread
which drives two back-to-back sessions through a 1-worker ThreadPoolExecutor
and asserts B does not observe A's cached password.
Follow-up on top of the cherry-picked contributor commit for #16751:
1. Delete triggers: the original PR switched FTS5 from external to inline
content mode and concatenated content || tool_name || tool_calls in
the insert/update triggers, but left the delete triggers passing
old.content to the FTS5 delete-command. FTS5 inline delete requires
the content to match what was stored, so every DELETE on messages
raised 'SQL logic error'. Replaced with plain DELETE FROM ... WHERE
rowid = old.id on all four delete paths (normal + trigram, delete +
update-delete).
2. v11 migration: existing DBs have the old external-content FTS tables
and triggers. Because CREATE VIRTUAL TABLE IF NOT EXISTS / CREATE
TRIGGER IF NOT EXISTS skip when the objects already exist, upgraders
would have kept the broken behavior forever. Bumped SCHEMA_VERSION
to 11 and added a migration that drops both FTS tables + all 6 old
triggers, recreates them via FTS_SQL / FTS_TRIGRAM_SQL, and backfills
from messages using the same concatenation expression.
3. Regression tests: 6 new tests cover INSERT / UPDATE / DELETE paths
for tool_name + tool_calls indexing plus the full v10 -> v11 upgrade
path on a hand-built legacy DB.
The FTS5 virtual tables (messages_fts, messages_fts_trigram) previously
only indexed the content column via external content mode. Tool calls
and tool names stored in the tool_calls (JSON) and tool_name columns
were invisible to FTS5 search.
Root cause: FTS5 triggers only INSERTed new.content into the index.
Changes:
- Switch FTS5 tables from external content (content=messages) to inline
mode so that trigger-inserted content is both indexed and stored
- Update all 6 FTS5 triggers to concatenate content, tool_name, and
tool_calls when indexing new messages
- Extend the short-CJK LIKE fallback to also search tool_name and
tool_calls columns
Closes: #16751
FTS5 default tokenizer splits 'sp_new1' into tokens 'sp' and 'new1'.
Without quoting, a search for 'sp_new' becomes an AND query
('sp AND new') that fails to match rows indexed as 'sp_new1'.
Fix: add underscore to the character class in Step 5 regex
([.-] -> [._-]) so underscored terms are wrapped in double quotes.
Also adds test_sanitize_fts5_quotes_underscored_terms.
Both keys are documented in cli-config.yaml.example and read at runtime by
hermes_cli/timeouts.py (get_provider_request_timeout and get_provider_stale_timeout),
but the provider-entry validator in config.py flagged them as unknown, producing
noisy warnings on every CLI invocation for users who followed the documented config.
Fixes#16779
- App.tsx doc comment: replace stale ChatPageHost reference with
'persistent chat host block rendered inline near the bottom of this
file' so readers can find the actual code.
- App.tsx persistent host: show a small spinner on /chat while plugin
manifests are loading instead of a blank content area. Direct
/chat deep-links used to paint empty for up to ~2s in the worst
case (plugin-registration safety timeout) because both the route
sink (null) and the persistent host (!pluginsLoading gate) render
nothing during that window. Non-chat routes stay empty as before.
- ChatPage.tsx: rename setter to match the 'raw' state — useState
now destructures as [mobilePanelOpenRaw, setMobilePanelOpenRaw],
and all four call sites (closeMobilePanel, matchMedia listener,
open-button onClick, plus destructure) updated accordingly. No
behavior change; matches the 'raw vs derived' convention the
original comment set up.
The dashboard's Chat tab (hermes dashboard --tui) lost its session
whenever the user navigated to another tab and came back. React Router
unmounted ChatPage on path change, which ran the cleanup function,
closed the PTY WebSocket, and terminated the underlying TUI child -
so the next mount generated a fresh channel id, spawned a new PTY, and
started a brand-new conversation.
Rather than rebuild the destroyed state (session id capture + resume
via HERMES_TUI_RESUME would reload history from disk but drop in-flight
tool state, scrollback, and picker position), keep the component tree
alive.
* Pull ChatPage out of Routes into a sibling always-mounted host that
toggles visibility via display:none keyed off the current route. A
tiny ChatRouteSink still claims /chat so the catch-all redirect
does not fire.
* xterm instance, WebSocket, PTY child, and TUI/agent state all
survive; returning to /chat shows the exact conversation the user
left.
* Respect plugin `/chat` overrides: if a plugin manifest declares
`tab.override: "/chat"`, the Routes tree already swaps the element
for <PluginPage /> — we additionally suppress the persistent host
so the two don't paint on top of each other. Preserves the
pre-persistence contract that a plugin owning /chat replaces the
built-in chat UI entirely.
* Wait for usePlugins() to finish loading before mounting the
persistent host. Manifests arrive asynchronously from
/api/dashboard/plugins, so without the `!pluginsLoading` gate the
host would mount with manifests=[], spawn a PTY, and then unmount
mid-session when the manifest list resolves and reveals a /chat
override. Typical delay is <50ms; worst case is the 2s plugin-
registration safety timeout. Cheaper than killing someone's
conversation underneath them.
* Gate page-header slot (`setEnd`), the mobile sheet's portalled
render, and body-scroll lock on a new `isActive` prop so the hidden
ChatPage doesn't fight the active page for shared state. The
scroll-lock effect keys on the *derived* `mobilePanelOpen` (which is
`isActive && mobilePanelOpenRaw`) rather than the raw state — that
way tab-switch flips the dep false, fires the cleanup, and releases
`document.body.style.overflow`. Keying on the raw state would leave
body.overflow="hidden" stuck on /sessions and every other tab until
the user navigated back to /chat and explicitly closed the sheet.
* When isActive flips false to true, force a double-rAF fit:
display:none collapses the host box and ResizeObserver does not fire
on display changes, so xterm would otherwise stay at a stale or 1x1
grid. Also early-return from syncTerminalMetrics when the host has
zero area, since fit() on a zero-sized element produces a 1x1
terminal.
* Focus handling on tab return: only steal focus into the terminal if
focus wasn't already parked somewhere inside ChatPage (e.g. the
sidebar model picker, a tool-call entry). Yanking focus away from
whatever the user last clicked is surprising and a screen-reader
foot-gun; the typical "first activation" case still focuses the
terminal because document.activeElement is <body> at that point.
Trade-off worth flagging, deliberately not mitigated in this change:
while hidden, ChatPage still holds a PTY child + WebSocket + xterm
instance for the dashboard's full lifetime. The WS keeps delivering
bytes and xterm keeps parsing them into a display:none host (cheap —
no paint work, but not free). Reasonable costs to pay for the session
preservation; if they become a problem we can pause `term.write` when
!isActive or idle-disconnect after N minutes hidden.
Lint clean on touched files. tsc -b && vite build pass.
Follow-up to the salvaged PR #16867 that added the read path for
agent.disabled_toolsets in _get_platform_tools():
- Document the new config key under a "Global Toolset Disable" section
in website/docs/user-guide/configuration.md, including the precedence
note (global disable overrides per-platform platform_toolsets).
- Map nazirulhafiy@gmail.com -> nazirulhafiy in scripts/release.py
AUTHOR_MAP so release-notes CI attributes the cherry-picked commit.
Previously, agent.disabled_toolsets in config.yaml only worked for CLI
mode (run_agent.py --disabled_toolsets). The gateway always passed
enabled_toolsets to AIAgent, and get_tool_definitions() ignored
disabled_toolsets when enabled_toolsets was set.
Fix: _get_platform_tools() now reads agent.disabled_toolsets from config
and excludes those toolsets from the returned set. This runs last so it
overrides everything above.
Added 3 tests covering cross-platform suppression, explicit platform
config override, and empty/missing config no-op behavior.
Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.
The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (#16844).
Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (#15250) rather than replacing them,
so every existing behavior is preserved:
- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
_copy_reasoning_content_for_api's cross-provider leak guard (#15748),
promote-from-'reasoning' tier, and thinking-pad tier remain live at
replay time.
- No empty '' gets eagerly written on every assistant turn (which would
have bypassed the read-side ladder and triggered empty thinking-block
insertion in the Anthropic adapter).
Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.
Credit @Sanjays2402 for the original diagnosis and patch in #16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.
Refs #16844, #16884, #15250, #15353, #15748.
Address Copilot review on #16868:
1. Tighten pool iteration. ``validate_copilot_token`` only rejects empty
strings and classic PATs (``ghp_*``); a malformed/unsupported ``gho_*``
token at ``credential_pool.copilot[0]`` would pass the gate and short-
circuit the loop, hiding a later valid entry. Switch to calling
``exchange_copilot_token`` directly: only entries that actually exchange
into a live Copilot API token are returned. Bad/expired entries fall
through to the next, and an exhausted pool returns ``""`` so the picker
falls back to the curated list (existing behaviour).
2. Reword the docstring + test module docstring to describe the pool seed
path accurately — ``hermes auth add copilot`` adds an api-key-typed
credential whose ``access_token`` field stores the pasted token, and
``_seed_from_env`` mirrors ``COPILOT_GITHUB_TOKEN`` from
``~/.hermes/.env`` into the pool. The previous wording implied
``auth add copilot`` itself ran the device-code flow, which it does
not (the device-code flow lives in ``hermes model``).
Two new tests cover the iteration change:
- ``test_skips_pool_entry_that_fails_to_exchange`` — pool[0] raises,
pool[1] succeeds, picker uses pool[1].
- ``test_all_pool_entries_fail_exchange_returns_empty`` — every entry
raises, return ``""``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Users whose only Copilot credential is the OAuth `access_token` saved by
`hermes auth add copilot` (device-code flow) saw the `/model` picker drop
back to a stale hardcoded list. Reason: `_resolve_copilot_catalog_api_key`
only consulted env vars (`COPILOT_GITHUB_TOKEN` / `GH_TOKEN` /
`GITHUB_TOKEN`) and the `gh auth token` CLI fallback, never the credential
pool that Hermes's own login flow writes into `auth.json`. With no token,
the live catalog fetch silently 401s and the picker hides current models
(claude-opus-4.7, claude-sonnet-4.6, gpt-5.5, grok-code-fast-1) — even
though `/model <id>` works fine because runtime inference reads the pool
through a different code path.
Mirror the Codex catalog resolver pattern: env-var first (unchanged), then
walk `read_credential_pool("copilot")` for the first entry with a
supported `access_token` (`gho_*` / `github_pat_*` / `ghu_*`). Run it
through `get_copilot_api_token()` so the catalog request uses the same
exchanged token the runtime path uses. Classic PATs (`ghp_*`) are still
rejected up-front via `validate_copilot_token` since the Copilot API
doesn't accept them.
Strictly additive: env still wins, and a missing/locked auth.json (or any
exception during pool read) still returns "" so the caller falls through
to the curated catalog.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
model_tools.py ran discover_mcp_tools() as a module-level side effect.
discover_mcp_tools() uses a blocking 120s wait internally (via
_run_on_mcp_loop -> future.result(timeout=120)).
The gateway lazy-imports run_agent -> model_tools on the first user
message, which happens inside the asyncio event loop thread. A slow or
unreachable MCP server therefore froze Discord shard heartbeats and
Telegram polling for up to 120s on the first message after gateway
start.
Fix: remove the module-level call. Every entry point now runs
discovery explicitly at its own startup, using the context-appropriate
blocking/non-blocking pattern:
- gateway/run.py: loop.run_in_executor(None, discover_mcp_tools)
before platforms start accepting traffic
- hermes_cli/main.py: inline (no event loop at CLI startup)
- tui_gateway/entry.py: inline (sync stdin loop, no event loop)
- acp_adapter/entry.py: inline before asyncio.run()
Closes#16856.
_handle_set_home_command wrote FEISHU_HOME_CHANNEL / DISCORD_HOME_CHANNEL /
etc. as top-level keys into config.yaml, but load_gateway_config() only
reads home channels from env vars. After every gateway restart the home
channel was lost — on every platform, not just Feishu.
Fix: switch /sethome to save_env_value(), which atomically writes to
~/.hermes/.env and updates the current process env in one shot. The
handler builds the env key from platform_name.upper(), so one line
change repairs /sethome for every platform that has a HOME_CHANNEL
env var.
Also widen _EXTRA_ENV_KEYS in hermes_cli/config.py so HOME_CHANNEL and
HOME_CHANNEL_NAME for every platform are treated as managed env vars:
SIGNAL, SLACK, SMS, DINGTALK, BLUEBUBBLES, FEISHU, WECOM, YUANBAO, plus
the missing *_NAME variants for DISCORD/TELEGRAM/MATTERMOST.
Closes#16806
Co-authored-by: teknium1 <screenmachine@gmail.com>
`/new` after `/model <custom-provider>:<model>` silently reverted to a
native provider whose static catalog happened to contain the same model
name (e.g. `deepseek-v4-pro` → native `deepseek` → 401).
Root cause at the `/model` writeback site: `HERMES_INFERENCE_PROVIDER`
was set unconditionally but `HERMES_TUI_PROVIDER` was only mirrored when
it was already set. On sessions launched without `--provider`,
`HERMES_TUI_PROVIDER` stayed unset, so `_resolve_startup_runtime()` on
`/new` skipped the explicit-provider early return and fell through to
`detect_static_provider_for_model()`.
Fix: set `HERMES_TUI_PROVIDER` unconditionally alongside
`HERMES_INFERENCE_PROVIDER` when `/model` lands. Keeps #15755's
invariant intact — `HERMES_TUI_PROVIDER` remains the canonical
"explicit this process" carrier, `HERMES_INFERENCE_PROVIDER` remains
ambient and does not short-circuit startup resolution.
Bug report and diagnosis: @Bartok9 in #16857 / #16873.
Fixes#16857
Replace the Linux/macOS pgrep regex ("hermes.*dashboard") with a ps
scan + the same explicit patterns list already used on the Windows
branch and in hermes_cli.gateway._scan_gateway_pids:
hermes dashboard
hermes_cli.main dashboard
hermes_cli/main.py dashboard
The old greedy regex would match any cmdline containing both words —
e.g. a chat session whose argv mentions "dashboard" or an unrelated
grafana/dashboard-server process. Added regression tests for both.
Follow-up tightening on #16881.
The dashboard is a long-lived server process users start and forget.
When hermes update replaces files on disk, the running process holds
the old Python backend in memory while the JS bundle gets updated,
producing a silent frontend/backend mismatch (e.g. v0.11.0 changed
the session token header -- old backends reject every API call).
Scan for running dashboard processes after a successful update (both
git and ZIP paths) and print a warning with their PIDs and restart
instructions. Mirrors the existing pattern for gateway processes.
Fixes#16872
When delegation.provider is configured (e.g. minimax-cn), subagents
inherited the parent's acp_command unconditionally. This caused
run_agent.py to initialize CopilotACPClient, which bypassed the
override credentials entirely and used its own default model
(provider=copilot-acp model=qwen3.5-397b-a17b) instead of the
configured delegation.provider and delegation.model.
Fix: when override_provider is set but override_acp_command is not,
clear effective_acp_command and effective_acp_args so the child agent
uses direct API calls with the configured provider credentials.
The existing override_acp_command path is unchanged — explicit ACP
transport overrides still force provider=copilot-acp as before.
Fixes#16816
PR #16888 swaps the opencode-zen/go resolver so that api_mode is always
re-derived from the effective model before the persisted api_mode is
consulted. That's the point of the fix — a stale anthropic_messages
from a previous minimax default must not survive a /model switch to a
chat_completions target (or vice versa) and strip /v1 from base_url.
The prior test asserted the opposite precedence — that a persisted
api_mode won over model-derived mode — and was added in #4508 to lock
in escape-hatch behavior. Under the new precedence that escape hatch
no longer exists for opencode (only for providers that genuinely
support both modes at a single endpoint — and for opencode the model
name is the unambiguous signal). Rename + invert the assertion to
document the intentional behavior change.
Refs #16878.
opencode-zen and opencode-go each serve both anthropic_messages
(e.g. minimax-m2.7) and chat_completions (e.g. deepseek-v4-flash)
models behind a single base_url. The api_mode resolver in
hermes_cli/runtime_provider.py honoured the persisted
model_cfg.api_mode (set by the previous default model) before checking
the opencode model registry, so /model deepseek-v4-flash from a session
whose default was minimax-m2.7 inherited 'anthropic_messages', stripped
'/v1' from base_url (the Anthropic SDK adds its own /v1/messages), and
404'd.
Promote the opencode detection branch above the configured_mode check
in both api_mode resolution paths:
- _resolve_runtime_from_pool_entry (pool-backed providers)
- _resolve_api_key_runtime (api-key providers, fallback path)
Both branches now call opencode_model_api_mode(provider, effective_model)
unconditionally for opencode-zen/go before considering any persisted
api_mode, so the mode always reflects the model the user just switched
to.
Existing tests pass (12/12 in tests/hermes_cli/test_model_switch_opencode_anthropic.py).
Fixes#16878
Switch _PRIORITY_PROCESSING_MODELS and _ANTHROPIC_FAST_MODE_MODELS from
hardcoded frozensets to prefix-based matching. Any gpt-*, o1*, o3*, o4*
(OpenAI) and any claude-* (Anthropic) now exposes /fast.
Fixes the case where gpt-5.5 and other post-catalog models silently
skipped Priority Processing because they weren't in the frozenset.
Future OpenAI/Anthropic releases will work without a catalog bump.
Safety:
- Codex-series (*codex*) still excluded — they route through the Codex
Responses API which doesn't take service_tier.
- Anthropic adapter already gates speed=fast on native endpoints only
(_is_third_party_anthropic_endpoint), so claude-sonnet-4.6 on
OpenRouter/Bedrock/opencode-zen won't leak the unknown beta.
- service_tier=priority is silently dropped by non-OpenAI proxies, so
false positives are harmless.
Drop the duplicate _load_openclaw_config_early() added in the salvaged
commit — load_openclaw_config() (line 979) has the identical body and
is a plain instance method that only needs self.source_root, which is
already set before __init__ needs it.
OpenClaw users who started before the rebrand (when the project was
clawd/clawdbot) often have a custom workspace directory configured via
agents.defaults.workspace in openclaw.json (e.g. ~/clawd/ instead of
~/.openclaw/workspace/).
The migration tool only checked hardcoded relative paths (workspace/,
workspace-main/, workspace-assistant/) inside the source root, so files
like MEMORY.md, skills, and daily memory in custom workspaces were
silently skipped.
This change:
- Reads agents.defaults.workspace from openclaw.json at init time
- Uses it as a final fallback in source_candidate() when files aren't
found in the standard locations
- Standard workspace paths are still preferred (custom is fallback only)
- Custom workspace is only used when it's outside the source_root tree
(avoids double-matching when workspace/ is the default)
Adds two tests:
- Custom workspace files are discovered and migrated
- Standard workspace location is preferred over custom
Flips security.redact_secrets from true to false in DEFAULT_CONFIG, and
the HERMES_REDACT_SECRETS env-var fallback in agent/redact.py now
requires explicit opt-in ("1"/"true"/"yes"/"on") to enable.
New installs and users without a security.redact_secrets key get pass-
through tool output. Existing users whose config.yaml explicitly sets
redact_secrets: true keep redaction on — the config-yaml -> env-var
bridges in hermes_cli/main.py and gateway/run.py still honor their
setting.
Also updates the inline config comments, website docs, and the
hermes-agent skill so /hermes config set security.redact_secrets true
is now the documented way to turn it on.
MatrixAdapter._is_self_sender returns True defensively when _user_id is empty
(whoami not yet resolved) to prevent echo loops — see #15763. The reaction
approval test must therefore initialize a user_id so _on_reaction does not
drop the inbound test event before reaching the approval handler.
Self-contained docker-compose harness that exercises the new bootstrap
branch against a real Continuwuity homeserver. Three tests:
1. fresh bot → bootstrap fires, /keys/query returns master + ssk
with UNPADDED base64 keyids, current device is signed by the
new SSK
2. second startup with same crypto store → bootstrap is skipped
3. MATRIX_RECOVERY_KEY set → existing verify_with_recovery_key path
takes precedence, no new bootstrap
Run via:
docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml up -d
python tests/e2e/matrix_xsign_bootstrap/test_bootstrap.py
docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml down -v
The test mirrors the bootstrap snippet from matrix.py inline so it can
run without importing the full hermes gateway and its deps. Skipped
automatically when mautrix isn't installed or the homeserver is
unreachable.
All three pass against ghcr.io/continuwuity/continuwuity:latest
(Continuwuity 0.5.7). The unpadded-keyid assertion is the load-bearing
one — it's exactly the property the PR's bootstrap path provides that
the hand-rolled `base64.b64encode().decode()` scripts get wrong.
Without this, every Matrix bot started under hermes-agent shows the
"Encrypted by a device not verified by its owner" badge in Element
indefinitely, because the cross-signing chain (master → SSK → device)
was never published. Operators currently have to write their own
bootstrap script and remember to run it once per bot — and it's easy
to get wrong (the obvious base64.b64encode().decode() produces padded
keyids that matrix-rust-sdk silently rejects in /keys/query, so even
correctly-signed keys fail to load identity in Element).
mautrix already has the right primitive: generate_recovery_key() does
the full flow — generate seeds, upload privates to SSSS, publish
publics to the homeserver, sign the current device with the new SSK,
and return the human-readable recovery key. We invoke it once on
startup if the bot has no existing cross-signing identity, and log
the recovery key with a clear instruction to save it for future
restarts via MATRIX_RECOVERY_KEY (which the existing recovery-key
path already consumes).
Skipped when MATRIX_RECOVERY_KEY is set (existing path takes over)
or when the bot already has cross-signing keys on the homeserver
(get_own_cross_signing_public_keys returns non-None).
Bootstrap failure is non-fatal — logged with hint about UIA; the bot
continues without cross-signing and Element will show the warning
that prompted this PR. That matches the existing soft-fail pattern
for verify_with_recovery_key.
Tested against Continuwuity 0.5.7 (no UIA required). Synapse with
UIA enabled will need a follow-up PR to thread MATRIX_PASSWORD
through to /keys/device_signing/upload.
Five ``except Exception as exc:`` blocks in the Matrix adapter logged
only ``str(exc)`` without ``exc_info=True``:
- _reverify_keys_after_upload → post-upload key verification failure
- _upload_keys_if_needed → initial device-key query failure
- _upload_keys_if_needed → re-upload device keys failure
- _upload_keys_if_needed → initial device key upload failure
- connect → whoami / access-token validation failure
The E2EE key paths here are security-critical: a silent traceback-
less failure during device-key verification or upload makes it
hard for operators to tell whether their Matrix bot is failing
because of a stale token, a federation timeout, or an olm state
mismatch — all three fail with different tracebacks, which
``str(exc)`` alone flattens.
The contributing guide asks for ``exc_info=True`` on error logs.
Append it to each of the five call sites. Pure logging enrichment.
- Wrap _sync_loop sync() call with asyncio.wait_for(timeout=45s) to guard
against TCP-level hangs that the Matrix long-poll timeout cannot catch
- Add logger.debug at the top of _on_room_message so LOG_LEVEL=DEBUG
confirms whether callbacks fire at all (diagnoses #5819, #7914, #12614)
- Add logger.debug when MATRIX_REQUIRE_MENTION silently drops a message,
pointing users to the env var to disable the filter
Adapted for current mautrix-python adapter (PR was written against the
legacy matrix-nio adapter).
Closes#5819
* Port from Kilo-Org/kilocode#9448: roll up subagent costs into parent session total
Child subagents built by delegate_task() each track their own
session_estimated_cost_usd, but the parent agent's total never folded
those numbers in. On runs where the parent mostly delegates and the
children do the expensive work, the footer/UI was reporting a fraction
of the actual spend — sometimes $0.00 when the parent itself made no
billed calls.
Fix:
- Capture each child's session_estimated_cost_usd into _child_cost_usd
on the result entry (before child.close() drops the counter).
- After the existing subagent_stop hook loop, sum the children's costs
and add the total to parent.session_estimated_cost_usd.
- Promote session_cost_source from 'none' -> 'subagent' when the parent
had no direct spend but children did, so the UI doesn't label the
total as having unknown provenance. Real sources (openrouter,
anthropic, etc.) are preserved.
Nested orchestrator -> worker trees roll up naturally: each layer's own
delegate_task() folds its direct children in, and when the orchestrator
itself returns, its parent folds the orchestrator's now-inflated total
on top.
Internal fields (_child_cost_usd, _child_role) are stripped from the
results dict before it's serialised back to the model — same contract
as _child_role already followed.
Tests: TestSubagentCostRollup (5 cases) covers single-child, batch,
zero-cost-children, preserved-source, and legacy-fixture paths.
Source: https://github.com/Kilo-Org/kilocode/pull/9448
* fix(web): scope dashboard config Reset button to the current tab
Reported by @ykmfb001 via X: clicking 'Restore Defaults' (恢复默认值) on
the Auxiliary page wiped the entire config.yaml to defaults, not just
the auxiliary section. The button sits next to the category tabs and
users reasonably assumed 'reset this tab', not 'reset everything'.
Changes:
- handleReset now scopes to the fields in the current view:
active category's fields (form mode) or search-matched fields
(search mode). Only those keys are copied from defaults; the rest
of the config is left alone.
- Added a window.confirm() with the scope name before applying.
- Button is hidden in YAML mode (scoping doesn't apply there).
- Tooltip/aria-label now name the scope, e.g. 'Reset Auxiliary to
defaults'.
- i18n: new resetScopeTooltip / confirmResetScope / resetScopeToast
strings in en + zh; resetDefaults key preserved for compat.