`hermes dashboard` is a long-lived foreground server that users often
start and forget about, sometimes in a shell they've since closed. We
didn't have a way to stop it — users had to find the PID manually.
Adds two lifecycle flags that reuse the same detection + termination
path the post-`hermes update` cleanup (PR #17832) uses:
  hermes dashboard --status
      List running hermes dashboard processes with PID + cmdline.
      Exit 0, informational.

  hermes dashboard --stop
      Terminate all running dashboards (3s grace, then force-kill
      survivors). Exit 0 if none remain, 1 if any couldn't be stopped.
      Windows uses `taskkill /F` as before.
Both flags short-circuit before any fastapi/uvicorn import so they work
even on installations where the dashboard extras aren't installed —
useful when you're cleaning up after uninstalling.
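The detection half reduces to a pure parse over `ps -eo pid,args` output. A minimal sketch (function name and the exact matching rule are illustrative; the real code also handles Windows wmic output):

```python
import re

def find_dashboard_pids(ps_lines):
    """Pick out `hermes dashboard` processes from `ps -eo pid,args` lines.

    Sketch only: matching the exact `hermes dashboard` token pair means
    that e.g. `vim hermes dashboard.md` never matches -- the greedy
    `pgrep -f` substring trap.
    """
    pids = []
    for line in ps_lines:
        m = re.match(r"\s*(\d+)\s+(.*)", line)
        if not m:
            continue
        pid, args = int(m.group(1)), m.group(2)
        tokens = args.split()
        # accept `hermes dashboard ...` or `/path/to/hermes dashboard ...`
        if (len(tokens) >= 2
                and tokens[0].rsplit("/", 1)[-1] == "hermes"
                and tokens[1] == "dashboard"):
            pids.append(pid)
    return pids
```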
The kill helper gained an optional `reason=...` param so the output
reads "(requested via --stop)" instead of the post-update-specific
"running backend no longer matches the updated frontend" wording.
E2E: `hermes dashboard --status` with nothing running prints the
empty message; with a fake `hermes dashboard ...` cmdline spawned via
`exec -a`, `--status` lists it, `--stop` terminates it (exit -15),
and a follow-up `--status` returns empty.
Reshape of PR #17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).
Config shape:

    tts:
      provider: piper-en
      providers:
        piper-en:
          type: command
          command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
          output_format: wav
Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.
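A minimal sketch of shell-quote-aware placeholder rendering for the bare (unquoted) context only -- helper name is hypothetical, and the real implementation also distinguishes single- and double-quoted contexts:

```python
import shlex
import string

def render_command(template, values):
    """Substitute {placeholders} into a command template, shell-quoting
    each value so paths with spaces or metacharacters stay safe.
    `{{` / `}}` pass through as literal braces (string.Formatter handles
    the escaping). Bare-context sketch only.
    """
    out = []
    for literal, field, _spec, _conv in string.Formatter().parse(template):
        out.append(literal)
        if field is not None:
            out.append(shlex.quote(str(values[field])))
    return "".join(out)
```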
Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
under tts.providers.<name>.max_text_length.
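The Unix side of the timeout behavior can be sketched as: run the command in its own session (hence its own process group) so the whole tree dies together. Function name and shape are illustrative; Windows uses taskkill /T as noted above.

```python
import os
import signal
import subprocess

def run_with_timeout(cmd, timeout):
    """Run a shell command in a fresh session and kill the entire process
    group on timeout, so children spawned by the command can't outlive it.
    Unix-only sketch.
    """
    proc = subprocess.Popen(
        cmd, shell=True, start_new_session=True,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # new session => pgid == pid
        proc.wait()
        raise
    return proc.returncode, out, err
```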
Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.
Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.
E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.
Co-authored-by: Versun <me+github7604@versun.org>
`hermes update` previously just printed a warning when it detected a
running `hermes dashboard` process from the previous version, telling
the user to kill and restart it themselves. In practice dashboards get
started and forgotten, so the warning was routinely ignored and users
ended up with a silent frontend/backend mismatch (new JS bundle served
against the old in-memory Python backend, e.g. new auth headers the old
code doesn't recognise → every API call 401s).
The dashboard has no service manager, no PID file, and we don't record
the original launch args (--host, --port, --insecure, --tui, --no-open)
so we can't auto-restart it. But we CAN stop it, which is what the
user wants — the failure mode when the stale process is left alive is
worse than the dashboard just being down.
- POSIX: SIGTERM, poll for ~3s, SIGKILL any survivors.
- Windows: `taskkill /PID <pid> /F`.
- Print each PID's outcome plus a one-line restart hint.
- Detection logic is unchanged (same ps / wmic scan, same guards
against the `pgrep -f` greedy-match trap from #16872 and the
#17049 wmic UnicodeDecodeError fix).
Also split the old monolithic `_warn_stale_dashboard_processes` into
`_find_stale_dashboard_pids` (scan) + `_kill_stale_dashboard_processes`
(kill), keeping the old name as an alias so any external callers still
work.
E2E verified: spawned a fake `hermes dashboard` cmdline via
`exec -a 'hermes dashboard …' sleep 300`, ran
`_kill_stale_dashboard_processes()`, confirmed SIGTERM exit (-15)
and that a post-scan returns an empty PID list.
Three narrow fixes targeting the remaining red checks after #17828:
1. ui-tui/src/app/slash/commands/ops.ts (Docker Build):
/reload-mcp's local params type annotated session_id: string
while ctx.sid is string | null. Widen to string | null —
matches every other rpc call site and the test harness which passes
{ session_id: null }. Fixes TS2322 on line 86. The rpc signature
itself is Record<string, unknown>, so this is purely a local
typing fix, no behavioral change.
2. tests/plugins/test_achievements_plugin.py (13 cascading test failures):
_install_fake_session_db did a raw sys.modules['hermes_state'] =
fake_module without restoration, leaking the fake across xdist
worker boundaries. Downstream tests doing from hermes_state import
SessionDB got a module whose SessionDB was lambda: fake_db
— 6 test_hermes_state.py tests failed with AttributeError: 'function'
object has no attribute '_sanitize_fts5_query' / _contains_cjk,
and 7 test_860_dedup.py tests failed with TypeError: got unexpected
keyword argument 'db_path' (real code calls SessionDB(db_path=...)).
Fix: stash monkeypatch on the plugin_api module object in the
fixture, and have the helper do monkeypatch.setitem(sys.modules,
'hermes_state', fake_module) for auto-restoration at test teardown.
3. tests/hermes_cli/test_web_server.py (WS race):
TestPtyWebSocket::test_pub_broadcasts_to_events_subscribers hit the
30s test timeout on CI. websocket_connect returns after
ws.accept() — but /api/events registers the subscriber in
_event_channels on the NEXT await (inside _event_lock). A
publish immediately after connect could race ahead of registration
and be dropped, and the subsequent receive_text() blocked until
SIGALRM killed the test. Fix: poll _event_channels after the
subscriber connects, before publishing.
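The registration wait is the standard poll-until-predicate pattern (a sketch; the real test polls `_event_channels` directly):

```python
import time

def wait_for(predicate, timeout=5.0, interval=0.01):
    """Poll `predicate` until truthy or until `timeout` elapses.

    Deterministic fix for accept-vs-register races: wait for the
    observable side effect (subscriber present) instead of sleeping a
    fixed amount or publishing blind.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return bool(predicate())
```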
Validation:
scripts/run_tests.sh tests/plugins/test_achievements_plugin.py
tests/run_agent/test_860_dedup.py
tests/test_hermes_state.py
tests/hermes_cli/test_web_server.py 338 passed
cd ui-tui && npm run type-check clean
cd ui-tui && npm run build clean
Remaining red checks are pure infra (Nix ubuntu hits
TwirpErrorResponse ResourceExhausted on the GH Actions cache API; Nix
macos bounces between an npm-build openssl-legacy failure and cache
rate limits) and cannot be fixed in the codebase.
Extracted from PR #17211 (@versun) so it can land independently of the
local_command TTS provider redesign.
- Add should_send_media_as_audio(platform, ext, is_voice) in
gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
_TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
end-to-end MEDIA routing via _process_message_background and
GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
fallback for FLAC/WAV.
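A hedged sketch of the routing helper; the extension sets and the platform switch here are illustrative, not the shipped values:

```python
AUDIO_EXTS = {".mp3", ".m4a", ".ogg", ".opus", ".wav", ".flac"}
TELEGRAM_SEND_AUDIO_EXTS = {".mp3", ".m4a"}   # Bot API plays these natively
TELEGRAM_VOICE_EXTS = {".ogg", ".opus"}       # voice-bubble formats

def should_send_media_as_audio(platform, ext, is_voice=False):
    """True if the file should go through the platform's audio path;
    False means fall back to a plain document/attachment upload."""
    ext = ext.lower()
    if ext not in AUDIO_EXTS:
        return False
    if platform == "telegram":
        if is_voice:
            return ext in TELEGRAM_VOICE_EXTS
        return ext in TELEGRAM_SEND_AUDIO_EXTS  # .wav/.flac -> document
    return True  # other platforms take any recognized audio extension
```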
Co-authored-by: Versun <me+github7604@versun.org>
Fixes the xdist collision that broke CI on PR #17764, and structurally
prevents future plugin-adapter tests from reintroducing it.
Problem
-------
tests/gateway/test_teams.py (new in this PR) and tests/gateway/test_irc_adapter.py
(already on main) both followed the same anti-pattern:
sys.path.insert(0, str(_REPO_ROOT / 'plugins' / 'platforms' / '<name>'))
from adapter import <Adapter>
Every platform plugin ships its own adapter.py, so the bare
'from adapter import ...' races for sys.modules['adapter']. Whichever test
collected first in a given xdist worker won; the other crashed at
collection with ImportError, and the polluted sys.path cascaded into 19
unrelated test failures across tools/, hermes_cli/, and run_agent/ in the
same worker.
Fix
---
1. tests/gateway/_plugin_adapter_loader.py (new): shared helper
load_plugin_adapter('<name>') that imports plugins/platforms/<name>/adapter.py
via importlib.util under the unique module name plugin_adapter_<name>.
Zero sys.path mutation, no possibility of collision.
2. tests/gateway/test_irc_adapter.py and tests/gateway/test_teams.py:
migrated to the helper. All 'from adapter import ...' statements
(including the ones inside test methods) are replaced with module-level
attribute access on the loaded module.
3. tests/gateway/conftest.py: new pytest_configure guard that AST-scans
every test_*.py under tests/gateway/ at session start and fails the
run with a pointer to the helper if any test uses sys.path.insert into
plugins/platforms/ OR a bare 'import adapter' / 'from adapter import'.
Runs on the xdist controller only (skipped in workers). The next plugin
adapter test that tries to reintroduce this pattern gets rejected at
collection time with a clear remediation message.
4. scripts/release.py: add aamirjawaid@microsoft.com -> heyitsaamir to
AUTHOR_MAP so the check-attribution workflow passes.
Validation
----------
scripts/run_tests.sh tests/gateway/ 4194 passed
scripts/run_tests.sh tests/gateway/test_{teams,irc}* 72 passed (both orderings)
scripts/run_tests.sh <11 prev-failing test files> 398 passed
Guard triggers correctly on both Path-operator and string-literal forms
of the anti-pattern.
Hello! I am the maintainer of the microsoft-teams-apps Python SDK and
I built this Teams adapter to integrate Microsoft Teams into Hermes.
Adds a `plugins/platforms/teams` platform plugin using the new
PlatformRegistry system from #17751. The adapter self-registers via
`register(ctx)` — no hardcoding in run.py, toolsets.py, or any
other core file.
Key features:
- Supports personal DMs, group chats, and channel posts
- Adaptive Card approval prompts with in-place button replacement
(Allow Once / Allow Session / Always Allow / Deny)
- aiohttp webhook server bridged from the Teams SDK to avoid
the fastapi/uvicorn dependency
- ConversationReference caching for correct proactive sends in
non-DM chats
- `interactive_setup()` for `hermes gateway setup` integration
- `platform_hint` for LLM context (Teams markdown subset)
- 34 tests covering adapter init, send, message handling, and
plugin registration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR #17660 landed a sweep of CI fixes but left three loose ends:
1. tests/cli/test_cli_loading_indicator.py::test_reload_mcp_sets_busy_state_
and_prints_status — /reload-mcp gained a prompt-cache-invalidation
confirmation (commit 4d7fc0f37) that was never wired into this test.
The test exercises the loading-indicator path, so pre-approve via
config and go straight into _reload_mcp().
2. tools/mcp_tool.py _make_tool_handler — the added
getattr(server, '_rpc_lock', None) + 'skip the lock if missing'
branch is inconsistent with four sibling call sites that still
direct-access server._rpc_lock. The lock is guaranteed by
MCPServerTask.__init__; falling through to an unlocked
session.call_tool would silently drop RPC serialization if the guard
ever triggered. Restore direct access.
3. tui_gateway/server.py _messages_as_conversation — the helper
existed only to catch 'TypeError: include_ancestors unexpected'
from mocked SessionDBs missing the kwarg — mocks that don't actually
exist anywhere in the repo. The real
SessionDB.get_messages_as_conversation has accepted
include_ancestors since introduction, and every test FakeDB in
the repo already declares the kwarg. Remove the shim, inline the
two call sites.
* feat(plugins): bundle hermes-achievements, scan full session history
Ships @PCinkusz's hermes-achievements dashboard plugin (https://github.com/PCinkusz/hermes-achievements) as a bundled plugin at plugins/hermes-achievements/ and fixes a bug in the scan path that made the plugin only see the first 200 sessions — making lifetime badges (50k tool calls, 75k errors, etc.) unreachable on long-running installs.
Changes:
- plugins/hermes-achievements/: vendor v0.3.1 verbatim (manifest, dist/, plugin_api.py, tests, docs, README).
- plugins/hermes-achievements/dashboard/plugin_api.py:
* scan_sessions(): limit=None now scans ALL sessions via SQLite LIMIT -1. Previously capped at 200, so users with 8000+ sessions saw ~2% of their history.
* evaluate_all(): first-ever scans run in a background thread so the dashboard request path never blocks. Stale snapshots serve immediately while a background refresh runs. force=True still blocks synchronously for manual /rescan.
* _build_pending_snapshot(), _start_background_scan(), _run_scan_and_update_cache(): supporting plumbing + idempotent thread spawn.
- tests/plugins/test_achievements_plugin.py: new tests covering the 200-cap regression, the background-scan first-run flow, stale-serve-plus-background-refresh, forced sync rescan, and scan-thread idempotency.
- website/docs/user-guide/features/built-in-plugins.md: lists hermes-achievements in the bundled-plugins table and documents API endpoints, state files, and performance characteristics.
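The LIMIT fix leans on SQLite treating a negative LIMIT as "no limit", so one query serves both the capped and the scan-everything case. A sketch with an illustrative schema (not the plugin's actual tables):

```python
import sqlite3

def scan_sessions(conn, limit=None):
    """limit=None scans ALL sessions: SQLite's LIMIT -1 means unlimited,
    so both cases share a single parameterized query."""
    sql_limit = -1 if limit is None else limit
    cur = conn.execute(
        "SELECT id FROM sessions ORDER BY created_at LIMIT ?", (sql_limit,)
    )
    return [row[0] for row in cur]
```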
E2E validated against a real 8564-session ~6.4GB state.db:
* Cold scan: 13m 19s (one-time, backgrounded — UI never blocks)
* Warm rescan: 1.47s (8563/8564 sessions reused from checkpoint cache)
* 57/60 achievements unlocked, 3 discovered — aggregates like total_tool_calls=259958, total_errors=164213, skill_events=368243 correctly surface lifetime badges that the 200-cap made unreachable.
Original credit: @PCinkusz (MIT-licensed). Upstream repo remains the staging ground for new badges; this bundle keeps the dashboard feature parity with Hermes core changes.
* feat(achievements): publish partial snapshots during cold scan
Previously a cold scan on a large session DB (13min on 8564 sessions)
showed zero badges for the entire duration, then every badge at once
when the scan completed. A dashboard refresh mid-scan was indistinguishable
from a fresh install with no history.
Now the scanner publishes a partial snapshot to _SNAPSHOT_CACHE every
250 sessions, so each refresh during a cold scan surfaces more badges
incrementally.
Mechanism:
- scan_sessions() takes an optional progress_callback fired every
progress_every sessions with (sessions_so_far, scanned, total).
- _compute_from_scan() is extracted from compute_all() and gains an
is_partial flag that skips writing to state.json — we don't want
to record unlocked_at based on a half-complete aggregate that a
later session might rebalance.
- _run_scan_and_update_cache() installs a publisher callback that
builds a partial snapshot, marks it mode='in_progress', and writes
it to the cache with age=0 so the UI keeps polling /scan-status
and picks up the final snapshot when the scan completes.
- Manual /rescan (force=True) disables partial publishing — the
caller is blocking on the final result anyway.
E2E against real 8564-session state.db (polled cache every 10s):
t=10s: cache empty
t=20s: 250/8564 scanned, 35 unlocked, 25 discovered
t=40s: 500/8564 scanned, 42 unlocked, 18 discovered
t=60s: 1000/8564 scanned, 49 unlocked, 11 discovered
...
Tests: 9/9 pass (2 new — partial snapshot publication + no-persist-on-partial).
Upstream unittest suite: 10/10 pass.
* feat(achievements): in-progress scan banner with live % progress
Previously the dashboard showed zero badges silently during long cold
scans (13min on 8564 sessions). The backend was publishing partial
snapshots every 250 sessions, but the bundled UI didn't surface any
indicator that a scan was running — it just rendered the main page
with whatever counts were currently published and no way for the user
to know more progress was coming.
UI changes (dist/index.js, dist/style.css):
- Added a scan-in-progress banner rendered between the hero and stats
when scan_meta.mode is 'pending' or 'in_progress'. Shows:
BUILDING ACHIEVEMENT PROFILE…
Scanned 1,750 of 8,564 sessions · 20%. Badges unlock as more history streams in.
with a pulsing teal indicator and a filling teal/cyan progress bar.
Disappears the moment the backend flips to 'full' or 'incremental'.
- Added an auto-poller via useEffect — while scanInFlight is true the
page re-fetches /achievements every 4s WITHOUT toggling the loading
skeleton, so unlock counts tick up visibly without the user refreshing.
The effect cleans itself up when the scan finishes.
- Added refresh() (re-fetch, no loading flip) alongside the existing
load() (full reload, used by the Rescan button).
Attribution preserved:
- Added a header comment to index.js crediting @PCinkusz
(https://github.com/PCinkusz/hermes-achievements, MIT) as the
original author, noting the banner is a layered addition on top
of the original dist bundle.
- Matching header comment in style.css, flagging the new
.ha-scan-banner* rules as the local addition.
Live-verified end to end:
- Spun up `hermes dashboard --port 9229 --no-open` against a fresh
HERMES_HOME symlinked to the real 8564-session state.db.
- Opened /achievements in a browser, confirmed the banner renders with
live progress: 'Scanned 1,000 of 8,564 sessions · 11%' → updates to
'1,250 ... · 14%' → '1,750 ... · 20%' without user interaction,
matching the backend's partial publications.
- Stats row simultaneously climbed from 35 → 49 → 53 unlocked as
more history streamed in.
- Vision analysis of the rendered page confirms the banner styling
matches the rest of the dashboard (dark card bg, teal accent, same
small-caps typography, pulsing indicator reusing ha-pulse keyframes).
The _CODEX_AUX_MODEL constant had already rotated twice in 6 weeks
(gpt-5.3-codex -> gpt-5.2-codex -> now broken again at gpt-5.2-codex)
because ChatGPT-account Codex gates which models it accepts via an
undocumented, shifting allow-list that OpenAI publishes no changelog
for. Any pinned default will keep going stale. Issue #17533 reports
the current breakage: every ChatGPT-account auxiliary fallback fails
with HTTP 400 "model is not supported" and the 60s pause loop degrades
long sessions.
Rather than reset the clock with another stale pin (PR #17544 proposes
gpt-5.2-codex -> gpt-5.4), remove the hardcoded second-order Codex
fallback entirely:
- Delete `_CODEX_AUX_MODEL`.
- Drop `_try_codex` from `_get_provider_chain()` (the auto chain now
ends at api-key providers; 4 rungs instead of 5).
- Rename `_try_codex() -> _build_codex_client(model)` and require an
explicit model from the caller. No more guessing.
- `resolve_provider_client("openai-codex", model=None)` now warns and
returns (None, None) instead of silently guessing a stale model ID.
- Remove `_try_codex` from the `provider="custom"` fallback ladder
(same stale-constant trap).
- `_resolve_strict_vision_backend("openai-codex")` routes through
`resolve_provider_client` so the caller's explicit model is honored.
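The new contract, sketched with client construction stubbed out (the warning text and the stub are illustrative):

```python
import logging

log = logging.getLogger("hermes.providers")

def _build_codex_client(model):
    # stand-in for the renamed helper: builds a client for an explicit model
    return ("codex-client", model)

def resolve_provider_client(provider, model=None):
    """No hardcoded aux-model guess: openai-codex requires an explicit
    model from the caller, else it warns and opts out."""
    if provider == "openai-codex":
        if model is None:
            log.warning("openai-codex needs an explicit model "
                        "(set auxiliary.<task>.model); refusing to guess.")
            return (None, None)
        return _build_codex_client(model)
    raise ValueError(f"provider not covered by this sketch: {provider}")
```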
Codex-main users are unaffected: Step 1 of `_resolve_auto` already
uses `main_provider` + `main_model` directly and passes the user's
configured Codex model through `resolve_provider_client`, which never
touched `_CODEX_AUX_MODEL`. Per-task overrides (`auxiliary.<task>.provider/model`)
continue to work and are the supported way to route specific aux tasks
through Codex.
Users whose main provider fails with a payment/connection error and
who have ONLY ChatGPT-account Codex auth will now see the 60s pause
without a stale-model-rejection noise line in between -- same outcome,
cleaner failure.
Closes #17533. Supersedes #17544 (which resets the clock on the
same stale-constant problem).
Keep context-1m-2025-08-07 in OAuth requests by default so 1M-capable
subscriptions retain full context. When Anthropic rejects a request with
400 'long context beta is not yet available for this subscription',
disable the beta for the rest of the session, rebuild the client, and
retry once.
Addresses #17680 (thanks @JayGwod for the clean reproduction) without
forcing every OAuth user off the 1M context window.
Changes:
- agent/error_classifier.py: new FailoverReason.oauth_long_context_beta_forbidden;
pattern matches 400 + 'long context beta' + 'not yet available'. Narrow
enough that the existing 429 tier-gate pattern keeps its own reason.
- agent/anthropic_adapter.py: _common_betas_for_base_url,
build_anthropic_client, build_anthropic_kwargs gain drop_context_1m_beta
kwarg. Default=False (1M stays). OAuth OAUTH_ONLY_BETAS unchanged.
- agent/transports/anthropic.py: build_kwargs forwards the flag.
- run_agent.py: self._oauth_1m_beta_disabled flag, retry-once guard,
recovery branch next to the image-shrink path. _rebuild_anthropic_client
honors the flag. The main build_kwargs call site threads it through for
fast-mode extra_headers.
- hermes_cli/doctor.py, hermes_cli/models.py: sibling OAuth /v1/models
probes get the same reactive retry — previously they'd falsely report
the Anthropic API as unreachable for affected subscriptions.
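The classifier pattern can be sketched as a narrow match on status plus message text, so the existing 429 tier-gate keeps its own reason (regex and function name are illustrative):

```python
import re

_LONG_CONTEXT_FORBIDDEN = re.compile(
    r"long context beta.*not yet available", re.IGNORECASE | re.DOTALL
)

def is_oauth_long_context_forbidden(status_code, body):
    """True only for the 400 'long context beta ... not yet available'
    shape; 429 tier-gate errors never match because of the status check."""
    return status_code == 400 and bool(_LONG_CONTEXT_FORBIDDEN.search(body))
```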
Tests: 2190 tests/agent/ + 94 adjacent integration tests pass. New unit
tests cover the classifier pattern (including the collision guard against
the 429 tier-gate) and the drop_context_1m_beta adapter behavior (default
keeps 1M, flag strips only 1M while preserving every other beta).
Platform plugins shipped in-repo under plugins/platforms/ should be
available out of the box — users shouldn't have to add 'irc-platform'
to plugins.enabled before they can pick IRC from the gateway setup menu.
Adds a new ``kind: platform`` plugin type that mirrors the existing
``kind: backend`` auto-load semantics:
- Bundled (shipped in the hermes-agent repo): auto-load unconditionally.
- User-installed (~/.hermes/plugins/): still opt-in via plugins.enabled
so untrusted code doesn't silently run.
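The rule above can be sketched as a tiny predicate (the kind set and field shapes are illustrative, not the shipped manifest schema):

```python
_VALID_PLUGIN_KINDS = {"tool", "backend", "platform"}  # illustrative set
_AUTO_LOAD_KINDS = {"backend", "platform"}

def should_auto_load(kind, bundled, enabled_list, name):
    """Bundled backend/platform plugins load unconditionally; everything
    user-installed stays opt-in via plugins.enabled."""
    if kind not in _VALID_PLUGIN_KINDS:
        raise ValueError(f"unknown plugin kind: {kind}")
    if bundled and kind in _AUTO_LOAD_KINDS:
        return True                    # shipped in-repo: trusted
    return name in enabled_list        # user-installed: opt-in only
```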
Changes:
* hermes_cli/plugins.py: add 'platform' to _VALID_PLUGIN_KINDS, document
the new kind in the PluginManifest docstring, extend the bundled auto-
load rule from 'backend only' to 'backend or platform'.
* plugins/platforms/irc/plugin.yaml: declare kind: platform.
* hermes_cli/gateway.py: remove the now-redundant
_load_bundled_platform_plugins_for_enumeration() helper and the
_enable_plugin_for_platform() helper. The setup menu's _all_platforms()
just calls discover_plugins() and reads the registry — bundled
platforms are already loaded at that point. Drops the 'needs_enable'
flag and the 'plugin disabled — select to enable' status string.
* hermes_cli/setup.py: relax the "gateway is configured" detector used
during OpenClaw migration. Switching to _platform_status() in an
earlier commit tightened the check to require an exact "configured"
match, dropping platforms whose status is "enabled, not paired",
"partially configured", "configured + E2EE", etc. Now any non-"not
configured" status counts — the user has already started setup there
and we shouldn't force the section to rerun.
* tests/hermes_cli/test_setup_irc.py: drop the TestIRCPluginDisabledFlow
class and test_configure_platform_enables_disabled_plugin_first — the
no-longer-existent flow they were testing.
* tests/hermes_cli/test_setup_openclaw_migration.py: patch both
setup.get_env_value and gateway.get_env_value in the 4 gateway-section
tests that reach _platform_status() through the unified setup flow;
switch WHATSAPP_ENABLED to the literal "true" in the registry-parity
test so WhatsApp's value-shape validator matches.
Verified via fresh-install smoke (empty plugins.enabled, no env vars):
IRC plugin loads, Platform('irc') resolves, _all_platforms() lists IRC
with status 'not configured'. 160 targeted tests pass.
feat(gateway): refine Platform._missing_ and platform-connected dispatch
Restricts plugin-name acceptance to the bundled plugin scan + registry
(so arbitrary strings can't pollute the enum), pulls per-platform connectivity
checks into a _PLATFORM_CONNECTED_CHECKERS lambda map with a clean
_is_platform_connected method, and adds tests covering the checker map,
plugin platform interface, and IRC setup wizard.
Merge the two gateway setup paths (hermes setup gateway + hermes gateway
setup) to use a single _unified_platforms() list that merges built-in
_PLATFORMS with dynamically registered plugin entries from
platform_registry.
- Add setup_fn field to PlatformEntry for plugin setup flows
- _unified_platforms() merges built-ins with registry entries by key
- setup_gateway() now uses unified list instead of hardcoded
_GATEWAY_PLATFORMS tuple list
- gateway_setup() uses same unified list, plugin entries appear
alongside built-ins with no [plugin] suffix
- _platform_status() handles plugin platforms via registry check_fn
- Plugin platforms with setup_fn get called directly; plugins without
get a generic env-var display fallback
IRC and other plugin platforms now appear automatically in the setup
menu when registered via platform_registry.register().
feat(gateway): surface disabled platform plugins in setup and auto-enable on select
Platform plugins under plugins/platforms/* (IRC, etc.) were gated behind
plugins.enabled, so `hermes gateway setup` wouldn't list them until the
user ran `hermes plugins enable <name>` first. Now the setup menu always
surfaces them as "plugin disabled — select to enable", and picking one
adds it to plugins.enabled before running its setup flow.
Along the way, unify the two gateway setup flows so `hermes setup gateway`
and `hermes gateway setup` both read from the same platform list (built-in
_PLATFORMS + platform_registry entries), dispatch through a single
_configure_platform() helper, and share _platform_status(). Deletes the
dead bespoke wrappers in setup.py (_setup_whatsapp, _setup_weixin,
_setup_email, etc.) that duplicated logic now covered by the registry
path or _setup_standard_platform.
Also:
- PlatformEntry gains a plugin_name field so the registry knows which
plugin owns each entry (required for auto-enable).
- PluginContext.register_platform auto-stamps plugin_name from the
manifest so plugins don't have to pass it explicitly.
- PluginManager now scans plugins/platforms/* as its own category root,
one level below the bundled plugin scan.
- Fix IRC plugin discovery: rename PLUGIN.yaml → plugin.yaml (the
scanner is case-sensitive) and add the missing __init__.py that
_load_directory_module requires.
Extends the platform plugin interface from Phase 1 to cover every
touchpoint where built-in platforms have hardcoded behavior.
- allowed_users_env / allow_all_env: per-platform auth env vars
- max_message_length: smart-chunking for send_message tool
- pii_safe: session PII redaction flag
- emoji: CLI/gateway display
- allow_update_command: /update access control
send_message tool (tools/send_message_tool.py):
- Replaced hardcoded platform_map dict with Platform() call
- Added _send_via_adapter() for plugin platforms — routes through
live gateway adapter when available
- Registry-aware max message length for smart chunking
Cron delivery (cron/scheduler.py):
- Replaced hardcoded 15-entry platform_map with Platform() call
- Plugin platforms now work as cron delivery targets
User authorization (gateway/run.py _is_user_authorized):
- Registry fallback: checks PlatformEntry.allowed_users_env and
allow_all_env when platform not in hardcoded maps
- Plugin platforms get per-platform auth support
_UPDATE_ALLOWED_PLATFORMS: checks registry allow_update_command flag
Channel directory: includes plugin platforms in session enumeration
Orphaned config warning: descriptive message when plugin platform is
in config but no plugin registered it
Gateway weakref: _gateway_runner_ref for cross-module adapter access
hermes status: shows plugin platforms with (plugin) tag
hermes gateway setup: plugin platforms appear in menu with setup hints
hermes_cli/platforms.py: get_all_platforms() merges with registry,
platform_label() falls back to registry for plugin names
- 8 new tests (extended fields, cron resolution, platforms merge)
- Updated 3 tests for the new Platform()-based resolution
- 2829 passed, 24 pre-existing failures, zero new failures
Adds a platform adapter plugin interface so anyone can create new gateway
platforms (IRC, Viber, Line, etc.) as drop-in plugins without modifying
core gateway code.
- PlatformEntry dataclass: name, label, adapter_factory, check_fn,
validate_config, required_env, install_hint, source
- PlatformRegistry singleton with register/unregister/create_adapter
- _create_adapter() in gateway/run.py checks registry first, falls
through to existing if/elif chain for built-in platforms
- Platform._missing_() accepts unknown string values, creating cached
pseudo-members so Platform('irc') is Platform('irc') holds true
- GatewayConfig.from_dict() now parses plugin platform names from
config.yaml without rejecting them
- get_connected_platforms() delegates to registry for unknown platforms
- PluginContext.register_platform() for plugin authors
- Mirrors the existing register_tool() / register_hook() pattern
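The cached pseudo-member trick for `_missing_` looks roughly like this (a sketch; the shipped version additionally restricts which names are accepted):

```python
import enum

_PLUGIN_MEMBERS = {}  # one pseudo-member per unknown name => identity holds

class Platform(str, enum.Enum):
    TELEGRAM = "telegram"
    DISCORD = "discord"

    @classmethod
    def _missing_(cls, value):
        if not isinstance(value, str):
            return None
        key = value.lower()                  # case normalization
        if key not in _PLUGIN_MEMBERS:
            # bypass Enum.__call__: build a bare str-subclass instance and
            # stamp the enum bookkeeping fields onto it
            pseudo = str.__new__(cls, key)
            pseudo._name_ = key.upper()
            pseudo._value_ = key
            _PLUGIN_MEMBERS[key] = pseudo
        return _PLUGIN_MEMBERS[key]
```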
- Full async IRC adapter using stdlib asyncio (zero external deps)
- Connects via TLS, handles PING/PONG, nick collision, NickServ auth
- Channel messages require addressing (nick: msg), DMs always dispatch
- Markdown stripping for IRC-clean output, message splitting for
512-byte line limit
- Config via config.yaml extra dict or IRC_* env vars
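The 512-byte split honors the full line overhead (`:<nick> PRIVMSG <target> :` plus CRLF) and breaks on a UTF-8-safe byte budget. A sketch (names illustrative; the real adapter strips markdown first, and oversized single words are left intact here):

```python
def split_irc_message(text, target, nick="hermes", max_line=512):
    """Split `text` into chunks that fit one IRC PRIVMSG line each,
    counting bytes (not characters) against the 512-byte line limit."""
    overhead = len(f":{nick} PRIVMSG {target} :\r\n".encode("utf-8"))
    budget = max_line - overhead
    chunks, current = [], ""
    for word in text.split(" "):
        candidate = f"{current} {word}".strip()
        if len(candidate.encode("utf-8")) <= budget:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```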
Tests cover:
- Platform enum dynamic members (identity stability, case normalization)
- PlatformRegistry (register, unregister, create, validation, factory)
- GatewayConfig integration (from_dict parsing, get_connected_platforms)
- IRC adapter (init, send, protocol parsing, markdown, requirements)
No existing platform adapters were migrated — the if/elif chain is
untouched. This is Phase 1: prove the interface with a real plugin.
Reloading MCP servers rebuilds the tool set for the active session, which
invalidates the provider prompt cache (tool schemas are baked into the
system prompt). The next message re-sends full input tokens — can be
expensive on long-context or high-reasoning models.
To surface that cost, /reload-mcp now routes through a new slash-confirm
primitive with three options: Approve Once / Always Approve / Cancel.
'Always Approve' persists approvals.mcp_reload_confirm: false so future
reloads run silently.
Coverage:
* Classic CLI (cli.py) — interactive numbered prompt.
* TUI (tui_gateway + Ink ops.ts) — text warning on first call; `now` /
`always` args skip the gate; `always` also persists the opt-out.
* Messenger gateway — button UI on Telegram (inline keyboard), Discord
(discord.ui.View), Slack (Block Kit actions); text fallback on every
other platform via /approve /always /cancel replies intercepted in
gateway/run.py _handle_message.
* Config key: approvals.mcp_reload_confirm (default true).
* Auto-reload paths (CLI file watcher, TUI config-sync mtime poll) pass
confirm=true so they do NOT prompt.
Implementation:
* tools/slash_confirm.py — module-level pending-state store used by all
adapters and by the CLI prompt. Thread-safe register/resolve/clear.
* gateway/platforms/base.py — send_slash_confirm hook (default 'Not
supported' → text fallback).
* gateway/run.py — _request_slash_confirm helper + text intercept in
_handle_message (yields to in-progress tool-exec approvals so
dangerous-command /approve still unblocks the tool thread first).
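The pending-state store can be sketched roughly like this — register/resolve/clear and the double-click guarantee follow the description above, while the token/Event mechanics are illustrative assumptions:

```python
import threading
import uuid

# Sketch of a thread-safe module-level pending-confirmation store.
# Only the first resolve() for a token wins (double-click atomicity).
_lock = threading.Lock()
_pending = {}  # token -> {"event": Event, "choice": None}

def register():
    token = uuid.uuid4().hex
    with _lock:
        _pending[token] = {"event": threading.Event(), "choice": None}
    return token

def resolve(token, choice):
    with _lock:
        entry = _pending.get(token)
        if entry is None or entry["choice"] is not None:
            return False  # unknown token, or a second click: ignored
        entry["choice"] = choice
        entry["event"].set()
        return True

def wait(token, timeout=None):
    with _lock:
        entry = _pending[token]
    entry["event"].wait(timeout)  # block the prompting thread, not the lock
    with _lock:
        _pending.pop(token, None)  # clear
    return entry["choice"]
```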
Tests:
* tests/tools/test_slash_confirm.py — primitive lifecycle + async
resolution + double-click atomicity (16 tests).
* tests/hermes_cli/test_mcp_reload_confirm_gate.py — default-config
shape + deep-merge preserves user opt-out (5 tests).
Targeted runs (hermetic): 89 passed (slash-confirm, config gate,
existing agent cache, existing telegram approval buttons).
PR #15027 (5 days ago) shipped TELEGRAM_GROUP_ALLOWED_USERS as a chat-ID
allowlist. #17686 correctly renames that to sender user IDs and moves
chat IDs to TELEGRAM_GROUP_ALLOWED_CHATS. Without a shim, any user on
PR #15027's guidance would silently start rejecting group traffic on
upgrade.
- gateway/run.py: in _is_user_authorized, if TELEGRAM_GROUP_ALLOWED_USERS
contains values starting with '-' (chat-ID-shaped), honor them as chat
IDs and log a one-shot deprecation warning pointing users at the new
TELEGRAM_GROUP_ALLOWED_CHATS var.
- tests/gateway/test_unauthorized_dm_behavior.py: three new tests cover
legacy chat-ID values authorizing the listed chat, not crossing to
other chats, and mixed sender/chat values in the same var.
- website/docs/user-guide/messaging/telegram.md: rewrite the Group
Allowlisting section to document the new user/chat split + migration
note. Remove stale '/thread_id' suffix claim (code never parsed it).
- website/docs/reference/environment-variables.md: document all three
Telegram allowlist env vars.
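The shim's detection rule can be sketched as a small helper (the function name is hypothetical; the '-' prefix test and one-shot warning follow the description above — Telegram group chat IDs are negative, sender user IDs are not):

```python
import logging

logger = logging.getLogger(__name__)
_warned = False

def split_legacy_allowlist(raw):
    """Split a legacy TELEGRAM_GROUP_ALLOWED_USERS value into sender user
    IDs and chat-ID-shaped entries, warning once about the latter."""
    global _warned
    users, chats = set(), set()
    for value in (v.strip() for v in raw.split(",") if v.strip()):
        if value.startswith("-"):  # chat-ID-shaped: honor as a chat ID
            chats.add(value)
            if not _warned:
                logger.warning(
                    "Chat IDs in TELEGRAM_GROUP_ALLOWED_USERS are deprecated; "
                    "move them to TELEGRAM_GROUP_ALLOWED_CHATS"
                )
                _warned = True
        else:
            users.add(value)
    return users, chats
```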
Salvage-follow-up to @shannonsands's /reload-skills PR. Trims the feature to
match the design: user-initiated rescan, no prompt-cache reset, no new
schema surface, no phantom user turn, and the next-turn note carries each
added/removed skill's 60-char description (not just its name).
Changes vs the original PR:
* Drop the in-process skills prompt-cache clear in reload_skills(). Skills
are invoked at runtime via /skill-name, skills_list, or skill_view —
they don't need to live in the system prompt for the model to use them.
Keeping the cache intact preserves prefix caching across the reload so
/reload-skills pays no cache-reset cost. (MCP has to break the cache
because tool schemas must be known at conversation start; skills do not.)
* Drop the skills_reload agent tool and SKILLS_RELOAD_SCHEMA from
tools/skills_tool.py, plus the four skills_reload enumerations in
toolsets.py. No new schema surface — agents can already see a freshly-
installed skill via skill_view / skills_list the moment it's on disk.
* Replace the phantom 'role: user' turn injection with a one-shot queued
note. CLI uses self._pending_skills_reload_note (same pattern as
_pending_model_switch_note, prepended to the next API call and cleared).
Gateway uses self._pending_skills_reload_notes[session_key]. The note
is prepended to the NEXT real user message in this session, so message
alternation stays intact and nothing out-of-band is persisted to the
transcript.
* reload_skills() now returns added/removed as
[{'name': str, 'description': str}, ...] (description truncated to 60
chars — matches the curator / gateway adapter budget). The injected
next-turn note formats each entry as 'name — description' so the model
can actually reason about which new skills to call without running
skills_list first.
* Only emit the note when the diff is non-empty. On empty diff, print
'No new skills detected' and do nothing else.
* Tests rewritten to cover the queue semantics, the description payload,
and a regression guard that the prompt-cache snapshot is preserved.
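The one-shot queued-note pattern (the gateway's per-session-key variant) can be sketched as below; the attribute name follows the description above, the surrounding class is assumed:

```python
# Sketch of the one-shot note queue: the note is prepended to the NEXT
# real user message for that session and then cleared, so role
# alternation stays intact and nothing out-of-band hits the transcript.
class SessionNotes:
    def __init__(self):
        self._pending_skills_reload_notes = {}  # session_key -> note

    def queue_note(self, session_key, note):
        self._pending_skills_reload_notes[session_key] = note

    def prepend_to_next_message(self, session_key, user_message):
        note = self._pending_skills_reload_notes.pop(session_key, None)
        if note is None:
            return user_message
        return f"{note}\n\n{user_message}"
```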
Adds a public reload path for the in-process skill caches so newly
installed (or removed) skills become visible mid-session without a
gateway restart. Mirrors the shape of /reload-mcp.
Three surfaces:
* /reload-skills slash command — CLI (cli.py) and gateway (gateway/run.py),
with /reload_skills alias for Telegram autocomplete and an explicit
Discord registration.
* skills_reload agent tool (tools/skills_tool.py) — lets agents/subagents
pick up freshly-installed skills via tool call.
* agent.skill_commands.reload_skills() — shared helper that clears
_skill_commands, _SKILLS_PROMPT_CACHE (in-process LRU), and the
on-disk .skills_prompt_snapshot.json, then returns an added/removed
diff plus the new total count.
Tested:
* tests/agent/test_skill_commands_reload.py (9 cases)
* tests/cli/test_cli_reload_skills.py (3 cases)
* tests/gateway/test_reload_skills_command.py (4 cases)
Use case: NemoClaw / OpenShell-style sandboxed orchestrators that drop
skills into ~/.hermes/skills mid-session, plus agentic flows where the
agent itself installs a skill via the shell tool and needs it bound
without a gateway restart. The Python helper
clear_skills_system_prompt_cache(clear_snapshot=True) already exists
internally — this PR just exposes it via slash command and tool.
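The added/removed diff shape described above can be sketched like this — the input mapping of name → description is an assumed shape, not the real cache layout:

```python
def diff_skills(before, after, desc_limit=60):
    """Sketch: diff two name->description mappings into the
    [{'name': ..., 'description': ...}] entries the note formats."""
    def entry(name, source):
        return {"name": name, "description": source[name][:desc_limit]}
    added = [entry(n, after) for n in sorted(set(after) - set(before))]
    removed = [entry(n, before) for n in sorted(set(before) - set(after))]
    return {"added": added, "removed": removed, "total": len(after)}
```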
Pull the top-level + chat parser construction out of main() into
hermes_cli/_parser.py so relaunch.py can introspect parser._actions to
discover which flags exist and whether they take values, instead of
maintaining a parallel hand-rolled (flag, takes_value) tuple list.
- _parser.py: build_top_level_parser() returns (parser, subparsers,
chat_parser); side-effect-free import.
- main.py: ~290 lines of inline parser construction collapsed to a
helper call. Other subparsers stay inline (dispatch is bound to
module-level cmd_* functions).
- _parser._inherited_flag(parser, ...): wraps parser.add_argument and
sets action.inherit_on_relaunch = True. Used in place of
parser.add_argument for the 25 flags (top-level + chat) that need to
carry over.
- _parser.PRE_ARGPARSE_INHERITED_FLAGS: holds --profile/-p, which
isn't on argparse (consumed earlier by main._apply_profile_override).
- relaunch.py: drops _CRITICAL_DESTS and _PRE_ARGPARSE_FLAGS; the table
builder now filters by getattr(action, 'inherit_on_relaunch', False).
- test_ignore_user_config_flags.py: brittle inspect.getsource grep
replaced with proper parser introspection.
- test_relaunch.py: introspection sanity tests added.
Salvaged from PR #17549; added top-level -t/--toolsets flag to
_parser.py so #17623 (fix(tui): honor launch toolsets) behavior is
preserved on current main.
Co-authored-by: ethernet <arilotter@gmail.com>
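The marker + introspection mechanism can be sketched in a few lines; `_inherited_flag` and the attribute name follow the description above, while the table-building details (using `nargs != 0` to detect value-taking flags) are assumptions:

```python
import argparse

def _inherited_flag(parser, *args, **kwargs):
    """Wrap add_argument and mark the action for relaunch inheritance."""
    action = parser.add_argument(*args, **kwargs)
    action.inherit_on_relaunch = True
    return action

def inherited_table(parser):
    """Build (flag, takes_value) pairs from parser._actions instead of a
    hand-maintained list; store_true actions have nargs == 0."""
    return [
        (action.option_strings[0], action.nargs != 0)
        for action in parser._actions
        if getattr(action, "inherit_on_relaunch", False)
    ]

parser = argparse.ArgumentParser()
_inherited_flag(parser, "--tui", action="store_true")
_inherited_flag(parser, "--model")
parser.add_argument("--verbose", action="store_true")  # not inherited
```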
Extract all os.execvp('hermes', ...) calls into a utility so flags like
--tui, --dev, --profile, --model, --provider, et al. survive session
resume and post-setup relaunch.
- resolve_hermes_bin: prefers sys.argv[0] when callable, then PATH,
then falls back to '${sys.executable} -m hermes_cli.main' (fixes nix
run relaunches)
- build_relaunch_argv: allowlists critical flags so they carry over
- cmd_sessions browse now calls relaunch(['--resume', <id>])
- _apply_profile_override skips redundant work when HERMES_HOME is
already set (child inherits parent profile)
- setup.py replaces _resolve_hermes_chat_argv with relaunch_chat()
- added comprehensive tests for flag extraction and binary resolution
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If a concurrent RPC mutates _sessions while session.delete is iterating
it (e.g. a parallel session.create on the thread pool), the bare except
swallowed the RuntimeError and let the delete proceed against a row
that may still be live. Snapshot via list(_sessions.values()) and
return an error when even that raises, instead of treating "couldn't
check" as "no active sessions."
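A minimal sketch of that snapshot-then-fail pattern (the helper name is hypothetical):

```python
def active_session_count(sessions):
    """Snapshot the dict before iterating; if even the snapshot raises
    (concurrent resize), report 'couldn't check' as None instead of
    treating it as 'no active sessions'."""
    try:
        snapshot = list(sessions.values())
    except RuntimeError:
        return None  # caller should return an error, not proceed with delete
    return sum(1 for s in snapshot if s.get("active"))
```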
Pressing `d` on the highlighted row in the resume picker prompts
`delete? y/n`; `y` deletes the session (DB row + on-disk transcript
files), anything else cancels. The active session is excluded from
deletion server-side.
Adds a new `session.delete` JSON-RPC handler that wraps
`SessionDB.delete_session`, forwarding the per-profile `sessions/`
directory so transcripts get cleaned up alongside the row.
CI Tests workflow has been red on main for 40+ consecutive runs. This
commit recovers every failure visible in run 25130722163 (most recent
completed run prior to this PR).
Root causes, by group:
Test-mock drift after product landed (fix: update mocks)
- test_mcp_structured_content / test_mcp_dynamic_discovery (6 tests):
product added _rpc_lock (#02ae15222) and _schedule_tools_refresh
(#1350d12b0) without updating sibling test files. Install a real
asyncio.Lock inside the fake run-loop and patch at _schedule_tools_refresh.
- test_session.py: renamed normalize_whatsapp_identifier →
canonical_whatsapp_identifier upstream; keep a local alias so the
legacy tests keep working.
- test_run_progress_topics Slack DM test: PR #8006 made Slack default
tool_progress=off; explicitly set it to 'all' in the test fixture so
the progress-callback path still runs. Also read tool_progress_callback
at call time rather than freezing it in FakeAgent.__init__ — production
assigns it AFTER construction.
- test_tui_gateway_server session-create/close race: session.create now
defers _start_agent_build behind a 50ms timer — wait for the build
thread to enter _make_agent before closing, otherwise the orphan-
cleanup path never runs.
- test_protocol session.resume: product get_messages_as_conversation now
takes include_ancestors kwarg; accept **_kwargs in the test stub.
- test_copilot_acp_client redaction: redactor is OFF by default (snapshots
HERMES_REDACT_SECRETS at import); patch agent.redact._REDACT_ENABLED=True
for the duration of the test.
- test_minimax_provider: after #17171, dots in non-Anthropic model names
stay dots even with preserve_dots=False. Assert the new invariant
rather than the old 'broken for MiniMax' behavior.
- test_update_autostash: updater now scans `ps -A` for dashboard PIDs;
the test's catch-all subprocess.run stub needed stdout/stderr fields.
- test_accretion_caps: read_timestamps dict is populated lazily when
os.path.getmtime succeeds. Use .get("read_timestamps", {}) to tolerate
CI filesystems where the stat races file creation.
Change-detector tests (fix: rewrite as structural invariants)
- test_credential_sources_registry_has_expected_steps: was a frozen set
comparison that broke when minimax-oauth was added. Rewrite as an
invariant check (every step has description, no dupes, core steps
present) per AGENTS.md 'don't write change-detector tests'.
xdist ordering / test pollution (fix: reset state, use module-local patches)
- test_setup vercel: sibling test saved VERCEL_PROJECT_ID='project' to
os.environ via save_env_value() and never cleared it. monkeypatch.delenv
the VERCEL_* vars in the link-file test.
- test_clipboard TestIsWsl: GitHub Actions is on Azure VMs whose real
/proc/version often contains 'microsoft'. Patching builtins.open with
mock_open didn't reliably intercept hermes_constants.is_wsl's call in
xdist workers that had already cached _wsl_detected=True from an
earlier test. Patch hermes_constants.open directly and add
teardown_method to reset the cache after each test.
Pytest-asyncio cancellation hangs (fix: bound product await with timeout)
- test_session_split_brain_11016 (3 params) + test_gateway_shutdown
cancel-inflight: under pytest-asyncio 1.3.0, 'await task' and
'asyncio.gather(cancelled_tasks)' can stall for 30s when the cancelled
task's finally block awaits typing-task cleanup. Bound both with
asyncio.wait_for(..., timeout=5.0) and asyncio.shield — the stragglers
are released from adapter tracking and allowed to finish unwinding in
the background. This is also a legitimate hardening: a wedged finally
shouldn't stall the caller's dispatch or a gateway shutdown.
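The wait_for + shield combination can be sketched as a small helper (illustrative; the real tests bound existing awaits rather than calling a helper like this):

```python
import asyncio

async def cancel_with_bound(task, timeout=5.0):
    """Cancel a task but wait at most `timeout` for its finally block to
    unwind. shield() keeps wait_for's own timeout-cancel from hitting the
    task again, so a wedged cleanup keeps unwinding in the background."""
    task.cancel()
    try:
        await asyncio.wait_for(asyncio.shield(task), timeout=timeout)
    except (asyncio.CancelledError, asyncio.TimeoutError):
        pass  # task acknowledged the cancel, or we stopped waiting for it
```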
Orphan UI config (fix: merge tiny tab into messaging category)
- test_web_server test_no_single_field_categories: the telegram.reactions
config field lived in its own 'telegram' schema category with no
siblings. Fold it under 'discord' via _CATEGORY_MERGE so the dashboard
doesn't render an orphan single-field tab.
Local verification: 38/38 originally-failing tests pass; 4044/4044
gateway tests pass; 684/684 targeted subset (all 16 touched test files)
passes.
Detect leaked SGR mouse-report fragments in CLI input, strip them, and reset terminal modes in-place so scroll and typing recover without reopening the tab. Add regression tests for escaped, visible, and bare leak forms.
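The stripping step can be sketched with a regex under the assumption that leaked fragments are SGR-encoded mouse reports (`ESC [ < btn ; x ; y M/m`), with the visible/bare forms missing the ESC byte; the pattern and reset string are illustrative, not the shipped ones:

```python
import re

# SGR mouse report, optionally missing its leading ESC (the "visible"
# and "bare" leak forms).
_SGR_MOUSE = re.compile(r"(?:\x1b)?\[<\d+;\d+;\d+[Mm]")

def strip_mouse_reports(line: str) -> str:
    return _SGR_MOUSE.sub("", line)

# In-place recovery: turn mouse tracking and SGR-extended mode back off.
RESET_MODES = "\x1b[?1000l\x1b[?1006l"
```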
* fix(tui): offload manual compaction RPC
Route TUI session compression through the existing long-handler pool so slow compaction does not block other gateway RPCs.
* fix(tui): show compaction progress immediately
Print a local status line before the compress RPC starts so slow manual compaction does not look like a no-op.
* feat(tui): rich /compress feedback parity with CLI
Show pre-compaction message count and rough token estimate immediately, emit a status update so the bottom bar reflects ongoing compaction, and report a multi-line summary (headline + token delta + optional note) using the shared summarize_manual_compression helper.
* fix(tui): show live compaction estimate in transcript
Mirror compression progress status into the transcript so users see the backend message count and token estimate while /compress is still running.
* fix(tui): single live compaction line with spinner glyph
Drop the redundant local "compressing context..." placeholder and prefix the live backend status line with a braille spinner glyph so /compress reads as a single in-progress row.
* fix(tui): address review nits on /compress feedback
Reuse the precomputed token estimate inside _compress_session_history so the gateway does not redo the O(n) work while holding history_lock, keep the status bar pinned during long manual compactions instead of auto-restoring after 4s, and drop the redundant noop bullet that doubled with the system role glyph.
* fix(tui): release history_lock during compaction LLM call
Move the snapshot/commit pattern into _compress_session_history so the lock is held only across the in-memory bookkeeping, not during agent._compress_context. Also emit a final neutral status update from session.compress so the pinned compressing indicator clears even on errors.
* fix(tui): rebuild prompt cleanly + sync session_key after compress
Pass system_message=None so AIAgent._compress_context rebuilds the system prompt without nesting the cached identity block. Reuse the handler's pre-snapshotted history inside _compress_session_history to avoid a second O(n) copy under the lock. After compaction, when AIAgent._compress_context rotates session_id, sync the gateway session_key, migrate approval notify + yolo state, restart the slash worker, and clear the stale pending title. Mirrors HermesCLI._manual_compress.
* Avoid /compress lock re-entry in slash side effects.
Stop pre-locking history before _compress_session_history in slash command mirroring, keep session-key sync parity with manual compression, and add a regression test that asserts /compress is invoked without holding history_lock.
* fix(tui): honor launch toolsets
Carry chat --toolsets through the TUI launcher so TUI sessions use the same per-session tool scope as the classic CLI.
* fix(tui): parse top-level toolsets flag
Allow top-level hermes --tui --toolsets to reach the implicit chat session, matching chat subcommand behavior.
* fix(tui): validate launch toolsets
Filter invalid HERMES_TUI_TOOLSETS entries and fall back to configured CLI toolsets when the override contains no valid toolsets.
* fix(tui): avoid config load for builtin toolsets
Honor built-in HERMES_TUI_TOOLSETS values before loading config and treat all/* as the all-toolsets sentinel.
* fix(cli): honor toolsets in oneshot mode
Forward top-level --toolsets into oneshot agent construction so the flag is not silently ignored outside the TUI path.
* fix(cli): validate oneshot toolsets
Reject invalid-only oneshot toolset overrides before output redirection and clarify TUI fallback warnings.
* fix(cli): preserve all-toolsets sentinel
Map explicit all/* oneshot toolset overrides to the all-toolsets sentinel and replace locals() checks in TUI toolset loading.
* fix(cli): warn on extra all-toolset entries
Warn when all/* toolset overrides include additional ignored entries so typos are still visible.
* fix(tui): honor plugin toolset overrides
Discover plugin toolsets before rejecting unresolved explicit toolset overrides and read raw config for MCP name validation.
* fix(tui): reuse toolset argument normalizer
Share top-level TUI toolset argument parsing with the oneshot path to avoid duplicate normalization logic.
* fix(cli): reject disabled mcp toolsets
Validate explicit toolset overrides against enabled MCP servers only and clarify top-level toolset flag help.
* fix(cli): distinguish disabled mcp from unknown toolsets
Report disabled MCP servers separately from unknown toolset entries and stub plugin discovery in invalid-name tests for determinism.
shutil.copytree from default ~/.hermes duplicated ~/.hermes/profiles into
the new profile, causing nested profiles/.../profiles/... and huge disk use.
Match export behavior (_DEFAULT_EXPORT_EXCLUDE_ROOT) by ignoring the sibling
profiles tree at the source root.
Made-with: Cursor
The skip_pre_tool_call_hook flag was added to prevent double-firing of
pre_tool_call when run_agent._invoke_tool pre-checks for a block
directive and then dispatches via handle_function_call. But the
implementation added an else: branch that fired invoke_hook again for
'observers', without noticing that get_pre_tool_call_block_message() in
hermes_cli.plugins already fires invoke_hook('pre_tool_call', ...) as
part of its block-directive poll.
Result: every tool call routed through the run_agent loop fired the
hook twice — reported by community users whose observer / audit
plugins logged each tool invocation twice with identical timestamps.
Fix: delete the else: branch. The single-fire contract is now:
- skip=False (direct handle_function_call): hook fires once inside
get_pre_tool_call_block_message().
- skip=True (run_agent._invoke_tool path): caller fires the hook
once via get_pre_tool_call_block_message(); handle_function_call
must not fire it again.
Tightened the existing skip-flag test (renamed to
test_skip_flag_prevents_double_fire) to assert pre_tool_call fires
zero times when skip=True, and added
test_run_agent_pattern_fires_pre_tool_call_exactly_once to lock in
end-to-end that the full block-check + dispatch sequence fires the
hook exactly once.
Extend curator's pin flag from 'skip auto-transitions' to 'no agent
edits at all'. All five skill_manage mutation actions (edit, patch,
delete, write_file, remove_file) now refuse pinned skills with a
message pointing the user at `hermes curator unpin <name>`.
Motivation: pin used to only stop the curator's own maintenance pass
from touching a skill. Nothing prevented the main agent from editing
or deleting a pinned skill via skill_manage in-session. This gives
users a hard fence against unwanted agent edits — same semantics as
curator pinning, extended to the write tool.
Create is unaffected (you can't pin a name that doesn't exist yet,
and name collisions already error out). Broken sidecars fail open
rather than lock the agent out.
The schema description advertises the new refusal so models know
not to route around it with rename/recreate tricks.
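The gate can be sketched as a pre-dispatch check — function name, sidecar loader, and metadata shape are assumptions; the action list, refusal message, and fail-open behavior follow the description above:

```python
MUTATING_ACTIONS = {"edit", "patch", "delete", "write_file", "remove_file"}

def check_pin_gate(action, skill_name, load_sidecar):
    """Return a refusal message for mutations on pinned skills, or None
    to allow. Broken sidecars fail open so corrupt metadata can never
    lock the agent out."""
    if action not in MUTATING_ACTIONS:  # create etc. pass through
        return None
    try:
        meta = load_sidecar(skill_name)
    except Exception:
        return None  # fail open
    if meta.get("pinned"):
        return (f"Skill {skill_name!r} is pinned; refusing {action}. "
                f"Run `hermes curator unpin {skill_name}` to allow edits.")
    return None
```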
compute_next_run() ignored the last_run_at parameter for cron-type
schedules, always computing from _hermes_now() instead. This was
inconsistent with interval jobs which DO use last_run_at as the anchor.
After a crash or restart, cron jobs would compute next_run_at from
the arbitrary restart time rather than the actual last execution time.
While the stale detection in get_due_jobs() catches most cases, using
last_run_at as the croniter base eliminates edge cases and makes the
behavior consistent across schedule types.
Salvaged from #9014 (authored by @beenherebefore) onto current main.
The original PR branch was 2+ weeks stale and would have reverted
substantial unrelated work (jobs_file_lock, workdir/context_from/
enabled_toolsets, issue #16265 state=error recovery). Kept just the
7-line substantive fix and the regression test.
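The anchoring rule can be illustrated with an interval-style reduction — the real fix feeds the same base into croniter for cron schedules; this sketch only models an every-N-seconds interval to show last_run_at (not restart time) as the anchor:

```python
from datetime import datetime, timedelta

def compute_next_run(schedule, last_run_at=None, now=None):
    """Base next_run_at on last_run_at when it exists, falling back to
    `now` only for never-run jobs; then step past any missed windows."""
    now = now or datetime.now()
    base = last_run_at or now
    step = timedelta(seconds=schedule["every_seconds"])
    nxt = base + step
    while nxt <= now:  # catch up after a crash/restart
        nxt += step
    return nxt
```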
Bare `float(os.getenv("HERMES_CRON_TIMEOUT", 600))` in `run_job()` raises
a `ValueError` when the env var is set to a non-numeric string (e.g. "abc").
Replace it with the same defensive try/except pattern already used by
`_get_script_timeout()` for `HERMES_CRON_SCRIPT_TIMEOUT`: log a warning
and fall back to the 600 s default instead of crashing.
Also update the existing env-var tests to exercise the new code path and
add two new tests — one for an invalid value, one for an empty string.
Fixes #11319
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #4759, closes #4381.
Mutating actions (patch, edit, write_file, remove_file, delete) used to
refuse skills that lived under `skills.external_dirs` with 'Skill X is in
an external directory and cannot be modified. Copy it to your local skills
directory first.' Faced with that error, the agent would fall back to
action='create', which always writes under ~/.hermes/skills/ — producing
a silent duplicate of the external skill in the local store.
Fix: drop the read-only gate. `skills.external_dirs` is configured by the
user; if they pointed it at a directory, they already said 'these are my
skills, treat them the same.' Filesystem permissions handle the genuine
read-only case (write fails, agent sees the error).
- New _containing_skills_root() resolves whichever dir actually contains
the skill; _delete_skill uses it to bound empty-category cleanup so an
external root is never rmdir'd.
- _create_skill behavior is unchanged: new skills still land in local
SKILLS_DIR only. Fewer moving parts.
- Seven new TestExternalSkillMutations tests covering patch/edit/write_file/
remove_file/delete/create against a mocked two-root layout + a category
rmdir-safety check.
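The root-resolution helper can be sketched as a containment check (signature assumed; the rmdir-bounding behavior is what the real `_containing_skills_root` enables):

```python
from pathlib import Path

def containing_skills_root(skill_dir, roots):
    """Return whichever configured root actually contains the skill, so
    empty-category cleanup can stop at that root and never rmdir an
    external root itself. None if no root contains it."""
    skill_dir = Path(skill_dir).resolve()
    for root in roots:
        root = Path(root).resolve()
        if root == skill_dir or root in skill_dir.parents:
            return root
    return None
```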
When a user authenticates a built-in provider via env var (e.g. DASHSCOPE_API_KEY
triggers the built-in 'alibaba' row) AND defines a custom_providers entry
pointing at the same endpoint, the picker previously emitted two rows for one
endpoint. The built-in row already carries the canonical slug, curated model
list, and correct auth wiring, so the shadow custom entry is redundant.
Adds a _builtin_endpoints set populated as sections 1/2/2b emit rows. Each
entry is the provider's effective base URL (env override via base_url_env_var
wins over the static inference_base_url, so DASHSCOPE_BASE_URL-overridden
endpoints dedup correctly). Section 4 skips any grouped custom entry whose
base_url matches.
Intentionally does NOT repurpose model_catalog.enabled as a 'hide built-ins'
flag. That config controls the remote curated-manifest fetch (documented on
the model-catalog reference page) and overloading it would silently change
behavior for users who disable it for network/privacy reasons.
Three new tests:
- shadow dedup fires when endpoint matches static inference_base_url
- dedup does NOT hide custom entries on genuinely distinct endpoints
- dedup honors the base_url_env_var override path
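The dedup rule can be sketched as follows — provider dicts here are an assumed shape; the env-override-wins logic follows the description above:

```python
import os

def effective_base_url(provider):
    """The provider's effective endpoint: the env override named by
    base_url_env_var wins over the static inference_base_url."""
    env_var = provider.get("base_url_env_var")
    override = os.getenv(env_var) if env_var else None
    base = override or provider.get("inference_base_url") or ""
    return base.rstrip("/")

def dedup_custom(builtins, custom_entries):
    """Skip custom entries whose endpoint matches a built-in row."""
    builtin_endpoints = {effective_base_url(p) for p in builtins}
    return [c for c in custom_entries
            if effective_base_url(c) not in builtin_endpoints]
```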
Covers the #16748 fix:
- unsigned thinking blocks synthesised from reasoning_content survive replay
- non-latest assistant turns keep their thinking (DeepSeek validates every turn)
- signed Anthropic blocks are stripped (DeepSeek can't validate them)
- cache_control is stripped from thinking blocks
- OpenAI-compat base (api.deepseek.com without /anthropic) is NOT matched
- non-DeepSeek third parties (minimax) keep the generic strip-all behaviour
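The replay rules above can be sketched as a filter over Anthropic-style content blocks — the block shape is assumed, and this condenses only the keep-unsigned / drop-signed / strip-cache_control rules, not the endpoint matching:

```python
def sanitize_thinking_blocks(messages):
    """Keep unsigned thinking blocks on every assistant turn, drop
    signed ones (DeepSeek can't validate them), and strip cache_control
    from the survivors."""
    out = []
    for msg in messages:
        blocks = []
        for block in msg.get("content", []):
            if block.get("type") == "thinking":
                if block.get("signature"):  # signed: drop
                    continue
                block = {k: v for k, v in block.items() if k != "cache_control"}
            blocks.append(block)
        out.append({**msg, "content": blocks})
    return out
```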