Add a generic suppress_notification flag to the drain-request marker. When a
drain that ends in process exit (e.g. a NAS auto-update image migration on the
always-on Hermes Cloud fleet) is flagged, the gateway skips ONLY the
home-channel 'gateway shutting down' broadcast — the operator-flavoured ping
that would otherwise fire on every routine auto-update, dozens of times a day.
The per-active-session interrupt ping is ALWAYS kept: on a drained shutdown
it's empty by construction, and in the force-interrupt (deadline-exceeded) case
it carries the user-valuable 'your task was cut off, message me to resume' hint.
The gateway stays agnostic about WHY a drain is quiet (generic boolean, not a
kind enum); the policy of which drain causes set the flag lives in the caller
(NAS). Default-false so legacy/operator drains behave exactly as before. The
reader reuses the NS-570 epoch-staleness check so an orphaned marker on the
durable volume can never silence a fresh gateway's legitimate broadcast.
- drain_control.py: write_drain_request gains suppress_notification; new
drain_notification_suppressed() reader (current-epoch + truthy flag).
- web_server.py: /api/gateway/drain reads + echoes the flag.
- run.py: _notify_active_sessions_of_shutdown skips the home-channel loop only.
Tests prove: flag round-trips; home-channel suppressed when set, kept when
unset; active-session ping always fires; stale/legacy/corrupt markers never
suppress.
* fix(windows): stop terminal-window popups from background spawns
Native-Windows desktop/gateway users saw cmd/conhost windows flash on
gateway restart, image paste, the dashboard Projects tree, voice notes,
and ~5 min after closing the app (detached cron). Two root causes:
- Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist,
agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw
subprocess allocate a fresh console when the launching process has
none (pythonw desktop backend / detached gateway) - even with output
captured.
- uv venv pythonw shims re-exec console python.exe, so Python children
get a console regardless of how they're launched.
Fixes:
- Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs
CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned
console-exe spawn through it.
- FreeConsole() catch-all in hermes_bootstrap: any Python child that
exclusively owns an auto-allocated console detaches it at startup
(GetConsoleProcessList()==1 gate leaves shared interactive consoles
untouched).
- Replace PowerShell/wmic gateway PID scans with in-process psutil.
- Skip schtasks queries on non-interactive desktop restarts.
- Prefer native agent-browser .exe over .cmd shims.
- Guard test bans raw subprocess spawns of the Windows-only console
tools repo-wide so the popup class can't regress.
* fix(windows): scope FreeConsole to background entry points; fix merge fallout
Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't
tell a uv pythonw->python phantom console apart from a user opening the
interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) —
both report a single attached process with a tty. Running FreeConsole() in the
import-time bootstrap therefore risked detaching a legitimately-interactive
terminal.
- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console();
remove it from apply_windows_utf8_bootstrap() (import side effect).
- Call it only from known background mains: gateway run, dashboard backend
(start_server, what the desktop spawns), cron standalone, tui_gateway entry,
slash worker. Interactive CLI/TUI never calls it.
- Behavior-contract tests: frees only when solo owner, leaves shared console,
no-op without console / on POSIX, and asserts it's not an import side effect.
Merge fallout from origin/main (#53791):
- local.py: 3-way merge left a dangling **_popen_kwargs (NameError crashing
every terminal init). _subprocess_compat.popen already hides the window, so
drop it.
- discord adapter: merge stacked an undefined windows_hide_flags() onto the
primitive call; drop the redundant arg.
- test_gateway: scan now goes psutil-first (zero spawn); rewrite the
case-variant test to drive that production path.
* test(claw): mock _subprocess_compat.run seam for Windows process scan
claw.py's Windows tasklist/powershell scan routes through the hidden-spawn
primitive; the tests still patched claw_mod.subprocess, so on win32 the mock
was never hit and real spawns returned nothing. Patch the actual seam.
The dashboard Config tab's Model field is a flat string with no provider
info. _denormalize_config_from_web only updated model.default and kept the
stale provider, so picking an OpenRouter model while the default provider was
ollama-local left provider=ollama-local and every call 404'd.
When the model string actually changes, infer the serving provider — curated
catalog first, then a vendor/model-slug heuristic for non-aggregator providers
— and route the switch through the existing _normalize_main_model_assignment /
_apply_main_model_assignment chokepoints so stale base_url/api_mode/api_key are
cleared on a provider change and preserved on a same-provider re-pick. Saving
an unchanged model never re-detects, so unrelated config saves keep an explicit
provider.
Closes#14058
The dashboard form is built from CONFIG_SCHEMA, which doesn't enumerate
every root-level key the YAML supports. Most visibly, `custom_providers`
is in `_KNOWN_ROOT_KEYS` but is absent from the schema — so the frontend
never sends it in the PUT body. The previous full-replace save() then
silently wiped the key from disk every time the user clicked anything
that triggered a save. Other casualties (less visible because defaults
re-mask them on load) include `agent.personalities`,
`agent.reasoning_effort`, `terminal.lifetime_seconds`, etc.
Fix: read the raw on-disk config and deep-merge the incoming PUT body
on top of it before saving. The frontend can only overwrite what it
explicitly sends; everything else is preserved verbatim.
Reuses the existing `_deep_merge` helper from `hermes_cli.config`.
Tests:
- `test_round_trip_preserves_custom_providers` exercises the exact bug:
seed config with custom_providers, GET → drop the key → PUT,
assert it's still on disk.
- `test_round_trip_preserves_schema_invisible_nested_keys` covers the
shallow-vs-deep-merge case for nested dicts under `agent` etc.
Both fail on current main; both pass with this patch.
On the desktop Channels / Messaging page, the "Open setup guide" button was
rendered as a bare <a href={platform.docs_url} target="_blank"> with no guard.
Plugin-provided platforms (Microsoft Teams, Google Chat, Line, Raft, Yuanbao,
…) ship an empty docs_url, so the anchor's href was "".
In a packaged build, Electron resolves an empty href against the current
document — the app's own index.html inside the asar bundle — and
shell.openPath then fails with an OS "file not found" dialog. This is exactly
the Windows error reported for Messaging → Teams → Open guide.
Fix (3 changes):
1. fix(desktop) — Only render the "Open setup guide" button when docs_url is
non-empty, and route clicks through openExternalLink so a relative/empty
value can never be treated as a local bundle path. Fixes the whole class
(every plugin platform), not just Teams.
2. fix(messaging) — Give the Teams platform plugin a real docs_url (Microsoft
Teams setup guide) so its card shows a working button instead of nothing.
3. fix(messaging) — Give the Google Chat platform plugin a real docs_url
(Google Chat setup guide) so its card shows a working button instead of
nothing. Originally from #48940; folded in here because that PR's test
was broken (it queried the HTTP endpoint, but google_chat is a dynamic
enum member that only appears after the adapter module is imported).
Test plan:
- apps/desktop — new src/app/messaging/index.test.tsx: button is hidden when
docs_url is empty; a real URL opens via the validated external opener (does
not navigate).
- apps/desktop typecheck (tsc --noEmit) clean.
- backend — test_teams_messaging_metadata_links_setup_guide: the Teams catalog
entry exposes the setup-guide docs_url.
- backend — test_google_chat_messaging_metadata_links_setup_guide: the Google
Chat catalog entry exposes the setup-guide docs_url.
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: p-andhika <andhika.prakasiwi@gmail.com>
Tasks 2.1 + 2.2 + 2.3 of the safe-shutdown plan — the reversible
quiesce-without-restart machinery NAS drives during a lifecycle action (D4a).
These ship together because the endpoint, the control channel, and the gateway
state machine are one coherent slice.
2.2 — control channel (gateway/drain_control.py, new):
The dashboard has no HTTP path into a running gateway (guardrails: "there is NO
external control channel into a running gateway"); restart/drain is driven only
by markers the gateway reacts to. So begin/cancel-drain writes/removes a
presence-based marker .drain_request.json (HERMES_HOME-scoped, atomic write,
never-raises read; a corrupt marker reads as present-contentless → fail-safe
toward quiescing). This is Q-B option A.
2.2 — gateway state machine (gateway/run.py):
- _external_drain_active flag, DISTINCT from the shutdown _draining flag: this
one does NOT exit the process and is fully reversible.
- _enter_external_drain / _exit_external_drain: idempotent transitions that
flip gateway_state→draining / →running via _update_runtime_status (preserving
the live active_agents count). exit refuses to revert to running during a
real shutdown or after the loop stops (shutdown wins).
- _drain_control_watcher: 1s background task (modelled on _handoff_watcher)
reconciling accept-state with the marker; honours a marker that survived a
restart on its first tick. Registered alongside the other watchers in start.
- New-turn accept gate in _handle_message, placed BEFORE the session-slot
claim: when draining, refuse to START a new turn (so active_agents can only
fall → no TOCTOU race), while in-flight turns finish untouched. Internal/
system events (restart-recovery replays, bg-process completions) bypass it.
2.1 — endpoint (hermes_cli/web_server.py):
POST /api/gateway/drain {action: drain|cancel}. Authenticated by the Task-2.0a
token seam (the drain plugin registered this exact path as a token route);
attributes the request to the verified token principal. Begin writes the
marker, cancel removes it — the gateway process owns the actual transition.
Force-override (D6) is NOT here; it maps onto the existing immediate
/api/gateway/restart force path.
Tests (mocked — necessary-not-sufficient; the HARD live gate Q-B is next):
- tests/gateway/test_external_drain_control.py — marker contract (write/clear/
read/corrupt/atomic), state machine (enter/exit/idempotency/shutdown-wins/
loop-stopped), watcher reconcile-enter-then-exit, new-turn refusal, and
in-flight-not-interrupted. 15 tests.
- tests/hermes_cli/test_web_server.py — /api/gateway/drain begin/default-begin/
cancel/cancel-idempotent/bad-action-400. 6 tests.
- dashboard.drain_auth config section already added in 2.0b commit.
All touched suites green: 301 (gateway+auth) + 9 (web_server endpoints) passed.
Intentionally deferred:
- HARD live-validation gate (Q-B): real isolated `hermes gateway run`, drive a
real begin-drain marker, prove the 5-point checklist a–e.
- Spec-doc status flip + Phase-2 PR.
Build status: external-drain, restart-drain, status, dashboard-auth, drain-plugin,
token-auth, and web_server-endpoint suites green.
* feat(moa): expose MoA presets as selectable virtual models
Reconstructed onto current main (PR #46081's base had diverged with no common
ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual
provider: each named preset is a selectable model under provider 'moa', and the
preset's aggregator is the acting model that answers and calls tools.
Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same
batch pattern delegate_task uses) — all references dispatched at once, collected
when every one finishes, then handed to the aggregator. Output order is
preserved, failures and the MoA-recursion guard stay isolated per reference.
- Removed the old mixture_of_agents model tool and moa toolset.
- Added moa as a virtual provider in the provider/model inventory.
- /moa is shortcut behavior over model selection (default preset / named preset
/ one-shot prompt).
- Dashboard + Desktop manage named presets; presets appear in model pickers.
- Parallel reference fan-out in agent/moa_loop.py with regression test.
* fix(moa): thread moa_config through _run_agent to _run_agent_inner
The reconstructed gateway MoA wiring declared moa_config on _run_agent (the
profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper
never forwarded it — _run_agent_inner had no such parameter, so the runtime hit
NameError: name 'moa_config' is not defined on the compression-failure session
sync path. Add moa_config to _run_agent_inner's signature and forward it from
both wrapper call sites (multiplex and non-multiplex). Caught by
tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).
* fix(moa): classify moa as a virtual provider in the catalog
The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so
provider_catalog() fell through to the default auth_type="api_key" with no
env vars — tripping two catalog invariants:
- test_provider_catalog: api_key providers must expose a credential env var
- test_provider_parity: every hermes-model provider must be desktop-configurable
moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that
overlay as an auth_type fallback so the catalog reports moa as virtual (no real
credential, no network endpoint). Exempt virtual providers from the desktop
parity union check the same way 'custom' is exempt — derived from the catalog,
not a hardcoded slug, so future virtual providers are covered too.
Salvages #50469 by @libre-7.
_dashboard_local_update_managed_externally() previously blocked every containerized dashboard from the local update API, even when the running install was a bind-mounted git checkout that can be updated with hermes update.
Allow the dashboard updater only for git installs inside containers, while keeping hosted /opt/data, docker, and pip installs managed externally. Pip remains blocked because its apply path mutates the running container filesystem and is not the self-managed checkout case.
Adds regression coverage for docker, git, and pip install-method handling inside containers, and maps the contributor email for release attribution.
Co-authored-by: libre-7 <libre-7@users.noreply.github.com>
Defense-in-depth for the dashboard plugin auto-import path. The web server
auto-imports and mounts the Python backend (dashboard/manifest.json -> api file)
of plugins found in ~/.hermes/plugins/ (user) and ./.hermes/plugins/ (project),
not just bundled plugins. So any plugin that reaches one of those dirs gets
arbitrary Python executed on the next dashboard start.
NOTE ON THREAT MODEL: #43719's originally-documented delivery chain (a public
--insecure dashboard + open API used to git clone a malicious repo into
~/.hermes/plugins/) is ALREADY mitigated on main — since the June 2026
hermes-0day hardening, a non-loopback bind ALWAYS requires an auth provider and
--insecure no longer bypasses the auth gate. This change is therefore NOT
closing that (now-authenticated) network path; it removes the residual
'arbitrary code executes merely because a plugin is on disk' hazard, which still
applies when a plugin arrives by other means: a socially-engineered git clone,
a supply-chain drop, an authenticated-but-malicious actor, or a future
regression in the auth gate. Untrusted on-disk code should not auto-execute.
Restrict dashboard backend Python auto-import to BUNDLED plugins only. User and
project plugins may still extend the dashboard UI via static JS/CSS, but their
api Python file is never auto-imported. Two layers: _discover_dashboard_plugins
scrubs api/_api_file for user/project sources (and bundled wins name conflicts
so a non-bundled plugin cannot shadow a trusted backend route);
_mount_plugin_api_routes re-refuses user/project at mount time. Tightens the
prior GHSA-5qr3-c538-wm9j / #29156 hardening (bundled+user) to bundled-only.
Salvaged from #44472 (@egilewski) onto current main.
Give an external consumer (NAS) a trustworthy, always-reachable busy/idle
readout it can poll before a disruptive lifecycle action (restart,
migrate, stop, auto-update). The dashboard /api/status is the only HTTP
surface guaranteed up on a hosted agent regardless of which gateway
platforms are enabled, and it already reads gateway_state.json.
Add to /api/status (additive, non-breaking):
- active_agents — in-flight gateway-turn count (now refreshed
per-turn by the companion gateway-side commit)
- gateway_busy — running AND active_agents > 0
- gateway_drainable — running and live (a valid begin-drain target)
- restart_drain_timeout — resolved seconds, so the consumer can size its
poll deadline without out-of-band knowledge
(env HERMES_RESTART_DRAIN_TIMEOUT → config
agent.restart_drain_timeout → default)
The busy/drainable contract is defined once in gateway.status
(derive_gateway_busy / derive_gateway_drainable) and consumed by both
/api/status and /health/detailed so the two surfaces can never disagree.
Liveness keys off gateway_running (a live PID/health probe), NEVER
gateway_updated_at — a healthy idle gateway never advances that timestamp.
All derived fields degrade to safe falsy values when the gateway is down
or the status file is absent/corrupt (never a spurious "busy" that would
wedge the consumer). active_sessions (the 5-min DB recency heuristic the
SPA reads) is left exactly as-is — new signal, new fields.
Tests (behaviour contracts, not snapshots): the pure derivation contract
across every running/state/count/liveness combination; /api/status
integration for busy, idle-drainable, draining, down, stale-busy-file,
corrupt-count, and timeout surfacing; and /health/detailed parity.
On Windows, hermes writes writer.bat (@echo off / hermes -p writer %*)
with CRLF endings instead of the POSIX writer shell script. The test
hardcoded the POSIX path and exact bytes, so it failed on Windows hosts.
Assert on stripped non-empty lines per platform, making it line-ending-
and OS-independent.
Adds the end-to-end parity contract test: every CANONICAL_PROVIDERS entry (the
`hermes model` universe) must be configurable on a desktop Providers tab —
keys(/api/env) ∪ ids(/api/providers/oauth) ⊇ canonical. Asserted as an
invariant against the live endpoints so the GUI can never silently drift from
the CLI again.
Surfacing this contract caught Bedrock: it's aws_sdk (no api-key vars), so it
had no Keys card. /api/env now tags AWS_REGION/AWS_PROFILE to the bedrock
provider card. Anthropic is whitelisted as a legitimate dual-tab provider
(direct API key + subscription OAuth).
Also refreshes the _OAUTH_PROVIDER_CATALOG docstring to describe its new role
as the override base for _build_oauth_catalog().
The Keys tab now surfaces every keys-tab provider in provider_catalog() (the
`hermes model` universe), synthesizing a card even when the env var has no hand
entry in OPTIONAL_ENV_VARS. Closes the drift where openai-api, kilocode, novita,
tencent-tokenhub, and copilot were CLI-configurable but invisible in the desktop
Providers → API keys tab.
Each provider row now carries backend-derived provider/provider_label grouping
hints so the desktop can group by the same provider identity the CLI picker
uses. Hand OPTIONAL_ENV_VARS prose still wins where present (enrichment, not a
gate). Shared non-provider credentials (e.g. tool-category GITHUB_TOKEN) are
explicitly not hijacked into a provider card — Copilot uses its provider-owned
COPILOT_GITHUB_TOKEN.
- Drop empty entries before validating SLACK_ALLOWED_USERS so a trailing or
interior comma (which the gateway silently tolerates in
gateway/platforms/slack.py) is no longer rejected at the dashboard.
- Hoist the member-ID regex to a module-level _SLACK_MEMBER_ID_RE constant
and note it stays in sync with the frontend SLACK_MEMBER_ID_RE.
- Add a regression test for the trailing-comma case.
The new SLACK_ALLOWED_USERS validation rejected '*', but the Slack gateway
honors '*' as an allow-all wildcard (gateway/platforms/slack.py DM auth,
slash-confirm, and approval-button paths). Accept '*' as a valid list entry
in both the API validator and the dashboard form so a value the runtime
honors is no longer blocked at setup.
* fix(dashboard): resolve chat TUI argv off event loop
Dashboard chat now resolves its TUI launch command off the
FastAPI/WebSocket event loop. The resolver can run `npm install` /
`npm run build` through `_make_tui_argv()`, and doing that synchronously
in `/api/pty` can block proxy keepalives and other dashboard WebSocket
work long enough for reverse-proxy deployments to drop the chat
connection.
This keeps the current TUI build policy intact: normal production
launches still run the correctness-first `npm run build` path, while
`HERMES_TUI_DIR` remains the prebuilt/no-build path for distros and
containers. The change only moves the potentially slow resolver work to
a worker thread for the dashboard chat path, serialized by an
`asyncio.Lock` so concurrent chat tabs preserve one-build-at-a-time
behavior. `SystemExit` (node/npm missing) and the profile `HTTPException`
path still propagate cleanly through `asyncio.to_thread()`.
Salvaged from #26124 — rebased onto current main. The async wrapper now
threads the `profile` parameter that `_resolve_chat_argv` gained on main
since the PR was opened, so cross-profile chat is preserved.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
* chore: add 0xdany to AUTHOR_MAP
* fix(dashboard): bind chat-argv lock to app.state; cover error propagation
Self-review hardening on top of the salvaged fix:
- Move `_chat_argv_lock` from a module-level `asyncio.Lock()` onto
`app.state` (initialised in `_lifespan`, lazy fallback via
`_get_chat_argv_lock`), mirroring `event_lock`. A module-level
`asyncio.Lock()` binds to whatever event loop is active at import time,
which is the exact pattern `_get_event_state`'s docstring warns against
(breaks across TestClient instances / uvicorn reloads). This keeps the
lock on the running loop.
- Add two tests exercising the real `_resolve_chat_argv_async` →
`asyncio.to_thread` → lock → re-raise chain: `SystemExit` (node/npm
missing) and `HTTPException` (invalid profile) both propagate out of the
worker thread and are caught by `pty_ws`'s existing handlers. The prior
tests mocked `asyncio.to_thread` away and never covered this path.
* test(dashboard): dedupe pty error-propagation tests; assert close code
simplify-code cleanup pass on the salvage stack:
- Extract the shared scaffolding of the two pty_ws error-propagation tests
into `_assert_pty_propagates`, keeping the two tests as distinct contracts
for the `except SystemExit` and `except HTTPException` arms.
- Assert the stable WebSocket close code (1011) instead of relying solely on
the user-facing "Chat unavailable" notice wording — a behavior contract per
the AGENTS.md "behavior contracts over snapshots" rule, robust to notice
rewording. The detail substring ("unknown profile") is still checked for the
HTTPException case since proving the detail survives the thread hop is the
point of that test.
No production-code change; the helper exercises the same real
_resolve_chat_argv_async -> asyncio.to_thread -> lock -> re-raise chain.
---------
Co-authored-by: draihan <draihan@student.ubc.ca>
* fix(desktop): show Hindsight memory provider
* feat(desktop): configure Hindsight memory provider
* fix(desktop): limit Hindsight modes to supported setup
* refactor(desktop): generic memory-provider config surface
Replace the bespoke Hindsight settings surface with a declarative,
schema-driven path so adding a memory provider is pure declaration —
no per-provider page, conditional, or endpoint.
- memory_providers.py: declarative registry. Each provider lists its
fields {key, label, kind, default, options, secret-vs-plain}. Hindsight's
mode is a select(cloud, local_external), so rejecting local_embedded
falls out of generic enum validation instead of a hand-written check.
- One generic endpoint pair GET/PUT /api/memory/providers/{name}/config.
GET returns declared fields + current values (secrets only as is_set,
never read back); PUT validates selects against their options, writes
plain fields to the provider config file, secrets to the env store,
and flips memory.provider.
- ProviderConfigPanel renders straight from the schema, replacing
hindsight-settings.tsx and the memory.provider === 'hindsight'
conditional in config-settings.tsx — same pattern as
toolset-config-panel.tsx off env_vars.
Scoped to memory providers; storage layout is unchanged so the runtime
Hindsight plugin reads the same config.json / HINDSIGHT_API_KEY / provider
keys as before. Tests cover the registry, endpoint behavior (defaults,
write+secret, select rejection, unknown provider, secret-never-returned),
and the generic panel.
DELETE /api/sessions/{id} was the only session endpoint that didn't
resolve the id (detail, messages, rename, export all call
resolve_session_id) and 404'd when the row was already gone. The desktop
optimistically removes the sidebar row, then RESTORES it and shows the
error on any failure — so deleting a session that had just been reaped
(empty-session hygiene) or removed by a concurrent client resurrected a
ghost row and surfaced "session not found". /goal + auto-compression churn
leaves transient empty rows that race the sidebar snapshot, which is the
exact "I deleted the empty one and got 'session not found'" report.
Resolve exact ids / unique prefixes, and treat an already-absent session
as an idempotent success — DELETE's contract is "ensure it's gone". This
mirrors the bulk-delete endpoint, which already treats ghost ids as
success.
Tests: deleting an absent id is idempotent (200, not 404); delete resolves
a unique prefix; a real session still deletes.
The dashboard's /api/skills/hub/install (and the new-profile hub_skills
path) spawned `hermes skills install <id>` with stdin=DEVNULL but
without --yes. do_install()'s 'Confirm [y/N]' prompt hit EOF, defaulted
to 'n', and printed 'Installation cancelled.' into a background log the
user never sees — every dashboard install no-opped.
Pass --yes on both spawn sites, matching the uninstall endpoint which
already passed --yes. The dashboard install button is the explicit user
consent, same as the TUI/slash-command skip_confirm rationale.
Repro: spawned the exact argv with stdin=DEVNULL against a temp
HERMES_HOME — without --yes it cancels, with --yes the skill installs.
The desktop "Local / custom endpoint" onboarding never collected an API
key and /api/model/set silently dropped one, so an auth-gated endpoint
(e.g. a hosted vLLM behind a key) could never enumerate models — and
Settings' "Set up custom endpoint" routed `custom` into a non-existent
OAuth flow, booting the user back to the first screen (the reported loop).
Backend (web_server.py):
- /api/providers/validate accepts an optional api_key and sends it as a
Bearer header when probing a custom endpoint's /v1/models.
- /api/model/set accepts api_key, persists it to model.api_key (same
switch/preserve lifecycle as base_url), and registers a named
custom_providers entry via _save_custom_provider — matching the
`hermes model` CLI flow so the endpoint shows up as a ready picker row.
Desktop:
- ApiKeyForm shows an optional API key field for the local/custom option;
the key is threaded through saveOnboardingLocalEndpoint → validate +
setModelAssignment.
- New onboarding `localEndpoint` intent + startManualLocalEndpoint(); the
Settings "Set up custom endpoint" button now opens the local-endpoint
form (URL + key) instead of the OAuth dead-end.
- Added localApiKeyPlaceholder i18n key (en + types + zh).
Tests: api_key lifecycle on _apply_main_model_assignment, key persistence
+ custom_providers registration on /api/model/set, Bearer-header probe;
onboarding store forwards + persists the key.
A 'recipe' is a one-place definition of an automation that every surface
renders natively. The slot schema (cron/recipe_catalog.py) is the single
source of truth; four renderers consume it, and all paths end at the same
cron.jobs.create_job — no second job engine.
Form where there's a screen, conversation where there's a chat line:
- Dashboard / GUI app: a Recipes sub-tab on the Cron page renders each
recipe's typed slots as a form (time-picker, enum dropdown, free-text);
submit POSTs /api/cron/recipes/instantiate which fills + creates the job.
- CLI / TUI / messengers: /cron-recipe lists the catalog, shows a recipe's
fields, or fills + creates from a pasted 'key slot=val' command. The shared
handler (hermes_cli/cron_recipe_cmd.py) names any missing/invalid slot so
the agent can ask a targeted follow-up.
- Docs: a generated Cron Recipes catalog page (website, .mdx + React cards)
shows each recipe with a copy-paste command and a 'Send to App' button.
- Desktop: a hermes:// URL scheme (Electron single-instance lock +
setAsDefaultProtocolClient + open-url/second-instance) routes
hermes://cron-recipe/<key>?slot=val into the chat composer pre-filled.
Typed slots (time/enum/text/weekdays) with defaults: users never type raw
cron — recipes parameterize time-of-day and weekday sets and translate to
cron expressions; a free-text 'schedule' slot is the full-flexibility escape
hatch. Consent-first throughout: nothing schedules without an explicit submit
or send.
Core:
- cron/recipe_catalog.py — CronRecipe + RecipeSlot, 5 curated recipes,
recipe_form_schema / recipe_slash_command / recipe_deeplink /
recipe_catalog_entry renderers, fill_recipe (validate + translate to
create_job kwargs).
- hermes_cli/cron_recipe_cmd.py — shared /cron-recipe handler (CLI + TUI +
gateway never drift). CommandDef + dispatch in commands.py / cli.py /
gateway/run.py.
Dashboard: GET /api/cron/recipes + POST /api/cron/recipes/instantiate
(web_server.py), CronRecipes.tsx gallery+form, Segmented sub-tab on CronPage,
api.ts methods + types.
Desktop: hermes:// scheme end to end (main.cjs deep-link router + ready-queue,
preload onDeepLink/signalDeepLinkReady, global.d.ts types, desktop-controller
composer prefill, electron-builder protocols key).
Docs: extract-cron-recipes.py generator wired into prebuild.mjs,
cron-recipes-catalog.mdx + CronRecipesCatalog React component, sidebar entry.
Generated index json gitignored like skills.json.
Tests: 23 core (catalog/slots/schedule-resolution/validation/renderers/command
handler/generator) + 5 web_server endpoint tests. E2E verified end to end:
slot fill -> create_job -> persisted job with correct schedule/deliver/origin.
Two beta-reported dashboard bugs:
1. Models page: 'Use as -> Main model' on an analytics card sends
entry.provider, which falls back to the model's VENDOR prefix
(modelVendor('anthropic/claude-opus-4.6') == 'anthropic') when the
session row has no billing_provider. That persisted
provider: anthropic + default: anthropic/claude-opus-4.6 — a
vendor-prefixed OpenRouter slug on the NATIVE Anthropic provider.
New sessions then 400 against api.anthropic.com and the user reads
it as 'changing models does nothing'. Unknown vendors (moonshotai,
poolside, ...) were worse: a provider that can never resolve
credentials.
Fix: _normalize_main_model_assignment() at the single write
chokepoint — maps non-provider vendor names back to the user's
current aggregator (else openrouter), and runs the model through
normalize_model_for_provider() so the persisted name matches the
target provider's API format. Wired into both /api/model/set and
the profile-scoped _write_profile_model.
2. System page: 'Restore from backup' spawns hermes import with
stdin=DEVNULL, so the CLI's interactive 'Continue? [y/N]' overwrite
prompt hits EOF and auto-aborts whenever a config already exists
(always, when the dashboard is running). Fix: ConfirmDialog in the
dashboard owns the consent, then the endpoint passes --force so the
restore runs non-interactively.
Validated live: dashboard on a temp HERMES_HOME, repro'd both failure
modes pre-fix (vendor-slug write verified via config.yaml + tui
session.create; import 'Aborted.' in action-import.log), then verified
post-fix (normalized writes, modal -> --force -> restored marker file).
* feat(dashboard): unify multi-profile management — one machine dashboard, global profile switcher
The dashboard becomes a machine-level management surface with one
write-target selector, replacing per-profile dashboard fragmentation.
Backend:
- profile param (query or body) on /api/config (get/put/raw), /api/env
(get/put/delete/reveal), /api/mcp/servers (list/add/remove/test/enabled),
/api/mcp/catalog (list/install), /api/model/info, /api/model/set —
all scoped through the existing _profile_scope() context manager
- model/set restructured: expensive-model warning (await) runs before the
scope; the config write runs sync inside the scope in a worker thread
- MCP catalog installs + git-bootstrap entries spawn 'hermes -p <profile>'
- chat PTY: ?profile= on /api/pty points the child's HERMES_HOME at the
profile dir (its own gateway subprocess, config/skills/memory/state.db
all profile-bound); in-process gateway attach skipped when scoped
CLI launch unification:
- '<profile> dashboard' routes to the machine dashboard: attach (open
browser at ?profile=) when one is listening, else re-exec pinned to the
default profile with --open-profile preselecting the launcher
- --isolated preserves the old dedicated per-profile server behavior
- start_server(initial_profile=...) appends ?profile= to the auto-open URL
Frontend:
- ProfileProvider + sidebar ProfileSwitcher: ONE global selector, URL-
persisted (?profile=), mirrored into fetchJSON which auto-appends the
param to the scoped endpoint families (explicit params win)
- app-wide amber banner names the managed profile
- SkillsPage's page-local selector (from the skills-scoping PR) folded
into the global context — single source of truth
- ChatPage threads the scope into the PTY WS URL; switching profiles
remounts the terminal into a fresh scoped session
Omitted profile keeps legacy behavior everywhere.
* docs(dashboard): document machine-level multi-profile management
- web-dashboard.md: 'Managing multiple profiles' section (switcher, URL
deep-links, unified launch, --isolated, scoped Chat, what stays
per-profile) + --isolated in the options table
- profiles.md: 'From the dashboard' subsection + set-as-active vs
switcher clarification
- cli-commands.md: --isolated flag + profile-alias launch example
* fix(dashboard): address profile-unification review findings
Review findings (dev review on PR #44007):
1. HIGH — stale page state on profile switch: pages load data on mount
and didn't consume the profile scope, so a page opened under profile A
kept showing A's state while writes silently targeted the newly
selected B. Fixed structurally: ProfileKeyedRoutes wraps the routed
page tree and keys it by the selected profile, remounting every page
(fresh state + refetch) on switch. ChatPage keeps its own remount
(channel keyed on scopedProfile).
2. HIGH — /api/model/auxiliary read was unscoped while /api/model/set
wrote scoped (Models page could show default's aux pins while editing
worker's). Endpoint now takes profile + _profile_scope, added to
PROFILE_SCOPED_PREFIXES, HTTPException re-raise so ghost profiles 404
instead of 500. Regression test asserts read/write symmetry with
differing worker/default aux config.
3. MEDIUM — tools post-setup spawned unscoped from the profile-aware
drawer. Now spawns 'hermes -p <profile> tools post-setup <key>'
(same mechanism as hub installs); drawer threads its profile prop.
Most hooks install machine-level artifacts where the scope is inert,
but hooks reading config/env now see the drawer's HERMES_HOME.
4. LOW — ty warnings: env Optional asserts before subscript/membership,
fastapi import replaced with web_server.HTTPException re-use.
298 tests green across the four affected suites; tsc -b + vite build
green; aux scoping E2E-verified with real imports.
* fix(dashboard): address second profile-unification review (gille)
1. BLOCKER — profile scope dropped on sidebar navigation: ProfileProvider
derived the selection from the current URL, and nav links are bare
paths, so clicking Config from /skills?profile=worker silently reset
the write target. State is now the source of truth; an effect
re-asserts ?profile= onto the new location after every navigation
(URL stays a synchronized projection for deep links/refresh), and an
incoming URL param (e.g. 'Manage skills & tools' links) still wins.
2. BLOCKER — /api/model/options unscoped while model/set wrote scoped:
the picker context (current model/provider, custom providers,
per-profile .env auth state) now loads inside _profile_scope; added
to PROFILE_SCOPED_PREFIXES. Test: a worker-only current-model pin
appears in the scoped payload and not the unscoped one.
3. BLOCKER — MCP test-server probe escaped the scope after the config
read: the probe now re-enters _profile_scope inside the worker thread
so env-placeholder expansion resolves against the selected profile's
.env. Known limit (documented): the probe's dedicated MCP event-loop
thread doesn't inherit the contextvar (OAuth token paths). Test
asserts get_hermes_home() inside the probe == the worker profile dir.
4. BLOCKER — broad excepts swallowed unknown-profile 404s: /api/model/info
degraded to 200-with-empty-model-info and /api/mcp/catalog to a
silently-empty catalog. Both re-raise HTTPException; 404 regression
tests added for info/options/catalog.
Polish: scope banner clears the fixed mobile header (mt-14 lg:mt-0);
--open-profile hidden via argparse.SUPPRESS (internal re-exec flag);
attach-path test now asserts the opened ?profile= URL.
(Stale-page-state + /api/model/auxiliary findings from this review were
already fixed in 92bcd1568 — the review ran against e600f6951.)
35 tests in the two new suites + 274 in the adjacent ones, all green;
tsc -b + vite build green; scoping E2E-verified with real imports.
* docs(dashboard)+fix: self-review pass — Profiles page section, REST profile-param tip, body-beats-query precedence
Docs:
- web-dashboard.md: add the missing 'Profiles' subsection to Pages
(cards, create/builder, manage-skills jump, set-as-active vs switcher
distinction, editors); REST API section gets a profile-scoped-endpoints
tip documenting ?profile= / body profile / 404 semantics / /api/pty
- (profiles.md + cli-commands.md were already updated in e600f6951)
Precedence fix: scoped endpoints taking BOTH a query param and a body
field now resolve body.profile first. The SPA's fetchJSON injects the
query param from the GLOBAL switcher; an explicit body.profile (e.g.
Profile Builder flows writing into a specific new profile) is the more
specific intent and must not be overridden by whatever the sidebar
happens to be set to. Matches the documented 'explicit beats global'
contract in api.ts.
Verified: 304 tests green across the four suites; tsc -b + vite build
green; docusaurus build green (only pre-existing broken-link warnings,
none from this PR's pages).
* feat(profiles): extend create endpoint for full profile-builder (model + MCPs + skills)
Backend foundation for the dashboard profile builder. Extends POST /api/profiles
to accept, in one call, everything a profile needs beyond name/clone:
- mcp_servers[] -> written into the new profile's config.yaml
- keep_skills[] -> replace-semantics: disable every seeded skill not kept
- hub_skills[] -> async install via 'hermes -p <name> skills install <id>'
All applied best-effort AFTER the profile dir exists, so a hiccup in any one
never 500s the create. Model/MCP/keep-skills writes are profile-scoped via the
HERMES_HOME context override (same mechanism as the existing _write_profile_model).
Hub installs go through a subprocess scoped with -p because skills_hub.SKILLS_DIR
is import-time-bound and the runtime override can't redirect it.
Adds two helpers (_write_profile_mcp_servers, _disable_unselected_skills) and a
TestClient test asserting all four paths land in the NEW profile's config and
the hub spawn is scoped to it. Design doc at docs/design/profile-builder.md.
* feat(dashboard): full-featured profile builder page
Adds a dedicated /profiles/new builder that composes everything a profile
needs into one stepped create flow, reusing the existing Models/Skills/MCP
data paths instead of duplicating them:
- Identity name + description
- Model provider+model picker (api.getModelOptions)
- Skills keep-which-built-in/optional (replace semantics, default = full
bundle) + skills-hub search/add (api.getSkills, searchSkillsHub)
- MCPs add HTTP/stdio servers inline
- Review blueprint -> single POST /api/profiles create
Nothing writes until Create; the one call commits model+MCPs+skill selection
and spawns hub-skill installs (reported in the success toast). ProfilesPage
header gets a 'Build' button (full builder) alongside 'Create' (quick modal).
Route is page-only (not in the sidebar nav). Verified with vite build (2258
modules, green).
Follow-ups to the salvaged Telegram QR onboarding auto-restart:
- _spawn_gateway_restart() reuses a live in-flight 'hermes gateway restart'
child instead of spawning a second racing one (stale cached frontend +
new backend both requesting a restart, or restart-button double-click).
Both /api/gateway/restart and the onboarding apply path go through it.
- ChannelsPage polls /api/actions/gateway-restart/status after a
server-initiated restart and surfaces a non-zero exit (e.g. systemd
linger missing) via the manual-restart banner, since restart_started
only means the child spawned.
- Test for the reuse path + _ACTION_PROCS isolation in existing tests.
Rebased onto current main and re-ported across the restructured
surfaces: model flows now thread confirm_provider/base_url/api_key
through hermes_cli/model_setup_flows.py, the Discord picker lives in
plugins/platforms/discord/adapter.py, and the web dashboard picker
applies chat-mode switches via config.set so the expensive-model
confirmation can ride the response.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>