- Drop empty entries before validating SLACK_ALLOWED_USERS so a trailing or
interior comma (which the gateway silently tolerates in
gateway/platforms/slack.py) is no longer rejected at the dashboard.
- Hoist the member-ID regex to a module-level _SLACK_MEMBER_ID_RE constant
and note it stays in sync with the frontend SLACK_MEMBER_ID_RE.
- Add a regression test for the trailing-comma case.
The new SLACK_ALLOWED_USERS validation rejected '*', but the Slack gateway
honors '*' as an allow-all wildcard (gateway/platforms/slack.py DM auth,
slash-confirm, and approval-button paths). Accept '*' as a valid list entry
in both the API validator and the dashboard form so a value the runtime
honors is no longer blocked at setup.
The desktop model picker had no way to force a fresh model fetch: model.options
went through the 1h-cached provider_models_cache.json, and there was no flag to
bust it. When a provider's cached list expired and its next live fetch failed,
the picker fell back to the curated static list — silently dropping live-only
models (e.g. OpenCode Zen's free tier like deepseek-v4-flash-free) the user had
been using.
- Thread refresh through model.options (RPC + REST /api/model/options) ->
build_models_payload -> list_authenticated_providers, which calls
clear_provider_models_cache() up front when set so every row re-fetches live.
- Add a 'Refresh Models' control to the desktop picker (5-locale i18n, spinning
sync icon). Normal opens leave refresh=false to stay snappy on the cache.
Verified: stale cache hides deepseek-v4-flash-free -> refresh busts it -> live
re-fetch surfaces it. refresh=false never touches the cache.
* fix(dashboard): resolve chat TUI argv off event loop
Dashboard chat now resolves its TUI launch command off the
FastAPI/WebSocket event loop. The resolver can run `npm install` /
`npm run build` through `_make_tui_argv()`, and doing that synchronously
in `/api/pty` can block proxy keepalives and other dashboard WebSocket
work long enough for reverse-proxy deployments to drop the chat
connection.
This keeps the current TUI build policy intact: normal production
launches still run the correctness-first `npm run build` path, while
`HERMES_TUI_DIR` remains the prebuilt/no-build path for distros and
containers. The change only moves the potentially slow resolver work to
a worker thread for the dashboard chat path, serialized by an
`asyncio.Lock` so concurrent chat tabs preserve one-build-at-a-time
behavior. `SystemExit` (node/npm missing) and the profile `HTTPException`
path still propagate cleanly through `asyncio.to_thread()`.
Salvaged from #26124 — rebased onto current main. The async wrapper now
threads the `profile` parameter that `_resolve_chat_argv` gained on main
since the PR was opened, so cross-profile chat is preserved.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
* chore: add 0xdany to AUTHOR_MAP
* fix(dashboard): bind chat-argv lock to app.state; cover error propagation
Self-review hardening on top of the salvaged fix:
- Move `_chat_argv_lock` from a module-level `asyncio.Lock()` onto
`app.state` (initialised in `_lifespan`, lazy fallback via
`_get_chat_argv_lock`), mirroring `event_lock`. A module-level
`asyncio.Lock()` binds to whatever event loop is active at import time,
which is the exact pattern `_get_event_state`'s docstring warns against
(breaks across TestClient instances / uvicorn reloads). This keeps the
lock on the running loop.
- Add two tests exercising the real `_resolve_chat_argv_async` →
`asyncio.to_thread` → lock → re-raise chain: `SystemExit` (node/npm
missing) and `HTTPException` (invalid profile) both propagate out of the
worker thread and are caught by `pty_ws`'s existing handlers. The prior
tests mocked `asyncio.to_thread` away and never covered this path.
* test(dashboard): dedupe pty error-propagation tests; assert close code
simplify-code cleanup pass on the salvage stack:
- Extract the shared scaffolding of the two pty_ws error-propagation tests
into `_assert_pty_propagates`, keeping the two tests as distinct contracts
for the `except SystemExit` and `except HTTPException` arms.
- Assert the stable WebSocket close code (1011) instead of relying solely on
the user-facing "Chat unavailable" notice wording — a behavior contract per
the AGENTS.md "behavior contracts over snapshots" rule, robust to notice
rewording. The detail substring ("unknown profile") is still checked for the
HTTPException case since proving the detail survives the thread hop is the
point of that test.
No production-code change; the helper exercises the same real
_resolve_chat_argv_async -> asyncio.to_thread -> lock -> re-raise chain.
---------
Co-authored-by: draihan <draihan@student.ubc.ca>
* fix(desktop): show Hindsight memory provider
* feat(desktop): configure Hindsight memory provider
* fix(desktop): limit Hindsight modes to supported setup
* refactor(desktop): generic memory-provider config surface
Replace the bespoke Hindsight settings surface with a declarative,
schema-driven path so adding a memory provider is pure declaration —
no per-provider page, conditional, or endpoint.
- memory_providers.py: declarative registry. Each provider lists its
fields {key, label, kind, default, options, secret-vs-plain}. Hindsight's
mode is a select(cloud, local_external), so rejecting local_embedded
falls out of generic enum validation instead of a hand-written check.
- One generic endpoint pair GET/PUT /api/memory/providers/{name}/config.
GET returns declared fields + current values (secrets only as is_set,
never read back); PUT validates selects against their options, writes
plain fields to the provider config file, secrets to the env store,
and flips memory.provider.
- ProviderConfigPanel renders straight from the schema, replacing
hindsight-settings.tsx and the memory.provider === 'hindsight'
conditional in config-settings.tsx — same pattern as
toolset-config-panel.tsx off env_vars.
Scoped to memory providers; storage layout is unchanged so the runtime
Hindsight plugin reads the same config.json / HINDSIGHT_API_KEY / provider
keys as before. Tests cover the registry, endpoint behavior (defaults,
write+secret, select rejection, unknown provider, secret-never-returned),
and the generic panel.
DELETE /api/sessions/{id} was the only session endpoint that didn't
resolve the id (detail, messages, rename, export all call
resolve_session_id) and 404'd when the row was already gone. The desktop
optimistically removes the sidebar row, then RESTORES it and shows the
error on any failure — so deleting a session that had just been reaped
(empty-session hygiene) or removed by a concurrent client resurrected a
ghost row and surfaced "session not found". /goal + auto-compression churn
leaves transient empty rows that race the sidebar snapshot, which is the
exact "I deleted the empty one and got 'session not found'" report.
Resolve exact ids / unique prefixes, and treat an already-absent session
as an idempotent success — DELETE's contract is "ensure it's gone". This
mirrors the bulk-delete endpoint, which already treats ghost ids as
success.
Tests: deleting an absent id is idempotent (200, not 404); delete resolves
a unique prefix; a real session still deletes.
The interactive model pickers (Desktop REST API, TUI model.options, CLI
/model) were hard-capped at max_models=50, which truncated large provider
catalogs like Kilo Gateway (336 models) to just 50 entries. This made
most models undiscoverable via the picker search box.
Changes:
- Change build_models_payload() default from max_models=50 to None (unlimited)
- Change list_authenticated_providers() default from max_models=8 to None
- Change list_picker_providers() default from max_models=8 to None
- Fix all [:max_models] slicing to handle None as 'no limit'
- Remove max_models=50 from 5 interactive picker callers:
* web_server.py: get_model_options (Desktop /api/model/options)
* web_server.py: get_recommended_default_model
* model_switch.py: prewarm_picker_cache_async
* tui_gateway/server.py: model.options JSON-RPC
* cli.py: HermesCLI model picker
- Telegram/Discord inline keyboard picker (gateway/slash_commands.py)
still passes max_models=50 explicitly — unchanged behavior.
The total_models field was already in the response payload and is now
meaningful since models.length == total_models for interactive pickers.
Fixes#48279
The dashboard MCP catalog only showed name/description/transport and a
non-clickable source. Users couldn't see what an entry connects to or runs
before installing — the exact detail the docs trust model tells them to vet.
- /api/mcp/catalog now returns transport target (url, or command+args),
auth_type, git install source/ref + bootstrap commands, default-enabled
tool hint, and post-install guidance per entry.
- McpPage renders the endpoint URL (http) or command+args (stdio), the git
install source/ref, a collapsible bootstrap-commands list, setup notes,
and the source as a clickable link when it's a URL.
- Docs: drop the 'uv pip install -e .[mcp]' quick-start step (Hermes does
not support pip installs; MCP ships with the standard install) and note
the dashboard now surfaces this detail.
- Strengthen the catalog endpoint test to assert the new inspection fields.
Follow-up to #47663 (streaming multipart upload), fixing two issues that
landed with it.
1. Temp file leaked on client disconnect. The streaming upload endpoint's
except chain caught only HTTPException / PermissionError / OSError — all
Exception subclasses. asyncio.CancelledError, raised when a browser aborts
a large upload mid-stream (the exact NS-501 scenario), is a BaseException,
so it bypassed every except clause and reached a finally that only closed
the file handle and never unlinked the temp file. Every aborted large
upload orphaned a partial `.{name}.*.upload` file (up to ~100 MB) in the
target directory. Cleanup now lives in finally, keyed on a `renamed`
success flag, so the temp file is removed on every non-success exit
including BaseException paths. Added test_stream_upload_cleans_temp_on_cancellation,
which fails on the pre-fix code (leaks the temp file) and passes with the fix.
2. python-multipart pinned to ==0.0.27 instead of ==0.0.20. The package was
already resolved at 0.0.27 transitively (via daytona) before #47663; the
explicit ==0.0.20 pin in the [web] extra and the tool.dashboard lazy-install
set downgraded it. Bumped both to ==0.0.27 and regenerated with `uv lock`,
keeping the lockfile coherent. The base dependency stays >=0.0.9,<1.
* fix(dashboard): stream file uploads via multipart instead of base64 JSON
The dashboard file manager uploaded files (including backup/restore zip
archives) by reading them client-side with FileReader.readAsDataURL and
POSTing a base64 data URL inside a JSON body to /api/files/upload. For a
large backup this (a) inflates the payload ~33%, (b) buffers the whole
file plus its decoded copy in memory, and (c) reliably trips an upstream
proxy body-size/timeout limit, surfacing as a 502 with the upload
appearing to hang indefinitely (NS-501). Dashboard-only hosted users have
no shell fallback to place the archive, so backup restore was unusable.
Add a streaming multipart endpoint POST /api/files/upload-stream
(UploadFile + Form) that reads the request body in 1 MiB chunks straight
to a sibling temp file, enforces the existing 100 MB size cap as it
streams (413 on overflow, before buffering the whole file), and
atomically renames into place so a partial/aborted/over-limit upload
never clobbers an existing file. The frontend api.uploadFile now sends
multipart/form-data (raw bytes, no base64, browser-set boundary) and
FilesPage passes the File object directly; the dead readAsDataUrl helper
is removed. The legacy base64 JSON endpoint stays for backward compat.
FastAPI's UploadFile/Form require python-multipart, which is NOT pulled in
by fastapi itself, so it is added to the base deps, the [web] extra, and
the tool.dashboard lazy-install set (kept in sync).
Validated: 5 new endpoint tests (roundtrip, multi-chunk >1 MiB,
over-limit 413 without clobbering + no temp-file leak, overwrite=false
conflict, forced-root traversal containment); existing base64 tests still
pass; web typecheck + vite build clean; and a real uvicorn server E2E
(5 MB multipart upload -> HTTP 200 in 0.21s, exact byte match) plus a
30 MB TestClient roundtrip confirm constant-memory streaming end to end.
Reported via beta (NS-501).
* build(deps): regenerate uv.lock for python-multipart (NS-501)
CI ran uv lock --check / uv sync --locked which failed because the
python-multipart dependency add was not reflected in uv.lock. Regenerate
the lockfile (resolves to 0.0.20, matching the [web] extra pin) after
merging current main.
get_runtime_status_running_pid() validates liveness with a local
os.kill(pid, 0) probe. In /api/status the runtime record can be the
REMOTE health-probe body (cross-container), whose PID belongs to another
host and is display-only — probing it locally is wrong and trips the
test live-system guard (os.kill on a PID outside the test subtree).
Run the fallback only against the local read_runtime_status() record.
_profile_scope swaps process-global skills_tool/skill_manager module
attrs under an RLock; /api/status holds that scope across the
run_in_executor remote-health probe await, so a concurrent
/api/skills?profile=X request can cross-restore the status profile's
skill dir on its finally. Add _config_profile_scope (contextvar-only,
task-local, await-safe) and use it for status, which only resolves
get_hermes_home() at call time for config/env/gateway state and never
needs the skills-module globals.
External providers (Claude Code) store creds outside Hermes, so the
disconnect API refuses them. The backend now hands the GUI a per-OS
`disconnect_command` that clears the credential the same way the CLI's
logout does (macOS Keychain entry + ~/.claude/.credentials.json), and
the misleading "use claude setup-token" hint is corrected.
Settings → Providers offers a Disconnect button for these: it confirms,
leaves Settings, and runs the removal command in the embedded terminal
via a new runInTerminal() (queues onto $terminalInjection; the terminal
pane flushes and clears it once its session is live). The expanded list
also gets its own "Other providers" header so it no longer reads as
grouped under "Connected". API-managed providers keep the one-click
(trash) disconnect.
On a remote gateway connection, agent-written files live on the gateway
host, not the desktop's disk, so the Artifacts view's file:// hrefs failed
("Invalid external URL") and image thumbnails broke.
Make mediaExternalUrl() remote-aware in one place: in remote mode it
rewrites gateway-local paths to GET /api/files/download (a new endpoint
that streams the file as a Content-Disposition: attachment). The artifacts
view now resolves through it, and so do the existing chat-media and
generated-image callers, for free.
The download endpoint stays auth-gated; auth_middleware additionally
accepts the session token as a ?token= query param for this one path so a
shell/browser-opened download (which can't set the session header) still
authenticates — the same query-token tradeoff as the /api/pty WebSocket.
It is NOT added to PUBLIC_API_PATHS.
Salvages #46663 (which carried ~19k lines of CRLF noise and made the
endpoint public). Reimplemented on a clean LF base with the security hole
closed and tests added.
Co-authored-by: qingshan89 <qs2816661685@gmail.com>
The dashboard's public /api/status liveness endpoint is in PUBLIC_API_PATHS
and bypasses dashboard auth, yet it returned absolute hermes_home,
config_path, env_path, the gateway PID, and the internal gateway health URL.
That exceeds the shape its own allowlist documents as public ("version,
gateway state, active session count, and the dashboard auth-gate shape. No
bodies, no session content, no secrets"), leaking deployment recon to any
unauthenticated caller on a network-exposed (gated) bind.
Withhold host-local detail unless the bind is loopback / --insecure, where
the dashboard is local-only and the caller is already inside the trust
envelope -- the same split should_require_auth draws. The NAS liveness probe
and the auth-gate badge are unaffected.
Adds invariant tests for both modes (gated withholds, loopback keeps).
The dashboard's /api/skills/hub/install (and the new-profile hub_skills
path) spawned `hermes skills install <id>` with stdin=DEVNULL but
without --yes. do_install()'s 'Confirm [y/N]' prompt hit EOF, defaulted
to 'n', and printed 'Installation cancelled.' into a background log the
user never sees — every dashboard install no-opped.
Pass --yes on both spawn sites, matching the uninstall endpoint which
already passed --yes. The dashboard install button is the explicit user
consent, same as the TUI/slash-command skip_confirm rationale.
Repro: spawned the exact argv with stdin=DEVNULL against a temp
HERMES_HOME — without --yes it cancels, with --yes the skill installs.
Replace the PortPool-based port reservation system (9120-9199 range) with OS-assigned ephemeral ports via --port 0.
Before: Desktop probed a hardcoded port range, reserved ports in-process to close TOCTOU races, and passed the chosen port to the dashboard via CLI arg.
After: Desktop spawns dashboard with --port 0, parses the actual port from a stdout announcement line (HERMES_DASHBOARD_READY port=<N>), and uses that for WebSocket connections.
Changes:
- web_server.py: add --port 0 support with SO_REUSEADDR pre-bind + announcement; add EADDRINUSE preflight for explicit ports
- main.cjs: remove PortPool, PORT_FLOOR/CEILING, pickPort(), isPortAvailable(); add waitForDashboardPort() stdout parser
- Delete port-pool.cjs and port-pool.test.cjs (106 lines removed)
Net effect: eliminates the entire TOCTOU-mitigation reservation infrastructure and arbitrary port range constraints. OS handles port allocation natively.
Two halves of the same community report (dashboard Profile Builder):
1. A fresh dashboard/CLI-created profile got no .env file unless cloned,
so it silently inherited API keys and messaging tokens from the shell
environment / root install. create_profile() now seeds a placeholder
.env (0600) for non-clone profiles, matching the SOUL.md seeding.
2. The Channels endpoints (/api/messaging/platforms GET/PUT/test) were
not profile-scoped: they read/wrote the dashboard process's own .env
via load_env()/save_env_value() regardless of the global profile
switcher. They now accept the standard optional profile param (body
beats query on the PUT, matching other scoped writes) and run inside
_profile_scope(). When scoped, the payload no longer falls back to
os.environ or load_gateway_config()'s env-override layer — both carry
the ROOT install's credentials and would misreport them as the
profile's. /api/messaging/platforms added to PROFILE_SCOPED_PREFIXES
so the sidebar switcher scopes the Channels page automatically.
The desktop "Local / custom endpoint" onboarding never collected an API
key and /api/model/set silently dropped one, so an auth-gated endpoint
(e.g. a hosted vLLM behind a key) could never enumerate models — and
Settings' "Set up custom endpoint" routed `custom` into a non-existent
OAuth flow, booting the user back to the first screen (the reported loop).
Backend (web_server.py):
- /api/providers/validate accepts an optional api_key and sends it as a
Bearer header when probing a custom endpoint's /v1/models.
- /api/model/set accepts api_key, persists it to model.api_key (same
switch/preserve lifecycle as base_url), and registers a named
custom_providers entry via _save_custom_provider — matching the
`hermes model` CLI flow so the endpoint shows up as a ready picker row.
Desktop:
- ApiKeyForm shows an optional API key field for the local/custom option;
the key is threaded through saveOnboardingLocalEndpoint → validate +
setModelAssignment.
- New onboarding `localEndpoint` intent + startManualLocalEndpoint(); the
Settings "Set up custom endpoint" button now opens the local-endpoint
form (URL + key) instead of the OAuth dead-end.
- Added localApiKeyPlaceholder i18n key (en + types + zh).
Tests: api_key lifecycle on _apply_main_model_assignment, key persistence
+ custom_providers registration on /api/model/set, Bearer-header probe;
onboarding store forwards + persists the key.
* docs: finish Automation Blueprints terminology rebrand
Replace leftover "Automation Templates" wording from the Cron Recipes
rebrand, rename the copy-paste cookbook guide to Automation Recipes, and
point the marketing gallery link at the blueprints catalog.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: use Automation Blueprints instead of Recipes in guide
Rename the cookbook guide from automation-recipes to
automation-blueprints so sidebar and copy match the product term.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: rename automation-blueprints-catalog to automation-blueprints
Drop the -catalog suffix from the reference page slug and title, and
move the copy-paste cookbook to automation-blueprint-examples so the
main Automation Blueprints doc is unambiguous.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Revert "docs: rename automation-blueprints-catalog to automation-blueprints"
This reverts commit 605f1eeab5.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Review fixes for the Cron Recipes stack before release:
- hydration-move: */90 in the cron minute field silently wraps to hourly
(croniter-verified) — 90/120-minute options never fired at their stated
cadence. Replaced with an hour-field step (0 9-17/2 * * 1-5) and an
interval_hours slot whose options (1/2/3h) all fire as labeled.
- fill_recipe: reject unknown slot names. A typo'd 'tiem=07:15' used to
silently create the job at the 08:00 default; now it 422s on the dashboard
form and errors on the slash/deep-link paths with the valid slot list.
- deliver slot: non-strict enum (options are suggestions, scheduler
validates downstream) so slack/whatsapp/etc. users aren't locked out;
GET /api/cron/recipes rewrites its options from cron_delivery_targets()
so the dashboard form only offers configured platforms; help text no
longer claims dashboard-created jobs deliver to 'the chat you set this
up from' (the endpoint strips origin — they go to the home channel).
- gateway: success/accept messages no longer point at /cron (cli_only);
surface-aware hint instead. Conversational fill now sends the
'Setting up X — I'll ask you a couple of things…' ack before the agent
turn, matching the CLI experience.
- important-mail catalog entry: reference the urgency classifier by module
path (python3 -m cron.scripts.classify_items) instead of baking an
absolute host path into the job prompt — stale after relocation and
nonexistent on remote terminal backends. cron/scripts is now a real
package and ships in the wheel (pyproject packages.find).
- export_recipe: interval schedules round-trip again — parse_schedule
stores 'minutes' but the renderer only read 'seconds', so every interval
job exported as the silent '0 9 * * *' fallback.
- skills_hub install: say so when a recipe suggestion is dropped
(latched dedup or pending cap) instead of printing nothing.
Targeted tests: 58 cron/recipe + 261 web_server pass; E2E-validated all
14 recipes fill+parse, hydration cadences via croniter, typo rejection on
slash + endpoint paths, surface-aware hints, and interval export round-trip.
A 'recipe' is a one-place definition of an automation that every surface
renders natively. The slot schema (cron/recipe_catalog.py) is the single
source of truth; four renderers consume it, and all paths end at the same
cron.jobs.create_job — no second job engine.
Form where there's a screen, conversation where there's a chat line:
- Dashboard / GUI app: a Recipes sub-tab on the Cron page renders each
recipe's typed slots as a form (time-picker, enum dropdown, free-text);
submit POSTs /api/cron/recipes/instantiate which fills + creates the job.
- CLI / TUI / messengers: /cron-recipe lists the catalog, shows a recipe's
fields, or fills + creates from a pasted 'key slot=val' command. The shared
handler (hermes_cli/cron_recipe_cmd.py) names any missing/invalid slot so
the agent can ask a targeted follow-up.
- Docs: a generated Cron Recipes catalog page (website, .mdx + React cards)
shows each recipe with a copy-paste command and a 'Send to App' button.
- Desktop: a hermes:// URL scheme (Electron single-instance lock +
setAsDefaultProtocolClient + open-url/second-instance) routes
hermes://cron-recipe/<key>?slot=val into the chat composer pre-filled.
Typed slots (time/enum/text/weekdays) with defaults: users never type raw
cron — recipes parameterize time-of-day and weekday sets and translate to
cron expressions; a free-text 'schedule' slot is the full-flexibility escape
hatch. Consent-first throughout: nothing schedules without an explicit submit
or send.
Core:
- cron/recipe_catalog.py — CronRecipe + RecipeSlot, 5 curated recipes,
recipe_form_schema / recipe_slash_command / recipe_deeplink /
recipe_catalog_entry renderers, fill_recipe (validate + translate to
create_job kwargs).
- hermes_cli/cron_recipe_cmd.py — shared /cron-recipe handler (CLI + TUI +
gateway never drift). CommandDef + dispatch in commands.py / cli.py /
gateway/run.py.
Dashboard: GET /api/cron/recipes + POST /api/cron/recipes/instantiate
(web_server.py), CronRecipes.tsx gallery+form, Segmented sub-tab on CronPage,
api.ts methods + types.
Desktop: hermes:// scheme end to end (main.cjs deep-link router + ready-queue,
preload onDeepLink/signalDeepLinkReady, global.d.ts types, desktop-controller
composer prefill, electron-builder protocols key).
Docs: extract-cron-recipes.py generator wired into prebuild.mjs,
cron-recipes-catalog.mdx + CronRecipesCatalog React component, sidebar entry.
Generated index json gitignored like skills.json.
Tests: 23 core (catalog/slots/schedule-resolution/validation/renderers/command
handler/generator) + 5 web_server endpoint tests. E2E verified end to end:
slot fill -> create_job -> persisted job with correct schedule/deliver/origin.
The Config page read config_path from /api/status, which is machine-global
and always reports the profile the dashboard process was started under.
After switching profiles with the global switcher, the header kept showing
the old profile's path (e.g. /root/.hermes/profiles/worker_1/config.yaml)
even though reads/writes correctly targeted the new profile.
Fix: /api/config/raw now returns the resolved path alongside the YAML
(resolved inside _profile_scope, so it follows ?profile=). ConfigPage
prefers that scoped path and only falls back to /api/status for old
servers. ProfileKeyedRoutes already remounts the page on switch, so the
header refreshes immediately.
Two dashboard fixes:
1. The 'Anthropic API Key' OAuth catalog entry's status fn read
~/.claude/.credentials.json (which has its own dedicated claude-code
entry) and never checked ANTHROPIC_API_KEY at all. It now checks the
Hermes PKCE file, then the registry env-var order (ANTHROPIC_API_KEY
-> ANTHROPIC_TOKEN -> CLAUDE_CODE_OAUTH_TOKEN) via get_env_value, so
keys from .env, the shell, or Bitwarden (injected into the process
env by load_hermes_dotenv) are all reported, with a '(from Bitwarden)'
source suffix when applicable.
2. Deprecated HERMES_TOOL_PROGRESS / HERMES_TOOL_PROGRESS_MODE removed
from OPTIONAL_ENV_VARS so the keys page and setup checklists stop
offering them. Moved to _EXTRA_ENV_KEYS so .env sanitization and
reload_env still recognize them for existing users (gateway back-compat
fallback unchanged).
Headless/VPS users (dashboard-over-Tailscale, no comfortable SSH) could
list/toggle/install skills and create/edit cron jobs, but not author a
custom skill or link one to a cron job — the UI set WHEN a job runs, but
not WHICH skill it uses.
- Skills page: 'New skill' button + per-row edit pencil open a SKILL.md
editor dialog (frontmatter + body, server-side validation via the same
_create_skill/_edit_skill path as the agent's skill_manage tool).
- New endpoints: GET /api/skills/content, POST /api/skills,
PUT /api/skills/content — all profile-scoped via _profile_scope(),
which now also retargets tools.skill_manager_tool's import-time
SKILLS_DIR binding.
- Cron page: skills multi-select in both create and edit modals (parity
with hermes cron --skill / edit --add-skill); CronJobCreate gains a
skills field; job cards show an attached-skills badge. update_job
already accepted skills in updates.
- Tests: 17 new endpoint tests (content read, create/edit validation +
profile scoping + auth gate, cron skills round-trip).
Two beta-reported dashboard bugs:
1. Models page: 'Use as -> Main model' on an analytics card sends
entry.provider, which falls back to the model's VENDOR prefix
(modelVendor('anthropic/claude-opus-4.6') == 'anthropic') when the
session row has no billing_provider. That persisted
provider: anthropic + default: anthropic/claude-opus-4.6 — a
vendor-prefixed OpenRouter slug on the NATIVE Anthropic provider.
New sessions then 400 against api.anthropic.com and the user reads
it as 'changing models does nothing'. Unknown vendors (moonshotai,
poolside, ...) were worse: a provider that can never resolve
credentials.
Fix: _normalize_main_model_assignment() at the single write
chokepoint — maps non-provider vendor names back to the user's
current aggregator (else openrouter), and runs the model through
normalize_model_for_provider() so the persisted name matches the
target provider's API format. Wired into both /api/model/set and
the profile-scoped _write_profile_model.
2. System page: 'Restore from backup' spawns hermes import with
stdin=DEVNULL, so the CLI's interactive 'Continue? [y/N]' overwrite
prompt hits EOF and auto-aborts whenever a config already exists
(always, when the dashboard is running). Fix: ConfirmDialog in the
dashboard owns the consent, then the endpoint passes --force so the
restore runs non-interactively.
Validated live: dashboard on a temp HERMES_HOME, repro'd both failure
modes pre-fix (vendor-slug write verified via config.yaml + tui
session.create; import 'Aborted.' in action-import.log), then verified
post-fix (normalized writes, modal -> --force -> restored marker file).
* fix(mcp): propagate HERMES_HOME override onto the MCP event loop
Closes the known limit documented in #44007: tasks scheduled via
run_coroutine_threadsafe are created INSIDE the MCP loop thread, so they
copy that thread's context — a per-request profile scope (dashboard
?profile= endpoints, e.g. the MCP 'Test server' probe) silently vanished
for anything resolving get_hermes_home() inside the coroutine. Most
visible symptom: OAuth token-store paths (HERMES_HOME/mcp-tokens/)
resolved against the process home instead of the selected profile, so
testing an OAuth MCP cross-profile read the wrong tokens.
_run_on_mcp_loop now wraps scheduled coroutines with the caller's
context-local override (_wrap_with_home_override): set inside the task's
own context on the loop, reset on completion — task-local, so concurrent
calls carrying different scopes don't interfere, and the loop thread's
default context stays untouched. No-op (coroutine passes through
unwrapped) when no override is active, i.e. every non-dashboard caller.
web_server's probe comment updated from 'known limit' to 'covered'.
Tests: override propagation (direct + factory form), OAuth token-path
resolution on the loop, loop-context cleanliness after scoped calls,
no-op passthrough. 225 green across mcp_tool + unification suites.
* test(mcp): concurrent different-scope calls don't interfere
* feat(dashboard): unify multi-profile management — one machine dashboard, global profile switcher
The dashboard becomes a machine-level management surface with one
write-target selector, replacing per-profile dashboard fragmentation.
Backend:
- profile param (query or body) on /api/config (get/put/raw), /api/env
(get/put/delete/reveal), /api/mcp/servers (list/add/remove/test/enabled),
/api/mcp/catalog (list/install), /api/model/info, /api/model/set —
all scoped through the existing _profile_scope() context manager
- model/set restructured: expensive-model warning (await) runs before the
scope; the config write runs sync inside the scope in a worker thread
- MCP catalog installs + git-bootstrap entries spawn 'hermes -p <profile>'
- chat PTY: ?profile= on /api/pty points the child's HERMES_HOME at the
profile dir (its own gateway subprocess, config/skills/memory/state.db
all profile-bound); in-process gateway attach skipped when scoped
CLI launch unification:
- '<profile> dashboard' routes to the machine dashboard: attach (open
browser at ?profile=) when one is listening, else re-exec pinned to the
default profile with --open-profile preselecting the launcher
- --isolated preserves the old dedicated per-profile server behavior
- start_server(initial_profile=...) appends ?profile= to the auto-open URL
Frontend:
- ProfileProvider + sidebar ProfileSwitcher: ONE global selector, URL-
persisted (?profile=), mirrored into fetchJSON which auto-appends the
param to the scoped endpoint families (explicit params win)
- app-wide amber banner names the managed profile
- SkillsPage's page-local selector (from the skills-scoping PR) folded
into the global context — single source of truth
- ChatPage threads the scope into the PTY WS URL; switching profiles
remounts the terminal into a fresh scoped session
Omitted profile keeps legacy behavior everywhere.
* docs(dashboard): document machine-level multi-profile management
- web-dashboard.md: 'Managing multiple profiles' section (switcher, URL
deep-links, unified launch, --isolated, scoped Chat, what stays
per-profile) + --isolated in the options table
- profiles.md: 'From the dashboard' subsection + set-as-active vs
switcher clarification
- cli-commands.md: --isolated flag + profile-alias launch example
* fix(dashboard): address profile-unification review findings
Review findings (dev review on PR #44007):
1. HIGH — stale page state on profile switch: pages load data on mount
and didn't consume the profile scope, so a page opened under profile A
kept showing A's state while writes silently targeted the newly
selected B. Fixed structurally: ProfileKeyedRoutes wraps the routed
page tree and keys it by the selected profile, remounting every page
(fresh state + refetch) on switch. ChatPage keeps its own remount
(channel keyed on scopedProfile).
2. HIGH — /api/model/auxiliary read was unscoped while /api/model/set
wrote scoped (Models page could show default's aux pins while editing
worker's). Endpoint now takes profile + _profile_scope, added to
PROFILE_SCOPED_PREFIXES, HTTPException re-raise so ghost profiles 404
instead of 500. Regression test asserts read/write symmetry with
differing worker/default aux config.
3. MEDIUM — tools post-setup spawned unscoped from the profile-aware
drawer. Now spawns 'hermes -p <profile> tools post-setup <key>'
(same mechanism as hub installs); drawer threads its profile prop.
Most hooks install machine-level artifacts where the scope is inert,
but hooks reading config/env now see the drawer's HERMES_HOME.
4. LOW — ty warnings: env Optional asserts before subscript/membership,
fastapi import replaced with web_server.HTTPException re-use.
298 tests green across the four affected suites; tsc -b + vite build
green; aux scoping E2E-verified with real imports.
* fix(dashboard): address second profile-unification review (gille)
1. BLOCKER — profile scope dropped on sidebar navigation: ProfileProvider
derived the selection from the current URL, and nav links are bare
paths, so clicking Config from /skills?profile=worker silently reset
the write target. State is now the source of truth; an effect
re-asserts ?profile= onto the new location after every navigation
(URL stays a synchronized projection for deep links/refresh), and an
incoming URL param (e.g. 'Manage skills & tools' links) still wins.
2. BLOCKER — /api/model/options unscoped while model/set wrote scoped:
the picker context (current model/provider, custom providers,
per-profile .env auth state) now loads inside _profile_scope; added
to PROFILE_SCOPED_PREFIXES. Test: a worker-only current-model pin
appears in the scoped payload and not the unscoped one.
3. BLOCKER — MCP test-server probe escaped the scope after the config
read: the probe now re-enters _profile_scope inside the worker thread
so env-placeholder expansion resolves against the selected profile's
.env. Known limit (documented): the probe's dedicated MCP event-loop
thread doesn't inherit the contextvar (OAuth token paths). Test
asserts get_hermes_home() inside the probe == the worker profile dir.
4. BLOCKER — broad excepts swallowed unknown-profile 404s: /api/model/info
degraded to 200-with-empty-model-info and /api/mcp/catalog to a
silently-empty catalog. Both re-raise HTTPException; 404 regression
tests added for info/options/catalog.
Polish: scope banner clears the fixed mobile header (mt-14 lg:mt-0);
--open-profile hidden via argparse.SUPPRESS (internal re-exec flag);
attach-path test now asserts the opened ?profile= URL.
(Stale-page-state + /api/model/auxiliary findings from this review were
already fixed in 92bcd1568 — the review ran against e600f6951.)
35 tests in the two new suites + 274 in the adjacent ones, all green;
tsc -b + vite build green; scoping E2E-verified with real imports.
* docs(dashboard)+fix: self-review pass — Profiles page section, REST profile-param tip, body-beats-query precedence
Docs:
- web-dashboard.md: add the missing 'Profiles' subsection to Pages
(cards, create/builder, manage-skills jump, set-as-active vs switcher
distinction, editors); REST API section gets a profile-scoped-endpoints
tip documenting ?profile= / body profile / 404 semantics / /api/pty
- (profiles.md + cli-commands.md were already updated in e600f6951)
Precedence fix: scoped endpoints taking BOTH a query param and a body
field now resolve body.profile first. The SPA's fetchJSON injects the
query param from the GLOBAL switcher; an explicit body.profile (e.g.
Profile Builder flows writing into a specific new profile) is the more
specific intent and must not be overridden by whatever the sidebar
happens to be set to. Matches the documented 'explicit beats global'
contract in api.ts.
Verified: 304 tests green across the four suites; tsc -b + vite build
green; docusaurus build green (only pre-existing broken-link warnings,
none from this PR's pages).
'Set as active' on the Profiles page only flips the sticky active_profile
file (future CLI/gateway runs) — it never retargets the running dashboard
process. The skills/toolsets endpoints called bare load_config()/
save_config(), so after 'activating' a profile in the web UI, deactivating
a skill silently wrote into the dashboard's own profile and the activated
profile was untouched.
Backend:
- _profile_scope() context manager on the skills/toolsets endpoints:
context-local HERMES_HOME override for call-time config resolution +
cron-style locked swap of tools.skills_tool's import-time SKILLS_DIR
- profile param on /api/skills, /api/skills/toggle, /api/tools/toolsets*
(list/toggle/config/provider/env), hub sources/search installed-state
- hub install/uninstall/update spawn 'hermes -p <profile> skills ...' so
the child rebinds skills_hub.SKILLS_DIR at import (the override cannot
reach import-time globals); profile validated -> 404/400 before spawn
Frontend:
- Skills page: profile selector (deep-linkable /skills?profile=<name>),
amber banner naming the managed profile, threaded through skill toggles,
toolset drawer, and hub browser
- Profiles page: 'Manage skills & tools' action per card; 'Set as active'
toast now says it applies to new CLI/gateway runs only
Omitted profile keeps legacy behavior (dashboard's own profile).