The OpenAI device-code login (POST auth.openai.com/.../deviceauth/usercode)
had no retry or 429 handling — a transient throttle from OpenAI surfaced as
a bare "Device code request returned status 429" with no guidance, reading
as a hard login failure.
- Retry the device-code request with capped exponential backoff (honoring
Retry-After), up to 4 attempts.
- On persistent 429, raise a clear AuthError tagged CODEX_RATE_LIMITED_CODE
(classified transient, not a credential problem) with a wait hint.
- Apply the same 429 classification to the token-exchange step (same bug
class).
Unrelated to PR #47399 (Responses-API cache headers); this is the OAuth
device-code path in hermes_cli/auth.py.
get_runtime_status_running_pid() validates liveness with a local
os.kill(pid, 0) probe. In /api/status the runtime record can be the
REMOTE health-probe body (cross-container), whose PID belongs to another
host and is display-only — probing it locally is wrong and trips the
test live-system guard (os.kill on a PID outside the test subtree).
Run the fallback only against the local read_runtime_status() record.
_profile_scope swaps process-global skills_tool/skill_manager module
attrs under an RLock; /api/status holds that scope across the
run_in_executor remote-health probe await, so a concurrent
/api/skills?profile=X request can cross-restore the status profile's
skill dir on its finally. Add _config_profile_scope (contextvar-only,
task-local, await-safe) and use it for status, which only resolves
get_hermes_home() at call time for config/env/gateway state and never
needs the skills-module globals.
Context files (AGENTS.md, CLAUDE.md, .hermes.md, .cursorrules, SOUL.md) were
hard-capped at a flat 20K chars before head/tail truncation. Among the agent
harnesses we track, only Codex caps project docs at all (32 KiB); Claude Code,
OpenCode, and Cline load them whole. The flat 20K predates large context
windows and silently truncates real-world AGENTS.md files.
B — dynamic cap: when context_file_max_chars is unset (now the shipped
default), the cap scales with the model's context window
(ctx_tokens * 4 * 0.06, floor 20K, ceiling 500K). Small-context models stay at
the historical 20K; a 200K model gets 48K; large models stop truncating real
docs. An explicit context_file_max_chars still wins. Context length is resolved
once per conversation (stable -> prompt cache untouched).
C — when truncation does happen, the marker now names the concrete file path
and tells the agent to read_file it for the full content.
Validation: 154 targeted tests + full agent/ + hermes_cli/ + test_config
(0 failures); E2E against a real 60K AGENTS.md confirms small windows truncate
with the path-bearing marker, large windows load whole, and the system prompt
is byte-stable across rebuilds.
The curator now defaults to prune-only: the deterministic inactivity pass
(mark stale / archive long-unused skills) still runs whenever the curator is
enabled, but the opinionated LLM umbrella-building consolidation fork is OFF
by default.
- agent/curator.py: add DEFAULT_CONSOLIDATE=False + get_consolidate(); gate
the forked aux-model review in run_curator_review behind it (new consolidate
param, None=read config). When off, the LLM pass is skipped entirely (no
aux-model cost); the run is still recorded and reported.
- config.py: add curator.consolidate (default false); v29->v30 migration seeds
the key for existing installs without clobbering a user-set value.
- hermes_cli/curator.py: 'hermes curator run --consolidate' override; status
shows consolidate state; prune-only notice on run.
- docs + tests.
When refresh_launchd_plist_if_needed() runs from inside the gateway's own
launchd process tree (agent-initiated self-update via the terminal tool), a
direct launchctl bootout tears down the service's process group — including
the CLI doing the refresh — before the follow-up bootstrap can run. The
gateway is left unloaded and KeepAlive can't revive it (#43842).
Detect in-service execution via gateway.status.get_running_pid() +
_is_pid_ancestor_of_current_process(), and delegate the bootout->bootstrap to
a detached (start_new_session=True) helper that survives the process-group
teardown. The normal out-of-tree CLI path is unchanged.
Fixes#43842.
The docstring described a token as path-like when it contains a "/"
separator, but the keystroke-latency fix now excludes "://" scheme tokens
(URLs) even though they contain "/". Document the exclusion so the contract
matches the behavior.
The interactive CLI input box runs its completer with
`complete_while_typing=True`, so `SlashCommandCompleter.get_completions`
is invoked on *every* keystroke. That completer does blocking I/O:
fuzzy `@`-file indexing shells out to `rg`/`fd` (up to a 2s timeout) and
file-path completion calls `os.listdir` + `stat`. Because the completer
was passed inline (never wrapped in `ThreadedCompleter`), all of this ran
synchronously on the prompt_toolkit event loop, stalling the render after
each key — very noticeable on WSL2 and other slow-filesystem setups
("typing in the prompt box being very latent").
Two fixes:
- Wrap the input completer in `ThreadedCompleter` so completion work runs
off the UI event loop and never blocks rendering between keystrokes.
- Stop treating URLs as file paths in `_extract_path_word`: a token like
`https://example.com/x` contains `/`, so it triggered `os.listdir` on
every keystroke while typing/pasting a link (listing a bogus `https:`
dir) for a completion that can never be useful. Skip any token with a
`://` scheme separator.
(cherry picked from commit b5be2ba276)
Follow-up to salvaged PR #41633: the timestamp prefix injection was
unconditional. Gate the in-context render behind
gateway.message_timestamps.enabled (default false) at both the live-message
and history-replay sites; timestamp metadata is still captured + persisted
regardless so the toggle can be flipped on later. Add DEFAULT_CONFIG entry,
docs, and gate tests.
* fix(desktop): re-download Electron binary via mirror when pack fails (#47266)
Since #38673 pinned build.electronDist to node_modules/electron/dist,
electron-builder reads the Electron binary straight from there and never
downloads it during `npm run pack`. That dist tree is only produced by the
electron package's postinstall (install.js) during `npm ci`. When that
download is blocked or throttled (GitHub's release host is unreachable in
some regions), the dist is missing and the build dies with:
The specified electronDist does not exist: .../node_modules/electron/dist
The existing ELECTRON_MIRROR fallback in all three desktop-build paths
(scripts/install.ps1, scripts/install.sh, and `hermes desktop` in
hermes_cli/main.py) re-ran `npm run pack` with ELECTRON_MIRROR set — but
pack never downloads Electron anymore, so the mirror was never used and the
retry re-read the same missing dist. The fallback was effectively dead.
Drive the mirror through electron's own downloader instead:
- Add a dist-presence check + a downloader helper (Test-ElectronDist /
Restore-ElectronDist, _electron_dist_ok / _restore_electron_dist,
_electron_dist_ok / _redownload_electron_dist) that wipes a partial dist
+ the path.txt version marker (electron's install.js short-circuits on it)
and re-runs `node install.js`, optionally via a mirror.
- On the first retry, repopulate a missing dist from the canonical source;
on the mirror retry, re-fetch through npmmirror.com, then pack.
- Gate the re-download on the dist check so an unrelated build failure
(tsc/vite) doesn't trigger a pointless ~200 MB refetch, and skip the final
pack when the binary still can't be fetched instead of failing the same way.
* test(desktop): cover Electron dist re-download mirror fallback (#47266)
Add behavior coverage for the electronDist re-download fix:
- _electron_dist_ok across linux/win32/darwin, including the partial-dist
case (dir present but binary missing) that makes the pinned electronDist
fail.
- _redownload_electron_dist: no-op when the binary is present, bail when
install.js is absent, wipe a stale dist + path.txt marker and run
electron's downloader with ELECTRON_MIRROR injected, and report failure
when the download still produces no binary.
- `hermes desktop`: the mirror fallback now drives electron's own downloader
before re-running pack, and skips the final pack entirely when the binary
can't be fetched.
Replaces the old mirror test that asserted the (now-fixed) dead behavior of
re-running `npm run pack` with ELECTRON_MIRROR set — pack never downloads
Electron under the pinned electronDist, so that retry could never help.
The /model interactive picker resolved a base_url from user credentials
but never passed it to ProviderProfile.fetch_models(), causing the
picker to always query the provider's hardcoded default endpoint
instead of the user's custom URL (e.g. a company litellm proxy).
- providers/base.py: add optional base_url parameter to fetch_models()
- hermes_cli/models.py: pass resolved base_url to fetch_models()
- Update all subclass overrides for signature compatibility
- Add 6 regression tests covering override, fallback, and integration
Section 3 of list_authenticated_providers (user-defined endpoints from
the providers: config section) required an api_key before probing the
endpoint's /v1/models for live model discovery. This broke local
self-hosted backends (llama.cpp, Ollama, vLLM, etc.) that don't require
authentication — they would only ever show the single default_model
from config instead of the full model catalog.
Section 4 (custom_providers list) already handled this correctly with
the policy: probe when api_key is set OR when no explicit models are
configured. Apply the same logic to Section 3 so local backends get
full model discovery without requiring a placeholder api_key workaround.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The #45954 model-dedup builds `user_models` from every is_user_defined
row, then strips those model IDs from every row where is_aggregator(slug)
is True. But is_aggregator() returns True for *every* `custom:*` slug, and
list_authenticated_providers emits named custom providers with slug
`custom:<name>` and is_user_defined=True. So a user's own custom provider
is treated as an aggregator and filtered against user_models — which holds
exactly its own models (the row helped build that set). Every model is
removed, the row drops to zero, and the provider disappears from the model
picker.
Guard the dedup loop to skip is_user_defined rows: a user's configured
provider is never an aggregator duplicate of itself. Built-in aggregators
(openrouter, etc.) are still deduped as before. Adds a regression test.
Switch the default model for the xAI/Grok provider and the xAI web
search backend from grok-4.3 to grok-build-0.1. grok-build-0.1 is
already recognized by the model metadata, so no new model definition
is required; grok-4.3 remains selectable.
PROBLEM: Automatic context files such as SOUL.md and AGENTS.md were capped by a hardcoded CONTEXT_FILE_MAX_CHARS value. Amy's local fork had raised that constant from 20K to 25K so a larger SOUL.md would not be silently truncated, but the hardcoded 25K value changed upstream default behavior and made the patch less generally useful.
SOLUTION: Restore the upstream-compatible 20K default, add a context_file_max_chars config setting for users who intentionally keep larger identity/project-context files, keep chat-visible truncation warnings, and document the new setting. Tests cover the default, config override, explicit max_chars precedence, and the warning text.
Z.ai released GLM 5.2 on 2026-06-15, available on OpenRouter:
- https://openrouter.ai/z-ai/glm-5.2
GLM-5.2 is Z.ai's flagship for long-horizon tasks, shipping a 1M-token
context window (up from 200K on GLM 5.1) and tool calling. Per the
OpenRouter API: text-only, context_length 1048576, tools supported.
No separate -fast variant exists.
The 1M context length, native zai picker entry, setup wizard, and Z.ai
coding-plan auth entries for glm-5.2 already landed on main. This fills
the remaining gap: the two aggregator surfaces where glm-5.1 appears but
glm-5.2 did not.
Changes:
hermes_cli/models.py
- Add z-ai/glm-5.2 to the OpenRouter fallback snapshot (OPENROUTER_MODELS)
and the Nous Portal curated list (_PROVIDER_MODELS["nous"]), newest
flagship first. Live catalogs surface it automatically when reachable;
the fallback lists matter when the manifest fetch fails.
website/static/api/model-catalog.json
- Regenerated via scripts/build_model_catalog.py (not hand-edited) so the
manifest stays in sync with the source lists; guarded by
tests/hermes_cli/test_model_catalog.py.
The generic live+curated merge (commit 630b438) seeded the merged list
from live results, demoting curated-only models below live ones. That
regressed #46309, which deliberately surfaces the newest curated model
(kimi-k2.7-code) FIRST in the native picker even when the live /models
listing lags. Restore curated-first ordering: curated entries lead (in
catalog order), live-only entries are appended for discovery. This keeps
the #46850 fix (zai glm-5.2 now appears) without the kimi regression.
Also switch the validate_requested_model curated fallback (commit
ee7b8a4) from provider_model_ids() — which triggers a second, uncached
live /models fetch with its own 8s timeout and may resolve different
credentials than the api_key/base_url just probed — to the pure-catalog
helper _model_in_provider_catalog(). Membership is checked against the
shipped catalog only, with no extra network call.
Tests: restore the curated-first assertion in
test_kimi_coding_live_catalog_does_not_hide_curated_k2_7_code; update
the new merge tests to curated-first semantics; de-circularize the
validation fallback tests to patch _PROVIDER_MODELS (the real source)
instead of mocking the function under test.
Slack caps apps at 50 slash commands and the registry is at that ceiling, so
adding /debug clamped it out of the native list and broke the telegram-parity
test (debug on Telegram, absent from Slack native slashes, in neither
exclusion set). Add 'debug' to _SLACK_VIA_HERMES_ONLY — same treatment credits
already gets. /debug stays native on CLI/TUI/Telegram/Discord and reachable via
/hermes debug on Slack.
Classify exhausted pool-only openai-codex credentials as quota/rate-limited instead of missing auth. This prevents auth status and runtime credential resolution from reporting missing credentials when a valid manual:device_code pool credential exists but is temporarily in a 429 usage-limit cooldown.
Adds regression coverage for pool-only Codex auth status and runtime resolution.
Streamed Telegram replies that finalize through editMessageText were
converted to MarkdownV2, which has no table syntax and rewrites pipe
tables into bullet lists — users saw a table while streaming that
collapsed to a list at the last moment.
Finalize now edits the existing preview IN PLACE via Bot API 10.1's
editMessageText rich_message parameter when the content has constructs
the legacy path degrades (tables, task lists, <details>, block math).
No fresh send + delete, so no duplicate-preview flicker — the reason
#46206 reverted the fresh-final re-send path. prefers_fresh_final_streaming
stays False; the in-place edit replaces it.
- _needs_rich_rendering(): rich reserved for table/task-list/details/math
(adapted from #45995, @YonganZhang); plain replies stay on MarkdownV2.
- _try_edit_rich(): editMessageText + rich_message via do_api_request,
mirroring _try_send_rich's fallback/latch/transient contract.
- edit_message finalize tries rich in place before the 4,096 overflow
pre-flight (rich cap is 32,768), falling back to legacy on rejection.
- rich_messages default flipped back to True (DEFAULT_CONFIG + adapter).
- docs (en + zh-Hans) + cli-config example updated to default-on.
Closes the root cause behind #45911 / #46009.
When live /v1/models responds but omits a model that exists in the
curated static catalog, validate_requested_model now accepts it with
a note instead of rejecting. This covers the /model slash-command path
(the picker path was already fixed in the parent commit).
Addresses review feedback from potatogim on #46857.
When a provider's live /v1/models endpoint returns a stale or incomplete
list (e.g. Z.AI missing glm-5.2), the generic profile-based code path
returned only the live results, silently dropping curated models.
Generalize the kimi-coding merge pattern to all providers: live entries
come first (provider's preferred order), then curated-only entries are
appended with case-insensitive dedup. This ensures models that the live
endpoint omits still appear in /model picker.
Fixes#46850
External providers (Claude Code) store creds outside Hermes, so the
disconnect API refuses them. The backend now hands the GUI a per-OS
`disconnect_command` that clears the credential the same way the CLI's
logout does (macOS Keychain entry + ~/.claude/.credentials.json), and
the misleading "use claude setup-token" hint is corrected.
Settings → Providers offers a Disconnect button for these: it confirms,
leaves Settings, and runs the removal command in the embedded terminal
via a new runInTerminal() (queues onto $terminalInjection; the terminal
pane flushes and clears it once its session is live). The expanded list
also gets its own "Other providers" header so it no longer reads as
grouped under "Connected". API-managed providers keep the one-click
(trash) disconnect.
On a remote gateway connection, agent-written files live on the gateway
host, not the desktop's disk, so the Artifacts view's file:// hrefs failed
("Invalid external URL") and image thumbnails broke.
Make mediaExternalUrl() remote-aware in one place: in remote mode it
rewrites gateway-local paths to GET /api/files/download (a new endpoint
that streams the file as a Content-Disposition: attachment). The artifacts
view now resolves through it, and so do the existing chat-media and
generated-image callers, for free.
The download endpoint stays auth-gated; auth_middleware additionally
accepts the session token as a ?token= query param for this one path so a
shell/browser-opened download (which can't set the session header) still
authenticates — the same query-token tradeoff as the /api/pty WebSocket.
It is NOT added to PUBLIC_API_PATHS.
Salvages #46663 (which carried ~19k lines of CRLF noise and made the
endpoint public). Reimplemented on a clean LF base with the security hole
closed and tests added.
Co-authored-by: qingshan89 <qs2816661685@gmail.com>
* feat(delegation): async background subagents via delegate_task(background=true)
delegate_task(background=true) dispatches a subagent that runs in the
background and returns a handle immediately, so the user and model keep
working while it runs. The full result — plus the original task source —
re-enters the conversation as a new turn when the subagent finishes,
riding the same completion-queue rail as terminal background processes.
- tools/async_delegation.py: daemon-executor registry, capacity cap,
rich self-contained completion event pushed onto the shared
process_registry.completion_queue (type='async_delegation').
- delegate_tool.py: background param + single-task dispatch branch;
batch async rejected (v1).
- process_registry.py: format_process_notification renders the rich
task-source block (goal/context/toolsets/model/status/result).
- gateway/run.py: dedicated _async_delegation_watcher drains + injects
results into the originating session (idle + post-turn), session_key
routing enrichment, shutdown interrupt of dangling delegations.
- config: delegation.max_async_children (default 3).
Reuses the existing idle-drain wiring rather than mutating a running
agent loop, preserving message-role alternation and prompt-cache
invariants. 13 targeted tests; CLI + gateway paths E2E-verified.
* test(delegation): make async non-blocking tests environment-independent
CI 'test (5)' flaked on a cold, 8-worker runner: the first
delegate_task(background=true) call measured 2.27s of one-time setup
(config load + child-agent construction + imports), tripping the
elapsed < 1.0 wall-clock assertion. That assertion was testing setup
overhead, not blocking.
Replace the wall-clock thresholds with the real invariant: dispatch
returns while the child is still gated (active_count == 1, completion
queue empty), which a synchronous impl could not do. Keep only a loose
4s sanity backstop well under the runner's 5s gate.
* fix(delegation): harden async background delegation
Follow-up review fixes:
- Detach background child from parent._active_children at dispatch —
otherwise parent-turn interrupts (Ctrl+C, mid-turn steering), cache
evicts (release_clients), and session close (/new) kill/close the
detached subagent mid-run, defeating the point of background mode.
Lifecycle is owned by the async registry's interrupt_fn.
- Make the capacity check atomic with the record insert (TOCTOU: two
concurrent dispatches could both pass active_count() and exceed the cap).
- TUI dedup: key async_delegation events by delegation_id — the
fallthrough keyed them all as ("", type), suppressing every completion
after the first in the desktop/TUI status feed.
- CLI /stop now interrupts running background delegations and /agents
lists them (they live outside the process registry and were invisible).
- Drop stray unbalanced ']' line from the re-injection block and the
unused _ASYNC_DEFAULT import.
Tests: detach-at-dispatch + concurrent-capacity race added (15 total in
test_async_delegation.py); 137 delegate + 140 process-registry/notify/watch
+ 7 TUI dedup tests pass.
* fix(delegation): harden async background completion drains
`terminal.backend` in config.yaml is bridged to the TERMINAL_ENV env var,
but a TERMINAL_ENV set in .env / the shell overrides config and is what
terminal_tool actually uses. The dump printed only the config value, so a
user whose agent was jailed in a docker/podman sandbox via a stale
TERMINAL_ENV still saw `terminal: local` — hiding the real cause. Report
the effective backend and flag when TERMINAL_ENV overrides config.yaml.
When a user-defined provider (e.g. litellm-proxy) and an aggregator
(e.g. openrouter) both advertise the same model name, the Desktop/TUI
model picker would show the model under both groups. Selecting it from
the aggregator row silently set model.provider to the aggregator,
breaking calls because the aggregator doesn't actually serve that model
ID.
Fix: after list_authenticated_providers() returns, collect all models
from user-defined provider rows and filter them out of aggregator rows.
Uses is_aggregator() from hermes_cli/providers.py to identify
aggregators. Case-insensitive matching.
Fixes#45954
NVIDIA NIM API uses vendor-prefixed model IDs (e.g. qwen/qwen3.5-122b-a10b,
nvidia/nemotron-3-super-120b-a12b). The doctor command incorrectly warns that
vendor-prefixed slugs belong to aggregators like openrouter when nvidia is
the configured provider.
Add 'nvidia' to the providers_accepting_vendor_slugs set so doctor no longer
raises false-positive warnings for valid NVIDIA NIM configurations.
Fixes#35425
PROBLEM: The old public /status PR drifted out of the current Amy patch stack, leaving /status without the model/provider, context window, or explicit cumulative token label that Wolfram uses to monitor context pressure from chat.
SOLUTION: Re-port the feature onto the current gateway status handler. Prefer live/cached agent runtime metadata, fall back to SessionDB + SessionStore state between turns, add localized status model/context lines, and keep token totals explicitly labeled cumulative.
Verification: tests/gateway/test_status_command.py, tests/hermes_cli/test_commands.py
Upgrade the Vite/esbuild surfaces that kept web, ui-tui, and the bootstrap installer on vulnerable esbuild versions, regenerate the root lockfile, and preserve intentional package+lock dependency edits during update lockfile cleanup.
Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.
The salvaged read-side fix lets a profile resolve the xAI OAuth grant from
the global-root auth store when it has no own providers.xai-oauth block.
But _save_xai_oauth_tokens still wrote rotated tokens only to the active
profile store. Because xAI rotates the refresh_token on every refresh, a
profile that reads root's grant and refreshes it left root holding a now-
revoked refresh token — killing every other profile reading the stale root
grant with invalid_grant once its access token expired (#43589).
Detect the read-from-root case (profile lacks its own providers.xai-oauth
block) and, after the profile save, write the rotated chain back to the
global root too via a best-effort, TOCTOU-safe write-through that reuses
_save_auth_store with an explicit target path. A profile that genuinely
shadows root (has its own block) is left untouched, classic mode is a
no-op, and a failed root write never breaks the profile's own save.
Pairs with the read fallback in the preceding commit so the cross-profile
xAI grant stays coherent in both directions.
* fix(docker): skip per-profile gateway reconciliation in dashboard container
When gateway and dashboard containers share a bind-mounted HERMES_HOME,
both run the cont-init.d profile reconciliation script, which creates
s6-log processes for every persisted profile. These s6-log processes
in different containers race to flock() the same log-directory lock
files under logs/gateways/<profile>/lock, producing repeated
"s6-log: fatal: unable to lock ... Resource busy" errors and a
supervision restart storm.
Add HERMES_SKIP_PROFILE_RECONCILE env var support to container_boot.py
and set it in the official docker-compose.yml dashboard service so the
dashboard container no longer creates per-profile gateway s6 services
it never uses.
* chore(release): map salvaged contributor
* refactor(docker): autodetect dashboard container instead of env-var gate
Replace the HERMES_SKIP_PROFILE_RECONCILE env var with PID 1 argv role
detection. A dashboard-only container never spawns or supervises
per-profile gateways, so the reconcile boot hook now skips itself when
/proc/1/cmdline is the dashboard command — no operator flag to set (or
forget in a hand-written manifest, which would reintroduce the s6-log
flock storm this prevents).
- Extract _strip_container_argv_prefix() shared by the legacy-gateway
and new dashboard detectors (DRY the init/wrapper/hermes peel).
- Add _is_dashboard_container(); gate reconcile main() on it.
- Drop HERMES_SKIP_PROFILE_RECONCILE from code + docker-compose.yml.
- Tests: argv matrix for both roles + main()-level skip/reconcile proof
and a regression that the removed env var is now inert.
Co-authored-by: 895252509 <895252509@qq.com>
---------
Co-authored-by: zhouxiang <895252509@qq.com>
Co-authored-by: Ben <ben@nousresearch.com>
The supervised `gateway-default` s6 slot runs bare `hermes gateway run`
(no -p) to mean "the root HERMES_HOME profile". But `_apply_profile_override`
falls through its #22502 HERMES_HOME guard for the container root
(/opt/data, whose parent is not `profiles`) and reads the sticky
`active_profile` file. If the user set another profile active (e.g. via
the dashboard), the reserved default gateway gets redirected into that
profile — producing a duplicate gateway for the active profile and no
real default gateway. The profile page and `gateway status` then
correctly report default as "not running" because there genuinely isn't
one.
Guard step 2 (the sticky active_profile fallback) with the existing
HERMES_S6_SUPERVISED_CHILD sentinel that the container run-script already
exports. Supervised named-profile slots pass -p explicitly (step 1, never
reaches step 2); only the bare default slot was affected. Inert outside
the s6 container — the sentinel is never set elsewhere.
Reported in the 'Docker & Profiles & Dashboard' support thread.
The unified machine-dashboard reroute (cmd_dashboard) re-execs a named-profile
dashboard launch as the machine dashboard and dropped HERMES_HOME from the
child env with the comment "so the child binds the machine root". That holds
for a standard install (root == ~/.hermes) but breaks the Docker layout: the
published image sets `ENV HERMES_HOME=/opt/data`, so once HERMES_HOME is unset
the child falls back to $HOME/.hermes = /opt/data/.hermes — an empty,
auto-seeded home.
Two user-visible symptoms, one root cause (reported via support):
1. Dashboard Profiles page shows only an empty `default` — the real
default/oracle/saga profiles live under /opt/data/profiles, but the
rerouted child resolves _get_profiles_root() to /opt/data/.hermes/profiles.
2. The "Update Hermes" button runs `hermes update` inside the container
repeatedly instead of bailing with the docker-update guidance. The Docker
guard keys off detect_install_method(), which reads
$HERMES_HOME/.install_method; the image stamps that at /opt/data, but the
misresolved home has no stamp, no HERMES_MANAGED, and no .git → falls
through to "pip", so the guard never fires.
The reporter's workaround was to bind-mount the host dir at both /opt/data and
/opt/data/.hermes so the two paths converge (at the cost of a self-referential
recursion).
Fix: resolve the machine root explicitly with get_default_hermes_root() and set
it on the child env instead of popping HERMES_HOME. That helper returns the
root for both layouts — ~/.hermes for a standard install, and /opt/data for
Docker (it strips a trailing profiles/<name>). Falls back to the old pop
behaviour only if root resolution raises, so the reroute is never blocked.
Regression tests in test_dashboard_unified_launch.py: the existing standard-
install test now asserts the child carries HERMES_HOME == get_default_hermes_root()
(not absent), and a new test_reexec_pins_docker_machine_root covers the Docker
layout (HERMES_HOME=/opt/data/profiles/oracle → child gets /opt/data). Both
fail against the pre-fix pop behaviour (mutation-verified).
* fix: persist s6 gateway desired state
* chore(release): map salvaged contributor
---------
Co-authored-by: Alfred Smith <alfred@my-cloud.me>
Co-authored-by: Ben <ben@nousresearch.com>