Commit graph

11747 commits

Author SHA1 Message Date
xxxigm
45e2f4fdcd nix: refresh npmDepsHash for the @assistant-ui/store pin
The store pin changed package-lock.json, so the workspace-wide
npmDepsHash in nix/lib.nix is stale and the Nix flake check fails on
the hash mismatch. Use the hash reported by the real fetchNpmDeps
build (the flake check's `got:`), which is authoritative — it differs
from prefetch-npm-deps' lockfile-contents hash, exactly the divergence
nix/lib.nix already documents.
2026-06-15 11:55:02 -04:00
xxxigm
30377e108c ci(desktop): build the renderer on PRs so vite breaks fail in CI
The desktop build break shipped because nothing in CI runs the
apps/desktop production build. typecheck only runs `tsc`, which does
not exercise Vite/Rolldown module resolution, so an unresolvable
package export (the @assistant-ui/tap "./react-shim" split) sailed
through green checks and only failed when users built from source on
install/update.

Add a desktop-build job that runs `npm run build` (tsc -b + vite build
+ assert-dist-built) for apps/desktop. This closes the gap so the same
class of break fails in CI instead of on every user's machine.
2026-06-15 11:55:02 -04:00
xxxigm
f02484feba test(deps): guard @assistant-ui cluster on one tap version
Lockfile invariant that would have caught the desktop build break: the
single hoisted @assistant-ui/tap must satisfy every @assistant-ui/*
package's declared tap requirement (deps or non-optional peer). It is a
contract, not a snapshot -- no hardcoded versions -- so it stays green
across routine bumps but fails the moment the cluster splits its tap
requirement again.
2026-06-15 11:55:02 -04:00
xxxigm
eae3836eb6 fix(desktop): pin @assistant-ui/store so the cluster shares one tap
The desktop app is built from source on every install/update
(install.ps1 -> npm ci/install -> tsc -b && vite build). The
@assistant-ui packages share an internal reactivity lib,
@assistant-ui/tap, and only interoperate when they all resolve the
SAME tap version.

@assistant-ui/react@0.12.28 and @assistant-ui/core pin tap@^0.5.x
(which exports only "." and "./react"), but the caret range
react -> store@^0.2.9 floated store up to 0.2.18, which bumped its
tap peer to ^0.9.0 and began importing "@assistant-ui/tap/react-shim"
-- an entry point that only exists in the tap 0.9.x line. With the
hoisted tap stuck on 0.5.x, vite build crashed:

    "./react-shim" is not exported ... from package @assistant-ui/tap

i.e. the opaque "apps/desktop build failed (exit 1)" everyone hit when
updating today.

Pin @assistant-ui/store via root overrides to 0.2.13 -- the last
release that targets tap@^0.5.x -- so react/core/store all agree on the
hoisted tap@0.5.14 again. Verified: tsc -b and vite build both pass.
2026-06-15 11:55:02 -04:00
Teknium
3e7e9b24d4 fix: harden salvaged session and browser improvements
Polish salvaged contributor work before PR review:
- read browser inactivity timeout from config with documented fallback
- skip redundant v10 trigram backfill before v11 FTS rebuild
- show delegate_task goals safely in progress previews
- show gateway status model/context without redundant token wording
- wire gateway /sessions to shared session-listing helpers
- map Ravenwolf author emails for release attribution

Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de>
Co-authored-by: Amy Ravenwolf <amy@ravenwolf.de>
2026-06-15 07:46:34 -07:00
Wolfram Ravenwolf
ead38107a2 feat(status): restore model and context in gateway status
PROBLEM: The old public /status PR drifted out of the current Amy patch stack, leaving /status without the model/provider, context window, or explicit cumulative token label that Wolfram uses to monitor context pressure from chat.

SOLUTION: Re-port the feature onto the current gateway status handler. Prefer live/cached agent runtime metadata, fall back to SessionDB + SessionStore state between turns, add localized status model/context lines, and keep token totals explicitly labeled cumulative.

Verification: tests/gateway/test_status_command.py, tests/hermes_cli/test_commands.py
2026-06-15 07:46:34 -07:00
Amy Ravenwolf
5035fa9029 feat(display): show delegate_task goals in tool progress notifications
Previously, delegate_task in batch mode only showed '3 parallel tasks'
without revealing what the tasks actually are. Single-task mode showed
the goal via the primary_args fallback, but batch mode had no goal
extraction.

Changes:
- build_tool_preview(): Add dedicated delegate_task handler that
  extracts individual task goals from both single and batch modes.
  Batch shows '3 tasks: Goal A | Goal B | Goal C'.
- _get_cute_tool_message_impl(): Show individual goals in CLI cute
  messages for batch delegate calls ('3x: Goal A | Goal B').
- Add 4 tests covering single goal, batch goals, missing goals,
  and no-goal edge case.
2026-06-15 07:46:34 -07:00
Wolfram Ravenwolf
5b2604df99 fix(state): skip redundant trigram backfill before v11 FTS rebuild 2026-06-15 07:46:34 -07:00
Amy Ravenwolf
2f2e3616b4 fix(config): read browser inactivity timeout from config 2026-06-15 07:46:34 -07:00
xxxigm
bee13817f0 test(desktop): cover $connection resync on profile switch
Asserts ensureGatewayProfile keeps $connection in lockstep with the active
profile's backend: activating a remote pool profile flips mode to remote,
returning to default resyncs to local, a failed descriptor fetch leaves the
prior connection intact, and a same-profile activation doesn't churn it.
Regression coverage for #46651.
2026-06-15 07:11:02 -07:00
xxxigm
fbabf438a1 fix(desktop): sync $connection on profile switch so remote profiles attach images as bytes
The renderer's $connection seeds from the PRIMARY (window) backend at boot and
otherwise only refreshes on a sleep/wake reconnect. Activating a background
profile (ensureGatewayProfile) pointed the live gateway + REST at that profile's
backend but never updated $connection, so its `mode` stayed stuck on the
primary. With a local primary and a remote pool profile active, every code path
that branches on local-vs-remote misfired: image attachments went out via the
path-based `image.attach` instead of `image.attach_bytes`, handing the remote
gateway a client-only Windows path it can't resolve ("image not found: C:\..."),
and the /api/fs/* file browser and /api/media fetches targeted the wrong
machine.

Resync $connection from the now-active profile's descriptor right after the
gateway swap, so the remote-aware paths follow the live backend. Best-effort: a
failed descriptor fetch leaves the prior connection intact for boot/reconnect to
resync. Single-profile users are unaffected (the same-profile fast path never
runs the swap).

Fixes #46651
2026-06-15 07:11:02 -07:00
Teknium
49e743985a fix: route minimax m3 reasoning controls through profile
Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes.

Co-authored-by: goku94123 <gooku94123@gmail.com>
2026-06-15 07:08:43 -07:00
goku94123
ba3883cd18 fix(minimax): enable reasoning extra_body for api.minimax.io 2026-06-15 07:08:43 -07:00
Teknium
be7c919bf9
fix(process): label background completion causes (#46659)
Track why a background process finished and include that source in notify-on-complete messages so SIGTERM from process.kill, kill_all, backend loss, and ordinary exits are distinguishable.
2026-06-15 07:08:24 -07:00
Teknium
733472952a fix: complete cron jobs lock salvage
Route curator rollback through the same cross-process cron job lock, make save_jobs lock for legacy direct callers without deadlocking nested mutation paths, and harden the regression test so a second _jobs_lock caller really blocks across processes.
2026-06-15 06:29:00 -07:00
CiarasClaws
e5b4cf7bea fix(cron): make jobs.json writes safe across processes
`hermes cron pause`/`resume`/`remove` run in their own CLI process (CLI →
cronjob tool → pause_job → update_job → save_jobs), entirely separate from
the gateway process that also writes jobs.json (mark_job_run, advance_next_run,
due-fast-forward in get_due_jobs). The only synchronization was a module-level
`threading.Lock`, which serializes writers *within a single process* but does
nothing across processes — and update_job/pause_job/remove_job/create_job did
not even take it.

The result is a classic lost update: a `cron pause` issued while the gateway is
live loads jobs.json, sets enabled=False, and saves; concurrently the gateway
loads the same file and saves back its run-bookkeeping, clobbering the pause.
The CLI prints "Paused" (it succeeded against its own in-memory copy) but the
job stays enabled and keeps firing, with no error surfaced. The scheduler's
`.tick.lock` flock can't be reused for this — it is held for the entire tick,
including multi-minute agent runs, so a CLI mutation would block for minutes.

Add `_jobs_lock()`: a short-held cross-process advisory file lock (fcntl/msvcrt
flock on `<hermes_home>/cron/.jobs.lock`) layered over the existing in-process
lock, and wrap every load→modify→save critical section with it — create_job,
update_job, remove_job, mark_job_run, advance_next_run, get_due_jobs,
rewrite_skill_refs. The lock degrades to in-process-only if neither fcntl nor
msvcrt is available, preserving prior behaviour. All critical sections are short
(field edits, no agent execution), so contention resolves in milliseconds.

Adds a regression test that proves the lock excludes a second process (an
in-process threading.Lock cannot).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 06:29:00 -07:00
Teknium
29c6985590 fix(nix): refresh npm deps hash 2026-06-15 06:18:27 -07:00
FT_IOxCS
92a456f711 fix(cli,deps): clear esbuild audit loop
Upgrade the Vite/esbuild surfaces that kept web, ui-tui, and the bootstrap installer on vulnerable esbuild versions, regenerate the root lockfile, and preserve intentional package+lock dependency edits during update lockfile cleanup.
2026-06-15 06:18:27 -07:00
Teknium
975b9f0a54
docs: recommend standard installer for development (#46646) 2026-06-15 06:14:57 -07:00
Teknium
0d82060c74 fix: harden WhatsApp target alias salvage
Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.
2026-06-15 05:51:47 -07:00
Keiron McCammon
ea49a79633 fix(messaging): route WhatsApp group JIDs to the target, not the home DM
send_message(target="whatsapp:<group-jid>") silently delivered to the
configured home DM instead of the requested group. Two gaps:

1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us),
   user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and
   broadcast/newsletter JIDs matched no pattern and fell through to
   `return None, None, False`, so the caller treated them as
   unresolvable and used the home channel. The bridge's /send endpoint
   accepts any chatId, so only the tool-side target parsing was at fault.
   Add a whatsapp branch that recognizes native JIDs as explicit targets.
   The pre-existing '+'-prefixed E.164 path is preserved.

2. WhatsApp groups have no human-friendly name — the channel directory
   is regenerated from session data on a timer, so a group shows up as
   its raw 18-digit JID and any hand-edit to channel_directory.json is
   clobbered on the next rebuild. Add a user-maintained alias overlay
   (~/.hermes/channel_aliases.json) re-applied on every build AND every
   load, giving durable friendly names and letting a freshly-created
   group be pre-named before its first message.

Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser;
TestChannelAliases (7 cases) for the overlay, plus an autouse fixture
isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the
existing directory tests.
2026-06-15 05:51:47 -07:00
Teknium
c17469cb19 chore: map Veritas-7 release attribution
Add the contributor noreply email used by the salvaged xAI OAuth refresh-skew commit so release notes credit the original author.
2026-06-15 05:40:23 -07:00
Veritas-7
febdddb41a fix(auth): refresh xAI OAuth tokens earlier 2026-06-15 05:40:23 -07:00
Teknium
aab2e99bae test: cover request debug dump redaction
Keep request dump writes on the shared atomic JSON path, add regression coverage for request body/error/stdout redaction, and map the salvaged contributor email for release attribution.
2026-06-15 05:31:21 -07:00
xtymac
ad58dd51ac redact secrets in API request debug dumps
dump_api_request_debug() masks the provider Authorization header but writes
the request `body` (system prompt, tool defs, context-embedded values) and the
error message raw via atomic_json_write. This path also fires unconditionally
on API errors (not only under HERMES_DUMP_REQUESTS), so any secret surfaced
into context (e.g. an integration token) lands in cleartext at
request_dump_*.json on every failed call.

Run the serialized dump through the existing redact_sensitive_text() scrubber
(already used for logs/tool output) before persisting and before the
HERMES_DUMP_REQUEST_STDOUT print; preserve atomicity via temp-file +
Path.replace. Also add the Notion internal-integration prefix (ntn_) to
_PREFIX_PATTERNS so bare values are caught.

Per SECURITY.md §3.2 this is a redaction (in-process heuristic) hardening, not
a §3.1 vulnerability. Refs #46583.
2026-06-15 05:31:21 -07:00
Teknium
a688d2a1bd test: assert disk cleanup prunes protected walks 2026-06-15 05:25:27 -07:00
墨綠BG
40699c3292 🐛 fix(disk-cleanup): avoid brittle sweep review issues 2026-06-15 05:25:27 -07:00
墨綠BG
c1a70a5439 🐛 fix(disk-cleanup): prune protected cleanup walks 2026-06-15 05:25:27 -07:00
liuhao1024
2cddc9c895 fix(bedrock): check boto3 version >= 1.34.59 before using converse_stream
converse() and converse_stream() were added in boto3 1.34.59. When Hermes
is installed editable into system Python (e.g. Ubuntu 24.04 ships 1.34.46),
the system boto3 takes precedence and calls to converse_stream fail with
AttributeError. Add an early version check in _require_boto3() that raises
a clear RuntimeError with upgrade instructions.
2026-06-15 05:25:17 -07:00
Teknium
f79b109f4f chore: map 0xneobyte release author 2026-06-15 05:25:07 -07:00
Tharushka Dinujaya
ec05d2bc3e fix(gateway): evict scoped lock when PID+start_time match but process is not a gateway
On Linux, systemd spawns core services (cron, nginx, sshd) with
deterministic PIDs and jiffy start_times across reboots. A service can
land on the exact same PID and start_time as a previous gateway, causing
acquire_scoped_lock to mistake it for a live gateway and block startup.

The existing stale-detection paths only covered:
  - start_times both non-None and different (clear mismatch)
  - start_times both None (macOS/Windows fallback to cmdline check)

The boot-time collision falls through both: times are non-None and
equal, so neither branch fired.

Add a third check: when both start_times are known and match but the
live process fails _looks_like_gateway_process, read its cmdline. If
the cmdline is readable (non-None), we have positive evidence of an
impostor and mark the lock stale. Requiring a readable cmdline keeps the
check conservative — if cmdline is unreadable we do not evict.
2026-06-15 05:25:07 -07:00
Nicolò Boschi
a376ca0081 feat(hindsight): make observation scopes configurable on retain
Adds an observation_scopes config key (and HINDSIGHT_RETAIN_OBSERVATION_SCOPES
env var) so retained memories can opt into per_tag / all_combinations /
custom scoping instead of Hindsight's default combined pass.

Threaded through _build_retain_kwargs so all three retain paths honor it:
auto-retain and flush-on-switch already use aretain_batch; the tool retain
path is switched from aretain to aretain_batch (functionally equivalent,
aretain just wraps a single-item batch) since aretain doesn't accept the
observation_scopes parameter.
2026-06-15 04:59:17 -07:00
kshitij
8844e091c1
Merge pull request #46614 from kshitijk4poor/salvage/xai-oauth-profile-writethrough
fix(auth): resolve xAI OAuth credentials across profiles + write rotated tokens back to root
2026-06-15 17:16:19 +05:30
kshitijk4poor
1227007aed chore: map capt-marbles contributor email for attribution
Salvaged commit in this PR is authored by capt-marbles
(andrewdmwalker@gmail.com), a bare gmail that does not auto-resolve in
the check-attribution job. Add the AUTHOR_MAP entry.
2026-06-15 17:09:27 +05:30
kshitijk4poor
497352bc4e fix(auth): write rotated xAI OAuth tokens back to global root (#43589)
The salvaged read-side fix lets a profile resolve the xAI OAuth grant from
the global-root auth store when it has no own providers.xai-oauth block.
But _save_xai_oauth_tokens still wrote rotated tokens only to the active
profile store. Because xAI rotates the refresh_token on every refresh, a
profile that reads root's grant and refreshes it left root holding a now-
revoked refresh token — killing every other profile reading the stale root
grant with invalid_grant once its access token expired (#43589).

Detect the read-from-root case (profile lacks its own providers.xai-oauth
block) and, after the profile save, write the rotated chain back to the
global root too via a best-effort, TOCTOU-safe write-through that reuses
_save_auth_store with an explicit target path. A profile that genuinely
shadows root (has its own block) is left untouched, classic mode is a
no-op, and a failed root write never breaks the profile's own save.

Pairs with the read fallback in the preceding commit so the cross-profile
xAI grant stays coherent in both directions.
2026-06-15 17:08:19 +05:30
Andrew Walker
f1d6f04362 fix(auth): resolve xAI OAuth credentials across profiles
(cherry picked from commit 8d8b9f50e4)
2026-06-15 17:03:35 +05:30
helix4u
dcc3216955 fix(mcp): fail fast for noninteractive oauth without tokens 2026-06-15 04:22:07 -07:00
Teknium
aca11c227e
fix(docker): skip gateway reconciliation in dashboard container (autodetect) (#46293)
* fix(docker): skip per-profile gateway reconciliation in dashboard container

When gateway and dashboard containers share a bind-mounted HERMES_HOME,
both run the cont-init.d profile reconciliation script, which creates
s6-log processes for every persisted profile.  These s6-log processes
in different containers race to flock() the same log-directory lock
files under logs/gateways/<profile>/lock, producing repeated
"s6-log: fatal: unable to lock ... Resource busy" errors and a
supervision restart storm.

Add HERMES_SKIP_PROFILE_RECONCILE env var support to container_boot.py
and set it in the official docker-compose.yml dashboard service so the
dashboard container no longer creates per-profile gateway s6 services
it never uses.

* chore(release): map salvaged contributor

* refactor(docker): autodetect dashboard container instead of env-var gate

Replace the HERMES_SKIP_PROFILE_RECONCILE env var with PID 1 argv role
detection. A dashboard-only container never spawns or supervises
per-profile gateways, so the reconcile boot hook now skips itself when
/proc/1/cmdline is the dashboard command — no operator flag to set (or
forget in a hand-written manifest, which would reintroduce the s6-log
flock storm this prevents).

- Extract _strip_container_argv_prefix() shared by the legacy-gateway
  and new dashboard detectors (DRY the init/wrapper/hermes peel).
- Add _is_dashboard_container(); gate reconcile main() on it.
- Drop HERMES_SKIP_PROFILE_RECONCILE from code + docker-compose.yml.
- Tests: argv matrix for both roles + main()-level skip/reconcile proof
  and a regression that the removed env var is now inert.

Co-authored-by: 895252509 <895252509@qq.com>

---------

Co-authored-by: zhouxiang <895252509@qq.com>
Co-authored-by: Ben <ben@nousresearch.com>
2026-06-15 20:51:48 +10:00
kshitij
6cb88a0874
Merge pull request #46552 from kshitijk4poor/salvage/file-tools-session-cwd
Some checks failed
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker / shell lint / Lint Dockerfile (hadolint) (push) Waiting to run
Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
OSV-Scanner / Scan lockfiles (push) Waiting to run
Tests / test (1) (push) Waiting to run
Tests / test (2) (push) Waiting to run
Tests / test (3) (push) Waiting to run
Tests / test (4) (push) Waiting to run
Tests / test (5) (push) Waiting to run
Tests / test (6) (push) Waiting to run
Tests / save-durations (push) Blocked by required conditions
Tests / e2e (push) Waiting to run
Typecheck / typecheck (apps/bootstrap-installer) (push) Waiting to run
Typecheck / typecheck (apps/desktop) (push) Waiting to run
Typecheck / typecheck (apps/shared) (push) Waiting to run
Typecheck / typecheck (ui-tui) (push) Waiting to run
Typecheck / typecheck (web) (push) Waiting to run
uv.lock check / uv lock --check (push) Waiting to run
Nix / nix (macos-latest) (push) Has been cancelled
Nix / nix (ubuntu-latest) (push) Has been cancelled
fix(tools): respect session cwd in file tools (salvage of #46460)
2026-06-15 14:13:15 +05:30
kshitijk4poor
8fce54499f refactor(tools): extract shared sentinel-free abs cwd validator
_configured_terminal_cwd and _registered_task_cwd_override carried a
byte-identical sentinel + expanduser + isabs validation tail. Extract it
into _sentinel_free_abs_cwd(raw) so the relative/sentinel rejection rule
lives in one place. Behaviour unchanged (the str() coercion the override
path relied on is preserved in the helper).
2026-06-15 14:03:41 +05:30
kshitijk4poor
b0c99c12dd docs(tools): document registered-cwd step in resolver docstrings
The session-cwd fix inserted a registered task/session cwd override step
between the live-cwd and $TERMINAL_CWD fallbacks, but three docstrings still
described the old two-step order — _resolve_base_dir's numbered list was
outright wrong. Update _authoritative_workspace_root, _resolve_base_dir, and
_path_resolution_warning to reflect the actual four-step resolution order.
No behaviour change.
2026-06-15 14:02:54 +05:30
kshitijk4poor
ddf7c7af81 refactor(tools): consolidate task-override lookup into one helper
The raw-key-first-then-collapsed override lookup was hand-rolled in three
places with subtly different spellings: terminal_tool's command setup, and
both file_tools._registered_task_cwd_override and _get_file_ops. Since that
exact raw-vs-collapsed invariant is what the session-cwd fix depends on,
keeping three copies invites the drift that caused the original bug.

Add terminal_tool.resolve_task_overrides(task_id) as the single source and
route all three sites through it. Behaviour is unchanged (verified
byte-equivalent across raw/collapsed/isolation/None/subagent inputs).
2026-06-15 14:02:17 +05:30
Gille
d6a8d9dcab fix(tools): respect session cwd in file tools 2026-06-15 14:00:42 +05:30
Ben Barclay
95715dcb03
fix(s6): reserved default gateway must not follow sticky active_profile (#46483)
The supervised `gateway-default` s6 slot runs bare `hermes gateway run`
(no -p) to mean "the root HERMES_HOME profile". But `_apply_profile_override`
falls through its #22502 HERMES_HOME guard for the container root
(/opt/data, whose parent is not `profiles`) and reads the sticky
`active_profile` file. If the user set another profile active (e.g. via
the dashboard), the reserved default gateway gets redirected into that
profile — producing a duplicate gateway for the active profile and no
real default gateway. The profile page and `gateway status` then
correctly report default as "not running" because there genuinely isn't
one.

Guard step 2 (the sticky active_profile fallback) with the existing
HERMES_S6_SUPERVISED_CHILD sentinel that the container run-script already
exports. Supervised named-profile slots pass -p explicitly (step 1, never
reaches step 2); only the bare default slot was affected. Inert outside
the s6 container — the sentinel is never set elsewhere.

Reported in the 'Docker & Profiles & Dashboard' support thread.
2026-06-15 05:36:20 +00:00
Ben Barclay
80f8ffc74c
fix(dashboard): pin machine-dashboard reroute to the machine root, not $HOME/.hermes (#46487)
The unified machine-dashboard reroute (cmd_dashboard) re-execs a named-profile
dashboard launch as the machine dashboard and dropped HERMES_HOME from the
child env with the comment "so the child binds the machine root". That holds
for a standard install (root == ~/.hermes) but breaks the Docker layout: the
published image sets `ENV HERMES_HOME=/opt/data`, so once HERMES_HOME is unset
the child falls back to $HOME/.hermes = /opt/data/.hermes — an empty,
auto-seeded home.

Two user-visible symptoms, one root cause (reported via support):

1. Dashboard Profiles page shows only an empty `default` — the real
   default/oracle/saga profiles live under /opt/data/profiles, but the
   rerouted child resolves _get_profiles_root() to /opt/data/.hermes/profiles.

2. The "Update Hermes" button runs `hermes update` inside the container
   repeatedly instead of bailing with the docker-update guidance. The Docker
   guard keys off detect_install_method(), which reads
   $HERMES_HOME/.install_method; the image stamps that at /opt/data, but the
   misresolved home has no stamp, no HERMES_MANAGED, and no .git → falls
   through to "pip", so the guard never fires.

The reporter's workaround was to bind-mount the host dir at both /opt/data and
/opt/data/.hermes so the two paths converge (at the cost of a self-referential
recursion).

Fix: resolve the machine root explicitly with get_default_hermes_root() and set
it on the child env instead of popping HERMES_HOME. That helper returns the
root for both layouts — ~/.hermes for a standard install, and /opt/data for
Docker (it strips a trailing profiles/<name>). Falls back to the old pop
behaviour only if root resolution raises, so the reroute is never blocked.

Regression tests in test_dashboard_unified_launch.py: the existing standard-
install test now asserts the child carries HERMES_HOME == get_default_hermes_root()
(not absent), and a new test_reexec_pins_docker_machine_root covers the Docker
layout (HERMES_HOME=/opt/data/profiles/oracle → child gets /opt/data). Both
fail against the pre-fix pop behaviour (mutation-verified).
2026-06-15 15:33:15 +10:00
Teknium
c2b7669ad3
fix(s6): clear stale log lock before startup (#46289)
* fix(cli): clear stale s6-log lock file before startup on virtiofs

* chore(release): map salvaged contributor

---------

Co-authored-by: zxcasongs <35259607+zxcasongs@users.noreply.github.com>
Co-authored-by: Ben <ben@nousresearch.com>
2026-06-15 14:10:51 +10:00
Teknium
b770967263
fix(s6): persist profile gateway desired state (#46292)
* fix: persist s6 gateway desired state

* chore(release): map salvaged contributor

---------

Co-authored-by: Alfred Smith <alfred@my-cloud.me>
Co-authored-by: Ben <ben@nousresearch.com>
2026-06-15 14:02:10 +10:00
Teknium
61ee2dbfdb
fix(s6): make profile gateway log parent writable (#46291)
* fix(gateway): chown logs/gateways parent so late-added profiles can log

The per-profile log service script created $HERMES_HOME/logs/gateways/
via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When
the first log service boots in root context, the gateways/ parent stays
root:root; every profile registered later runs its log service as the
dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a
sub-second fatal crash-loop flooding the container log. The stage2
recursive heal does not catch it either: it is gated on needs_chown,
which is false when the top-level $HERMES_HOME is already hermes-owned.

Two complementary fixes:

- service_manager._render_log_run: chown the gateways/ parent
  (non-recursively) before the leaf chown. Runs on every root-context
  boot, so it also heals volumes already poisoned by older images.
- docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p
  block; cont-init runs before any service starts, so the parent
  already exists hermes-owned when the first log/run does 'mkdir -p'.

The needs_chown repair loop needs no twin entry: it already chowns
logs/ recursively, which covers logs/gateways.

Fixes #45258

* chore(release): map salvaged contributor

---------

Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>
2026-06-15 13:47:05 +10:00
leo4226
f795513782
fix(windows): kill hermes before recreating venv to release _bcrypt.pyd lock (#45120)
On Windows, native Python extensions such as _bcrypt.pyd are loaded as
DLLs by any running hermes process. When the installer tries to recreate
the venv (Remove-Item -Recurse -Force "venv"), Windows denies the delete
because the DLL is still mapped into the running process.

Add a taskkill /F /T /IM hermes.exe call before the Remove-Item so any
hermes process tree is stopped first, releasing the file lock. A short
sleep gives the OS time to unload the image before deletion proceeds.

This mirrors the existing force_kill_other_hermes() guard already present
in the --update flow (update.rs), applying the same pattern to the full
reinstall/repair path through install.ps1.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 02:27:00 +00:00
Alli
8fe334b056
fix(desktop): inset hover-reveal trigger past the adjacent scrollbar (#44159)
The collapsed-pane hover-reveal trigger strip (14px wide, 6px edge
gutter) overlapped the neighboring scroller's 8px .scrollbar-dt
scrollbar, which sits flush with the window edge when the rail panes
are collapsed. Hovering the scrollbar revealed the file browser over
it, and clicks on the overlapped band hit the trigger instead of the
scrollbar thumb.

Widen the edge gutter to calc(0.5rem + 2px) so the strip clears the
scrollbar (rem-coupled to the .scrollbar-dt width) while still
covering the OS window-resize grab area inset.

Part of #44140 (item 2).

Co-authored-by: AIalliAI <285906080+AIalliAI@users.noreply.github.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 21:23:38 -05:00