The store pin changed package-lock.json, so the workspace-wide
npmDepsHash in nix/lib.nix is stale and the Nix flake check fails on
the hash mismatch. Use the hash reported by the real fetchNpmDeps
build (the flake check's `got:`), which is authoritative — it differs
from prefetch-npm-deps' lockfile-contents hash, exactly the divergence
nix/lib.nix already documents.
The desktop build break shipped because nothing in CI runs the
apps/desktop production build. typecheck only runs `tsc`, which does
not exercise Vite/Rolldown module resolution, so an unresolvable
package export (the @assistant-ui/tap "./react-shim" split) sailed
through green checks and only failed when users built from source on
install/update.
Add a desktop-build job that runs `npm run build` (tsc -b + vite build
+ assert-dist-built) for apps/desktop. This closes the gap so the same
class of break fails in CI instead of on every user's machine.
Lockfile invariant that would have caught the desktop build break: the
single hoisted @assistant-ui/tap must satisfy every @assistant-ui/*
package's declared tap requirement (deps or non-optional peer). It is a
contract, not a snapshot -- no hardcoded versions -- so it stays green
across routine bumps but fails the moment the cluster splits its tap
requirement again.
The desktop app is built from source on every install/update
(install.ps1 -> npm ci/install -> tsc -b && vite build). The
@assistant-ui packages share an internal reactivity lib,
@assistant-ui/tap, and only interoperate when they all resolve the
SAME tap version.
@assistant-ui/react@0.12.28 and @assistant-ui/core pin tap@^0.5.x
(which exports only "." and "./react"), but the caret range
react -> store@^0.2.9 floated store up to 0.2.18, which bumped its
tap peer to ^0.9.0 and began importing "@assistant-ui/tap/react-shim"
-- an entry point that only exists in the tap 0.9.x line. With the
hoisted tap stuck on 0.5.x, vite build crashed:
"./react-shim" is not exported ... from package @assistant-ui/tap
i.e. the opaque "apps/desktop build failed (exit 1)" everyone hit when
updating today.
Pin @assistant-ui/store via root overrides to 0.2.13 -- the last
release that targets tap@^0.5.x -- so react/core/store all agree on the
hoisted tap@0.5.14 again. Verified: tsc -b and vite build both pass.
PROBLEM: The old public /status PR drifted out of the current Amy patch stack, leaving /status without the model/provider, context window, or explicit cumulative token label that Wolfram uses to monitor context pressure from chat.
SOLUTION: Re-port the feature onto the current gateway status handler. Prefer live/cached agent runtime metadata, fall back to SessionDB + SessionStore state between turns, add localized status model/context lines, and keep token totals explicitly labeled cumulative.
Verification: tests/gateway/test_status_command.py, tests/hermes_cli/test_commands.py
Previously, delegate_task in batch mode only showed '3 parallel tasks'
without revealing what the tasks actually are. Single-task mode showed
the goal via the primary_args fallback, but batch mode had no goal
extraction.
Changes:
- build_tool_preview(): Add dedicated delegate_task handler that
extracts individual task goals from both single and batch modes.
Batch shows '3 tasks: Goal A | Goal B | Goal C'.
- _get_cute_tool_message_impl(): Show individual goals in CLI cute
messages for batch delegate calls ('3x: Goal A | Goal B').
- Add 4 tests covering single goal, batch goals, missing goals,
and no-goal edge case.
Asserts ensureGatewayProfile keeps $connection in lockstep with the active
profile's backend: activating a remote pool profile flips mode to remote,
returning to default resyncs to local, a failed descriptor fetch leaves the
prior connection intact, and a same-profile activation doesn't churn it.
Regression coverage for #46651.
The renderer's $connection seeds from the PRIMARY (window) backend at boot and
otherwise only refreshes on a sleep/wake reconnect. Activating a background
profile (ensureGatewayProfile) pointed the live gateway + REST at that profile's
backend but never updated $connection, so its `mode` stayed stuck on the
primary. With a local primary and a remote pool profile active, every code path
that branches on local-vs-remote misfired: image attachments went out via the
path-based `image.attach` instead of `image.attach_bytes`, handing the remote
gateway a client-only Windows path it can't resolve ("image not found: C:\..."),
and the /api/fs/* file browser and /api/media fetches targeted the wrong
machine.
Resync $connection from the now-active profile's descriptor right after the
gateway swap, so the remote-aware paths follow the live backend. Best-effort: a
failed descriptor fetch leaves the prior connection intact for boot/reconnect to
resync. Single-profile users are unaffected (the same-profile fast path never
runs the swap).
Fixes#46651
Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes.
Co-authored-by: goku94123 <gooku94123@gmail.com>
Track why a background process finished and include that source in notify-on-complete messages so SIGTERM from process.kill, kill_all, backend loss, and ordinary exits are distinguishable.
Route curator rollback through the same cross-process cron job lock, make save_jobs lock for legacy direct callers without deadlocking nested mutation paths, and harden the regression test so a second _jobs_lock caller really blocks across processes.
`hermes cron pause`/`resume`/`remove` run in their own CLI process (CLI →
cronjob tool → pause_job → update_job → save_jobs), entirely separate from
the gateway process that also writes jobs.json (mark_job_run, advance_next_run,
due-fast-forward in get_due_jobs). The only synchronization was a module-level
`threading.Lock`, which serializes writers *within a single process* but does
nothing across processes — and update_job/pause_job/remove_job/create_job did
not even take it.
The result is a classic lost update: a `cron pause` issued while the gateway is
live loads jobs.json, sets enabled=False, and saves; concurrently the gateway
loads the same file and saves back its run-bookkeeping, clobbering the pause.
The CLI prints "Paused" (it succeeded against its own in-memory copy) but the
job stays enabled and keeps firing, with no error surfaced. The scheduler's
`.tick.lock` flock can't be reused for this — it is held for the entire tick,
including multi-minute agent runs, so a CLI mutation would block for minutes.
Add `_jobs_lock()`: a short-held cross-process advisory file lock (fcntl/msvcrt
flock on `<hermes_home>/cron/.jobs.lock`) layered over the existing in-process
lock, and wrap every load→modify→save critical section with it — create_job,
update_job, remove_job, mark_job_run, advance_next_run, get_due_jobs,
rewrite_skill_refs. The lock degrades to in-process-only if neither fcntl nor
msvcrt is available, preserving prior behaviour. All critical sections are short
(field edits, no agent execution), so contention resolves in milliseconds.
Adds a regression test that proves the lock excludes a second process (an
in-process threading.Lock cannot).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Upgrade the Vite/esbuild surfaces that kept web, ui-tui, and the bootstrap installer on vulnerable esbuild versions, regenerate the root lockfile, and preserve intentional package+lock dependency edits during update lockfile cleanup.
Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.
send_message(target="whatsapp:<group-jid>") silently delivered to the
configured home DM instead of the requested group. Two gaps:
1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us),
user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and
broadcast/newsletter JIDs matched no pattern and fell through to
`return None, None, False`, so the caller treated them as
unresolvable and used the home channel. The bridge's /send endpoint
accepts any chatId, so only the tool-side target parsing was at fault.
Add a whatsapp branch that recognizes native JIDs as explicit targets.
The pre-existing '+'-prefixed E.164 path is preserved.
2. WhatsApp groups have no human-friendly name — the channel directory
is regenerated from session data on a timer, so a group shows up as
its raw 18-digit JID and any hand-edit to channel_directory.json is
clobbered on the next rebuild. Add a user-maintained alias overlay
(~/.hermes/channel_aliases.json) re-applied on every build AND every
load, giving durable friendly names and letting a freshly-created
group be pre-named before its first message.
Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser;
TestChannelAliases (7 cases) for the overlay, plus an autouse fixture
isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the
existing directory tests.
Keep request dump writes on the shared atomic JSON path, add regression coverage for request body/error/stdout redaction, and map the salvaged contributor email for release attribution.
dump_api_request_debug() masks the provider Authorization header but writes
the request `body` (system prompt, tool defs, context-embedded values) and the
error message raw via atomic_json_write. This path also fires unconditionally
on API errors (not only under HERMES_DUMP_REQUESTS), so any secret surfaced
into context (e.g. an integration token) lands in cleartext at
request_dump_*.json on every failed call.
Run the serialized dump through the existing redact_sensitive_text() scrubber
(already used for logs/tool output) before persisting and before the
HERMES_DUMP_REQUEST_STDOUT print; preserve atomicity via temp-file +
Path.replace. Also add the Notion internal-integration prefix (ntn_) to
_PREFIX_PATTERNS so bare values are caught.
Per SECURITY.md §3.2 this is a redaction (in-process heuristic) hardening, not
a §3.1 vulnerability. Refs #46583.
converse() and converse_stream() were added in boto3 1.34.59. When Hermes
is installed editable into system Python (e.g. Ubuntu 24.04 ships 1.34.46),
the system boto3 takes precedence and calls to converse_stream fail with
AttributeError. Add an early version check in _require_boto3() that raises
a clear RuntimeError with upgrade instructions.
On Linux, systemd spawns core services (cron, nginx, sshd) with
deterministic PIDs and jiffy start_times across reboots. A service can
land on the exact same PID and start_time as a previous gateway, causing
acquire_scoped_lock to mistake it for a live gateway and block startup.
The existing stale-detection paths only covered:
- start_times both non-None and different (clear mismatch)
- start_times both None (macOS/Windows fallback to cmdline check)
The boot-time collision falls through both: times are non-None and
equal, so neither branch fired.
Add a third check: when both start_times are known and match but the
live process fails _looks_like_gateway_process, read its cmdline. If
the cmdline is readable (non-None), we have positive evidence of an
impostor and mark the lock stale. Requiring a readable cmdline keeps the
check conservative — if cmdline is unreadable we do not evict.
Adds an observation_scopes config key (and HINDSIGHT_RETAIN_OBSERVATION_SCOPES
env var) so retained memories can opt into per_tag / all_combinations /
custom scoping instead of Hindsight's default combined pass.
Threaded through _build_retain_kwargs so all three retain paths honor it:
auto-retain and flush-on-switch already use aretain_batch; the tool retain
path is switched from aretain to aretain_batch (functionally equivalent,
aretain just wraps a single-item batch) since aretain doesn't accept the
observation_scopes parameter.
Salvaged commit in this PR is authored by capt-marbles
(andrewdmwalker@gmail.com), a bare gmail that does not auto-resolve in
the check-attribution job. Add the AUTHOR_MAP entry.
The salvaged read-side fix lets a profile resolve the xAI OAuth grant from
the global-root auth store when it has no own providers.xai-oauth block.
But _save_xai_oauth_tokens still wrote rotated tokens only to the active
profile store. Because xAI rotates the refresh_token on every refresh, a
profile that reads root's grant and refreshes it left root holding a now-
revoked refresh token — killing every other profile reading the stale root
grant with invalid_grant once its access token expired (#43589).
Detect the read-from-root case (profile lacks its own providers.xai-oauth
block) and, after the profile save, write the rotated chain back to the
global root too via a best-effort, TOCTOU-safe write-through that reuses
_save_auth_store with an explicit target path. A profile that genuinely
shadows root (has its own block) is left untouched, classic mode is a
no-op, and a failed root write never breaks the profile's own save.
Pairs with the read fallback in the preceding commit so the cross-profile
xAI grant stays coherent in both directions.
* fix(docker): skip per-profile gateway reconciliation in dashboard container
When gateway and dashboard containers share a bind-mounted HERMES_HOME,
both run the cont-init.d profile reconciliation script, which creates
s6-log processes for every persisted profile. These s6-log processes
in different containers race to flock() the same log-directory lock
files under logs/gateways/<profile>/lock, producing repeated
"s6-log: fatal: unable to lock ... Resource busy" errors and a
supervision restart storm.
Add HERMES_SKIP_PROFILE_RECONCILE env var support to container_boot.py
and set it in the official docker-compose.yml dashboard service so the
dashboard container no longer creates per-profile gateway s6 services
it never uses.
* chore(release): map salvaged contributor
* refactor(docker): autodetect dashboard container instead of env-var gate
Replace the HERMES_SKIP_PROFILE_RECONCILE env var with PID 1 argv role
detection. A dashboard-only container never spawns or supervises
per-profile gateways, so the reconcile boot hook now skips itself when
/proc/1/cmdline is the dashboard command — no operator flag to set (or
forget in a hand-written manifest, which would reintroduce the s6-log
flock storm this prevents).
- Extract _strip_container_argv_prefix() shared by the legacy-gateway
and new dashboard detectors (DRY the init/wrapper/hermes peel).
- Add _is_dashboard_container(); gate reconcile main() on it.
- Drop HERMES_SKIP_PROFILE_RECONCILE from code + docker-compose.yml.
- Tests: argv matrix for both roles + main()-level skip/reconcile proof
and a regression that the removed env var is now inert.
Co-authored-by: 895252509 <895252509@qq.com>
---------
Co-authored-by: zhouxiang <895252509@qq.com>
Co-authored-by: Ben <ben@nousresearch.com>
_configured_terminal_cwd and _registered_task_cwd_override carried a
byte-identical sentinel + expanduser + isabs validation tail. Extract it
into _sentinel_free_abs_cwd(raw) so the relative/sentinel rejection rule
lives in one place. Behaviour unchanged (the str() coercion the override
path relied on is preserved in the helper).
The session-cwd fix inserted a registered task/session cwd override step
between the live-cwd and $TERMINAL_CWD fallbacks, but three docstrings still
described the old two-step order — _resolve_base_dir's numbered list was
outright wrong. Update _authoritative_workspace_root, _resolve_base_dir, and
_path_resolution_warning to reflect the actual four-step resolution order.
No behaviour change.
The raw-key-first-then-collapsed override lookup was hand-rolled in three
places with subtly different spellings: terminal_tool's command setup, and
both file_tools._registered_task_cwd_override and _get_file_ops. Since that
exact raw-vs-collapsed invariant is what the session-cwd fix depends on,
keeping three copies invites the drift that caused the original bug.
Add terminal_tool.resolve_task_overrides(task_id) as the single source and
route all three sites through it. Behaviour is unchanged (verified
byte-equivalent across raw/collapsed/isolation/None/subagent inputs).
The supervised `gateway-default` s6 slot runs bare `hermes gateway run`
(no -p) to mean "the root HERMES_HOME profile". But `_apply_profile_override`
falls through its #22502 HERMES_HOME guard for the container root
(/opt/data, whose parent is not `profiles`) and reads the sticky
`active_profile` file. If the user set another profile active (e.g. via
the dashboard), the reserved default gateway gets redirected into that
profile — producing a duplicate gateway for the active profile and no
real default gateway. The profile page and `gateway status` then
correctly report default as "not running" because there genuinely isn't
one.
Guard step 2 (the sticky active_profile fallback) with the existing
HERMES_S6_SUPERVISED_CHILD sentinel that the container run-script already
exports. Supervised named-profile slots pass -p explicitly (step 1, never
reaches step 2); only the bare default slot was affected. Inert outside
the s6 container — the sentinel is never set elsewhere.
Reported in the 'Docker & Profiles & Dashboard' support thread.
The unified machine-dashboard reroute (cmd_dashboard) re-execs a named-profile
dashboard launch as the machine dashboard and dropped HERMES_HOME from the
child env with the comment "so the child binds the machine root". That holds
for a standard install (root == ~/.hermes) but breaks the Docker layout: the
published image sets `ENV HERMES_HOME=/opt/data`, so once HERMES_HOME is unset
the child falls back to $HOME/.hermes = /opt/data/.hermes — an empty,
auto-seeded home.
Two user-visible symptoms, one root cause (reported via support):
1. Dashboard Profiles page shows only an empty `default` — the real
default/oracle/saga profiles live under /opt/data/profiles, but the
rerouted child resolves _get_profiles_root() to /opt/data/.hermes/profiles.
2. The "Update Hermes" button runs `hermes update` inside the container
repeatedly instead of bailing with the docker-update guidance. The Docker
guard keys off detect_install_method(), which reads
$HERMES_HOME/.install_method; the image stamps that at /opt/data, but the
misresolved home has no stamp, no HERMES_MANAGED, and no .git → falls
through to "pip", so the guard never fires.
The reporter's workaround was to bind-mount the host dir at both /opt/data and
/opt/data/.hermes so the two paths converge (at the cost of a self-referential
recursion).
Fix: resolve the machine root explicitly with get_default_hermes_root() and set
it on the child env instead of popping HERMES_HOME. That helper returns the
root for both layouts — ~/.hermes for a standard install, and /opt/data for
Docker (it strips a trailing profiles/<name>). Falls back to the old pop
behaviour only if root resolution raises, so the reroute is never blocked.
Regression tests in test_dashboard_unified_launch.py: the existing standard-
install test now asserts the child carries HERMES_HOME == get_default_hermes_root()
(not absent), and a new test_reexec_pins_docker_machine_root covers the Docker
layout (HERMES_HOME=/opt/data/profiles/oracle → child gets /opt/data). Both
fail against the pre-fix pop behaviour (mutation-verified).
* fix: persist s6 gateway desired state
* chore(release): map salvaged contributor
---------
Co-authored-by: Alfred Smith <alfred@my-cloud.me>
Co-authored-by: Ben <ben@nousresearch.com>
* fix(gateway): chown logs/gateways parent so late-added profiles can log
The per-profile log service script created $HERMES_HOME/logs/gateways/
via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When
the first log service boots in root context, the gateways/ parent stays
root:root; every profile registered later runs its log service as the
dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a
sub-second fatal crash-loop flooding the container log. The stage2
recursive heal does not catch it either: it is gated on needs_chown,
which is false when the top-level $HERMES_HOME is already hermes-owned.
Two complementary fixes:
- service_manager._render_log_run: chown the gateways/ parent
(non-recursively) before the leaf chown. Runs on every root-context
boot, so it also heals volumes already poisoned by older images.
- docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p
block; cont-init runs before any service starts, so the parent
already exists hermes-owned when the first log/run does 'mkdir -p'.
The needs_chown repair loop needs no twin entry: it already chowns
logs/ recursively, which covers logs/gateways.
Fixes#45258
* chore(release): map salvaged contributor
---------
Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>
On Windows, native Python extensions such as _bcrypt.pyd are loaded as
DLLs by any running hermes process. When the installer tries to recreate
the venv (Remove-Item -Recurse -Force "venv"), Windows denies the delete
because the DLL is still mapped into the running process.
Add a taskkill /F /T /IM hermes.exe call before the Remove-Item so any
hermes process tree is stopped first, releasing the file lock. A short
sleep gives the OS time to unload the image before deletion proceeds.
This mirrors the existing force_kill_other_hermes() guard already present
in the --update flow (update.rs), applying the same pattern to the full
reinstall/repair path through install.ps1.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The collapsed-pane hover-reveal trigger strip (14px wide, 6px edge
gutter) overlapped the neighboring scroller's 8px .scrollbar-dt
scrollbar, which sits flush with the window edge when the rail panes
are collapsed. Hovering the scrollbar revealed the file browser over
it, and clicks on the overlapped band hit the trigger instead of the
scrollbar thumb.
Widen the edge gutter to calc(0.5rem + 2px) so the strip clears the
scrollbar (rem-coupled to the .scrollbar-dt width) while still
covering the OS window-resize grab area inset.
Part of #44140 (item 2).
Co-authored-by: AIalliAI <285906080+AIalliAI@users.noreply.github.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>