* fix: persist s6 gateway desired state
* chore(release): map salvaged contributor
---------
Co-authored-by: Alfred Smith <alfred@my-cloud.me>
Co-authored-by: Ben <ben@nousresearch.com>
* fix(gateway): chown logs/gateways parent so late-added profiles can log
The per-profile log service script created $HERMES_HOME/logs/gateways/
via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When
the first log service boots in root context, the gateways/ parent stays
root:root; every profile registered later runs its log service as the
dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a
sub-second fatal crash-loop flooding the container log. The stage2
recursive heal does not catch it either: it is gated on needs_chown,
which is false when the top-level $HERMES_HOME is already hermes-owned.
Two complementary fixes:
- service_manager._render_log_run: chown the gateways/ parent
(non-recursively) before the leaf chown. Runs on every root-context
boot, so it also heals volumes already poisoned by older images.
- docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p
block; cont-init runs before any service starts, so the parent
already exists hermes-owned when the first log/run does 'mkdir -p'.
The needs_chown repair loop needs no twin entry: it already chowns
logs/ recursively, which covers logs/gateways.
Fixes#45258
* chore(release): map salvaged contributor
---------
Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>
On Windows, native Python extensions such as _bcrypt.pyd are loaded as
DLLs by any running hermes process. When the installer tries to recreate
the venv (Remove-Item -Recurse -Force "venv"), Windows denies the delete
because the DLL is still mapped into the running process.
Add a taskkill /F /T /IM hermes.exe call before the Remove-Item so any
hermes process tree is stopped first, releasing the file lock. A short
sleep gives the OS time to unload the image before deletion proceeds.
This mirrors the existing force_kill_other_hermes() guard already present
in the --update flow (update.rs), applying the same pattern to the full
reinstall/repair path through install.ps1.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
When an existing install at $INSTALL_DIR has an unmerged index (files in a
"needs merge" state left by a previously interrupted update), the update path
ran `git stash` then `git checkout <branch>`. On a conflicted index `git stash`
aborts with "could not write index" and `git checkout` then aborts with "you
need to resolve your current index first" — surfacing to desktop/bootstrap
users as `git checkout main failed (exit 1)` and failing the whole install at
the repository stage.
Mirror the `hermes update` Python path (#4735): detect unmerged entries with
`git ls-files --unmerged` and clear the conflict state with `git reset` before
stashing. Working-tree changes are still captured by the subsequent stash, so
nothing is discarded; only the index-level conflict markers are dropped, which
lets the checkout proceed.
Fixed in both installers (install.sh and install.ps1) so the Windows GUI
installer and the POSIX one share the same recovery behavior.
Addresses PR review feedback:
- Validate refresh_token (not only access_token) before persisting the
re-imported Codex token, so a half-token payload can't silently break the
next refresh cycle.
- Make the recovery log path-agnostic ("Codex CLI auth.json") since
_import_codex_cli_tokens can read $CODEX_HOME, not only ~/.codex.
- Add regression test: relogin-required + imported token missing refresh_token
-> re-raise and persist nothing.
- Map kenmege@yahoo.com -> Kenmege in scripts/release.py AUTHOR_MAP
(fixes the check-attribution job).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Salvages PR #25747 by preserving gateway session rotation even when a post-compression model call fails before returning final content.
Co-authored-by: Hermes <127238744+teknium1@users.noreply.github.com>
The deploy-site skills index crawl was capped at ~3k ClawHub entries
because CATALOG_WALK_BUDGET_SECONDS applied to max_items=0 walks too.
Only enforce the wall-clock budget for bounded browse requests and pass
limit=0 from build_skills_index so CI walks the full catalog.
Co-authored-by: Cursor <cursoragent@cursor.com>
Follow-up to #44389: the generic 'except Exception' branch in connect()
had the same orphaned-task hazard as the timeout branch. Extract the
cancel-and-await logic into _cancel_bot_task() and call it from all
three sites (timeout branch, exception branch, disconnect()).
Also adds deaneeth to AUTHOR_MAP.
A long-lived Baileys bridge survives gateway restarts AND hermes update:
connect() adopted any bridge already listening with status connected, and
disconnect() only kills bridges the adapter spawned itself. Users who
updated to get inbound media support kept talking to a bridge process
serving months-old bridge.js — images and voice notes still arrived as
placeholders with no cached file path (refs #19105 follow-up reports).
Three fixes in the same stale-bridge class:
- Staleness handshake: bridge.js reports a sha256 self-hash in /health
(scriptHash); connect() compares it against bridge.js on disk and
restarts the bridge on mismatch. Pre-handshake bridges report no hash
and are treated as stale, so every existing stale bridge gets recycled
exactly once on the next gateway start.
- npm dep refresh: deps reinstall when package.json changes (stamp file
in node_modules), not only when node_modules is missing — a Baileys
pin bump now actually lands.
- Cache-dir passthrough: the gateway passes profile-aware
HERMES_{IMAGE,AUDIO,DOCUMENT}_CACHE_DIR to the bridge instead of the
bridge hardcoding ~/.hermes/image_cache etc., fixing media paths under
HERMES_HOME overrides, profiles, and the new cache/ layout.
Follow-up to the winget stale-registration fix. Update-ProcessPathForPackages
rebuilt $env:Path wholesale from the persisted User+Machine hives (plus winget's
Links dir), discarding any process-only PATH entries added earlier in the
installer run. Since the helper now runs after every package manager, that
wholesale replace is more likely to clobber a process-local entry than the
original winget-branch-only version was.
Merge instead: seed from the current process PATH, then append hive and
winget-Links entries not already present, with a case-insensitive,
order-preserving dedupe. Behaviour on a clean box is unchanged (the hive entries
are simply appended); the difference is that pre-existing process-only entries
now survive the refresh.
When ripgrep/ffmpeg is missing, `winget install <id>` on a package winget
already has registered is treated as an upgrade: it finds no newer version and
exits 0x8A15002B (-1978335189, APPINSTALLER_CLI_ERROR_UPDATE_NOT_APPLICABLE)
without ensuring the binary is actually present. The installer only logged that
code and judged success by `Get-Command rg`, so a stale registration (files
removed outside winget, or a missing alias shim) became a permanent dead-end —
winget kept reporting "already installed" and the user could never reinstall.
Detect that exit code and retry once with `--force` to repair the registration
so the shim reappears.
Also refresh the process PATH after the choco and scoop fallbacks (not just
winget) via a shared helper, so a successful fallback install — or any install
on a box without winget — is no longer misreported as "not installed".
Follow-up to the salvaged #41264 (Windows watcher): the setsid/bash detached
restart watcher on Linux/macOS inherits _HERMES_GATEWAY=1 the same way, so
the CLI's self-restart loop guard silently refuses 'hermes gateway restart'
and the gateway never comes back. Scrub the marker from the watcher env on
the POSIX branch as well, and extend the setsid test to assert it.
The salvaged draft-persistence effect wrote to localStorage on every
keystroke — the composer's per-keystroke path was deliberately slimmed
down previously, so debounce the write (400ms) and flush pending text on
scope change/unmount so a fast session switch can't drop trailing
keystrokes. Also add AUTHOR_MAP entry for the salvaged commit.
Replace the hermes-identifying clientInfo/User-Agent/session-id prefix on
the keyless Parallel Search MCP path with a neutral 'mcp-web-client'
identity. Project policy forbids third-party usage attribution without an
explicit user opt-in (see telemetry PR policy); MCP requires a clientInfo,
so a generic one satisfies the spec without attributing traffic.
Also adds the contributor AUTHOR_MAP entry and refreshes uv.lock against
current main (parallel-web 0.6.0).
* fix(desktop): keep model runtime state per session
(cherry picked from commit f72ee87d99ee38cb7b5badeb9a8af869bb92073a)
* fix(desktop): keep footer model state scoped to active session
(cherry picked from commit d91942ebd4671ff857b5c8526dbf133f04782ecb)
* fix(desktop): restore stored runtime when resuming sessions
(cherry picked from commit 32b3793418257617b8da57e26151f079c2620d00)
* fix(desktop): persist live runtime changes for resume
(cherry picked from commit c58467779436dcef44a80ad55b52664752dc0837)
* fix(desktop): persist resumed endpoint runtime
* chore(attribution): map pinguarmy's commit email in AUTHOR_MAP
The salvaged commits on this branch preserve @pinguarmy's authorship
(郝鹏宇 / peterhao@Peters-MacBook-Air.local). Add the mapping so the
check-attribution CI gate resolves the email to the GitHub username.
---------
Co-authored-by: 郝鹏宇 <peterhao@Peters-MacBook-Air.local>
* fix(ci): append filesystem forensics when a per-file pytest run exhausts exit-4 retries
A PR-added test file (tests/test_iron_proxy.py, PR #30179) repeatedly
failed exactly one CI shard with 'ERROR: file or directory not found'
across 4 runs (including a fresh merge SHA on fresh runners), while the
identical slice passes locally against the same merge commit and a
tree-integrity watcher confirms no sibling test mutates the repo. Three
unrelated branches showed the same one-shard signature the same day.
We currently cannot attribute these because the log only carries
pytest's exit-4 line. This adds a forensics block to the captured
output when exit-4 survives the retry loop:
- does the file exist NOW (post-retries)
- parent dir entry count + similarly-named entries
- git status --porcelain dirty-entry count + first 10 entries
Zero behavior change: rc stays 4, retries unchanged, forensics wrapped
in a broad try/except so they can never mask the failure.
Two new tests cover the exhausted-retries and genuinely-missing paths.
* chore: drop the two forensics tests — ship the runner change only
- Add output_path suffix assertions (.ogg Telegram / .mp3 non-Telegram) to
_send_voice_reply tests, covering the OGG voice-note path that landed on
main in ae82eed2b (the PR's third commit was redundant with it).
- Convert test_gemini_default_is_32000 back to an invariant against
PROVIDER_MAX_TEXT_LENGTH instead of a hardcoded literal.
- Map barronlroth@gmail.com -> barronlroth in scripts/release.py.
A GPT-5 model rejecting max_tokens returns a 400 whose message contains the
literal substring 'max_tokens' — one of the _CONTEXT_OVERFLOW_PATTERNS. The 400
path in _classify_400 checked overflow patterns before any request-validation
check (which only existed on the 5xx path), so the parameter error was routed
into the compression loop, re-sent with the same bad param, and ended in
'Cannot compress further' on a tiny context.
Hoist a request-validation guard (unsupported/unknown parameter) above the
context-overflow check in _classify_400. Deliberately excludes the generic
invalid_request_error code, which OpenAI also stamps on real overflow 400s, so
genuine overflows still compress. Pairs with the max_completion_tokens param
fix that stops the bad request at the source.
Also adds AUTHOR_MAP entry for the salvaged PR #13902 commit.
The per-file test runner re-runs a file once when pytest exits 4 ("file or
directory not found") while the file exists on disk — a transient seen on
loaded shared CI runners where the planner collects a file (--collect-only
counts its tests) but the per-file subprocess fails to stat it moments later.
A single immediate retry could land in the same brief high-load window and
fail again, and the retry was gated on one Path.exists() check that can itself
be a flaky stat under that load — so a freshly-added test file that LPT pins to
one shard would deterministically red that shard on every run (no actual test
failure; the file just never executes).
- Extract the subprocess spawn/communicate/process-tree-kill logic into a
shared _spawn_pytest_once() helper (removes ~90 lines of duplication between
the primary run and the retry).
- Replace the single-shot retry with a bounded backoff loop
(_EXIT4_RETRY_ATTEMPTS, escalating sleep) that re-runs while the file is
present on disk.
- Add _file_present() which re-checks existence across a few spaced stats, so a
single flaky negative stat doesn't wrongly conclude the file is missing. A
genuinely-missing file (typo/deleted) still fails fast — exit 4 is not
swallowed when the file truly does not exist.
- Tests: transient-then-pass recovery, genuinely-missing fails fast with no
retry, give-up after max attempts, and _file_present transient/missing cases.
* docs(windows): correct native data dir to %LOCALAPPDATA%\hermes
The Windows-native guide claimed a deliberate split where config, auth,
skills, and sessions live under %USERPROFILE%\.hermes. That is not what
the installer does: scripts/install.ps1 sets HERMES_HOME=%LOCALAPPDATA%\hermes,
so data actually lives in %LOCALAPPDATA%\hermes alongside the disposable
install (the hermes-agent\, git\, node\, bin\ subdirectories) — `hermes
config` confirms config.yaml/.env resolve there, not under %USERPROFILE%.
Update the data-layout table, the "split is deliberate" note, the env-var
and uninstall sections to describe the real layout: data and install share
the %LOCALAPPDATA%\hermes root, reinstall only replaces hermes-agent\, and
a full wipe targets %LOCALAPPDATA%\hermes (with %USERPROFILE%\.hermes kept
only as a legacy/WSL cleanup). Mention HERMES_HOME as the override knob.
* docs(windows): fix PATH + bin layout to match installer
The installer adds hermes-agent\venv\Scripts (where hermes.exe lives) to
User PATH and sets HERMES_HOME — not %LOCALAPPDATA%\hermes\bin. The \bin
dir holds Hermes's managed uv.exe, not a hermes.cmd shim. Correct the
install-step list and the data-layout table accordingly.
* fix(install): show real HERMES_HOME path in setup messages
The native Windows installer wrote config/env/skills under $HermesHome
(%LOCALAPPDATA%\hermes) but its success messages claimed ~/.hermes,
which doesn't exist on native Windows. Print the actual paths so a new
user can find their config, .env, and skills.