- Use os.pathsep instead of literal ':' so Windows paths (C:\dir) and
the Windows separator ';' work correctly.
- Add 9 tests covering multi-root behavior: writes inside first/second
root, writes outside all roots, trailing/leading/double separators,
all-separators edge case, static deny priority, duplicate dedup.
- Update hermes_cli/tips.py tip string to mention multiple paths.
- Update docs to mention os.pathsep / ; on Windows.
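The multi-root parsing can be sketched as follows, assuming the roots arrive as one separator-joined string. `parse_write_roots` and `is_path_allowed` are hypothetical names, not the helpers this change touches, and the static deny list is omitted.

```python
import os
from typing import List


def parse_write_roots(raw: str, sep: str = os.pathsep) -> List[str]:
    """Split a separator-joined root list into absolute, de-duplicated roots.

    Hypothetical helper. Splitting on os.pathsep (';' on Windows, ':' elsewhere)
    keeps drive-letter paths such as C:\\dir intact on Windows.
    """
    roots: List[str] = []
    seen = set()
    for part in raw.split(sep):
        part = part.strip()
        if not part:
            # Leading, trailing, and doubled separators produce empty entries.
            continue
        normalized = os.path.abspath(part)  # abspath also normalizes the path
        if normalized in seen:
            continue  # drop duplicates, keeping first-seen order
        seen.add(normalized)
        roots.append(normalized)
    return roots


def is_path_allowed(path: str, roots: List[str]) -> bool:
    """Return True if path resolves inside any configured root."""
    target = os.path.abspath(path)
    for root in roots:
        try:
            if os.path.commonpath([target, root]) == root:
                return True
        except ValueError:
            # Raised on Windows when the two paths are on different drives.
            continue
    return False
```

Comparing with `os.path.commonpath` rather than `str.startswith` keeps `/tmp/ab` from matching the root `/tmp/a`.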
Follow-up for salvaged PR #49557.
The OpenAI SDK exposes client.base_url as an httpx.URL object, not str.
The isinstance(live_raw, str) guard made this branch dead code in
production. Use _normalized_runtime_url (which coerces via str()) so
the fallback actually fires.
When parent_agent.base_url still carries a stale OpenRouter URL but the
live OpenAI client already points at local Ollama, subagents were routing
API calls to OpenRouter and failing with HTTP 401. Prefer _client_kwargs
and the mounted client base_url when they disagree with the surface field.
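A sketch of that resolution order, assuming the three candidates are passed in explicitly. `normalize_url` stands in for `_normalized_runtime_url` and `resolve_child_base_url` is hypothetical; checking `_client_kwargs` before the live client is also an assumption.

```python
from typing import Any, Mapping, Optional


def normalize_url(value: Any) -> Optional[str]:
    """Coerce a str or URL object (such as httpx.URL) to a trimmed string."""
    if value is None:
        return None
    # str() works for both plain strings and URL objects, unlike isinstance(value, str).
    text = str(value).strip().rstrip("/")
    return text or None


def resolve_child_base_url(
    surface_url: Any,
    client_kwargs: Mapping[str, Any],
    live_client_url: Any,
) -> Optional[str]:
    """Prefer the client kwargs, then the live client, over the surface field."""
    for candidate in (client_kwargs.get("base_url"), live_client_url, surface_url):
        url = normalize_url(candidate)
        if url:
            return url
    return None
```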
The PR's original refactor commit only replaced the primitives (regex,
is_table_row, split_markdown_table_row) with shared imports but left the
verbatim-copied renderer (_render_table_block_for_telegram) and driver
(_wrap_markdown_tables) in place. Both are logic-identical to the shared
convert_table_to_bullets in gateway/platforms/helpers.py.
Replace both with a direct import alias. _TABLE_SEPARATOR_RE is still
imported separately because it's used by the rich-message routing logic
(lines 1024, 1044) to detect whether content contains tables.
Found by 3-agent parallel code-reuse review.
Replace local _TABLE_SEPARATOR_RE, _is_table_row, and
_split_markdown_table_row with imports from the shared module.
Telegram-specific rendering stays local.
Co-authored-by: Yashiel Sookdeo <yashiel@skyner.co.za>
Discord does not render GFM pipe tables — raw pipe characters display
as garbage text. format_message now rewrites tables into bold-heading +
bullet groups using the shared helpers.
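One way the rewrite could look, as a sketch: the first column of each row becomes the bold heading and every remaining column a `header: value` bullet. `split_row`, `is_separator_row`, and `convert_tables_to_bullets` are hypothetical stand-ins for the shared helpers, and the exact output shape of `convert_table_to_bullets` may differ.

```python
import re
from typing import List

# Hypothetical delimiter-cell pattern: hyphens with optional alignment colons.
_DELIMITER_CELL_RE = re.compile(r":?-+:?")


def split_row(line: str) -> List[str]:
    """Split a pipe-table row into trimmed cells (escaped pipes are not handled)."""
    s = line.strip()
    if s.startswith("|"):
        s = s[1:]
    if s.endswith("|"):
        s = s[:-1]
    return [cell.strip() for cell in s.split("|")]


def is_table_row(line: str) -> bool:
    s = line.strip()
    return s.startswith("|") and s.count("|") >= 2


def is_separator_row(line: str) -> bool:
    cells = split_row(line)
    return bool(cells) and all(_DELIMITER_CELL_RE.fullmatch(c) for c in cells)


def convert_tables_to_bullets(text: str) -> str:
    """Rewrite each pipe table as a bold heading per row plus 'header: value' bullets."""
    lines = text.splitlines()
    out: List[str] = []
    i = 0
    while i < len(lines):
        if (
            i + 1 < len(lines)
            and is_table_row(lines[i])
            and is_separator_row(lines[i + 1])
        ):
            headers = split_row(lines[i])
            i += 2  # skip the header and delimiter rows
            while i < len(lines) and is_table_row(lines[i]):
                cells = split_row(lines[i])
                out.append(f"**{cells[0]}**")
                for header, value in zip(headers[1:], cells[1:]):
                    out.append(f"- {header}: {value}")
                i += 1
            continue
        out.append(lines[i])
        i += 1
    return "\n".join(out)
```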
Fixes #21168
Co-authored-by: Yashiel Sookdeo <yashiel@skyner.co.za>
Move table-detection regex, row-splitting, and table-to-bullet
conversion into gateway/platforms/helpers.py so both Discord and
Telegram adapters can share them.
Co-authored-by: Yashiel Sookdeo <yashiel@skyner.co.za>
The desktop MoA settings 'Add preset', 'Set default', and 'Delete' buttons
mutated local React state only and never called the save endpoint, so a newly
constructed preset vanished on refresh. Each now builds the next config and
calls saveMoa() so the change is written to config.yaml via PUT /api/model/moa.
A MoA preset whose reference or aggregator slot points at the moa virtual
provider creates a recursive MoA tree. The runtime guards in moa_loop.py only
surface this mid-turn (references silently skipped, aggregator raises). Reject
it at the config chokepoint (_clean_slot) so it can never be saved, and hide it
from the desktop/dashboard slot pickers so it isn't offered as a dead choice.
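A sketch of rejecting the recursive case at a single validation chokepoint, assuming slots are dicts with `provider` and `model` keys and that the virtual provider's id is `moa`. `clean_slot`, `MoaConfigError`, and `pickable_providers` are hypothetical; hiding the provider from the pickers is shown as a plain filter.

```python
from typing import Any, Dict, Iterable, List

MOA_PROVIDER = "moa"  # assumed id of the virtual MoA provider


class MoaConfigError(ValueError):
    """Hypothetical error raised for an invalid MoA slot."""


def clean_slot(slot: Any, role: str) -> Dict[str, str]:
    """Validate one reference/aggregator slot, rejecting recursive MoA."""
    if not isinstance(slot, dict):
        raise MoaConfigError(f"{role} slot must be a mapping")
    provider = str(slot.get("provider", "")).strip()
    if provider.lower() == MOA_PROVIDER:
        raise MoaConfigError(
            f"{role} slot cannot use the {MOA_PROVIDER!r} provider (recursive MoA)"
        )
    return {"provider": provider, "model": str(slot.get("model", "")).strip()}


def pickable_providers(providers: Iterable[str]) -> List[str]:
    """Providers offered in slot pickers: everything except the MoA provider itself."""
    return [p for p in providers if p.lower() != MOA_PROVIDER]
```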
preexec_fn=os.setsid runs Python code in the forked child before exec,
which the CPython docs warn is not safe in the presence of threads. When the
Desktop gateway loads native libraries (onnxruntime, BLAS, provider SDKs)
with active thread pools, the fork can SIGSEGV before the child execs.
Replace all preexec_fn usage with start_new_session=True, which provides
the same setsid/process-group semantics without running Python in the
fork. This is already the pattern used throughout hermes_cli/gateway.py
and hermes_cli/_subprocess_compat.py.
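The swap is a one-argument change on `subprocess.Popen`. The following minimal, POSIX-only illustration uses a stand-in command, not one of the Hermes call sites, and checks that the child becomes its own session leader:

```python
import subprocess
import sys

# Before (unsafe when the parent has live threads): os.setsid runs as Python
# code in the forked child, before exec.
#     subprocess.Popen(cmd, preexec_fn=os.setsid)
# After: subprocess calls setsid() in the child from C, with no Python code
# executed between fork and exec.
proc = subprocess.Popen(
    [sys.executable, "-c", "import os; print(os.getsid(0) == os.getpid())"],
    stdout=subprocess.PIPE,
    text=True,
    start_new_session=True,
)
out, _ = proc.communicate()
print(out.strip())  # the child reports whether it leads its own session
```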
Fixes #46789
_normalize_preset uses bare float() and int() to coerce
reference_temperature, aggregator_temperature, and max_tokens from
config.yaml. When a user hand-edits a non-numeric value (e.g.
max_tokens: "8k" or reference_temperature: "hot"), the coercion raises
ValueError. Since normalize_moa_config runs on every model-selection
and MoA turn (via resolve_moa_preset), the crash is unrecoverable and
blocks all MoA usage until the config is manually fixed.
Replace the bare casts with _coerce_float / _coerce_int helpers that
fall back to the default on TypeError/ValueError instead of raising.
The autonomous self-improvement review fork could still write to a pinned
skill — only external/bundled/hub-installed/protected-builtin skills were
guarded. The curator skips pinned skills from every auto-transition; the
review fork is the same kind of no-user-present actor and must too.
Adds a pin check to _background_review_write_guard so background-origin
edit/patch/delete/write_file/remove_file on a pinned skill are refused.
Stricter than the foreground _pinned_guard (delete-only) by design: with
no user in the loop there is no one to consent to an edit.
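The added check reduces to a small predicate. This sketch assumes a skill record with a `pinned` flag. `SkillInfo` and `background_review_pin_check` are hypothetical, and the existing external/bundled/hub/builtin checks in `_background_review_write_guard` are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Mutating actions the pin check covers, per the description above.
GUARDED_ACTIONS = frozenset({"edit", "patch", "delete", "write_file", "remove_file"})


@dataclass
class SkillInfo:
    name: str
    pinned: bool = False


def background_review_pin_check(skill: SkillInfo, action: str) -> Optional[str]:
    """Return a refusal message for a background write to a pinned skill, else None."""
    if skill.pinned and action in GUARDED_ACTIONS:
        return (
            f"Refusing {action} on pinned skill {skill.name!r}: "
            "background review has no user present to consent."
        )
    return None
```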
Fixes #25839
The verify-on-stop guard (#52296) printed '↻ Verification required before
finishing' to the terminal on every internal nudge turn, adding noise to
CLI/gateway sessions whenever code was edited without fresh passing checks.
Demote the user-facing status emit to a logger.debug breadcrumb — the loop
still nudges the model to verify before finishing, just silently.
The dangerous-command approval layer already blocks `hermes gateway
(stop|restart)`, `pkill/killall hermes|gateway`, and `kill ... $(pgrep ...)`.
A reporter noted on #33071 that the agent can still achieve the same
effect by driving launchd directly against the gateway's service label
(`launchctl stop ai.hermes.gateway`, `launchctl kickstart -k
system/ai.hermes.gateway`, etc.) or by substituting `pidof` for `pgrep`
in the kill-expansion form.
This widens the "Gateway lifecycle protection" block in
`tools/approval.py` to cover both vectors:
- `launchctl (stop|kickstart|bootout|unload|kill|disable|remove)`
scoped to commands that target a Hermes label (`hermes`,
`ai.hermes`). Read-only inspection (`launchctl print …`,
`launchctl list`) and operations against unrelated labels remain
unflagged.
- `kill ... $(pidof …)` and the backtick form, alongside the existing
`pgrep` expansion. `pidof` is the BSD/Linux equivalent and is
equally opaque to the `(pkill|killall) … hermes` name pattern.
Intentionally left out of scope: plain `kill -TERM <numeric_pid>` with
a PID looked up out-of-band. Catching that would require runtime PID
state and would break the existing
`TestPgrepKillExpansion::test_safe_kill_pid_not_flagged` contract,
which guarantees that a plain literal-PID `kill 12345` stays safe.
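The two widened vectors can be expressed as regular expressions. The patterns below are a hypothetical sketch, not the ones in `tools/approval.py`. Stopping at `;`, `&`, `|`, and newlines keeps a Hermes label in a later, unrelated command from triggering the launchctl rule.

```python
import re

# launchctl verbs that stop or unload a job, on a command that names a Hermes label.
LAUNCHCTL_HERMES_RE = re.compile(
    r"\blaunchctl\s+(?:stop|kickstart|bootout|unload|kill|disable|remove)\b"
    r"[^;&|\n]*\bhermes\b",
    re.IGNORECASE,
)
# kill with a PID list from command substitution: $(pgrep ...), $(pidof ...), or backticks.
KILL_EXPANSION_RE = re.compile(r"\bkill\b[^;&|\n]*(?:\$\(\s*|`\s*)(?:pgrep|pidof)\b")


def flags_gateway_lifecycle(command: str) -> bool:
    """True if the command matches either gateway-lifecycle pattern."""
    return bool(LAUNCHCTL_HERMES_RE.search(command) or KILL_EXPANSION_RE.search(command))
```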
Embeds reach out to third parties on render, so default to a placeholder that
mirrors the tool-approval UX: "Load <service>" (this embed) or "Always allow
<service>" (persisted). A desktop-local store ($embedMode ask|always|off +
per-service allowlist) gates the fetch with zero gateway round-trip; an
Appearance setting controls the global default. Local renderers (mermaid, svg,
alerts) are never gated. Addresses review feedback on outbound third-party
requests.
Natural-language skill search returned a short, arbitrary list and never
surfaced NVIDIA (or OpenAI/Anthropic/HuggingFace) skills. Two causes:
1. The runtime index collapses every GitHub tap into source="github", so
there was no way to find or filter by provider at the CLI — the per-tap
identity only existed in the docs-site catalog.
2. HermesIndexSource.search matched only name/description/tags (not the
identifier or provider) and broke at the first `limit` hits in raw index
order, burying the most relevant skills. `search` also defaulted to
--limit 10 against an 86k-entry catalog.
Changes:
- GitHubSource stamps a per-tap provider label (extra.provider) on each
skill via github_provider_for(); source stays "github" so dedup/floor/
index-skip logic is untouched. Flows into the built index.
- HermesIndexSource.search now matches identifier + provider too, and
collect-then-ranks (exact > prefix > whole-word > substring) instead of
break-at-limit.
- --source nvidia|openai|anthropic|huggingface|voltagent|gstack|minimax
provider filters for browse/search (narrows merged results by provider).
- search --limit default 10 -> 25; table Source column shows the provider
label for github skills.
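The collect-then-rank change can be sketched as follows, assuming index entries are dicts with optional `name`, `identifier`, `provider`, `description`, and `tags` keys. `match_tier` and `search` are hypothetical and are not the body of `HermesIndexSource.search`.

```python
import re
from typing import Any, Dict, Iterable, List, Optional

EXACT, PREFIX, WHOLE_WORD, SUBSTRING = range(4)  # lower is better


def match_tier(query: str, text: str) -> Optional[int]:
    """Classify how well query matches text, or None if it does not match at all."""
    q, t = query.strip().lower(), text.lower()
    if not q or q not in t:
        return None
    if t == q:
        return EXACT
    if t.startswith(q):
        return PREFIX
    if re.search(rf"(?<!\w){re.escape(q)}(?!\w)", t):
        return WHOLE_WORD
    return SUBSTRING


def search(entries: Iterable[Dict[str, Any]], query: str, limit: int = 25) -> List[Dict[str, Any]]:
    """Collect every match, rank by best tier, then truncate (instead of breaking early)."""
    scored = []
    for position, entry in enumerate(entries):
        fields = [
            entry.get("name", ""),
            entry.get("identifier", ""),
            entry.get("provider", ""),
            entry.get("description", ""),
            *entry.get("tags", []),
        ]
        tiers = [tier for tier in (match_tier(query, f) for f in fields) if tier is not None]
        if tiers:
            # Index position breaks ties, so equal-tier results keep index order.
            scored.append((min(tiers), position, entry))
    scored.sort(key=lambda item: (item[0], item[1]))
    return [entry for _, _, entry in scored[:limit]]
```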
Tested: 181 unit tests pass; E2E against the live runtime index confirms
'nvidia'/'cuda' searches now surface NVIDIA-provider skills and
--source nvidia narrows to exactly the NVIDIA catalog.
The post-update gateway restart path relaunched the gateway with the
venv's console `python.exe` (via `get_python_path()` in
`_gateway_run_args_for_profile`). On Windows this leaves a terminal
window open permanently: uv's `venv\Scripts\python.exe` is a launcher
shim that re-execs the *base* console interpreter, which allocates its
own conhost — and `CREATE_NO_WINDOW` cannot suppress that second window.
The clean-start path (`_spawn_detached`) already dodges this by routing
through `_resolve_detached_python` to use the windowless base
`pythonw.exe`; the restart watcher did not.
Symptom (reported on Windows 11): after an in-app GUI update, a console
window for the gateway stays open and never closes. Confirmed on the
reporter's box — the running gateway was `python.exe ... gateway run
--replace` with a live conhost child and the foreground "Press Ctrl+C to
stop" banner, born exactly at the update's "Restarting Windows gateway"
log line.
Fix:
- Add `gateway_windows.windowless_gateway_restart_spec(run_argv)` which
rewrites a console-python gateway argv into the windowless `pythonw.exe`
equivalent and returns the cwd + env overlay (VIRTUAL_ENV / PYTHONPATH /
HERMES_HOME) the base interpreter needs to import `hermes_cli` without
the venv launcher's site config. No-op on POSIX.
- `_spawn_gateway_restart_watcher` now applies that rewrite on Windows and
threads cwd= / env= into the inlined respawn Popen. Covers both restart
entry points (`launch_detached_profile_gateway_restart` and
`launch_detached_gateway_restart_by_cmdline`). CREATE_NO_WINDOW |
DETACHED_PROCESS | CREATE_BREAKAWAY_FROM_JOB and the breakaway-denied
fallback are all preserved.
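A sketch of the argv rewrite. It assumes the caller already knows the base `pythonw.exe` path (the clean-start path gets it from `_resolve_detached_python`) and the venv layout. `windowless_restart_spec` and its keyword parameters are hypothetical. Only the three overlay variable names come from the description above, and using the project root as the cwd is an assumption.

```python
import ntpath
import os
from typing import Dict, List, Optional, Sequence, Tuple


def windowless_restart_spec(
    run_argv: Sequence[str],
    *,
    base_pythonw: str,
    venv_dir: str,
    site_packages: str,
    project_root: str,
    hermes_home: str,
    is_windows: bool = os.name == "nt",
) -> Tuple[List[str], Optional[str], Optional[Dict[str, str]]]:
    """Return (argv, cwd, env_overlay) for a windowless respawn; a no-op on POSIX."""
    argv = list(run_argv)
    if not is_windows or not argv:
        return argv, None, None
    # ntpath handles both '\\' and '/' separators regardless of the host OS.
    if ntpath.basename(argv[0]).lower() != "python.exe":
        return argv, None, None
    argv[0] = base_pythonw  # base pythonw.exe, not the venv launcher shim
    overlay = {
        # The base interpreter never reads the venv's site config, so the
        # venv and project locations are passed explicitly.
        "VIRTUAL_ENV": venv_dir,
        "PYTHONPATH": ";".join([project_root, site_packages]),  # Windows separator
        "HERMES_HOME": hermes_home,
    }
    return argv, project_root, overlay
```

The caller would then pass `cwd=` and `env={**os.environ, **overlay}` to `Popen` alongside the existing `CREATE_NO_WINDOW | DETACHED_PROCESS` creation flags.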
Verified E2E on a real Windows 11 box: drove the actual watcher against a
dummy old-pid; the respawned gateway came up as `pythonw.exe` (zero
console python, no conhost child) and booted fully (housekeeping + kanban
dispatcher started → imports resolved under the base interpreter).
Tests: TestWindowlessGatewayRestartSpec (behavior) +
TestGatewayDetachedWatcherWindowsFlags regression assert. Pre-existing
Linux-only failures on a Windows host (SIGKILL, systemd, docker-root)
confirmed identical on the bare base.
Add a deprecation banner to the top of the dedicated Nix & NixOS setup
guide and consistency notes at the Nix sections of installation, updating,
and the plugin-distribution guide. Nix is now best-effort only; the
supported install paths are the curl|bash installer, Docker, and Windows.
On macOS, terminal(background=true) silently failed: the process returned a
session_id and exit_code=0 but the command never ran (empty stdout, no side
effects). Root cause is two interacting issues:
1. _find_shell was aliased to _find_bash, which prefers `shutil.which("bash")`
→ /bin/bash (GNU bash 3.2, still shipped on macOS) over $SHELL (/bin/zsh).
2. process_registry.spawn_local runs [shell, "-lic", "set +m; <cmd>"] with
stdin=/dev/null. bash 3.2 as a login shell sources ~/.bash_profile, which on
many macOS setups contains `exec /bin/zsh -l`; that exec replaces bash but
drops the -c argument, so the command is swallowed (exit 0, no output).
Decouple _find_shell from _find_bash: _find_shell now prefers the user's
configured $SHELL on POSIX (the shell they actually log in with), falling back
to _find_bash when $SHELL is unset/missing. _find_bash is unchanged, so callers
that genuinely need bash (e.g. the _run_bash login-shell snapshot) keep bash
semantics. zsh handles -lic correctly even with redirected stdin.
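A sketch of the new lookup order, with the environment injectable for testing. `find_shell` is a hypothetical stand-in for `_find_shell`, and `_find_bash` here is simplified to `shutil.which("bash")`, whereas the real helper may search more locations.

```python
import os
import shutil
from typing import Mapping, Optional


def _find_bash() -> Optional[str]:
    # Simplified stand-in for the real _find_bash.
    return shutil.which("bash")


def find_shell(
    environ: Optional[Mapping[str, str]] = None,
    is_windows: bool = os.name == "nt",
) -> Optional[str]:
    """Prefer the user's login shell from $SHELL on POSIX; otherwise fall back to bash."""
    env = os.environ if environ is None else environ
    if not is_windows:
        configured = env.get("SHELL", "").strip()
        # Ignore an unset, empty, or stale $SHELL (file missing or not executable).
        if configured and os.path.isfile(configured) and os.access(configured, os.X_OK):
            return configured
    return _find_bash()
```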
Salvaged from #42219 by @liuhao1024 (authorship preserved via cherry-pick).
On top of the original (8 unit tests covering $SHELL-set/unset/missing/empty,
Windows-ignores-$SHELL, _find_bash-unchanged), added an E2E regression test
that reproduces the real bash-3.2 login-shell swallow (exit 0 / no file) and
asserts the shell _find_shell selects actually executes a -lic background
command. Mutation-verified: reverting _find_shell to the bash alias fails the
$SHELL-preference test. Bug reproduced directly: /bin/bash 3.2 -lic with a
.bash_profile->exec-zsh creates no file; zsh -lic does.
Closes #42203. Supersedes #42290.
When a Hermes-managed node/npm/npx shim exists but fails --version, redownload
the pinned nodejs.org bundle under HERMES_HOME/node and retry. Do not fall
back to system npm on PATH when a managed tree is present.
POSIX heal probes node, npm, and npx (npm can break while node still runs).
A stale or partial Hermes-managed Node tree under the active HERMES_HOME
can leave bin/npm behind while lib/cli.js is missing. File-existence checks
alone made hermes update pick that broken npm and skip healthy system npm
on PATH. Probe managed candidates with --version before preferring them.
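The probe itself is a subprocess call with a timeout. In this sketch, `responds_to_version` and `first_working` are hypothetical, and candidate ordering (managed tree first) is left to the caller. The redownload-and-retry heal described above is not shown.

```python
import subprocess
from typing import Iterable, Optional


def responds_to_version(executable: str, timeout: float = 10.0) -> bool:
    """True if `<executable> --version` runs and exits 0."""
    try:
        result = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing file, not executable, or hung: treat as broken.
        return False
    return result.returncode == 0


def first_working(candidates: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first candidate that passes the --version probe."""
    for candidate in candidates:
        if candidate and responds_to_version(candidate):
            return candidate
    return None
```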
projects.db (per-profile project store) and kanban.db were missing from
_QUICK_STATE_FILES, so the pre-update quick snapshot never backed them up.
On a desktop upgrade, when the update flow removes/replaces the file and the
post-update schema-init re-creates an empty one, all user-created projects,
folder mappings, the active-project pointer, kanban board bindings, and tasks
vanish silently — no error.
Add the per-profile user-created stores to the snapshot set:
- projects.db — project store
- response_store.db — gateway conversation history / tool payloads (WAL)
- memory_store.db — holographic memory facts/entities (WAL)
- verification_evidence.db — agent verification audit trail
- kanban.db — default board (back-compat <root>/kanban.db)
- kanban/boards — non-default boards (<root>/kanban/boards/<slug>/kanban.db
+ metadata); workspaces/ and attachments/ subtrees
are skipped as large + regenerable.
Also: the directory-branch of create_quick_snapshot now routes *.db through the
WAL-safe _safe_copy_db (SQLite backup() API), matching the top-level file path —
previously a non-default board DB with an open WAL could be copied inconsistently.
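A sketch of routing `*.db` files through SQLite's online backup API while copying a directory tree. `safe_copy_db` and `snapshot_dir` are hypothetical stand-ins for `_safe_copy_db` and the directory branch of `create_quick_snapshot`. Skipping the `-wal`/`-shm` sidecars is an assumption that follows from `backup()` producing a self-contained copy.

```python
import shutil
import sqlite3
from pathlib import Path
from typing import Iterable

# Large, regenerable subtrees skipped per the description above.
SKIP_DIRS = frozenset({"workspaces", "attachments"})


def safe_copy_db(src: Path, dst: Path) -> None:
    """Copy a SQLite database via the backup API so committed WAL content is included."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    source = sqlite3.connect(src)
    try:
        target = sqlite3.connect(dst)
        try:
            source.backup(target)
        finally:
            target.close()
    finally:
        source.close()


def snapshot_dir(src_dir: Path, dst_dir: Path, skip: Iterable[str] = SKIP_DIRS) -> None:
    """Copy a directory tree, using safe_copy_db for *.db files."""
    skip = set(skip)
    for path in sorted(src_dir.rglob("*")):
        rel = path.relative_to(src_dir)
        if any(part in skip for part in rel.parts) or path.is_dir():
            continue
        if path.name.endswith(("-wal", "-shm")):
            continue  # backup() already folds WAL contents into the .db copy
        dest = dst_dir / rel
        if path.suffix == ".db":
            safe_copy_db(path, dest)
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)
```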
Salvaged from #52930 by @0xDevNinja (authorship preserved via cherry-pick).
On top of the original (which covered only projects.db + the default kanban.db),
this adds: non-default-board coverage, the three sibling per-profile DBs that
meet the same upgrade-wipe criteria, WAL-safe directory copies, and a
workspaces/attachments skip to avoid snapshot bloat (×20 retained). 8 tests,
all mutation-verified; E2E verified snapshot→wipe→restore preserves all six
store types on the real code path.
Closes #52889. Supersedes #52930.
The external-drain marker .drain_request.json is written under HERMES_HOME,
which on Hermes Cloud is a persistent Fly volume (/opt/data). A begin-drain
marker therefore SURVIVES the post-update machine restart. But the disruptive
lifecycle actions a drain protects (auto-update / image migrate / env edit /
profile change) all restart the machine — which is exactly the signal the drain
is over. The freshly-restarted gateway re-read the orphaned marker on its
startup reconcile and parked itself back in 'draining', refusing every new turn
indefinitely (NS-570: ~52 min until manually cleared).
Fix: stamp the marker with an identity of THIS container/VM instantiation
(kernel boot_id + PID 1 start time, read from /proc) and treat a marker whose
epoch differs from the current instantiation as absent. A deliberate restart →
new PID 1 → new epoch → stale marker ignored → gateway boots 'running'. A marker
written during the current instantiation (the live drain) still matches; an s6
respawn of just the gateway (PID 1/init unchanged) keeps the same epoch, so an
in-flight drain is still honoured (D4a reversibility preserved).
The staleness check is lenient and never fail-closed: a legacy marker with no
epoch, a corrupt/contentless marker, or an environment with no /proc (epoch
unavailable) all degrade to the original presence-only behaviour. NAS is
untouched — it only ever POSTs begin/cancel-drain over HTTP; the marker file is
purely gateway-internal IPC.
The fix is entirely within gateway/drain_control.py; the watcher and the
dashboard endpoint go through the same drain_requested()/write_drain_request()
chokepoints and need no functional change.
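A sketch of the epoch stamp and the lenient staleness check, assuming the marker is JSON with an `epoch` key. The helper names mirror `drain_requested()`/`write_drain_request()` but are hypothetical, as is the `boot_id:starttime` string format. Field 22 of `/proc/1/stat` is the process start time in clock ticks since boot.

```python
import json
from pathlib import Path
from typing import Optional


def read_instance_epoch(proc_root: Path = Path("/proc")) -> Optional[str]:
    """Identify this container/VM instantiation: kernel boot_id + PID 1 start time.

    Returns None when /proc is unavailable, so callers degrade to presence-only.
    """
    try:
        boot_id = (proc_root / "sys" / "kernel" / "random" / "boot_id").read_text().strip()
        stat = (proc_root / "1" / "stat").read_text()
    except OSError:
        return None
    # comm (field 2) may contain spaces, so split after its closing ")".
    # The remainder starts at field 3 (state); starttime is field 22, index 19.
    fields = stat.rsplit(")", 1)[-1].split()
    if not boot_id or len(fields) < 20:
        return None
    return f"{boot_id}:{fields[19]}"


def write_drain_request(marker: Path, epoch: Optional[str]) -> None:
    marker.write_text(json.dumps({"epoch": epoch}))


def drain_requested(marker: Path, current_epoch: Optional[str]) -> bool:
    """Draining if the marker exists and is from this instantiation (or can't be checked)."""
    try:
        raw = marker.read_text()
    except FileNotFoundError:
        return False
    except OSError:
        return True  # present but unreadable: presence-only
    try:
        data = json.loads(raw)
    except ValueError:
        return True  # corrupt or empty marker: presence-only
    marker_epoch = data.get("epoch") if isinstance(data, dict) else None
    if marker_epoch is None or current_epoch is None:
        return True  # legacy marker or no /proc: presence-only
    return marker_epoch == current_epoch
```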
## Description
On macOS 26.x, `launchctl bootstrap` and `launchctl kickstart` return exit code 5 ("Input/output error"), which Hermes already anticipates and handles by spawning a detached fallback process. However, the gateway status reporting is ambiguous:
- `gateway status` says "Gateway service is loaded" (because `launchctl list` returns exit 0)
- But `launchctl print` shows `state = not running` — launchd isn't actually supervising anything
- The detached fallback PID running is invisible to the status command
- Users can't tell whether auto-start at login and auto-restart on crash are available
### Root Cause
Two problems in `hermes_cli/gateway.py`:
1. **`_probe_launchd_service_running()`** (line 1067): Determined launchd service liveness solely by `launchctl list <label>` exit code. On macOS 26, this returns 0 even when the service is only *registered* but not running (output lacks a `"PID"` field). This caused `GatewayRuntimeSnapshot.service_running = True` incorrectly, which suppressed the process/service mismatch warning.
2. **`launchd_status()`** (line 3569): Used the same binary "loaded/not loaded" check without inspecting whether launchd actually has a PID, whether a detached fallback is running, or whether auto-start/restart are available.
### Changes
**`hermes_cli/gateway.py`:**
1. **New `_parse_launchd_pid_from_list_output()` helper** — Extracts the PID from `launchctl list` output. When launchd is actively supervising, the output includes `"PID" = <number>;`. When only registered but not running, no PID field is present.
2. **Fixed `_probe_launchd_service_running()`** — Now requires a PID in the `launchctl list` output to confirm launchd is actually supervising. This correctly sets `service_running = False` when launchd has the service registered but `state = not running`, which triggers the existing process/service mismatch detection.
3. **Reworked `launchd_status()`** — Reports clearly separated information:
- LaunchAgent plist currentness (stale or current)
- Whether launchd is actively supervising (with PID)
- Whether a detached fallback PID is running
- Whether auto-start at login and auto-restart on crash are available
- When launchd supervision is known to be unavailable, explains why
4. **Persistent unsupported marker** (`~/.hermes/.gateway-launchd-unsupported`) — Written when `_launchd_fallback_to_detached()` is called (launchd exit 5/125). Allows `launchd_status()` to explain *why* launchd can't supervise even when no fallback process is currently running. Cleared automatically when a future bootstrap/kickstart succeeds (e.g., after an OS update fixes the issue).
5. **Updated `_print_gateway_process_mismatch()`** — Distinguishes the managed detached fallback from a genuinely manual `nohup hermes gateway run`, providing accurate guidance for each case.
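A sketch of the PID extraction, assuming the `"PID" = <number>;` line format described above. `parse_launchd_pid` is a hypothetical stand-in for `_parse_launchd_pid_from_list_output`, and the regex also accepts an unquoted `PID` key.

```python
import re
from typing import Optional

# Matches a line such as:  "PID" = 12345;  (key quotes optional)
_LAUNCHD_PID_RE = re.compile(r'^\s*"?PID"?\s*=\s*(\d+)\s*;', re.MULTILINE)


def parse_launchd_pid(list_output: Optional[str]) -> Optional[int]:
    """Return launchd's PID for the job, or None if launchd is not running it."""
    match = _LAUNCHD_PID_RE.search(list_output or "")
    return int(match.group(1)) if match else None


def launchd_is_supervising(list_output: Optional[str]) -> bool:
    # Registered-but-idle jobs still make `launchctl list <label>` exit 0,
    # so liveness requires a PID rather than just the exit code.
    return parse_launchd_pid(list_output) is not None
```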
### Status Output Examples
**Before** (macOS 26, fallback active):
```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
✓ Gateway service is loaded
{
"Label" = "ai.hermes.gateway";
"OnDemand" = true;
...
};
```
**After** (macOS 26, fallback active):
```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
⚠ Gateway service is registered but launchd is not supervising it
launchd cannot manage the gateway on this macOS version.
✓ Detached fallback process is running (PID 12345)
Cron jobs will fire. Stop with: hermes gateway stop
⚠ Auto-start at login and auto-restart on crash are NOT available.
```
**After** (normal launchd supervision):
```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
✓ Gateway is supervised by launchd (PID 12345)
Auto-start at login and auto-restart on crash are available.
```
### Tests
Updated 5 existing tests and added 11 new tests in `tests/hermes_cli/test_gateway_service.py`:
- PID parsing from `launchctl list` output (with PID, without PID, empty, unquoted PID)
- `_probe_launchd_service_running()` requires PID presence
- Unsupported-marker lifecycle (write, clear, persist across fallback)
- Marker cleared on successful bootstrap
- `launchd_status()` reporting: supervised, fallback-running, fallback-unavailable
- Existing fallback tests now verify marker creation
### Related Issues
- Issue #23387 (original macOS 26 launchd workaround)
- Issue #42524 (this issue)
A clarify/approval/sudo/secret prompt blocks the turn on the user, but the UI
treated it as an in-flight turn: the "thinking" timer kept ticking and Esc
interrupted the run — discarding a question you might want to come back to. Add
$activeSessionAwaitingInput (the pet's awaitingInput concept, scoped to the
active session) and use it to suppress the stall indicator and disarm Esc while a
prompt waits. Clear the session's prompts (and needsInput) on Stop and on turn
end so a resolved/aborted turn can't leave a dead panel or a stuck "needs input"
dot.
The inline clarify panel used its own card tokens, an animated ring, and
oversized spacing — out of step with every other tool row. Rebuild it on the
shared --ui-*/--conversation-* tokens: a compact panel, letter-key badges
(A/B/C…) that double as a/b/c… shortcuts, an inline content-sizing "Other" field
(CSS field-sizing — no view swap, no layout shift on focus), and a Continue
button so picking an option selects rather than auto-sends. Selection lives on
the letter badge alone (solid primary; outlined while Other is focused-but-empty).
Also settle the panel into the standard tool block once the turn stops running,
so a stopped turn no longer strands a live, unanswerable prompt.
Add a content-agnostic Zoomable primitive (useZoomPan hook + overlay viewer):
click to open full-screen, wheel-zoom toward the cursor, drag to pan, toolbar
zoom/reset, and an optional copy action. Wire Mermaid diagrams into it with
copy-as-PNG; reusable for other inline content later.