* docs(discord): document bot-to-bot comms as unsupported (#32791)
Multi-profile bot-to-bot conversation is not a supported topology.
DISCORD_ALLOW_BOTS=none (the default) blocks all bot-originated
messages; setting mentions/all across multiple Hermes profiles to make
them reply to each other ack-loops because Discord's reply auto-mention
satisfies the mention gate every turn. Document the safe default and
the loop hazard so operators don't wire it up.
* docs(discord): infographic for bot-to-bot unsupported stance (#32791)
When a provider's output-layer safety filter (MiniMax "output new_sensitive
(1027)", Azure content_filter, etc.) kills a streaming response after deltas
were already sent, interruptible_streaming_api_call swallows the raw error
into a finish_reason=length partial-stream stub. The conversation loop then
burned 3 continuation retries against the SAME primary — re-hitting the
content-deterministic filter every time — and gave up with "Response remained
truncated after 3 continuation attempts", never consulting fallback_providers.
Builds on @595650661's classifier change (cherry-picked) so error_classifier
recognizes the filter; then:
- chat_completion_helpers: run the swallowed error through error_classifier at
the stub-creation point and stamp _content_filter_terminated on the stub
(single source of truth — no parallel pattern list).
- conversation_loop: read the tag and activate the fallback chain BEFORE
burning any continuation retries; roll partial content back to the last
clean turn and re-issue against the new provider (restart_with_rebuilt_messages).
Plain network stalls are unaffected (only content_policy_blocked is tagged).
Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the
same issue via the stub-tag and loop-escalation approaches respectively.
Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback
fires on the first stub and the fallback provider completes the turn.
The MiniMax output-layer safety filter surfaces the error verbatim as
`output new_sensitive (1027)` (sometimes with additional provider
wrapping like 'Stream stalled mid tool-call: output new_sensitive (1027)').
When the model emits a large tool-call argument block, the upstream
filter trips and the SSE stream is truncated mid-flight, producing
'stream stalled mid tool-call' errors. Until now this case was
misclassified and retried 3x on the same provider, reproducing the same
refusal and burning paid attempts.
Adding `new_sensitive` to `_CONTENT_POLICY_BLOCKED_PATTERNS` routes
it through the existing is_client_error path: skip 3x retry, activate
configured fallback model immediately, surface a clear provider-safety
message to the user.
Refs #32421
A restart now interrupts in-flight agents immediately rather than holding
the gateway open for a grace window. The previous 180s default coupled two
independently-set timers: the gateway's own drain timer and systemd's
TimeoutStopSec. On a stale unit where TimeoutStopSec < drain, systemd
SIGKILLed the gateway mid-cleanup, leaving a stale lock that made the next
startup exit immediately ('already running') — an infinite crash loop under
Restart=on-failure (#31981).
Setting drain to 0 makes the mismatch structurally impossible: with drain 0
the generated unit gets TimeoutStopSec=90 against a near-instant drain, so
systemd never kills mid-cleanup. Contract: restart the gateway, in-flight
work stops. A grace window large enough to 'save' a long agent turn would
have to outlast an unbounded task, which is impossible.
Also fixes the stale-unit warning's suggested command
(hermes gateway service install --replace -> hermes gateway install --force);
the former subcommand does not exist.
Closes#31981
The .integration.test.mjs greps bridge.js source text for the queue
wiring — a change-detector that breaks on any benign refactor of the
same code. The behavioral unit test (bridge.sendqueue.test.mjs) already
covers FIFO ordering, error isolation, timeout propagation, and
single-consumer concurrency, which is the contract that matters.
Concurrent sock.sendMessage() calls on a single Baileys socket can cause
the WhatsApp protocol-level routing to misdeliver messages — responses
intended for one chat appear in another.
Add a promise-based send queue that serialises all sendMessage() calls
across concurrent HTTP /send, /edit, and /send-media handlers so only
one send is in-flight at a time.
Includes unit tests for queue ordering, error isolation, timeout
propagation, and single-consumer concurrency semantics, plus an
integration check that the queue is wired into sendWithTimeout.
Pull the #33271 post-interrupt recovery (flush_stdin + _force_full_redraw)
out of process_loop's finally block into _recover_terminal_after_interrupt(),
and replace the inline-logic-copy tests with ones that exercise the real
helper plus a source guard that process_loop still invokes it behind the
_last_turn_interrupted gate.
When the agent is interrupted during processing, prompt_toolkit's
renderer and VT100 input parser can be left in an inconsistent state.
CSI 6n cursor position report responses leak as literal text
(^[[19;1R) and the terminal stops accepting keyboard input.
Fix: in process_loop's finally block, after an interrupted turn:
- flush_stdin() to drain stray escape bytes from the OS input buffer
- _force_full_redraw() to reset prompt_toolkit's renderer cache
Closes#33271
Make check-and-replace atomic in _ensure_primary_openai_client by
keeping both operations under the same lock acquisition. Previously,
the lock was released between detecting a closed client and replacing
it, allowing two threads to simultaneously replace the client.
Fixes#32846
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 600s default evicted the gateway clarify entry while users were
still away (meeting/AFK); a later button tap then landed on a dead
entry and the agent hung on 'running: clarify'. Raise the default to
1h in DEFAULT_CONFIG and the get_clarify_timeout() code-level fallback,
documenting the running-agent-guard tradeoff. User overrides still win.
Generated images under a profile gateway's cache (profiles/<name>/cache/
images/...) were silently dropped from Telegram/Discord delivery when
HERMES_HOME is symlinked under a denied prefix (e.g. /opt/data ->
/root/.hermes) and $HOME is not that prefix. The resolved path lands
under /root (a system denylist prefix), the root-home exception only
fires when the denied prefix IS $HOME, and the static safe-roots list
only covers the active HERMES_HOME's top-level cache — not per-profile
cache dirs. Both gates fail, so validate_media_delivery_path returns
None and the gateway logs 'Skipping unsafe MEDIA directive path'.
_media_delivery_allowed_roots() now also enumerates per-profile cache
roots (<root>/profiles/*/cache/{images,audio,videos,documents,
screenshots}) at check time. Allowlist match runs before the denylist,
so the profile artifact delivers regardless of the /root interaction;
profile-dir credentials (auth.json) stay blocked since they aren't
under a cache subdir.
Reopened regression of #34485/#38108, neither of which covered the
profile-scoped symlink case. Fixes#31733.
On Darwin, synchronous=FULL (the WAL default) only issues a plain
fsync(), which Apple documents does NOT guarantee writes reach stable
storage or stay ordered. SQLite's WAL corruption-safety guarantee
assumes the OS honors the fsync barrier; macOS does not unless the app
uses F_FULLFSYNC. During a launchd *system* shutdown the page cache is
dropped (effectively power-loss for in-flight pages), so a WAL
checkpoint whose fsync 'reported' durable may never hit the platter —
corrupting state.db with a malformed image. That is the trigger in
#30636 ('SIGTERM during launchd shutdown under high load').
Apply PRAGMA checkpoint_fullfsync=1 (macOS-guarded) in
apply_wal_with_fallback. It forces the F_FULLFSYNC barrier only at
checkpoint boundaries (where WAL frames land in the main DB), so cost
amortizes to ~+0.1ms/commit vs ~+4ms for the broader fullfsync=1.
No-op off Darwin (F_FULLFSYNC is macOS-only).
Root-cause analysis by @catapreta on #30636. Supersedes #30654, whose
synchronous=FULL is a no-op (already FULL in WAL mode) and whose
TRUNCATE-on-close is already on main.
Co-authored-by: catapreta <catapreta@users.noreply.github.com>
The advisory reference view stripped all tool calls and tool results, so
reference models judged a task whose actions and results they never saw — and
references only fired once per user turn, never re-running as the agent's
state advanced through the tool loop.
Two fixes:
- _reference_messages() now PRESERVES the agent's tool calls and tool results,
rendering them inline as text ([called tool: ...] / [tool result: ...]) so a
reference gives an informed judgement on the real current state. Still emits
zero tool-role messages and zero tool_calls arrays (strict providers reject
those), and large tool results are previewed head+tail (4000-char budget).
The required end-on-user shape is met by APPENDING a synthetic advisory user
turn — not by deleting the agent's latest context (which the prior fix did).
- References now re-run on every state change — each new user message AND each
new tool result — instead of once per user turn. The state-sensitive advisory
signature drives the cache: new tool result = miss (re-run), identical-state
re-call = hit (no re-run, no re-emit).
The acting aggregator still receives the full, untrimmed transcript.
Two follow-ups on top of the salvaged #46365 fix:
1. Tests: the salvaged tests injected the ephemeral MatrixAdapter via
sys.modules["gateway.platforms.matrix"], but Matrix migrated to a plugin
(#41112) and the fallback now imports from plugins.platforms.matrix.adapter.
Point the three sys.modules patches at the current module path so the
ephemeral-fallback tests actually exercise the injected fake adapter.
2. Harden the live-adapter lookup: split the gateway import guard from the
adapter lookup and log (instead of silently swallowing) when a runner
exists but adapters.get() raises. A silent fall-through there would
re-introduce the per-send reconnect/OTK-exhaustion storm this fix exists
to prevent (#46310). Documented that the live adapter is gateway-owned and
must not be disconnected, and why the ephemeral finally never touches it.
When a live gateway adapter is available (i.e. the tool runs inside a
running gateway), reuse the persistent connection instead of creating a
new MatrixAdapter per call. This eliminates per-message E2EE re-init
storms that exhaust recipient OTKs and silently drop messages.
The fix follows the same pattern as _send_to_platform (line 618):
gateway_runner_ref → runner.adapters[Platform.MATRIX]. Falls back to
the ephemeral connect/disconnect cycle for standalone contexts.
Also extracts the shared send logic into _send_via_matrix_adapter()
to avoid duplicating the media dispatch code between the two paths.
Fixes#46310
The docker-exec privilege-drop shim tests started a sleep container and
released the fixture as soon as `docker exec <c> true` returned 0. On
s6-overlay that succeeds almost immediately — ~0.05s in measurement —
long before the `01-hermes-setup` cont-init hook (docker/stage2-hook.sh)
has finished seeding + `chown hermes:hermes` config.yaml and running the
Python config migration (cont-init only fully settles at ~9.8s under
arm64 QEMU emulation).
`test_shim_opt_out_keeps_root` wipes config.yaml, writes it as root with
HERMES_DOCKER_EXEC_AS_ROOT=1, and asserts root:root ownership. When the
fixture released the test inside that ~10s window, stage2-hook's
boot-time `chown hermes:hermes config.yaml` raced the root-written file
and reset it to hermes:hermes — failing the assertion. The window is
invisible on native amd64 (stage2-hook completes in a blink) but wide
open under the arm64 build's QEMU emulation, which is why only build-arm64
flaked while build-amd64 stayed green.
Replace the responsiveness poll with a wait on the canonical
'cont-init finished' signal: $HERMES_HOME/logs/container-boot.log gaining
a `profile=default` line, written by 02-reconcile-profiles which s6 runs
strictly after 01-hermes-setup. Mirrors the readiness pattern already
used in test_container_restart.py. Also bumps the readiness timeout 20s->60s
to cover slow emulation.
No production code change — test-only hardening of a timing race.
tests/cron/test_scheduler_provider.py spawned a background ticker thread,
slept a fixed 0.2s, then asserted the loop had called tick()/heartbeat() at
least N times. Under loaded CI the worker thread isn't always scheduled
within that window, so the loop hadn't ticked yet — flaking with 'provider
never called tick()' (assert 0 >= 1).
Add a _wait_until(predicate, timeout) helper and replace all five fixed
time.sleep(0.2) sites with a poll on the actual predicate (calls/beats count
reached). Same contract assertions, no wall-clock dependence.
* fix(moa): reference advisory view must end with a user turn
MoA reference calls failed with Anthropic models that don't support
assistant prefill (e.g. Claude Opus 4.8): '400 ... must end with a user
message'. The advisory view built by _reference_messages() kept the last
assistant turn's text while dropping the following tool result, leaving a
trailing assistant turn — which Anthropic (and OpenRouter->Anthropic)
interpret as an assistant prefill to continue. References are advisory and
must end on the user turn they answer.
Strip trailing assistant turns from the advisory view (preserving
intervening ones). Update the existing test that encoded the buggy shape
and add a mid-tool-loop regression test.
* feat(moa): give reference models an advisory-role system prompt
Reference models received the bare trimmed conversation with no role
framing, so they assumed they were the acting agent and refused ("I can't
access repositories/URLs from here") or tried to call tools they don't have.
Prepend a dedicated advisory system prompt to every reference call: the
model is an analyst, not the actor — it cannot execute, should not
apologize for lacking tools, and should reason about the presented state to
advise the aggregator/orchestrator on approach, next steps, tool-use
strategy, risks, and anything the acting agent missed. Its output is private
guidance for the aggregator, not a user-facing answer.
The parallel runner only forwarded pytest args after a literal '--', so a
bare 'scripts/run_tests.sh tests/foo.py -q' (or -v/-x/-k/--tb=long) errored
out with 'unrecognized arguments'. This contradicted the docstring's
promise that common pytest flags pass through, and forced a retry on every
run that used pytest muscle-memory.
Now any token starting with '-' that isn't one of the runner's own options
(-j/--jobs, --paths, --slice, --file-timeout, --generate-slices, --files,
--include-integration) is routed to each per-file pytest invocation
automatically. Value-taking flags given space-separated (-k expr, -m mark,
-p plugin, -o name=val, etc.) keep their value instead of having it stolen
by positional-path discovery. The explicit '--' separator still works and
stacks with bare flags.
- scripts/run_tests_parallel.py: argv splitter routes bare unknown flags to
pytest; value-flag lookahead; updated docstring.
- scripts/run_tests.sh: usage comment reflects bare-flag passthrough.
- tests/test_run_tests_parallel.py: 4 behavior-contract tests (bare -q runs,
-k keeps its value/filters, '--' still works, positional path stays a root).
`npm run build` ended with `bundle-electron-main.mjs`, which esbuild-bundled
electron/main.cjs and renamed the bundle on top of the tracked source file.
Because every `hermes desktop` runs `npm run build`, each launch rewrote a
checked-in source file (~7.5k-line source -> ~14.8k-line bundle), dirtying the
working tree with a build artifact that `git restore` couldn't keep (the next
launch re-clobbered it) and forcing autostash/restore conflicts on update.
The bundle only existed to inline `simple-git` so the packaged app.asar (which
ships no node_modules) wouldn't crash at launch with "Cannot find module
'simple-git'". Replace it with the mechanism the repo already uses for the
other hoisted runtime dep (node-pty): stage the dependency closure and resolve
it from process.resourcesPath at runtime.
- stage-native-deps.cjs: resolve simple-git's runtime closure (walking
dependencies + optionalDependencies, so a version bump that adds a transitive
dep can't silently reintroduce the crash) and stage it under
build/native-deps/vendor/node_modules/. The `vendor/` nesting is load-bearing:
electron-builder drops a node_modules dir at the ROOT of an extraResources
copy but keeps a nested one.
- git-review-ops.cjs: fall back to the staged
native-deps/vendor/node_modules/simple-git when the hoisted require() fails;
dev runs resolve the hoisted copy and never hit the fallback.
- package.json: drop the bundler from the `build` script so main.cjs is never a
build target again.
- nix/desktop.nix: drop the direct bundler call (the closure rides the existing
`cp -rn native-deps` into $out) and patch process.resourcesPath in
git-review-ops.cjs alongside main.cjs.
- delete scripts/bundle-electron-main.mjs.
Verified: electron-builder's own file filter keeps the full staged closure
(0 dropped), and a packaged win-unpacked build launches with the git-review
pane resolving simple-git from the staged vendor path.
Adds a desktop: section to config.yaml so headless/VM users can make
`hermes desktop` launch correctly without a wrapper command:
- desktop.electron_flags: extra Electron CLI flags (e.g. --ozone-platform=x11)
appended to every launch. Accepts a list or a shell-split string.
- desktop.disable_gpu: auto|true|false, bridged to the HERMES_DESKTOP_DISABLE_GPU
env var the Electron app already reads. An explicit env var still wins.
cmd_gui() reads these via _desktop_launch_options() and applies them. This is
the config.yaml form of the capability proposed as a raw env var in #38934
(@1RB) — behavioral settings belong in config.yaml, not a new HERMES_* env var.
Co-authored-by: ray <86501179+1RB@users.noreply.github.com>
* docs: third-party-product plugins ship standalone, not into core tree
Generalizes the closed-set memory-provider policy to any plugin that
integrates someone else's product/project (observability backends,
vendor SaaS, analytics dashboards, paid-service tie-ins). These create
an open-ended maintenance burden on us for backends we don't own, so
they ship as standalone plugin repos installed into ~/.hermes/plugins/
and are promoted in #plugins-skills-and-skins — not merged into core.
- AGENTS.md: new 'what we don't want' bullet + generalized policy note
beside the memory-provider closed-set rule
- CONTRIBUTING.md: new 'Third-Party Product Integrations' section
- build-a-hermes-plugin.md: caution callout at the top of the guide
It's a coupling decision, not a quality bar — a plugin can clear review
and still be a close.
* docs: add infographic for standalone-plugin policy
The plain-Linux overlay re-enable (#53185) left nativeOverlayWidth() at 0
for plain Linux, so the native min/max/close buttons painted on top of the
app's right-edge titlebar tools. Reserve the fallback width everywhere the
WCO overlay is painted (Windows, WSLg, plain Linux); macOS still reserves 0
since it uses traffic lights.
Commit da5484b61 disabled the Window Controls Overlay on all Linux
(non-Windows, non-WSL) with the note that WCO is a Windows/macOS-only
Electron feature. However, several Linux compositors (KDE/KWin,
GNOME/Mutter) do support it — plain Electron titleBarOverlay paints
native min/max/close buttons that were working before that change.
Narrow the exclusion to only WSLg, where the RDP host draws its own
window controls and an Electron overlay would leave a dead gap.
Fixes: da5484b61 ("fix(desktop): WSL2 clipboard image paste + Linux titlebar overlay")
Salvage of NousResearch/hermes-agent#41498 (0-CYBERDYNE-SYSTEMS-0).
- Leave response_previewed false on partial_stream_recovery so gateway
fallback delivery can send the recovered fragment plus explanation.
- Always append the turn-completion explainer for partial_stream_recovery,
not only for empty or very short fragments (#34452 gap).
- Launch the detached /restart helper before drain, idempotently, with a
bounded wait of restart_drain_timeout + 5s.
* fix(cli): correct stale `hermes auth login nous` hints to `hermes auth add nous`
There is no `hermes auth login` subcommand — valid auth verbs are
add/list/remove/reset/status/logout/spotify. Six user-facing strings told
users to run `hermes auth login nous`, which fails with
`invalid choice: 'login'` — the same broken-hint class reported in #28089
for the proxy flow (already fixed there to `hermes auth add nous`).
Sites corrected to `hermes auth add nous`:
- hermes_cli/dashboard_register.py (401 retry hint, not-logged-in hint)
- hermes_cli/gateway_enroll.py (401 retry hint, not-logged-in hint)
- cli-config.yaml.example (two provider-requirement comments)
* docs(infographic): auth login nous hint fix
Non-root users picking 'System service' in the setup wizard were handed a
'sudo hermes gateway install --system --run-as-user <you>' recipe that fails
on most distros: sudo's secure_path strips ~/.local/bin (pipx/uv installs),
so 'sudo hermes' is command-not-found. Worse, it funnels a non-root user
toward a system install they shouldn't be doing from a user session.
Now prompt_linux_gateway_install_scope() only offers system scope when
os.geteuid()==0. Non-root sessions get user-service or skip, with a tip to
re-run as root for a boot service. The non-root branch in
install_linux_gateway_from_setup becomes a defensive guard that refuses
without printing any self-elevation recipe. Gated the matching deferral hint
in setup.py behind root too.
Follow-up to the salvaged #49129 commit. The original change flipped the
shared generic-provider merge in provider_model_ids() to live-first
unconditionally, which regressed curated-first for single providers
(kimi/zai, #46309) — and the PR encoded that regression by flipping the
kimi-coding and zai test assertions to expect live-first.
Gate live-first on an explicit _LIVE_FIRST_PICKER_PROVIDERS set
({opencode-zen, opencode-go}); every other provider keeps curated-first.
Also widen the uncapped picker + live-first sets to opencode-go, which has
the same 70+ model catalog problem as opencode-zen. Restore the
kimi-coding curated-first test and rewrite the merge-order test to assert
the per-provider contract.
Auxiliary clients now inject a keepalive httpx transport with explicit
HTTPS_PROXY/NO_PROXY resolution, matching the main agent. This avoids
macOS system proxy settings (which omit the ExceptionsList) breaking
vision and other auxiliary calls to internal provider endpoints.