The first ship of verify-on-stop (config v30) defaulted
DEFAULT_CONFIG agent.verify_on_stop to a literal True, and migrate_config
persists defaults with strip_defaults=False — so every install that updated
through v30 had verify_on_stop: true written into config.yaml as a literal.
The v30->v31 migration only flipped missing/'auto' values to false and
deliberately preserved an explicit bool, so it skipped that entire population
and left verify-on-stop ON for everyone who had updated. A literal true was
never a user choice: the feature had no off-switch worth setting it against
until v31 introduced one, so a true persisted before v32 is always the old
machine default.
v32 migration flips a literal true -> false once, for both v30 (skipped v31)
and v31 (preserved-by-bug) installs. A true the user sets AFTER v32 is a
deliberate opt-in and is never touched.
The original cap held a process-global slot across the WHOLE vision
analysis (image load + encode + LLM call) with a default of min(CPUs, 4).
That serialized legitimate multi-image workflows — "compare these 6
screenshots", "read this 10-page scan", "analyze every frame" — behind a
4-wide gate, and on the native fast path it even throttled calls that make
no LLM request at all. Excess calls queued (blocking acquire, nothing
dropped), but the latency hit on real fan-out was the wrong tradeoff.
The incident was CPU exhaustion, not call count: concurrent base64/resize
bursts saturated every core and left none to service the shared event loop
serving /api/status. So cap ONLY that:
- A dedicated, bounded ThreadPoolExecutor (_vision_cpu_executor) runs the
encode/resize/dimension-check off the caller's loop, sized to the host's
usable core count with NO fixed ceiling — the cap tracks the actual
exhausted resource (cores), not a magic number. Excess encodes queue on
the executor; cores stay free for the loop.
- The LLM call is deliberately OUTSIDE the executor, so multi-image
workflows keep full request concurrency.
- Override via auxiliary.vision.max_concurrency / HERMES_VISION_MAX_CONCURRENCY
(honored verbatim, including above core count); sub-1 ignored.
- _vision_concurrency_slot() is now a no-op shim for back-compat.
Tests assert: resolver defaults to host cores with no ceiling; env/config
override (incl. above cores); sub-1 rejection; the executor is dedicated and
core-sized; encode runs on a vision-encode thread; and crucially that encode
bursts are bounded to the cap while the analyses themselves stay fully
concurrent (calls_peak > cap).
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.
Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).
It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.
Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency,
or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can
never be disabled into an unbounded fan-out.
Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
The auth-header fix adds headers=_auth_headers() to all Camofox HTTP
calls. Two _capture_post mocks in the persistence test lacked a headers
parameter, so navigate raised TypeError and the success assertions
failed. Add headers=None to both mock signatures.
The auth-header fix reads CAMOFOX_API_KEY but it was never registered,
so it didn't surface in `hermes setup` / `hermes tools`. Add it as an
advanced password-category tool env var alongside CAMOFOX_URL.
When a Camofox browser tab is garbage collected (idle timeout, browser
recycle), the held tab_id becomes stale. The next browser_navigate call
hits /tabs/{stale_id}/navigate -> HTTP 404 -> unhandled HTTPError.
Catch the 404 in camofox_navigate, clear the stale tab_id, and create a
fresh tab via _ensure_tab. The agent recovers transparently without
requiring a session restart.
Other tab operations (snapshot, click, type, etc.) use the same pattern
but only fail if the tab dies between successful calls — much rarer.
The navigate fix covers 95%+ of cases since navigate is always the entry
point.
The Camofox browser backend hardcoded a 30s HTTP timeout via
_DEFAULT_TIMEOUT, ignoring the user's browser.command_timeout config.
The main browser_tool path already reads this config via
_get_command_timeout().
This commit adds an equivalent _get_command_timeout() to
browser_camofox.py that reads browser.command_timeout from config
with caching, and switches all HTTP helper methods (_post, _get,
_get_raw, _delete) to use it as the default timeout.
Fixes#40843
The five HTTP call sites in browser_camofox.py (_ensure_tab, _post,
_get, _get_raw, _delete) did not include Authorization headers, causing
403 Forbidden when the Camofox server has API key auth enabled.
Added _auth_headers() helper and wired it into all five call sites.
The health check endpoint (/health) is left without auth since it is
a connectivity probe, not a browser operation.
Regression test covers: header present when key set, absent when unset,
blank key produces empty headers.
Fixes#20476
The Camoufox REST API server expects `listItemId` in the `POST /tabs`
body, but `_ensure_tab` was sending `sessionKey`. This caused a 400
Bad Request on every `browser_navigate` call.
The parameter name mismatch is visible in the same file: line 283
already reads `tab.get("listItemId")` when adopting existing tabs,
confirming the server-side field name.
Fixes#37960
Follow-up to the group-DM manifest fix. The manifest change only helps
NEW installs; existing apps keep their old (mpim-less) scopes until the
admin reinstalls. Since a missing message.mpim event delivers nothing
(no runtime API error to catch), detect stale installs at connect time
from the auth.test x-oauth-scopes header and log an actionable reinstall
nudge when im:history is granted but mpim:history is not. Also promote
message.mpim from Recommended to Required in the docs event tables so the
default setup path can't drop it.
Group DMs (multi-person DMs, channel_type=mpim) were never delivered to
the Slack bot. The adapter already classifies mpim as a DM and replies
ambiently (adapter.py:2526, is_dm = channel_type in {im, mpim}), but the
generated app manifest only subscribed to message.im / im:history — the
1:1 DM pair. Without the message.mpim event subscription Slack drops
group-DM messages before the adapter ever sees them, so 1:1 DMs worked
while group-DM ambient mode was dead.
Add message.mpim to bot_events and mpim:history (the scope that event
requires per Slack docs) + mpim:read (mirrors im:read for the
conversations.info classification call) to bot_scopes. Update the
SLACK_BOT_TOKEN / SLACK_APP_TOKEN setup-help strings and the Slack docs
(EN + zh-Hans: scope table, event table, troubleshooting) so existing
installs are told to add the new scopes and reinstall.
Reported by an enterprise customer. Note: this is a manifest/scope
change, so it only takes effect after the app is reinstalled and the
new scopes are accepted.
Tests: assert message.mpim + mpim:history + mpim:read are in the
manifest (with and without assistant mode); both fail on current main
and pass with this change.
test_gateway_pid_scan_hides_wmic_and_powershell_windows flaked once in CI
(slice 7/8) with 'KeyError: creationflags' while passing 15/15 under exact
CI-parity locally. The positional 'kwargs["creationflags"]' indexing raises
a bare KeyError the moment any stray subprocess.run call is captured, masking
the real contract. Filter captured calls to the two intended Windows console
spawns (wmic + PowerShell fallback) and assert each is windowless via
.get('creationflags'); a leaked/extra call now surfaces as a readable
len-mismatch with the full captured list, not a cryptic KeyError.
The supermemory and mem0 memory providers shipped third-party SDKs
(supermemory / mem0ai) that are not core dependencies, but — unlike the
honcho and hindsight providers — they imported those SDKs directly with
no tools.lazy_deps.ensure() preflight and had no LAZY_DEPS allowlist
entry. On the published Docker image the agent venv is sealed
(HERMES_DISABLE_LAZY_INSTALLS=1) and lazy installs are redirected to a
writable durable target (HERMES_LAZY_INSTALL_TARGET). honcho/hindsight
route through ensure() and install fine there; supermemory/mem0 never
called it, so their SDK was never installed on a hosted instance and the
provider silently reported itself unavailable even with the API key set.
Fixes:
- Add memory.supermemory + memory.mem0 to the LAZY_DEPS allowlist
(tools/lazy_deps.py), pinned to current PyPI releases.
- Call ensure('memory.<x>', prompt=False) at each SDK-import chokepoint
(_SupermemoryClient.__init__; Mem0MemoryProvider._create_backend),
mirroring honcho's wrapped try/except shape.
- Drop the SDK-import gate from supermemory's is_available() — it was a
chicken-and-egg trap (provider never loaded on a sealed venv, so
ensure() never ran). Now key-presence only, like honcho/mem0.
- Add matching pyproject extras [supermemory]/[mem0]; update the
lazy-covered-extras contract test (excluded from [all] by policy).
Tests prove each path fails without the fix and the real sealed-venv
durable-target gate accepts both features.
The bootstrap-runner PowerShell spawn is formatted multiline (spawn(\n ps,\n fullArgs,...), so the literal substring 'spawn(ps, fullArgs' never matched and the assertion was failing on main independent of #54635. Convert it to a whitespace-tolerant regex like every other call-site assertion in this file.
The recurring Windows desktop console-flash bug (#54220) is governed by the
*parent's* console, not by each child spawn. The desktop backend was launched as
GUI-subsystem pythonw.exe, which has no console at all — so every
console-subsystem child it spawns (git, gh, cmd, wmic, powershell, ...) had to
allocate its own console, flashing a window. That is why the fix had become an
endless per-call-site sweep of CREATE_NO_WINDOW flags: each leaf spawn was
papering over a missing console on the root.
Launch the backend as the venv's console python.exe instead. Under the existing
hiddenWindowsChildOptions() wrapper (windowsHide: true -> CREATE_NO_WINDOW) the
backend owns a single *windowless* console, and every descendant spawn inherits
it instead of allocating a visible one. This makes "no flashing windows" a
property of the one backend launch rather than a flag that must be remembered at
every spawn site — including spawns inside third-party libraries that no
call-site sweep can reach.
Verified on Windows 11 25H2 (Windows Terminal default): with the per-site hide
flag forcibly neutered, the canonical culprits (git/gh/cmd/wmic/powershell)
spawned naively and none flashed, while the same naive spawn from the old
console-less pythonw parent did flash — isolating the parent console as the cause.
Two premises behind the old pythonw approach did not hold up on current Windows
and are dropped here:
- The venv Scripts\python.exe uv shim, under CREATE_NO_WINDOW, re-execs base
python *windowless* — it does not flash a conhost (the #52239 concern), so the
base-pythonw detour is unnecessary.
- Console python restores stdout, so the backend announces its port on the normal
HERMES_DASHBOARD_READY stdout line; the pythonw-only ready-file side channel is
no longer needed and the readyFile opt-in is removed.
Removes the now-dead pythonw machinery (getNoConsoleVenvPython, toNoConsolePython,
applyWindowsNoConsoleSpawnHints, readVenvHome) and updates the test to assert the
new invariant: backend command is never pythonw, both backend spawns still go
through hiddenWindowsChildOptions, and no backend opts into the ready-file path.
Scope: this fixes the high-frequency backend-descendant flash classes. The
updater/UAC handoff (#54543) and embedded-terminal PTY accumulation (#53555)
classes have separate root causes and are unaffected.
The env translation block is type-checked across every locale (tsc -b), so
the 8 new customKeys strings must exist in all of them, not just en/zh. Add
translated entries to the remaining 14 locales (de, es, fr, it, ja, ko, pt,
ru, tr, uk, hu, ga, af, zh-hant).
The Keys page only rendered env vars present in a catalog (OPTIONAL_ENV_VARS
or the provider catalog); any other key a user set in .env was invisible, and
there was no way to add an arbitrary env var from the GUI (e.g. to inject a
var a skill or MCP server needs).
Backend: GET /api/env now also emits a row for every on-disk .env key that
isn't in any catalog, flagged category="custom" + custom=true and
password-masked (an unrecognised key could hold anything, so it's redacted and
reveal-gated like any secret). Channel-managed credentials stay excluded. The
write (PUT /api/env) and reveal (POST /api/env/reveal) paths already handle
arbitrary keys, with the existing env-name guard + denylist (PATH, LD_PRELOAD,
PYTHONPATH, …) enforced server-side — no new write surface.
Frontend: a new "Custom Keys" section lists those custom rows and carries an
add-a-key form (client-side name validation mirroring the backend regex; the
new row reuses the normal edit/save flow, so on save it round-trips back from
the backend as a durable custom row). i18n added for en + zh + types.
Tests: behavior-contract coverage that an unknown .env key surfaces as a
masked custom row and a catalogued key does not — verified to fail on the
pre-fix backend.
When local Ollama models are absent from models.dev, probe the Ollama
server's /api/show capabilities so attached images are routed natively
instead of being stripped as non-vision input.
Google's native Gemini REST endpoint (generativelanguage.googleapis.com,
non-/openai) rejects OpenAI-only stream_options={"include_usage": true},
crashing every streaming chat-completions call with TypeError. Omit it for
that endpoint while keeping it for the Gemini OpenAI-compat shim and all
OpenAI-compatible aggregators (OpenRouter, etc.) so usage accounting is
preserved.
Reuses is_native_gemini_base_url() so the compat shim (.../openai), which
accepts stream_options, is correctly excluded from the omission.
Fixes#14387
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
The WeCom callback endpoint (internet-facing, 0.0.0.0) parsed untrusted
request bodies before signature verification. defusedxml already guards
the entity-expansion class on main, but there was no cap on raw body
size, so an unauthenticated POST could still force unbounded read work
pre-auth.
Set client_max_size=64KB on the aiohttp app (413 at the framework layer)
plus an explicit length guard in _handle_callback as defense in depth.
WeCom callbacks are small encrypted XML envelopes — media is delivered
out-of-band via MediaId, never inline — so 64KB is ample for legitimate
traffic. Adds tests for oversized (413) and normal-sized (not 413) bodies.
Salvaged from #10192 by @memosr (body-size limit half; defusedxml half
already superseded on main).
Relocate marco0158's eviction into the dedicated auto-reset cleanup block
(single source of truth for dropping session-scoped transient state) and
add an AST invariant pinning _evict_cached_agent into that block. Add
AUTHOR_MAP entry for marco0158.
When a session is auto-reset by daily schedule, idle timeout, or suspended
state, the agent cache was not being cleared. This caused the old agent's
context_compressor._previous_summary to leak into the new session, mixing
old conversation history into new compaction summaries.
This was the root cause of the "skin making history" appearing after
compaction in fresh sessions reported by the user.
Follow-up to #9893 which only handled compression_exhausted case.
Changes:
- Add _evict_cached_agent(session_key) call after was_auto_reset check
- Covers daily, idle, and suspended auto-reset scenarios
- Matches the behavior of manual /reset command
Related tests: test_session_boundary_hooks, test_async_memory_flush,
test_session_reset_notify, test_session_reset_fix - all passing.
/resume is a conversation boundary, but unlike /new it did not clear the
chat-keyed _session_model_overrides / _pending_model_notes. A /model switch
made in the previous session under the same chat session_key leaked into the
resumed conversation, running it on the wrong model.
Clear both maps for the session_key after the switch (mirroring /new), scoped
to that key so other chats' overrides are untouched. The cached-agent eviction
this leak also implied already landed via #6672.
Closes#10702.
The cron delivery table only showed Discord/Telegram with explicit
target syntax and described Slack and every other platform as
home-channel-only. In fact the generic platform:<target> routing in
_resolve_single_delivery_target resolves explicit targets for every
platform: Slack (#channel / channel ID / channel:thread_ts), Matrix
(room/user IDs), Feishu (chat:thread), WhatsApp (JID / E.164), Signal
(group / E.164), SMS, Email, and Weixin all have dedicated explicit-
target branches in _parse_target_ref; the remaining platforms accept a
generic platform:<chat_id> passthrough.
Update the Delivery Model table (en + zh-Hans) to show the real
per-platform syntax, document #channel name resolution via the channel
directory, and note the Slack thread_ts nuance. Docs-only.
Clears the ty diff bot's warnings on the new test: pass real callables to
build_dashboard_parser (not object()) and replace the pytest.mark.parametrize
with a plain loop so the file is stdlib-only.
`hermes cron status` (and the create/list 'gateway not running' nag)
judge whether cron will fire purely from the in-process ticker's
heartbeat file + a live gateway PID. That heuristic is correct for the
built-in ticker but WRONG for an external provider like Chronos:
Chronos arms exactly one external one-shot per job and is fired by a
NAS-mediated webhook (POST /api/cron/fire). Its `start()` returns
immediately and it deliberately runs no 60s loop and writes no ticker
heartbeat — that's the whole point of scale-to-zero (the machine is at
zero between fires). So on a perfectly healthy Chronos instance,
`cron status` always printed '✗ Gateway is not running — cron jobs will
NOT fire' (or a STALLED-ticker warning), and `cron create` always
appended the 'jobs won't fire automatically' nag — both false.
Verified live on a staging Chronos instance: jobs fired and completed on
schedule via the relay while `cron status` insisted the gateway wasn't
running and the heartbeat was 370s+ stale.
Fix: resolve the active provider (offline — `resolve_cron_scheduler`,
whose `is_available()` contract forbids network) and, for any non-builtin
provider, report the managed-scheduler state instead of the ticker
heuristics, and suppress the ticker-only 'gateway not running' warning.
The built-in path is byte-unchanged. Active-job summary is factored into
a shared helper so both paths print it identically.
New tests prove both directions (chronos: no false negative even with no
gateway PID / no heartbeat; builtin: historical warning preserved) and
fail without the fix.
_on_invite now rejects auto-joins from users not on the allow-list. The
DM-recording tests invite @alice and expect a join, so the shared
_make_adapter fixture now puts @alice on _allowed_user_ids.
Two platform-security hardenings:
- Matrix: _on_invite now checks the inviter against the existing
allow-list (_allowed_user_ids / GATEWAY_ALLOW_ALL_USERS) before
auto-joining. Without this any federated Matrix user could invite
the bot into arbitrary rooms, exposing its presence and metadata.
The message and reaction paths already enforce this allow-list; the
invite path bypassed it.
- Mattermost: _api_get / _api_post / _api_put reject any path
containing '..'. WebSocket-event values (channel_id, post_id,
file_id) are interpolated directly into API paths, so a malicious or
compromised server could craft traversal payloads to make the bot
issue authenticated requests to arbitrary endpoints with its bearer
token.
The configurable-E2EE-passphrase change from the original PR is dropped:
the matrix adapter was rewritten onto mautrix and the passphrase-protected
key-export file no longer exists.
PR infographics are rendered locally and embedded in PR descriptions via
the image-provider (fal.media) URL — they were never meant to live in the
repo. The intended .gitignore enforcement (documented as added back in May
2026) was never actually committed, so 35 PNGs (~54MB) accumulated under
infographic/ via 'docs: add PR infographic for X' commits.
- Remove all 35 tracked infographic/*.png files.
- Add infographic/ to .gitignore so git add on the path is now a no-op.
The PR body remains the archive for these images.
Regression tests for the injection fix: outside a git repo only cwd is
checked (planted ancestor .hermes.md is ignored), a cwd-local .hermes.md
is still found, and inside a git repo the parent walk to the git root
still works.
_find_hermes_md walks parent directories looking for .hermes.md/HERMES.md,
stopping at the git root. But when there is no git repo (_find_git_root
returns None), the stop guard never fires and the loop walks all the way
to /. On shared systems (CI runners, multi-tenant servers), a .hermes.md
planted at /tmp, /home, or / would be loaded into the system prompt of any
agent session not inside a git repo — a cross-user prompt-injection vector.
Fix: when there is no git root, only check cwd; do not walk parents.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
User terminal tabs and their recent scrollback now survive an app restart
(VS Code parity). Tabs, active selection, cwd, and a serialized scrollback
snapshot are written to localStorage on every change; on launch the tabs
reopen with their history replayed above a fresh shell. Processes are NOT
revived — a new shell starts one line below the restored block.
- Capture: SerializeAddon snapshots the buffer on a 750ms leading-edge
throttle, so a `cmd; quit` lands on disk before teardown; the snapshot is
trimmed of its trailing idle prompt (no "double prompt" on restore) and
capped (200 scrollback lines / 48k chars) to stay under the storage budget.
- Teardown guard: app quit/reload kills the PTYs from the main process,
firing onExit in the renderer, but React skips effect cleanups on teardown
so the per-instance `disposed` flag never flips. A pagehide/beforeunload
flag stops onExit from calling closeTerminal() and wiping the persisted
tabs right before relaunch restores them. A real `exit`/Ctrl-D still closes.
- Agent mirror tabs stay runtime-only — only user tabs persist.
Add a focused contract test for the headless `serve` command (routes to the
shared dashboard handler, headless by default while `dashboard` is not, accepts
the legacy --no-open, shares the same runtime/lifecycle flag surface). Also
refresh the dashboard.py module docstring to cover both commands.
`hermes serve` is newer than the desktop binary's release cadence, so a new
app launched against an un-upgraded managed install / PATH `hermes` would
crash on an unknown subcommand and brick the user mid-upgrade. Detect whether
the resolved runtime registers `serve` (fast source read of its dashboard.py,
with a one-time CLI probe fallback) and rewrite the backend argv to the legacy
`dashboard --no-open` only when it does not. Happy path (current runtimes)
pays nothing and still spawns `serve`.
- electron/backend-command.cjs: pure serve/dashboard argv helpers + serve-
source detection (unit-tested in backend-command.test.cjs)
- main.cjs: backendSupportsServe() cache + getBackendArgsForRuntime() guard at
both backend spawn sites; expose `root` from the Windows venv unwrap so the
fast source check covers Windows too
- docs: note the backward-compat fallback in README, desktop.md, AGENTS.md
The desktop app spawned `hermes dashboard --no-open` as its backend, which
made the dashboard look like a desktop prerequisite. Add a dedicated headless
`hermes serve` command that boots the same gateway (shared cmd_dashboard /
start_server) but never opens a browser, and point the desktop backend spawn
exclusively at it. dashboard and serve are now independent surfaces — neither
launches the other.
- subcommands/dashboard.py: factor shared server args; add `serve` parser
(always headless; accepts legacy --no-open as a no-op)
- main.py: register serve in _BUILTIN_SUBCOMMANDS + coalesce set + gui-log
detection; extend stale-backend reaper patterns to match `serve`
- desktop electron: spawn `serve`, rename dashboardArgs -> backendArgs,
update comments + windows-child-process test assertions
- docs: desktop README, desktop.md (incl. remote-backend), AGENTS.md, and
cli-commands.md now describe `hermes serve` as the desktop/headless backend
The desktop app spawns a headless `hermes dashboard --no-open` backend and
talks to it through the shared @hermes/shared WebSocket client — it never
runs or requires the browser dashboard UI. Spell this out in the desktop
README, the desktop docs page, and AGENTS.md so "dashboard" stops reading
as a desktop prerequisite.