hermes-agent/website/docs/reference
Ben Barclay eddfecd2ce fix(vision): cap vision_analyze fan-out concurrency process-wide
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.

Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).

It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.
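
A minimal sketch of the pattern described above, not the actual hermes-agent code. The names `_VISION_SLOTS` and `handle_vision_analyze`, the fixed cap of 2, and the `analyze` callable are all hypothetical stand-ins; the real handler is `_handle_vision_analyze` and its cap is resolved from config or env.

```python
import asyncio
import threading

# Hypothetical module-level cap shared by every thread and event loop in the process.
_VISION_SLOTS = threading.BoundedSemaphore(2)


async def handle_vision_analyze(analyze):
    """Await `analyze()` (an async callable) while holding one process-wide slot."""
    loop = asyncio.get_running_loop()
    # The blocking acquire runs on an executor thread, so the calling event
    # loop keeps servicing other tasks while this call waits for a slot.
    await loop.run_in_executor(None, _VISION_SLOTS.acquire)
    try:
        # The slot is held for the whole analysis (load + encode + LLM call).
        return await analyze()
    finally:
        # Released on success and on failure, so a failing call cannot leak a slot.
        _VISION_SLOTS.release()
```

Because the semaphore lives in `threading`, it coordinates callers no matter which per-thread loop they run on. This sketch does not handle cancellation while a caller is still waiting for a slot; a production version would need to account for that.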

Default: min(host CPUs, 4), floored at 1, so the cap never exceeds the
host's core count and never exceeds 4 unless overridden. Override via
auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored, so the cap can
never be disabled into an unbounded fan-out.
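
The resolution rule above can be sketched as follows. This is an illustration under stated assumptions, not the shipped resolver: the function name `resolve_vision_max_concurrency` and its parameters are hypothetical, the env-over-config precedence is assumed, and so is the choice to leave explicit overrides unclamped by the ceiling of 4.

```python
import os

_DEFAULT_CEILING = 4  # the default never exceeds this, per the commit message


def resolve_vision_max_concurrency(config_value=None, env=None, cpu_count=None):
    """Return the vision concurrency cap; the result is always >= 1."""
    env = os.environ if env is None else env
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    default = max(1, min(cpus, _DEFAULT_CEILING))

    # ASSUMPTION: the env var wins over config.yaml; the source does not say.
    for raw in (env.get("HERMES_VISION_MAX_CONCURRENCY"), config_value):
        try:
            value = int(raw)
        except (TypeError, ValueError):
            continue  # unset or unparsable: try the next source
        if value >= 1:
            return value
        # Values < 1 are ignored so the cap can never be disabled.
    return default
```

For example, a 16-core host with no overrides resolves to 4, and setting `HERMES_VISION_MAX_CONCURRENCY=0` falls back to the default instead of removing the limit.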

Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
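
The bounded-fan-out guard and its control could look roughly like the sketch below. It is a synchronous stand-in that does not use registry.dispatch or _run_async; `measure_peak` and the sleep-based fake analysis are hypothetical, chosen only to show how peak concurrency can be measured with and without a cap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def measure_peak(calls, cap=None):
    """Run `calls` fake analyses at once and return the peak number in flight."""
    gate = threading.BoundedSemaphore(cap) if cap else None
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}
    # The barrier releases all workers together so the uncapped control overlaps.
    start = threading.Barrier(calls)

    def fake_analysis():
        start.wait(timeout=5)
        if gate:
            gate.acquire()
        try:
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.05)  # stands in for encode + LLM call
            with lock:
                state["active"] -= 1
        finally:
            if gate:
                gate.release()

    with ThreadPoolExecutor(max_workers=calls) as pool:
        for future in [pool.submit(fake_analysis) for _ in range(calls)]:
            future.result()  # re-raise any worker exception
    return state["peak"]
```

A guard would assert that `measure_peak(16, cap=3)` stays at or below 3, while the control asserts that `measure_peak(16)` exceeds 3, which shows the first assertion is not passing by accident.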
2026-06-29 01:27:10 -07:00
_category_.json feat: add documentation website (Docusaurus) 2026-03-05 05:24:55 -08:00
automation-blueprints-catalog.mdx docs: finish Automation Blueprints terminology rebrand (#44470) 2026-06-11 17:22:22 -04:00
cli-commands.md feat(cli): add headless hermes serve backend; desktop no longer launches dashboard 2026-06-28 22:04:22 -05:00
environment-variables.md fix(vision): cap vision_analyze fan-out concurrency process-wide 2026-06-29 01:27:10 -07:00
faq.md feat(docs): clarify platform support 2026-06-26 11:37:56 -07:00
mcp-config-reference.md refactor: remove agent-callable send_message tool (#47856) 2026-06-17 07:11:23 -07:00
model-catalog.md docs: deep audit — registry drift, stale claims, 2-week PR coverage, dashboard screenshot (#40952) 2026-06-07 01:39:06 -07:00
optional-skills-catalog.md fix(docs): regenerate skill docs to fix stale cross-links, add tool-search to sidebar 2026-06-20 20:42:49 -07:00
profile-commands.md fix(profile): make clone-from a full source selector 2026-06-13 07:33:58 -07:00
skills-catalog.md Merge remote-tracking branch 'origin/main' into bb/pets 2026-06-22 05:25:49 -05:00
slash-commands.md docs: reconcile docs with code across last 3 releases (#54254) 2026-06-28 12:47:50 -07:00
tools-reference.md docs: reconcile docs with code across last 3 releases (#54254) 2026-06-28 12:47:50 -07:00
toolsets-reference.md docs: reconcile docs with code across last 3 releases (#54254) 2026-06-28 12:47:50 -07:00