hermes-agent/website/docs
Ben Barclay eddfecd2ce fix(vision): cap vision_analyze fan-out concurrency process-wide
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.

Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).

It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.

Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency,
or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can
never be disabled into an unbounded fan-out.

Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
2026-06-29 01:27:10 -07:00
..
developer-guide docs(cron): document explicit per-channel delivery targets for all platforms (#54630) 2026-06-29 15:23:16 +10:00
getting-started feat(docs): clarify termux/nix as t2 platforoms 2026-06-26 11:37:56 -07:00
guides docs: reconcile docs with code across last 3 releases (#54254) 2026-06-28 12:47:50 -07:00
integrations feat(providers): remove google-gemini-cli + google-antigravity OAuth providers (#50492) 2026-06-21 19:53:27 -07:00
reference fix(vision): cap vision_analyze fan-out concurrency process-wide 2026-06-29 01:27:10 -07:00
user-guide fix(vision): cap vision_analyze fan-out concurrency process-wide 2026-06-29 01:27:10 -07:00
index.mdx feat(docs): clarify platform support 2026-06-26 11:37:56 -07:00
user-stories.mdx docs(website): add User Stories and Use Cases collage page (#18282) 2026-04-30 23:56:59 -07:00