mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-02 02:01:47 +00:00
Replaces the per-turn threading.Thread(target=_sync).start() pattern in
HonchoMemoryProvider with a persistent SyncWorker. sync_turn() and
on_memory_write() both enqueue SyncTasks on the shared worker and return
immediately — run_conversation's post-response path is no longer coupled
to Honcho latency.
Three behavioural changes land here:
Layer 1 — fire-and-forget sync
No more join(timeout=5.0) on the prior turn's thread. Back-to-back
sync_turn() calls return in microseconds regardless of backend
latency. The worker runs tasks serially per provider (intentional:
session writes must be ordered) and uses a bounded queue with
oldest-drop backpressure.
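
A minimal sketch of the Layer 1 idea: one persistent thread draining a bounded FIFO, with the oldest queued task evicted when the queue is full. This is not the actual SyncWorker code. SketchTask, SketchWorker, the maxsize default and the "queue-full" reason string are all hypothetical names chosen for illustration.

```python
import threading
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional


@dataclass
class SketchTask:
    # Hypothetical stand-in for SyncTask; the field names are assumptions.
    run: Callable[[], None]
    on_failure: Optional[Callable[[str], None]] = None


class SketchWorker:
    """One daemon thread running tasks in FIFO order from a bounded queue."""

    def __init__(self, maxsize: int = 64) -> None:
        self._maxsize = maxsize
        self._tasks: Deque[SketchTask] = deque()
        self._cond = threading.Condition()
        self._closed = False
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def enqueue(self, task: SketchTask) -> None:
        """Never blocks on the backend: append, evicting the oldest if full."""
        dropped = None
        with self._cond:
            if self._closed:
                raise RuntimeError("worker is shut down")
            if len(self._tasks) >= self._maxsize:
                dropped = self._tasks.popleft()  # oldest-drop backpressure
            self._tasks.append(task)
            self._cond.notify()
        # Report the eviction outside the lock so callbacks cannot deadlock us.
        if dropped is not None and dropped.on_failure is not None:
            dropped.on_failure("queue-full")

    def _loop(self) -> None:
        while True:
            with self._cond:
                while not self._tasks and not self._closed:
                    self._cond.wait()
                if not self._tasks:
                    return  # closed and fully drained
                task = self._tasks.popleft()
            try:
                task.run()  # serial execution keeps writes ordered
            except Exception as exc:
                if task.on_failure is not None:
                    task.on_failure(repr(exc))

    def shutdown(self, timeout: Optional[float] = None) -> None:
        """Stop accepting work, let queued tasks finish, then join the thread."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._thread.join(timeout)
```

Because enqueue() only touches the in-memory deque, its cost is independent of how long the task in flight takes.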
Layer 2 — adaptive timeout
SyncWorker feeds successful call latencies into HonchoLatencyTracker.
After each turn, _drain_backlog_if_healthy() invokes
rebuild_honcho_client_with_timeout() which rebuilds the SDK client
iff the tracker's p95-derived timeout differs >20% from the active
one. Hosted Honcho converges on ~1-3s timeouts; self-hosted cold
starts scale naturally. 30s default still applies during warmup.
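
A rough sketch of the Layer 2 decision, under stated assumptions. Only the p95 basis, the >20% rebuild threshold and the 30s warmup default come from the description above. The window size, minimum sample count, headroom multiplier and 1s floor are guesses, and SketchLatencyTracker and should_rebuild are hypothetical names, not the HonchoLatencyTracker API.

```python
from collections import deque
from typing import Deque, Optional


class SketchLatencyTracker:
    """Rolling window of successful call latencies, in seconds."""

    def __init__(self, window: int = 50, min_samples: int = 5,
                 headroom: float = 3.0, floor: float = 1.0,
                 default: float = 30.0) -> None:
        # All defaults except `default` (30s warmup) are assumptions.
        self._samples: Deque[float] = deque(maxlen=window)
        self._min_samples = min_samples
        self._headroom = headroom
        self._floor = floor
        self._default = default

    def record(self, seconds: float) -> None:
        self._samples.append(seconds)

    def p95(self) -> Optional[float]:
        n = len(self._samples)
        if n == 0 or n < self._min_samples:
            return None  # still warming up
        rank = (95 * n + 99) // 100  # nearest-rank: integer ceil(0.95 * n)
        return sorted(self._samples)[rank - 1]

    def suggested_timeout(self) -> float:
        p = self.p95()
        if p is None:
            return self._default
        return max(self._floor, p * self._headroom)


def should_rebuild(active: float, suggested: float,
                   threshold: float = 0.20) -> bool:
    """True when the suggestion differs from the active timeout by >20%."""
    return abs(suggested - active) / active > threshold
```

The threshold keeps the client from being rebuilt on every small fluctuation in observed latency.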
Layer 3 — circuit breaker + in-memory backlog
CircuitBreaker trips open after 3 consecutive failures; while it is
open, SyncWorker rejects tasks by invoking their on_failure callback.
The provider wraps each task's on_failure with _enqueue_with_backlog()
so breaker-open and queue-full tasks land in a bounded backlog (256
tasks max). On recovery (probe succeeds, state → closed), the next
sync_turn() drains the backlog through the worker. Tasks that
failed inside Honcho itself are NOT backlogged, since replay won't help.
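
A simplified, synchronous sketch of the Layer 3 flow. The real change routes tasks through the worker and on_failure callbacks; here that is collapsed into one call path to show the state transitions. SketchBreaker, SketchProvider, probe_interval and the drop-oldest backlog eviction are assumptions. Only the 3-failure threshold and the 256-task cap come from the description above.

```python
import time
from collections import deque
from typing import Any, Callable, Deque


class SketchBreaker:
    def __init__(self, failure_threshold: int = 3,
                 probe_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0
        self._threshold = failure_threshold
        self._probe_interval = probe_interval  # assumed value
        self._clock = clock

    def allow(self) -> bool:
        # After the probe interval, let one attempt through (half-open).
        if (self.state == "open"
                and self._clock() - self._opened_at >= self._probe_interval):
            self.state = "half_open"
        return self.state != "open"

    def record_success(self) -> None:
        self._failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self._failures += 1
        if self.state == "half_open" or self._failures >= self._threshold:
            self.state = "open"
            self._opened_at = self._clock()


class SketchProvider:
    def __init__(self, breaker: SketchBreaker, send: Callable[[Any], None],
                 backlog_max: int = 256) -> None:
        self.breaker = breaker
        self.send = send  # raises on backend failure
        # Assumption: a full backlog evicts its oldest entry.
        self.backlog: Deque[Any] = deque(maxlen=backlog_max)

    def sync_turn(self, task: Any) -> None:
        if not self.breaker.allow():
            self.backlog.append(task)  # breaker open: park for later
            return
        # Replay parked work first so writes stay in order.
        pending = [*self.backlog, task]
        self.backlog.clear()
        for i, item in enumerate(pending):
            try:
                self.send(item)
            except Exception:
                # The failed item itself is not re-queued.
                self.breaker.record_failure()
                if not self.breaker.allow():
                    self.backlog.extend(pending[i + 1:])  # keep untried work
                    return
                continue
            self.breaker.record_success()
```

Injecting the clock keeps the probe timing testable without sleeping.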
Updates one existing test (test_session.py) that poked at the now-removed
_sync_thread attribute; it now calls the worker's shutdown() instead.
5 new integration tests verify the provider-level wiring:
- sync_turn returns in < 100ms even when flush blocks 2s
- 5 back-to-back sync_turns in < 200ms total (old code: up to 25s)
- breaker-open enqueue lands in backlog, not on the worker
- recovery drains backlog + new task on next sync_turn
- backlog respects _BACKLOG_MAX and stops growing during long outages
No change to run_conversation or any agent-facing API.
| Name |
|---|
| context_engine |
| disk-cleanup |
| example-dashboard/dashboard |
| image_gen |
| memory |
| spotify |
| strike-freedom-cockpit |
| __init__.py |