hermes-agent/tests/honcho_plugin
Erosika f512fdf697 feat(honcho): wire fire-and-forget worker + adaptive timeout + breaker into provider
Replaces the per-turn threading.Thread(target=_sync).start() pattern in
HonchoMemoryProvider with a persistent SyncWorker.  sync_turn() and
on_memory_write() both enqueue SyncTasks on the shared worker and return
immediately — run_conversation's post-response path is no longer coupled
to Honcho latency.

Three behavioural changes land here:

  Layer 1 — fire-and-forget sync
    No more join(timeout=5.0) on the prior turn's thread.  Back-to-back
    sync_turn() calls return in microseconds regardless of backend
    latency.  The worker runs tasks serially per provider (intentional:
    session writes must be ordered) and uses a bounded queue with
    oldest-drop backpressure.
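A minimal sketch of what such a worker could look like. SyncWorker, SyncTask, enqueue semantics, and shutdown() are named in this commit; the internals below (field names, queue size, the 0.1s poll interval) are illustrative assumptions, not the actual implementation:

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SyncTask:
    # Hypothetical shape: the work to run, plus a failure callback.
    run: Callable[[], None]
    on_failure: Callable[[Exception], None] = field(default=lambda exc: None)

class SyncWorker:
    """Single background thread; enqueue() never blocks the caller."""

    def __init__(self, maxsize: int = 64):
        self._queue: queue.Queue = queue.Queue(maxsize=maxsize)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def enqueue(self, task: SyncTask) -> None:
        # Oldest-drop backpressure: when full, evict the oldest task
        # instead of blocking the caller's hot path.
        while True:
            try:
                self._queue.put_nowait(task)
                return
            except queue.Full:
                try:
                    self._queue.get_nowait()  # drop oldest
                except queue.Empty:
                    pass

    def _loop(self) -> None:
        while not self._stop.is_set():
            try:
                task = self._queue.get(timeout=0.1)
            except queue.Empty:
                continue
            try:
                task.run()  # serial execution keeps session writes ordered
            except Exception as exc:
                task.on_failure(exc)

    def shutdown(self) -> None:
        self._stop.set()
        self._thread.join(timeout=5.0)
```

The caller's cost is one put_nowait(); backend latency is entirely absorbed by the worker thread.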

  Layer 2 — adaptive timeout
    SyncWorker feeds successful call latencies into HonchoLatencyTracker.
    After each turn, _drain_backlog_if_healthy() invokes
    rebuild_honcho_client_with_timeout() which rebuilds the SDK client
    iff the tracker's p95-derived timeout differs >20% from the active
    one.  Hosted Honcho converges on ~1-3s timeouts; self-hosted cold
    starts scale naturally.  30s default still applies during warmup.
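The p95 logic could be sketched as below. The 30s warmup default, the >20% rebuild threshold, and the HonchoLatencyTracker name come from this commit; the window size, headroom multiplier, warmup sample count, and method names are assumptions for illustration:

```python
import statistics
from collections import deque

_DEFAULT_TIMEOUT = 30.0  # warmup default, per the commit message
_MIN_SAMPLES = 20        # assumed warmup threshold

class HonchoLatencyTracker:
    """Rolling window of successful call latencies; derives a timeout from p95."""

    def __init__(self, window: int = 100, headroom: float = 2.0):
        self._samples = deque(maxlen=window)
        self._headroom = headroom

    def record(self, seconds: float) -> None:
        self._samples.append(seconds)

    def suggested_timeout(self) -> float:
        if len(self._samples) < _MIN_SAMPLES:
            return _DEFAULT_TIMEOUT  # 30s still applies during warmup
        # quantiles(n=20) yields 19 cut points; the last one is p95.
        p95 = statistics.quantiles(self._samples, n=20)[-1]
        return p95 * self._headroom

def should_rebuild(active_timeout: float, suggested: float) -> bool:
    # Rebuild the SDK client only when the p95-derived timeout drifts
    # more than 20% from the active one.
    return abs(suggested - active_timeout) / active_timeout > 0.20
```

With ~0.5s backend latencies this converges on a low single-digit-second timeout, matching the hosted-Honcho behaviour described above.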

  Layer 3 — circuit breaker + in-memory backlog
    CircuitBreaker trips open after 3 consecutive failures; SyncWorker
    refuses breaker-open tasks via their on_failure callback.  Provider
    wraps each task's on_failure with _enqueue_with_backlog() so
    breaker-open and queue-full tasks land in a bounded backlog (256
    tasks max).  On recovery (probe succeeds, state → closed), the next
    sync_turn() drains the backlog through the worker.  Tasks that
    crashed inside Honcho itself are NOT backlogged — replay won't help.
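A sketch of the breaker-plus-backlog pair. The 3-failure trip threshold, the 256-task _BACKLOG_MAX, and the CircuitBreaker name are stated in this commit; the cooldown value, state representation, and helper names are hypothetical:

```python
import time
from collections import deque

_BACKLOG_MAX = 256  # bounded backlog cap, per the commit message

class CircuitBreaker:
    """Trips open after N consecutive failures; probes again after a cooldown."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self._threshold = threshold
        self._cooldown = cooldown
        self._failures = 0
        self._opened_at = None

    def record_success(self) -> None:
        # Probe succeeded: state -> closed, failure count reset.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold and self._opened_at is None:
            self._opened_at = time.monotonic()

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # After the cooldown, let one probe through (half-open).
        return time.monotonic() - self._opened_at < self._cooldown

class Backlog:
    """Bounded holding pen for breaker-open / queue-full tasks."""

    def __init__(self) -> None:
        # maxlen evicts the oldest entry once the cap is hit, so the
        # backlog stops growing during long outages.
        self._tasks = deque(maxlen=_BACKLOG_MAX)

    def add(self, task) -> None:
        self._tasks.append(task)

    def drain_into(self, worker) -> None:
        while self._tasks:
            worker.enqueue(self._tasks.popleft())
```

Note the asymmetry the commit calls out: only transport-level failures are worth backlogging; a task that crashed inside Honcho itself would fail again on replay.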

Updates one existing test (test_session.py) that poked at the now-
removed _sync_thread attribute; it now exercises the worker's
shutdown() instead.

5 new integration tests verify the provider-level wiring:
  - sync_turn returns in < 100ms even when flush blocks 2s
  - 5 back-to-back sync_turns in < 200ms total (old code: up to 25s)
  - breaker-open enqueue lands in backlog, not on the worker
  - recovery drains backlog + new task on next sync_turn
  - backlog respects _BACKLOG_MAX and stops growing during long outages
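The first timing assertion can be illustrated with a self-contained stand-in (the real test wires the provider; everything here — the queue, the simulated slow flush, the timing bound — is a scaled-down sketch, not the actual test code):

```python
import queue
import threading
import time

def test_enqueue_returns_fast_while_backend_blocks():
    # Stand-in for the worker: a queue fed by the caller and a
    # background thread whose "flush" blocks (0.5s here, 2s in the
    # real test) per task.
    q = queue.Queue()
    done = threading.Event()

    def worker():
        q.get()           # receive the task
        time.sleep(0.5)   # simulate a slow Honcho backend
        done.set()

    threading.Thread(target=worker, daemon=True).start()

    start = time.perf_counter()
    q.put(lambda: None)   # enqueue; the caller never joins the worker
    elapsed = time.perf_counter() - start

    assert elapsed < 0.1            # caller unaffected by backend latency
    assert done.wait(timeout=5.0)   # ...yet the sync still completes
```

The old pattern failed this bound because each sync_turn() joined the previous thread with a 5s timeout.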

No change to run_conversation or any agent-facing API.
2026-04-24 18:55:40 -04:00
__init__.py feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623) 2026-04-02 15:33:51 -07:00
test_async_memory.py fix(honcho): dialectic lifecycle — defaults, retry, prewarm consumption 2026-04-18 22:50:55 -07:00
test_cli.py style(honcho): hoist hashlib import; validate baseUrl scheme before 'local' sentinel 2026-04-24 18:48:10 -04:00
test_client.py fix(plugins/memory/honcho): default Honcho SDK HTTP timeout to 30s 2026-04-24 18:48:10 -04:00
test_pin_peer_name.py fix(honcho): pinPeerName opt-in keeps memory unified across platforms (#14984) 2026-04-24 18:48:10 -04:00
test_provider_sync_integration.py feat(honcho): wire fire-and-forget worker + adaptive timeout + breaker into provider 2026-04-24 18:55:40 -04:00
test_session.py feat(honcho): wire fire-and-forget worker + adaptive timeout + breaker into provider 2026-04-24 18:55:40 -04:00
test_sync_worker.py feat(honcho): SyncWorker + HonchoLatencyTracker + CircuitBreaker primitives 2026-04-24 18:50:32 -04:00