hermes-agent/plugins/memory
Erosika f512fdf697 feat(honcho): wire fire-and-forget worker + adaptive timeout + breaker into provider
Replaces the per-turn threading.Thread(target=_sync).start() pattern in
HonchoMemoryProvider with a persistent SyncWorker.  sync_turn() and
on_memory_write() both enqueue SyncTasks on the shared worker and return
immediately — run_conversation's post-response path is no longer coupled
to Honcho latency.

Three behavioural changes land here:

  Layer 1 — fire-and-forget sync
    No more join(timeout=5.0) on the prior turn's thread.  Back-to-back
    sync_turn() calls return in microseconds regardless of backend
    latency.  The worker runs tasks serially per provider (intentional:
    session writes must be ordered) and uses a bounded queue with
    oldest-drop backpressure.
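
A minimal sketch of the Layer 1 idea, assuming a single daemon thread fed by a bounded deque.  SketchSyncWorker, its maxsize default, and the drain-on-shutdown behaviour are hypothetical illustrations, not the actual SyncWorker API:

```python
import threading
from collections import deque


class SketchSyncWorker:
    """Hypothetical stand-in: one thread, bounded queue, oldest-drop."""

    def __init__(self, maxsize=64):
        self._tasks = deque()
        self._maxsize = maxsize
        self._cond = threading.Condition()
        self._closed = False
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, fn):
        """Return immediately; evict the oldest pending task when full."""
        with self._cond:
            if self._closed:
                return False
            if len(self._tasks) >= self._maxsize:
                self._tasks.popleft()  # oldest-drop backpressure
                self.dropped += 1
            self._tasks.append(fn)
            self._cond.notify()
        return True

    def _run(self):
        # Single consumer, so tasks execute strictly in enqueue order.
        while True:
            with self._cond:
                while not self._tasks and not self._closed:
                    self._cond.wait()
                if not self._tasks:
                    return  # closed and fully drained
                fn = self._tasks.popleft()
            try:
                fn()
            except Exception:
                pass  # a failing task must not kill the worker thread

    def shutdown(self, timeout=None):
        """Assumption: pending tasks are drained before the thread exits."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._thread.join(timeout)
```

The caller only ever touches enqueue(), so its cost is a lock acquisition and a deque append no matter how slow the task itself turns out to be.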

  Layer 2 — adaptive timeout
    SyncWorker feeds successful call latencies into HonchoLatencyTracker.
    After each turn, _drain_backlog_if_healthy() invokes
    rebuild_honcho_client_with_timeout() which rebuilds the SDK client
    iff the tracker's p95-derived timeout differs by more than 20% from
    the active one.  Hosted Honcho converges on ~1-3s timeouts; self-hosted
    cold starts scale naturally.  The 30s default still applies during
    warmup.
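
The Layer 2 decision can be sketched roughly as follows.  Only the p95 basis, the 20% threshold, and the 30s warmup default come from the description above; SketchLatencyTracker, should_rebuild, the window size, warmup count, headroom multiplier, floor, and ceiling are assumptions made for illustration:

```python
import math
from collections import deque


class SketchLatencyTracker:
    """Hypothetical tracker: sliding window of successful call latencies."""

    def __init__(self, window=20, warmup=5, default=30.0,
                 headroom=3.0, floor=1.0, ceiling=60.0):
        self._samples = deque(maxlen=window)
        self._warmup = warmup
        self._default = default
        self._headroom = headroom
        self._floor = floor
        self._ceiling = ceiling

    def observe(self, seconds):
        self._samples.append(float(seconds))

    def p95(self):
        # Nearest-rank percentile over the current window.
        ordered = sorted(self._samples)
        rank = math.ceil(0.95 * len(ordered))
        return ordered[rank - 1]

    def timeout(self):
        # Fall back to the default until enough samples have arrived.
        if not self._samples or len(self._samples) < self._warmup:
            return self._default
        scaled = self.p95() * self._headroom
        return min(self._ceiling, max(self._floor, scaled))


def should_rebuild(active, proposed, threshold=0.20):
    """True when the proposed timeout differs from the active one by >20%."""
    return abs(proposed - active) > threshold * active
```

The threshold acts as hysteresis: small p95 fluctuations leave the client alone, so it is only rebuilt when the timeout would change meaningfully.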

  Layer 3 — circuit breaker + in-memory backlog
    CircuitBreaker trips open after 3 consecutive failures; SyncWorker
    refuses breaker-open tasks via their on_failure callback.  Provider
    wraps each task's on_failure with _enqueue_with_backlog() so
    breaker-open and queue-full tasks land in a bounded backlog (256
    tasks max).  On recovery (probe succeeds, state → closed), the next
    sync_turn() drains the backlog through the worker.  Tasks that
    crashed inside Honcho itself are NOT backlogged — replay won't help.
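
A rough, single-threaded sketch of the Layer 3 flow.  The 3-failure threshold and the rule that failed tasks are not backlogged follow the description above; SketchBreaker, run_or_backlog, drain, the cooldown value, and the use of deque(maxlen=...) (which evicts the oldest entry when full) are assumptions, and the real provider routes tasks through the worker rather than running them inline:

```python
import time
from collections import deque


class SketchBreaker:
    """Hypothetical breaker: closed -> open -> half_open -> closed."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.state = "closed"
        self._threshold = threshold
        self._cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0

    def allow(self):
        # After the cooldown, let one probe through (half_open).
        if self.state == "open" and self._clock() - self._opened_at >= self._cooldown:
            self.state = "half_open"
        return self.state != "open"

    def record_success(self):
        self._failures = 0
        self.state = "closed"

    def record_failure(self):
        self._failures += 1
        if self.state == "half_open" or self._failures >= self._threshold:
            self.state = "open"
            self._opened_at = self._clock()


def run_or_backlog(breaker, backlog, task):
    """Run the task if the breaker allows it, otherwise park it."""
    if not breaker.allow():
        backlog.append(task)  # bounded: deque(maxlen=N) caps growth
        return "backlogged"
    try:
        task()
    except Exception:
        breaker.record_failure()
        return "failed"  # failed tasks are not replayed
    breaker.record_success()
    return "ok"


def drain(breaker, backlog):
    """Replay parked tasks in order while the breaker stays usable."""
    replayed = 0
    while backlog and breaker.allow():
        run_or_backlog(breaker, backlog, backlog.popleft())
        replayed += 1
    return replayed
```

Injecting the clock keeps the open-to-half_open transition testable without sleeping.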

Updates one existing test (test_session.py) that poked at the now-removed
_sync_thread attribute; it now uses the worker's shutdown() instead.

5 new integration tests verify the provider-level wiring:
  - sync_turn returns in < 100ms even when flush blocks 2s
  - 5 back-to-back sync_turns in < 200ms total (old code: up to 25s)
  - breaker-open enqueue lands in backlog, not on the worker
  - recovery drains backlog + new task on next sync_turn
  - backlog respects _BACKLOG_MAX and stops growing during long outages

No change to run_conversation or any agent-facing API.
2026-04-24 18:55:40 -04:00
byterover refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
hindsight feat(hindsight): optional bank_id_template for per-agent / per-user banks 2026-04-24 03:38:17 -07:00
holographic refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
honcho feat(honcho): wire fire-and-forget worker + adaptive timeout + breaker into provider 2026-04-24 18:55:40 -04:00
mem0 refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
openviking refactor(memory): drop on_session_reset — commit-only is enough 2026-04-15 11:28:45 -07:00
retaindb refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
supermemory feat(supermemory): add multi-container, search_mode, identity template, and env var override (#5933) 2026-04-07 14:03:46 -07:00
__init__.py fix(memory): discover user-installed memory providers from $HERMES_HOME/plugins/ (#10529) 2026-04-15 14:25:40 -07:00