fix(honcho): dialectic lifecycle — defaults, retry, prewarm consumption

Several correctness and cost-safety fixes to the Honcho dialectic path
after a multi-turn investigation surfaced a chain of silent failures:

- dialecticCadence default flipped 3 → 1. PR #10619 changed this from 1 to
  3 for cost, but existing installs with no explicit config silently went
  from per-turn dialectic to every-3-turns on upgrade. Restores pre-#10619
  behavior; 3+ remains available for cost-conscious setups. Docs + wizard
  + status output updated to match.

- Session-start prewarm now consumed. Previously fired a .chat() on init
  whose result landed in HonchoSessionManager._dialectic_cache and was
  never read — pop_dialectic_result had zero call sites. Turn 1 paid for
  a duplicate synchronous dialectic. Prewarm now writes directly to the
  plugin's _prefetch_result via _prefetch_lock so turn 1 consumes it with
  no extra call.

- Prewarm is now dialecticDepth-aware. A single-pass prewarm can return
  weak output on cold peers; the multi-pass audit/reconcile cycle is
  exactly the case dialecticDepth was built for. Prewarm now runs the
  full configured depth in the background.

- Silent dialectic failure no longer burns the cadence window.
  _last_dialectic_turn now advances only when the result is non-empty.
  Empty result → next eligible turn retries immediately instead of
  waiting the full cadence gap.
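  Distilled to its essentials (names simplified from the diff below), the
  advance-on-success rule looks like:

  ```python
  # Minimal sketch of cadence gating that advances only on a non-empty
  # dialectic result, so a silent failure retries at the next eligible turn
  # instead of burning the full cadence window. Names are simplified.
  class CadenceGate:
      def __init__(self, cadence: int) -> None:
          self.cadence = cadence
          self.last_success_turn = -999  # sentinel: never fired

      def should_fire(self, turn: int) -> bool:
          return (turn - self.last_success_turn) >= self.cadence

      def record(self, turn: int, result: str) -> None:
          # Advance only when the result carries content.
          if result and result.strip():
              self.last_success_turn = turn

  gate = CadenceGate(cadence=3)
  assert gate.should_fire(1)
  gate.record(1, "")                 # silent failure: window not consumed
  assert gate.should_fire(2)         # retries immediately
  gate.record(2, "prefers concise replies")
  assert not gate.should_fire(3)     # cooldown now applies
  assert gate.should_fire(5)
  ```

  The same shape also covers the uniform-gate fix: at `cadence=1`, a success
  recorded on turn N makes `should_fire(N)` false, so a same-turn second fire
  is blocked.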

- Thread pile-up guard. queue_prefetch skips when a prior dialectic
  thread is still in-flight, preventing stacked races on _prefetch_result.

- First-turn sync timeout is recoverable. Previously on timeout the
  background thread's result was stored in a dead local list. Now the
  thread writes into _prefetch_result under lock so the next turn
  picks it up.
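  The recoverable-timeout pattern, reduced to a standalone sketch (class and
  method names here are simplified stand-ins for the plugin's
  `_prefetch_result`/`_prefetch_lock` fields):

  ```python
  # Sketch: the worker writes its result into shared state under a lock, so a
  # join() timeout no longer strands the result in a dead local list — the
  # thread keeps running and the next turn's pop() picks it up.
  import threading
  import time

  class Prefetch:
      def __init__(self) -> None:
          self._lock = threading.Lock()
          self._result = ""

      def fire(self, work, timeout: float) -> None:
          def _run() -> None:
              r = work()
              if r and r.strip():
                  with self._lock:
                      self._result = r
          t = threading.Thread(target=_run, daemon=True)
          t.start()
          t.join(timeout=timeout)
          # On timeout, t keeps running; its result lands in _result later.

      def pop(self) -> str:
          with self._lock:
              r, self._result = self._result, ""
              return r

  p = Prefetch()
  p.fire(lambda: (time.sleep(0.2), "late result")[1], timeout=0.05)  # times out
  assert p.pop() == ""                # nothing yet on the "first turn"
  time.sleep(0.3)
  assert p.pop() == "late result"     # next turn picks it up
  ```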

- Cadence gate applies uniformly. At cadence=1 the old "cadence > 1"
  guard let first-turn sync + same-turn queue_prefetch both fire.
  Gate now always applies.

- Restored query-length reasoning-level scaling, dropped in 9a0ab34c.
  Scales dialecticReasoningLevel up on longer queries (+1 at ≥120 chars,
  +2 at ≥400), clamped at reasoningLevelCap. Two new config keys:
  `reasoningHeuristic` (bool, default true) and `reasoningLevelCap`
  (string, default "high"; previously parsed but never enforced).
  Respects dialecticDepthLevels and proportional lighter-early passes.
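  The heuristic in isolation (mirrors `_apply_reasoning_heuristic` in the diff
  below, with thresholds inlined):

  ```python
  # Query-length reasoning bump: +1 level at >=120 chars, +2 at >=400,
  # clamped at the configured cap (default "high"; "max" stays reserved
  # for explicit tool-path selection).
  LEVELS = ("minimal", "low", "medium", "high", "max")

  def bump_level(base: str, query: str, cap: str = "high") -> str:
      if base not in LEVELS or not query:
          return base
      bump = 2 if len(query) >= 400 else 1 if len(query) >= 120 else 0
      return LEVELS[min(LEVELS.index(base) + bump, LEVELS.index(cap))]

  assert bump_level("low", "short question") == "low"
  assert bump_level("low", "x" * 200) == "medium"
  assert bump_level("low", "x" * 400) == "high"
  assert bump_level("medium", "x" * 500) == "high"   # clamped at cap
  ```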

- Restored short-prompt skip, dropped in ef7f3156. One-word
  acknowledgements ("ok", "y", "thanks") and slash commands bypass
  both injection and dialectic fire.
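  The classifier can be exercised standalone; the regex below is the one
  restored in the diff, wrapped in a simplified free function:

  ```python
  # Trivial-prompt classifier: empty input, slash commands, and one-word
  # acknowledgements bypass both injection and dialectic fire.
  import re

  TRIVIAL_RE = re.compile(
      r'^(yes|no|ok|okay|sure|thanks|thank you|y|n|yep|nope|yeah|nah|'
      r'continue|go ahead|do it|proceed|got it|cool|nice|great|done|next|lgtm|k)$',
      re.IGNORECASE,
  )

  def is_trivial(text: str) -> bool:
      s = (text or "").strip()
      return not s or s.startswith("/") or bool(TRIVIAL_RE.match(s))

  assert is_trivial("ok")
  assert is_trivial("  LGTM ")
  assert is_trivial("/compact")
  assert not is_trivial("refactor the session manager to use one lock")
  ```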

- Purged dead code in session.py: prefetch_dialectic, _dialectic_cache,
  set_dialectic_result, pop_dialectic_result — all unused after prewarm
  refactor.

Tests: 542 passed across honcho_plugin/, agent/test_memory_provider.py,
and run_agent/test_run_agent.py. New coverage:
- TestTrivialPromptHeuristic (classifier + prefetch/queue skip)
- TestDialecticCadenceAdvancesOnSuccess (empty-result retry, pile-up guard)
- TestSessionStartDialecticPrewarm (prewarm consumed, sync fallback)
- TestReasoningHeuristic (length bumps, cap clamp, interaction with depth)
- TestDialecticLifecycleSmoke (end-to-end 8-turn session walk)
This commit is contained in:
Erosika 2026-04-18 09:35:42 -04:00 committed by kshitij
parent bf5d7462ba
commit 78586ce036
10 changed files with 665 additions and 107 deletions


@@ -206,10 +206,11 @@ class HonchoMemoryProvider(MemoryProvider):
self._turn_count = 0
self._injection_frequency = "every-turn" # or "first-turn"
self._context_cadence = 1 # minimum turns between context API calls
self._dialectic_cadence = 3 # minimum turns between dialectic API calls
self._dialectic_cadence = 1 # minimum turns between dialectic API calls
self._dialectic_depth = 1 # how many .chat() calls per dialectic cycle (1-3)
self._dialectic_depth_levels: list[str] | None = None # per-pass reasoning levels
self._reasoning_level_cap: Optional[str] = None # "minimal", "low", "medium", "high"
self._reasoning_heuristic: bool = True # scale base level by query length
self._reasoning_level_cap: str = "high" # ceiling for auto-selected level
self._last_context_turn = -999
self._last_dialectic_turn = -999
@@ -305,12 +306,12 @@ class HonchoMemoryProvider(MemoryProvider):
raw = cfg.raw or {}
self._injection_frequency = raw.get("injectionFrequency", "every-turn")
self._context_cadence = int(raw.get("contextCadence", 1))
self._dialectic_cadence = int(raw.get("dialecticCadence", 3))
self._dialectic_cadence = int(raw.get("dialecticCadence", 1))
self._dialectic_depth = max(1, min(cfg.dialectic_depth, 3))
self._dialectic_depth_levels = cfg.dialectic_depth_levels
cap = raw.get("reasoningLevelCap")
if cap and cap in ("minimal", "low", "medium", "high"):
self._reasoning_level_cap = cap
self._reasoning_heuristic = cfg.reasoning_heuristic
if cfg.reasoning_level_cap in self._LEVEL_ORDER:
self._reasoning_level_cap = cfg.reasoning_level_cap
except Exception as e:
logger.debug("Honcho cost-awareness config parse error: %s", e)
@@ -391,14 +392,42 @@ class HonchoMemoryProvider(MemoryProvider):
except Exception as e:
logger.debug("Honcho memory file migration skipped: %s", e)
# ----- B7: Pre-warming context at init -----
# ----- B7: Pre-warming at init -----
# Context prewarm: warms peer.context() cache (base layer), consumed
# via pop_context_result() in prefetch().
# Dialectic prewarm: fires a depth-aware cycle against the plugin's
# own _prefetch_result so turn 1 can consume it directly. Without this
# the first-turn sync path pays for a duplicate .chat() — and at
# depth>1 a single-pass session-start dialectic often returns weak
# output that multi-pass audit/reconciliation is meant to catch.
if self._recall_mode in ("context", "hybrid"):
try:
self._manager.prefetch_context(self._session_key)
self._manager.prefetch_dialectic(self._session_key, "What should I know about this user?")
logger.debug("Honcho pre-warm threads started for session: %s", self._session_key)
except Exception as e:
logger.debug("Honcho pre-warm failed: %s", e)
logger.debug("Honcho context prewarm failed: %s", e)
_prewarm_query = (
"Summarize what you know about this user. "
"Focus on preferences, current projects, and working style."
)
def _prewarm_dialectic() -> None:
try:
r = self._run_dialectic_depth(_prewarm_query)
except Exception as exc:
logger.debug("Honcho dialectic prewarm failed: %s", exc)
return
if r and r.strip():
with self._prefetch_lock:
self._prefetch_result = r
# Treat prewarm as turn 0 so cadence gating starts clean.
self._last_dialectic_turn = 0
self._prefetch_thread = threading.Thread(
target=_prewarm_dialectic, daemon=True, name="honcho-prewarm-dialectic"
)
self._prefetch_thread.start()
logger.debug("Honcho pre-warm started for session: %s", self._session_key)
def _ensure_session(self) -> bool:
"""Lazily initialize the Honcho session (for tools-only mode).
@@ -526,6 +555,11 @@ class HonchoMemoryProvider(MemoryProvider):
if self._injection_frequency == "first-turn" and self._turn_count > 1:
return ""
# Skip trivial prompts — "ok", "yes", slash commands carry no semantic signal,
# so injecting user context there just burns tokens and can derail the reply.
if self._is_trivial_prompt(query):
return ""
parts = []
# ----- Layer 1: Base context (representation + card) -----
@@ -560,37 +594,46 @@ class HonchoMemoryProvider(MemoryProvider):
# On the very first turn, no queue_prefetch() has run yet so the
# dialectic result is empty. Run with a bounded timeout so a slow
# Honcho connection doesn't block the first response indefinitely.
# On timeout the result is skipped and queue_prefetch() will pick it
# up at the next cadence-allowed turn.
# On timeout we let the thread keep running and write its result into
# _prefetch_result under the lock, so the next turn picks it up.
#
# Skip if the session-start prewarm already filled _prefetch_result —
# firing another .chat() would be duplicate work.
with self._prefetch_lock:
_prewarm_landed = bool(self._prefetch_result)
if _prewarm_landed and self._last_dialectic_turn == -999:
self._last_dialectic_turn = self._turn_count
if self._last_dialectic_turn == -999 and query:
_first_turn_timeout = (
self._config.timeout if self._config and self._config.timeout else 8.0
)
_result_holder: list[str] = []
_fired_at = self._turn_count
def _run_first_turn() -> None:
try:
_result_holder.append(self._run_dialectic_depth(query))
r = self._run_dialectic_depth(query)
except Exception as exc:
logger.debug("Honcho first-turn dialectic failed: %s", exc)
_t = threading.Thread(target=_run_first_turn, daemon=True)
_t.start()
_t.join(timeout=_first_turn_timeout)
if not _t.is_alive():
first_turn_dialectic = _result_holder[0] if _result_holder else ""
if first_turn_dialectic and first_turn_dialectic.strip():
return
if r and r.strip():
with self._prefetch_lock:
self._prefetch_result = first_turn_dialectic
self._last_dialectic_turn = self._turn_count
else:
self._prefetch_result = r
# Only advance cadence on a non-empty result so failures
# don't burn a 3-turn cooldown on nothing.
self._last_dialectic_turn = _fired_at
self._prefetch_thread = threading.Thread(
target=_run_first_turn, daemon=True, name="honcho-prefetch-first"
)
self._prefetch_thread.start()
self._prefetch_thread.join(timeout=_first_turn_timeout)
if self._prefetch_thread.is_alive():
logger.debug(
"Honcho first-turn dialectic timed out (%.1fs) — "
"will inject at next cadence-allowed turn",
"Honcho first-turn dialectic still running after %.1fs — "
"will surface on next turn",
_first_turn_timeout,
)
# Don't update _last_dialectic_turn: queue_prefetch() will
# retry at the next cadence-allowed turn via the async path.
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
@@ -641,6 +684,10 @@ class HonchoMemoryProvider(MemoryProvider):
if self._recall_mode == "tools":
return
# Trivial prompts don't warrant either a context refresh or a dialectic call.
if self._is_trivial_prompt(query):
return
# ----- Context refresh (base layer) — independent cadence -----
if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
self._last_context_turn = self._turn_count
@@ -650,23 +697,35 @@ class HonchoMemoryProvider(MemoryProvider):
logger.debug("Honcho context prefetch failed: %s", e)
# ----- Dialectic prefetch (supplement layer) -----
# B5: cadence check — skip if too soon since last dialectic call
if self._dialectic_cadence > 1:
if (self._turn_count - self._last_dialectic_turn) < self._dialectic_cadence:
logger.debug("Honcho dialectic prefetch skipped: cadence %d, turns since last: %d",
self._dialectic_cadence, self._turn_count - self._last_dialectic_turn)
return
# Guard against thread pile-up: if a prior dialectic is still in flight,
# let it finish instead of stacking races on _prefetch_result.
if self._prefetch_thread and self._prefetch_thread.is_alive():
logger.debug("Honcho dialectic prefetch skipped: prior thread still running")
return
self._last_dialectic_turn = self._turn_count
# B5: cadence check — skip if too soon since last *successful* dialectic call.
# The gate applies uniformly (including cadence=1): "every turn" means once
# per turn, not twice on the same turn when first-turn sync already fired.
if (self._turn_count - self._last_dialectic_turn) < self._dialectic_cadence:
logger.debug("Honcho dialectic prefetch skipped: cadence %d, turns since last: %d",
self._dialectic_cadence, self._turn_count - self._last_dialectic_turn)
return
# Advance cadence only on a non-empty result — otherwise a silent failure
# (empty dialectic, transient API error) would burn the full cadence window
# before the next retry, making it look like dialectic "never fires again".
_fired_at = self._turn_count
def _run():
try:
result = self._run_dialectic_depth(query)
if result and result.strip():
with self._prefetch_lock:
self._prefetch_result = result
except Exception as e:
logger.debug("Honcho prefetch failed: %s", e)
return
if result and result.strip():
with self._prefetch_lock:
self._prefetch_result = result
self._last_dialectic_turn = _fired_at
self._prefetch_thread = threading.Thread(
target=_run, daemon=True, name="honcho-prefetch"
@@ -692,11 +751,42 @@ class HonchoMemoryProvider(MemoryProvider):
_LEVEL_ORDER = ("minimal", "low", "medium", "high", "max")
def _resolve_pass_level(self, pass_idx: int) -> str:
# Reasoning-level heuristic thresholds (restored from pre-9a0ab34c behavior).
# Promoted to class constants so tests can override without widening the
# config surface. Bump to config fields only if real use shows they're needed.
_HEURISTIC_LENGTH_MEDIUM = 120
_HEURISTIC_LENGTH_HIGH = 400
def _apply_reasoning_heuristic(self, base: str, query: str) -> str:
"""Scale `base` up by query length, clamped at reasoning_level_cap.
Char-count heuristic: +1 at >=120 chars, +2 at >=400. Ceiling is
reasoning_level_cap (default 'high' — 'max' is reserved for
explicit tool-path selection).
"""
if not self._reasoning_heuristic or not query:
return base
if base not in self._LEVEL_ORDER:
return base
n = len(query)
if n < self._HEURISTIC_LENGTH_MEDIUM:
bump = 0
elif n < self._HEURISTIC_LENGTH_HIGH:
bump = 1
else:
bump = 2
base_idx = self._LEVEL_ORDER.index(base)
cap_idx = self._LEVEL_ORDER.index(self._reasoning_level_cap)
return self._LEVEL_ORDER[min(base_idx + bump, cap_idx)]
def _resolve_pass_level(self, pass_idx: int, query: str = "") -> str:
"""Resolve reasoning level for a given pass index.
Uses dialecticDepthLevels if configured, otherwise proportional
defaults relative to dialecticReasoningLevel.
Precedence:
1. dialecticDepthLevels (explicit per-pass) wins absolutely
2. _PROPORTIONAL_LEVELS table (depth>1 lighter-early passes)
3. Base level = dialecticReasoningLevel, optionally scaled by the
reasoning heuristic when the mapping falls through to 'base'
"""
if self._dialectic_depth_levels and pass_idx < len(self._dialectic_depth_levels):
return self._dialectic_depth_levels[pass_idx]
@@ -704,7 +794,7 @@ class HonchoMemoryProvider(MemoryProvider):
base = (self._config.dialectic_reasoning_level if self._config else "low")
mapping = self._PROPORTIONAL_LEVELS.get((self._dialectic_depth, pass_idx))
if mapping is None or mapping == "base":
return base
return self._apply_reasoning_heuristic(base, query)
return mapping
def _build_dialectic_prompt(self, pass_idx: int, prior_results: list[str], is_cold: bool) -> str:
@@ -791,7 +881,7 @@ class HonchoMemoryProvider(MemoryProvider):
break
prompt = self._build_dialectic_prompt(i, results, is_cold)
level = self._resolve_pass_level(i)
level = self._resolve_pass_level(i, query=query)
logger.debug("Honcho dialectic depth %d: pass %d, level=%s, cold=%s",
self._dialectic_depth, i, level, is_cold)
@@ -808,6 +898,29 @@ class HonchoMemoryProvider(MemoryProvider):
return r
return ""
# Prompts that carry no semantic signal — trivial acknowledgements, slash
# commands, empty input. Skipping injection here saves tokens and prevents
# stale user-model context from derailing one-word replies.
_TRIVIAL_PROMPT_RE = re.compile(
r'^(yes|no|ok|okay|sure|thanks|thank you|y|n|yep|nope|yeah|nah|'
r'continue|go ahead|do it|proceed|got it|cool|nice|great|done|next|lgtm|k)$',
re.IGNORECASE,
)
@classmethod
def _is_trivial_prompt(cls, text: str) -> bool:
"""Return True if the prompt is too trivial to warrant context injection."""
if not text:
return True
stripped = text.strip()
if not stripped:
return True
if stripped.startswith("/"):
return True
if cls._TRIVIAL_PROMPT_RE.match(stripped):
return True
return False
def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
"""Track turn count for cadence and injection_frequency logic."""
self._turn_count = turn_number


@@ -460,17 +460,17 @@ def cmd_setup(args) -> None:
pass # keep current
# --- 7b. Dialectic cadence ---
current_dialectic = str(hermes_host.get("dialecticCadence") or cfg.get("dialecticCadence") or "3")
current_dialectic = str(hermes_host.get("dialecticCadence") or cfg.get("dialecticCadence") or "1")
print("\n Dialectic cadence:")
print(" How often Honcho rebuilds its user model (LLM call on Honcho backend).")
print(" 1 = every turn (aggressive), 3 = every 3 turns (recommended), 5+ = sparse.")
print(" 1 = every turn (default), 3+ = sparse (cost-saving).")
new_dialectic = _prompt("Dialectic cadence", default=current_dialectic)
try:
val = int(new_dialectic)
if val >= 1:
hermes_host["dialecticCadence"] = val
except (ValueError, TypeError):
hermes_host["dialecticCadence"] = 3
hermes_host["dialecticCadence"] = 1
# --- 8. Session strategy ---
current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-session")
@@ -636,7 +636,7 @@ def cmd_status(args) -> None:
print(f" Recall mode: {hcfg.recall_mode}")
print(f" Context budget: {hcfg.context_tokens or '(uncapped)'} tokens")
raw = getattr(hcfg, "raw", None) or {}
dialectic_cadence = raw.get("dialecticCadence") or 3
dialectic_cadence = raw.get("dialecticCadence") or 1
print(f" Dialectic cad: every {dialectic_cadence} turn{'s' if dialectic_cadence != 1 else ''}")
print(f" Observation: user(me={hcfg.user_observe_me},others={hcfg.user_observe_others}) ai(me={hcfg.ai_observe_me},others={hcfg.ai_observe_others})")
print(f" Write freq: {hcfg.write_frequency}")


@@ -251,6 +251,14 @@ class HonchoClientConfig:
# matching dialectic_depth length. When None, uses proportional defaults
# derived from dialectic_reasoning_level.
dialectic_depth_levels: list[str] | None = None
# Reasoning-level heuristic for auto-injected dialectic calls. When true,
# scales the base level up on longer queries (restored from pre-#10619
# behavior; see plugins/memory/honcho/__init__.py for thresholds).
# Never auto-selects a level above reasoning_level_cap.
reasoning_heuristic: bool = True
# Ceiling for heuristic-selected reasoning level. "max" is reserved for
# explicit tool-path selection; default "high" matches the old behavior.
reasoning_level_cap: str = "high"
# Honcho API limits — configurable for self-hosted instances
# Max chars per message sent via add_messages() (Honcho cloud: 25000)
message_max_chars: int = 25000
@@ -446,6 +454,16 @@ class HonchoClientConfig:
raw.get("dialecticDepthLevels"),
depth=_parse_dialectic_depth(host_block.get("dialecticDepth"), raw.get("dialecticDepth")),
),
reasoning_heuristic=_resolve_bool(
host_block.get("reasoningHeuristic"),
raw.get("reasoningHeuristic"),
default=True,
),
reasoning_level_cap=(
host_block.get("reasoningLevelCap")
or raw.get("reasoningLevelCap")
or "high"
),
message_max_chars=int(
host_block.get("messageMaxChars")
or raw.get("messageMaxChars")


@@ -100,9 +100,11 @@ class HonchoSessionManager:
self._write_frequency = write_frequency
self._turn_counter: int = 0
# Prefetch caches: session_key → last result (consumed once per turn)
# Prefetch cache: session_key → last context result (consumed once per turn).
# Dialectic results are cached on the plugin side (HonchoMemoryProvider
# ._prefetch_result) so session-start prewarm and turn-driven fires share
# one source of truth; see __init__.py _do_session_init for the prewarm.
self._context_cache: dict[str, dict] = {}
self._dialectic_cache: dict[str, str] = {}
self._prefetch_cache_lock = threading.Lock()
self._dialectic_reasoning_level: str = (
config.dialectic_reasoning_level if config else "low"
@@ -499,8 +501,8 @@ class HonchoSessionManager:
Query Honcho's dialectic endpoint about a peer.
Runs an LLM on Honcho's backend against the target peer's full
representation. Higher latency than context() — call async via
prefetch_dialectic() to avoid blocking the response.
representation. Higher latency than context() — callers run this in
a background thread (see HonchoMemoryProvider) to avoid blocking.
Args:
session_key: The session key to query against.
@@ -555,42 +557,6 @@ class HonchoSessionManager:
logger.warning("Honcho dialectic query failed: %s", e)
return ""
def prefetch_dialectic(self, session_key: str, query: str) -> None:
"""
Fire a dialectic_query in a background thread, caching the result.
Non-blocking. The result is available via pop_dialectic_result()
on the next call (typically the following turn). Reasoning level
is selected dynamically based on query complexity.
Args:
session_key: The session key to query against.
query: The user's current message, used as the query.
"""
def _run():
result = self.dialectic_query(session_key, query)
if result:
self.set_dialectic_result(session_key, result)
t = threading.Thread(target=_run, name="honcho-dialectic-prefetch", daemon=True)
t.start()
def set_dialectic_result(self, session_key: str, result: str) -> None:
"""Store a prefetched dialectic result in a thread-safe way."""
if not result:
return
with self._prefetch_cache_lock:
self._dialectic_cache[session_key] = result
def pop_dialectic_result(self, session_key: str) -> str:
"""
Return and clear the cached dialectic result for this session.
Returns empty string if no result is ready yet.
"""
with self._prefetch_cache_lock:
return self._dialectic_cache.pop(session_key, "")
def prefetch_context(self, session_key: str, user_message: str | None = None) -> None:
"""
Fire get_prefetch_context in a background thread, caching the result.