fix(honcho): dialectic lifecycle — defaults, retry, prewarm consumption

Several correctness and cost-safety fixes to the Honcho dialectic path
after a multi-turn investigation surfaced a chain of silent failures:

- dialecticCadence default flipped 3 → 1. PR #10619 changed this from 1 to
  3 for cost, but existing installs with no explicit config silently went
  from per-turn dialectic to every-3-turns on upgrade. Restores pre-#10619
  behavior; 3+ remains available for cost-conscious setups. Docs + wizard
  + status output updated to match.

- Session-start prewarm now consumed. Previously fired a .chat() on init
  whose result landed in HonchoSessionManager._dialectic_cache and was
  never read — pop_dialectic_result had zero call sites. Turn 1 paid for
  a duplicate synchronous dialectic. Prewarm now writes directly to the
  plugin's _prefetch_result via _prefetch_lock so turn 1 consumes it with
  no extra call.

- Prewarm is now dialecticDepth-aware. A single-pass prewarm can return
  weak output on cold peers; the multi-pass audit/reconcile cycle is
  exactly the case dialecticDepth was built for. Prewarm now runs the
  full configured depth in the background.

- Silent dialectic failure no longer burns the cadence window.
  _last_dialectic_turn now advances only when the result is non-empty.
  Empty result → next eligible turn retries immediately instead of
  waiting the full cadence gap.

- Thread pile-up guard. queue_prefetch skips when a prior dialectic
  thread is still in-flight, preventing stacked races on _prefetch_result.

- First-turn sync timeout is recoverable. Previously on timeout the
  background thread's result was stored in a dead local list. Now the
  thread writes into _prefetch_result under lock so the next turn
  picks it up.

- Cadence gate applies uniformly. At cadence=1 the old "cadence > 1"
  guard let first-turn sync + same-turn queue_prefetch both fire.
  Gate now always applies.

- Restored query-length reasoning-level scaling, dropped in 9a0ab34c.
  Scales dialecticReasoningLevel up on longer queries (+1 at ≥120 chars,
  +2 at ≥400), clamped at reasoningLevelCap. Two new config keys:
  `reasoningHeuristic` (bool, default true) and `reasoningLevelCap`
  (string, default "high"; previously parsed but never enforced).
  Respects dialecticDepthLevels and proportional lighter-early passes.

- Restored short-prompt skip, dropped in ef7f3156. One-word
  acknowledgements ("ok", "y", "thanks") and slash commands bypass
  both injection and dialectic fire.

- Purged dead code in session.py: prefetch_dialectic, _dialectic_cache,
  set_dialectic_result, pop_dialectic_result — all unused after prewarm
  refactor.

Tests: 542 passed across honcho_plugin/, agent/test_memory_provider.py,
and run_agent/test_run_agent.py. New coverage:
- TestTrivialPromptHeuristic (classifier + prefetch/queue skip)
- TestDialecticCadenceAdvancesOnSuccess (empty-result retry, pile-up guard)
- TestSessionStartDialecticPrewarm (prewarm consumed, sync fallback)
- TestReasoningHeuristic (length bumps, cap clamp, interaction with depth)
- TestDialecticLifecycleSmoke (end-to-end 8-turn session walk)
This commit is contained in:
Erosika 2026-04-18 09:35:42 -04:00 committed by kshitij
parent bf5d7462ba
commit 78586ce036
10 changed files with 665 additions and 107 deletions

View file

@ -77,7 +77,7 @@ Cost and depth are controlled by three independent knobs:
| Knob | Controls | Default |
|------|----------|---------|
| `contextCadence` | Turns between `context()` API calls (base layer refresh) | `1` |
| `dialecticCadence` | Turns between `peer.chat()` LLM calls (dialectic layer refresh) | `3` |
| `dialecticCadence` | Turns between `peer.chat()` LLM calls (dialectic layer refresh) | `1` |
| `dialecticDepth` | Number of `.chat()` passes per dialectic invocation (13) | `1` |
These are orthogonal — you can have frequent context refreshes with infrequent dialectic, or deep multi-pass dialectic at low frequency. Example: `contextCadence: 1, dialecticCadence: 5, dialecticDepth: 2` refreshes base context every turn, runs dialectic every 5 turns, and each dialectic run makes 2 passes.
@ -104,7 +104,7 @@ Honcho is configured in `~/.honcho/config.json` (global) or `$HERMES_HOME/honcho
|-----|---------|-------------|
| `contextTokens` | `null` (uncapped) | Token budget for auto-injected context per turn. Set to an integer (e.g. 1200) to cap. Truncates at word boundaries |
| `contextCadence` | `1` | Minimum turns between `context()` API calls (base layer refresh) |
| `dialecticCadence` | `3` | Minimum turns between `peer.chat()` LLM calls (dialectic layer). In `tools` mode, irrelevant — model calls explicitly |
| `dialecticCadence` | `1` | Minimum turns between `peer.chat()` LLM calls (dialectic layer). In `tools` mode, irrelevant — model calls explicitly |
| `dialecticDepth` | `1` | Number of `.chat()` passes per dialectic invocation. Clamped to 13 |
| `dialecticDepthLevels` | `null` | Optional array of reasoning levels per pass, e.g. `["minimal", "low", "medium"]`. Overrides proportional defaults |
| `dialecticReasoningLevel` | `'low'` | Base reasoning level: `minimal`, `low`, `medium`, `high`, `max` |