feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation

Context Injection Overhaul:
- Base layer: peer.context() (representation + card) cached with 5-minute TTL
- Dialectic supplement: cadence-gated, cached until next refresh
- Trivial prompt skip: short inputs/slash commands skip injection
- New peer guard: dialectic skipped at session start when peer has no context
- Targeted warm prompt for better dialectic quality
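
The flow this adds up to is roughly the following. This is a minimal sketch only: the `ContextInjector` class, the TTL constant, the trivial-prompt threshold, and the warm-prompt wording are illustrative, not the actual implementation; only `peer.context()`, `peer.chat()`, and the gating rules come from this changelog.

```python
import time

CONTEXT_TTL_SECONDS = 300       # base layer cached for 5 minutes
TRIVIAL_MAX_CHARS = 20          # assumed cutoff for the "short input" skip


class ContextInjector:
    def __init__(self, peer, dialectic_cadence: int = 3):
        self.peer = peer
        self.dialectic_cadence = dialectic_cadence
        self._base = None                # representation + card
        self._base_fetched_at = 0.0
        self._dialectic = None           # cached until next refresh
        self._turn = 0

    def build_injection(self, prompt: str) -> str | None:
        self._turn += 1
        # Trivial prompt skip: short inputs and slash commands get no injection.
        if len(prompt) < TRIVIAL_MAX_CHARS or prompt.startswith("/"):
            return None

        # Base layer: peer.context() cached with a TTL.
        now = time.time()
        if self._base is None or now - self._base_fetched_at > CONTEXT_TTL_SECONDS:
            self._base = self.peer.context()
            self._base_fetched_at = now

        # New peer guard + cadence gate: only refresh the dialectic supplement
        # when the peer already has context and the cadence allows it.
        if self._base and self._turn % self.dialectic_cadence == 0:
            self._dialectic = self.peer.chat(
                "What about this user is most relevant to their current request?"
            )

        parts = [str(self._base)]
        if self._dialectic:
            parts.append(str(self._dialectic))
        return "\n\n".join(parts)
```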

Tool Surface (5 bidirectional tools):
- honcho_profile: read or update peer card
- honcho_search: semantic search over context
- honcho_context: full session context (summary, representation, card, messages)
- honcho_reasoning: synthesized answer, reasoning_level param
- honcho_conclude: create or delete conclusions (PII removal)
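
As a sketch, the five tools map onto a name-to-handler dispatch table like the one below. The handler bodies and the `set_card` / `get_card` / `search` / `get_context` accessor names are assumptions; only the tool names, the `reasoning_level` param, and their read/write roles come from this list.

```python
def make_honcho_tools(peer, session):
    """Sketch of the 5-tool dispatch surface; accessor names are assumed."""

    def honcho_profile(action: str, card: str | None = None):
        # Read or update the peer card.
        if action == "update" and card is not None:
            return peer.set_card(card)            # assumed setter name
        return peer.get_card()                    # assumed getter name

    def honcho_search(query: str):
        # Semantic search over stored context.
        return session.search(query)              # assumed search entry point

    def honcho_context():
        # Full session context: summary, representation, card, messages.
        return session.get_context()              # assumed accessor name

    def honcho_reasoning(query: str, reasoning_level: str = "low"):
        # Synthesized answer from the dialectic endpoint.
        return peer.chat(query, reasoning=reasoning_level)   # assumed kwarg name

    def honcho_conclude(action: str, conclusion: str):
        # Create or delete conclusions (e.g. PII removal); left abstract here.
        raise NotImplementedError

    return {
        "honcho_profile": honcho_profile,
        "honcho_search": honcho_search,
        "honcho_context": honcho_context,
        "honcho_reasoning": honcho_reasoning,
        "honcho_conclude": honcho_conclude,
    }
```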

Cost Safety:
- dialectic_cadence defaults to 3 (~66% fewer LLM calls)
- context_tokens defaults to uncapped (cap opt-in via config/wizard)
- on_turn_start hook wired up (fixes broken cadence/injection gating)
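
A sketch of the gating the `on_turn_start` hook now enforces; the `state` and `config` objects are illustrative, while the hook name, the cadence keys, and the default of 3 come from this changelog.

```python
def on_turn_start(state, config):
    """Once per turn, decide which Honcho calls this turn may make."""
    state.turn += 1

    # dialectic_cadence defaults to 3, so only every third turn pays for peer.chat().
    dialectic_cadence = config.get("dialecticCadence", 3)
    state.allow_dialectic = state.turn % dialectic_cadence == 0

    # Context fetches are gated the same way via contextCadence.
    context_cadence = config.get("contextCadence", 1)
    state.allow_context_fetch = state.turn % context_cadence == 0
```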

Correctness:
- Explicit target= on peer context/card fetches (fixes identity blur)
- honcho_search perspective fix under directional observation
- Timeout config plumbing
- peerName precedence over gateway user_id
- skip_memory on temp agents (orphan session prevention)
- gateway_session_key for stable per-chat session continuity
- initOnSessionStart for eager tools-mode init
- get_session_context fallback respects peer param
- mid -> medium in reasoning level validation
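
Several of these fixes boil down to an explicit identity/session resolution order. A sketch under assumed helper names; only the precedence rules and the parameter names come from the list above.

```python
def resolve_identity(peer_name: str | None,
                     gateway_user_id: str,
                     gateway_session_key: str | None,
                     chat_id: str) -> tuple[str, str]:
    # peerName takes precedence over the gateway user_id.
    peer = peer_name or gateway_user_id
    # gateway_session_key gives stable per-chat session continuity;
    # fall back to a per-chat key when none is supplied.
    session = gateway_session_key or f"gateway:{chat_id}"
    return peer, session
```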

ABC changes (minimal, honcho-only):
- run_agent.py: gateway_session_key param + memory provider wiring (+5 lines)
- gateway/run.py: skip_memory on 2 temp agents, gateway_session_key on main agent (+3 lines)
- agent/memory_manager.py: sanitize regex for context tag variants (+9 lines)
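
The memory_manager change is a small sanitize pass over outgoing text. A sketch: the tag names matched here are assumptions; only "regex over context tag variants" is from the diff.

```python
import re

# Strip context-injection tag variants before messages are persisted.
CONTEXT_TAG_RE = re.compile(
    r"</?\s*(?:honcho[-_]?context|memory[-_]?context)\s*>",
    re.IGNORECASE,
)

def sanitize(text: str) -> str:
    return CONTEXT_TAG_RE.sub("", text).strip()
```
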
Erosika 2026-04-14 18:07:19 -04:00
parent 95d11dfd8e
commit 11b4c9ecf9
16 changed files with 1283 additions and 331 deletions

@@ -125,7 +125,7 @@ Settings changed in the [Honcho dashboard](https://app.honcho.dev) are synced ba
|-----|------|---------|-------|-------------|
| `contextTokens` | int | SDK default | root / host | Token budget for `context()` API calls. Also gates prefetch truncation (tokens x 4 chars) |
| `dialecticReasoningLevel` | string | `"low"` | root / host | Base reasoning level for `peer.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
- | `dialecticDynamic` | bool | `true` | root / host | Auto-bump reasoning based on query length: `<120` chars = base level, `120-400` = +1, `>400` = +2 (capped at `"high"`). Set `false` to always use `dialecticReasoningLevel` as-is |
+ | `dialecticDynamic` | bool | `true` | root / host | When `true`, the model can override the reasoning level per call via the `honcho_reasoning` tool's `reasoning_level` param (agentic). When `false`, always uses `dialecticReasoningLevel` and ignores model overrides |
| `dialecticMaxChars` | int | `600` | root / host | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | root / host | Max chars for dialectic query input to `peer.chat()`. Honcho cloud limit: 10k |
| `messageMaxChars` | int | `25000` | root / host | Max chars per message sent via `add_messages()`. Messages exceeding this are chunked with `[continued]` markers. Honcho cloud limit: 25k |
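
The `messageMaxChars` row above implies splitting logic along these lines. This is a sketch: whether the `[continued]` marker leads or trails each chunk, and the helper name, are assumptions; the 25k limit and the marker text come from the table.

```python
MESSAGE_MAX_CHARS = 25_000          # Honcho cloud per-message limit
MARKER = "[continued] "

def chunk_message(text: str, limit: int = MESSAGE_MAX_CHARS) -> list[str]:
    """Split an oversized message into <= limit chunks for add_messages()."""
    if len(text) <= limit:
        return [text]
    chunks = [text[:limit]]
    rest = text[limit:]
    body = limit - len(MARKER)       # leave room for the continuation marker
    while rest:
        chunks.append(MARKER + rest[:body])
        rest = rest[body:]
    return chunks
```
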
@@ -139,7 +139,7 @@ These are read from the root config object, not the host block. Must be set manu
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context only on turn 0) |
| `contextCadence` | int | `1` | Minimum turns between `context()` API calls |
| `dialecticCadence` | int | `1` | Minimum turns between `peer.chat()` API calls |
- | `reasoningLevelCap` | string | -- | Hard cap on auto-bumped reasoning: `"minimal"`, `"low"`, `"mid"`, `"high"` |
+ | `reasoningLevelCap` | string | -- | Hard cap on reasoning level: `"minimal"`, `"low"`, `"medium"`, `"high"` |
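
The `reasoningLevelCap` row amounts to a clamp over the documented level ordering. A sketch; the helper itself is illustrative, the level names are the documented values.

```python
LEVELS = ["minimal", "low", "medium", "high", "max"]

def apply_reasoning_cap(requested: str, cap: str | None) -> str:
    """Clamp a requested reasoning level to reasoningLevelCap when one is set."""
    if cap is None or cap not in LEVELS or requested not in LEVELS:
        return requested
    return requested if LEVELS.index(requested) <= LEVELS.index(cap) else cap
```
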
### Hardcoded Limits (Not Configurable)