feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation (#10619)

Salvaged from PR #9884 by erosika. Cherry-picked plugin changes onto
current main with minimal core modifications.

Plugin changes (plugins/memory/honcho/):
- New honcho_reasoning tool (5th tool, splits LLM calls from honcho_context)
- Two-layer context injection: base context (summary + representation + card)
  on contextCadence, dialectic supplement on dialecticCadence
- Multi-pass dialectic depth (1-3 passes) with early bail-out on strong signal
- Cold/warm prompt selection based on session state
- dialecticCadence defaults to 3 (was 1) — ~66% fewer Honcho LLM calls
- Session summary injection for conversational continuity
- Bidirectional peer targeting on all 5 tools
- Correctness fixes: peer param fallback, None guard on set_peer_card,
  schema validation, signal_sufficient anchored regex, mid->medium level fix
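The multi-pass dialectic with early bail-out can be sketched roughly as follows. This is an illustrative sketch only: `run_dialectic`, the prompt strings, and the `SIGNAL: sufficient` marker format are assumptions, not the plugin's actual API (only the anchored-regex check and the cold/warm split are described above).

```python
import re

# Anchored regex, per the correctness fix above: the marker must be a whole
# line, not a substring buried mid-sentence.
SIGNAL_SUFFICIENT = re.compile(r"^SIGNAL:\s*sufficient\s*$", re.MULTILINE)

def run_dialectic(peer, query, depth=1, has_base_context=False):
    """Run 1-3 peer.chat() passes, bailing out early on a strong signal."""
    depth = max(1, min(depth, 3))  # clamp to 1-3
    # Cold start (no base context yet): ask for general user facts;
    # warm: query within the session-scoped context.
    prompt = query if has_base_context else f"What general facts about the user bear on: {query}"
    result = ""
    for _ in range(depth):
        result = peer.chat(prompt)
        if SIGNAL_SUFFICIENT.search(result):
            break  # strong signal: skip remaining passes
        # Later passes audit and reconcile the previous answer.
        prompt = f"Audit the previous answer for errors, then refine it:\n{result}"
    return result
```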

Core changes (~20 lines across 3 files):
- agent/memory_manager.py: Enhanced sanitize_context() to strip full
  <memory-context> blocks and system notes (prevents leak from saveMessages)
- run_agent.py: gateway_session_key param for stable per-chat Honcho sessions,
  on_turn_start() call before prefetch_all() for cadence tracking,
  sanitize_context() on user messages to strip leaked memory blocks
- gateway/run.py: skip_memory=True on 2 temp agents (prevents orphan sessions),
  gateway_session_key threading to main agent
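The `sanitize_context()` behavior described above amounts to stripping injected memory blocks before a user message is persisted. A minimal sketch, assuming `<memory-context>` delimiters and a `[system note: ...]` line format; the actual patterns in `agent/memory_manager.py` may differ:

```python
import re

# Full <memory-context>...</memory-context> blocks, including newlines.
MEMORY_BLOCK = re.compile(r"<memory-context>.*?</memory-context>\s*", re.DOTALL)
# Whole-line system notes (assumed format for illustration).
SYSTEM_NOTE = re.compile(r"^\[system note:[^\]]*\]\s*$", re.MULTILINE | re.IGNORECASE)

def sanitize_context(text: str) -> str:
    """Strip leaked memory blocks and system notes from a message."""
    text = MEMORY_BLOCK.sub("", text)
    text = SYSTEM_NOTE.sub("", text)
    return text.strip()
```

Applying this in both `saveMessages` and on inbound user messages closes the loop: injected context never round-trips back into Honcho as if the user wrote it.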

Tests: 509 passed (3 skipped — honcho SDK not installed locally)
Docs: Updated honcho.md, memory-providers.md, tools-reference.md, SKILL.md

Co-authored-by: erosika <erosika@users.noreply.github.com>
Teknium 2026-04-15 19:12:19 -07:00 committed by GitHub
parent 00ff9a26cd
commit cc6e8941db
17 changed files with 2632 additions and 396 deletions


@@ -42,7 +42,7 @@ The built-in memory (MEMORY.md / USER.md) continues to work exactly as before. T
### Honcho
AI-native cross-session user modeling with dialectic Q&A, semantic search, and persistent conclusions.
AI-native cross-session user modeling with dialectic reasoning, session-scoped context injection, semantic search, and persistent conclusions. Base context now includes the session summary alongside user representation and peer cards, giving the agent awareness of what has already been discussed.
| | |
|---|---|
@@ -51,7 +51,15 @@ AI-native cross-session user modeling with dialectic Q&A, semantic search, and p
| **Data storage** | Honcho Cloud or self-hosted |
| **Cost** | Honcho pricing (cloud) / free (self-hosted) |
**Tools:** `honcho_profile` (peer card), `honcho_search` (semantic search), `honcho_context` (LLM-synthesized), `honcho_conclude` (store facts)
**Tools (5):** `honcho_profile` (read/update peer card), `honcho_search` (semantic search), `honcho_context` (session context — summary, representation, card, messages), `honcho_reasoning` (LLM-synthesized), `honcho_conclude` (create/delete conclusions)
**Architecture:** Two-layer context injection — a base layer (session summary + representation + peer card, refreshed on `contextCadence`) plus a dialectic supplement (LLM reasoning, refreshed on `dialecticCadence`). The dialectic automatically selects cold-start prompts (general user facts) vs. warm prompts (session-scoped context) based on whether base context exists.
**Three orthogonal config knobs** control cost and depth independently:
- `contextCadence` — how often the base layer refreshes (API call frequency)
- `dialecticCadence` — how often the dialectic LLM fires (LLM call frequency)
- `dialecticDepth` — how many `.chat()` passes per dialectic invocation (1-3, depth of reasoning)
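A back-of-envelope cost model makes the independence of the three knobs concrete (a sketch only; the plugin's real scheduler gates on "minimum turns between calls" and may skip or defer, so these are upper bounds):

```python
def call_counts(turns, context_cadence=1, dialectic_cadence=3, dialectic_depth=1):
    """Worst-case API and LLM call counts over a run of `turns` turns."""
    base_refreshes = turns // context_cadence          # context() API calls
    dialectic_invocations = turns // dialectic_cadence # times the dialectic fires
    llm_passes = dialectic_invocations * dialectic_depth  # .chat() calls (no bail-out)
    return base_refreshes, dialectic_invocations, llm_passes
```

With the defaults over 30 turns: 30 base refreshes but only 10 dialectic invocations, versus 30 at `dialecticCadence: 1` — the ~66% reduction in Honcho LLM calls cited above.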
**Setup Wizard:**
```bash
@@ -63,7 +71,7 @@ hermes memory setup # select "honcho"
**Config:** `$HERMES_HOME/honcho.json` (profile-local) or `~/.honcho/config.json` (global). Resolution order: `$HERMES_HOME/honcho.json` > `~/.hermes/honcho.json` > `~/.honcho/config.json`. See the [config reference](https://github.com/hermes-ai/hermes-agent/blob/main/plugins/memory/honcho/README.md) and the [Honcho integration guide](https://docs.honcho.dev/v3/guides/integrations/hermes).
<details>
<summary>Key config options</summary>
<summary>Full config reference</summary>
| Key | Default | Description |
|-----|---------|-------------|
@@ -72,13 +80,21 @@ hermes memory setup # select "honcho"
| `peerName` | -- | User peer identity |
| `aiPeer` | host key | AI peer identity (one per profile) |
| `workspace` | host key | Shared workspace ID |
| `recallMode` | `hybrid` | `hybrid` (auto-inject + tools), `context` (inject only), `tools` (tools only) |
| `observation` | all on | Per-peer `observeMe`/`observeOthers` booleans |
| `writeFrequency` | `async` | `async`, `turn`, `session`, or integer N |
| `sessionStrategy` | `per-directory` | `per-directory`, `per-repo`, `per-session`, `global` |
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | Auto-bump reasoning by query length |
| `contextTokens` | `null` (uncapped) | Token budget for auto-injected context per turn. Truncates at word boundaries |
| `contextCadence` | `1` | Minimum turns between `context()` API calls (base layer refresh) |
| `dialecticCadence` | `3` | Minimum turns between `peer.chat()` LLM calls. Only applies to `hybrid`/`context` modes |
| `dialecticDepth` | `1` | Number of `.chat()` passes per dialectic invocation. Clamped 1-3. Pass 0: cold/warm prompt, pass 1: self-audit, pass 2: reconciliation |
| `dialecticDepthLevels` | `null` | Optional array of reasoning levels per pass, e.g. `["minimal", "low", "medium"]`. Overrides proportional defaults |
| `dialecticReasoningLevel` | `'low'` | Base reasoning level: `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | When `true`, model can override reasoning level per-call via tool param |
| `dialecticMaxChars` | `600` | Max chars of dialectic result injected into system prompt |
| `recallMode` | `'hybrid'` | `hybrid` (auto-inject + tools), `context` (inject only), `tools` (tools only) |
| `writeFrequency` | `'async'` | When to flush messages: `async` (background thread), `turn` (sync), `session` (batch on end), or integer N |
| `saveMessages` | `true` | Whether to persist messages to Honcho API |
| `observationMode` | `'directional'` | `directional` (all on) or `unified` (shared pool). Override with `observation` object |
| `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input to `peer.chat()` |
| `sessionStrategy` | `'per-directory'` | `per-directory`, `per-repo`, `per-session`, `global` |
</details>
@@ -165,7 +181,10 @@ This inherits settings from the default `hermes` host block and creates new AI p
},
"dialecticReasoningLevel": "low",
"dialecticDynamic": true,
"dialecticCadence": 3,
"dialecticDepth": 1,
"dialecticMaxChars": 600,
"contextCadence": 1,
"messageMaxChars": 25000,
"saveMessages": true
},
@@ -462,7 +481,7 @@ echo 'SUPERMEMORY_API_KEY=***' >> ~/.hermes/.env
| Provider | Storage | Cost | Tools | Dependencies | Unique Feature |
|----------|---------|------|-------|-------------|----------------|
| **Honcho** | Cloud | Paid | 4 | `honcho-ai` | Dialectic user modeling |
| **Honcho** | Cloud | Paid | 5 | `honcho-ai` | Dialectic user modeling + session-scoped context |
| **OpenViking** | Self-hosted | Free | 5 | `openviking` + server | Filesystem hierarchy + tiered loading |
| **Mem0** | Cloud | Paid | 3 | `mem0ai` | Server-side LLM extraction |
| **Hindsight** | Cloud/Local | Free/Paid | 3 | `hindsight-client` | Knowledge graph + reflect synthesis |