feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation (#10619)

Salvaged from PR #9884 by erosika. Cherry-picked plugin changes onto
current main with minimal core modifications.

Plugin changes (plugins/memory/honcho/):
- New honcho_reasoning tool (5th tool, splits LLM calls from honcho_context)
- Two-layer context injection: base context (summary + representation + card)
  on contextCadence, dialectic supplement on dialecticCadence
- Multi-pass dialectic depth (1-3 passes) with early bail-out on strong signal
- Cold/warm prompt selection based on session state
- dialecticCadence defaults to 3 (was 1) — ~66% fewer Honcho LLM calls
- Session summary injection for conversational continuity
- Bidirectional peer targeting on all 5 tools
- Correctness fixes: peer param fallback, None guard on set_peer_card,
  schema validation, signal_sufficient anchored regex, mid->medium level fix

Core changes (~20 lines across 3 files):
- agent/memory_manager.py: Enhanced sanitize_context() to strip full
  <memory-context> blocks and system notes (prevents leak from saveMessages)
- run_agent.py: gateway_session_key param for stable per-chat Honcho sessions,
  on_turn_start() call before prefetch_all() for cadence tracking,
  sanitize_context() on user messages to strip leaked memory blocks
- gateway/run.py: skip_memory=True on 2 temp agents (prevents orphan sessions),
  gateway_session_key threading to main agent

Tests: 509 passed (3 skipped — honcho SDK not installed locally)
Docs: Updated honcho.md, memory-providers.md, tools-reference.md, SKILL.md

Co-authored-by: erosika <erosika@users.noreply.github.com>
Committed by Teknium on 2026-04-15 19:12:19 -07:00 (via GitHub)
parent 00ff9a26cd
commit cc6e8941db
GPG key ID: B5690EEEBB952194
17 changed files with 2632 additions and 396 deletions


@ -28,6 +28,7 @@ Usage in run_agent.py:
from __future__ import annotations
import json
import logging
+import re
from typing import Any, Dict, List, Optional
@@ -43,11 +44,22 @@ logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
_FENCE_TAG_RE = re.compile(r'</?\s*memory-context\s*>', re.IGNORECASE)
+_INTERNAL_CONTEXT_RE = re.compile(
+    r'<\s*memory-context\s*>[\s\S]*?</\s*memory-context\s*>',
+    re.IGNORECASE,
+)
+_INTERNAL_NOTE_RE = re.compile(
+    r'\[System note:\s*The following is recalled memory context,\s*NOT new user input\.\s*Treat as informational background data\.\]\s*',
+    re.IGNORECASE,
+)
def sanitize_context(text: str) -> str:
-    """Strip fence-escape sequences from provider output."""
-    return _FENCE_TAG_RE.sub('', text)
+    """Strip fence tags, injected context blocks, and system notes from provider output."""
+    text = _INTERNAL_CONTEXT_RE.sub('', text)
+    text = _INTERNAL_NOTE_RE.sub('', text)
+    text = _FENCE_TAG_RE.sub('', text)
+    return text
def build_memory_context_block(raw_context: str) -> str:


@@ -3739,6 +3739,7 @@ class GatewayRunner:
model=_hyg_model,
max_iterations=4,
quiet_mode=True,
+skip_memory=True,
enabled_toolsets=["memory"],
session_id=session_entry.session_id,
)
@@ -6221,6 +6222,7 @@ class GatewayRunner:
model=model,
max_iterations=4,
quiet_mode=True,
+skip_memory=True,
enabled_toolsets=["memory"],
session_id=session_entry.session_id,
)
@@ -8588,6 +8590,7 @@ class GatewayRunner:
session_id=session_id,
platform=platform_key,
user_id=source.user_id,
+gateway_session_key=session_key,
session_db=self._session_db,
fallback_model=self._fallback_model,
)


@@ -1,12 +1,12 @@
---
name: honcho
-description: Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, and dialectic reasoning. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation and recall settings.
-version: 1.0.0
+description: Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, dialectic reasoning, session summaries, and context budget enforcement. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation, recall, and dialectic settings.
+version: 2.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
-tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling]
+tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling, Session-Summary]
homepage: https://docs.honcho.dev
related_skills: [hermes-agent]
prerequisites:
@@ -22,8 +22,9 @@ Honcho provides AI-native cross-session user modeling. It learns who the user is
- Setting up Honcho (cloud or self-hosted)
- Troubleshooting memory not working / peers not syncing
- Creating multi-profile setups where each agent has its own Honcho peer
-- Tuning observation, recall, or write frequency settings
-- Understanding what the 4 Honcho tools do and when to use them
+- Tuning observation, recall, dialectic depth, or write frequency settings
+- Understanding what the 5 Honcho tools do and when to use them
+- Configuring context budgets and session summary injection
## Setup
@@ -51,6 +52,27 @@ hermes honcho status # shows resolved config, connection test, peer info
## Architecture
### Base Context Injection
When Honcho injects context into the system prompt (in `hybrid` or `context` recall modes), it assembles the base context block in this order:
1. **Session summary** -- a short digest of the current session so far (placed first so the model has immediate conversational continuity)
2. **User representation** -- Honcho's accumulated model of the user (preferences, facts, patterns)
3. **AI peer card** -- the identity card for this Hermes profile's AI peer
The session summary is generated automatically by Honcho at the start of each turn (when a prior session exists). It gives the model a warm start without replaying full history.
### Cold / Warm Prompt Selection
Honcho automatically selects between two prompt strategies:
| Condition | Strategy | What happens |
|-----------|----------|--------------|
| No prior session or empty representation | **Cold start** | Lightweight intro prompt; skips summary injection; encourages the model to learn about the user |
| Existing representation and/or session history | **Warm start** | Full base context injection (summary → representation → card); richer system prompt |
You do not need to configure this -- it is automatic based on session state.
### Peers
Honcho models conversations as interactions between **peers**. Hermes creates two peers per session:
@@ -112,6 +134,63 @@ How the agent accesses Honcho memory:
| `context` | Yes | No (hidden) | Minimal token cost, no tool calls |
| `tools` | No | Yes | Agent controls all memory access explicitly |
## Three Orthogonal Knobs
Honcho's dialectic behavior is controlled by three independent dimensions. Each can be tuned without affecting the others:
### Cadence (when)
Controls **how often** dialectic and context calls happen.
| Key | Default | Description |
|-----|---------|-------------|
| `contextCadence` | `1` | Min turns between context API calls |
| `dialecticCadence` | `3` | Min turns between dialectic API calls |
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` for base context injection |
Higher cadence values reduce API calls and cost. `dialecticCadence: 3` (default) means the dialectic engine fires at most every 3rd turn.
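For instance, a cost-sensitive profile might stretch both cadences. A sketch using only the keys documented above (values illustrative):

```json
{
  "contextCadence": 2,
  "dialecticCadence": 5,
  "injectionFrequency": "every-turn"
}
```

With these values the base context refreshes at most every 2nd turn and the dialectic engine fires at most every 5th.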
### Depth (how many)
Controls **how many rounds** of dialectic reasoning Honcho performs per query.
| Key | Default | Range | Description |
|-----|---------|-------|-------------|
| `dialecticDepth` | `1` | 1-3 | Number of dialectic reasoning rounds per query |
| `dialecticDepthLevels` | -- | array | Optional per-depth-round level overrides (see below) |
`dialecticDepth: 2` means Honcho runs two rounds of dialectic synthesis. The first round produces an initial answer; the second refines it.
`dialecticDepthLevels` lets you set the reasoning level for each round independently:
```json
{
"dialecticDepth": 3,
"dialecticDepthLevels": ["low", "medium", "high"]
}
```
If `dialecticDepthLevels` is omitted, rounds use **proportional levels** derived from `dialecticReasoningLevel` (the base):
| Depth | Pass levels |
|-------|-------------|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |
This keeps earlier passes cheap while using full depth on the final synthesis.
### Level (how hard)
Controls the **intensity** of each dialectic reasoning round.
| Key | Default | Description |
|-----|---------|-------------|
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | When `true`, the model can pass `reasoning_level` to `honcho_reasoning` to override the default per-call. `false` = always use `dialecticReasoningLevel`, model overrides ignored |
Higher levels produce richer synthesis but cost more tokens on Honcho's backend.
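Depth and level compose independently of cadence. A sketch combining them (illustrative values only):

```json
{
  "dialecticDepth": 2,
  "dialecticDepthLevels": ["minimal", "medium"],
  "dialecticReasoningLevel": "low",
  "dialecticDynamic": true
}
```

Here pass 0 runs at `minimal` and the final synthesis at `medium`; because `dialecticDepthLevels` is set, the proportional defaults derived from `dialecticReasoningLevel` are ignored.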
## Multi-Profile Setup
Each Hermes profile gets its own Honcho AI peer while sharing the same workspace (user context). This means:
@@ -149,6 +228,7 @@ Override any setting in the host block:
"hermes.coder": {
"aiPeer": "coder",
"recallMode": "tools",
+"dialecticDepth": 2,
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
@@ -160,19 +240,97 @@ Override any setting in the host block:
## Tools
-The agent has 4 Honcho tools (hidden in `context` recall mode):
+The agent has 5 bidirectional Honcho tools (hidden in `context` recall mode):
| Tool | LLM call? | Cost | Use when |
|------|-----------|------|----------|
| `honcho_profile` | No | minimal | Quick factual snapshot at conversation start or for fast name/role/pref lookups |
| `honcho_search` | No | low | Fetch specific past facts to reason over yourself — raw excerpts, no synthesis |
| `honcho_context` | No | low | Full session context snapshot: summary, representation, card, recent messages |
| `honcho_reasoning` | Yes | medium-high | Natural language question synthesized by Honcho's dialectic engine |
| `honcho_conclude` | No | minimal | Write or delete a persistent fact; pass `peer: "ai"` for AI self-knowledge |
### `honcho_profile`
-Quick factual snapshot of the user -- name, role, preferences, patterns. No LLM call, minimal cost. Use at conversation start or for fast lookups.
+Read or update a peer card — curated key facts (name, role, preferences, communication style). Pass `card: [...]` to update; omit to read. No LLM call.
### `honcho_search`
-Semantic search over stored context. Returns raw excerpts ranked by relevance, no LLM synthesis. Default 800 tokens, max 2000. Use when you want specific past facts to reason over yourself.
+Semantic search over stored context for a specific peer. Returns raw excerpts ranked by relevance, no synthesis. Default 800 tokens, max 2000. Good when you need specific past facts to reason over yourself rather than a synthesized answer.
### `honcho_context`
-Natural language question answered by Honcho's dialectic reasoning (LLM call on Honcho's backend). Higher cost, higher quality. Can query about user (default) or the AI peer.
+Full session context snapshot from Honcho — session summary, peer representation, peer card, and recent messages. No LLM call. Use when you want to see everything Honcho knows about the current session and peer in one shot.
### `honcho_reasoning`
Natural language question answered by Honcho's dialectic reasoning engine (LLM call on Honcho's backend). Higher cost, higher quality. Pass `reasoning_level` to control depth: `minimal` (fast/cheap) → `low` → `medium` → `high` → `max` (thorough). Omit to use the configured default (`low`). Use for synthesized understanding of the user's patterns, goals, or current state.
### `honcho_conclude`
-Write a persistent fact about the user. Conclusions build the user's profile over time. Use when the user states a preference, corrects you, or shares something to remember.
+Write or delete a persistent conclusion about a peer. Pass `conclusion: "..."` to create. Pass `delete_id: "..."` to remove a conclusion (for PII removal — Honcho self-heals incorrect conclusions over time, so deletion is only needed for PII). You MUST pass exactly one of the two.
### Bidirectional peer targeting
All 5 tools accept an optional `peer` parameter:
- `peer: "user"` (default) — operates on the user peer
- `peer: "ai"` — operates on this profile's AI peer
- `peer: "<explicit-id>"` — any peer ID in the workspace
Examples:
```
honcho_profile # read user's card
honcho_profile peer="ai" # read AI peer's card
honcho_reasoning query="What does this user care about most?"
honcho_reasoning query="What are my interaction patterns?" peer="ai" reasoning_level="medium"
honcho_conclude conclusion="Prefers terse answers"
honcho_conclude conclusion="I tend to over-explain code" peer="ai"
honcho_conclude delete_id="abc123" # PII removal
```
## Agent Usage Patterns
Guidelines for Hermes when Honcho memory is active.
### On conversation start
```
1. honcho_profile → fast warmup, no LLM cost
2. If context looks thin → honcho_context (full snapshot, still no LLM)
3. If deep synthesis needed → honcho_reasoning (LLM call, use sparingly)
```
Do NOT call `honcho_reasoning` on every turn. Auto-injection already handles ongoing context refresh. Use the reasoning tool only when you genuinely need synthesized insight the base context doesn't provide.
### When the user shares something to remember
```
honcho_conclude conclusion="<specific, actionable fact>"
```
Good conclusions: "Prefers code examples over prose explanations", "Working on a Rust async project through April 2026"
Bad conclusions: "User said something about Rust" (too vague), "User seems technical" (already in representation)
### When the user asks about past context / you need to recall specifics
```
honcho_search query="<topic>" → fast, no LLM, good for specific facts
honcho_context → full snapshot with summary + messages
honcho_reasoning query="<question>" → synthesized answer, use when search isn't enough
```
### When to use `peer: "ai"`
Use AI peer targeting to build and query the agent's own self-knowledge:
- `honcho_conclude conclusion="I tend to be verbose when explaining architecture" peer="ai"` — self-correction
- `honcho_reasoning query="How do I typically handle ambiguous requests?" peer="ai"` — self-audit
- `honcho_profile peer="ai"` — review own identity card
### When NOT to call tools
In `hybrid` and `context` modes, base context (user representation + card + session summary) is auto-injected before every turn. Do not re-fetch what was already injected. Call tools only when:
- You need something the injected context doesn't have
- The user explicitly asks you to recall or check memory
- You're writing a conclusion about something new
### Cadence awareness
An explicit `honcho_reasoning` call has the same backend cost as an auto-injection dialectic firing. After an explicit tool call, the auto-injection cadence resets, so the same turn is never charged twice.
## Config Reference
@@ -191,18 +349,39 @@ Config file: `$HERMES_HOME/honcho.json` (profile-local) or `~/.honcho/config.jso
| `observation` | all on | Per-peer `observeMe`/`observeOthers` booleans |
| `writeFrequency` | `async` | `async`, `turn`, `session`, or integer N |
| `sessionStrategy` | `per-directory` | `per-directory`, `per-repo`, `per-session`, `global` |
-| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
-| `dialecticDynamic` | `true` | Auto-bump reasoning by query length. `false` = fixed level |
| `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
-| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input |
-### Cost-awareness (advanced, root config only)
+### Dialectic settings
| Key | Default | Description |
|-----|---------|-------------|
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | Auto-bump reasoning by query complexity. `false` = fixed level |
| `dialecticDepth` | `1` | Number of dialectic rounds per query (1-3) |
| `dialecticDepthLevels` | -- | Optional array of per-round levels, e.g. `["low", "high"]` |
| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input |
### Context budget and injection
| Key | Default | Description |
|-----|---------|-------------|
| `contextTokens` | uncapped | Max tokens for the combined base context injection (summary + representation + card). Opt-in cap — omit to leave uncapped, set to an integer to bound injection size. |
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` |
| `contextCadence` | `1` | Min turns between context API calls |
-| `dialecticCadence` | `1` | Min turns between dialectic API calls |
+| `dialecticCadence` | `3` | Min turns between dialectic LLM calls |
The `contextTokens` budget is enforced at injection time. If the session summary + representation + card exceed the budget, Honcho trims the summary first, then the representation, preserving the card. This prevents context blowup in long sessions.
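The trim priority can be sketched as follows. This is an illustration of the order described above, not the plugin's code; the helper name and the 4-chars-per-token estimate are assumptions:

```python
def trim_to_budget(summary: str, representation: str, card: str,
                   context_tokens: int) -> str:
    """Illustrative: drop the summary first, then the representation,
    preserving the card, until the block fits the budget."""
    budget_chars = context_tokens * 4  # rough chars-per-token estimate
    for parts in (
        [summary, representation, card],  # full block
        [representation, card],           # summary trimmed first
        [card],                           # representation trimmed next
    ):
        block = "\n\n".join(p for p in parts if p)
        if len(block) <= budget_chars:
            return block
    return card[:budget_chars]  # card alone still over budget: hard cut
```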
### Memory-context sanitization
Honcho sanitizes the `memory-context` block before injection to prevent prompt injection and malformed content:
- Strips XML/HTML tags from user-authored conclusions
- Normalizes whitespace and control characters
- Truncates individual conclusions that exceed `messageMaxChars`
- Escapes delimiter sequences that could break the system prompt structure
This fix addresses edge cases where raw user conclusions containing markup or special characters could corrupt the injected context block.
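The steps above can be sketched roughly like this. It is illustrative only, not the plugin's implementation; the regexes, limit constant, and fence-defanging choice are assumptions:

```python
import re

MESSAGE_MAX_CHARS = 25_000  # mirrors the messageMaxChars default

def sanitize_conclusion(text: str) -> str:
    """Illustrative sketch of the sanitization steps described above."""
    text = re.sub(r'<[^>]+>', '', text)               # strip XML/HTML tags
    text = re.sub(r'[\x00-\x08\x0b-\x1f]', '', text)  # drop control characters
    text = re.sub(r'\s+', ' ', text).strip()          # normalize whitespace
    text = text.replace('```', "'''")                 # defang fence delimiters
    return text[:MESSAGE_MAX_CHARS]                   # truncate oversized entries
```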
## Troubleshooting
@@ -221,6 +400,12 @@ Observation config is synced from the server on each session init. Start a new s
### Messages truncated
Messages over `messageMaxChars` (default 25k) are automatically chunked with `[continued]` markers. If you're hitting this often, check if tool results or skill content is inflating message size.
### Context injection too large
If you see warnings about context budget exceeded, lower `contextTokens` or reduce `dialecticDepth`. The session summary is trimmed first when the budget is tight.
### Session summary missing
Session summary requires at least one prior turn in the current Honcho session. On cold start (new session, no history), the summary is omitted and Honcho uses the cold-start prompt strategy instead.
## CLI Commands
| Command | Description |


@ -1,6 +1,6 @@
# Honcho Memory Provider
-AI-native cross-session user modeling with dialectic Q&A, semantic search, peer cards, and persistent conclusions.
+AI-native cross-session user modeling with multi-pass dialectic reasoning, session summaries, bidirectional peer tools, and persistent conclusions.
> **Honcho docs:** <https://docs.honcho.dev/v3/guides/integrations/hermes>
@@ -19,9 +19,86 @@ hermes memory setup # generic picker, also works
Or manually:
```bash
hermes config set memory.provider honcho
-echo "HONCHO_API_KEY=your-key" >> ~/.hermes/.env
+echo "HONCHO_API_KEY=***" >> ~/.hermes/.env
```
## Architecture Overview
### Two-Layer Context Injection
Context is injected into the **user message** at API-call time (not the system prompt) to preserve prompt caching. Only a static mode header goes in the system prompt. The injected block is wrapped in `<memory-context>` fences with a system note clarifying it's background data, not new user input.
Two independent layers, each on its own cadence:
**Layer 1 — Base context** (refreshed every `contextCadence` turns):
1. **SESSION SUMMARY** — from `session.context(summary=True)`, placed first
2. **User Representation** — Honcho's evolving model of the user
3. **User Peer Card** — key facts snapshot
4. **AI Self-Representation** — Honcho's model of the AI peer
5. **AI Identity Card** — AI peer facts
**Layer 2 — Dialectic supplement** (fired every `dialecticCadence` turns):
Multi-pass `.chat()` reasoning about the user, appended after base context.
Both layers are joined, then truncated to fit `contextTokens` budget via `_truncate_to_budget` (tokens × 4 chars, word-boundary safe).
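A rough sketch of what `_truncate_to_budget` might look like, based only on the description above (tokens × 4 chars, word-boundary safe); not the actual implementation:

```python
def truncate_to_budget(text: str, context_tokens: int) -> str:
    """Cut text to ~context_tokens * 4 chars, backing up to a word boundary."""
    max_chars = context_tokens * 4
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    boundary = cut.rfind(' ')  # back up so no word is split mid-way
    return cut[:boundary] if boundary > 0 else cut
```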
### Cold Start vs Warm Session Prompts
Dialectic pass 0 automatically selects its prompt based on session state:
- **Cold** (no base context cached): "Who is this person? What are their preferences, goals, and working style? Focus on facts that would help an AI assistant be immediately useful."
- **Warm** (base context exists): "Given what's been discussed in this session so far, what context about this user is most relevant to the current conversation? Prioritize active context over biographical facts."
Not configurable — determined automatically.
### Dialectic Depth (Multi-Pass Reasoning)
`dialecticDepth` (1-3, clamped) controls how many `.chat()` calls fire per dialectic cycle:
| Depth | Passes | Behavior |
|-------|--------|----------|
| 1 | single `.chat()` | Base query only (cold or warm prompt) |
| 2 | audit + synthesis | Pass 0 result is self-audited; pass 1 does targeted synthesis. Conditional bail-out if pass 0 returns strong signal (>300 chars or structured with bullets/sections >100 chars) |
| 3 | audit + synthesis + reconciliation | Pass 2 reconciles contradictions across prior passes into a final synthesis |
### Proportional Reasoning Levels
When `dialecticDepthLevels` is not set, each pass uses a proportional level relative to `dialecticReasoningLevel` (the "base"):
| Depth | Pass levels |
|-------|-------------|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |
Override with `dialecticDepthLevels`: an explicit array of reasoning level strings per pass.
### Three Orthogonal Dialectic Knobs
| Knob | Controls | Type |
|------|----------|------|
| `dialecticCadence` | How often — minimum turns between dialectic firings | int |
| `dialecticDepth` | How many — passes per firing (1-3) | int |
| `dialecticReasoningLevel` | How hard — reasoning ceiling per `.chat()` call | string |
### Input Sanitization
`run_conversation` strips leaked `<memory-context>` blocks from user input before processing. When `saveMessages` persists a turn that included injected context, the block can reappear in subsequent turns via message history. The sanitizer removes `<memory-context>` blocks plus associated system notes.
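The stripping behavior can be exercised directly; the regexes below are reproduced from the `sanitize_context` change in this commit (the helper name here is hypothetical):

```python
import re

_INTERNAL_CONTEXT_RE = re.compile(
    r'<\s*memory-context\s*>[\s\S]*?</\s*memory-context\s*>',
    re.IGNORECASE,
)
_FENCE_TAG_RE = re.compile(r'</?\s*memory-context\s*>', re.IGNORECASE)

def strip_leaked_context(text: str) -> str:
    """Remove full <memory-context> blocks, then any stray fence tags."""
    text = _INTERNAL_CONTEXT_RE.sub('', text)
    return _FENCE_TAG_RE.sub('', text)
```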
## Tools
Five bidirectional tools. All accept an optional `peer` parameter (`"user"` or `"ai"`, default `"user"`).
| Tool | LLM call? | Description |
|------|-----------|-------------|
| `honcho_profile` | No | Peer card — key facts snapshot |
| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
| `honcho_context` | No | Full session context: summary, representation, card, messages |
| `honcho_reasoning` | Yes | LLM-synthesized answer via dialectic `.chat()` |
| `honcho_conclude` | No | Write a persistent fact/conclusion about the user |
Tool visibility depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
## Config Resolution
Config is read from the first file that exists:
@@ -34,42 +111,128 @@ Config is read from the first file that exists:
Host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>`.
-## Tools
-| Tool | LLM call? | Description |
-|------|-----------|-------------|
-| `honcho_profile` | No | User's peer card -- key facts snapshot |
-| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
-| `honcho_context` | Yes | LLM-synthesized answer via dialectic reasoning |
-| `honcho_conclude` | No | Write a persistent fact about the user |
-Tool availability depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
For every key, resolution order is: **host block > root > env var > default**.
## Full Configuration Reference
### Identity & Connection
-| Key | Type | Default | Scope | Description |
-|-----|------|---------|-------|-------------|
-| `apiKey` | string | -- | root / host | API key. Falls back to `HONCHO_API_KEY` env var |
-| `baseUrl` | string | -- | root | Base URL for self-hosted Honcho. Local URLs (`localhost`, `127.0.0.1`, `::1`) auto-skip API key auth |
-| `environment` | string | `"production"` | root / host | SDK environment mapping |
-| `enabled` | bool | auto | root / host | Master toggle. Auto-enables when `apiKey` or `baseUrl` present |
-| `workspace` | string | host key | root / host | Honcho workspace ID |
-| `peerName` | string | -- | root / host | User peer identity |
-| `aiPeer` | string | host key | root / host | AI peer identity |
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `apiKey` | string | — | API key. Falls back to `HONCHO_API_KEY` env var |
+| `baseUrl` | string | — | Base URL for self-hosted Honcho. Local URLs auto-skip API key auth |
+| `environment` | string | `"production"` | SDK environment mapping |
+| `enabled` | bool | auto | Master toggle. Auto-enables when `apiKey` or `baseUrl` present |
+| `workspace` | string | host key | Honcho workspace ID. Shared environment — all profiles in the same workspace can see the same user identity and related memories |
+| `peerName` | string | — | User peer identity |
+| `aiPeer` | string | host key | AI peer identity |
### Memory & Recall
-| Key | Type | Default | Scope | Description |
-|-----|------|---------|-------|-------------|
-| `recallMode` | string | `"hybrid"` | root / host | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` normalizes to `"hybrid"` |
-| `observationMode` | string | `"directional"` | root / host | Shorthand preset: `"directional"` (all on) or `"unified"` (shared pool). Use `observation` object for granular control |
-| `observation` | object | -- | root / host | Per-peer observation config (see below) |
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `recallMode` | string | `"hybrid"` | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` → `"hybrid"` |
+| `observationMode` | string | `"directional"` | Preset: `"directional"` (all on) or `"unified"` (shared pool). Use `observation` object for granular control |
+| `observation` | object | — | Per-peer observation config (see Observation section) |
-#### Observation (granular)
+### Write Behavior
-Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. Set at root or per host block -- each profile can have different observation settings. When present, overrides `observationMode` preset.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `writeFrequency` | string/int | `"async"` | `"async"` (background), `"turn"` (sync per turn), `"session"` (batch on end), or integer N (every N turns) |
| `saveMessages` | bool | `true` | Persist messages to Honcho API |
### Session Resolution
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `sessionStrategy` | string | `"per-directory"` | `"per-directory"`, `"per-session"`, `"per-repo"` (git root), `"global"` |
| `sessionPeerPrefix` | bool | `false` | Prepend peer name to session keys |
| `sessions` | object | `{}` | Manual directory-to-session-name mappings |
#### Session Name Resolution
The Honcho session name determines which conversation bucket memory lands in. Resolution follows a priority chain — first match wins:
| Priority | Source | Example session name |
|----------|--------|---------------------|
| 1 | Manual map (`sessions` config) | `"myproject-main"` |
| 2 | `/title` command (mid-session rename) | `"refactor-auth"` |
| 3 | Gateway session key (Telegram, Discord, etc.) | `"agent-main-telegram-dm-8439114563"` |
| 4 | `per-session` strategy | Hermes session ID (`20260415_a3f2b1`) |
| 5 | `per-repo` strategy | Git root directory name (`hermes-agent`) |
| 6 | `per-directory` strategy | Current directory basename (`src`) |
| 7 | `global` strategy | Workspace name (`hermes`) |
Gateway platforms always resolve via priority 3 (per-chat isolation) regardless of `sessionStrategy`. The strategy setting only affects CLI sessions.
If `sessionPeerPrefix` is `true`, the peer name is prepended: `eri-hermes-agent`.
#### What each strategy produces
- **`per-directory`** — basename of `$PWD`. Opening hermes in `~/code/myapp` and `~/code/other` gives two separate sessions. Same directory = same session across runs.
- **`per-repo`** — git root directory name. All subdirectories within a repo share one session. Falls back to `per-directory` if not inside a git repo.
- **`per-session`** — Hermes session ID (timestamp + hex). Every `hermes` invocation starts a fresh Honcho session. Falls back to `per-directory` if no session ID is available.
- **`global`** — workspace name. One session for everything. Memory accumulates across all directories and runs.
### Multi-Profile Pattern
Multiple Hermes profiles can share one workspace while maintaining separate AI identities. Config resolution is **host block > root > env var > default** — host blocks inherit from root, so shared settings only need to be declared once:
```json
{
"apiKey": "***",
"workspace": "hermes",
"peerName": "yourname",
"hosts": {
"hermes": {
"aiPeer": "hermes",
"recallMode": "hybrid",
"sessionStrategy": "per-directory"
},
"hermes.coder": {
"aiPeer": "coder",
"recallMode": "tools",
"sessionStrategy": "per-repo"
}
}
}
```
Both profiles see the same user (`yourname`) in the same shared environment (`hermes`), but each AI peer builds its own observations, conclusions, and behavior patterns. The coder's memory stays code-oriented; the main agent's stays broad.
Host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>` (e.g. `hermes -p coder` → host key `hermes.coder`).
### Dialectic & Reasoning
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `dialecticDepth` | int | `1` | Passes per dialectic cycle (1-3, clamped). 1=single query, 2=audit+synthesis, 3=audit+synthesis+reconciliation |
| `dialecticDepthLevels` | array | — | Optional array of reasoning level strings per pass. Overrides proportional defaults. Example: `["minimal", "low", "medium"]` |
| `dialecticReasoningLevel` | string | `"low"` | Base reasoning level for `.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
| `dialecticDynamic` | bool | `true` | When `true`, model can override reasoning level per-call via `honcho_reasoning` tool. When `false`, always uses `dialecticReasoningLevel` |
| `dialecticMaxChars` | int | `600` | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | Max chars for dialectic query input to `.chat()`. Honcho cloud limit: 10k |
### Token Budgets
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `contextTokens` | int | SDK default | Token budget for `context()` API calls. Also gates prefetch truncation (tokens × 4 chars) |
| `messageMaxChars` | int | `25000` | Max chars per message sent via `add_messages()`. Exceeding this triggers chunking with `[continued]` markers. Honcho cloud limit: 25k |
### Cadence (Cost Control)
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `contextCadence` | int | `1` | Minimum turns between base context refreshes (session summary + representation + card) |
| `dialecticCadence` | int | `3` | Minimum turns between dialectic `.chat()` firings |
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context on the first user message only, skip from turn 2 onward) |
| `reasoningLevelCap` | string | — | Hard cap on reasoning level: `"minimal"`, `"low"`, `"medium"`, `"high"` |
### Observation (Granular)
Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. When present, overrides `observationMode` preset.
```json
"observation": {
@@ -85,74 +248,16 @@ Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. Set at root or per host block
| `ai.observeMe` | `true` | AI peer self-observation (Honcho builds AI representation) |
| `ai.observeOthers` | `true` | AI peer observes user messages (enables cross-peer dialectic) |
Presets for `observationMode`:
- `"directional"` (default): all four booleans `true`
- `"unified"`: user `observeMe=true`, AI `observeOthers=true`, rest `false`
Per-profile example -- coder profile observes the user but user doesn't observe coder:
```json
"hosts": {
"hermes.coder": {
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
}
}
}
```
Settings changed in the [Honcho dashboard](https://app.honcho.dev) are synced back on session init.
### Write Behavior
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `writeFrequency` | string or int | `"async"` | root / host | `"async"` (background thread), `"turn"` (sync per turn), `"session"` (batch on end), or integer N (every N turns) |
| `saveMessages` | bool | `true` | root / host | Whether to persist messages to Honcho API |
### Session Resolution
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `sessionStrategy` | string | `"per-directory"` | root / host | `"per-directory"`, `"per-session"` (new each run), `"per-repo"` (git root name), `"global"` (single session) |
| `sessionPeerPrefix` | bool | `false` | root / host | Prepend peer name to session keys |
| `sessions` | object | `{}` | root | Manual directory-to-session-name mappings: `{"/path/to/project": "my-session"}` |
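Session key resolution under these strategies can be sketched as (hypothetical helper; names and key formats are illustrative):

```python
import os

def resolve_session_key(strategy, cwd, run_id, manual_map=None):
    manual_map = manual_map or {}
    if cwd in manual_map:             # explicit `sessions` mapping wins
        return manual_map[cwd]
    if strategy == "per-session":
        return f"run-{run_id}"        # fresh session each run
    if strategy == "global":
        return "global"
    if strategy == "per-repo":
        return os.path.basename(cwd)  # stand-in for the git root name
    return cwd.replace(os.sep, "-").strip("-")  # per-directory

print(resolve_session_key("per-directory", "/home/me/proj", "abc123"))
```

Manual `sessions` mappings take priority, so a directory can be pinned to a stable session name regardless of strategy.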
### Token Budgets & Dialectic
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `contextTokens` | int | SDK default | root / host | Token budget for `context()` API calls. Also gates prefetch truncation (tokens × 4 chars) |
| `dialecticReasoningLevel` | string | `"low"` | root / host | Base reasoning level for `peer.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
| `dialecticDynamic` | bool | `true` | root / host | Auto-bump reasoning based on query length: `<120` chars = base level, `120-400` = +1, `>400` = +2 (capped at `"high"`). Set `false` to always use `dialecticReasoningLevel` as-is |
| `dialecticMaxChars` | int | `600` | root / host | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | root / host | Max chars for dialectic query input to `peer.chat()`. Honcho cloud limit: 10k |
| `messageMaxChars` | int | `25000` | root / host | Max chars per message sent via `add_messages()`. Messages exceeding this are chunked with `[continued]` markers. Honcho cloud limit: 25k |
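The length-based auto-bump rule in the `dialecticDynamic` row above can be sketched as (illustrative):

```python
LEVELS = ("minimal", "low", "medium", "high", "max")

def bump_level(base, query, cap="high"):
    # <120 chars: base level; 120-400 chars: +1; >400 chars: +2, capped at `cap`.
    bump = 0 if len(query) < 120 else (1 if len(query) <= 400 else 2)
    idx = min(LEVELS.index(base) + bump, LEVELS.index(cap))
    return LEVELS[idx]

print(bump_level("low", "short question"))  # low
print(bump_level("low", "q" * 500))         # high
```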
### Cost Awareness (Advanced)
These are read from the root config object, not the host block. Must be set manually in `honcho.json`.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context only on turn 0) |
| `contextCadence` | int | `1` | Minimum turns between `context()` API calls |
| `dialecticCadence` | int | `1` | Minimum turns between `peer.chat()` API calls |
| `reasoningLevelCap` | string | -- | Hard cap on auto-bumped reasoning: `"minimal"`, `"low"`, `"mid"`, `"high"` |
### Hardcoded Limits (Not Configurable)
| Limit | Value | Location |
|-------|-------|----------|
| Search tool max tokens | 2000 (hard cap), 800 (default) | `__init__.py` handle_tool_call |
| Peer card fetch tokens | 200 | `session.py` get_peer_card |
## Config Precedence
For every key, resolution order is: **host block > root > env var > default**.
Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile>`) > `"hermes"`.
| Limit | Value |
|-------|-------|
| Search tool max tokens | 2000 (hard cap), 800 (default) |
| Peer card fetch tokens | 200 |
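The host > root > env var > default precedence can be sketched as (illustrative):

```python
import os

def resolve(key, host_cfg, root_cfg, env_var=None, default=None):
    # Resolution order: host block > root > env var > default
    if key in host_cfg:
        return host_cfg[key]
    if key in root_cfg:
        return root_cfg[key]
    if env_var and os.environ.get(env_var) is not None:
        return os.environ[env_var]
    return default

root = {"dialecticCadence": 3, "contextCadence": 2}
host = {"dialecticCadence": 1}
print(resolve("dialecticCadence", host, root))  # 1 (host wins)
print(resolve("contextCadence", host, root))    # 2 (falls back to root)
```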
## Environment Variables
@ -182,15 +287,16 @@ Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile
```json
{
"apiKey": "your-key",
"apiKey": "***",
"workspace": "hermes",
"peerName": "eri",
"peerName": "username",
"contextCadence": 2,
"dialecticCadence": 3,
"dialecticDepth": 2,
"hosts": {
"hermes": {
"enabled": true,
"aiPeer": "hermes",
"workspace": "hermes",
"peerName": "eri",
"recallMode": "hybrid",
"observation": {
"user": { "observeMe": true, "observeOthers": true },
@ -199,14 +305,16 @@ Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile
"writeFrequency": "async",
"sessionStrategy": "per-directory",
"dialecticReasoningLevel": "low",
"dialecticDepth": 2,
"dialecticMaxChars": 600,
"saveMessages": true
},
"hermes.coder": {
"enabled": true,
"aiPeer": "coder",
"workspace": "hermes",
"peerName": "eri",
"sessionStrategy": "per-repo",
"dialecticDepth": 1,
"dialecticDepthLevels": ["low"],
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }


@ -17,6 +17,7 @@ from __future__ import annotations
import json
import logging
import re
import threading
from typing import Any, Dict, List, Optional
@ -33,20 +34,33 @@ logger = logging.getLogger(__name__)
PROFILE_SCHEMA = {
"name": "honcho_profile",
"description": (
"Retrieve the user's peer card from Honcho — a curated list of key facts "
"about them (name, role, preferences, communication style, patterns). "
"Fast, no LLM reasoning, minimal cost. "
"Use this at conversation start or when you need a quick factual snapshot."
"Retrieve or update a peer card from Honcho — a curated list of key facts "
"about that peer (name, role, preferences, communication style, patterns). "
"Pass `card` to update; omit `card` to read."
),
"parameters": {"type": "object", "properties": {}, "required": []},
"parameters": {
"type": "object",
"properties": {
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
"card": {
"type": "array",
"items": {"type": "string"},
"description": "New peer card as a list of fact strings. Omit to read the current card.",
},
},
"required": [],
},
}
SEARCH_SCHEMA = {
"name": "honcho_search",
"description": (
"Semantic search over Honcho's stored context about the user. "
"Semantic search over Honcho's stored context about a peer. "
"Returns raw excerpts ranked by relevance — no LLM synthesis. "
"Cheaper and faster than honcho_context. "
"Cheaper and faster than honcho_reasoning. "
"Good when you want to find specific past facts and reason over them yourself."
),
"parameters": {
@ -60,17 +74,23 @@ SEARCH_SCHEMA = {
"type": "integer",
"description": "Token budget for returned context (default 800, max 2000).",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": ["query"],
},
}
CONTEXT_SCHEMA = {
"name": "honcho_context",
REASONING_SCHEMA = {
"name": "honcho_reasoning",
"description": (
"Ask Honcho a natural language question and get a synthesized answer. "
"Uses Honcho's LLM (dialectic reasoning) — higher cost than honcho_profile or honcho_search. "
"Can query about any peer: the user (default) or the AI assistant."
"Can query about any peer via alias or explicit peer ID. "
"Pass reasoning_level to control depth: minimal (fast/cheap), low (default), "
"medium, high, max (deep/expensive). Omit for configured default."
),
"parameters": {
"type": "object",
@ -79,37 +99,87 @@ CONTEXT_SCHEMA = {
"type": "string",
"description": "A natural language question.",
},
"reasoning_level": {
"type": "string",
"description": (
"Override the default reasoning depth. "
"Omit to use the configured default (typically low). "
"Guide:\n"
"- minimal: quick factual lookups (name, role, simple preference)\n"
"- low: straightforward questions with clear answers\n"
"- medium: multi-aspect questions requiring synthesis across observations\n"
"- high: complex behavioral patterns, contradictions, deep analysis\n"
"- max: thorough audit-level analysis, leave no stone unturned"
),
"enum": ["minimal", "low", "medium", "high", "max"],
},
"peer": {
"type": "string",
"description": "Which peer to query about: 'user' (default) or 'ai'.",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": ["query"],
},
}
CONTEXT_SCHEMA = {
"name": "honcho_context",
"description": (
"Retrieve full session context from Honcho — summary, peer representation, "
"peer card, and recent messages. No LLM synthesis. "
"Cheaper than honcho_reasoning. Use this to see what Honcho knows about "
"the current conversation and the specified peer."
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Optional focus query to filter context. Omit for full session context snapshot.",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": [],
},
}
CONCLUDE_SCHEMA = {
"name": "honcho_conclude",
"description": (
"Write a conclusion about the user back to Honcho's memory. "
"Conclusions are persistent facts that build the user's profile. "
"Use when the user states a preference, corrects you, or shares "
"something to remember across sessions."
"Write or delete a conclusion about a peer in Honcho's memory. "
"Conclusions are persistent facts that build a peer's profile. "
"You MUST pass exactly one of: `conclusion` (to create) or `delete_id` (to delete). "
"Passing neither is an error. "
"Deletion is only for PII removal — Honcho self-heals incorrect conclusions over time."
),
"parameters": {
"type": "object",
"properties": {
"conclusion": {
"type": "string",
"description": "A factual statement about the user to persist.",
}
"description": "A factual statement to persist. Required when not using delete_id.",
},
"delete_id": {
"type": "string",
"description": "Conclusion ID to delete (for PII removal). Required when not using conclusion.",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": ["conclusion"],
"anyOf": [
{"required": ["conclusion"]},
{"required": ["delete_id"]},
],
},
}
ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, REASONING_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
# ---------------------------------------------------------------------------
@ -131,16 +201,18 @@ class HonchoMemoryProvider(MemoryProvider):
# B1: recall_mode — set during initialize from config
self._recall_mode = "hybrid" # "context", "tools", or "hybrid"
# B4: First-turn context baking
self._first_turn_context: Optional[str] = None
self._first_turn_lock = threading.Lock()
# Base context cache — refreshed on context_cadence, not frozen
self._base_context_cache: Optional[str] = None
self._base_context_lock = threading.Lock()
# B5: Cost-awareness turn counting and cadence
self._turn_count = 0
self._injection_frequency = "every-turn" # or "first-turn"
self._context_cadence = 1 # minimum turns between context API calls
self._dialectic_cadence = 1 # minimum turns between dialectic API calls
self._reasoning_level_cap: Optional[str] = None # "minimal", "low", "mid", "high"
self._dialectic_cadence = 3 # minimum turns between dialectic API calls
self._dialectic_depth = 1 # how many .chat() calls per dialectic cycle (1-3)
self._dialectic_depth_levels: list[str] | None = None # per-pass reasoning levels
self._reasoning_level_cap: Optional[str] = None # "minimal", "low", "medium", "high"
self._last_context_turn = -999
self._last_dialectic_turn = -999
@ -236,9 +308,11 @@ class HonchoMemoryProvider(MemoryProvider):
raw = cfg.raw or {}
self._injection_frequency = raw.get("injectionFrequency", "every-turn")
self._context_cadence = int(raw.get("contextCadence", 1))
self._dialectic_cadence = int(raw.get("dialecticCadence", 1))
self._dialectic_cadence = int(raw.get("dialecticCadence", 3))
self._dialectic_depth = max(1, min(cfg.dialectic_depth, 3))
self._dialectic_depth_levels = cfg.dialectic_depth_levels
cap = raw.get("reasoningLevelCap")
if cap and cap in ("minimal", "low", "mid", "high"):
if cap and cap in ("minimal", "low", "medium", "high"):
self._reasoning_level_cap = cap
except Exception as e:
logger.debug("Honcho cost-awareness config parse error: %s", e)
@ -251,9 +325,7 @@ class HonchoMemoryProvider(MemoryProvider):
# ----- Port #1957: lazy session init for tools-only mode -----
if self._recall_mode == "tools":
if cfg.init_on_session_start:
# Eager init: create session now so sync_turn() works from turn 1.
# Does NOT enable auto-injection — prefetch() still returns empty.
logger.debug("Honcho tools-only mode — eager session init (initOnSessionStart=true)")
# Eager init even in tools mode (opt-in)
self._do_session_init(cfg, session_id, **kwargs)
return
# Defer actual session creation until first tool call
@ -287,8 +359,13 @@ class HonchoMemoryProvider(MemoryProvider):
# ----- B3: resolve_session_name -----
session_title = kwargs.get("session_title")
gateway_session_key = kwargs.get("gateway_session_key")
self._session_key = (
cfg.resolve_session_name(session_title=session_title, session_id=session_id)
cfg.resolve_session_name(
session_title=session_title,
session_id=session_id,
gateway_session_key=gateway_session_key,
)
or session_id
or "hermes-default"
)
@ -299,12 +376,21 @@ class HonchoMemoryProvider(MemoryProvider):
self._session_initialized = True
# ----- B6: Memory file migration (one-time, for new sessions) -----
# Skip under per-session strategy: every Hermes run creates a fresh
# Honcho session by design, so uploading MEMORY.md/USER.md/SOUL.md to
# each one would flood the backend with short-lived duplicates instead
# of performing a one-time migration.
try:
if not session.messages:
if not session.messages and cfg.session_strategy != "per-session":
from hermes_constants import get_hermes_home
mem_dir = str(get_hermes_home() / "memories")
self._manager.migrate_memory_files(self._session_key, mem_dir)
logger.debug("Honcho memory file migration attempted for new session: %s", self._session_key)
elif cfg.session_strategy == "per-session":
logger.debug(
"Honcho memory file migration skipped: per-session strategy creates a fresh session per run (%s)",
self._session_key,
)
except Exception as e:
logger.debug("Honcho memory file migration skipped: %s", e)
@ -347,6 +433,11 @@ class HonchoMemoryProvider(MemoryProvider):
"""Format the prefetch context dict into a readable system prompt block."""
parts = []
# Session summary — session-scoped context, placed first for relevance
summary = ctx.get("summary", "")
if summary:
parts.append(f"## Session Summary\n{summary}")
rep = ctx.get("representation", "")
if rep:
parts.append(f"## User Representation\n{rep}")
@ -370,9 +461,9 @@ class HonchoMemoryProvider(MemoryProvider):
def system_prompt_block(self) -> str:
"""Return system prompt text, adapted by recall_mode.
B4: On the FIRST call, fetch and bake the full Honcho context
(user representation, peer card, AI representation, continuity synthesis).
Subsequent calls return the cached block for prompt caching stability.
Returns only the mode header and tool instructions: static text
that doesn't change between turns (prompt-cache friendly).
Live context (representation, card) is injected via prefetch().
"""
if self._cron_skipped:
return ""
@ -382,24 +473,10 @@ class HonchoMemoryProvider(MemoryProvider):
return (
"# Honcho Memory\n"
"Active (tools-only mode). Use honcho_profile, honcho_search, "
"honcho_context, and honcho_conclude tools to access user memory."
"honcho_reasoning, honcho_context, and honcho_conclude tools to access user memory."
)
return ""
# ----- B4: First-turn context baking -----
first_turn_block = ""
if self._recall_mode in ("context", "hybrid"):
with self._first_turn_lock:
if self._first_turn_context is None:
# First call — fetch and cache
try:
ctx = self._manager.get_prefetch_context(self._session_key)
self._first_turn_context = self._format_first_turn_context(ctx) if ctx else ""
except Exception as e:
logger.debug("Honcho first-turn context fetch failed: %s", e)
self._first_turn_context = ""
first_turn_block = self._first_turn_context
# ----- B1: adapt text based on recall_mode -----
if self._recall_mode == "context":
header = (
@ -412,7 +489,8 @@ class HonchoMemoryProvider(MemoryProvider):
header = (
"# Honcho Memory\n"
"Active (tools-only mode). Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for synthesized answers, "
"honcho_search for raw excerpts, honcho_context for raw peer context, "
"honcho_reasoning for synthesized answers, "
"honcho_conclude to save facts about the user. "
"No automatic context injection — you must use tools to access memory."
)
@ -421,16 +499,19 @@ class HonchoMemoryProvider(MemoryProvider):
"# Honcho Memory\n"
"Active (hybrid mode). Relevant context is auto-injected AND memory tools are available. "
"Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for synthesized answers, "
"honcho_search for raw excerpts, honcho_context for raw peer context, "
"honcho_reasoning for synthesized answers, "
"honcho_conclude to save facts about the user."
)
if first_turn_block:
return f"{header}\n\n{first_turn_block}"
return header
def prefetch(self, query: str, *, session_id: str = "") -> str:
"""Return prefetched dialectic context from background thread.
"""Return base context (representation + card) plus dialectic supplement.
Assembles two layers:
1. Base context from peer.context(), cached and refreshed on context_cadence
2. Dialectic supplement, cached and refreshed on dialectic_cadence
B1: Returns empty when recall_mode is "tools" (no injection).
B5: Respects injection_frequency "first-turn": returns empty past the first turn (turn counts are 1-indexed).
@ -443,22 +524,95 @@ class HonchoMemoryProvider(MemoryProvider):
if self._recall_mode == "tools":
return ""
# B5: injection_frequency — if "first-turn" and past first turn, return empty
if self._injection_frequency == "first-turn" and self._turn_count > 0:
# B5: injection_frequency — if "first-turn" and past first turn, return empty.
# _turn_count is 1-indexed (first user message = 1), so > 1 means "past first".
if self._injection_frequency == "first-turn" and self._turn_count > 1:
return ""
parts = []
# ----- Layer 1: Base context (representation + card) -----
# On first call, fetch synchronously so turn 1 isn't empty.
# After that, serve from cache and refresh in background on cadence.
with self._base_context_lock:
if self._base_context_cache is None:
# First call — synchronous fetch
try:
ctx = self._manager.get_prefetch_context(self._session_key)
self._base_context_cache = self._format_first_turn_context(ctx) if ctx else ""
self._last_context_turn = self._turn_count
except Exception as e:
logger.debug("Honcho base context fetch failed: %s", e)
self._base_context_cache = ""
base_context = self._base_context_cache
# Check if background context prefetch has a fresher result
if self._manager:
fresh_ctx = self._manager.pop_context_result(self._session_key)
if fresh_ctx:
formatted = self._format_first_turn_context(fresh_ctx)
if formatted:
with self._base_context_lock:
self._base_context_cache = formatted
base_context = formatted
if base_context:
parts.append(base_context)
# ----- Layer 2: Dialectic supplement -----
# On the very first turn, no queue_prefetch() has run yet so the
# dialectic result is empty. Run with a bounded timeout so a slow
# Honcho connection doesn't block the first response indefinitely.
# On timeout the result is skipped and queue_prefetch() will pick it
# up at the next cadence-allowed turn.
if self._last_dialectic_turn == -999 and query:
_first_turn_timeout = (
self._config.timeout if self._config and self._config.timeout else 8.0
)
_result_holder: list[str] = []
def _run_first_turn() -> None:
try:
_result_holder.append(self._run_dialectic_depth(query))
except Exception as exc:
logger.debug("Honcho first-turn dialectic failed: %s", exc)
_t = threading.Thread(target=_run_first_turn, daemon=True)
_t.start()
_t.join(timeout=_first_turn_timeout)
if not _t.is_alive():
first_turn_dialectic = _result_holder[0] if _result_holder else ""
if first_turn_dialectic and first_turn_dialectic.strip():
with self._prefetch_lock:
self._prefetch_result = first_turn_dialectic
self._last_dialectic_turn = self._turn_count
else:
logger.debug(
"Honcho first-turn dialectic timed out (%.1fs) — "
"will inject at next cadence-allowed turn",
_first_turn_timeout,
)
# Don't update _last_dialectic_turn: queue_prefetch() will
# retry at the next cadence-allowed turn via the async path.
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
result = self._prefetch_result
dialectic_result = self._prefetch_result
self._prefetch_result = ""
if not result:
if dialectic_result and dialectic_result.strip():
parts.append(dialectic_result)
if not parts:
return ""
result = "\n\n".join(parts)
# ----- Port #3265: token budget enforcement -----
result = self._truncate_to_budget(result)
return f"## Honcho Context\n{result}"
return result
def _truncate_to_budget(self, text: str) -> str:
"""Truncate text to fit within context_tokens budget if set."""
@ -475,9 +629,11 @@ class HonchoMemoryProvider(MemoryProvider):
return truncated + "…"
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
"""Fire a background dialectic query for the upcoming turn.
"""Fire background prefetch threads for the upcoming turn.
B5: Checks cadence before firing background threads.
B5: Checks cadence independently for dialectic and context refresh.
Context refresh updates the base layer (representation + card).
Dialectic fires the LLM reasoning supplement.
"""
if self._cron_skipped:
return
@ -488,6 +644,15 @@ class HonchoMemoryProvider(MemoryProvider):
if self._recall_mode == "tools":
return
# ----- Context refresh (base layer) — independent cadence -----
if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
self._last_context_turn = self._turn_count
try:
self._manager.prefetch_context(self._session_key, query)
except Exception as e:
logger.debug("Honcho context prefetch failed: %s", e)
# ----- Dialectic prefetch (supplement layer) -----
# B5: cadence check — skip if too soon since last dialectic call
if self._dialectic_cadence > 1:
if (self._turn_count - self._last_dialectic_turn) < self._dialectic_cadence:
@ -499,9 +664,7 @@ class HonchoMemoryProvider(MemoryProvider):
def _run():
try:
result = self._manager.dialectic_query(
self._session_key, query, peer="user"
)
result = self._run_dialectic_depth(query)
if result and result.strip():
with self._prefetch_lock:
self._prefetch_result = result
@ -513,13 +676,140 @@ class HonchoMemoryProvider(MemoryProvider):
)
self._prefetch_thread.start()
# Also fire context prefetch if cadence allows
if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
self._last_context_turn = self._turn_count
try:
self._manager.prefetch_context(self._session_key, query)
except Exception as e:
logger.debug("Honcho context prefetch failed: %s", e)
# ----- Dialectic depth: multi-pass .chat() with cold/warm prompts -----
# Proportional reasoning levels per depth/pass when dialecticDepthLevels
# is not configured. The base level is dialecticReasoningLevel.
# Index: (depth, pass) → level relative to base.
_PROPORTIONAL_LEVELS: dict[tuple[int, int], str] = {
# depth 1: single pass at base level
(1, 0): "base",
# depth 2: pass 0 lighter, pass 1 at base
(2, 0): "minimal",
(2, 1): "base",
# depth 3: pass 0 lighter, pass 1 at base, pass 2 one above minimal
(3, 0): "minimal",
(3, 1): "base",
(3, 2): "low",
}
_LEVEL_ORDER = ("minimal", "low", "medium", "high", "max")
def _resolve_pass_level(self, pass_idx: int) -> str:
"""Resolve reasoning level for a given pass index.
Uses dialecticDepthLevels if configured, otherwise proportional
defaults relative to dialecticReasoningLevel.
"""
if self._dialectic_depth_levels and pass_idx < len(self._dialectic_depth_levels):
return self._dialectic_depth_levels[pass_idx]
base = (self._config.dialectic_reasoning_level if self._config else "low")
mapping = self._PROPORTIONAL_LEVELS.get((self._dialectic_depth, pass_idx))
if mapping is None or mapping == "base":
return base
return mapping
def _build_dialectic_prompt(self, pass_idx: int, prior_results: list[str], is_cold: bool) -> str:
"""Build the prompt for a given dialectic pass.
Pass 0: cold start (general user query) or warm (session-scoped).
Pass 1: self-audit / targeted synthesis against gaps from pass 0.
Pass 2: reconciliation / contradiction check across prior passes.
"""
if pass_idx == 0:
if is_cold:
return (
"Who is this person? What are their preferences, goals, "
"and working style? Focus on facts that would help an AI "
"assistant be immediately useful."
)
return (
"Given what's been discussed in this session so far, what "
"context about this user is most relevant to the current "
"conversation? Prioritize active context over biographical facts."
)
elif pass_idx == 1:
prior = prior_results[-1] if prior_results else ""
return (
f"Given this initial assessment:\n\n{prior}\n\n"
"What gaps remain in your understanding that would help "
"going forward? Synthesize what you actually know about "
"the user's current state and immediate needs, grounded "
"in evidence from recent sessions."
)
else:
# pass 2: reconciliation
return (
f"Prior passes produced:\n\n"
f"Pass 1:\n{prior_results[0] if len(prior_results) > 0 else '(empty)'}\n\n"
f"Pass 2:\n{prior_results[1] if len(prior_results) > 1 else '(empty)'}\n\n"
"Do these assessments cohere? Reconcile any contradictions "
"and produce a final, concise synthesis of what matters most "
"for the current conversation."
)
@staticmethod
def _signal_sufficient(result: str) -> bool:
"""Check if a dialectic pass returned enough signal to skip further passes.
Heuristic: a response of at least 100 chars with structure
(section headers, bullets, or an ordered list), or over 300 chars
even without structure, is considered sufficient.
"""
if not result or len(result.strip()) < 100:
return False
# Structured output with sections/bullets is strong signal
if "\n" in result and (
"##" in result
or "•" in result
or re.search(r"^[*-] ", result, re.MULTILINE)
or re.search(r"^\s*\d+\. ", result, re.MULTILINE)
):
return True
# Long enough even without structure
return len(result.strip()) > 300
def _run_dialectic_depth(self, query: str) -> str:
"""Execute up to dialecticDepth .chat() calls with conditional bail-out.
Cold start (no base context): general user-oriented query.
Warm session (base context exists): session-scoped query.
Each pass is conditional: bails early if the prior pass returned strong signal.
Returns the best (usually last) result.
"""
if not self._manager or not self._session_key:
return ""
is_cold = not self._base_context_cache
results: list[str] = []
for i in range(self._dialectic_depth):
if i == 0:
prompt = self._build_dialectic_prompt(0, results, is_cold)
else:
# Skip further passes if prior pass delivered strong signal
if results and self._signal_sufficient(results[-1]):
logger.debug("Honcho dialectic depth %d: pass %d skipped, prior signal sufficient",
self._dialectic_depth, i)
break
prompt = self._build_dialectic_prompt(i, results, is_cold)
level = self._resolve_pass_level(i)
logger.debug("Honcho dialectic depth %d: pass %d, level=%s, cold=%s",
self._dialectic_depth, i, level, is_cold)
result = self._manager.dialectic_query(
self._session_key, prompt,
reasoning_level=level,
peer="user",
)
results.append(result or "")
# Return the last non-empty result (deepest pass that ran)
for r in reversed(results):
if r and r.strip():
return r
return ""
def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
"""Track turn count for cadence and injection_frequency logic."""
@ -659,7 +949,14 @@ class HonchoMemoryProvider(MemoryProvider):
try:
if tool_name == "honcho_profile":
card = self._manager.get_peer_card(self._session_key)
peer = args.get("peer", "user")
card_update = args.get("card")
if card_update:
result = self._manager.set_peer_card(self._session_key, card_update, peer=peer)
if result is None:
return tool_error("Failed to update peer card.")
return json.dumps({"result": f"Peer card updated ({len(result)} facts).", "card": result})
card = self._manager.get_peer_card(self._session_key, peer=peer)
if not card:
return json.dumps({"result": "No profile facts available yet."})
return json.dumps({"result": card})
@ -669,30 +966,64 @@ class HonchoMemoryProvider(MemoryProvider):
if not query:
return tool_error("Missing required parameter: query")
max_tokens = min(int(args.get("max_tokens", 800)), 2000)
peer = args.get("peer", "user")
result = self._manager.search_context(
self._session_key, query, max_tokens=max_tokens
self._session_key, query, max_tokens=max_tokens, peer=peer
)
if not result:
return json.dumps({"result": "No relevant context found."})
return json.dumps({"result": result})
elif tool_name == "honcho_context":
elif tool_name == "honcho_reasoning":
query = args.get("query", "")
if not query:
return tool_error("Missing required parameter: query")
peer = args.get("peer", "user")
reasoning_level = args.get("reasoning_level")
result = self._manager.dialectic_query(
self._session_key, query, peer=peer
self._session_key, query,
reasoning_level=reasoning_level,
peer=peer,
)
# Update cadence tracker so auto-injection respects the gap after an explicit call
self._last_dialectic_turn = self._turn_count
return json.dumps({"result": result or "No result from Honcho."})
elif tool_name == "honcho_context":
peer = args.get("peer", "user")
ctx = self._manager.get_session_context(self._session_key, peer=peer)
if not ctx:
return json.dumps({"result": "No context available yet."})
parts = []
if ctx.get("summary"):
parts.append(f"## Summary\n{ctx['summary']}")
if ctx.get("representation"):
parts.append(f"## Representation\n{ctx['representation']}")
if ctx.get("card"):
parts.append(f"## Card\n{ctx['card']}")
if ctx.get("recent_messages"):
msgs = ctx["recent_messages"]
msg_str = "\n".join(
f" [{m['role']}] {m['content'][:200]}"
for m in msgs[-5:] # last 5 for brevity
)
parts.append(f"## Recent messages\n{msg_str}")
return json.dumps({"result": "\n\n".join(parts) or "No context available."})
elif tool_name == "honcho_conclude":
delete_id = args.get("delete_id")
peer = args.get("peer", "user")
if delete_id:
ok = self._manager.delete_conclusion(self._session_key, delete_id, peer=peer)
if ok:
return json.dumps({"result": f"Conclusion {delete_id} deleted."})
return tool_error(f"Failed to delete conclusion {delete_id}.")
conclusion = args.get("conclusion", "")
if not conclusion:
return tool_error("Missing required parameter: conclusion")
ok = self._manager.create_conclusion(self._session_key, conclusion)
return tool_error("Missing required parameter: conclusion or delete_id")
ok = self._manager.create_conclusion(self._session_key, conclusion, peer=peer)
if ok:
return json.dumps({"result": f"Conclusion saved: {conclusion}"})
return json.dumps({"result": f"Conclusion saved for {peer}: {conclusion}"})
return tool_error("Failed to save conclusion.")
return tool_error(f"Unknown tool: {tool_name}")


@ -440,11 +440,43 @@ def cmd_setup(args) -> None:
if new_recall in ("hybrid", "context", "tools"):
hermes_host["recallMode"] = new_recall
# --- 7. Session strategy ---
current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-directory")
# --- 7. Context token budget ---
current_ctx_tokens = hermes_host.get("contextTokens") or cfg.get("contextTokens")
current_display = str(current_ctx_tokens) if current_ctx_tokens else "uncapped"
print("\n Context injection per turn (hybrid/context recall modes only):")
print(" uncapped -- no limit (default)")
print(" N -- token limit per turn (e.g. 1200)")
new_ctx_tokens = _prompt("Context tokens", default=current_display)
if new_ctx_tokens.strip().lower() in ("none", "uncapped", "no limit"):
hermes_host.pop("contextTokens", None)
elif new_ctx_tokens.strip() == "":
pass # keep current
else:
try:
val = int(new_ctx_tokens)
if val >= 0:
hermes_host["contextTokens"] = val
except (ValueError, TypeError):
pass # keep current
# --- 7b. Dialectic cadence ---
current_dialectic = str(hermes_host.get("dialecticCadence") or cfg.get("dialecticCadence") or "3")
print("\n Dialectic cadence:")
print(" How often Honcho rebuilds its user model (LLM call on Honcho backend).")
print(" 1 = every turn (aggressive), 3 = every 3 turns (recommended), 5+ = sparse.")
new_dialectic = _prompt("Dialectic cadence", default=current_dialectic)
try:
val = int(new_dialectic)
if val >= 1:
hermes_host["dialecticCadence"] = val
except (ValueError, TypeError):
hermes_host["dialecticCadence"] = 3
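The cadence saved here throttles Honcho-side LLM calls: the dialectic supplement fires only every Nth turn. A rough sketch of that gating; the function name and the turn-1 special case are assumptions for illustration, not the plugin's verified behavior:

```python
def should_run_dialectic(turn: int, cadence: int = 3) -> bool:
    # Turn 1 always runs so a fresh session gets a user model immediately;
    # afterwards the dialectic only fires on every Nth turn.
    return turn == 1 or turn % cadence == 0

# With the default cadence of 3, turns 1, 3, 6, 9 trigger a dialectic pass.
fired = [t for t in range(1, 10) if should_run_dialectic(t, cadence=3)]
```

With cadence 1 every turn fires, which is the old default the commit message cites as ~3x more Honcho LLM calls.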
# --- 8. Session strategy ---
current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-session")
print("\n Session strategy:")
print(" per-session -- each run starts clean, Honcho injects context automatically")
print(" per-directory -- reuses session per dir, prior context auto-injected each run")
print(" per-repo -- one session per git repository")
print(" global -- single session across all directories")
new_strat = _prompt("Session strategy", default=current_strat)
@ -490,10 +522,11 @@ def cmd_setup(args) -> None:
print(f" Recall: {hcfg.recall_mode}")
print(f" Sessions: {hcfg.session_strategy}")
print("\n Honcho tools available in chat:")
print(" honcho_context -- session context: summary, representation, card, messages")
print(" honcho_search -- semantic search over history")
print(" honcho_profile -- peer card, key facts")
print(" honcho_reasoning -- ask Honcho a question, synthesized answer")
print(" honcho_conclude -- persist a user fact to memory")
print("\n Other commands:")
print(" hermes honcho status -- show full config")
print(" hermes honcho mode -- change recall/observation mode")
@ -585,13 +618,26 @@ def cmd_status(args) -> None:
print(f" Enabled: {hcfg.enabled}")
print(f" API key: {masked}")
print(f" Workspace: {hcfg.workspace_id}")
# Config paths — show where config was read from and where writes go
global_path = Path.home() / ".honcho" / "config.json"
print(f" Config: {active_path}")
if write_path != active_path:
print(f" Write to: {write_path} (profile-local)")
if active_path == global_path:
print(f" Fallback: (none — using global ~/.honcho/config.json)")
elif global_path.exists():
print(f" Fallback: {global_path} (exists, cross-app interop)")
print(f" AI peer: {hcfg.ai_peer}")
print(f" User peer: {hcfg.peer_name or 'not set'}")
print(f" Session key: {hcfg.resolve_session_name()}")
print(f" Session strat: {hcfg.session_strategy}")
print(f" Recall mode: {hcfg.recall_mode}")
print(f" Context budget: {hcfg.context_tokens or '(uncapped)'} tokens")
raw = getattr(hcfg, "raw", None) or {}
dialectic_cadence = raw.get("dialecticCadence") or 3
print(f" Dialectic cad: every {dialectic_cadence} turn{'s' if dialectic_cadence != 1 else ''}")
print(f" Observation: user(me={hcfg.user_observe_me},others={hcfg.user_observe_others}) ai(me={hcfg.ai_observe_me},others={hcfg.ai_observe_others})")
print(f" Write freq: {hcfg.write_frequency}")
@ -599,8 +645,8 @@ def cmd_status(args) -> None:
print("\n Connection... ", end="", flush=True)
try:
client = get_honcho_client(hcfg)
_show_peer_cards(hcfg, client)
print("OK")
except Exception as e:
print(f"FAILED ({e})\n")
else:
@ -824,6 +870,41 @@ def cmd_mode(args) -> None:
print(f" {label}Recall mode -> {mode_arg} ({MODES[mode_arg]})\n")
def cmd_strategy(args) -> None:
"""Show or set the session strategy."""
STRATEGIES = {
"per-session": "each run starts clean, Honcho injects context automatically",
"per-directory": "reuses session per dir, prior context auto-injected each run",
"per-repo": "one session per git repository",
"global": "single session across all directories",
}
cfg = _read_config()
strat_arg = getattr(args, "strategy", None)
if strat_arg is None:
current = (
(cfg.get("hosts") or {}).get(_host_key(), {}).get("sessionStrategy")
or cfg.get("sessionStrategy")
or "per-session"
)
print("\nHoncho session strategy\n" + "-" * 40)
for s, desc in STRATEGIES.items():
marker = " <-" if s == current else ""
print(f" {s:<15} {desc}{marker}")
print(f"\n Set with: hermes honcho strategy [per-session|per-directory|per-repo|global]\n")
return
if strat_arg not in STRATEGIES:
print(f" Invalid strategy '{strat_arg}'. Options: {', '.join(STRATEGIES)}\n")
return
host = _host_key()
label = f"[{host}] " if host != "hermes" else ""
cfg.setdefault("hosts", {}).setdefault(host, {})["sessionStrategy"] = strat_arg
_write_config(cfg)
print(f" {label}Session strategy -> {strat_arg} ({STRATEGIES[strat_arg]})\n")
def cmd_tokens(args) -> None:
"""Show or set token budget settings."""
cfg = _read_config()
@ -1143,10 +1224,11 @@ def cmd_migrate(args) -> None:
print(" automatically. Files become the seed, not the live store.")
print()
print(" Honcho tools (available to the agent during conversation)")
print(" honcho_context — session context: summary, representation, card, messages")
print(" honcho_search — semantic search over stored context")
print(" honcho_profile — fast peer card snapshot")
print(" honcho_reasoning — ask Honcho a question, synthesized answer")
print(" honcho_conclude — write a conclusion/fact back to memory")
print()
print(" Session naming")
print(" OpenClaw: no persistent session concept — files are global.")
@ -1197,6 +1279,8 @@ def honcho_command(args) -> None:
cmd_peer(args)
elif sub == "mode":
cmd_mode(args)
elif sub == "strategy":
cmd_strategy(args)
elif sub == "tokens":
cmd_tokens(args)
elif sub == "identity":
@ -1211,7 +1295,7 @@ def honcho_command(args) -> None:
cmd_sync(args)
else:
print(f" Unknown honcho command: {sub}")
print(" Available: status, sessions, map, peer, mode, strategy, tokens, identity, migrate, enable, disable, sync\n")
def register_cli(subparser) -> None:
@ -1270,6 +1354,15 @@ def register_cli(subparser) -> None:
help="Recall mode to set (hybrid/context/tools). Omit to show current.",
)
strategy_parser = subs.add_parser(
"strategy", help="Show or set session strategy (per-session/per-directory/per-repo/global)",
)
strategy_parser.add_argument(
"strategy", nargs="?", metavar="STRATEGY",
choices=("per-session", "per-directory", "per-repo", "global"),
help="Session strategy to set. Omit to show current.",
)
tokens_parser = subs.add_parser(
"tokens", help="Show or set token budget for context and dialectic",
)

View file

@ -58,7 +58,8 @@ def resolve_config_path() -> Path:
Resolution order:
1. $HERMES_HOME/honcho.json (profile-local, if it exists)
2. ~/.hermes/honcho.json (default profile; shared host blocks live here)
3. ~/.honcho/config.json (global, cross-app interop)
Returns the global path if none exist (for first-time setup writes).
"""
@ -66,6 +67,11 @@ def resolve_config_path() -> Path:
if local_path.exists():
return local_path
# Default profile's config — host blocks accumulate here via setup/clone
default_path = Path.home() / ".hermes" / "honcho.json"
if default_path != local_path and default_path.exists():
return default_path
return GLOBAL_CONFIG_PATH
@ -88,6 +94,68 @@ def _resolve_bool(host_val, root_val, *, default: bool) -> bool:
return default
def _parse_context_tokens(host_val, root_val) -> int | None:
"""Parse contextTokens: host wins, then root, then None (uncapped)."""
for val in (host_val, root_val):
if val is not None:
try:
return int(val)
except (ValueError, TypeError):
pass
return None
def _parse_dialectic_depth(host_val, root_val) -> int:
"""Parse dialecticDepth: host wins, then root, then 1. Clamped to 1-3."""
for val in (host_val, root_val):
if val is not None:
try:
return max(1, min(int(val), 3))
except (ValueError, TypeError):
pass
return 1
_VALID_REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")
def _parse_dialectic_depth_levels(host_val, root_val, depth: int) -> list[str] | None:
"""Parse dialecticDepthLevels: optional array of reasoning levels per pass.
Returns None when not configured (use proportional defaults).
When configured, validates each level and truncates/pads to match depth.
"""
for val in (host_val, root_val):
if val is not None and isinstance(val, list):
levels = [
lvl if lvl in _VALID_REASONING_LEVELS else "low"
for lvl in val[:depth]
]
# Pad with "low" if array is shorter than depth
while len(levels) < depth:
levels.append("low")
return levels
return None
def _resolve_optional_float(*values: Any) -> float | None:
"""Return the first non-empty value coerced to a positive float."""
for value in values:
if value is None:
continue
if isinstance(value, str):
value = value.strip()
if not value:
continue
try:
parsed = float(value)
except (TypeError, ValueError):
continue
if parsed > 0:
return parsed
return None
_VALID_OBSERVATION_MODES = {"unified", "directional"}
_OBSERVATION_MODE_ALIASES = {"shared": "unified", "separate": "directional", "cross": "directional"}
@ -153,6 +221,8 @@ class HonchoClientConfig:
environment: str = "production"
# Optional base URL for self-hosted Honcho (overrides environment mapping)
base_url: str | None = None
# Optional request timeout in seconds for Honcho SDK HTTP calls
timeout: float | None = None
# Identity
peer_name: str | None = None
ai_peer: str = "hermes"
@ -162,17 +232,25 @@ class HonchoClientConfig:
# Write frequency: "async" (background thread), "turn" (sync per turn),
# "session" (flush on session end), or int (every N turns)
write_frequency: str | int = "async"
# Prefetch budget
# Prefetch budget (None = no cap; set to an integer to bound auto-injected context)
context_tokens: int | None = None
# Dialectic (peer.chat) settings
# reasoning_level: "minimal" | "low" | "medium" | "high" | "max"
dialectic_reasoning_level: str = "low"
# dynamic: auto-bump reasoning level based on query length
# When true, the model can override reasoning_level per-call via the
# honcho_reasoning tool param (agentic). When false, always uses
# dialecticReasoningLevel and ignores model-provided overrides.
dialectic_dynamic: bool = True
# Max chars of dialectic result to inject into Hermes system prompt
dialectic_max_chars: int = 600
# Dialectic depth: how many .chat() calls per dialectic cycle (1-3).
# Depth 1: single call. Depth 2: self-audit + targeted synthesis.
# Depth 3: self-audit + synthesis + reconciliation.
dialectic_depth: int = 1
# Optional per-pass reasoning level override. Array of reasoning levels
# matching dialectic_depth length. When None, uses proportional defaults
# derived from dialectic_reasoning_level.
dialectic_depth_levels: list[str] | None = None
# Honcho API limits — configurable for self-hosted instances
# Max chars per message sent via add_messages() (Honcho cloud: 25000)
message_max_chars: int = 25000
@ -183,10 +261,8 @@ class HonchoClientConfig:
# "context" — auto-injected context only, Honcho tools removed
# "tools" — Honcho tools only, no auto-injected context
recall_mode: str = "hybrid"
# Eager init in tools mode — when true, initializes session during
# initialize() instead of deferring to first tool call
init_on_session_start: bool = False
# Observation mode: legacy string shorthand ("directional" or "unified").
# Kept for backward compat; granular per-peer booleans below are preferred.
@ -218,12 +294,14 @@ class HonchoClientConfig:
resolved_host = host or resolve_active_host()
api_key = os.environ.get("HONCHO_API_KEY")
base_url = os.environ.get("HONCHO_BASE_URL", "").strip() or None
timeout = _resolve_optional_float(os.environ.get("HONCHO_TIMEOUT"))
return cls(
host=resolved_host,
workspace_id=workspace_id,
api_key=api_key,
environment=os.environ.get("HONCHO_ENVIRONMENT", "production"),
base_url=base_url,
timeout=timeout,
ai_peer=resolved_host,
enabled=bool(api_key or base_url),
)
@ -284,6 +362,11 @@ class HonchoClientConfig:
or os.environ.get("HONCHO_BASE_URL", "").strip()
or None
)
timeout = _resolve_optional_float(
raw.get("timeout"),
raw.get("requestTimeout"),
os.environ.get("HONCHO_TIMEOUT"),
)
# Auto-enable when API key or base_url is present (unless explicitly disabled)
# Host-level enabled wins, then root-level, then auto-enable if key/url exists.
@ -329,12 +412,16 @@ class HonchoClientConfig:
api_key=api_key,
environment=environment,
base_url=base_url,
timeout=timeout,
peer_name=host_block.get("peerName") or raw.get("peerName"),
ai_peer=ai_peer,
enabled=enabled,
save_messages=save_messages,
write_frequency=write_frequency,
context_tokens=_parse_context_tokens(
host_block.get("contextTokens"),
raw.get("contextTokens"),
),
dialectic_reasoning_level=(
host_block.get("dialecticReasoningLevel")
or raw.get("dialecticReasoningLevel")
@ -350,6 +437,15 @@ class HonchoClientConfig:
or raw.get("dialecticMaxChars")
or 600
),
dialectic_depth=_parse_dialectic_depth(
host_block.get("dialecticDepth"),
raw.get("dialecticDepth"),
),
dialectic_depth_levels=_parse_dialectic_depth_levels(
host_block.get("dialecticDepthLevels"),
raw.get("dialecticDepthLevels"),
depth=_parse_dialectic_depth(host_block.get("dialecticDepth"), raw.get("dialecticDepth")),
),
message_max_chars=int(
host_block.get("messageMaxChars")
or raw.get("messageMaxChars")
@ -416,16 +512,18 @@ class HonchoClientConfig:
cwd: str | None = None,
session_title: str | None = None,
session_id: str | None = None,
gateway_session_key: str | None = None,
) -> str | None:
"""Resolve Honcho session name.
Resolution order:
1. Manual directory override from sessions map
2. Hermes session title (from /title command)
3. Gateway session key (stable per-chat identifier from gateway platforms)
4. per-session strategy: Hermes session_id ({timestamp}_{hex})
5. per-repo strategy: git repo root directory name
6. per-directory strategy: directory basename
7. global strategy: workspace name
"""
import re
@ -439,12 +537,22 @@ class HonchoClientConfig:
# /title mid-session remap
if session_title:
sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', session_title).strip('-')
if sanitized:
if self.session_peer_prefix and self.peer_name:
return f"{self.peer_name}-{sanitized}"
return sanitized
# Gateway session key: stable per-chat identifier passed by the gateway
# (e.g. "agent:main:telegram:dm:8439114563"). Sanitize colons to hyphens
# for Honcho session ID compatibility. This takes priority over strategy-
# based resolution because gateway platforms need per-chat isolation that
# cwd-based strategies cannot provide.
if gateway_session_key:
sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', gateway_session_key).strip('-')
if sanitized:
return sanitized
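The sanitization above, shown concretely: runs of characters outside `[a-zA-Z0-9_-]` collapse to single hyphens, and the `+` quantifier is what keeps `a::b` from becoming `a--b`.

```python
import re

def sanitize(key: str) -> str:
    # Collapse each run of disallowed characters to one hyphen, then trim.
    return re.sub(r'[^a-zA-Z0-9_-]+', '-', key).strip('-')

assert sanitize("agent:main:telegram:dm:8439114563") == "agent-main-telegram-dm-8439114563"
assert sanitize("a::b") == "a-b"
assert sanitize("::") == ""   # all-punctuation keys fall through the `if sanitized` guard
```

An all-punctuation gateway key sanitizes to the empty string, which is why the code guards with `if sanitized:` before returning it as a session name.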
# per-session: inherit Hermes session_id (new Honcho session each run)
if self.session_strategy == "per-session" and session_id:
if self.session_peer_prefix and self.peer_name:
@ -506,13 +614,20 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
# mapping, enabling remote self-hosted Honcho deployments without
# requiring the server to live on localhost.
resolved_base_url = config.base_url
resolved_timeout = config.timeout
if not resolved_base_url or resolved_timeout is None:
try:
from hermes_cli.config import load_config
hermes_cfg = load_config()
honcho_cfg = hermes_cfg.get("honcho", {})
if isinstance(honcho_cfg, dict):
if not resolved_base_url:
resolved_base_url = honcho_cfg.get("base_url", "").strip() or None
if resolved_timeout is None:
resolved_timeout = _resolve_optional_float(
honcho_cfg.get("timeout"),
honcho_cfg.get("request_timeout"),
)
except Exception:
pass
@ -547,6 +662,8 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
}
if resolved_base_url:
kwargs["base_url"] = resolved_base_url
if resolved_timeout is not None:
kwargs["timeout"] = resolved_timeout
_honcho_client = Honcho(**kwargs)

View file

@ -486,36 +486,9 @@ class HonchoSessionManager:
_REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")
def _default_reasoning_level(self) -> str:
"""Return the configured default reasoning level."""
return self._dialectic_reasoning_level
def dialectic_query(
self, session_key: str, query: str,
@ -532,8 +505,9 @@ class HonchoSessionManager:
Args:
session_key: The session key to query against.
query: Natural language question.
reasoning_level: Override the configured default (dialecticReasoningLevel).
Only honored when dialecticDynamic is true.
If None or dialecticDynamic is false, uses the configured default.
peer: Which peer to query: "user" (default) or "ai".
Returns:
@ -543,29 +517,34 @@ class HonchoSessionManager:
if not session:
return ""
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id is None:
return ""
# Guard: truncate query to Honcho's dialectic input limit
if len(query) > self._dialectic_max_input_chars:
query = query[:self._dialectic_max_input_chars].rsplit(" ", 1)[0]
if self._dialectic_dynamic and reasoning_level:
level = reasoning_level
else:
level = self._default_reasoning_level()
try:
if self._ai_observe_others:
# AI peer can observe other peers — use assistant as observer.
ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
if target_peer_id == session.assistant_peer_id:
result = ai_peer_obj.chat(query, reasoning_level=level) or ""
else:
result = ai_peer_obj.chat(
query,
target=target_peer_id,
reasoning_level=level,
) or ""
else:
# Without cross-observation, each peer queries its own context.
target_peer = self._get_or_create_peer(target_peer_id)
result = target_peer.chat(query, reasoning_level=level) or ""
# Apply Hermes-side char cap before caching
@ -647,10 +626,11 @@ class HonchoSessionManager:
"""
Pre-fetch user and AI peer context from Honcho.
Fetches peer_representation and peer_card for both peers, plus the
session summary when available. search_query is intentionally omitted:
it would only affect additional excerpts that this code does not
consume, and passing the raw message exposes conversation content in
server access logs.
Args:
session_key: The session key to get context for.
@ -658,15 +638,29 @@ class HonchoSessionManager:
Returns:
Dictionary with 'representation', 'card', 'ai_representation',
'ai_card', and optionally 'summary' keys.
"""
session = self._cache.get(session_key)
if not session:
return {}
result: dict[str, str] = {}
# Session summary — provides session-scoped context.
# Fresh sessions (per-session cold start, or first-ever per-directory)
# return null summary — the guard below handles that gracefully.
# Per-directory returning sessions get their accumulated summary.
try:
honcho_session = self._sessions_cache.get(session.honcho_session_id)
if honcho_session:
ctx = honcho_session.context(summary=True)
if ctx.summary and getattr(ctx.summary, "content", None):
result["summary"] = ctx.summary.content
except Exception as e:
logger.debug("Failed to fetch session summary from Honcho: %s", e)
try:
user_ctx = self._fetch_peer_context(session.user_peer_id, target=session.user_peer_id)
result["representation"] = user_ctx["representation"]
result["card"] = "\n".join(user_ctx["card"])
except Exception as e:
@ -674,7 +668,7 @@ class HonchoSessionManager:
# Also fetch AI peer's own representation so Hermes knows itself.
try:
ai_ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
result["ai_representation"] = ai_ctx["representation"]
result["ai_card"] = "\n".join(ai_ctx["card"])
except Exception as e:
@ -862,7 +856,7 @@ class HonchoSessionManager:
return [str(item) for item in card if item]
return [str(card)]
def _fetch_peer_card(self, peer_id: str, *, target: str | None = None) -> list[str]:
"""Fetch a peer card directly from the peer object.
This avoids relying on session.context(), which can return an empty
@ -872,22 +866,33 @@ class HonchoSessionManager:
peer = self._get_or_create_peer(peer_id)
getter = getattr(peer, "get_card", None)
if callable(getter):
return self._normalize_card(getter(target=target) if target is not None else getter())
legacy_getter = getattr(peer, "card", None)
if callable(legacy_getter):
return self._normalize_card(legacy_getter(target=target) if target is not None else legacy_getter())
return []
def _fetch_peer_context(
self,
peer_id: str,
search_query: str | None = None,
*,
target: str | None = None,
) -> dict[str, Any]:
"""Fetch representation + peer card directly from a peer object."""
peer = self._get_or_create_peer(peer_id)
representation = ""
card: list[str] = []
try:
context_kwargs: dict[str, Any] = {}
if target is not None:
context_kwargs["target"] = target
if search_query is not None:
context_kwargs["search_query"] = search_query
ctx = peer.context(**context_kwargs) if context_kwargs else peer.context()
representation = (
getattr(ctx, "representation", None)
or getattr(ctx, "peer_representation", None)
@ -899,24 +904,111 @@ class HonchoSessionManager:
if not representation:
try:
representation = (
peer.representation(target=target) if target is not None else peer.representation()
) or ""
except Exception as e:
logger.debug("Direct peer.representation() failed for '%s': %s", peer_id, e)
if not card:
try:
card = self._fetch_peer_card(peer_id, target=target)
except Exception as e:
logger.debug("Direct peer card fetch failed for '%s': %s", peer_id, e)
return {"representation": representation, "card": card}
def get_session_context(self, session_key: str, peer: str = "user") -> dict[str, Any]:
"""Fetch full session context from Honcho including summary.
Uses the session-level context() API which returns summary,
peer_representation, peer_card, and messages.
"""
session = self._cache.get(session_key)
if not session:
return {}
honcho_session = self._sessions_cache.get(session.honcho_session_id)
if not honcho_session:
# Fall back to peer-level context, respecting the requested peer
peer_id = self._resolve_peer_id(session, peer)
if peer_id is None:
peer_id = session.user_peer_id
return self._fetch_peer_context(peer_id, target=peer_id)
try:
peer_id = self._resolve_peer_id(session, peer)
ctx = honcho_session.context(
summary=True,
peer_target=peer_id,
peer_perspective=session.user_peer_id if peer == "user" else session.assistant_peer_id,
)
result: dict[str, Any] = {}
# Summary
if ctx.summary:
result["summary"] = ctx.summary.content
# Peer representation and card
if ctx.peer_representation:
result["representation"] = ctx.peer_representation
if ctx.peer_card:
result["card"] = "\n".join(ctx.peer_card)
# Messages (last N for context)
if ctx.messages:
recent = ctx.messages[-10:] # last 10 messages
result["recent_messages"] = [
{"role": getattr(m, "peer_id", "unknown"), "content": (m.content or "")[:500]}
for m in recent
]
return result
except Exception as e:
logger.debug("Session context fetch failed: %s", e)
return {}
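An illustrative consumer of the dict `get_session_context()` returns: every key is optional, so callers probe with `.get()`, mirroring the honcho_context tool handler. The `ctx` contents below are made up, not real Honcho output.

```python
# Hypothetical return value from get_session_context(); "representation"
# and "card" are absent here, as they can be on a fresh session.
ctx = {
    "summary": "User is wiring Honcho into a CLI agent.",
    "recent_messages": [{"role": "user", "content": "why is the summary empty?"}],
}

# The `if ctx.get(k)` filter runs before ctx[k], so missing keys never raise.
sections = [ctx[k] for k in ("summary", "representation", "card") if ctx.get(k)]
```

Consuming code should treat an empty dict (the method's failure path) the same as a dict with no populated keys.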
def _resolve_peer_id(self, session: HonchoSession, peer: str | None) -> str:
"""Resolve a peer alias or explicit peer ID to a concrete Honcho peer ID.
Always returns a non-empty string: either a known peer ID or a
sanitized version of the caller-supplied alias/ID.
"""
candidate = (peer or "user").strip()
if not candidate:
return session.user_peer_id
normalized = self._sanitize_id(candidate)
if normalized == self._sanitize_id("user"):
return session.user_peer_id
if normalized == self._sanitize_id("ai"):
return session.assistant_peer_id
return normalized
def _resolve_observer_target(
self,
session: HonchoSession,
peer: str | None,
) -> tuple[str, str | None]:
"""Resolve observer and target peer IDs for context/search/profile queries."""
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id == session.assistant_peer_id:
return session.assistant_peer_id, session.assistant_peer_id
if self._ai_observe_others:
return session.assistant_peer_id, target_peer_id
return target_peer_id, None
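A minimal restatement of the observer/target routing above, using plain strings in place of the session object so the three branches are visible:

```python
def resolve_observer_target(target_id: str, assistant_id: str,
                            ai_observe_others: bool):
    if target_id == assistant_id:
        # Assistant introspects itself; no cross-observation needed.
        return assistant_id, assistant_id
    if ai_observe_others:
        # Assistant observes the target peer (cross-observation enabled).
        return assistant_id, target_id
    # No cross-observation: the target peer queries its own context,
    # signalled by a None target.
    return target_id, None

assert resolve_observer_target("hermes", "hermes", True) == ("hermes", "hermes")
assert resolve_observer_target("alice", "hermes", True) == ("hermes", "alice")
assert resolve_observer_target("alice", "hermes", False) == ("alice", None)
```

The `(peer_id, None)` shape in the last branch is what lets `_fetch_peer_card` and `_fetch_peer_context` call the SDK without a `target` kwarg at all.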
def get_peer_card(self, session_key: str, peer: str = "user") -> list[str]:
"""
Fetch a peer card: a curated list of key facts.
Fast, no LLM reasoning. Returns raw structured facts Honcho has
inferred about the target peer (name, role, preferences, patterns).
Empty list if unavailable.
"""
session = self._cache.get(session_key)
@ -924,12 +1016,19 @@ class HonchoSessionManager:
return []
try:
observer_peer_id, target_peer_id = self._resolve_observer_target(session, peer)
return self._fetch_peer_card(observer_peer_id, target=target_peer_id)
except Exception as e:
logger.debug("Failed to fetch peer card from Honcho: %s", e)
return []
def search_context(
self,
session_key: str,
query: str,
max_tokens: int = 800,
peer: str = "user",
) -> str:
"""
Semantic search over Honcho session context.
@ -941,6 +1040,7 @@ class HonchoSessionManager:
session_key: Session to search against.
query: Search query for semantic matching.
max_tokens: Token budget for returned content.
peer: Peer alias or explicit peer ID to search about.
Returns:
Relevant context excerpts as a string, or empty string if none.
@ -950,7 +1050,13 @@ class HonchoSessionManager:
return ""
try:
observer_peer_id, target = self._resolve_observer_target(session, peer)
ctx = self._fetch_peer_context(
observer_peer_id,
search_query=query,
target=target,
)
parts = []
if ctx["representation"]:
parts.append(ctx["representation"])
@ -962,16 +1068,17 @@ class HonchoSessionManager:
logger.debug("Honcho search_context failed: %s", e)
return ""
def create_conclusion(self, session_key: str, content: str, peer: str = "user") -> bool:
"""Write a conclusion about a target peer back to Honcho.
Conclusions are facts a peer observes about another peer or itself:
preferences, corrections, clarifications, and project context.
They feed into the target peer's card and representation.
Args:
session_key: Session to associate the conclusion with.
content: The conclusion text.
peer: Peer alias or explicit peer ID. "user" is the default alias.
Returns:
True on success, False on failure.
@ -985,25 +1092,90 @@ class HonchoSessionManager:
return False
try:
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id is None:
logger.warning("Could not resolve conclusion peer '%s' for session '%s'", peer, session_key)
return False
if target_peer_id == session.assistant_peer_id:
assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
conclusions_scope = assistant_peer.conclusions_of(session.assistant_peer_id)
elif self._ai_observe_others:
assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
conclusions_scope = assistant_peer.conclusions_of(target_peer_id)
else:
target_peer = self._get_or_create_peer(target_peer_id)
conclusions_scope = target_peer.conclusions_of(target_peer_id)
conclusions_scope.create([{
"content": content.strip(),
"session_id": session.honcho_session_id,
}])
logger.info("Created conclusion about %s for %s: %s", target_peer_id, session_key, content[:80])
return True
except Exception as e:
logger.error("Failed to create conclusion: %s", e)
return False
def delete_conclusion(self, session_key: str, conclusion_id: str, peer: str = "user") -> bool:
"""Delete a conclusion by ID. Use only for PII removal.
Args:
session_key: Session key for peer resolution.
conclusion_id: The conclusion ID to delete.
peer: Peer alias or explicit peer ID.
Returns:
True on success, False on failure.
"""
session = self._cache.get(session_key)
if not session:
return False
try:
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id is None:
logger.warning("Could not resolve conclusion peer '%s' for delete in session '%s'", peer, session_key)
return False
if target_peer_id == session.assistant_peer_id:
observer = self._get_or_create_peer(session.assistant_peer_id)
scope = observer.conclusions_of(session.assistant_peer_id)
elif self._ai_observe_others:
observer = self._get_or_create_peer(session.assistant_peer_id)
scope = observer.conclusions_of(target_peer_id)
else:
target_peer = self._get_or_create_peer(target_peer_id)
scope = target_peer.conclusions_of(target_peer_id)
scope.delete(conclusion_id)
logger.info("Deleted conclusion %s for %s", conclusion_id, session_key)
return True
except Exception as e:
logger.error("Failed to delete conclusion %s: %s", conclusion_id, e)
return False
def set_peer_card(self, session_key: str, card: list[str], peer: str = "user") -> list[str] | None:
"""Update a peer's card.
Args:
session_key: Session key for peer resolution.
card: New peer card as list of fact strings.
peer: Peer alias or explicit peer ID.
Returns:
Updated card on success, None on failure.
"""
session = self._cache.get(session_key)
if not session:
return None
try:
peer_id = self._resolve_peer_id(session, peer)
if peer_id is None:
logger.warning("Could not resolve peer '%s' for set_peer_card in session '%s'", peer, session_key)
return None
peer_obj = self._get_or_create_peer(peer_id)
result = peer_obj.set_card(card)
logger.info("Updated peer card for %s (%d facts)", peer_id, len(card))
return result
except Exception as e:
logger.error("Failed to set peer card: %s", e)
return None
def seed_ai_identity(self, session_key: str, content: str, source: str = "manual") -> bool:
"""
Seed the AI peer's Honcho representation from text content.
@@ -1061,7 +1233,7 @@ class HonchoSessionManager:
return {"representation": "", "card": ""}
try:
ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
return {
"representation": ctx["representation"] or "",
"card": "\n".join(ctx["card"]),


@@ -75,7 +75,7 @@ from tools.browser_tool import cleanup_browser
from hermes_constants import OPENROUTER_BASE_URL
# Agent internals extracted to agent/ package for modularity
from agent.memory_manager import build_memory_context_block, sanitize_context
from agent.retry_utils import jittered_backoff
from agent.error_classifier import classify_api_error, FailoverReason
from agent.prompt_builder import (
@@ -602,6 +602,7 @@ class AIAgent:
prefill_messages: List[Dict[str, Any]] = None,
platform: str = None,
user_id: str = None,
gateway_session_key: str = None,
skip_context_files: bool = False,
skip_memory: bool = False,
session_db=None,
@@ -667,6 +668,7 @@ class AIAgent:
self.ephemeral_system_prompt = ephemeral_system_prompt
self.platform = platform # "cli", "telegram", "discord", "whatsapp", etc.
self._user_id = user_id # Platform user identifier (gateway sessions)
self._gateway_session_key = gateway_session_key # Stable per-chat key (e.g. agent:main:telegram:dm:123)
# Pluggable print function — CLI replaces this with _cprint so that
# raw ANSI status lines are routed through prompt_toolkit's renderer
# instead of going directly to stdout where patch_stdout's StdoutProxy
@@ -1292,6 +1294,9 @@ class AIAgent:
# Thread gateway user identity for per-user memory scoping
if self._user_id:
_init_kwargs["user_id"] = self._user_id
# Thread gateway session key for stable per-chat Honcho session isolation
if self._gateway_session_key:
_init_kwargs["gateway_session_key"] = self._gateway_session_key
# Profile identity for per-profile provider scoping
try:
from hermes_cli.profiles import get_active_profile_name
@@ -8149,6 +8154,16 @@ class AIAgent:
if isinstance(persist_user_message, str):
persist_user_message = _sanitize_surrogates(persist_user_message)
# Strip leaked <memory-context> blocks from user input. When Honcho's
# saveMessages persists a turn that included injected context, the block
# can reappear in the next turn's user message via message history.
# Stripping here prevents stale memory tags from leaking into the
# conversation and being visible to the user or the model as user text.
if isinstance(user_message, str):
user_message = sanitize_context(user_message)
if isinstance(persist_user_message, str):
persist_user_message = sanitize_context(persist_user_message)
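For reviewers, a minimal standalone sketch of the stripping behavior described in the comment above (regex and helper name are hypothetical; the real implementation is sanitize_context in agent/memory_manager.py and may strip additional system notes):

```python
import re

# Hypothetical sketch: remove any injected <memory-context>...</memory-context>
# block that leaked back into user text via persisted message history.
_MEMORY_BLOCK = re.compile(r"<memory-context>.*?</memory-context>\s*", re.DOTALL)

def strip_memory_context(text: str) -> str:
    # Drop the whole tagged block (non-greedy, spans newlines) plus any
    # trailing whitespace, then trim what remains.
    return _MEMORY_BLOCK.sub("", text).strip()

msg = "<memory-context>\nUser likes jazz.\n</memory-context>\nwhat's next?"
print(strip_memory_context(msg))  # -> what's next?
```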
# Store stream callback for _interruptible_api_call to pick up
self._stream_callback = stream_callback
self._persist_user_message_idx = None
@@ -8428,6 +8443,16 @@ class AIAgent:
self._interrupt_message = None
self._interrupt_thread_signal_pending = False
# Notify memory providers of the new turn so cadence tracking works.
# Must happen BEFORE prefetch_all() so providers know which turn it is
# and can gate context/dialectic refresh via contextCadence/dialecticCadence.
if self._memory_manager:
try:
_turn_msg = original_user_message if isinstance(original_user_message, str) else ""
self._memory_manager.on_turn_start(self._user_turn_count, _turn_msg)
except Exception:
pass
# External memory provider: prefetch once before the tool loop.
# Reuse the cached result on every iteration to avoid re-calling
# prefetch_all() on each tool call (10 tool calls = 10x latency + cost).


@@ -939,3 +939,74 @@ class TestOnMemoryWriteBridge:
mgr.on_memory_write("add", "user", "test")
# Good provider still received the call despite bad provider crashing
assert good.memory_writes == [("add", "user", "test")]
class TestHonchoCadenceTracking:
"""Verify Honcho provider cadence gating depends on on_turn_start().
Bug: _turn_count was never updated because on_turn_start() was not called
from run_conversation(). This meant cadence checks always passed (every
turn fired both context refresh and dialectic). Fixed by calling
on_turn_start(self._user_turn_count, msg) before prefetch_all().
"""
def test_turn_count_updates_on_turn_start(self):
"""on_turn_start sets _turn_count, enabling cadence math."""
from plugins.memory.honcho import HonchoMemoryProvider
p = HonchoMemoryProvider()
assert p._turn_count == 0
p.on_turn_start(1, "hello")
assert p._turn_count == 1
p.on_turn_start(5, "world")
assert p._turn_count == 5
def test_queue_prefetch_respects_dialectic_cadence(self):
"""With dialecticCadence=3, dialectic should skip turns 2 and 3."""
from plugins.memory.honcho import HonchoMemoryProvider
p = HonchoMemoryProvider()
p._dialectic_cadence = 3
p._recall_mode = "context"
p._session_key = "test-session"
# Simulate a manager that records prefetch calls
class FakeManager:
def prefetch_context(self, key, query=None):
pass
def prefetch_dialectic(self, key, query):
pass
p._manager = FakeManager()
# Simulate turn 1: last_dialectic_turn = -999, so (1 - (-999)) >= 3 -> fires
p.on_turn_start(1, "turn 1")
p._last_dialectic_turn = 1 # simulate it fired
p._last_context_turn = 1
# Simulate turn 2: (2 - 1) = 1 < 3 -> should NOT fire dialectic
p.on_turn_start(2, "turn 2")
assert (p._turn_count - p._last_dialectic_turn) < p._dialectic_cadence
# Simulate turn 3: (3 - 1) = 2 < 3 -> should NOT fire dialectic
p.on_turn_start(3, "turn 3")
assert (p._turn_count - p._last_dialectic_turn) < p._dialectic_cadence
# Simulate turn 4: (4 - 1) = 3 >= 3 -> should fire dialectic
p.on_turn_start(4, "turn 4")
assert (p._turn_count - p._last_dialectic_turn) >= p._dialectic_cadence
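The cadence arithmetic these assertions walk through can be reduced to a standalone helper (names hypothetical, not part of the plugin API):

```python
def dialectic_due(turn: int, last_fired: int, cadence: int) -> bool:
    # Dialectic fires once at least `cadence` turns have elapsed since the
    # last firing; last_fired starts far in the past so turn 1 always fires.
    return (turn - last_fired) >= cadence

last_fired = -999
fired = []
for turn in range(1, 7):
    if dialectic_due(turn, last_fired, cadence=3):
        fired.append(turn)
        last_fired = turn
print(fired)  # -> [1, 4]
```

With dialecticCadence=3 this yields one Honcho LLM call every third turn, which is the ~66% reduction the commit message claims relative to cadence 1.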
def test_injection_frequency_first_turn_with_1indexed(self):
"""injection_frequency='first-turn' must inject on turn 1 (1-indexed)."""
from plugins.memory.honcho import HonchoMemoryProvider
p = HonchoMemoryProvider()
p._injection_frequency = "first-turn"
# Turn 1 should inject (not skip)
p.on_turn_start(1, "first message")
assert p._turn_count == 1
# The guard is `_turn_count > 1`, so turn 1 passes through
should_skip = p._injection_frequency == "first-turn" and p._turn_count > 1
assert not should_skip, "First turn (turn 1) should NOT be skipped"
# Turn 2 should skip
p.on_turn_start(2, "second message")
should_skip = p._injection_frequency == "first-turn" and p._turn_count > 1
assert should_skip, "Second turn (turn 2) SHOULD be skipped"


@@ -0,0 +1,56 @@
"""Tests for plugins/memory/honcho/cli.py."""
from types import SimpleNamespace
class TestCmdStatus:
def test_reports_connection_failure_when_session_setup_fails(self, monkeypatch, capsys, tmp_path):
import plugins.memory.honcho.cli as honcho_cli
cfg_path = tmp_path / "honcho.json"
cfg_path.write_text("{}")
class FakeConfig:
enabled = True
api_key = "root-key"
workspace_id = "hermes"
host = "hermes"
base_url = None
ai_peer = "hermes"
peer_name = "eri"
recall_mode = "hybrid"
user_observe_me = True
user_observe_others = False
ai_observe_me = False
ai_observe_others = True
write_frequency = "async"
session_strategy = "per-session"
context_tokens = 800
def resolve_session_name(self):
return "hermes"
monkeypatch.setattr(honcho_cli, "_read_config", lambda: {"apiKey": "***"})
monkeypatch.setattr(honcho_cli, "_config_path", lambda: cfg_path)
monkeypatch.setattr(honcho_cli, "_local_config_path", lambda: cfg_path)
monkeypatch.setattr(honcho_cli, "_active_profile_name", lambda: "default")
monkeypatch.setattr(
"plugins.memory.honcho.client.HonchoClientConfig.from_global_config",
lambda host=None: FakeConfig(),
)
monkeypatch.setattr(
"plugins.memory.honcho.client.get_honcho_client",
lambda cfg: object(),
)
def _boom(hcfg, client):
raise RuntimeError("Invalid API key")
monkeypatch.setattr(honcho_cli, "_show_peer_cards", _boom)
monkeypatch.setitem(__import__("sys").modules, "honcho", SimpleNamespace())
honcho_cli.cmd_status(SimpleNamespace(all=False))
out = capsys.readouterr().out
assert "FAILED (Invalid API key)" in out
assert "Connection... OK" not in out


@@ -1,5 +1,6 @@
"""Tests for plugins/memory/honcho/client.py — Honcho client configuration."""
import importlib.util
import json
import os
from pathlib import Path
@@ -25,6 +26,7 @@ class TestHonchoClientConfigDefaults:
assert config.workspace_id == "hermes"
assert config.api_key is None
assert config.environment == "production"
assert config.timeout is None
assert config.enabled is False
assert config.save_messages is True
assert config.session_strategy == "per-directory"
@@ -76,6 +78,11 @@ class TestFromEnv:
assert config.base_url == "http://localhost:8000"
assert config.enabled is True
def test_reads_timeout_from_env(self):
with patch.dict(os.environ, {"HONCHO_TIMEOUT": "90"}, clear=True):
config = HonchoClientConfig.from_env()
assert config.timeout == 90.0
class TestFromGlobalConfig:
def test_missing_config_falls_back_to_env(self, tmp_path):
@@ -87,10 +94,10 @@ class TestFromGlobalConfig:
assert config.enabled is False
assert config.api_key is None
def test_reads_full_config(self, tmp_path, monkeypatch):
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({
"apiKey": "***",
"workspace": "my-workspace",
"environment": "staging",
"peerName": "alice",
@@ -108,9 +115,11 @@ }
}
}
}))
# Isolate from real ~/.hermes/honcho.json
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "isolated"))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.api_key == "***"
# Host block workspace overrides root workspace
assert config.workspace_id == "override-ws"
assert config.ai_peer == "override-ai"
@@ -154,10 +163,31 @@ class TestFromGlobalConfig:
def test_session_strategy_default_from_global_config(self, tmp_path):
"""from_global_config with no sessionStrategy should match dataclass default."""
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"apiKey": "***"}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.session_strategy == "per-directory"
def test_context_tokens_default_is_none(self, tmp_path):
"""Default context_tokens should be None (uncapped) unless explicitly set."""
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"apiKey": "***"}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.context_tokens is None
def test_context_tokens_explicit_sets_cap(self, tmp_path):
"""Explicit contextTokens in config sets the cap."""
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"apiKey": "***", "contextTokens": 1200}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.context_tokens == 1200
def test_context_tokens_explicit_overrides_default(self, tmp_path):
"""Explicit contextTokens in config should override the default."""
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"apiKey": "***", "contextTokens": 2000}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.context_tokens == 2000
def test_context_tokens_host_block_wins(self, tmp_path):
"""Host block contextTokens should override root."""
config_file = tmp_path / "config.json"
@@ -232,6 +262,20 @@ class TestFromGlobalConfig:
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.base_url == "http://root:9000"
def test_timeout_from_config_root(self, tmp_path):
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"timeout": 75}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.timeout == 75.0
def test_request_timeout_alias_from_config_root(self, tmp_path):
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"requestTimeout": "82.5"}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.timeout == 82.5
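A sketch of the coercion these timeout tests imply: int, float, or numeric string all normalize to float seconds (helper name hypothetical; the real parsing lives in HonchoClientConfig):

```python
def parse_timeout(raw):
    # Accept int, float, or numeric string; return float seconds, else None.
    if raw is None:
        return None
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None

print(parse_timeout(75), parse_timeout("82.5"), parse_timeout("bogus"))
# -> 75.0 82.5 None
```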
class TestResolveSessionName:
def test_manual_override(self):
@@ -333,13 +377,14 @@ class TestResolveConfigPath:
hermes_home.mkdir()
local_cfg = hermes_home / "honcho.json"
local_cfg.write_text(json.dumps({
"apiKey": "***",
"workspace": "local-ws",
}))
with patch.dict(os.environ, {"HERMES_HOME": str(hermes_home)}), \
patch.object(Path, "home", return_value=tmp_path):
config = HonchoClientConfig.from_global_config()
assert config.api_key == "***"
assert config.workspace_id == "local-ws"
@@ -500,46 +545,115 @@ class TestObservationModeMigration:
assert cfg.ai_observe_others is True
class TestGetHonchoClient:
def teardown_method(self):
reset_honcho_client()
@pytest.mark.skipif(
not importlib.util.find_spec("honcho"),
reason="honcho SDK not installed"
)
def test_passes_timeout_from_config(self):
fake_honcho = MagicMock(name="Honcho")
cfg = HonchoClientConfig(
api_key="test-key",
timeout=91.0,
workspace_id="hermes",
environment="production",
)
with patch("honcho.Honcho", return_value=fake_honcho) as mock_honcho:
client = get_honcho_client(cfg)
assert client is fake_honcho
mock_honcho.assert_called_once()
assert mock_honcho.call_args.kwargs["timeout"] == 91.0
@pytest.mark.skipif(
not importlib.util.find_spec("honcho"),
reason="honcho SDK not installed"
)
def test_hermes_config_timeout_override_used_when_config_timeout_missing(self):
fake_honcho = MagicMock(name="Honcho")
cfg = HonchoClientConfig(
api_key="test-key",
workspace_id="hermes",
environment="production",
)
with patch("honcho.Honcho", return_value=fake_honcho) as mock_honcho, \
patch("hermes_cli.config.load_config", return_value={"honcho": {"timeout": 88}}):
client = get_honcho_client(cfg)
assert client is fake_honcho
mock_honcho.assert_called_once()
assert mock_honcho.call_args.kwargs["timeout"] == 88.0
@pytest.mark.skipif(
not importlib.util.find_spec("honcho"),
reason="honcho SDK not installed"
)
def test_hermes_request_timeout_alias_used(self):
fake_honcho = MagicMock(name="Honcho")
cfg = HonchoClientConfig(
api_key="test-key",
workspace_id="hermes",
environment="production",
)
with patch("honcho.Honcho", return_value=fake_honcho) as mock_honcho, \
patch("hermes_cli.config.load_config", return_value={"honcho": {"request_timeout": "77.5"}}):
client = get_honcho_client(cfg)
assert client is fake_honcho
mock_honcho.assert_called_once()
assert mock_honcho.call_args.kwargs["timeout"] == 77.5
class TestResolveSessionNameGatewayKey:
"""Regression tests for gateway_session_key priority in resolve_session_name.
Ensures gateway platforms get stable per-chat Honcho sessions even when
sessionStrategy=per-session would otherwise create ephemeral sessions.
Regression: plugin refactor 924bc67e dropped gateway key plumbing.
"""
def test_gateway_key_overrides_per_session_strategy(self):
"""gateway_session_key must win over per-session session_id."""
config = HonchoClientConfig(session_strategy="per-session")
result = config.resolve_session_name(
session_id="20260412_171002_69bb38",
gateway_session_key="agent:main:telegram:dm:8439114563",
)
assert result == "agent-main-telegram-dm-8439114563"
def test_session_title_still_wins_over_gateway_key(self):
"""Explicit /title remap takes priority over gateway_session_key."""
config = HonchoClientConfig(session_strategy="per-session")
result = config.resolve_session_name(
session_title="my-custom-title",
session_id="20260412_171002_69bb38",
gateway_session_key="agent:main:telegram:dm:8439114563",
)
assert result == "my-custom-title"
def test_per_session_fallback_without_gateway_key(self):
"""Without gateway_session_key, per-session returns session_id (CLI path)."""
config = HonchoClientConfig(session_strategy="per-session")
result = config.resolve_session_name(
session_id="20260412_171002_69bb38",
gateway_session_key=None,
)
assert result == "20260412_171002_69bb38"
def test_gateway_key_sanitizes_special_chars(self):
"""Colons and other non-alphanumeric chars are replaced with hyphens."""
config = HonchoClientConfig()
result = config.resolve_session_name(
gateway_session_key="agent:main:telegram:dm:8439114563",
)
assert result == "agent-main-telegram-dm-8439114563"
assert ":" not in result
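The sanitization asserted above amounts to collapsing runs of non-alphanumeric characters to hyphens; a hedged sketch (the real logic lives inside resolve_session_name and may trim edges differently):

```python
import re

def sanitize_session_name(key: str) -> str:
    # Replace each run of non-alphanumeric characters with a single hyphen,
    # then strip any leading/trailing hyphens.
    return re.sub(r"[^A-Za-z0-9]+", "-", key).strip("-")

print(sanitize_session_name("agent:main:telegram:dm:8439114563"))
# -> agent-main-telegram-dm-8439114563
```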
class TestResetHonchoClient:
@@ -549,3 +663,91 @@ class TestResetHonchoClient:
assert mod._honcho_client is not None
reset_honcho_client()
assert mod._honcho_client is None
class TestDialecticDepthParsing:
"""Tests for _parse_dialectic_depth and _parse_dialectic_depth_levels."""
def test_default_depth_is_1(self, tmp_path):
"""Default dialecticDepth should be 1."""
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"apiKey": "***"}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.dialectic_depth == 1
def test_depth_from_root(self, tmp_path):
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"apiKey": "***", "dialecticDepth": 2}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.dialectic_depth == 2
def test_depth_host_block_wins(self, tmp_path):
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({
"apiKey": "***",
"dialecticDepth": 1,
"hosts": {"hermes": {"dialecticDepth": 3}},
}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.dialectic_depth == 3
def test_depth_clamped_high(self, tmp_path):
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"apiKey": "***", "dialecticDepth": 10}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.dialectic_depth == 3
def test_depth_clamped_low(self, tmp_path):
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"apiKey": "***", "dialecticDepth": -1}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.dialectic_depth == 1
def test_depth_levels_default_none(self, tmp_path):
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({"apiKey": "***"}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.dialectic_depth_levels is None
def test_depth_levels_from_config(self, tmp_path):
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({
"apiKey": "***",
"dialecticDepth": 2,
"dialecticDepthLevels": ["minimal", "high"],
}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.dialectic_depth_levels == ["minimal", "high"]
def test_depth_levels_padded_if_short(self, tmp_path):
"""Array shorter than depth gets padded with 'low'."""
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({
"apiKey": "***",
"dialecticDepth": 3,
"dialecticDepthLevels": ["high"],
}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.dialectic_depth_levels == ["high", "low", "low"]
def test_depth_levels_truncated_if_long(self, tmp_path):
"""Array longer than depth gets truncated."""
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({
"apiKey": "***",
"dialecticDepth": 1,
"dialecticDepthLevels": ["high", "max", "medium"],
}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.dialectic_depth_levels == ["high"]
def test_depth_levels_invalid_values_default_to_low(self, tmp_path):
"""Invalid reasoning levels in the array fall back to 'low'."""
config_file = tmp_path / "config.json"
config_file.write_text(json.dumps({
"apiKey": "***",
"dialecticDepth": 2,
"dialecticDepthLevels": ["invalid", "high"],
}))
config = HonchoClientConfig.from_global_config(config_path=config_file)
assert config.dialectic_depth_levels == ["low", "high"]
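Taken together, these tests pin down a validate/truncate/pad normalization for the levels array; a standalone sketch (the valid-level set is assumed from the names the tests use, and the helper name is hypothetical):

```python
VALID_LEVELS = {"minimal", "low", "medium", "high", "max"}  # assumed set

def normalize_depth_levels(levels, depth):
    # Unknown level names fall back to "low"; the list is then padded with
    # "low" and truncated to exactly `depth` entries.
    cleaned = [lv if lv in VALID_LEVELS else "low" for lv in levels]
    return (cleaned + ["low"] * depth)[:depth]

print(normalize_depth_levels(["high"], 3))             # -> ['high', 'low', 'low']
print(normalize_depth_levels(["invalid", "high"], 2))  # -> ['low', 'high']
```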


@@ -205,27 +205,62 @@ class TestPeerLookupHelpers:
def test_get_peer_card_uses_direct_peer_lookup(self):
mgr, session = self._make_cached_manager()
assistant_peer = MagicMock()
assistant_peer.get_card.return_value = ["Name: Robert"]
mgr._get_or_create_peer = MagicMock(return_value=assistant_peer)
assert mgr.get_peer_card(session.key) == ["Name: Robert"]
assistant_peer.get_card.assert_called_once_with(target=session.user_peer_id)
def test_search_context_uses_assistant_perspective_with_target(self):
mgr, session = self._make_cached_manager()
assistant_peer = MagicMock()
assistant_peer.context.return_value = SimpleNamespace(
representation="Robert runs neuralancer",
peer_card=["Location: Melbourne"],
)
mgr._get_or_create_peer = MagicMock(return_value=assistant_peer)
result = mgr.search_context(session.key, "neuralancer")
assert "Robert runs neuralancer" in result
assert "- Location: Melbourne" in result
assistant_peer.context.assert_called_once_with(
target=session.user_peer_id,
search_query="neuralancer",
)
def test_search_context_unified_mode_uses_user_self_context(self):
mgr, session = self._make_cached_manager()
mgr._ai_observe_others = False
user_peer = MagicMock()
user_peer.context.return_value = SimpleNamespace(
representation="Unified self context",
peer_card=["Name: Robert"],
)
mgr._get_or_create_peer = MagicMock(return_value=user_peer)
result = mgr.search_context(session.key, "self")
assert "Unified self context" in result
user_peer.context.assert_called_once_with(search_query="self")
def test_search_context_accepts_explicit_ai_peer_id(self):
mgr, session = self._make_cached_manager()
ai_peer = MagicMock()
ai_peer.context.return_value = SimpleNamespace(
representation="Assistant self context",
peer_card=["Role: Assistant"],
)
mgr._get_or_create_peer = MagicMock(return_value=ai_peer)
result = mgr.search_context(session.key, "assistant", peer=session.assistant_peer_id)
assert "Assistant self context" in result
ai_peer.context.assert_called_once_with(
target=session.assistant_peer_id,
search_query="assistant",
)
def test_get_prefetch_context_fetches_user_and_ai_from_peer_api(self):
mgr, session = self._make_cached_manager()
@@ -235,9 +270,15 @@ class TestPeerLookupHelpers:
peer_card=["Name: Robert"],
)
ai_peer = MagicMock()
ai_peer.context.side_effect = lambda **kwargs: SimpleNamespace(
representation=(
"AI representation" if kwargs.get("target") == session.assistant_peer_id
else "Mixed representation"
),
peer_card=(
["Role: Assistant"] if kwargs.get("target") == session.assistant_peer_id
else ["Name: Robert"]
),
)
mgr._get_or_create_peer = MagicMock(side_effect=[user_peer, ai_peer])
@@ -247,17 +288,23 @@ class TestPeerLookupHelpers:
"representation": "User representation",
"card": "Name: Robert",
"ai_representation": "AI representation",
"ai_card": "Role: Assistant",
}
user_peer.context.assert_called_once_with(target=session.user_peer_id)
ai_peer.context.assert_called_once_with(target=session.assistant_peer_id)
def test_get_ai_representation_uses_peer_api(self):
mgr, session = self._make_cached_manager()
ai_peer = MagicMock()
ai_peer.context.side_effect = lambda **kwargs: SimpleNamespace(
representation=(
"AI representation" if kwargs.get("target") == session.assistant_peer_id
else "Mixed representation"
),
peer_card=(
["Role: Assistant"] if kwargs.get("target") == session.assistant_peer_id
else ["Name: Robert"]
),
)
mgr._get_or_create_peer = MagicMock(return_value=ai_peer)
@@ -265,9 +312,167 @@ class TestPeerLookupHelpers:
assert result == {
"representation": "AI representation",
"card": "Role: Assistant",
}
ai_peer.context.assert_called_once_with(target=session.assistant_peer_id)
def test_create_conclusion_defaults_to_user_target(self):
mgr, session = self._make_cached_manager()
assistant_peer = MagicMock()
scope = MagicMock()
assistant_peer.conclusions_of.return_value = scope
mgr._get_or_create_peer = MagicMock(return_value=assistant_peer)
ok = mgr.create_conclusion(session.key, "User prefers dark mode")
assert ok is True
assistant_peer.conclusions_of.assert_called_once_with(session.user_peer_id)
scope.create.assert_called_once_with([{
"content": "User prefers dark mode",
"session_id": session.honcho_session_id,
}])
def test_create_conclusion_can_target_ai_peer(self):
mgr, session = self._make_cached_manager()
assistant_peer = MagicMock()
scope = MagicMock()
assistant_peer.conclusions_of.return_value = scope
mgr._get_or_create_peer = MagicMock(return_value=assistant_peer)
ok = mgr.create_conclusion(session.key, "Assistant prefers terse summaries", peer="ai")
assert ok is True
assistant_peer.conclusions_of.assert_called_once_with(session.assistant_peer_id)
scope.create.assert_called_once_with([{
"content": "Assistant prefers terse summaries",
"session_id": session.honcho_session_id,
}])
def test_create_conclusion_accepts_explicit_user_peer_id(self):
mgr, session = self._make_cached_manager()
assistant_peer = MagicMock()
scope = MagicMock()
assistant_peer.conclusions_of.return_value = scope
mgr._get_or_create_peer = MagicMock(return_value=assistant_peer)
ok = mgr.create_conclusion(session.key, "Robert prefers vinyl", peer=session.user_peer_id)
assert ok is True
assistant_peer.conclusions_of.assert_called_once_with(session.user_peer_id)
scope.create.assert_called_once_with([{
"content": "Robert prefers vinyl",
"session_id": session.honcho_session_id,
}])
class TestConcludeToolDispatch:
def test_honcho_conclude_defaults_to_user_peer(self):
provider = HonchoMemoryProvider()
provider._session_initialized = True
provider._session_key = "telegram:123"
provider._manager = MagicMock()
provider._manager.create_conclusion.return_value = True
result = provider.handle_tool_call(
"honcho_conclude",
{"conclusion": "User prefers dark mode"},
)
assert "Conclusion saved for user" in result
provider._manager.create_conclusion.assert_called_once_with(
"telegram:123",
"User prefers dark mode",
peer="user",
)
def test_honcho_conclude_can_target_ai_peer(self):
provider = HonchoMemoryProvider()
provider._session_initialized = True
provider._session_key = "telegram:123"
provider._manager = MagicMock()
provider._manager.create_conclusion.return_value = True
result = provider.handle_tool_call(
"honcho_conclude",
{"conclusion": "Assistant likes terse replies", "peer": "ai"},
)
assert "Conclusion saved for ai" in result
provider._manager.create_conclusion.assert_called_once_with(
"telegram:123",
"Assistant likes terse replies",
peer="ai",
)
def test_honcho_profile_can_target_explicit_peer_id(self):
provider = HonchoMemoryProvider()
provider._session_initialized = True
provider._session_key = "telegram:123"
provider._manager = MagicMock()
provider._manager.get_peer_card.return_value = ["Role: Assistant"]
result = provider.handle_tool_call(
"honcho_profile",
{"peer": "hermes"},
)
assert "Role: Assistant" in result
provider._manager.get_peer_card.assert_called_once_with("telegram:123", peer="hermes")
def test_honcho_search_can_target_explicit_peer_id(self):
provider = HonchoMemoryProvider()
provider._session_initialized = True
provider._session_key = "telegram:123"
provider._manager = MagicMock()
provider._manager.search_context.return_value = "Assistant self context"
result = provider.handle_tool_call(
"honcho_search",
{"query": "assistant", "peer": "hermes"},
)
assert "Assistant self context" in result
provider._manager.search_context.assert_called_once_with(
"telegram:123",
"assistant",
max_tokens=800,
peer="hermes",
)
def test_honcho_reasoning_can_target_explicit_peer_id(self):
provider = HonchoMemoryProvider()
provider._session_initialized = True
provider._session_key = "telegram:123"
provider._manager = MagicMock()
provider._manager.dialectic_query.return_value = "Assistant answer"
result = provider.handle_tool_call(
"honcho_reasoning",
{"query": "who are you", "peer": "hermes"},
)
assert "Assistant answer" in result
provider._manager.dialectic_query.assert_called_once_with(
"telegram:123",
"who are you",
reasoning_level=None,
peer="hermes",
)
def test_honcho_conclude_missing_both_params_returns_error(self):
"""Calling honcho_conclude with neither conclusion nor delete_id returns a tool error."""
import json
provider = HonchoMemoryProvider()
provider._session_initialized = True
provider._session_key = "telegram:123"
provider._manager = MagicMock()
result = provider.handle_tool_call("honcho_conclude", {})
parsed = json.loads(result)
assert "error" in parsed or "Missing required" in parsed.get("result", "")
provider._manager.create_conclusion.assert_not_called()
provider._manager.delete_conclusion.assert_not_called()
# ---------------------------------------------------------------------------
@@ -366,6 +571,54 @@ class TestToolsModeInitBehavior:
assert cfg.peer_name == "8439114563"
class TestPerSessionMigrateGuard:
"""Verify migrate_memory_files is skipped under per-session strategy.
per-session creates a fresh Honcho session every Hermes run. Uploading
MEMORY.md/USER.md/SOUL.md to each short-lived session floods the backend
with duplicate content. The guard was added to prevent orphan sessions
containing only <prior_memory_file> wrappers.
"""
def _make_provider_with_strategy(self, strategy, init_on_session_start=True):
"""Create a HonchoMemoryProvider and track migrate_memory_files calls."""
from plugins.memory.honcho.client import HonchoClientConfig
from unittest.mock import patch, MagicMock
cfg = HonchoClientConfig(
api_key="test-key",
enabled=True,
recall_mode="tools",
init_on_session_start=init_on_session_start,
session_strategy=strategy,
)
provider = HonchoMemoryProvider()
mock_manager = MagicMock()
mock_session = MagicMock()
mock_session.messages = [] # empty = new session → triggers migration path
mock_manager.get_or_create.return_value = mock_session
with patch("plugins.memory.honcho.client.HonchoClientConfig.from_global_config", return_value=cfg), \
patch("plugins.memory.honcho.client.get_honcho_client", return_value=MagicMock()), \
patch("plugins.memory.honcho.session.HonchoSessionManager", return_value=mock_manager), \
patch("hermes_constants.get_hermes_home", return_value=MagicMock()):
provider.initialize(session_id="test-session-001")
return provider, mock_manager
def test_migrate_skipped_for_per_session(self):
"""per-session strategy must NOT call migrate_memory_files."""
_, mock_manager = self._make_provider_with_strategy("per-session")
mock_manager.migrate_memory_files.assert_not_called()
def test_migrate_runs_for_per_directory(self):
"""per-directory strategy with empty session SHOULD call migrate_memory_files."""
_, mock_manager = self._make_provider_with_strategy("per-directory")
mock_manager.migrate_memory_files.assert_called_once()
class TestChunkMessage:
def test_short_message_single_chunk(self):
result = HonchoMemoryProvider._chunk_message("hello world", 100)
@@ -420,6 +673,60 @@ class TestChunkMessage:
assert len(chunk) <= 25000
# ---------------------------------------------------------------------------
# Context token budget enforcement
# ---------------------------------------------------------------------------
class TestTruncateToBudget:
def test_truncates_oversized_context(self):
"""Text exceeding context_tokens budget is truncated at a word boundary."""
from plugins.memory.honcho.client import HonchoClientConfig
provider = HonchoMemoryProvider()
provider._config = HonchoClientConfig(context_tokens=10)
long_text = "word " * 200 # ~1000 chars, well over 10*4=40 char budget
result = provider._truncate_to_budget(long_text)
assert len(result) <= 50 # budget_chars + ellipsis + word boundary slack
assert result.endswith("…")
def test_no_truncation_within_budget(self):
"""Text within budget passes through unchanged."""
from plugins.memory.honcho.client import HonchoClientConfig
provider = HonchoMemoryProvider()
provider._config = HonchoClientConfig(context_tokens=1000)
short_text = "Name: Robert, Location: Melbourne"
assert provider._truncate_to_budget(short_text) == short_text
def test_no_truncation_when_context_tokens_none(self):
"""When context_tokens is None (explicit opt-out), no truncation."""
from plugins.memory.honcho.client import HonchoClientConfig
provider = HonchoMemoryProvider()
provider._config = HonchoClientConfig(context_tokens=None)
long_text = "word " * 500
assert provider._truncate_to_budget(long_text) == long_text
def test_context_tokens_cap_bounds_prefetch(self):
"""With an explicit token budget, oversized prefetch is bounded."""
from plugins.memory.honcho.client import HonchoClientConfig
provider = HonchoMemoryProvider()
provider._config = HonchoClientConfig(context_tokens=1200)
# Simulate a massive representation (10k chars)
huge_text = "x" * 10000
result = provider._truncate_to_budget(huge_text)
# 1200 tokens * 4 chars = 4800 chars + " …"
assert len(result) <= 4805
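A minimal sketch of the truncation helper these assertions imply — the function name and the 4-chars-per-token heuristic are inferred from the tests, not the plugin's actual implementation:

```python
def truncate_to_budget(text, context_tokens):
    """Truncate text to ~context_tokens, cutting at a word boundary and appending an ellipsis."""
    if context_tokens is None:
        return text  # explicit opt-out: no budget, no truncation
    budget_chars = context_tokens * 4  # rough 4-chars-per-token heuristic
    if len(text) <= budget_chars:
        return text
    cut = text[:budget_chars].rsplit(" ", 1)[0]  # back up to the last word boundary
    return cut + " …"
```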
# ---------------------------------------------------------------------------
# Dialectic input guard
# ---------------------------------------------------------------------------
@@ -452,3 +759,387 @@ class TestDialecticInputGuard:
# The query passed to chat() should be truncated
actual_query = mock_peer.chat.call_args[0][0]
assert len(actual_query) <= 100
# ---------------------------------------------------------------------------
class TestDialecticCadenceDefaults:
"""Regression tests for dialectic_cadence default value."""
@staticmethod
def _make_provider(cfg_extra=None):
"""Create a HonchoMemoryProvider with mocked dependencies."""
from unittest.mock import patch, MagicMock
from plugins.memory.honcho.client import HonchoClientConfig
defaults = dict(api_key="test-key", enabled=True, recall_mode="hybrid")
if cfg_extra:
defaults.update(cfg_extra)
cfg = HonchoClientConfig(**defaults)
provider = HonchoMemoryProvider()
mock_manager = MagicMock()
mock_session = MagicMock()
mock_session.messages = []
mock_manager.get_or_create.return_value = mock_session
with patch("plugins.memory.honcho.client.HonchoClientConfig.from_global_config", return_value=cfg), \
patch("plugins.memory.honcho.client.get_honcho_client", return_value=MagicMock()), \
patch("plugins.memory.honcho.session.HonchoSessionManager", return_value=mock_manager), \
patch("hermes_constants.get_hermes_home", return_value=MagicMock()):
provider.initialize(session_id="test-session-001")
return provider
def test_default_is_3(self):
"""Default dialectic_cadence should be 3 to avoid per-turn LLM calls."""
provider = self._make_provider()
assert provider._dialectic_cadence == 3
def test_config_override(self):
"""dialecticCadence from config overrides the default."""
provider = self._make_provider(cfg_extra={"raw": {"dialecticCadence": 5}})
assert provider._dialectic_cadence == 5
class TestBaseContextSummary:
"""Base context injection should include session summary when available."""
def test_format_includes_summary(self):
"""Session summary should appear first in the formatted context."""
provider = HonchoMemoryProvider()
ctx = {
"summary": "Testing Honcho tools and dialectic depth.",
"representation": "Eri is a developer.",
"card": "Name: Eri Barrett",
}
formatted = provider._format_first_turn_context(ctx)
assert "## Session Summary" in formatted
assert formatted.index("Session Summary") < formatted.index("User Representation")
def test_format_without_summary(self):
"""No summary key means no summary section."""
provider = HonchoMemoryProvider()
ctx = {"representation": "Eri is a developer.", "card": "Name: Eri"}
formatted = provider._format_first_turn_context(ctx)
assert "Session Summary" not in formatted
assert "User Representation" in formatted
def test_format_empty_summary_skipped(self):
"""Empty summary string should not produce a section."""
provider = HonchoMemoryProvider()
ctx = {"summary": "", "representation": "rep", "card": "card"}
formatted = provider._format_first_turn_context(ctx)
assert "Session Summary" not in formatted
class TestDialecticDepth:
"""Tests for the dialecticDepth multi-pass system."""
@staticmethod
def _make_provider(cfg_extra=None):
from unittest.mock import patch, MagicMock
from plugins.memory.honcho.client import HonchoClientConfig
defaults = dict(api_key="test-key", enabled=True, recall_mode="hybrid")
if cfg_extra:
defaults.update(cfg_extra)
cfg = HonchoClientConfig(**defaults)
provider = HonchoMemoryProvider()
mock_manager = MagicMock()
mock_session = MagicMock()
mock_session.messages = []
mock_manager.get_or_create.return_value = mock_session
with patch("plugins.memory.honcho.client.HonchoClientConfig.from_global_config", return_value=cfg), \
patch("plugins.memory.honcho.client.get_honcho_client", return_value=MagicMock()), \
patch("plugins.memory.honcho.session.HonchoSessionManager", return_value=mock_manager), \
patch("hermes_constants.get_hermes_home", return_value=MagicMock()):
provider.initialize(session_id="test-session-001")
return provider
def test_default_depth_is_1(self):
"""Default dialecticDepth should be 1 — single .chat() call."""
provider = self._make_provider()
assert provider._dialectic_depth == 1
def test_depth_from_config(self):
"""dialecticDepth from config sets the depth."""
provider = self._make_provider(cfg_extra={"dialectic_depth": 2})
assert provider._dialectic_depth == 2
def test_depth_clamped_to_3(self):
"""dialecticDepth > 3 gets clamped to 3."""
provider = self._make_provider(cfg_extra={"dialectic_depth": 7})
assert provider._dialectic_depth == 3
def test_depth_clamped_to_1(self):
"""dialecticDepth < 1 gets clamped to 1."""
provider = self._make_provider(cfg_extra={"dialectic_depth": 0})
assert provider._dialectic_depth == 1
def test_depth_levels_from_config(self):
"""dialecticDepthLevels array is read from config."""
provider = self._make_provider(cfg_extra={
"dialectic_depth": 2,
"dialectic_depth_levels": ["minimal", "high"],
})
assert provider._dialectic_depth_levels == ["minimal", "high"]
def test_depth_levels_none_by_default(self):
"""When dialecticDepthLevels is not configured, it's None."""
provider = self._make_provider()
assert provider._dialectic_depth_levels is None
def test_resolve_pass_level_uses_depth_levels(self):
"""Per-pass levels from dialecticDepthLevels override proportional."""
provider = self._make_provider(cfg_extra={
"dialectic_depth": 2,
"dialectic_depth_levels": ["minimal", "high"],
})
assert provider._resolve_pass_level(0) == "minimal"
assert provider._resolve_pass_level(1) == "high"
def test_resolve_pass_level_proportional_depth_1(self):
"""Depth 1 pass 0 uses the base reasoning level."""
provider = self._make_provider(cfg_extra={
"dialectic_depth": 1,
"dialectic_reasoning_level": "medium",
})
assert provider._resolve_pass_level(0) == "medium"
def test_resolve_pass_level_proportional_depth_2(self):
"""Depth 2: pass 0 is minimal, pass 1 is base level."""
provider = self._make_provider(cfg_extra={
"dialectic_depth": 2,
"dialectic_reasoning_level": "high",
})
assert provider._resolve_pass_level(0) == "minimal"
assert provider._resolve_pass_level(1) == "high"
def test_cold_start_prompt(self):
"""Cold start (no base context) uses general user query."""
provider = self._make_provider()
prompt = provider._build_dialectic_prompt(0, [], is_cold=True)
assert "preferences" in prompt.lower()
assert "session" not in prompt.lower()
def test_warm_session_prompt(self):
"""Warm session (has context) uses session-scoped query."""
provider = self._make_provider()
prompt = provider._build_dialectic_prompt(0, [], is_cold=False)
assert "session" in prompt.lower()
assert "current conversation" in prompt.lower()
def test_signal_sufficient_short_response(self):
"""Short responses are not sufficient signal."""
assert not HonchoMemoryProvider._signal_sufficient("ok")
assert not HonchoMemoryProvider._signal_sufficient("")
assert not HonchoMemoryProvider._signal_sufficient(None)
def test_signal_sufficient_structured_response(self):
"""Structured responses with bullets/headers are sufficient."""
result = "## Current State\n- Working on Honcho PR\n- Testing dialectic depth\n" + "x" * 50
assert HonchoMemoryProvider._signal_sufficient(result)
def test_signal_sufficient_long_unstructured(self):
"""Long responses are sufficient even without structure."""
assert HonchoMemoryProvider._signal_sufficient("a" * 301)
def test_run_dialectic_depth_single_pass(self):
"""Depth 1 makes exactly one .chat() call."""
from unittest.mock import MagicMock
provider = self._make_provider(cfg_extra={"dialectic_depth": 1})
provider._manager = MagicMock()
provider._manager.dialectic_query.return_value = "user prefers zero-fluff"
provider._session_key = "test"
provider._base_context_cache = None # cold start
result = provider._run_dialectic_depth("hello")
assert result == "user prefers zero-fluff"
assert provider._manager.dialectic_query.call_count == 1
def test_run_dialectic_depth_two_passes(self):
"""Depth 2 makes two .chat() calls when pass 1 signal is weak."""
from unittest.mock import MagicMock
provider = self._make_provider(cfg_extra={"dialectic_depth": 2})
provider._manager = MagicMock()
provider._manager.dialectic_query.side_effect = [
"thin response", # pass 0: weak signal
"## Synthesis\n- Grounded in evidence\n- Current PR work\n" + "x" * 100, # pass 1: strong
]
provider._session_key = "test"
provider._base_context_cache = "existing context"
result = provider._run_dialectic_depth("test query")
assert provider._manager.dialectic_query.call_count == 2
assert "Synthesis" in result
def test_first_turn_runs_dialectic_synchronously(self):
"""First turn should fire the dialectic synchronously (cold start)."""
from unittest.mock import MagicMock, patch
provider = self._make_provider(cfg_extra={"dialectic_depth": 1})
provider._manager = MagicMock()
provider._manager.dialectic_query.return_value = "cold start synthesis"
provider._manager.get_prefetch_context.return_value = None
provider._manager.pop_context_result.return_value = None
provider._session_key = "test"
provider._base_context_cache = "" # cold start
provider._last_dialectic_turn = -999 # never fired
result = provider.prefetch("hello world")
assert "cold start synthesis" in result
assert provider._manager.dialectic_query.call_count == 1
# After first-turn sync, _last_dialectic_turn should be updated
assert provider._last_dialectic_turn != -999
def test_first_turn_dialectic_does_not_double_fire(self):
"""After first-turn sync dialectic, queue_prefetch should skip (cadence)."""
from unittest.mock import MagicMock
provider = self._make_provider(cfg_extra={"dialectic_depth": 1})
provider._manager = MagicMock()
provider._manager.dialectic_query.return_value = "cold start synthesis"
provider._manager.get_prefetch_context.return_value = None
provider._manager.pop_context_result.return_value = None
provider._session_key = "test"
provider._base_context_cache = ""
provider._last_dialectic_turn = -999
provider._turn_count = 0
# First turn fires sync dialectic
provider.prefetch("hello")
assert provider._manager.dialectic_query.call_count == 1
# Now queue_prefetch on same turn should skip (cadence: 0 - 0 < 3)
provider._manager.dialectic_query.reset_mock()
provider.queue_prefetch("hello")
assert provider._manager.dialectic_query.call_count == 0
def test_run_dialectic_depth_bails_early_on_strong_signal(self):
"""Depth 2 skips pass 1 when pass 0 returns strong signal."""
from unittest.mock import MagicMock
provider = self._make_provider(cfg_extra={"dialectic_depth": 2})
provider._manager = MagicMock()
provider._manager.dialectic_query.return_value = (
"## Full Assessment\n- Strong structured response\n- With evidence\n" + "x" * 200
)
provider._session_key = "test"
provider._base_context_cache = "existing context"
result = provider._run_dialectic_depth("test query")
# Only 1 call because pass 0 had sufficient signal
assert provider._manager.dialectic_query.call_count == 1
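The `_signal_sufficient` heuristic exercised above can be sketched roughly as follows — the 300- and 80-char thresholds are illustrative guesses consistent with the assertions, not the plugin's exact constants:

```python
import re

# Markdown headers or bullets anchored at line start (anchored regex, per the fix note)
_STRUCTURE_RE = re.compile(r"^(#{1,6} |[-*] )", re.MULTILINE)

def signal_sufficient(text):
    """Is a dialectic response strong enough to skip the remaining passes?"""
    if not text:
        return False
    if len(text) > 300:  # long responses count regardless of structure
        return True
    # shorter responses qualify only when structured and non-trivial
    return bool(_STRUCTURE_RE.search(text)) and len(text) >= 80
```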
# ---------------------------------------------------------------------------
# set_peer_card None guard
# ---------------------------------------------------------------------------
class TestSetPeerCardNoneGuard:
"""set_peer_card must return None (not raise) when peer ID cannot be resolved."""
def _make_manager(self):
from plugins.memory.honcho.client import HonchoClientConfig
from plugins.memory.honcho.session import HonchoSessionManager
cfg = HonchoClientConfig(api_key="test-key", enabled=True)
mgr = HonchoSessionManager.__new__(HonchoSessionManager)
mgr._cache = {}
mgr._sessions_cache = {}
mgr._config = cfg
return mgr
def test_returns_none_when_peer_resolves_to_none(self):
"""set_peer_card returns None when _resolve_peer_id returns None."""
from unittest.mock import patch
mgr = self._make_manager()
session = HonchoSession(
key="test",
honcho_session_id="sid",
user_peer_id="user-peer",
assistant_peer_id="ai-peer",
)
mgr._cache["test"] = session
with patch.object(mgr, "_resolve_peer_id", return_value=None):
result = mgr.set_peer_card("test", ["fact 1", "fact 2"], peer="ghost")
assert result is None
def test_returns_none_when_session_missing(self):
"""set_peer_card returns None when session key is not in cache."""
mgr = self._make_manager()
result = mgr.set_peer_card("nonexistent", ["fact"], peer="user")
assert result is None
# ---------------------------------------------------------------------------
# get_session_context cache-miss fallback respects peer param
# ---------------------------------------------------------------------------
class TestGetSessionContextFallback:
"""get_session_context fallback must honour the peer param when honcho_session is absent."""
def _make_manager_with_session(self, user_peer_id="user-peer", assistant_peer_id="ai-peer"):
from plugins.memory.honcho.client import HonchoClientConfig
from plugins.memory.honcho.session import HonchoSessionManager
cfg = HonchoClientConfig(api_key="test-key", enabled=True)
mgr = HonchoSessionManager.__new__(HonchoSessionManager)
mgr._cache = {}
mgr._sessions_cache = {}
mgr._config = cfg
mgr._dialectic_dynamic = True
mgr._dialectic_reasoning_level = "low"
mgr._dialectic_max_input_chars = 10000
mgr._ai_observe_others = True
session = HonchoSession(
key="test",
honcho_session_id="sid-missing-from-sessions-cache",
user_peer_id=user_peer_id,
assistant_peer_id=assistant_peer_id,
)
mgr._cache["test"] = session
# Deliberately NOT adding to _sessions_cache to trigger fallback path
return mgr
def test_fallback_uses_user_peer_for_user(self):
"""On cache miss, peer='user' fetches user peer context."""
mgr = self._make_manager_with_session()
fetch_calls = []
def _fake_fetch(peer_id, search_query=None, *, target=None):
fetch_calls.append((peer_id, target))
return {"representation": "user rep", "card": []}
mgr._fetch_peer_context = _fake_fetch
mgr.get_session_context("test", peer="user")
assert len(fetch_calls) == 1
peer_id, target = fetch_calls[0]
assert peer_id == "user-peer"
assert target == "user-peer"
def test_fallback_uses_ai_peer_for_ai(self):
"""On cache miss, peer='ai' fetches assistant peer context, not user."""
mgr = self._make_manager_with_session()
fetch_calls = []
def _fake_fetch(peer_id, search_query=None, *, target=None):
fetch_calls.append((peer_id, target))
return {"representation": "ai rep", "card": []}
mgr._fetch_peer_context = _fake_fetch
mgr.get_session_context("test", peer="ai")
assert len(fetch_calls) == 1
peer_id, target = fetch_calls[0]
assert peer_id == "ai-peer", f"expected ai-peer, got {peer_id}"
assert target == "ai-peer"


@@ -3998,3 +3998,63 @@ class TestDeadRetryCode:
f"Expected 2 occurrences of 'if retry_count >= max_retries:' "
f"but found {occurrences}"
)
class TestMemoryContextSanitization:
"""run_conversation() must strip leaked <memory-context> blocks from user input."""
def test_memory_context_stripped_from_user_message(self):
"""Verify that <memory-context> blocks are removed before the message
enters the conversation loop, preventing stale Honcho injection from
leaking into user text."""
import inspect
src = inspect.getsource(AIAgent.run_conversation)
# The sanitize_context call must appear in run_conversation's preamble
assert "sanitize_context(user_message)" in src
assert "sanitize_context(persist_user_message)" in src
def test_sanitize_context_strips_full_block(self):
"""End-to-end: a user message with an embedded memory-context block
is cleaned to just the actual user text."""
from agent.memory_manager import sanitize_context
user_text = "how is the honcho working"
injected = (
user_text + "\n\n"
"<memory-context>\n"
"[System note: The following is recalled memory context, "
"NOT new user input. Treat as informational background data.]\n\n"
"## User Representation\n"
"[2026-01-13 02:13:00] stale observation about AstroMap\n"
"</memory-context>"
)
result = sanitize_context(injected)
assert "memory-context" not in result.lower()
assert "stale observation" not in result
assert "how is the honcho working" in result
class TestMemoryProviderTurnStart:
"""run_conversation() must call memory_manager.on_turn_start() before prefetch_all().
Without this call, providers like Honcho never update _turn_count, so cadence
checks (contextCadence, dialecticCadence) are always satisfied, so every turn
fires both context refresh and dialectic, ignoring the configured cadence.
"""
def test_on_turn_start_called_before_prefetch(self):
"""Source-level check: on_turn_start appears before prefetch_all in run_conversation."""
import inspect
src = inspect.getsource(AIAgent.run_conversation)
# Find the actual method calls, not comments
idx_turn_start = src.index(".on_turn_start(")
idx_prefetch = src.index(".prefetch_all(")
assert idx_turn_start < idx_prefetch, (
"on_turn_start() must be called before prefetch_all() in run_conversation "
"so that memory providers have the correct turn count for cadence checks"
)
def test_on_turn_start_uses_user_turn_count(self):
"""Source-level check: on_turn_start receives self._user_turn_count."""
import inspect
src = inspect.getsource(AIAgent.run_conversation)
assert "on_turn_start(self._user_turn_count" in src


@@ -72,7 +72,7 @@ In addition to built-in tools, Hermes can load tools dynamically from MCP server
| `ha_list_services` | List available Home Assistant services (actions) for device control. Shows what actions can be performed on each device type and what parameters they accept. Use this to discover how to control devices found via ha_list_entities. | — |
:::note
**Honcho tools** (`honcho_profile`, `honcho_search`, `honcho_context`, `honcho_reasoning`, `honcho_conclude`) are no longer built-in. They are available via the Honcho memory provider plugin at `plugins/memory/honcho/`. See [Memory Providers](../user-guide/features/memory-providers.md) for installation and usage.
:::
## `image_gen` toolset


@@ -18,12 +18,15 @@ Honcho is integrated into the [Memory Providers](./memory-providers.md) system.
|-----------|----------------|--------|
| Cross-session persistence | ✔ File-based MEMORY.md/USER.md | ✔ Server-side with API |
| User profile | ✔ Manual agent curation | ✔ Automatic dialectic reasoning |
| Session summary | — | ✔ Session-scoped context injection |
| Multi-agent isolation | — | ✔ Per-peer profile separation |
| Observation modes | — | ✔ Unified or directional observation |
| Conclusions (derived insights) | — | ✔ Server-side reasoning about patterns |
| Search across history | ✔ FTS5 session search | ✔ Semantic search over conclusions |
**Dialectic reasoning**: After each conversation turn (gated by `dialecticCadence`), Honcho analyzes the exchange and derives insights about the user's preferences, habits, and goals. These accumulate over time, giving the agent a deepening understanding that goes beyond what the user explicitly stated. The dialectic supports multi-pass depth (1-3 passes) with automatic cold/warm prompt selection — cold start queries focus on general user facts while warm queries prioritize session-scoped context.
**Session-scoped context**: Base context now includes the session summary alongside the user representation and peer card. This gives the agent awareness of what has already been discussed in the current session, reducing repetition and enabling continuity.
**Multi-agent profiles**: When multiple Hermes instances talk to the same user (e.g., a coding assistant and a personal assistant), Honcho maintains separate "peer" profiles. Each peer sees only its own observations and conclusions, preventing cross-contamination of context.
@@ -42,40 +45,128 @@ memory:
```
```bash
echo "HONCHO_API_KEY=your-key" >> ~/.hermes/.env
```
Get an API key at [honcho.dev](https://honcho.dev).
## Architecture
### Two-Layer Context Injection
Every turn (in `hybrid` or `context` mode), Honcho assembles two layers of context injected into the system prompt:
1. **Base context** — session summary, user representation, user peer card, AI self-representation, and AI identity card. Refreshed on `contextCadence`. This is the "who is this user" layer.
2. **Dialectic supplement** — LLM-synthesized reasoning about the user's current state and needs. Refreshed on `dialecticCadence`. This is the "what matters right now" layer.
Both layers are concatenated and truncated to the `contextTokens` budget (if set).
### Cold/Warm Prompt Selection
The dialectic automatically selects between two prompt strategies:
- **Cold start** (no base context yet): General query — "Who is this person? What are their preferences, goals, and working style?"
- **Warm session** (base context exists): Session-scoped query — "Given what's been discussed in this session so far, what context about this user is most relevant?"
This happens automatically based on whether base context has been populated.
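The selection logic can be sketched as follows — the function name is illustrative and the prompt wording is paraphrased from the description above:

```python
def build_dialectic_prompt(base_context):
    """Pick the cold-start or warm query based on whether base context is populated."""
    if not base_context:  # cold start: nothing injected yet
        return "Who is this person? What are their preferences, goals, and working style?"
    return ("Given what's been discussed in this session so far, "
            "what context about this user is most relevant to the current conversation?")
```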
### Three Orthogonal Config Knobs
Cost and depth are controlled by three independent knobs:
| Knob | Controls | Default |
|------|----------|---------|
| `contextCadence` | Turns between `context()` API calls (base layer refresh) | `1` |
| `dialecticCadence` | Turns between `peer.chat()` LLM calls (dialectic layer refresh) | `3` |
| `dialecticDepth` | Number of `.chat()` passes per dialectic invocation (1-3) | `1` |
These are orthogonal — you can have frequent context refreshes with infrequent dialectic, or deep multi-pass dialectic at low frequency. Example: `contextCadence: 1, dialecticCadence: 5, dialecticDepth: 2` refreshes base context every turn, runs dialectic every 5 turns, and each dialectic run makes 2 passes.
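A tiny simulation shows how the two cadence gates fire independently (illustrative sketch, assuming last-fired resets on each firing):

```python
def simulate(turns, cadence):
    """Return the turn indices on which a cadence-gated refresh fires."""
    fired, last = [], -cadence  # seed so the very first turn is eligible
    for turn in range(turns):
        if turn - last >= cadence:  # at least `cadence` turns since the last firing
            fired.append(turn)
            last = turn
    return fired

# contextCadence=1 refreshes base context every turn;
# dialecticCadence=3 runs the LLM dialectic only on turns 0, 3, 6, ...
assert simulate(7, 1) == [0, 1, 2, 3, 4, 5, 6]
assert simulate(7, 3) == [0, 3, 6]
```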
### Dialectic Depth (Multi-Pass)
When `dialecticDepth` > 1, each dialectic invocation runs multiple `.chat()` passes:
- **Pass 0**: Cold or warm prompt (see above)
- **Pass 1**: Self-audit — identifies gaps in the initial assessment and synthesizes evidence from recent sessions
- **Pass 2**: Reconciliation — checks for contradictions between prior passes and produces a final synthesis
Each pass uses a proportional reasoning level (lighter early passes, base level for the main pass). Override per-pass levels with `dialecticDepthLevels` — e.g., `["minimal", "medium", "high"]` for a depth-3 run.
Passes bail out early if the prior pass returned strong signal (long, structured output), so depth 3 doesn't always mean 3 LLM calls.
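The pass loop with early bail-out can be sketched as follows (function names assumed; `run_pass` stands in for a `.chat()` call at the resolved reasoning level):

```python
def run_dialectic_depth(query, depth, run_pass, signal_sufficient):
    """Run up to `depth` dialectic passes, stopping early once a pass returns strong signal."""
    result = None
    for pass_idx in range(depth):
        # pass 0: cold/warm prompt; pass 1: self-audit; pass 2: reconciliation
        result = run_pass(pass_idx, query, prior=result)
        if signal_sufficient(result):
            break  # strong signal: skip the remaining passes
    return result
```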
## Configuration Options
```yaml
# ~/.hermes/config.yaml
honcho:
observation: directional # "unified" (default for new installs) or "directional"
peer_name: "" # auto-detected from platform, or set manually
```
Honcho is configured in `~/.honcho/config.json` (global) or `$HERMES_HOME/honcho.json` (profile-local). The setup wizard handles this for you.
**Observation modes:**
- `unified` — All observations go into a single pool. Simpler, good for single-agent setups.
- `directional` — Observations are tagged with direction (user→agent, agent→user). Enables richer analysis of conversation dynamics.
### Full Config Reference
| Key | Default | Description |
|-----|---------|-------------|
| `contextTokens` | `null` (uncapped) | Token budget for auto-injected context per turn. Set to an integer (e.g. 1200) to cap. Truncates at word boundaries |
| `contextCadence` | `1` | Minimum turns between `context()` API calls (base layer refresh) |
| `dialecticCadence` | `3` | Minimum turns between `peer.chat()` LLM calls (dialectic layer). In `tools` mode, irrelevant — model calls explicitly |
| `dialecticDepth` | `1` | Number of `.chat()` passes per dialectic invocation. Clamped to 1-3 |
| `dialecticDepthLevels` | `null` | Optional array of reasoning levels per pass, e.g. `["minimal", "low", "medium"]`. Overrides proportional defaults |
| `dialecticReasoningLevel` | `'low'` | Base reasoning level: `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | When `true`, model can override reasoning level per-call via tool param |
| `dialecticMaxChars` | `600` | Max chars of dialectic result injected into system prompt |
| `recallMode` | `'hybrid'` | `hybrid` (auto-inject + tools), `context` (inject only), `tools` (tools only) |
| `writeFrequency` | `'async'` | When to flush messages: `async` (background thread), `turn` (sync), `session` (batch on end), or integer N |
| `saveMessages` | `true` | Whether to persist messages to Honcho API |
| `observationMode` | `'directional'` | `directional` (all on) or `unified` (shared pool). Override with `observation` object for granular control |
| `messageMaxChars` | `25000` | Max chars per message sent via `add_messages()`. Chunked if exceeded |
| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input to `peer.chat()` |
| `sessionStrategy` | `'per-directory'` | `per-directory`, `per-repo`, `per-session`, or `global` |
**Session strategy** controls how Honcho sessions map to your work:
- `per-session` — each `hermes` run gets a fresh session. Clean starts, memory via tools. Recommended for new users.
- `per-directory` — one Honcho session per working directory. Context accumulates across runs.
- `per-repo` — one session per git repository.
- `global` — single session across all directories.
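One way to picture the mapping from strategy to session key — this sketch is illustrative, and the plugin's actual key scheme may differ:

```python
import subprocess
import uuid
from pathlib import Path

def session_key(strategy, cwd="."):
    """Derive a Honcho session key from the configured sessionStrategy (hypothetical scheme)."""
    if strategy == "per-session":
        return f"run:{uuid.uuid4()}"         # fresh session every hermes run
    if strategy == "per-directory":
        return f"dir:{Path(cwd).resolve()}"  # context accumulates per working directory
    if strategy == "per-repo":
        proc = subprocess.run(
            ["git", "-C", cwd, "rev-parse", "--show-toplevel"],
            capture_output=True, text=True,
        )
        root = proc.stdout.strip() or str(Path(cwd).resolve())  # fall back outside a repo
        return f"repo:{root}"
    return "global"                          # one shared session everywhere
```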
**Recall mode** controls how memory flows into conversations:
- `hybrid` — context auto-injected into system prompt AND tools available (model decides when to query).
- `context` — auto-injection only, tools hidden.
- `tools` — tools only, no auto-injection. Agent must explicitly call `honcho_reasoning`, `honcho_search`, etc.
**Settings per recall mode:**
| Setting | `hybrid` | `context` | `tools` |
|---------|----------|-----------|---------|
| `writeFrequency` | flushes messages | flushes messages | flushes messages |
| `contextCadence` | gates base context refresh | gates base context refresh | irrelevant — no injection |
| `dialecticCadence` | gates auto LLM calls | gates auto LLM calls | irrelevant — model calls explicitly |
| `dialecticDepth` | multi-pass per invocation | multi-pass per invocation | irrelevant — model calls explicitly |
| `contextTokens` | caps injection | caps injection | irrelevant — no injection |
| `dialecticDynamic` | gates model override | N/A (no tools) | gates model override |
In `tools` mode, the model is fully in control — it calls `honcho_reasoning` when it wants, at whatever `reasoning_level` it picks. Cadence and budget settings only apply to modes with auto-injection (`hybrid` and `context`).
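Sketching the mode-to-surface mapping described above (helper names assumed, derived from the table):

```python
HONCHO_TOOLS = ("honcho_profile", "honcho_search", "honcho_context",
                "honcho_reasoning", "honcho_conclude")

def visible_tools(recall_mode):
    """Tools exposed per recall mode: hybrid and tools expose all five, context hides them."""
    return () if recall_mode == "context" else HONCHO_TOOLS

def injects_context(recall_mode):
    """Auto-injection into the system prompt happens only in hybrid and context modes."""
    return recall_mode in ("hybrid", "context")
```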
## Tools
When Honcho is active as the memory provider, five tools become available:
| Tool | Purpose |
|------|---------|
| `honcho_profile` | Read or update peer card — pass `card` (list of facts) to update, omit to read |
| `honcho_search` | Semantic search over context — raw excerpts, no LLM synthesis |
| `honcho_context` | Full session context — summary, representation, card, recent messages |
| `honcho_reasoning` | Synthesized answer from Honcho's LLM — pass `reasoning_level` (minimal/low/medium/high/max) to control depth |
| `honcho_conclude` | Create or delete conclusions — pass `conclusion` to create, `delete_id` to remove (PII only) |
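The argument shapes implied by the table above can be written out as tool-call payloads. Field names follow the descriptions here, but the exact schemas may differ from the shipped plugin, and `c_123` is a hypothetical conclusion ID:

```python
# Illustrative tool-call arguments for the five tools.
calls = [
    {"tool": "honcho_profile",   "args": {}},                               # omit card to read
    {"tool": "honcho_profile",   "args": {"card": ["prefers dark mode"]}},  # pass card to update
    {"tool": "honcho_search",    "args": {"query": "deploy preferences"}},
    {"tool": "honcho_context",   "args": {}},
    {"tool": "honcho_reasoning", "args": {"query": "How does the user like PRs reviewed?",
                                          "reasoning_level": "medium"}},
    {"tool": "honcho_conclude",  "args": {"conclusion": "User reviews PRs before merge"}},
    {"tool": "honcho_conclude",  "args": {"delete_id": "c_123"}},           # PII removal
]
VALID_LEVELS = {"minimal", "low", "medium", "high", "max"}
```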
## CLI Commands
```bash
hermes honcho status # Connection status, config, and key settings
hermes honcho setup # Interactive setup wizard
hermes honcho strategy # Show or set session strategy
hermes honcho peer # Update peer names for multi-agent setups
hermes honcho mode # Show or set recall mode
hermes honcho tokens # Show or set context token budget
hermes honcho identity # Show Honcho peer identity
hermes honcho sync # Sync host blocks for all profiles
hermes honcho enable # Enable Honcho
hermes honcho disable # Disable Honcho
```
## Migrating from `hermes honcho`
### Honcho
AI-native cross-session user modeling with dialectic reasoning, session-scoped context injection, semantic search, and persistent conclusions. Base context now includes the session summary alongside user representation and peer cards, giving the agent awareness of what has already been discussed.
| | |
|---|---|
| **Data storage** | Honcho Cloud or self-hosted |
| **Cost** | Honcho pricing (cloud) / free (self-hosted) |
**Tools (5):** `honcho_profile` (read/update peer card), `honcho_search` (semantic search), `honcho_context` (session context — summary, representation, card, messages), `honcho_reasoning` (LLM-synthesized), `honcho_conclude` (create/delete conclusions)
**Architecture:** Two-layer context injection — a base layer (session summary + representation + peer card, refreshed on `contextCadence`) plus a dialectic supplement (LLM reasoning, refreshed on `dialecticCadence`). The dialectic automatically selects cold-start prompts (general user facts) vs. warm prompts (session-scoped context) based on whether base context exists.
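The assembled injection block has roughly the shape below. Section labels and the function name are illustrative, not the exact prompt format; the `max_chars` cap stands in for `dialecticMaxChars`:

```python
# Rough sketch of the two-layer injected context block.
def build_memory_context(summary, representation, card, dialectic=None, max_chars=600):
    parts = []
    if summary:
        parts.append(f"Session summary:\n{summary}")
    if representation:
        parts.append(f"User representation:\n{representation}")
    if card:
        parts.append("Peer card:\n" + "\n".join(f"- {fact}" for fact in card))
    if dialectic:
        parts.append(f"Dialectic notes:\n{dialectic[:max_chars]}")  # cap supplement size
    return "\n\n".join(parts)
```

The base layer (first three parts) is cached between `contextCadence` refreshes; only the dialectic supplement changes on the `dialecticCadence` schedule.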
**Three orthogonal config knobs** control cost and depth independently:
- `contextCadence` — how often the base layer refreshes (API call frequency)
- `dialecticCadence` — how often the dialectic LLM fires (LLM call frequency)
- `dialecticDepth` — how many `.chat()` passes per dialectic invocation (1–3, depth of reasoning)
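The multi-pass loop with its early bail-out can be sketched as below, assuming a `chat(prompt, reasoning_level=...)` callable. The pass prompts and the anchored "strong signal" marker are illustrative:

```python
# Sketch of the multi-pass dialectic with early bail-out on a strong signal.
import re

def signal_sufficient(reply: str) -> bool:
    # Anchored at the start so a passing mention mid-text never triggers the
    # bail-out; the "[confidence: high]" marker is illustrative.
    return re.match(r"\A\[confidence: high\]", reply.strip()) is not None

PASS_PROMPTS = [
    "{q}",                                          # pass 0: cold/warm base query
    "Audit your previous answer for gaps: {q}",     # pass 1: self-audit
    "Reconcile the audit with the answer: {q}",     # pass 2: reconciliation
]

def run_dialectic(chat, query, depth=1, levels=("minimal", "low", "medium")):
    depth = max(1, min(3, depth))                   # clamped 1-3, as in the config table
    answer = ""
    for i in range(depth):
        answer = chat(PASS_PROMPTS[i].format(q=query), reasoning_level=levels[i])
        if i + 1 < depth and signal_sufficient(answer):
            break                                   # strong signal: skip remaining passes
    return answer
```

The per-pass `levels` tuple mirrors the optional `dialecticDepthLevels` override; a confident pass-0 answer saves the remaining LLM calls.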
**Setup Wizard:**
```bash
**Config:** `$HERMES_HOME/honcho.json` (profile-local) or `~/.honcho/config.json` (global). Resolution order: `$HERMES_HOME/honcho.json` > `~/.hermes/honcho.json` > `~/.honcho/config.json`. See the [config reference](https://github.com/hermes-ai/hermes-agent/blob/main/plugins/memory/honcho/README.md) and the [Honcho integration guide](https://docs.honcho.dev/v3/guides/integrations/hermes).
<details>
<summary>Full config reference</summary>
| Key | Default | Description |
|-----|---------|-------------|
| `peerName` | -- | User peer identity |
| `aiPeer` | host key | AI peer identity (one per profile) |
| `workspace` | host key | Shared workspace ID |
| `contextTokens` | `null` (uncapped) | Token budget for auto-injected context per turn. Truncates at word boundaries |
| `contextCadence` | `1` | Minimum turns between `context()` API calls (base layer refresh) |
| `dialecticCadence` | `3` | Minimum turns between `peer.chat()` LLM calls. Only applies to `hybrid`/`context` modes |
| `dialecticDepth` | `1` | Number of `.chat()` passes per dialectic invocation. Clamped 1–3. Pass 0: cold/warm prompt, pass 1: self-audit, pass 2: reconciliation |
| `dialecticDepthLevels` | `null` | Optional array of reasoning levels per pass, e.g. `["minimal", "low", "medium"]`. Overrides proportional defaults |
| `dialecticReasoningLevel` | `'low'` | Base reasoning level: `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | When `true`, model can override reasoning level per-call via tool param |
| `dialecticMaxChars` | `600` | Max chars of dialectic result injected into system prompt |
| `recallMode` | `'hybrid'` | `hybrid` (auto-inject + tools), `context` (inject only), `tools` (tools only) |
| `writeFrequency` | `'async'` | When to flush messages: `async` (background thread), `turn` (sync), `session` (batch on end), or integer N |
| `saveMessages` | `true` | Whether to persist messages to Honcho API |
| `observationMode` | `'directional'` | `directional` (all on) or `unified` (shared pool). Override with `observation` object |
| `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input to `peer.chat()` |
| `sessionStrategy` | `'per-directory'` | `per-directory`, `per-repo`, `per-session`, `global` |
</details>
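The word-boundary truncation that `contextTokens` performs can be approximated as below, assuming the common rough heuristic of 4 characters per token; the plugin's actual tokenizer and cut logic may differ:

```python
# Approximate word-boundary truncation for a contextTokens budget.
def truncate_to_tokens(text: str, max_tokens):
    if max_tokens is None:
        return text                          # null: uncapped
    budget = max_tokens * 4                  # rough chars-per-token estimate
    if len(text) <= budget:
        return text
    cut = text.rfind(" ", 0, budget)         # back up to the last full word
    return text[: cut if cut > 0 else budget].rstrip() + " …"
```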
},
"dialecticReasoningLevel": "low",
"dialecticDynamic": true,
"dialecticCadence": 3,
"dialecticDepth": 1,
"dialecticMaxChars": 600,
"contextCadence": 1,
"messageMaxChars": 25000,
"saveMessages": true
},
| Provider | Storage | Cost | Tools | Dependencies | Unique Feature |
|----------|---------|------|-------|-------------|----------------|
| **Honcho** | Cloud | Paid | 5 | `honcho-ai` | Dialectic user modeling + session-scoped context |
| **OpenViking** | Self-hosted | Free | 5 | `openviking` + server | Filesystem hierarchy + tiered loading |
| **Mem0** | Cloud | Paid | 3 | `mem0ai` | Server-side LLM extraction |
| **Hindsight** | Cloud/Local | Free/Paid | 3 | `hindsight-client` | Knowledge graph + reflect synthesis |