feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation

Context Injection Overhaul:
- Base layer: peer.context() (representation + card) cached with 5-minute TTL
- Dialectic supplement: cadence-gated, cached until next refresh
- Trivial prompt skip: short inputs/slash commands skip injection
- New peer guard: dialectic skipped at session start when peer has no context
- Targeted warm prompt for better dialectic quality
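
The flow this adds up to is roughly the following. This is a minimal sketch only: the `ContextInjector` class, the TTL constant, the trivial-prompt threshold, and the warm-prompt wording are illustrative, not the actual implementation; only `peer.context()`, `peer.chat()`, and the gating rules come from this changelog.

```python
import time

CONTEXT_TTL_SECONDS = 300       # base layer cached for 5 minutes
TRIVIAL_MAX_CHARS = 20          # assumed cutoff for the "short input" skip


class ContextInjector:
    def __init__(self, peer, dialectic_cadence: int = 3):
        self.peer = peer
        self.dialectic_cadence = dialectic_cadence
        self._base = None                # representation + card
        self._base_fetched_at = 0.0
        self._dialectic = None           # cached until next refresh
        self._turn = 0

    def build_injection(self, prompt: str) -> str | None:
        self._turn += 1
        # Trivial prompt skip: short inputs and slash commands get no injection.
        if len(prompt) < TRIVIAL_MAX_CHARS or prompt.startswith("/"):
            return None

        # Base layer: peer.context() cached with a TTL.
        now = time.time()
        if self._base is None or now - self._base_fetched_at > CONTEXT_TTL_SECONDS:
            self._base = self.peer.context()
            self._base_fetched_at = now

        # New peer guard + cadence gate: only refresh the dialectic supplement
        # when the peer already has context and the cadence allows it.
        if self._base and self._turn % self.dialectic_cadence == 0:
            self._dialectic = self.peer.chat(
                "What about this user is most relevant to their current request?"
            )

        parts = [str(self._base)]
        if self._dialectic:
            parts.append(str(self._dialectic))
        return "\n\n".join(parts)
```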

Tool Surface (5 bidirectional tools):
- honcho_profile: read or update peer card
- honcho_search: semantic search over context
- honcho_context: full session context (summary, representation, card, messages)
- honcho_reasoning: synthesized answer, reasoning_level param
- honcho_conclude: create or delete conclusions (PII removal)
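
As a sketch, the five tools map onto a name-to-handler dispatch table like the one below. The handler bodies and the `set_card` / `get_card` / `search` / `get_context` accessor names are assumptions; only the tool names, the `reasoning_level` param, and their read/write roles come from this list.

```python
def make_honcho_tools(peer, session):
    """Sketch of the 5-tool dispatch surface; accessor names are assumed."""

    def honcho_profile(action: str, card: str | None = None):
        # Read or update the peer card.
        if action == "update" and card is not None:
            return peer.set_card(card)            # assumed setter name
        return peer.get_card()                    # assumed getter name

    def honcho_search(query: str):
        # Semantic search over stored context.
        return session.search(query)              # assumed search entry point

    def honcho_context():
        # Full session context: summary, representation, card, messages.
        return session.get_context()              # assumed accessor name

    def honcho_reasoning(query: str, reasoning_level: str = "low"):
        # Synthesized answer from the dialectic endpoint.
        return peer.chat(query, reasoning=reasoning_level)   # assumed kwarg name

    def honcho_conclude(action: str, conclusion: str):
        # Create or delete conclusions (e.g. PII removal); left abstract here.
        raise NotImplementedError

    return {
        "honcho_profile": honcho_profile,
        "honcho_search": honcho_search,
        "honcho_context": honcho_context,
        "honcho_reasoning": honcho_reasoning,
        "honcho_conclude": honcho_conclude,
    }
```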

Cost Safety:
- dialectic_cadence defaults to 3 (~66% fewer LLM calls)
- context_tokens defaults to uncapped (cap opt-in via config/wizard)
- on_turn_start hook wired up (fixes broken cadence/injection gating)
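
A sketch of the gating the `on_turn_start` hook now enforces; the `state` and `config` objects are illustrative, while the hook name, the cadence keys, and the default of 3 come from this changelog.

```python
def on_turn_start(state, config):
    """Once per turn, decide which Honcho calls this turn may make."""
    state.turn += 1

    # dialectic_cadence defaults to 3, so only every third turn pays for peer.chat().
    dialectic_cadence = config.get("dialecticCadence", 3)
    state.allow_dialectic = state.turn % dialectic_cadence == 0

    # Context fetches are gated the same way via contextCadence.
    context_cadence = config.get("contextCadence", 1)
    state.allow_context_fetch = state.turn % context_cadence == 0
```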

Correctness:
- Explicit target= on peer context/card fetches (fixes identity blur)
- honcho_search perspective fix under directional observation
- Timeout config plumbing
- peerName precedence over gateway user_id
- skip_memory on temp agents (orphan session prevention)
- gateway_session_key for stable per-chat session continuity
- initOnSessionStart for eager tools-mode init
- get_session_context fallback respects peer param
- mid -> medium in reasoning level validation
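
Several of these fixes boil down to an explicit identity/session resolution order. A sketch under assumed helper names; only the precedence rules and the parameter names come from the list above.

```python
def resolve_identity(peer_name: str | None,
                     gateway_user_id: str,
                     gateway_session_key: str | None,
                     chat_id: str) -> tuple[str, str]:
    # peerName takes precedence over the gateway user_id.
    peer = peer_name or gateway_user_id
    # gateway_session_key gives stable per-chat session continuity;
    # fall back to a per-chat key when none is supplied.
    session = gateway_session_key or f"gateway:{chat_id}"
    return peer, session
```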

ABC changes (minimal, honcho-only):
- run_agent.py: gateway_session_key param + memory provider wiring (+5 lines)
- gateway/run.py: skip_memory on 2 temp agents, gateway_session_key on main agent (+3 lines)
- agent/memory_manager.py: sanitize regex for context tag variants (+9 lines)
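
The memory_manager change is a small sanitize pass over outgoing text. A sketch: the tag names matched here are assumptions; only "regex over context tag variants" is from the diff.

```python
import re

# Strip context-injection tag variants before messages are persisted.
CONTEXT_TAG_RE = re.compile(
    r"</?\s*(?:honcho[-_]?context|memory[-_]?context)\s*>",
    re.IGNORECASE,
)

def sanitize(text: str) -> str:
    return CONTEXT_TAG_RE.sub("", text).strip()
```
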
Erosika 2026-04-14 18:07:19 -04:00
parent 95d11dfd8e
commit 11b4c9ecf9
16 changed files with 1283 additions and 331 deletions

@@ -125,7 +125,7 @@ Settings changed in the [Honcho dashboard](https://app.honcho.dev) are synced ba
|-----|------|---------|-------|-------------|
| `contextTokens` | int | SDK default | root / host | Token budget for `context()` API calls. Also gates prefetch truncation (tokens x 4 chars) |
| `dialecticReasoningLevel` | string | `"low"` | root / host | Base reasoning level for `peer.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
- | `dialecticDynamic` | bool | `true` | root / host | Auto-bump reasoning based on query length: `<120` chars = base level, `120-400` = +1, `>400` = +2 (capped at `"high"`). Set `false` to always use `dialecticReasoningLevel` as-is |
+ | `dialecticDynamic` | bool | `true` | root / host | When `true`, the model can override the reasoning level per call via the `honcho_reasoning` tool's `reasoning_level` param (agentic). When `false`, always uses `dialecticReasoningLevel` and ignores model overrides |
| `dialecticMaxChars` | int | `600` | root / host | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | root / host | Max chars for dialectic query input to `peer.chat()`. Honcho cloud limit: 10k |
| `messageMaxChars` | int | `25000` | root / host | Max chars per message sent via `add_messages()`. Messages exceeding this are chunked with `[continued]` markers. Honcho cloud limit: 25k |
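
The `messageMaxChars` row above implies splitting logic along these lines. This is a sketch: whether the `[continued]` marker leads or trails each chunk, and the helper name, are assumptions; the 25k limit and the marker text come from the table.

```python
MESSAGE_MAX_CHARS = 25_000          # Honcho cloud per-message limit
MARKER = "[continued] "

def chunk_message(text: str, limit: int = MESSAGE_MAX_CHARS) -> list[str]:
    """Split an oversized message into <= limit chunks for add_messages()."""
    if len(text) <= limit:
        return [text]
    chunks = [text[:limit]]
    rest = text[limit:]
    body = limit - len(MARKER)       # leave room for the continuation marker
    while rest:
        chunks.append(MARKER + rest[:body])
        rest = rest[body:]
    return chunks
```
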
@@ -139,7 +139,7 @@ These are read from the root config object, not the host block. Must be set manu
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context only on turn 0) |
| `contextCadence` | int | `1` | Minimum turns between `context()` API calls |
| `dialecticCadence` | int | `1` | Minimum turns between `peer.chat()` API calls |
- | `reasoningLevelCap` | string | -- | Hard cap on auto-bumped reasoning: `"minimal"`, `"low"`, `"mid"`, `"high"` |
+ | `reasoningLevelCap` | string | -- | Hard cap on reasoning level: `"minimal"`, `"low"`, `"medium"`, `"high"` |
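
The `reasoningLevelCap` row amounts to a clamp over the documented level ordering. A sketch; the helper itself is illustrative, the level names are the documented values.

```python
LEVELS = ["minimal", "low", "medium", "high", "max"]

def apply_reasoning_cap(requested: str, cap: str | None) -> str:
    """Clamp a requested reasoning level to reasoningLevelCap when one is set."""
    if cap is None or cap not in LEVELS or requested not in LEVELS:
        return requested
    return requested if LEVELS.index(requested) <= LEVELS.index(cap) else cap
```
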
### Hardcoded Limits (Not Configurable)