# Honcho Memory Provider

AI-native cross-session user modeling with multi-pass dialectic reasoning, session summaries, bidirectional peer tools, and persistent conclusions.

Honcho docs: https://docs.honcho.dev/v3/guides/integrations/hermes
## Requirements

- `pip install honcho-ai`
- A Honcho API key from app.honcho.dev, or a self-hosted instance
## Setup

```bash
hermes honcho setup   # full interactive wizard (cloud or local)
hermes memory setup   # generic picker, also works
```

Or manually:

```bash
hermes config set memory.provider honcho
echo "HONCHO_API_KEY=***" >> ~/.hermes/.env
```
## Architecture Overview

### Two-Layer Context Injection

Context is injected into the user message at API-call time (not the system prompt) to preserve prompt caching. Only a static mode header goes in the system prompt. The injected block is wrapped in `<memory-context>` fences with a system note clarifying that it is background data, not new user input.
Two independent layers, each on its own cadence:

**Layer 1 — Base context** (refreshed every `contextCadence` turns):

- **SESSION SUMMARY** — from `session.context(summary=True)`, placed first
- **User Representation** — Honcho's evolving model of the user
- **User Peer Card** — key facts snapshot
- **AI Self-Representation** — Honcho's model of the AI peer
- **AI Identity Card** — AI peer facts

**Layer 2 — Dialectic supplement** (fired every `dialecticCadence` turns):

Multi-pass `.chat()` reasoning about the user, appended after the base context.
Both layers are joined, then truncated to fit the `contextTokens` budget via `_truncate_to_budget` (tokens × 4 chars, word-boundary safe).
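The truncation step can be sketched as follows. This is an illustrative reimplementation of the `_truncate_to_budget` behavior described above, not the plugin's actual source:

```python
def truncate_to_budget(text: str, budget_tokens: int) -> str:
    """Trim text to roughly budget_tokens using the 4-chars-per-token
    heuristic, cutting at a word boundary so no word is split mid-way."""
    max_chars = budget_tokens * 4
    if len(text) <= max_chars:
        return text
    cut = text.rfind(" ", 0, max_chars)  # last word boundary within budget
    if cut == -1:                        # no space found: hard cut
        cut = max_chars
    return text[:cut].rstrip()
```

The 4-chars-per-token heuristic overestimates slightly for English prose, which errs on the side of injecting less context rather than blowing the budget.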
### Cold Start vs Warm Session Prompts

Dialectic pass 0 automatically selects its prompt based on session state:
- Cold (no base context cached): "Who is this person? What are their preferences, goals, and working style? Focus on facts that would help an AI assistant be immediately useful."
- Warm (base context exists): "Given what's been discussed in this session so far, what context about this user is most relevant to the current conversation? Prioritize active context over biographical facts."
Not configurable — determined automatically.
### Dialectic Depth (Multi-Pass Reasoning)

`dialecticDepth` (1–3, clamped) controls how many `.chat()` calls fire per dialectic cycle:

| Depth | Passes | Behavior |
|---|---|---|
| 1 | single `.chat()` | Base query only (cold or warm prompt) |
| 2 | audit + synthesis | Pass 0 result is self-audited; pass 1 does targeted synthesis. Bails out early if pass 0 returns a strong signal (>300 chars, or structured with bullets/sections and >100 chars) |
| 3 | audit + synthesis + reconciliation | Pass 2 reconciles contradictions across prior passes into a final synthesis |
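The pass loop can be sketched like this. Function names, prompt wording, and the exact structure heuristic are assumptions for illustration, not the actual implementation:

```python
def strong_signal(result: str) -> bool:
    """Bail-out check (assumed heuristic): long answers, or structured
    ones (bullets/sections) over 100 chars, count as a strong signal."""
    if len(result) > 300:
        return True
    structured = ("- " in result) or ("#" in result)
    return structured and len(result) > 100

def run_dialectic(chat, base_query: str, depth: int, levels: list[str]) -> str:
    depth = max(1, min(depth, 3))                 # clamp to 1-3
    result = chat(base_query, level=levels[0])    # pass 0: base query
    if depth == 1 or (depth == 2 and strong_signal(result)):
        return result                             # depth-2 bail-out
    audit = chat(f"Audit this answer for gaps:\n{result}", level=levels[1])
    if depth == 2:
        return audit                              # pass 1: targeted synthesis
    # pass 2: reconcile contradictions across prior passes
    return chat(f"Reconcile contradictions:\n{result}\n{audit}", level=levels[2])
```

The bail-out only applies at depth 2; depth 3 always runs all three passes.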
### Proportional Reasoning Levels

When `dialecticDepthLevels` is not set, each pass uses a level proportional to `dialecticReasoningLevel` (the "base"):

| Depth | Pass levels |
|---|---|
| 1 | `[base]` |
| 2 | `[minimal, base]` |
| 3 | `[minimal, base, low]` |

Override with `dialecticDepthLevels`: an explicit array of reasoning level strings, one per pass.
### Three Orthogonal Dialectic Knobs

| Knob | Controls | Type |
|---|---|---|
| `dialecticCadence` | How often — minimum turns between dialectic firings | int |
| `dialecticDepth` | How many — passes per firing (1–3) | int |
| `dialecticReasoningLevel` | How hard — reasoning ceiling per `.chat()` call | string |
### Input Sanitization

`run_conversation` strips leaked `<memory-context>` blocks from user input before processing. When `saveMessages` persists a turn that included injected context, the block can reappear in subsequent turns via message history. The sanitizer removes `<memory-context>` blocks plus their associated system notes.
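A minimal sketch of such a sanitizer. The real plugin's regex and its handling of the accompanying system note may differ:

```python
import re

# Strip injected <memory-context> blocks (plus trailing whitespace)
# that leaked back into user input via persisted message history.
MEMORY_BLOCK = re.compile(r"<memory-context>.*?</memory-context>\s*", re.DOTALL)

def sanitize_input(text: str) -> str:
    return MEMORY_BLOCK.sub("", text).strip()
```

The non-greedy `.*?` matters: with a greedy match, two blocks in one message would be merged into a single match that also swallows the genuine user text between them.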
## Tools

Five bidirectional tools. All accept an optional `peer` parameter (`"user"` or `"ai"`, default `"user"`).

| Tool | LLM call? | Description |
|---|---|---|
| `honcho_profile` | No | Peer card — key facts snapshot |
| `honcho_search` | No | Semantic search over stored context (800 tokens default, 2000 max) |
| `honcho_context` | No | Full session context: summary, representation, card, messages |
| `honcho_reasoning` | Yes | LLM-synthesized answer via dialectic `.chat()` |
| `honcho_conclude` | No | Write a persistent fact/conclusion about the user |

Tool visibility depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
## Config Resolution

Config is read from the first file that exists:

| Priority | Path | Scope |
|---|---|---|
| 1 | `$HERMES_HOME/honcho.json` | Profile-local (isolated Hermes instances) |
| 2 | `~/.hermes/honcho.json` | Default profile (shared host blocks) |
| 3 | `~/.honcho/config.json` | Global (cross-app interop) |

The host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>`.

For every key, resolution order is: host block > root > env var > default.
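The per-key lookup order can be sketched as below. This is an illustration of the resolution chain, not the plugin's code; the `env_map` (config key → env var name) is an assumed shape:

```python
import os

def resolve(key: str, config: dict, host: str, env_map: dict, defaults: dict):
    """Resolve one config key: host block > root > env var > default."""
    host_block = config.get("hosts", {}).get(host, {})
    if key in host_block:
        return host_block[key]
    if key in config:
        return config[key]
    env_var = env_map.get(key)
    if env_var and os.environ.get(env_var):
        return os.environ[env_var]
    return defaults.get(key)
```

Because host blocks sit at the top of the chain, a profile-specific setting always wins over a shared root setting for the same key.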
## Full Configuration Reference

### Identity & Connection

| Key | Type | Default | Description |
|---|---|---|---|
| `apiKey` | string | — | API key. Falls back to the `HONCHO_API_KEY` env var |
| `baseUrl` | string | — | Base URL for self-hosted Honcho. Local URLs auto-skip API key auth |
| `environment` | string | `"production"` | SDK environment mapping |
| `enabled` | bool | auto | Master toggle. Auto-enables when `apiKey` or `baseUrl` is present |
| `workspace` | string | host key | Honcho workspace ID. Shared environment — all profiles in the same workspace see the same user identity and related memories |
| `peerName` | string | — | User peer identity |
| `aiPeer` | string | host key | AI peer identity |
### Memory & Recall

| Key | Type | Default | Description |
|---|---|---|---|
| `recallMode` | string | `"hybrid"` | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` → `"hybrid"` |
| `observationMode` | string | `"directional"` | Preset: `"directional"` (all on) or `"unified"` (shared pool). Use the `observation` object for granular control |
| `observation` | object | — | Per-peer observation config (see Observation section) |
### Write Behavior

| Key | Type | Default | Description |
|---|---|---|---|
| `writeFrequency` | string/int | `"async"` | `"async"` (background), `"turn"` (sync per turn), `"session"` (batch on end), or an integer N (every N turns) |
| `saveMessages` | bool | `true` | Persist messages to the Honcho API |
### Session Resolution

| Key | Type | Default | Description |
|---|---|---|---|
| `sessionStrategy` | string | `"per-directory"` | `"per-directory"`, `"per-session"`, `"per-repo"` (git root), or `"global"` |
| `sessionPeerPrefix` | bool | `false` | Prepend the peer name to session keys |
| `sessions` | object | `{}` | Manual directory-to-session-name mappings |
### Session Name Resolution

The Honcho session name determines which conversation bucket memory lands in. Resolution follows a priority chain — first match wins:

| Priority | Source | Example session name |
|---|---|---|
| 1 | Manual map (`sessions` config) | `myproject-main` |
| 2 | `/title` command (mid-session rename) | `refactor-auth` |
| 3 | Gateway session key (Telegram, Discord, etc.) | `agent-main-telegram-dm-8439114563` |
| 4 | `per-session` strategy | Hermes session ID (`20260415_a3f2b1`) |
| 5 | `per-repo` strategy | Git root directory name (`hermes-agent`) |
| 6 | `per-directory` strategy | Current directory basename (`src`) |
| 7 | `global` strategy | Workspace name (`hermes`) |

Gateway platforms always resolve via priority 3 (per-chat isolation) regardless of `sessionStrategy`. The strategy setting only affects CLI sessions.

If `sessionPeerPrefix` is `true`, the peer name is prepended: `eri-hermes-agent`.
#### What each strategy produces

- `per-directory` — basename of `$PWD`. Opening hermes in `~/code/myapp` and `~/code/other` gives two separate sessions. Same directory = same session across runs.
- `per-repo` — git root directory name. All subdirectories within a repo share one session. Falls back to `per-directory` if not inside a git repo.
- `per-session` — Hermes session ID (timestamp + hex). Every `hermes` invocation starts a fresh Honcho session. Falls back to `per-directory` if no session ID is available.
- `global` — workspace name. One session for everything. Memory accumulates across all directories and runs.
## Multi-Profile Pattern

Multiple Hermes profiles can share one workspace while maintaining separate AI identities. Config resolution is host block > root > env var > default — host blocks inherit from the root, so shared settings only need to be declared once:
```json
{
  "apiKey": "***",
  "workspace": "hermes",
  "peerName": "yourname",
  "hosts": {
    "hermes": {
      "aiPeer": "hermes",
      "recallMode": "hybrid",
      "sessionStrategy": "per-directory"
    },
    "hermes.coder": {
      "aiPeer": "coder",
      "recallMode": "tools",
      "sessionStrategy": "per-repo"
    }
  }
}
```
Both profiles see the same user (`yourname`) in the same shared environment (`hermes`), but each AI peer builds its own observations, conclusions, and behavior patterns. The coder's memory stays code-oriented; the main agent's stays broad.

The host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>` (e.g. `hermes -p coder` → host key `hermes.coder`).
### Dialectic & Reasoning

| Key | Type | Default | Description |
|---|---|---|---|
| `dialecticDepth` | int | `1` | Passes per dialectic cycle (1–3, clamped). 1 = single query, 2 = audit + synthesis, 3 = audit + synthesis + reconciliation |
| `dialecticDepthLevels` | array | — | Optional array of reasoning level strings, one per pass. Overrides the proportional defaults. Example: `["minimal", "low", "medium"]` |
| `dialecticReasoningLevel` | string | `"low"` | Base reasoning level for `.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
| `dialecticDynamic` | bool | `true` | When `true`, the model can override the reasoning level per call via the `honcho_reasoning` tool. When `false`, always uses `dialecticReasoningLevel` |
| `dialecticMaxChars` | int | `600` | Max chars of the dialectic result injected into the context block |
| `dialecticMaxInputChars` | int | `10000` | Max chars for dialectic query input to `.chat()`. Honcho cloud limit: 10k |
### Token Budgets

| Key | Type | Default | Description |
|---|---|---|---|
| `contextTokens` | int | SDK default | Token budget for `context()` API calls. Also gates prefetch truncation (tokens × 4 chars) |
| `messageMaxChars` | int | `25000` | Max chars per message sent via `add_messages()`. Exceeding this triggers chunking with `[continued]` markers. Honcho cloud limit: 25k |
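The chunking behavior can be sketched as follows; the exact split points and marker placement in the real plugin may differ:

```python
def chunk_message(content: str, max_chars: int = 25_000) -> list[str]:
    """Split an oversized message into pieces that each fit max_chars,
    appending a [continued] marker to every piece except the last."""
    marker = " [continued]"
    if len(content) <= max_chars:
        return [content]
    step = max_chars - len(marker)  # leave room for the marker
    chunks = []
    for i in range(0, len(content), step):
        piece = content[i:i + step]
        if i + step < len(content):
            piece += marker
        chunks.append(piece)
    return chunks
```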
### Cadence (Cost Control)

| Key | Type | Default | Description |
|---|---|---|---|
| `contextCadence` | int | `1` | Minimum turns between base context refreshes (session summary + representation + card) |
| `dialecticCadence` | int | `1` | Minimum turns between dialectic `.chat()` firings |
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context on the first user message only, skip from turn 2 onward) |
| `reasoningLevelCap` | string | — | Hard cap on reasoning level: `"minimal"`, `"low"`, `"medium"`, or `"high"` |
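For example, a cost-conscious setup might refresh base context every third turn and fire dialectic every fifth, capped at low reasoning (values are illustrative):

```json
{
  "contextCadence": 3,
  "dialecticCadence": 5,
  "injectionFrequency": "every-turn",
  "reasoningLevelCap": "low"
}
```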
### Observation (Granular)

Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. When present, overrides the `observationMode` preset.

```json
"observation": {
  "user": { "observeMe": true, "observeOthers": true },
  "ai": { "observeMe": true, "observeOthers": true }
}
```
| Field | Default | Description |
|---|---|---|
| `user.observeMe` | `true` | User peer self-observation (Honcho builds the user representation) |
| `user.observeOthers` | `true` | User peer observes AI messages |
| `ai.observeMe` | `true` | AI peer self-observation (Honcho builds the AI representation) |
| `ai.observeOthers` | `true` | AI peer observes user messages (enables cross-peer dialectic) |

Presets:

- `"directional"` (default): all four `true`
- `"unified"`: user `observeMe=true`, AI `observeOthers=true`, rest `false`
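For reference, the `"unified"` preset expressed as an explicit `observation` object:

```json
"observation": {
  "user": { "observeMe": true, "observeOthers": false },
  "ai": { "observeMe": false, "observeOthers": true }
}
```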
## Hardcoded Limits
| Limit | Value |
|---|---|
| Search tool max tokens | 2000 (hard cap), 800 (default) |
| Peer card fetch tokens | 200 |
## Environment Variables

| Variable | Fallback for |
|---|---|
| `HONCHO_API_KEY` | `apiKey` |
| `HONCHO_BASE_URL` | `baseUrl` |
| `HONCHO_ENVIRONMENT` | `environment` |
| `HERMES_HONCHO_HOST` | Host key override |
## CLI Commands

| Command | Description |
|---|---|
| `hermes honcho setup` | Full interactive setup wizard |
| `hermes honcho status` | Show resolved config for the active profile |
| `hermes honcho enable` / `disable` | Toggle Honcho for the active profile |
| `hermes honcho mode <mode>` | Change recall or observation mode |
| `hermes honcho peer --user <name>` | Update the user peer name |
| `hermes honcho peer --ai <name>` | Update the AI peer name |
| `hermes honcho tokens --context <N>` | Set the context token budget |
| `hermes honcho tokens --dialectic <N>` | Set dialectic max chars |
| `hermes honcho map <name>` | Map the current directory to a session name |
| `hermes honcho sync` | Create host blocks for all Hermes profiles |
## Example Config

```json
{
  "apiKey": "***",
  "workspace": "hermes",
  "peerName": "username",
  "contextCadence": 2,
  "dialecticCadence": 3,
  "dialecticDepth": 2,
  "hosts": {
    "hermes": {
      "enabled": true,
      "aiPeer": "hermes",
      "recallMode": "hybrid",
      "observation": {
        "user": { "observeMe": true, "observeOthers": true },
        "ai": { "observeMe": true, "observeOthers": true }
      },
      "writeFrequency": "async",
      "sessionStrategy": "per-directory",
      "dialecticReasoningLevel": "low",
      "dialecticDepth": 2,
      "dialecticMaxChars": 600,
      "saveMessages": true
    },
    "hermes.coder": {
      "enabled": true,
      "aiPeer": "coder",
      "sessionStrategy": "per-repo",
      "dialecticDepth": 1,
      "dialecticDepthLevels": ["low"],
      "observation": {
        "user": { "observeMe": true, "observeOthers": false },
        "ai": { "observeMe": true, "observeOthers": true }
      }
    }
  },
  "sessions": {
    "/home/user/myproject": "myproject-main"
  }
}
```