Honcho Memory Provider
AI-native cross-session user modeling with multi-pass dialectic reasoning, session summaries, bidirectional peer tools, and persistent conclusions.
Honcho docs: https://docs.honcho.dev/v3/guides/integrations/hermes
Requirements
- `pip install honcho-ai`
- A Honcho Cloud account (connect via OAuth sign-in or an API key from app.honcho.dev) or a self-hosted instance
Setup
hermes memory setup honcho # configure Honcho directly (works on a fresh install)
hermes memory setup # generic picker, choose Honcho from the list
For cloud, the wizard asks whether to authenticate with OAuth or an API key. OAuth opens a browser sign-in and stores the grant itself, so there is nothing to copy, and tokens refresh automatically. The desktop app offers the same flow as a Connect link next to the memory-provider dropdown.
Or manually:
hermes config set memory.provider honcho
echo "HONCHO_API_KEY=***" >> ~/.hermes/.env
`hermes honcho setup` also works, but only after Honcho is the active memory provider, because the `honcho` subcommand is registered for the active provider only. On a fresh install, use `hermes memory setup honcho`.
Architecture Overview
Two-Layer Context Injection
Context is injected into the user message at API-call time (not the system prompt) to preserve prompt caching. Only a static mode header goes in the system prompt. The injected block is wrapped in <memory-context> fences with a system note clarifying it's background data, not new user input.
Two independent layers, each on its own cadence:
Layer 1 — Base context (refreshed every contextCadence turns):
- SESSION SUMMARY: from `session.context(summary=True)`, placed first
- User Representation: Honcho's evolving model of the user
- User Peer Card: key facts snapshot
- AI Self-Representation: Honcho's model of the AI peer
- AI Identity Card: AI peer facts
Layer 2 — Dialectic supplement (fired every dialecticCadence turns):
Multi-pass .chat() reasoning about the user, appended after base context.
Both layers are joined, then truncated to fit contextTokens budget via _truncate_to_budget (tokens × 4 chars, word-boundary safe).
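The cadence gating and the final truncation can be sketched as follows. This is a minimal illustration rather than the provider's actual code: `is_due`, `truncate_to_budget`, and `build_injection` are hypothetical names, the blank-line joiner is an assumption, and only the 4-characters-per-token estimate and the word-boundary cut come from the description above.

```python
from typing import Optional


def is_due(turn: int, last_turn: Optional[int], cadence: int) -> bool:
    """True when at least `cadence` turns have passed since the last refresh."""
    return last_turn is None or turn - last_turn >= max(1, cadence)


def truncate_to_budget(text: str, context_tokens: int, chars_per_token: int = 4) -> str:
    """Trim text to roughly `context_tokens` tokens, cutting at whitespace."""
    max_chars = context_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Back up to the last whitespace so a word is never split in half.
    for i in range(len(cut) - 1, 0, -1):
        if cut[i].isspace():
            return cut[:i].rstrip()
    return cut  # no whitespace at all: fall back to a hard cut


def build_injection(base: str, dialectic: str, context_tokens: int) -> str:
    """Join the two layers (joiner is an assumption), then apply the budget."""
    parts = [part for part in (base, dialectic) if part]
    return truncate_to_budget("\n\n".join(parts), context_tokens)
```

With `contextCadence: 2`, a base layer refreshed on turn 1 would next be due on turn 3 under this reading of "every N turns".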
Cold Start vs Warm Session Prompts
Dialectic pass 0 automatically selects its prompt based on session state:
- Cold (no base context cached): "Who is this person? What are their preferences, goals, and working style? Focus on facts that would help an AI assistant be immediately useful."
- Warm (base context exists): "Given what's been discussed in this session so far, what context about this user is most relevant to the current conversation? Prioritize active context over biographical facts."
Not configurable — determined automatically.
Dialectic Depth (Multi-Pass Reasoning)
dialecticDepth (1–3, clamped) controls how many .chat() calls fire per dialectic cycle:
| Depth | Passes | Behavior |
|---|---|---|
| 1 | single `.chat()` | Base query only (cold or warm prompt) |
| 2 | audit + synthesis | Pass 0 result is self-audited; pass 1 does targeted synthesis. Conditional bail-out if pass 0 returns strong signal (>300 chars or structured with bullets/sections >100 chars) |
| 3 | audit + synthesis + reconciliation | Pass 2 reconciles contradictions across prior passes into a final synthesis |
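The depth-2 bail-out condition can be read as a small predicate. The sketch below is an assumption-heavy illustration: `is_strong_signal` is a hypothetical name, and the regular expression used to detect "bullets/sections" is a guess at what counts as structure. Only the 300 and 100 character thresholds come from the table.

```python
import re

# Assumed markers of structure: bullet lines, numbered items, markdown headings.
_STRUCTURE = re.compile(r"^\s*(?:[-*•]|\d+\.|#{1,6})\s+", re.MULTILINE)


def is_strong_signal(result: str) -> bool:
    """Return True when pass 0 looks rich enough to skip further passes."""
    text = result.strip()
    if len(text) > 300:
        return True
    return len(text) > 100 and bool(_STRUCTURE.search(text))
```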
Proportional Reasoning Levels
When dialecticDepthLevels is not set, each pass uses a proportional level relative to dialecticReasoningLevel (the "base"):
| Depth | Pass levels |
|---|---|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |
Override with dialecticDepthLevels: an explicit array of reasoning level strings per pass.
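A minimal sketch of how the per-pass levels could be derived from the table above. `pass_levels` is a hypothetical helper, and it assumes an explicit `dialecticDepthLevels` array supplies one entry per pass; how a shorter array is handled is not stated here.

```python
from typing import List, Optional


def pass_levels(depth: int, base: str, overrides: Optional[List[str]] = None) -> List[str]:
    """Reasoning level for each dialectic pass."""
    depth = max(1, min(3, depth))  # dialecticDepth is clamped to 1-3
    if overrides:
        # Explicit dialecticDepthLevels wins over the proportional defaults.
        return list(overrides[:depth])
    proportional = {
        1: [base],
        2: ["minimal", base],
        3: ["minimal", base, "low"],
    }
    return proportional[depth]
```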
Query-Adaptive Reasoning Level
The auto-injected dialectic scales dialecticReasoningLevel by query length: +1 level at ≥120 chars, +2 at ≥400, clamped at reasoningLevelCap (default "high"). Disable with reasoningHeuristic: false to pin every auto call to dialecticReasoningLevel.
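The scaling rule can be illustrated with a short function. This is a sketch under stated assumptions: `adaptive_level` is a hypothetical name, the level ordering is taken from the list of valid reasoning levels, and clamping a base level that already exceeds the cap is an assumption about edge-case behavior.

```python
LEVELS = ["minimal", "low", "medium", "high", "max"]


def adaptive_level(query: str, base: str = "low", cap: str = "high",
                   heuristic: bool = True) -> str:
    """Scale the base reasoning level by query length, clamped at the cap."""
    if not heuristic:
        return base  # reasoningHeuristic: false pins the base level
    length = len(query)
    bump = 2 if length >= 400 else 1 if length >= 120 else 0
    # Assumption: the cap also applies when the base already exceeds it.
    index = min(LEVELS.index(base) + bump, LEVELS.index(cap))
    return LEVELS[index]
```

For example, a 400-character query with a base of "medium" would reach "max" by the +2 rule but is held at "high" by the default cap.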
Three Orthogonal Dialectic Knobs
| Knob | Controls | Type |
|---|---|---|
| `dialecticCadence` | How often: minimum turns between dialectic firings | int |
| `dialecticDepth` | How many: passes per firing (1–3) | int |
| `dialecticReasoningLevel` | How hard: reasoning ceiling per `.chat()` call | string |
Input Sanitization
run_conversation strips leaked <memory-context> blocks from user input before processing. When saveMessages persists a turn that included injected context, the block can reappear in subsequent turns via message history. The sanitizer removes <memory-context> blocks plus associated system notes.
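A rough sketch of such a sanitizer, assuming the block closes with a matching `</memory-context>` tag and that the system note sits inside the fences. `strip_memory_context` is a hypothetical name, and the real sanitizer may match notes in other positions.

```python
import re

# Non-greedy match so several separate blocks are each removed on their own.
_MEMORY_BLOCK = re.compile(r"<memory-context>.*?</memory-context>\s*", re.DOTALL)


def strip_memory_context(text: str) -> str:
    """Remove injected memory blocks that leaked back into user input."""
    return _MEMORY_BLOCK.sub("", text).strip()
```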
Tools
Five bidirectional tools. All accept an optional peer parameter ("user" or "ai", default "user").
| Tool | LLM call? | Description |
|---|---|---|
| `honcho_profile` | No | Peer card: key facts snapshot |
| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
| `honcho_context` | No | Full session context: summary, representation, card, messages |
| `honcho_reasoning` | Yes | LLM-synthesized answer via dialectic .chat() |
| `honcho_conclude` | No | Write a persistent fact/conclusion about the user |
Tool visibility depends on recallMode: hidden in context mode, always present in tools and hybrid.
Config Resolution
Config is read from the first file that exists:
| Priority | Path | Scope |
|---|---|---|
| 1 | `$HERMES_HOME/honcho.json` | Profile-local (isolated Hermes instances) |
| 2 | `~/.hermes/honcho.json` | Default profile (shared host blocks) |
| 3 | `~/.honcho/config.json` | Global (cross-app interop) |
Host key is derived from the active Hermes profile: hermes (default) or hermes_<profile>.
For every key, resolution order is: host block > root > env var > default.
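The lookup described above can be sketched in a few lines. The helper names (`find_config`, `host_key`, `resolve_key`) are hypothetical, and representing the default profile as `None` is an assumption. The file order and the host block > root > env var > default precedence follow the text.

```python
from pathlib import Path
from typing import Any, Mapping, Optional


def find_config(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Return the first config file that exists, in priority order."""
    candidates = []
    if env.get("HERMES_HOME"):
        candidates.append(Path(env["HERMES_HOME"]) / "honcho.json")
    candidates.append(home / ".hermes" / "honcho.json")
    candidates.append(home / ".honcho" / "config.json")
    return next((path for path in candidates if path.is_file()), None)


def host_key(profile: Optional[str] = None) -> str:
    """'hermes' for the default profile, 'hermes_<profile>' otherwise."""
    return f"hermes_{profile}" if profile else "hermes"


def resolve_key(key: str, config: Mapping[str, Any], host: str,
                env: Mapping[str, str], env_var: Optional[str] = None,
                default: Any = None) -> Any:
    """Resolve one key: host block > root > env var > default."""
    host_block = config.get("hosts", {}).get(host, {})
    if key in host_block:
        return host_block[key]
    if key in config:
        return config[key]
    if env_var and env.get(env_var):
        return env[env_var]
    return default
```

In real use `env` would be `os.environ` and `home` would be `Path.home()`; passing them in keeps the sketch easy to test.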
Full Configuration Reference
Identity & Connection
| Key | Type | Default | Description |
|---|---|---|---|
| `apiKey` | string | — | API key. Falls back to HONCHO_API_KEY env var. When connected via OAuth, holds the auto-refreshing access token instead |
| `oauth` | object | — | OAuth grant (refresh token, expiry, client, token endpoint). Written by the Connect/sign-in flows and rotated automatically; not hand-edited. Optional: an API key alone works without it |
| `baseUrl` | string | — | Base URL for self-hosted Honcho. Local URLs auto-skip API key auth |
| `environment` | string | `"production"` | SDK environment mapping |
| `enabled` | bool | auto | Master toggle. Auto-enables when apiKey or baseUrl present |
| `workspace` | string | host key | Honcho workspace ID. Shared environment: all profiles in the same workspace can see the same user identity and related memories |
| `peerName` | string | — | User peer identity |
| `aiPeer` | string | host key | AI peer identity |
Identity Mapping (Gateway Multi-User)
In gateway deployments (Telegram, Discord, Slack, etc.) each user arrives with a platform-native runtime ID (Telegram UID, Discord snowflake, Slack user). These three keys control how those runtime IDs map to Honcho peers. The resolver is config-driven and deterministic — no automatic merging or runtime inference.
| Key | Type | Default | Description |
|---|---|---|---|
| `pinUserPeer` | bool | `false` | When true, every gateway runtime user collapses to peerName. Single-operator deployments where you want all your platforms (and any other users) to share one peer |
| `userPeerAliases` | object | `{}` | Map of runtime IDs to peer IDs ({"7654321": "alice"}). Many-to-one is the intended pattern: alias all your runtime IDs to one peer name. One-to-many is not supported; one runtime ID resolves to exactly one peer |
| `runtimePeerPrefix` | string | `""` | Prepended to unknown runtime IDs to namespace them (e.g. "telegram_" → telegram_7654321). Used only when no alias matches. Prevents collisions between platforms whose runtime IDs share the same shape |
Deprecated: `pinPeerName` is a legacy alias for `pinUserPeer`, still read for back-compat (`pinUserPeer` wins where both are set). `hermes honcho setup` migrates it onto `pinUserPeer` on touch and never writes it.
Resolver ladder (first match wins):
1. pinUserPeer / pinPeerName=true → return peerName (ignore runtime ID)
2. userPeerAliases[runtime_id] → return aliased peer
3. userPeerAliases[runtime_id_alt] → check alt-ID too (Telegram UID + username, etc.)
4. runtimePeerPrefix + runtime_id → namespaced peer, with sha256 collision escalation
5. raw sanitized runtime_id → fallback peer
6. peerName → no runtime ID at all (CLI/TUI)
7. session-key fallback → no config either
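The ladder can be sketched as a single function. This is an illustration only: `resolve_user_peer` and `sanitize` are hypothetical names, the sanitization rule is an assumption, the sha256 collision escalation of step 4 is omitted, and falling through when pinned without a `peerName` is a guess.

```python
import re
from typing import Any, Mapping, Optional


def sanitize(runtime_id: str) -> str:
    # Hypothetical sanitizer: keep letters, digits, "_" and "-".
    return re.sub(r"[^A-Za-z0-9_-]", "_", runtime_id)


def resolve_user_peer(cfg: Mapping[str, Any], runtime_id: Optional[str] = None,
                      runtime_id_alt: Optional[str] = None,
                      session_key: Optional[str] = None) -> Optional[str]:
    peer_name = cfg.get("peerName")
    # Step 1: the pin wins; pinUserPeer takes precedence over legacy pinPeerName.
    pinned = cfg.get("pinUserPeer", cfg.get("pinPeerName", False))
    if pinned and peer_name:
        return peer_name
    if runtime_id:
        aliases = cfg.get("userPeerAliases", {})
        # Steps 2-3: exact alias on the primary ID, then on the alternate ID.
        for candidate in (runtime_id, runtime_id_alt):
            if candidate and candidate in aliases:
                return aliases[candidate]
        # Steps 4-5: namespaced peer, or the raw sanitized ID if the prefix is empty.
        return cfg.get("runtimePeerPrefix", "") + sanitize(runtime_id)
    # Steps 6-7: no runtime ID, so use peerName, then the session key.
    return peer_name or session_key
```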
Why no pinAiPeer? The AI peer is already pinned by construction — aiPeer is the only AI-side identity setting and the resolver never overrides it. Only the user-side peer has the runtime-vs-config tension that pinUserPeer resolves.
Host vs root semantics. All three keys are accepted at both root and hosts.<host> levels. Host-level wins. For maps and prefixes, host-level replaces the root value as a whole (not merge), so a host can intentionally own its identity universe or wipe it with userPeerAliases: {} / runtimePeerPrefix: "".
Setup — gateway identity tree. hermes honcho setup only asks about identity mapping when it detects a connected gateway platform (it inspects the gateway config; off-gateway the step is skipped because these keys do nothing without a runtime user ID). When it runs, it asks who talks to this gateway? and derives the keys:
- just me → `pinUserPeer: true`. Every non-agent gateway user collapses to `peerName`; the pin overrides all aliases, so pick this only when no user-side identity needs its own peer. Personal use where you connect Hermes to your own Telegram/Discord/etc. If separate agents reach the gateway and each needs a distinct peer, do not pin; leave `pinUserPeer: false` and map them via `userPeerAliases` (the `[e]` editor).
- me + other people, pooled → `pinUserPeer: false` + `userPeerAliases` mapping your runtime IDs to `peerName`. You stay on the shared history; everyone else gets their own peer.
- me + other people / only other people → `pinUserPeer: false`, optional `runtimePeerPrefix`. Each runtime user → own peer. For bots serving many humans.
Pick [e] at the prompt to set the three keys directly instead of going through the tree.
Un-pinning (single → per-user). Flipping pinUserPeer from true to false does not migrate data. Memory accumulated under peerName while pinned stays there; runtime users now resolve to fresh, empty peers. To preserve your own continuity, choose the pooled path — alias your runtime IDs back to peerName so your turns keep landing on the pooled history while other users get their own peers. The wizard offers this steer automatically when it detects you're un-pinning a previously pinned profile.
Memory & Recall
| Key | Type | Default | Description |
|---|---|---|---|
| `recallMode` | string | `"hybrid"` | "hybrid" (auto-inject + tools), "context" (auto-inject only, tools hidden), "tools" (tools only, no injection). Legacy "auto" → "hybrid" |
| `observationMode` | string | `"directional"` | Preset: "directional" (all on) or "unified" (user observes self, AI observes others). Use observation object for granular control |
| `observation` | object | — | Per-peer observation config (see Observation section) |
Write Behavior
| Key | Type | Default | Description |
|---|---|---|---|
| `writeFrequency` | string/int | `"async"` | "async" (background), "turn" (sync per turn), "session" (batch on end), or integer N (every N turns) |
| `saveMessages` | bool | `true` | Persist messages to Honcho API |
Session Resolution
| Key | Type | Default | Description |
|---|---|---|---|
| `sessionStrategy` | string | `"per-directory"` | "per-directory", "per-session", "per-repo" (git root), "global" |
| `sessionPeerPrefix` | bool | `false` | Prepend peer name to session keys |
| `sessions` | object | `{}` | Manual directory-to-session-name mappings |
Session Name Resolution
The Honcho session name determines which conversation bucket memory lands in. Resolution follows a priority chain — first match wins:
| Priority | Source | Example session name |
|---|---|---|
| 1 | Manual map (`sessions` config) | `"myproject-main"` |
| 2 | `/title` command (mid-session rename) | `"refactor-auth"` |
| 3 | Gateway session key (Telegram, Discord, etc.) | "agent-main-telegram-dm-8439114563" |
| 4 | `per-session` strategy | Hermes session ID (`20260415_a3f2b1`) |
| 5 | `per-repo` strategy | Git root directory name (`hermes-agent`) |
| 6 | `per-directory` strategy | Current directory basename (`src`) |
| 7 | `global` strategy | Workspace name (`hermes`) |
Gateway platforms always resolve via priority 3 (per-chat isolation) regardless of sessionStrategy. The strategy setting only affects CLI sessions.
If sessionPeerPrefix is true, the peer name is prepended: alice-hermes-agent.
What each strategy produces
- `per-directory`: basename of `$PWD`. Opening hermes in `~/code/myapp` and `~/code/other` gives two separate sessions. Same directory = same session across runs.
- `per-repo`: git root directory name. All subdirectories within a repo share one session. Falls back to `per-directory` if not inside a git repo.
- `per-session`: Hermes session ID (timestamp + hex). Every `hermes` invocation starts a fresh Honcho session. Falls back to `per-directory` if no session ID is available.
- `global`: workspace name. One session for everything. Memory accumulates across all directories and runs.
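The strategy-derived names, including the two documented fallbacks, might be computed along these lines. The sketch covers only the strategy step, not the manual map, `/title`, or gateway keys. `find_git_root` and `strategy_session_name` are hypothetical names, detecting a repo by walking up to a `.git` entry is an assumption, and the `-` separator for the peer prefix is inferred from the `alice-hermes-agent` example.

```python
from pathlib import Path
from typing import Optional


def find_git_root(cwd: Path) -> Optional[Path]:
    """Nearest ancestor (or cwd itself) that contains a .git entry."""
    for candidate in (cwd, *cwd.parents):
        if (candidate / ".git").exists():
            return candidate
    return None


def strategy_session_name(strategy: str, cwd: Path, workspace: str,
                          session_id: Optional[str] = None,
                          peer_name: Optional[str] = None,
                          peer_prefix: bool = False) -> str:
    root = find_git_root(cwd) if strategy == "per-repo" else None
    if strategy == "global":
        name = workspace
    elif strategy == "per-session" and session_id:
        name = session_id
    elif root is not None:
        name = root.name
    else:
        # per-directory, and the fallback for per-repo / per-session.
        name = cwd.name
    if peer_prefix and peer_name:
        name = f"{peer_name}-{name}"
    return name
```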
Multi-Profile Pattern
Multiple Hermes profiles can share one workspace while maintaining separate AI identities. Config resolution is host block > root > env var > default — host blocks inherit from root, so shared settings only need to be declared once:
{
  "apiKey": "***",
  "workspace": "hermes",
  "peerName": "yourname",
  "hosts": {
    "hermes": {
      "aiPeer": "hermes",
      "recallMode": "hybrid",
      "sessionStrategy": "per-directory"
    },
    "hermes_coder": {
      "aiPeer": "coder",
      "recallMode": "tools",
      "sessionStrategy": "per-repo"
    }
  }
}
Both profiles see the same user (yourname) in the same shared environment (hermes), but each AI peer builds its own observations, conclusions, and behavior patterns. The coder's memory stays code-oriented; the main agent's stays broad.
Host key is derived from the active Hermes profile: hermes (default) or hermes_<profile> (e.g. hermes -p coder -> host key hermes_coder). Older hermes.<profile> host blocks are still read for compatibility and are migrated when the CLI writes profile-scoped Honcho config.
Dialectic & Reasoning
| Key | Type | Default | Description |
|---|---|---|---|
| `dialecticDepth` | int | `1` | Passes per dialectic cycle (1–3, clamped). 1=single query, 2=audit+synthesis, 3=audit+synthesis+reconciliation |
| `dialecticDepthLevels` | array | — | Optional array of reasoning level strings per pass. Overrides proportional defaults. Example: ["minimal", "low", "medium"] |
| `dialecticReasoningLevel` | string | `"low"` | Base reasoning level for .chat(): "minimal", "low", "medium", "high", "max" |
| `dialecticDynamic` | bool | `true` | When true, model can override reasoning level per-call via honcho_reasoning tool. When false, always uses dialecticReasoningLevel |
| `dialecticMaxChars` | int | `600` | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | Max chars for dialectic query input to .chat(). Honcho cloud limit: 10k |
| `reasoningHeuristic` | bool | `true` | Query-adaptive: auto-scale the auto-injected dialectic's level up by query length (+1 at ≥120 chars, +2 at ≥400), clamped at reasoningLevelCap. false pins every auto call to dialecticReasoningLevel |
| `reasoningLevelCap` | string | `"high"` | Ceiling for reasoningHeuristic scaling: "minimal", "low", "medium", "high", "max" |
Token Budgets
| Key | Type | Default | Description |
|---|---|---|---|
| `contextTokens` | int | SDK default | Token budget for context() API calls. Also gates prefetch truncation (tokens × 4 chars) |
| `messageMaxChars` | int | `25000` | Max chars per message sent via add_messages(). Exceeding this triggers chunking with [continued] markers. Honcho cloud limit: 25k |
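A sketch of how oversized messages could be split under `messageMaxChars`. The exact placement of the `[continued]` marker is not specified in this README, so prefixing every chunk after the first is an assumption, and `chunk_message` is a hypothetical name.

```python
from typing import List


def chunk_message(content: str, max_chars: int = 25000,
                  marker: str = "[continued] ") -> List[str]:
    """Split content so every chunk, marker included, fits in max_chars."""
    if len(content) <= max_chars:
        return [content]
    step = max_chars - len(marker)  # room left once the marker is added
    if step <= 0:
        raise ValueError("max_chars must be larger than the marker")
    chunks = [content[:max_chars]]
    rest = content[max_chars:]
    for start in range(0, len(rest), step):
        chunks.append(marker + rest[start:start + step])
    return chunks
```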
Cadence (Cost Control)
| Key | Type | Default | Description |
|---|---|---|---|
| `contextCadence` | int | `1` | Minimum turns between base context refreshes (session summary + representation + card) |
| `dialecticCadence` | int | `1` | Minimum turns between dialectic .chat() firings |
| `injectionFrequency` | string | `"every-turn"` | "every-turn" or "first-turn" (inject context on the first user message only, skip from turn 2 onward) |
Observation (Granular)
Maps 1:1 to Honcho's per-peer SessionPeerConfig. When present, overrides observationMode preset.
"observation": {
  "user": { "observeMe": true, "observeOthers": true },
  "ai": { "observeMe": true, "observeOthers": true }
}
| Field | Default | Description |
|---|---|---|
| `user.observeMe` | `true` | User peer self-observation (Honcho builds user representation) |
| `user.observeOthers` | `true` | User peer observes AI messages |
| `ai.observeMe` | `true` | AI peer self-observation (Honcho builds AI representation) |
| `ai.observeOthers` | `true` | AI peer observes user messages (enables cross-peer dialectic) |
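How a preset name and a granular `observation` object combine can be sketched as below, using the two presets this section defines. `resolve_observation` is a hypothetical name, and merging the granular flags field by field over the preset is an assumption; the provider may instead replace the preset wholesale.

```python
import copy
from typing import Any, Dict, Optional

PRESETS: Dict[str, Dict[str, Dict[str, bool]]] = {
    "directional": {
        "user": {"observeMe": True, "observeOthers": True},
        "ai": {"observeMe": True, "observeOthers": True},
    },
    "unified": {
        "user": {"observeMe": True, "observeOthers": False},
        "ai": {"observeMe": False, "observeOthers": True},
    },
}


def resolve_observation(mode: str = "directional",
                        observation: Optional[Dict[str, Any]] = None) -> Dict[str, Dict[str, bool]]:
    """Start from the preset, then let explicit per-peer flags override it."""
    resolved = copy.deepcopy(PRESETS[mode])
    for peer, flags in (observation or {}).items():
        resolved[peer].update(flags)
    return resolved
```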
Presets:
- `"directional"` (default): all four `true`
- `"unified"`: user `observeMe=true`, AI `observeOthers=true`, rest `false`
Hardcoded Limits
| Limit | Value |
|---|---|
| Search tool max tokens | 2000 (hard cap), 800 (default) |
| Peer card fetch tokens | 200 |
Environment Variables
| Variable | Fallback for |
|---|---|
| `HONCHO_API_KEY` | `apiKey` |
| `HONCHO_BASE_URL` | `baseUrl` |
| `HONCHO_ENVIRONMENT` | `environment` |
| `HERMES_HONCHO_HOST` | Host key override |
| `HONCHO_OAUTH_DASHBOARD` | OAuth authorize origin (default: cloud dashboard; local-dev localhost:3000) |
| `HONCHO_OAUTH_AUTHORIZE_URL` | Full authorize URL (overrides the dashboard origin) |
| `HONCHO_OAUTH_TOKEN_URL` | Token endpoint (default: cloud API; local-dev localhost:8000) |
| `HONCHO_OAUTH_CLIENT_ID` | OAuth client (default hermes-agent) |
| `HONCHO_OAUTH_SCOPE` | Requested scope (default write) |
CLI Commands
| Command | Description |
|---|---|
| `hermes memory setup honcho` | Configure Honcho directly; works on a fresh install |
| `hermes honcho setup` | Interactive setup wizard (only registered once Honcho is the active provider; redirects to hermes memory setup) |
| `hermes honcho status` | Show resolved config for active profile |
| `hermes honcho enable` / `disable` | Toggle Honcho for active profile |
| `hermes honcho mode <mode>` | Change recall or observation mode |
| `hermes honcho peer --user <name>` | Update user peer name |
| `hermes honcho peer --ai <name>` | Update AI peer name |
| `hermes honcho tokens --context <N>` | Set context token budget |
| `hermes honcho tokens --dialectic <N>` | Set dialectic max chars |
| `hermes honcho map <name>` | Map current directory to a session name |
| `hermes honcho sync` | Create host blocks for all Hermes profiles |
Example Config
{
  "apiKey": "***",
  "workspace": "hermes",
  "peerName": "username",
  "contextCadence": 2,
  "dialecticCadence": 3,
  "dialecticDepth": 2,
  "hosts": {
    "hermes": {
      "enabled": true,
      "aiPeer": "hermes",
      "recallMode": "hybrid",
      "observation": {
        "user": { "observeMe": true, "observeOthers": true },
        "ai": { "observeMe": true, "observeOthers": true }
      },
      "writeFrequency": "async",
      "sessionStrategy": "per-directory",
      "dialecticReasoningLevel": "low",
      "dialecticDepth": 2,
      "dialecticMaxChars": 600,
      "saveMessages": true
    },
    "hermes_coder": {
      "enabled": true,
      "aiPeer": "coder",
      "sessionStrategy": "per-repo",
      "dialecticDepth": 1,
      "dialecticDepthLevels": ["low"],
      "observation": {
        "user": { "observeMe": true, "observeOthers": false },
        "ai": { "observeMe": true, "observeOthers": true }
      }
    }
  },
  "sessions": {
    "/home/user/myproject": "myproject-main"
  }
}