hermes-agent/plugins/memory/honcho
Teknium c1eb2dcda7
feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback

Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.

# What this PR makes true

1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
   detection banner with copy-pasteable remediation steps the moment
   they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
   a fresh install to 'core only' — the installer keeps every other
   extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
   lazy-install on first use under a strict allowlist, instead of
   eagerly pulling everything at install time.

# Detection: hermes_cli/security_advisories.py

- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
  mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
  dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
  the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
  re-banner after ack.
- Wired into:
  * hermes doctor — runs first, prints full remediation block
  * hermes doctor --ack <id> — dismisses an advisory
  * cli.py interactive run() and single-query branches — short
    stderr banner pointing at hermes doctor
  * gateway/run.py startup — operator-visible warning in gateway.log
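
A minimal sketch of the detection idea described above, not the shipped module. The `Advisory` field names and the injectable `get_version` parameter are assumptions made for illustration; only `ADVISORIES`, `detect_compromised()`, and the use of `importlib.metadata.version()` come from the description.

```python
from dataclasses import dataclass
from importlib import metadata


@dataclass(frozen=True)
class Advisory:
    # Hypothetical shape: the real dataclass fields are not shown here.
    advisory_id: str
    package: str
    bad_versions: frozenset


ADVISORIES = (
    Advisory("shai-hulud-2026-05", "mistralai", frozenset({"2.4.6"})),
)


def detect_compromised(advisories=ADVISORIES, get_version=metadata.version):
    """Return the advisories whose package is installed at a known-bad version."""
    hits = []
    for adv in advisories:
        try:
            installed = get_version(adv.package)
        except metadata.PackageNotFoundError:
            continue  # package not installed, nothing to report
        if installed in adv.bad_versions:
            hits.append(adv)
    return hits
```

Because the lookup goes through package metadata rather than pip, the same check would also work in a uv-managed venv that has no pip installed.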

# Lazy-install framework: tools/lazy_deps.py

- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
  memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
  uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
  pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
  HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
  * tools/tts_tool.py — _import_elevenlabs() calls ensure first
  * plugins/memory/honcho/client.py — get_honcho_client lazy-installs
  * tts.mistral / stt.mistral entries pre-registered for when PyPI
    restores mistralai
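
A rough sketch of the allowlist, spec-safety, and gating checks, under stated assumptions: the regex below is an illustrative approximation rather than the shipped one, the `honcho-ai==0.0.0` pin is a placeholder, the helper names are hypothetical, and treating any non-empty `HERMES_DISABLE_LAZY_INSTALLS` value as "disabled" is a guess.

```python
import os
import re

# Placeholder allowlist entry; the version shown is not the real pin.
LAZY_DEPS = {
    "memory.honcho": ["honcho-ai==0.0.0"],
}

# Accept only "name", "name[extras]" and one optional version clause.
_SAFE_SPEC = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._-]*"                  # project name
    r"(\[[A-Za-z0-9._,-]+\])?"                     # optional extras
    r"((==|>=|<=|~=|!=|<|>)[A-Za-z0-9.*+!_-]+)?"   # optional version clause
)


def is_safe_spec(spec: str) -> bool:
    """Reject URLs, paths, pip flags, shell metacharacters and control chars."""
    return _SAFE_SPEC.fullmatch(spec) is not None


def lazy_installs_allowed(config: dict, environ=os.environ) -> bool:
    """Env var wins over config; config defaults to allowing installs."""
    if environ.get("HERMES_DISABLE_LAZY_INSTALLS"):
        return False
    return bool(config.get("security", {}).get("allow_lazy_installs", True))


def specs_for(feature: str) -> list:
    """Look up a feature in the allowlist and validate every spec."""
    if feature not in LAZY_DEPS:
        raise KeyError(f"unknown lazy-install feature: {feature}")
    specs = LAZY_DEPS[feature]
    bad = [s for s in specs if not is_safe_spec(s)]
    if bad:
        raise ValueError(f"unsafe pip spec(s): {bad}")
    return specs
```

Using `fullmatch` means anything outside the accepted character classes, including a leading `-`, a `:` or `/`, whitespace, or a trailing newline, fails the whole spec.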

# Installer fallback tiers

scripts/install.sh, scripts/install.ps1, setup-hermes.sh:

- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
  array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
  PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
  landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
  the same _BROKEN_EXTRAS array so updates stay in sync.
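
The 'all minus known-broken' spec can be derived mechanically from the broken list. The sketch below shows that derivation in Python for illustration only; the real installers are shell and PowerShell, and the extras names other than `mistral` are made up.

```python
def minus_broken_spec(all_extras, broken_extras, package="."):
    """Build an install spec that keeps every extra except the broken ones."""
    broken = set(broken_extras)
    kept = [extra for extra in all_extras if extra not in broken]
    return f"{package}[{','.join(kept)}]"


# Illustrative extras list; only "mistral" is named as broken in this PR.
spec = minus_broken_spec(["honcho", "mistral", "tts"], ["mistral"])
```

Here `spec` evaluates to `.[honcho,tts]`, so one quarantined package costs the user a single extra instead of the whole optional set.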

Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).

# Config

hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: []  (advisory IDs the user has dismissed)
- allow_lazy_installs: True  (security gate for ensure())

No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
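
To illustrate why no version bump is needed, here is a generic deep-merge sketch. It is not the actual `load_config` implementation, and `existing_key` is a hypothetical stand-in for whatever an older config already stores under `security:`.

```python
import copy

DEFAULTS = {"security": {"acked_advisories": [], "allow_lazy_installs": True}}


def deep_merge(defaults: dict, user: dict) -> dict:
    """Overlay user values on defaults, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# An older config that predates the two new keys.
old_config = {"security": {"existing_key": 1}}
merged = deep_merge(DEFAULTS, old_config)
```

The merged result keeps `existing_key` and gains both new defaults, which is the behavior the paragraph above relies on.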

# Tests

tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
  gateway_log_message
- shipped catalog well-formedness invariant

tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
  every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
  pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command

Combined: 63 new tests, all passing under scripts/run_tests.sh.

# Validation

- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
  tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
  tests/hermes_cli/test_doctor_command_install.py
  tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
  tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
  9191 passed, 8 pre-existing failures (verified on origin/main
  before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
  + gateway_log_message with mocked installed version → produces
  copy-pasteable remediation output

# Community

Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md

Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md

Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>

* build(deps): pin every direct dep to ==X.Y.Z (no ranges)

Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.

Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
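
A small sketch of how an "exact pins only" rule could be checked. This is an illustrative checker, not part of the PR; the package names and versions in the example are made up.

```python
import re

# name, optional extras, "==" plus a concrete version, optional env marker.
_EXACT_PIN = re.compile(
    r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?"
    r"==[0-9][A-Za-z0-9.!+_-]*"
    r"(\s*;.*)?$"
)


def unpinned(requirements):
    """Return the requirement strings that are not exact ==X.Y.Z pins."""
    return [r for r in requirements if not _EXACT_PIN.match(r.strip())]


offenders = unpinned([
    "httpx==0.28.1",
    "rich>=13",
    "tzdata==2025.2; sys_platform == 'win32'",
    "foo~=1.2",
    "bar==1.*",
])
```

With this input `offenders` contains the range, compatible-release, and wildcard entries, while the two exact pins pass.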

User-facing change: none, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.

Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.

Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.

mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.

LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.

Validation:

- Cross-checked all 77 pinned direct deps in pyproject.toml against
  uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
  → 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.

* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra

You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.

# What this commit fixes

1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
   uv.lock records SHA256 hashes for every transitive — a compromised
   package with a different hash gets REJECTED. Falls through to the
   existing `uv pip install` cascade if the lockfile is missing or
   stale, with a loud warning that the fallback path does NOT
   hash-verify transitives. Previously only `setup-hermes.sh` (the dev
   path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
   (the paths fresh users actually run) skipped it.

2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
   project is fully quarantined right now — every version returns 404,
   so any pin we wrote was unresolvable, which broke `uv lock --check`
   in CI. Restoration is documented in pyproject.toml as a 5-step
   checklist (verify, re-add extra, re-enable in 4 modules, regenerate
   lock, optionally re-add to [all]).

3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
   jsonpath-python pruned. `uv lock --check` now passes.
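
Conceptually, hash verification comes down to the comparison below. This is only an illustration of the idea using `hashlib`; it is not how uv implements `--locked`, and `verify_sha256` is a hypothetical helper.

```python
import hashlib


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise if the artifact bytes do not match the recorded SHA256."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(f"hash mismatch: expected {expected_hex}, got {actual}")
```

A re-uploaded or tampered artifact produces different bytes and therefore a different digest, so it is rejected even when its name and version match the lockfile entry.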

# Defense-in-depth view

| Layer                      | Where             | Protects against                          |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject    | direct deps       | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph  | transitive worm injection                  |
| Tier-0 hash-verified path  | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate  | every PR          | drift between pyproject and lockfile      |
| `hermes_cli/security_advisories.py` | runtime  | cleanup for users who already got hit      |

The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.

# Validation

- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
  (test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
  test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.

* chore: remove community announcement drafts (PR body covers it)

* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)

Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.

Moved out of core dependencies = []:
- anthropic   (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client  (image gen; only when picked)
- edge-tts    (default TTS but still optional)

New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].

New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.

Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
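
The "ensure, then import" pattern at each import site might look roughly like this. `import_backend` is a hypothetical helper, and `ensure` is passed in as a callable so the sketch stays self-contained.

```python
import importlib


def import_backend(module_name: str, feature: str, ensure):
    """Install the feature's deps if needed, then import the SDK module."""
    ensure(feature)                # expected to be a no-op when already satisfied
    importlib.invalidate_caches()  # pick up packages installed a moment ago
    return importlib.import_module(module_name)
```

Call sites that previously bound the SDK at module top level would then re-bind their module global to the returned module.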

Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).

Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
2026-05-12 01:02:25 -07:00
| File | Last commit | Date |
|------|-------------|------|
| `__init__.py` | feat(honcho): explain why when honcho_profile returns an empty card | 2026-04-27 12:37:33 -07:00 |
| cli.py | refactor(config): migrate remaining 33 cfg_get call sites (#17311) | 2026-04-29 04:03:03 -07:00 |
| client.py | feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220) | 2026-05-12 01:02:25 -07:00 |
| plugin.yaml | feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623) | 2026-04-02 15:33:51 -07:00 |
| README.md | feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation (#10619) | 2026-04-15 19:12:19 -07:00 |
| session.py | fix(honcho): pass user_message as search_query in get_prefetch_context | 2026-05-05 05:01:12 -07:00 |

Honcho Memory Provider

AI-native cross-session user modeling with multi-pass dialectic reasoning, session summaries, bidirectional peer tools, and persistent conclusions.

Honcho docs: https://docs.honcho.dev/v3/guides/integrations/hermes

Requirements

  • pip install honcho-ai
  • Honcho API key from app.honcho.dev, or a self-hosted instance

Setup

hermes honcho setup    # full interactive wizard (cloud or local)
hermes memory setup    # generic picker, also works

Or manually:

hermes config set memory.provider honcho
echo "HONCHO_API_KEY=***" >> ~/.hermes/.env

Architecture Overview

Two-Layer Context Injection

Context is injected into the user message at API-call time (not the system prompt) to preserve prompt caching. Only a static mode header goes in the system prompt. The injected block is wrapped in <memory-context> fences with a system note clarifying it's background data, not new user input.

Two independent layers, each on its own cadence:

Layer 1 — Base context (refreshed every contextCadence turns):

  1. SESSION SUMMARY — from session.context(summary=True), placed first
  2. User Representation — Honcho's evolving model of the user
  3. User Peer Card — key facts snapshot
  4. AI Self-Representation — Honcho's model of the AI peer
  5. AI Identity Card — AI peer facts

Layer 2 — Dialectic supplement (fired every dialecticCadence turns): Multi-pass .chat() reasoning about the user, appended after base context.

Both layers are joined, then truncated to fit contextTokens budget via _truncate_to_budget (tokens × 4 chars, word-boundary safe).
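
A minimal sketch of budget truncation as described (tokens × 4 chars, cut on a word boundary). It is an approximation written for illustration, not the actual _truncate_to_budget code.

```python
def truncate_to_budget(text: str, context_tokens: int, chars_per_token: int = 4) -> str:
    """Trim text to roughly context_tokens without splitting a word."""
    limit = context_tokens * chars_per_token
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if text[limit].isspace():
        return cut.rstrip()  # the cut already falls between two words
    # Otherwise back up to the last whitespace inside the cut.
    boundary = max(cut.rfind(" "), cut.rfind("\n"))
    if boundary > 0:
        cut = cut[:boundary]
    return cut.rstrip()
```

For example, with a budget of 2 tokens (8 chars) the string "alpha beta gamma" is cut back to "alpha" rather than "alpha be".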

Cold Start vs Warm Session Prompts

Dialectic pass 0 automatically selects its prompt based on session state:

  • Cold (no base context cached): "Who is this person? What are their preferences, goals, and working style? Focus on facts that would help an AI assistant be immediately useful."
  • Warm (base context exists): "Given what's been discussed in this session so far, what context about this user is most relevant to the current conversation? Prioritize active context over biographical facts."

Not configurable — determined automatically.
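
The selection reduces to a single branch on whether base context is cached. In this sketch the prompt strings are copied from the list above; the function name is hypothetical.

```python
COLD_PROMPT = (
    "Who is this person? What are their preferences, goals, and working style? "
    "Focus on facts that would help an AI assistant be immediately useful."
)
WARM_PROMPT = (
    "Given what's been discussed in this session so far, what context about this "
    "user is most relevant to the current conversation? Prioritize active context "
    "over biographical facts."
)


def pass_zero_prompt(base_context):
    """Pick the pass-0 prompt: warm if base context is cached, else cold."""
    return WARM_PROMPT if base_context else COLD_PROMPT
```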

Dialectic Depth (Multi-Pass Reasoning)

dialecticDepth (1–3, clamped) controls how many .chat() calls fire per dialectic cycle:

| Depth | Passes | Behavior |
|-------|--------|----------|
| 1 | single .chat() | Base query only (cold or warm prompt) |
| 2 | audit + synthesis | Pass 0 result is self-audited; pass 1 does targeted synthesis. Conditional bail-out if pass 0 returns strong signal (>300 chars or structured with bullets/sections >100 chars) |
| 3 | audit + synthesis + reconciliation | Pass 2 reconciles contradictions across prior passes into a final synthesis |
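
The depth clamp and the "strong signal" bail-out test could be sketched as follows. The thresholds come from the description above; what counts as "structured" (leading bullets, numbered items, or # headings) is an assumption, and both function names are hypothetical.

```python
import re

# Assumed markers of structure: bullets, numbered items, or "#" headings.
_STRUCTURE = re.compile(r"^\s*(?:[-*•]\s+|#{1,6}\s+|\d+[.)]\s+)", re.MULTILINE)


def clamp_depth(depth: int) -> int:
    """Clamp dialecticDepth into the supported 1..3 range."""
    return max(1, min(3, depth))


def is_strong_signal(result: str) -> bool:
    """True if pass 0 looks complete enough to skip further passes."""
    text = result.strip()
    if len(text) > 300:
        return True
    return len(text) > 100 and _STRUCTURE.search(text) is not None
```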

Proportional Reasoning Levels

When dialecticDepthLevels is not set, each pass uses a proportional level relative to dialecticReasoningLevel (the "base"):

| Depth | Pass levels |
|-------|-------------|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |

Override with dialecticDepthLevels: an explicit array of reasoning level strings per pass.
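
The proportional defaults amount to a lookup keyed by depth. In this sketch `pass_levels` is a hypothetical name, and returning an override list as-is (without padding or trimming it to the depth) is an assumption.

```python
def pass_levels(depth: int, base: str, overrides=None):
    """Return the reasoning level to use for each dialectic pass."""
    if overrides:
        return list(overrides)  # explicit dialecticDepthLevels wins
    table = {
        1: [base],
        2: ["minimal", base],
        3: ["minimal", base, "low"],
    }
    return table[max(1, min(3, depth))]
```

With base "medium" and depth 3 this yields ["minimal", "medium", "low"].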

Three Orthogonal Dialectic Knobs

| Knob | Controls | Type |
|------|----------|------|
| dialecticCadence | How often: minimum turns between dialectic firings | int |
| dialecticDepth | How many: passes per firing (1–3) | int |
| dialecticReasoningLevel | How hard: reasoning ceiling per .chat() call | string |

Input Sanitization

run_conversation strips leaked <memory-context> blocks from user input before processing. When saveMessages persists a turn that included injected context, the block can reappear in subsequent turns via message history. The sanitizer removes <memory-context> blocks plus associated system notes.
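
A rough sketch of the fence-stripping step. It only removes the fenced blocks themselves; the format of the "associated system notes" is not given here, so removing them is left out, and the function name is hypothetical.

```python
import re

# Non-greedy so that two separate blocks are not merged into one match.
_MEMORY_BLOCK = re.compile(r"<memory-context>.*?</memory-context>\s*", re.DOTALL)


def strip_memory_context(text: str) -> str:
    """Remove leaked <memory-context> blocks from user input."""
    return _MEMORY_BLOCK.sub("", text).strip()
```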

Tools

Five bidirectional tools. All accept an optional peer parameter ("user" or "ai", default "user").

| Tool | LLM call? | Description |
|------|-----------|-------------|
| honcho_profile | No | Peer card: key facts snapshot |
| honcho_search | No | Semantic search over stored context (800 tok default, 2000 max) |
| honcho_context | No | Full session context: summary, representation, card, messages |
| honcho_reasoning | Yes | LLM-synthesized answer via dialectic .chat() |
| honcho_conclude | No | Write a persistent fact/conclusion about the user |

Tool visibility depends on recallMode: hidden in context mode, always present in tools and hybrid.

Config Resolution

Config is read from the first file that exists:

| Priority | Path | Scope |
|----------|------|-------|
| 1 | $HERMES_HOME/honcho.json | Profile-local (isolated Hermes instances) |
| 2 | ~/.hermes/honcho.json | Default profile (shared host blocks) |
| 3 | ~/.honcho/config.json | Global (cross-app interop) |

Host key is derived from the active Hermes profile: hermes (default) or hermes.<profile>.

For every key, resolution order is: host block > root > env var > default.
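
The per-key lookup order can be sketched as below. `host_key_for` and `resolve_key` are hypothetical helpers written for illustration; the real resolver may differ in details such as how empty values are treated.

```python
import os


def host_key_for(profile=None) -> str:
    """Derive the host key: "hermes" by default, else "hermes.<profile>"."""
    return f"hermes.{profile}" if profile else "hermes"


def resolve_key(key, config, host_key, env_name=None, default=None, environ=os.environ):
    """Resolve one setting: host block > root > env var > default."""
    host_block = config.get("hosts", {}).get(host_key, {})
    if key in host_block:
        return host_block[key]
    if key in config:
        return config[key]
    if env_name and environ.get(env_name):
        return environ[env_name]
    return default
```

For instance, a recallMode set only in the hermes.coder host block applies to that profile, while the default profile falls through to the built-in default.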

Full Configuration Reference

Identity & Connection

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| apiKey | string | | API key. Falls back to HONCHO_API_KEY env var |
| baseUrl | string | | Base URL for self-hosted Honcho. Local URLs auto-skip API key auth |
| environment | string | "production" | SDK environment mapping |
| enabled | bool | auto | Master toggle. Auto-enables when apiKey or baseUrl present |
| workspace | string | host key | Honcho workspace ID. Shared environment: all profiles in the same workspace can see the same user identity and related memories |
| peerName | string | | User peer identity |
| aiPeer | string | host key | AI peer identity |

Memory & Recall

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| recallMode | string | "hybrid" | "hybrid" (auto-inject + tools), "context" (auto-inject only, tools hidden), "tools" (tools only, no injection). Legacy "auto" → "hybrid" |
| observationMode | string | "directional" | Preset: "directional" (all on) or "unified" (shared pool). Use observation object for granular control |
| observation | object | | Per-peer observation config (see Observation section) |

Write Behavior

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| writeFrequency | string/int | "async" | "async" (background), "turn" (sync per turn), "session" (batch on end), or integer N (every N turns) |
| saveMessages | bool | true | Persist messages to Honcho API |

Session Resolution

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| sessionStrategy | string | "per-directory" | "per-directory", "per-session", "per-repo" (git root), "global" |
| sessionPeerPrefix | bool | false | Prepend peer name to session keys |
| sessions | object | {} | Manual directory-to-session-name mappings |

Session Name Resolution

The Honcho session name determines which conversation bucket memory lands in. Resolution follows a priority chain — first match wins:

| Priority | Source | Example session name |
|----------|--------|----------------------|
| 1 | Manual map (sessions config) | "myproject-main" |
| 2 | /title command (mid-session rename) | "refactor-auth" |
| 3 | Gateway session key (Telegram, Discord, etc.) | "agent-main-telegram-dm-8439114563" |
| 4 | per-session strategy | Hermes session ID (20260415_a3f2b1) |
| 5 | per-repo strategy | Git root directory name (hermes-agent) |
| 6 | per-directory strategy | Current directory basename (src) |
| 7 | global strategy | Workspace name (hermes) |

Gateway platforms always resolve via priority 3 (per-chat isolation) regardless of sessionStrategy. The strategy setting only affects CLI sessions.

If sessionPeerPrefix is true, the peer name is prepended to the session name, e.g. eri-hermes-agent for peer eri and session hermes-agent.

What each strategy produces

  • per-directory — basename of $PWD. Opening hermes in ~/code/myapp and ~/code/other gives two separate sessions. Same directory = same session across runs.
  • per-repo — git root directory name. All subdirectories within a repo share one session. Falls back to per-directory if not inside a git repo.
  • per-session — Hermes session ID (timestamp + hex). Every hermes invocation starts a fresh Honcho session. Falls back to per-directory if no session ID is available.
  • global — workspace name. One session for everything. Memory accumulates across all directories and runs.
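
The strategy fallbacks above can be sketched as a small function. This covers only the strategy part of the chain (the manual map, /title, and gateway keys are omitted), and the function name and parameters are hypothetical.

```python
from pathlib import PurePosixPath


def session_name(strategy, cwd, git_root=None, session_id=None,
                 workspace="hermes", peer_prefix=None):
    """Compute a Honcho session name for a CLI session."""
    if strategy == "global":
        name = workspace
    elif strategy == "per-repo" and git_root:
        name = PurePosixPath(git_root).name
    elif strategy == "per-session" and session_id:
        name = session_id
    else:
        # per-directory, and the fallback for per-repo / per-session
        name = PurePosixPath(cwd).name
    return f"{peer_prefix}-{name}" if peer_prefix else name
```

With per-repo and a git root of /home/u/hermes-agent, every subdirectory resolves to hermes-agent; outside a repo the same call falls back to the directory basename.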

Multi-Profile Pattern

Multiple Hermes profiles can share one workspace while maintaining separate AI identities. Config resolution is host block > root > env var > default — host blocks inherit from root, so shared settings only need to be declared once:

{
  "apiKey": "***",
  "workspace": "hermes",
  "peerName": "yourname",
  "hosts": {
    "hermes": {
      "aiPeer": "hermes",
      "recallMode": "hybrid",
      "sessionStrategy": "per-directory"
    },
    "hermes.coder": {
      "aiPeer": "coder",
      "recallMode": "tools",
      "sessionStrategy": "per-repo"
    }
  }
}

Both profiles see the same user (yourname) in the same shared environment (hermes), but each AI peer builds its own observations, conclusions, and behavior patterns. The coder's memory stays code-oriented; the main agent's stays broad.

Host key is derived from the active Hermes profile: hermes (default) or hermes.<profile> (e.g. hermes -p coder → host key hermes.coder).

Dialectic & Reasoning

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| dialecticDepth | int | 1 | Passes per dialectic cycle (1–3, clamped). 1=single query, 2=audit+synthesis, 3=audit+synthesis+reconciliation |
| dialecticDepthLevels | array | | Optional array of reasoning level strings per pass. Overrides proportional defaults. Example: ["minimal", "low", "medium"] |
| dialecticReasoningLevel | string | "low" | Base reasoning level for .chat(): "minimal", "low", "medium", "high", "max" |
| dialecticDynamic | bool | true | When true, model can override reasoning level per-call via honcho_reasoning tool. When false, always uses dialecticReasoningLevel |
| dialecticMaxChars | int | 600 | Max chars of dialectic result injected into system prompt |
| dialecticMaxInputChars | int | 10000 | Max chars for dialectic query input to .chat(). Honcho cloud limit: 10k |

Token Budgets

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| contextTokens | int | SDK default | Token budget for context() API calls. Also gates prefetch truncation (tokens × 4 chars) |
| messageMaxChars | int | 25000 | Max chars per message sent via add_messages(). Exceeding this triggers chunking with [continued] markers. Honcho cloud limit: 25k |
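
One way the chunking could work is sketched below. Where exactly the [continued] marker is placed is an assumption (here it prefixes every chunk after the first), and `chunk_message` is a hypothetical name.

```python
MARKER = "[continued] "


def chunk_message(text: str, max_chars: int = 25000):
    """Split an oversized message so every chunk fits within max_chars."""
    if len(text) <= max_chars:
        return [text]
    if max_chars <= len(MARKER):
        raise ValueError("max_chars is too small to fit the marker")
    chunks = [text[:max_chars]]
    rest = text[max_chars:]
    step = max_chars - len(MARKER)  # leave room for the marker itself
    while rest:
        chunks.append(MARKER + rest[:step])
        rest = rest[step:]
    return chunks
```

Reserving room for the marker keeps each chunk, marker included, at or under the limit.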

Cadence (Cost Control)

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| contextCadence | int | 1 | Minimum turns between base context refreshes (session summary + representation + card) |
| dialecticCadence | int | 1 | Minimum turns between dialectic .chat() firings |
| injectionFrequency | string | "every-turn" | "every-turn" or "first-turn" (inject context on the first user message only, skip from turn 2 onward) |
| reasoningLevelCap | string | | Hard cap on reasoning level: "minimal", "low", "medium", "high" |
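
A minimal sketch of the two cost controls: a cadence gate and a reasoning-level cap. Reading "minimum turns between" as "at least N turns since the last firing" is an assumption, as are the function names; the level ordering follows the values listed for dialecticReasoningLevel.

```python
LEVELS = ["minimal", "low", "medium", "high", "max"]


def is_due(turn: int, last_fired_turn, cadence: int) -> bool:
    """True if enough turns have passed since the last refresh or firing."""
    if last_fired_turn is None:
        return True  # never fired in this session
    return turn - last_fired_turn >= max(1, cadence)


def apply_cap(level: str, cap=None) -> str:
    """Lower the requested reasoning level to the cap if it exceeds it."""
    if cap is None:
        return level
    return LEVELS[min(LEVELS.index(level), LEVELS.index(cap))]
```

With dialecticCadence 3 and a firing on turn 1, turns 2 and 3 are skipped and turn 4 fires again.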

Observation (Granular)

Maps 1:1 to Honcho's per-peer SessionPeerConfig. When present, overrides observationMode preset.

"observation": {
  "user": { "observeMe": true, "observeOthers": true },
  "ai":   { "observeMe": true, "observeOthers": true }
}

| Field | Default | Description |
|-------|---------|-------------|
| user.observeMe | true | User peer self-observation (Honcho builds user representation) |
| user.observeOthers | true | User peer observes AI messages |
| ai.observeMe | true | AI peer self-observation (Honcho builds AI representation) |
| ai.observeOthers | true | AI peer observes user messages (enables cross-peer dialectic) |

Presets:

  • "directional" (default): all four true
  • "unified": user observeMe=true, AI observeOthers=true, rest false
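
The two presets expand to the four flags as sketched below. Merging a partial observation object over the preset (rather than requiring all four fields) is an assumption, and `resolve_observation` is a hypothetical name.

```python
import copy

PRESETS = {
    "directional": {
        "user": {"observeMe": True, "observeOthers": True},
        "ai": {"observeMe": True, "observeOthers": True},
    },
    "unified": {
        "user": {"observeMe": True, "observeOthers": False},
        "ai": {"observeMe": False, "observeOthers": True},
    },
}


def resolve_observation(mode="directional", observation=None):
    """Expand the preset, then let explicit per-peer flags override it."""
    resolved = copy.deepcopy(PRESETS[mode])
    for peer, flags in (observation or {}).items():
        resolved[peer].update(flags)
    return resolved
```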

Hardcoded Limits

| Limit | Value |
|-------|-------|
| Search tool max tokens | 2000 (hard cap), 800 (default) |
| Peer card fetch tokens | 200 |

Environment Variables

| Variable | Fallback for |
|----------|--------------|
| HONCHO_API_KEY | apiKey |
| HONCHO_BASE_URL | baseUrl |
| HONCHO_ENVIRONMENT | environment |
| HERMES_HONCHO_HOST | Host key override |

CLI Commands

| Command | Description |
|---------|-------------|
| hermes honcho setup | Full interactive setup wizard |
| hermes honcho status | Show resolved config for active profile |
| hermes honcho enable / disable | Toggle Honcho for active profile |
| hermes honcho mode <mode> | Change recall or observation mode |
| hermes honcho peer --user <name> | Update user peer name |
| hermes honcho peer --ai <name> | Update AI peer name |
| hermes honcho tokens --context <N> | Set context token budget |
| hermes honcho tokens --dialectic <N> | Set dialectic max chars |
| hermes honcho map <name> | Map current directory to a session name |
| hermes honcho sync | Create host blocks for all Hermes profiles |

Example Config

{
  "apiKey": "***",
  "workspace": "hermes",
  "peerName": "username",
  "contextCadence": 2,
  "dialecticCadence": 3,
  "dialecticDepth": 2,
  "hosts": {
    "hermes": {
      "enabled": true,
      "aiPeer": "hermes",
      "recallMode": "hybrid",
      "observation": {
        "user": { "observeMe": true, "observeOthers": true },
        "ai": { "observeMe": true, "observeOthers": true }
      },
      "writeFrequency": "async",
      "sessionStrategy": "per-directory",
      "dialecticReasoningLevel": "low",
      "dialecticDepth": 2,
      "dialecticMaxChars": 600,
      "saveMessages": true
    },
    "hermes.coder": {
      "enabled": true,
      "aiPeer": "coder",
      "sessionStrategy": "per-repo",
      "dialecticDepth": 1,
      "dialecticDepthLevels": ["low"],
      "observation": {
        "user": { "observeMe": true, "observeOthers": false },
        "ai": { "observeMe": true, "observeOthers": true }
      }
    }
  },
  "sessions": {
    "/home/user/myproject": "myproject-main"
  }
}