* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
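A minimal sketch of the detection step described above, assuming a hypothetical `Advisory` dataclass whose field names are illustrative rather than the shipped schema. The only real API it relies on is `importlib.metadata.version()`; the `lookup` parameter is added here purely so the logic can be exercised without touching the active venv.

```python
from dataclasses import dataclass
from importlib import metadata
from typing import Callable, Optional


@dataclass(frozen=True)
class Advisory:
    # Hypothetical shape; the real catalog entry may carry more fields.
    advisory_id: str
    package: str
    bad_versions: frozenset  # exact version strings known to be compromised


ADVISORIES = (
    Advisory("shai-hulud-2026-05", "mistralai", frozenset({"2.4.6"})),
)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version, or None when the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def detect_compromised(
    advisories=ADVISORIES,
    lookup: Callable[[str], Optional[str]] = installed_version,
):
    """Return the advisories whose package is installed at a flagged version."""
    hits = []
    for adv in advisories:
        version = lookup(adv.package)
        if version is not None and version in adv.bad_versions:
            hits.append(adv)
    return hits
```

Because the check reads installed distribution metadata directly, it does not need pip to be present in the environment.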
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
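The spec safety check and the security gate could look roughly like the sketch below. The regex is an assumption written for illustration (name, optional extras, optional `==` pin), not the pattern shipped in tools/lazy_deps.py, and the set of env var values treated as "disable" is likewise assumed.

```python
import re

# Hypothetical allow-pattern: PyPI project name, optional [extras], optional ==pin.
_SAFE_SPEC = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"  # project name
    r"(?:\[[A-Za-z0-9_,-]+\])?"                     # optional extras
    r"(?:==[A-Za-z0-9][A-Za-z0-9.!+_-]*)?"          # optional exact pin
)


def is_safe_spec(spec: str) -> bool:
    """Accept only PyPI-by-name specs; URLs, paths, flags and shell metas fail."""
    return _SAFE_SPEC.fullmatch(spec) is not None


def lazy_installs_allowed(config: dict, env: dict) -> bool:
    """Env var wins over config; the config flag defaults to True."""
    flag = env.get("HERMES_DISABLE_LAZY_INSTALLS", "").strip().lower()
    if flag in {"1", "true", "yes"}:  # accepted values are an assumption
        return False
    return bool(config.get("security", {}).get("allow_lazy_installs", True))
```

Using `fullmatch` rather than `match` matters here: a trailing newline or appended shell fragment fails the whole spec instead of being silently ignored.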
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails to resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
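The real installers are shell and PowerShell, but the tier logic is easier to see in a few lines of Python. This is a sketch only: `spec_for`, `install_with_fallback`, and the extra names used in the usage note are hypothetical, and `try_install` stands in for whatever command the installer actually runs.

```python
from typing import Callable, Iterable, List, Optional, Tuple

BROKEN_EXTRAS = ["mistral"]  # the single list both installers derive their specs from


def spec_for(extras: Iterable[str], broken: Iterable[str] = BROKEN_EXTRAS) -> str:
    """Build an install spec from the extras that are not known-broken."""
    blocked = set(broken)
    kept = [e for e in extras if e not in blocked]
    return ".[" + ",".join(kept) + "]" if kept else "."


def install_with_fallback(
    tiers: List[Tuple[str, str]],
    try_install: Callable[[str], bool],
) -> Optional[str]:
    """Try each (label, spec) in order and announce which tier landed."""
    for position, (label, spec) in enumerate(tiers):
        if try_install(spec):
            print(f"Installed {label}: {spec}")
            if position > 0:
                print("Not on the first tier; re-run the installer later to retry it.")
            return label
    return None
```

For example, `spec_for(["tts", "mistral", "honcho"])` yields `.[tts,honcho]`, which is the shape of the 'all minus known-broken' tier.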
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
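Why no version bump is needed can be seen with a generic recursive merge. This is a sketch of the general technique, not the body of `load_config`; `deep_merge` and the `some_existing_key` entry are hypothetical.

```python
import copy


def deep_merge(defaults: dict, user: dict) -> dict:
    """Overlay user values on defaults, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


DEFAULTS = {"security": {"acked_advisories": [], "allow_lazy_installs": True}}

# An older config that predates both new keys.
old_config = {"security": {"some_existing_key": 1}}
merged = deep_merge(DEFAULTS, old_config)
```

After the merge, `merged["security"]` carries the user's existing key plus both new defaults, so older config files keep working unchanged.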
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
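The pin cross-check in the validation list can be approximated as below. This is a simplified sketch: the regex is an illustrative approximation of an exact-pin requirement (it tolerates extras and an environment marker), `locked_versions` is assumed to be a plain name-to-version mapping already extracted from uv.lock, and the versions in the usage note are made up.

```python
import re

# Matches "name==X.Y.Z", optionally with [extras] and a trailing "; marker".
_EXACT_PIN = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]+\])?\s*==\s*"
    r"([0-9][A-Za-z0-9.!+]*)\s*(;.*)?$"
)


def pin_mismatches(requirements, locked_versions):
    """Return (requirement, reason) pairs for ranges or lockfile drift."""
    problems = []
    for req in requirements:
        match = _EXACT_PIN.match(req)
        if match is None:
            problems.append((req, "not an exact pin"))
            continue
        name, version = match.group(1).lower(), match.group(3)
        locked = locked_versions.get(name)
        if locked != version:
            problems.append((req, f"lock has {locked}"))
    return problems
```

For example, `pin_mismatches(["httpx==0.27.0", "rich>=13"], {"httpx": "0.27.0"})` reports only the `rich>=13` range. Name normalization beyond lowercasing is deliberately left out of the sketch.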
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
# Defense-in-depth view
| Layer | Where | Protects against |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
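The hash check that makes the lockfile path meaningful is performed by uv itself; the sketch below only illustrates the comparison it relies on. `verify_artifact` is a hypothetical helper, and the `sha256:<hex>` string format is assumed for the example.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected: str) -> bool:
    """Compare artifact bytes against an expected 'sha256:<hex>' digest."""
    algo, _, digest = expected.partition(":")
    if algo != "sha256" or not digest:
        return False
    # Constant-time comparison; any byte change in the artifact flips the result.
    return hmac.compare_digest(sha256_hex(data), digest.lower())
```

A republished package with the same name and version but different contents produces a different digest, so it fails this comparison even though the exact pin still matches.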
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
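The install-then-rebind pattern can be sketched as follows. Everything here is illustrative: `ensure` is a stub standing in for the lazy installer, and `check_requirements` takes the namespace explicitly so the example stays self-contained, whereas the real modules re-bind their own globals.

```python
import importlib


def ensure(feature: str) -> bool:
    """Stub for the lazy installer; a real one would install the allowlisted spec."""
    return True


def check_requirements(module_name: str, feature: str, namespace: dict, binding: str) -> bool:
    """Install on first use, then bind the imported module into the namespace."""
    if namespace.get(binding) is not None:
        return True  # already imported on an earlier call
    if not ensure(feature):
        return False
    try:
        namespace[binding] = importlib.import_module(module_name)
    except ImportError:
        return False
    return True
```

A caller that previously did a top-level `try: import telegram` with a `None` fallback would instead leave the global as `None` at import time and call a check like this before first use.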
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
Honcho Memory Provider
AI-native cross-session user modeling with multi-pass dialectic reasoning, session summaries, bidirectional peer tools, and persistent conclusions.
Honcho docs: https://docs.honcho.dev/v3/guides/integrations/hermes
Requirements
- pip install honcho-ai
- Honcho API key from app.honcho.dev, or a self-hosted instance
Setup
hermes honcho setup # full interactive wizard (cloud or local)
hermes memory setup # generic picker, also works
Or manually:
hermes config set memory.provider honcho
echo "HONCHO_API_KEY=***" >> ~/.hermes/.env
Architecture Overview
Two-Layer Context Injection
Context is injected into the user message at API-call time (not the system prompt) to preserve prompt caching. Only a static mode header goes in the system prompt. The injected block is wrapped in <memory-context> fences with a system note clarifying it's background data, not new user input.
Two independent layers, each on its own cadence:
Layer 1 — Base context (refreshed every contextCadence turns):
- SESSION SUMMARY — from session.context(summary=True), placed first
- User Representation — Honcho's evolving model of the user
- User Peer Card — key facts snapshot
- AI Self-Representation — Honcho's model of the AI peer
- AI Identity Card — AI peer facts
Layer 2 — Dialectic supplement (fired every dialecticCadence turns):
Multi-pass .chat() reasoning about the user, appended after base context.
Both layers are joined, then truncated to fit contextTokens budget via _truncate_to_budget (tokens × 4 chars, word-boundary safe).
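A budget truncation along the lines described (tokens × 4 chars, cut at a word boundary) might look like this. It is a sketch written from the description above, not the body of `_truncate_to_budget`; the function name and the hard-cut fallback for a single overlong word are assumptions.

```python
def truncate_to_budget(text: str, context_tokens: int, chars_per_token: int = 4) -> str:
    """Trim text to roughly context_tokens, never ending mid-word when avoidable."""
    limit = context_tokens * chars_per_token
    if limit <= 0:
        return ""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # The cut is mid-word when both sides of the boundary are non-whitespace.
    if not cut[-1].isspace() and not text[limit].isspace():
        parts = cut.rsplit(None, 1)
        if len(parts) == 2:
            cut = parts[0]  # drop the partial trailing word
    return cut.rstrip()
```

With a budget of 3 tokens (12 chars), `"alpha beta gamma"` becomes `"alpha beta"` rather than `"alpha beta g"`.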
Cold Start vs Warm Session Prompts
Dialectic pass 0 automatically selects its prompt based on session state:
- Cold (no base context cached): "Who is this person? What are their preferences, goals, and working style? Focus on facts that would help an AI assistant be immediately useful."
- Warm (base context exists): "Given what's been discussed in this session so far, what context about this user is most relevant to the current conversation? Prioritize active context over biographical facts."
Not configurable — determined automatically.
Dialectic Depth (Multi-Pass Reasoning)
dialecticDepth (1–3, clamped) controls how many .chat() calls fire per dialectic cycle:
| Depth | Passes | Behavior |
|---|---|---|
| 1 | single .chat() | Base query only (cold or warm prompt) |
| 2 | audit + synthesis | Pass 0 result is self-audited; pass 1 does targeted synthesis. Conditional bail-out if pass 0 returns strong signal (>300 chars or structured with bullets/sections >100 chars) |
| 3 | audit + synthesis + reconciliation | Pass 2 reconciles contradictions across prior passes into a final synthesis |
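The depth clamp and the pass-0 bail-out can be sketched as below. The thresholds (300 and 100 chars) come from the table; the regex used to recognise "structured" output and the choice to skip all later passes on a strong signal are assumptions made for the example.

```python
import re

# Assumed definition of "structured": a line starting with a bullet,
# a numbered item, or a markdown heading.
_STRUCTURE = re.compile(r"^\s*(?:[-*•]|\d+\.|#+)\s+", re.MULTILINE)


def is_strong_signal(result: str) -> bool:
    """Long answers, or shorter answers that are clearly structured."""
    text = result.strip()
    if len(text) > 300:
        return True
    return len(text) > 100 and _STRUCTURE.search(text) is not None


def passes_to_run(depth: int, pass0_result: str) -> int:
    """Clamp depth to 1..3 and stop after pass 0 when it is already strong."""
    depth = max(1, min(3, depth))
    if depth >= 2 and is_strong_signal(pass0_result):
        return 1
    return depth
```

The bail-out keeps a depth setting of 2 or 3 from spending extra `.chat()` calls when the first pass already returned enough to work with.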
Proportional Reasoning Levels
When dialecticDepthLevels is not set, each pass uses a proportional level relative to dialecticReasoningLevel (the "base"):
| Depth | Pass levels |
|---|---|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |
Override with dialecticDepthLevels: an explicit array of reasoning level strings per pass.
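The proportional table translates directly into a small lookup. The sketch assumes that an explicit `dialecticDepthLevels` array is used as given (truncated to the clamped depth); how a too-short array is handled is not specified above, so the example does not pad it.

```python
def pass_levels(depth: int, base: str = "low", overrides=None):
    """Reasoning level per pass: explicit overrides first, else proportional."""
    depth = max(1, min(3, depth))
    if overrides:
        return list(overrides)[:depth]
    proportional = {
        1: [base],
        2: ["minimal", base],
        3: ["minimal", base, "low"],
    }
    return proportional[depth]
```

With `dialecticReasoningLevel` set to `"medium"` and depth 3, this yields `["minimal", "medium", "low"]`.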
Three Orthogonal Dialectic Knobs
| Knob | Controls | Type |
|---|---|---|
| dialecticCadence | How often — minimum turns between dialectic firings | int |
| dialecticDepth | How many — passes per firing (1–3) | int |
| dialecticReasoningLevel | How hard — reasoning ceiling per .chat() call | string |
Input Sanitization
run_conversation strips leaked <memory-context> blocks from user input before processing. When saveMessages persists a turn that included injected context, the block can reappear in subsequent turns via message history. The sanitizer removes <memory-context> blocks plus associated system notes.
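A sanitizer of this kind can be approximated with a non-greedy regex. This sketch only removes the fenced blocks themselves; the format of the associated system notes is not given above, so stripping them is left out, and `strip_memory_context` is a hypothetical name.

```python
import re

# Non-greedy so that two separate blocks are not merged into one match.
_MEMORY_BLOCK = re.compile(
    r"<memory-context>.*?</memory-context>\s*",
    re.DOTALL | re.IGNORECASE,
)


def strip_memory_context(text: str) -> str:
    """Remove leaked <memory-context> blocks from user input."""
    return _MEMORY_BLOCK.sub("", text).strip()
```

`re.DOTALL` is needed because injected blocks typically span several lines.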
Tools
Five bidirectional tools. All accept an optional peer parameter ("user" or "ai", default "user").
| Tool | LLM call? | Description |
|---|---|---|
| honcho_profile | No | Peer card — key facts snapshot |
| honcho_search | No | Semantic search over stored context (800 tok default, 2000 max) |
| honcho_context | No | Full session context: summary, representation, card, messages |
| honcho_reasoning | Yes | LLM-synthesized answer via dialectic .chat() |
| honcho_conclude | No | Write a persistent fact/conclusion about the user |
Tool visibility depends on recallMode: hidden in context mode, always present in tools and hybrid.
Config Resolution
Config is read from the first file that exists:
| Priority | Path | Scope |
|---|---|---|
| 1 | $HERMES_HOME/honcho.json | Profile-local (isolated Hermes instances) |
| 2 | ~/.hermes/honcho.json | Default profile (shared host blocks) |
| 3 | ~/.honcho/config.json | Global (cross-app interop) |
Host key is derived from the active Hermes profile: hermes (default) or hermes.<profile>.
For every key, resolution order is: host block > root > env var > default.
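The per-key resolution order can be sketched as a short lookup chain. The helper names (`host_key_for`, `resolve`) and their signatures are hypothetical; only the order host block > root > env var > default and the `hermes` / `hermes.<profile>` host key shape come from the text above.

```python
from typing import Optional


def host_key_for(profile: Optional[str]) -> str:
    """Default profile maps to 'hermes'; named profiles to 'hermes.<profile>'."""
    return f"hermes.{profile}" if profile else "hermes"


def resolve(key, config, host_key, env, env_var=None, default=None):
    """Host block first, then root, then the env var fallback, then the default."""
    host_block = config.get("hosts", {}).get(host_key, {})
    if key in host_block:
        return host_block[key]
    if key in config:
        return config[key]
    if env_var and env.get(env_var):
        return env[env_var]
    return default


cfg = {
    "workspace": "hermes",
    "hosts": {"hermes.coder": {"aiPeer": "coder", "recallMode": "tools"}},
}
```

Here `resolve("aiPeer", cfg, host_key_for("coder"), {})` returns `"coder"` from the host block, while `resolve("workspace", ...)` falls through to the root value.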
Full Configuration Reference
Identity & Connection
| Key | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | API key. Falls back to HONCHO_API_KEY env var |
| baseUrl | string | — | Base URL for self-hosted Honcho. Local URLs auto-skip API key auth |
| environment | string | "production" | SDK environment mapping |
| enabled | bool | auto | Master toggle. Auto-enables when apiKey or baseUrl present |
| workspace | string | host key | Honcho workspace ID. Shared environment — all profiles in the same workspace can see the same user identity and related memories |
| peerName | string | — | User peer identity |
| aiPeer | string | host key | AI peer identity |
Memory & Recall
| Key | Type | Default | Description |
|---|---|---|---|
| recallMode | string | "hybrid" | "hybrid" (auto-inject + tools), "context" (auto-inject only, tools hidden), "tools" (tools only, no injection). Legacy "auto" → "hybrid" |
| observationMode | string | "directional" | Preset: "directional" (all on) or "unified" (shared pool). Use observation object for granular control |
| observation | object | — | Per-peer observation config (see Observation section) |
Write Behavior
| Key | Type | Default | Description |
|---|---|---|---|
| writeFrequency | string/int | "async" | "async" (background), "turn" (sync per turn), "session" (batch on end), or integer N (every N turns) |
| saveMessages | bool | true | Persist messages to Honcho API |
Session Resolution
| Key | Type | Default | Description |
|---|---|---|---|
| sessionStrategy | string | "per-directory" | "per-directory", "per-session", "per-repo" (git root), "global" |
| sessionPeerPrefix | bool | false | Prepend peer name to session keys |
| sessions | object | {} | Manual directory-to-session-name mappings |
Session Name Resolution
The Honcho session name determines which conversation bucket memory lands in. Resolution follows a priority chain — first match wins:
| Priority | Source | Example session name |
|---|---|---|
| 1 | Manual map (sessions config) | "myproject-main" |
| 2 | /title command (mid-session rename) | "refactor-auth" |
| 3 | Gateway session key (Telegram, Discord, etc.) | "agent-main-telegram-dm-8439114563" |
| 4 | per-session strategy | Hermes session ID (20260415_a3f2b1) |
| 5 | per-repo strategy | Git root directory name (hermes-agent) |
| 6 | per-directory strategy | Current directory basename (src) |
| 7 | global strategy | Workspace name (hermes) |
Gateway platforms always resolve via priority 3 (per-chat isolation) regardless of sessionStrategy. The strategy setting only affects CLI sessions.
If sessionPeerPrefix is true, the peer name is prepended: eri-hermes-agent.
What each strategy produces
- per-directory — basename of $PWD. Opening hermes in ~/code/myapp and ~/code/other gives two separate sessions. Same directory = same session across runs.
- per-repo — git root directory name. All subdirectories within a repo share one session. Falls back to per-directory if not inside a git repo.
- per-session — Hermes session ID (timestamp + hex). Every hermes invocation starts a fresh Honcho session. Falls back to per-directory if no session ID is available.
- global — workspace name. One session for everything. Memory accumulates across all directories and runs.
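The priority chain and the strategy fallbacks can be written as one function. This is a sketch from the tables in this section, not the code in session.py: the function name and parameters are hypothetical, inputs such as the git root are passed in rather than discovered, and applying the peer prefix to every source (including manual maps) is an assumption.

```python
import posixpath


def resolve_session_name(
    *,
    cwd,
    strategy="per-directory",
    manual_map=None,
    title=None,
    gateway_key=None,
    session_id=None,
    git_root=None,
    workspace="hermes",
    peer_name=None,
    peer_prefix=False,
):
    """First match wins: manual map, /title, gateway key, then the strategy."""
    manual_map = manual_map or {}
    if cwd in manual_map:
        name = manual_map[cwd]
    elif title:
        name = title
    elif gateway_key:
        name = gateway_key
    elif strategy == "per-session" and session_id:
        name = session_id
    elif strategy == "per-repo" and git_root:
        name = posixpath.basename(git_root.rstrip("/"))
    elif strategy == "global":
        name = workspace
    else:
        # per-directory, and the fallback for per-repo / per-session
        name = posixpath.basename(cwd.rstrip("/"))
    if peer_prefix and peer_name:
        name = f"{peer_name}-{name}"
    return name
```

For instance, running inside `/home/eri/hermes-agent/src` with the per-repo strategy and the peer prefix enabled for peer `eri` gives `eri-hermes-agent`, matching the example above.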
Multi-Profile Pattern
Multiple Hermes profiles can share one workspace while maintaining separate AI identities. Config resolution is host block > root > env var > default — host blocks inherit from root, so shared settings only need to be declared once:
{
  "apiKey": "***",
  "workspace": "hermes",
  "peerName": "yourname",
  "hosts": {
    "hermes": {
      "aiPeer": "hermes",
      "recallMode": "hybrid",
      "sessionStrategy": "per-directory"
    },
    "hermes.coder": {
      "aiPeer": "coder",
      "recallMode": "tools",
      "sessionStrategy": "per-repo"
    }
  }
}
Both profiles see the same user (yourname) in the same shared environment (hermes), but each AI peer builds its own observations, conclusions, and behavior patterns. The coder's memory stays code-oriented; the main agent's stays broad.
Host key is derived from the active Hermes profile: hermes (default) or hermes.<profile> (e.g. hermes -p coder → host key hermes.coder).
Dialectic & Reasoning
| Key | Type | Default | Description |
|---|---|---|---|
| dialecticDepth | int | 1 | Passes per dialectic cycle (1–3, clamped). 1=single query, 2=audit+synthesis, 3=audit+synthesis+reconciliation |
| dialecticDepthLevels | array | — | Optional array of reasoning level strings per pass. Overrides proportional defaults. Example: ["minimal", "low", "medium"] |
| dialecticReasoningLevel | string | "low" | Base reasoning level for .chat(): "minimal", "low", "medium", "high", "max" |
| dialecticDynamic | bool | true | When true, model can override reasoning level per-call via honcho_reasoning tool. When false, always uses dialecticReasoningLevel |
| dialecticMaxChars | int | 600 | Max chars of dialectic result injected into system prompt |
| dialecticMaxInputChars | int | 10000 | Max chars for dialectic query input to .chat(). Honcho cloud limit: 10k |
Token Budgets
| Key | Type | Default | Description |
|---|---|---|---|
| contextTokens | int | SDK default | Token budget for context() API calls. Also gates prefetch truncation (tokens × 4 chars) |
| messageMaxChars | int | 25000 | Max chars per message sent via add_messages(). Exceeding this triggers chunking with [continued] markers. Honcho cloud limit: 25k |
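Chunking under a per-message character limit might look like the sketch below. Where the `[continued]` marker is placed is not specified above; this example assumes it prefixes every chunk after the first and counts toward the limit. `chunk_message` is a hypothetical name.

```python
def chunk_message(text: str, max_chars: int = 25000, marker: str = "[continued] "):
    """Split text into chunks of at most max_chars, marking continuation chunks."""
    if max_chars <= len(marker):
        raise ValueError("max_chars must exceed the marker length")
    if len(text) <= max_chars:
        return [text]
    chunks = [text[:max_chars]]
    rest = text[max_chars:]
    body = max_chars - len(marker)  # room left once the marker is prepended
    while rest:
        chunks.append(marker + rest[:body])
        rest = rest[body:]
    return chunks
```

Stripping the marker from every chunk after the first and concatenating the pieces reproduces the original message.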
Cadence (Cost Control)
| Key | Type | Default | Description |
|---|---|---|---|
| contextCadence | int | 1 | Minimum turns between base context refreshes (session summary + representation + card) |
| dialecticCadence | int | 1 | Minimum turns between dialectic .chat() firings |
| injectionFrequency | string | "every-turn" | "every-turn" or "first-turn" (inject context on the first user message only, skip from turn 2 onward) |
| reasoningLevelCap | string | — | Hard cap on reasoning level: "minimal", "low", "medium", "high" |
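A cadence gate can be sketched as a simple turn counter. The interpretation used here (fire when at least `cadence` turns have passed since the last firing, and always fire the first time) is an assumption about what "minimum turns between" means; the function names are hypothetical.

```python
def should_fire(turn: int, last_fired, cadence: int) -> bool:
    """True when nothing has fired yet or enough turns have elapsed."""
    if last_fired is None:
        return True
    return turn - last_fired >= max(1, cadence)


def should_inject(turn: int, injection_frequency: str = "every-turn") -> bool:
    """'first-turn' injects on turn 1 only; 'every-turn' always injects."""
    if injection_frequency == "first-turn":
        return turn == 1
    return True


def simulate(turns: int, cadence: int):
    """Return the turn numbers on which a cadence-gated layer would fire."""
    fired, last = [], None
    for turn in range(1, turns + 1):
        if should_fire(turn, last, cadence):
            fired.append(turn)
            last = turn
    return fired
```

Under this reading, a `dialecticCadence` of 3 over seven turns fires on turns 1, 4, and 7, which is how a higher cadence reduces `.chat()` cost.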
Observation (Granular)
Maps 1:1 to Honcho's per-peer SessionPeerConfig. When present, overrides observationMode preset.
"observation": {
"user": { "observeMe": true, "observeOthers": true },
"ai": { "observeMe": true, "observeOthers": true }
}
| Field | Default | Description |
|---|---|---|
| user.observeMe | true | User peer self-observation (Honcho builds user representation) |
| user.observeOthers | true | User peer observes AI messages |
| ai.observeMe | true | AI peer self-observation (Honcho builds AI representation) |
| ai.observeOthers | true | AI peer observes user messages (enables cross-peer dialectic) |
Presets:
- "directional" (default): all four true
- "unified": user observeMe=true, AI observeOthers=true, rest false
Hardcoded Limits
| Limit | Value |
|---|---|
| Search tool max tokens | 2000 (hard cap), 800 (default) |
| Peer card fetch tokens | 200 |
Environment Variables
| Variable | Fallback for |
|---|---|
| HONCHO_API_KEY | apiKey |
| HONCHO_BASE_URL | baseUrl |
| HONCHO_ENVIRONMENT | environment |
| HERMES_HONCHO_HOST | Host key override |
CLI Commands
| Command | Description |
|---|---|
| hermes honcho setup | Full interactive setup wizard |
| hermes honcho status | Show resolved config for active profile |
| hermes honcho enable / disable | Toggle Honcho for active profile |
| hermes honcho mode <mode> | Change recall or observation mode |
| hermes honcho peer --user <name> | Update user peer name |
| hermes honcho peer --ai <name> | Update AI peer name |
| hermes honcho tokens --context <N> | Set context token budget |
| hermes honcho tokens --dialectic <N> | Set dialectic max chars |
| hermes honcho map <name> | Map current directory to a session name |
| hermes honcho sync | Create host blocks for all Hermes profiles |
Example Config
{
  "apiKey": "***",
  "workspace": "hermes",
  "peerName": "username",
  "contextCadence": 2,
  "dialecticCadence": 3,
  "dialecticDepth": 2,
  "hosts": {
    "hermes": {
      "enabled": true,
      "aiPeer": "hermes",
      "recallMode": "hybrid",
      "observation": {
        "user": { "observeMe": true, "observeOthers": true },
        "ai": { "observeMe": true, "observeOthers": true }
      },
      "writeFrequency": "async",
      "sessionStrategy": "per-directory",
      "dialecticReasoningLevel": "low",
      "dialecticDepth": 2,
      "dialecticMaxChars": 600,
      "saveMessages": true
    },
    "hermes.coder": {
      "enabled": true,
      "aiPeer": "coder",
      "sessionStrategy": "per-repo",
      "dialecticDepth": 1,
      "dialecticDepthLevels": ["low"],
      "observation": {
        "user": { "observeMe": true, "observeOthers": false },
        "ai": { "observeMe": true, "observeOthers": true }
      }
    }
  },
  "sessions": {
    "/home/user/myproject": "myproject-main"
  }
}