hermes-agent/tests/hermes_cli/test_security_advisories.py
Teknium c1eb2dcda7
feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback

Three coordinated mitigations for the Mini Shai-Hulud worm that compromised
mistralai 2.4.6 on PyPI (2026-05-12), and for whichever single-package
compromise comes next.

# What this PR makes true

1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
   detection banner with copy-pasteable remediation steps the moment
   they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
   a fresh install to 'core only' — the installer keeps every other
   extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
   lazy-install on first use under a strict allowlist, instead of
   eagerly pulling everything at install time.

# Detection: hermes_cli/security_advisories.py

- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
  mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
  dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
  the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
  re-banner after ack.
- Wired into:
  * hermes doctor — runs first, prints full remediation block
  * hermes doctor --ack <id> — dismisses an advisory
  * cli.py interactive run() and single-query branches — short
    stderr banner pointing at hermes doctor
  * gateway/run.py startup — operator-visible warning in gateway.log
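The sketch below shows the detection logic described above. It is a minimal sketch. It assumes a simplified `Advisory` that keeps only the fields detection needs. Per the test fixtures, the shipped dataclass also carries `summary`, `url`, `remediation`, `published` and `severity`. The `lookup` parameter is a hypothetical injection point, added so the sketch can be exercised without real package metadata. The shipped module instead uses a private `_installed_version` helper, which the tests monkeypatch.

```python
from dataclasses import dataclass
from importlib import metadata
from typing import Callable, Optional


@dataclass(frozen=True)
class Advisory:
    id: str
    title: str
    # (package name, compromised versions); an empty frozenset means
    # "any installed version is suspect".
    compromised: tuple[tuple[str, frozenset[str]], ...]


@dataclass(frozen=True)
class AdvisoryHit:
    advisory: Advisory
    package: str
    installed_version: str


def installed_version(pkg: str) -> Optional[str]:
    """Return the installed version of pkg via importlib.metadata (no pip needed)."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None


def detect_compromised(
    advisories: list[Advisory],
    lookup: Callable[[str], Optional[str]] = installed_version,  # hypothetical hook
) -> list[AdvisoryHit]:
    hits: list[AdvisoryHit] = []
    for advisory in advisories:
        for pkg, bad_versions in advisory.compromised:
            version = lookup(pkg)
            if version is None:
                continue  # not installed, nothing to flag
            if not bad_versions or version in bad_versions:
                hits.append(AdvisoryHit(advisory, pkg, version))
    return hits
```

An empty version set acts as a wildcard. This matches the behaviour that `test_empty_compromised_set_matches_any_version` pins down.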

# Lazy-install framework: tools/lazy_deps.py

- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
  memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
  uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
  pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
  HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
  * tools/tts_tool.py — _import_elevenlabs() calls ensure first
  * plugins/memory/honcho/client.py — get_honcho_client lazy-installs
  * tts.mistral / stt.mistral entries pre-registered for when PyPI
    restores mistralai
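The spec-safety check can be a single full-match regex. The pattern below is a hypothetical approximation and is not the one shipped in `tools/lazy_deps.py`. It accepts only three parts: a PEP 508-style name, optional extras, and optional comma-separated version specifiers. Anything else is rejected, including URLs, paths, whitespace, shell metacharacters, leading `-` flags and control characters.

```python
import re

# Hypothetical pattern: name, optional [extras], optional specifiers, nothing else.
_NAME = r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"
_EXTRAS = rf"(?:\[{_NAME}(?:,{_NAME})*\])?"
_SPEC = r"(?:(?:==|!=|<=|>=|~=|<|>)[A-Za-z0-9.*+!]+)"
_SAFE_SPEC = re.compile(rf"{_NAME}{_EXTRAS}(?:{_SPEC}(?:,{_SPEC})*)?")


def is_safe_spec(spec: str) -> bool:
    # fullmatch (not match/search) so trailing junk such as "\n" or "; rm" fails.
    return _SAFE_SPEC.fullmatch(spec) is not None
```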

# Installer fallback tiers

scripts/install.sh, scripts/install.ps1, setup-hermes.sh:

- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
  array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
  PyPI-only-extras tier. Only kicks in when [all] fails to resolve.
- All three tiers explicit: every fallback announces which tier
  landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
  the same _BROKEN_EXTRAS array so updates stay in sync.

Side effect: install.ps1's Tier 2 spec previously hardcoded 'mistral'
in its extras list. The refactor fixes that bug, because mistral is now
filtered out.
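The real scripts are bash and PowerShell. A small Python model of the logic shows how one broken-extras list can drive every tier. `BROKEN_EXTRAS`, `tier_specs`, `install_with_fallback` and the extra names used in the tests are hypothetical. The `install` callable stands in for the actual pip/uv invocation.

```python
from typing import Callable, Optional, Sequence

# Hypothetical mirror of the installers' _BROKEN_EXTRAS array.
BROKEN_EXTRAS: tuple[str, ...] = ("mistral",)


def tier_specs(
    all_extras: Sequence[str],
    pypi_only_extras: Sequence[str],
    broken: Sequence[str] = BROKEN_EXTRAS,
) -> list[tuple[str, str]]:
    """Derive every fallback tier from the single broken-extras list."""
    broken_set = set(broken)

    def without_broken(extras: Sequence[str]) -> str:
        return ",".join(e for e in extras if e not in broken_set)

    return [
        ("Tier 1", ".[all]"),
        ("Tier 2", f".[{without_broken(all_extras)}]"),
        ("Tier 3", f".[{without_broken(pypi_only_extras)}]"),
    ]


def install_with_fallback(
    install: Callable[[str], bool],
    all_extras: Sequence[str],
    pypi_only_extras: Sequence[str],
) -> Optional[str]:
    for label, target in tier_specs(all_extras, pypi_only_extras):
        if install(target):
            # Always announce which tier landed; hint at a re-run below Tier 1.
            print(f"Installed {target} ({label}).")
            if label != "Tier 1":
                print("Some extras were skipped; re-run the installer later.")
            return label
    return None  # every tier failed
```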

# Config

hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: []  (advisory IDs the user has dismissed)
- allow_lazy_installs: True  (security gate for ensure())

No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
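A recursive dict merge is enough to show why no version bump is needed. `deep_merge` below is a hypothetical stand-in for whatever `load_config` does internally. Nested keys missing from an older user config are filled from the defaults, and values the user set explicitly win.

```python
import copy
from typing import Any

DEFAULT_CONFIG: dict[str, Any] = {
    "security": {"acked_advisories": [], "allow_lazy_installs": True},
}


def deep_merge(defaults: dict[str, Any], user: dict[str, Any]) -> dict[str, Any]:
    """Return defaults overlaid with user values, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```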

# Tests

tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
  gateway_log_message
- shipped catalog well-formedness invariant
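The sketch below shows the rate limiting these tests pin down. The cache layout (one `<advisory-id> <timestamp>` pair per line) is inferred from how `test_call_after_window_re_banners` backdates the file. `ids_due_for_banner`, `REPEAT_WINDOW_S` and the explicit `now`/`acked` parameters are hypothetical. The real `hits_due_for_banner` takes `AdvisoryHit` objects and reads acks from config.

```python
import time
from pathlib import Path
from typing import Optional

REPEAT_WINDOW_S = 24 * 3600  # show each unacked advisory at most once per 24h


def _read_cache(path: Path) -> dict[str, float]:
    """Parse '<advisory-id> <unix-timestamp>' lines, skipping malformed ones."""
    seen: dict[str, float] = {}
    if not path.exists():
        return seen
    for line in path.read_text(encoding="utf-8").splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            seen[parts[0]] = float(parts[1])
        except ValueError:
            continue  # a corrupt line should never crash startup
    return seen


def ids_due_for_banner(
    ids: list[str], acked: set[str], path: Path, now: Optional[float] = None
) -> list[str]:
    now = time.time() if now is None else now
    seen = _read_cache(path)
    due = [
        advisory_id
        for advisory_id in ids
        if advisory_id not in acked
        and now - seen.get(advisory_id, float("-inf")) >= REPEAT_WINDOW_S
    ]
    for advisory_id in due:
        seen[advisory_id] = now  # record when this banner was shown
    if due:
        path.write_text(
            "".join(f"{k} {v}\n" for k, v in seen.items()), encoding="utf-8"
        )
    return due
```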

tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
  every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
  pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
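The gating rules exercised by these tests can be sketched as one predicate. The env var and config key names come from the commit message. Two details are assumptions. First, the sketch treats the listed truthy strings as "disable". Second, it reads "fail-open" as "allow lazy installs when the config cannot be loaded". `lazy_installs_allowed` itself is a hypothetical name.

```python
import os
from typing import Any, Callable

_TRUTHY = {"1", "true", "yes", "on"}  # assumption: values that disable installs


def lazy_installs_allowed(load_config: Callable[[], dict[str, Any]]) -> bool:
    # The env var wins over config, for restricted or audited environments.
    if os.environ.get("HERMES_DISABLE_LAZY_INSTALLS", "").strip().lower() in _TRUTHY:
        return False
    try:
        cfg = load_config()
    except Exception:
        return True  # fail open: an unreadable config falls back to the default
    security = cfg.get("security") or {}
    return bool(security.get("allow_lazy_installs", True))  # default: allowed
```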

Combined: 63 new tests, all passing under scripts/run_tests.sh.

# Validation

- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
  tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
  tests/hermes_cli/test_doctor_command_install.py
  tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
  tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
  9191 passed, 8 pre-existing failures (verified on origin/main
  before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
  + gateway_log_message with mocked installed version → produces
  copy-pasteable remediation output

# Community

Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md

Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md

Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>

* build(deps): pin every direct dep to ==X.Y.Z (no ranges)

Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.

Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.

User-facing change: none, behavior-wise. Every package resolves to the
same version it was already resolving to via uv.lock; the pins just
remove the resolver's freedom to pick a different one.

Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.

Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.

mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.

LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.

Validation:

- Cross-checked all 77 pinned direct deps in pyproject.toml against
  uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
  → 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.

* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra

You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.

# What this commit fixes

1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
   uv.lock records SHA256 hashes for every transitive — a compromised
   package with a different hash gets REJECTED. Falls through to the
   existing `uv pip install` cascade if the lockfile is missing or
   stale, with a loud warning that the fallback path does NOT
   hash-verify transitives. Previously only `setup-hermes.sh` (the dev
   path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
   (the paths fresh users actually run) skipped it.

2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
   project is fully quarantined right now — every version returns 404,
   so any pin we wrote was unresolvable, which broke `uv lock --check`
   in CI. Restoration is documented in pyproject.toml as a 5-step
   checklist (verify, re-add extra, re-enable in 4 modules, regenerate
   lock, optionally re-add to [all]).

3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
   jsonpath-python pruned. `uv lock --check` now passes.
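The Tier-0 ordering in item 1 can be sketched in Python; the real installers are shell and PowerShell. The exact `uv` flags the scripts pass are an assumption here. `--locked` makes `uv sync` exit with an error instead of rewriting a missing or stale `uv.lock`. The single fallback command stands in for the full `uv pip install` cascade. `install`, `run` and `has_lockfile` are hypothetical names, and the runner is injectable so the sketch can be tested without executing anything.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

LOCKED_CMD = ["uv", "sync", "--locked", "--all-extras"]  # assumed flags
FALLBACK_CMD = ["uv", "pip", "install", "-e", ".[all]"]


def _run(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd), check=False).returncode


def install(
    run: Callable[[Sequence[str]], int] = _run,
    warn: Callable[[str], None] = print,
    has_lockfile: Optional[bool] = None,
) -> str:
    if has_lockfile is None:
        has_lockfile = Path("uv.lock").is_file()
    # Tier 0: install exactly what uv.lock records; fails if the lock is stale.
    if has_lockfile and run(LOCKED_CMD) == 0:
        return "tier0-locked"
    warn(
        "WARNING: lockfile install unavailable; falling back to `uv pip install`, "
        "which does NOT hash-verify transitive dependencies."
    )
    if run(FALLBACK_CMD) == 0:
        return "tier1-all"
    return "failed"
```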

# Defense-in-depth view

| Layer                               | Where             | Protects against                            |
|-------------------------------------|-------------------|---------------------------------------------|
| Exact pins in pyproject             | direct deps       | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install        | transitive graph  | transitive worm injection                   |
| Tier-0 hash-verified path           | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate           | every PR          | drift between pyproject and lockfile        |
| `hermes_cli/security_advisories.py` | runtime           | cleanup for users who already got hit       |

The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.

# Validation

- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
  (test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
  test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.

* chore: remove community announcement drafts (PR body covers it)

* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)

Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.

Moved out of the core `dependencies` list:
- anthropic   (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client  (image gen; only when picked)
- edge-tts    (default TTS but still optional)

New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].

New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.

Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.

Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).

Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
2026-05-12 01:02:25 -07:00


"""Tests for hermes_cli.security_advisories.
The advisory module is the user-facing detection / remediation surface
for supply-chain attacks (e.g. the Mini Shai-Hulud worm of May 2026 that
poisoned mistralai 2.4.6 on PyPI). These tests exercise the public API in
isolation — no real package metadata, no real config, no real cache.
"""
from __future__ import annotations
import time
from pathlib import Path
from typing import Iterator
import pytest
import hermes_cli.security_advisories as adv


# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------


@pytest.fixture
def fake_advisory() -> adv.Advisory:
    """A self-contained Advisory used across tests."""
    return adv.Advisory(
        id="test-advisory-2026-99",
        title="Test advisory",
        summary="Pretend this package has been compromised.",
        url="https://example.com/advisory",
        compromised=(
            ("fake-malicious-pkg", frozenset({"6.6.6"})),
        ),
        remediation=(
            "pip uninstall -y fake-malicious-pkg",
            "Rotate any credentials that may have been exposed.",
        ),
        published="2026-01-01",
        severity="critical",
    )


@pytest.fixture
def isolated_home(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
    """Redirect HERMES_HOME so banner cache and config writes are sandboxed."""
    home = tmp_path / ".hermes"
    home.mkdir()
    (home / "cache").mkdir()
    monkeypatch.setattr(Path, "home", lambda: tmp_path)
    monkeypatch.setenv("HERMES_HOME", str(home))
    return home


@pytest.fixture
def patched_version(monkeypatch: pytest.MonkeyPatch) -> Iterator[dict[str, str]]:
    """Override _installed_version with a controllable lookup table."""
    table: dict[str, str] = {}
    monkeypatch.setattr(adv, "_installed_version", lambda pkg: table.get(pkg))
    yield table


# ---------------------------------------------------------------------------
# detect_compromised
# ---------------------------------------------------------------------------


class TestDetectCompromised:
    def test_no_match_returns_empty_list(self, fake_advisory, patched_version):
        # No matching package installed.
        hits = adv.detect_compromised(advisories=[fake_advisory])
        assert hits == []

    def test_exact_version_match(self, fake_advisory, patched_version):
        patched_version["fake-malicious-pkg"] = "6.6.6"
        hits = adv.detect_compromised(advisories=[fake_advisory])
        assert len(hits) == 1
        assert hits[0].advisory.id == fake_advisory.id
        assert hits[0].package == "fake-malicious-pkg"
        assert hits[0].installed_version == "6.6.6"

    def test_safe_version_does_not_match(self, fake_advisory, patched_version):
        # Package is installed but the version is not in the compromised set.
        patched_version["fake-malicious-pkg"] = "6.6.5"
        hits = adv.detect_compromised(advisories=[fake_advisory])
        assert hits == []

    def test_empty_compromised_set_matches_any_version(
        self, patched_version
    ):
        # An advisory with an empty version set is an "any version is suspect"
        # wildcard, used when an entire maintainer namespace is owned.
        wildcard = adv.Advisory(
            id="wildcard",
            title="Whole namespace owned",
            summary="x",
            url="x",
            compromised=(("evil-namespace", frozenset()),),
            remediation=("uninstall it",),
        )
        patched_version["evil-namespace"] = "0.0.1"
        hits = adv.detect_compromised(advisories=[wildcard])
        assert len(hits) == 1
        assert hits[0].installed_version == "0.0.1"


# ---------------------------------------------------------------------------
# Acknowledgement persistence
# ---------------------------------------------------------------------------


class TestAck:
    def test_get_acked_ids_empty_when_no_config(self, monkeypatch):
        # load_config raises → returns empty set, doesn't crash.
        # Calling .throw() on a fresh generator raises immediately, so this
        # lambda behaves like a function that always raises RuntimeError.
        monkeypatch.setattr(
            "hermes_cli.config.load_config",
            lambda: (_ for _ in ()).throw(RuntimeError("boom")),
        )
        assert adv.get_acked_ids() == set()

    def test_filter_unacked_strips_dismissed(self, fake_advisory, monkeypatch):
        hit = adv.AdvisoryHit(
            advisory=fake_advisory,
            package="fake-malicious-pkg",
            installed_version="6.6.6",
        )
        monkeypatch.setattr(adv, "get_acked_ids", lambda: {fake_advisory.id})
        assert adv.filter_unacked([hit]) == []

    def test_filter_unacked_passes_through_unknown(
        self, fake_advisory, monkeypatch
    ):
        hit = adv.AdvisoryHit(
            advisory=fake_advisory,
            package="fake-malicious-pkg",
            installed_version="6.6.6",
        )
        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
        assert adv.filter_unacked([hit]) == [hit]

    def test_ack_advisory_persists_id(self, isolated_home, monkeypatch):
        # Stub the config layer end-to-end with a tiny in-memory store so we
        # don't depend on the full hermes_cli.config bootstrap.
        store: dict = {"security": {}}
        monkeypatch.setattr(
            "hermes_cli.config.load_config", lambda: store
        )
        monkeypatch.setattr(
            "hermes_cli.config.save_config",
            lambda cfg: store.update(cfg) or None,
        )
        assert adv.ack_advisory("test-advisory-2026-99") is True
        assert "test-advisory-2026-99" in store["security"]["acked_advisories"]
        # Idempotent.
        adv.ack_advisory("test-advisory-2026-99")
        assert (
            store["security"]["acked_advisories"].count("test-advisory-2026-99")
            == 1
        )

    def test_ack_advisory_rejects_blank(self, isolated_home):
        assert adv.ack_advisory("") is False
        assert adv.ack_advisory(" ") is False


# ---------------------------------------------------------------------------
# Banner cache rate limiting
# ---------------------------------------------------------------------------


class TestBannerCache:
    def test_first_call_returns_due_hits(
        self, fake_advisory, isolated_home, monkeypatch
    ):
        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
        hit = adv.AdvisoryHit(
            advisory=fake_advisory,
            package="fake-malicious-pkg",
            installed_version="6.6.6",
        )
        due = adv.hits_due_for_banner([hit])
        assert due == [hit]

    def test_second_call_within_window_suppresses(
        self, fake_advisory, isolated_home, monkeypatch
    ):
        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
        hit = adv.AdvisoryHit(
            advisory=fake_advisory,
            package="fake-malicious-pkg",
            installed_version="6.6.6",
        )
        adv.hits_due_for_banner([hit])
        # Same banner inside repeat window → suppressed.
        again = adv.hits_due_for_banner([hit])
        assert again == []

    def test_call_after_window_re_banners(
        self, fake_advisory, isolated_home, monkeypatch
    ):
        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
        hit = adv.AdvisoryHit(
            advisory=fake_advisory,
            package="fake-malicious-pkg",
            installed_version="6.6.6",
        )
        adv.hits_due_for_banner([hit])
        # Backdate the cache so it looks like the banner was shown more
        # than 24h ago — should re-banner.
        cache_path = adv._banner_cache_path()
        assert cache_path is not None
        old_lines = cache_path.read_text(encoding="utf-8").splitlines()
        backdated = []
        for line in old_lines:
            parts = line.split(None, 1)
            if len(parts) == 2:
                backdated.append(f"{parts[0]} {time.time() - 48 * 3600}")
        cache_path.write_text("\n".join(backdated) + "\n", encoding="utf-8")
        again = adv.hits_due_for_banner([hit])
        assert again == [hit]

    def test_acked_hits_never_banner(
        self, fake_advisory, isolated_home, monkeypatch
    ):
        monkeypatch.setattr(adv, "get_acked_ids", lambda: {fake_advisory.id})
        hit = adv.AdvisoryHit(
            advisory=fake_advisory,
            package="fake-malicious-pkg",
            installed_version="6.6.6",
        )
        assert adv.hits_due_for_banner([hit]) == []


# ---------------------------------------------------------------------------
# Rendering
# ---------------------------------------------------------------------------


class TestRendering:
    def test_short_banner_lines_includes_id_and_version(self, fake_advisory):
        hit = adv.AdvisoryHit(
            advisory=fake_advisory,
            package="fake-malicious-pkg",
            installed_version="6.6.6",
        )
        lines = adv.short_banner_lines([hit])
        joined = "\n".join(lines)
        assert fake_advisory.id in joined
        assert fake_advisory.title in joined
        assert "fake-malicious-pkg==6.6.6" in joined
        assert "hermes doctor" in joined

    def test_full_remediation_text_contains_all_steps(self, fake_advisory):
        hit = adv.AdvisoryHit(
            advisory=fake_advisory,
            package="fake-malicious-pkg",
            installed_version="6.6.6",
        )
        body = "\n".join(adv.full_remediation_text(hit))
        # All remediation steps must be present.
        for step in fake_advisory.remediation:
            assert step in body
        assert fake_advisory.url in body
        assert fake_advisory.summary in body

    def test_render_doctor_section_clean_state(self):
        # No hits → success message, has_problems=False.
        has_problems, lines = adv.render_doctor_section([])
        assert has_problems is False
        assert any("No active security advisories" in line for line in lines)

    def test_render_doctor_section_with_unacked_hit(
        self, fake_advisory, monkeypatch
    ):
        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
        hit = adv.AdvisoryHit(
            advisory=fake_advisory,
            package="fake-malicious-pkg",
            installed_version="6.6.6",
        )
        has_problems, lines = adv.render_doctor_section([hit])
        assert has_problems is True
        body = "\n".join(lines)
        assert fake_advisory.title in body

    def test_gateway_log_message_singular(self, fake_advisory, monkeypatch):
        monkeypatch.setattr(adv, "get_acked_ids", lambda: set())
        hit = adv.AdvisoryHit(
            advisory=fake_advisory,
            package="fake-malicious-pkg",
            installed_version="6.6.6",
        )
        msg = adv.gateway_log_message([hit])
        assert msg is not None
        assert fake_advisory.id in msg
        assert "fake-malicious-pkg==6.6.6" in msg

    def test_gateway_log_message_returns_none_for_no_hits(self):
        assert adv.gateway_log_message([]) is None


# ---------------------------------------------------------------------------
# Real catalog smoke test
# ---------------------------------------------------------------------------


class TestRealCatalog:
    def test_advisories_well_formed(self):
        """Every shipped advisory must be self-consistent.

        Catches data-entry mistakes (empty IDs, missing remediation, bad
        compromised tuples) before they ship.
        """
        seen_ids: set[str] = set()
        for advisory in adv.ADVISORIES:
            assert advisory.id, "advisory has empty id"
            assert advisory.id not in seen_ids, f"duplicate id {advisory.id}"
            seen_ids.add(advisory.id)
            assert advisory.title, f"{advisory.id}: empty title"
            assert advisory.summary, f"{advisory.id}: empty summary"
            assert advisory.remediation, f"{advisory.id}: empty remediation"
            assert advisory.url.startswith("http"), \
                f"{advisory.id}: bad url {advisory.url!r}"
            assert advisory.compromised, \
                f"{advisory.id}: empty compromised tuple"
            for pkg, versions in advisory.compromised:
                assert pkg, f"{advisory.id}: empty package name"
                assert isinstance(versions, frozenset), \
                    f"{advisory.id}: versions must be frozenset"