mirror of https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
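
The detection step above can be sketched roughly as follows. This is a minimal illustration, not the contents of hermes_cli/security_advisories.py: the `Advisory` field names and the injectable `version_lookup` parameter are assumptions made for the example. Only the use of `importlib.metadata.version()` and the advisory ID / package / version come from the description.

```python
from dataclasses import dataclass
from importlib import metadata
from typing import Callable, Optional


@dataclass(frozen=True)
class Advisory:
    # Hypothetical shape; the real catalog entry may carry more fields
    # (remediation text, links, and so on).
    advisory_id: str
    package: str
    bad_versions: frozenset  # exact version strings known to be compromised


ADVISORIES = (
    Advisory("shai-hulud-2026-05", "mistralai", frozenset({"2.4.6"})),
)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version, or None when the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def detect_compromised(
    advisories=ADVISORIES,
    version_lookup: Callable[[str], Optional[str]] = installed_version,
):
    """Return the advisories whose package is installed at a bad version."""
    hits = []
    for adv in advisories:
        found = version_lookup(adv.package)
        if found is not None and found in adv.bad_versions:
            hits.append(adv)
    return hits
```

Because the lookup goes through package metadata rather than a `pip` subprocess, the same check works in a uv-created venv with no pip installed. Passing `version_lookup` explicitly is only a convenience for testing with a mocked installed version.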
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
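
A rough sketch of the two guards described above (spec safety and the security gate). The regex here is an assumption written for illustration and is not the pattern shipped in tools/lazy_deps.py. The treatment of any non-empty `HERMES_DISABLE_LAZY_INSTALLS` value as "disabled" is also an assumption, and the version numbers in the usage note are made up.

```python
import os
import re

# Hypothetical pattern: a PyPI project name, optional [extras], optional
# exact "==" pin. URLs, paths, pip flags, whitespace, and shell
# metacharacters cannot match because none of those characters are allowed.
_SAFE_SPEC = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"  # project name
    r"(?:\[[A-Za-z0-9_,-]+\])?"                     # optional extras
    r"(?:==[A-Za-z0-9.]+)?"                         # optional exact pin
)


def is_safe_spec(spec: str) -> bool:
    """Accept only PyPI-by-name specs; fullmatch rejects trailing junk."""
    return _SAFE_SPEC.fullmatch(spec) is not None


def lazy_installs_allowed(config: dict, environ=os.environ) -> bool:
    """Env var wins; otherwise fall back to the config flag (default on)."""
    if environ.get("HERMES_DISABLE_LAZY_INSTALLS"):
        return False
    return bool(config.get("security", {}).get("allow_lazy_installs", True))
```

With this sketch, `is_safe_spec("elevenlabs==2.1.0")` is accepted while `"--index-url=http://x"`, `"https://example.com/x.whl"` and `"pkg; rm -rf /"` are all rejected. Using `fullmatch` rather than `match` matters: a prefix match would let a valid name followed by an injected flag slip through.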
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers are explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
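
The installers themselves are shell and PowerShell, but the "all minus known-broken" idea can be sketched in Python. The extras list below is hypothetical (the real set lives in pyproject.toml), and only the first two tiers are shown. The later fallback tiers are omitted because their exact specs are not given here.

```python
# Hypothetical extras; only "mistral" is named as broken in the text above.
ALL_EXTRAS = ["voice", "mistral", "honcho", "messaging"]
BROKEN_EXTRAS = ["mistral"]


def extras_minus_broken(all_extras, broken):
    """Keep the original order, dropping anything on the broken list."""
    blocked = set(broken)
    return [name for name in all_extras if name not in blocked]


def tier_specs(all_extras, broken):
    """Ordered (label, install spec) pairs, most complete first."""
    kept = ",".join(extras_minus_broken(all_extras, broken))
    return [
        ("Tier 1: [all]", ".[all]"),
        ("Tier 2: all minus known-broken", f".[{kept}]"),
    ]
```

Deriving the Tier 2 spec from one list, instead of hardcoding it per script, is what prevents the kind of drift noted above, where install.ps1 still listed 'mistral' in its fallback spec.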
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
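
Why no version bump is needed can be illustrated with a generic recursive merge. This is a sketch under the assumption that load_config's deep-merge behaves like the function below. The `some_existing_key` entry is a made-up stand-in for whatever an older config already has under security:.

```python
import copy

# Only the two new keys are taken from the description above.
DEFAULTS = {"security": {"acked_advisories": [], "allow_lazy_installs": True}}


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return defaults overlaid with overrides, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# An older config that predates the new keys still ends up with both defaults.
old_config = {"security": {"some_existing_key": "x"}}
merged = deep_merge(DEFAULTS, old_config)
```

The deep copies keep the shared defaults from being mutated when a user later appends to `acked_advisories`.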
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
User-facing change: none, behavior-wise. Every package resolves to the
same version it already resolved to via uv.lock; the pins just remove
the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer version of one of our pinned deps gets a resolver conflict.
Acceptable for our isolated-venv install path; documented in the new
comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
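
The pin-versus-lockfile cross-check can be approximated with a few lines of Python. This is a sketch only: it takes the requirement strings and a name-to-version mapping as plain inputs rather than parsing pyproject.toml or uv.lock, it lowercases names without full name normalization, and the package versions in the usage note are illustrative.

```python
import re

# name, optional [extras], then a mandatory "==" and a version that stops
# at whitespace or an environment-marker semicolon.
_EXACT_PIN = re.compile(
    r"^\s*([A-Za-z0-9._-]+)(?:\[[^\]]*\])?\s*==\s*([^\s;]+)"
)


def check_pins(requirements, locked):
    """Return a list of problems; an empty list means every pin matches."""
    problems = []
    for req in requirements:
        match = _EXACT_PIN.match(req)
        if match is None:
            problems.append(f"not an exact pin: {req}")
            continue
        name, version = match.group(1).lower(), match.group(2)
        if locked.get(name) != version:
            problems.append(
                f"{name}: pinned {version}, locked {locked.get(name)}"
            )
    return problems
```

For example, with `locked = {"httpx": "0.28.1"}`, the requirement `"httpx[http2]==0.28.1"` passes, `"httpx>=0.27"` is reported as not an exact pin, and `"httpx==0.28.0"` is reported as a mismatch against the lock.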
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
Follow-up to the question 'what about the dependencies the dependencies
rely on?': exact-pinning direct deps in pyproject.toml does NOT cover
the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
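
Conceptually, the hash check that makes the locked path safer looks like the sketch below. This is not uv's implementation, only the idea: a digest is recorded when the lockfile is generated, and an artifact is accepted at install time only if its bytes hash to the same value. The function names are made up for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA256 digest of an artifact's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """True only when the artifact matches the digest recorded at lock time."""
    return sha256_hex(data) == expected_sha256.lower()


# At lock time the digest of the reviewed artifact is recorded...
reviewed = b"bytes of the wheel that was reviewed"
recorded = sha256_hex(reviewed)

# ...and at install time a swapped artifact fails the comparison even if
# it carries the same name and version.
swapped = b"bytes of a poisoned re-upload"
```

A version pin alone constrains the name and version string, while the recorded digest constrains the actual bytes, which is why the two layers are listed separately in the table below.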
# Defense-in-depth view
| Layer                               | Where             | Protects against                                 |
|-------------------------------------|-------------------|--------------------------------------------------|
| Exact pins in pyproject             | direct deps       | new mistralai 2.4.6-style direct compromise      |
| uv.lock + `--locked` install        | transitive graph  | transitive worm injection                        |
| Tier-0 hash-verified path           | install.sh / .ps1 | fresh installs that bypass the lockfile          |
| `uv lock --check` CI gate           | every PR          | drift between pyproject and lockfile             |
| `hermes_cli/security_advisories.py` | runtime           | compromised installs already on disk (detection) |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
266 lines
9.6 KiB
Python
"""Daytona cloud execution environment.

Uses the Daytona Python SDK to run commands in cloud sandboxes.
Supports persistent sandboxes: when enabled, sandboxes are stopped on cleanup
and resumed on next creation, preserving the filesystem across sessions.
"""

import logging
import math
import os
import shlex
import threading
from pathlib import Path

from tools.environments.base import (
    BaseEnvironment,
    _ThreadedProcessHandle,
)
from tools.environments.file_sync import (
    FileSyncManager,
    iter_sync_files,
    quoted_mkdir_command,
    quoted_rm_command,
    unique_parent_dirs,
)

logger = logging.getLogger(__name__)


class DaytonaEnvironment(BaseEnvironment):
    """Daytona cloud sandbox execution backend.

    Spawn-per-call via _ThreadedProcessHandle wrapping blocking SDK calls.
    cancel_fn wired to sandbox.stop() for interrupt support.
    Shell timeout wrapper preserved (SDK timeout unreliable).
    """

    _stdin_mode = "heredoc"

    def __init__(
        self,
        image: str,
        cwd: str = "/home/daytona",
        timeout: int = 60,
        cpu: int = 1,
        memory: int = 5120,
        disk: int = 10240,
        persistent_filesystem: bool = True,
        task_id: str = "default",
    ):
        requested_cwd = cwd
        super().__init__(cwd=cwd, timeout=timeout)

        try:
            from tools.lazy_deps import ensure as _lazy_ensure
            _lazy_ensure("terminal.daytona", prompt=False)
        except ImportError:
            pass
        except Exception as e:
            raise ImportError(str(e))
        from daytona import (
            Daytona,
            CreateSandboxFromImageParams,
            DaytonaError,
            Resources,
            SandboxState,
        )

        self._persistent = persistent_filesystem
        self._task_id = task_id
        self._SandboxState = SandboxState
        self._daytona = Daytona()
        self._sandbox = None
        self._lock = threading.Lock()

        memory_gib = max(1, math.ceil(memory / 1024))
        disk_gib = max(1, math.ceil(disk / 1024))
        if disk_gib > 10:
            logger.warning(
                "Daytona: requested disk (%dGB) exceeds platform limit (10GB). "
                "Capping to 10GB.", disk_gib,
            )
            disk_gib = 10
        resources = Resources(cpu=cpu, memory=memory_gib, disk=disk_gib)

        labels = {"hermes_task_id": task_id}
        sandbox_name = f"hermes-{task_id}"

        if self._persistent:
            try:
                self._sandbox = self._daytona.get(sandbox_name)
                self._sandbox.start()
                logger.info("Daytona: resumed sandbox %s for task %s",
                            self._sandbox.id, task_id)
            except DaytonaError:
                self._sandbox = None
            except Exception as e:
                logger.warning("Daytona: failed to resume sandbox for task %s: %s",
                               task_id, e)
                self._sandbox = None

            if self._sandbox is None:
                try:
                    page = self._daytona.list(labels=labels, page=1, limit=1)
                    if page.items:
                        self._sandbox = page.items[0]
                        self._sandbox.start()
                        logger.info("Daytona: resumed legacy sandbox %s for task %s",
                                    self._sandbox.id, task_id)
                except Exception as e:
                    logger.debug("Daytona: no legacy sandbox found for task %s: %s",
                                 task_id, e)
                    self._sandbox = None

        if self._sandbox is None:
            self._sandbox = self._daytona.create(
                CreateSandboxFromImageParams(
                    image=image,
                    name=sandbox_name,
                    labels=labels,
                    auto_stop_interval=0,
                    resources=resources,
                )
            )
            logger.info("Daytona: created sandbox %s for task %s",
                        self._sandbox.id, task_id)

        # Detect remote home dir
        self._remote_home = "/root"
        try:
            home = self._sandbox.process.exec("echo $HOME").result.strip()
            if home:
                self._remote_home = home
                if requested_cwd in {"~", "/home/daytona"}:
                    self.cwd = home
        except Exception:
            pass
        logger.info("Daytona: resolved home to %s, cwd to %s", self._remote_home, self.cwd)

        self._sync_manager = FileSyncManager(
            get_files_fn=lambda: iter_sync_files(f"{self._remote_home}/.hermes"),
            upload_fn=self._daytona_upload,
            delete_fn=self._daytona_delete,
            bulk_upload_fn=self._daytona_bulk_upload,
            bulk_download_fn=self._daytona_bulk_download,
        )
        self._sync_manager.sync(force=True)
        self.init_session()

    def _daytona_upload(self, host_path: str, remote_path: str) -> None:
        """Upload a single file via Daytona SDK."""
        parent = str(Path(remote_path).parent)
        # Quote the parent path so spaces or shell metacharacters in it
        # cannot break (or inject into) the mkdir command.
        self._sandbox.process.exec(f"mkdir -p {shlex.quote(parent)}")
        self._sandbox.fs.upload_file(host_path, remote_path)

    def _daytona_bulk_upload(self, files: list[tuple[str, str]]) -> None:
        """Upload many files in a single HTTP call via Daytona SDK.

        Uses ``sandbox.fs.upload_files()`` which batches all files into one
        multipart POST, avoiding per-file TLS/HTTP overhead (~580 files
        go from ~5 min to <2 s).
        """
        from daytona.common.filesystem import FileUpload

        if not files:
            return

        parents = unique_parent_dirs(files)
        if parents:
            self._sandbox.process.exec(quoted_mkdir_command(parents))

        uploads = [
            FileUpload(source=host_path, destination=remote_path)
            for host_path, remote_path in files
        ]
        self._sandbox.fs.upload_files(uploads)

    def _daytona_bulk_download(self, dest: Path) -> None:
        """Download remote .hermes/ as a tar archive."""
        rel_base = f"{self._remote_home}/.hermes".lstrip("/")
        # PID-suffixed remote temp path avoids collisions if sync_back fires
        # concurrently for the same sandbox (e.g. retry after partial failure).
        remote_tar = f"/tmp/.hermes_sync.{os.getpid()}.tar"
        self._sandbox.process.exec(
            f"tar cf {shlex.quote(remote_tar)} -C / {shlex.quote(rel_base)}"
        )
        self._sandbox.fs.download_file(remote_tar, str(dest))
        # Clean up remote temp file
        try:
            self._sandbox.process.exec(f"rm -f {shlex.quote(remote_tar)}")
        except Exception:
            pass  # best-effort cleanup

    def _daytona_delete(self, remote_paths: list[str]) -> None:
        """Batch-delete remote files via SDK exec."""
        self._sandbox.process.exec(quoted_rm_command(remote_paths))

    # ------------------------------------------------------------------
    # Sandbox lifecycle
    # ------------------------------------------------------------------

    def _ensure_sandbox_ready(self) -> None:
        """Restart sandbox if it was stopped (e.g., by a previous interrupt)."""
        self._sandbox.refresh_data()
        if self._sandbox.state in {self._SandboxState.STOPPED, self._SandboxState.ARCHIVED}:
            self._sandbox.start()
            logger.info("Daytona: restarted sandbox %s", self._sandbox.id)

    def _before_execute(self) -> None:
        """Ensure sandbox is ready, then sync files via FileSyncManager."""
        with self._lock:
            self._ensure_sandbox_ready()
        self._sync_manager.sync()

    def _run_bash(self, cmd_string: str, *, login: bool = False,
                  timeout: int = 120,
                  stdin_data: str | None = None):
        """Return a _ThreadedProcessHandle wrapping a blocking Daytona SDK call."""
        sandbox = self._sandbox
        lock = self._lock

        def cancel():
            with lock:
                try:
                    sandbox.stop()
                except Exception:
                    pass

        if login:
            shell_cmd = f"bash -l -c {shlex.quote(cmd_string)}"
        else:
            shell_cmd = f"bash -c {shlex.quote(cmd_string)}"

        def exec_fn() -> tuple[str, int]:
            response = sandbox.process.exec(shell_cmd, timeout=timeout)
            return (response.result or "", response.exit_code)

        return _ThreadedProcessHandle(exec_fn, cancel_fn=cancel)

    def cleanup(self):
        with self._lock:
            if self._sandbox is None:
                return

            # Sync remote changes back to host before teardown. Running
            # inside the lock (and after the _sandbox is None guard) avoids
            # firing sync_back on an already-cleaned-up env, which would
            # trigger a 3-attempt retry storm against a nil sandbox.
            if self._sync_manager:
                logger.info("Daytona: syncing files from sandbox...")
                try:
                    self._sync_manager.sync_back()
                except Exception as e:
                    logger.warning("Daytona: sync_back failed: %s", e)

            try:
                if self._persistent:
                    self._sandbox.stop()
                    logger.info("Daytona: stopped sandbox %s (filesystem preserved)",
                                self._sandbox.id)
                else:
                    self._daytona.delete(self._sandbox)
                    logger.info("Daytona: deleted sandbox %s", self._sandbox.id)
            except Exception as e:
                logger.warning("Daytona: cleanup failed: %s", e)
            self._sandbox = None