mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
# Defense-in-depth view
| Layer | Where | Protects against |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
473 lines
17 KiB
Python
473 lines
17 KiB
Python
"""Modal cloud execution environment using the native Modal SDK directly.
|
|
|
|
Uses ``Sandbox.create()`` + ``Sandbox.exec()`` instead of the older runtime
|
|
wrapper, while preserving Hermes' persistent snapshot behavior across sessions.
|
|
"""
|
|
|
|
import asyncio
|
|
import base64
|
|
import io
|
|
import logging
|
|
import shlex
|
|
import tarfile
|
|
import threading
|
|
from pathlib import Path
|
|
from typing import Any, Optional
|
|
|
|
from hermes_constants import get_hermes_home
|
|
from tools.environments.base import (
|
|
BaseEnvironment,
|
|
_ThreadedProcessHandle,
|
|
_load_json_store,
|
|
_save_json_store,
|
|
)
|
|
from tools.environments.file_sync import (
|
|
FileSyncManager,
|
|
iter_sync_files,
|
|
quoted_mkdir_command,
|
|
quoted_rm_command,
|
|
unique_parent_dirs,
|
|
)
|
|
|
|
logger = logging.getLogger(__name__)
|
|
|
|
_SNAPSHOT_STORE = get_hermes_home() / "modal_snapshots.json"
|
|
_DIRECT_SNAPSHOT_NAMESPACE = "direct"
|
|
|
|
|
|
def _load_snapshots() -> dict:
|
|
return _load_json_store(_SNAPSHOT_STORE)
|
|
|
|
|
|
def _save_snapshots(data: dict) -> None:
|
|
_save_json_store(_SNAPSHOT_STORE, data)
|
|
|
|
|
|
def _direct_snapshot_key(task_id: str) -> str:
|
|
return f"{_DIRECT_SNAPSHOT_NAMESPACE}:{task_id}"
|
|
|
|
|
|
def _get_snapshot_restore_candidate(task_id: str) -> tuple[str | None, bool]:
|
|
snapshots = _load_snapshots()
|
|
namespaced_key = _direct_snapshot_key(task_id)
|
|
snapshot_id = snapshots.get(namespaced_key)
|
|
if isinstance(snapshot_id, str) and snapshot_id:
|
|
return snapshot_id, False
|
|
legacy_snapshot_id = snapshots.get(task_id)
|
|
if isinstance(legacy_snapshot_id, str) and legacy_snapshot_id:
|
|
return legacy_snapshot_id, True
|
|
return None, False
|
|
|
|
|
|
def _store_direct_snapshot(task_id: str, snapshot_id: str) -> None:
|
|
snapshots = _load_snapshots()
|
|
snapshots[_direct_snapshot_key(task_id)] = snapshot_id
|
|
snapshots.pop(task_id, None)
|
|
_save_snapshots(snapshots)
|
|
|
|
|
|
def _delete_direct_snapshot(task_id: str, snapshot_id: str | None = None) -> None:
|
|
snapshots = _load_snapshots()
|
|
updated = False
|
|
for key in (_direct_snapshot_key(task_id), task_id):
|
|
value = snapshots.get(key)
|
|
if value is None:
|
|
continue
|
|
if snapshot_id is None or value == snapshot_id:
|
|
snapshots.pop(key, None)
|
|
updated = True
|
|
if updated:
|
|
_save_snapshots(snapshots)
|
|
|
|
|
|
def _ensure_modal_sdk() -> None:
|
|
"""Lazy-install modal on demand. Idempotent — fast no-op once installed."""
|
|
try:
|
|
from tools.lazy_deps import ensure as _lazy_ensure
|
|
_lazy_ensure("terminal.modal", prompt=False)
|
|
except ImportError:
|
|
pass
|
|
except Exception as e:
|
|
raise ImportError(str(e))
|
|
|
|
|
|
def _resolve_modal_image(image_spec: Any) -> Any:
|
|
"""Convert registry references or snapshot ids into Modal image objects.
|
|
|
|
Includes add_python support for ubuntu/debian images (absorbed from PR 4511).
|
|
"""
|
|
_ensure_modal_sdk()
|
|
import modal as _modal
|
|
|
|
if not isinstance(image_spec, str):
|
|
return image_spec
|
|
|
|
if image_spec.startswith("im-"):
|
|
return _modal.Image.from_id(image_spec)
|
|
|
|
# PR 4511: add python to ubuntu/debian images that don't have it
|
|
lower = image_spec.lower()
|
|
add_python = any(base in lower for base in ("ubuntu", "debian"))
|
|
|
|
setup_commands = [
|
|
"RUN rm -rf /usr/local/lib/python*/site-packages/pip* 2>/dev/null; "
|
|
"python -m ensurepip --upgrade --default-pip 2>/dev/null || true",
|
|
]
|
|
if add_python:
|
|
setup_commands.insert(0,
|
|
"RUN apt-get update -qq && apt-get install -y -qq python3 python3-venv > /dev/null 2>&1 || true"
|
|
)
|
|
|
|
return _modal.Image.from_registry(
|
|
image_spec,
|
|
setup_dockerfile_commands=setup_commands,
|
|
)
|
|
|
|
|
|
class _AsyncWorker:
|
|
"""Background thread with its own event loop for async-safe Modal calls."""
|
|
|
|
def __init__(self):
|
|
self._loop: Optional[asyncio.AbstractEventLoop] = None
|
|
self._thread: Optional[threading.Thread] = None
|
|
self._started = threading.Event()
|
|
|
|
def start(self):
|
|
self._thread = threading.Thread(target=self._run_loop, daemon=True)
|
|
self._thread.start()
|
|
self._started.wait(timeout=30)
|
|
|
|
def _run_loop(self):
|
|
self._loop = asyncio.new_event_loop()
|
|
asyncio.set_event_loop(self._loop)
|
|
self._started.set()
|
|
self._loop.run_forever()
|
|
|
|
def run_coroutine(self, coro, timeout=600):
|
|
if self._loop is None or self._loop.is_closed():
|
|
raise RuntimeError("AsyncWorker loop is not running")
|
|
future = asyncio.run_coroutine_threadsafe(coro, self._loop)
|
|
return future.result(timeout=timeout)
|
|
|
|
def stop(self):
|
|
if self._loop and self._loop.is_running():
|
|
self._loop.call_soon_threadsafe(self._loop.stop)
|
|
if self._thread:
|
|
self._thread.join(timeout=10)
|
|
|
|
|
|
class ModalEnvironment(BaseEnvironment):
|
|
"""Modal cloud execution via native Modal sandboxes.
|
|
|
|
Spawn-per-call via _ThreadedProcessHandle wrapping async SDK calls.
|
|
cancel_fn wired to sandbox.terminate for interrupt support.
|
|
"""
|
|
|
|
_stdin_mode = "heredoc"
|
|
_snapshot_timeout = 60 # Modal cold starts can be slow
|
|
|
|
def __init__(
|
|
self,
|
|
image: str,
|
|
cwd: str = "/root",
|
|
timeout: int = 60,
|
|
modal_sandbox_kwargs: Optional[dict[str, Any]] = None,
|
|
persistent_filesystem: bool = True,
|
|
task_id: str = "default",
|
|
):
|
|
super().__init__(cwd=cwd, timeout=timeout)
|
|
|
|
self._persistent = persistent_filesystem
|
|
self._task_id = task_id
|
|
self._sandbox = None
|
|
self._app = None
|
|
self._worker = _AsyncWorker()
|
|
self._sync_manager: FileSyncManager | None = None # initialized after sandbox creation
|
|
|
|
sandbox_kwargs = dict(modal_sandbox_kwargs or {})
|
|
|
|
restored_snapshot_id = None
|
|
restored_from_legacy_key = False
|
|
if self._persistent:
|
|
restored_snapshot_id, restored_from_legacy_key = _get_snapshot_restore_candidate(
|
|
self._task_id
|
|
)
|
|
if restored_snapshot_id:
|
|
logger.info("Modal: restoring from snapshot %s", restored_snapshot_id[:20])
|
|
|
|
_ensure_modal_sdk()
|
|
import modal as _modal
|
|
|
|
cred_mounts = []
|
|
try:
|
|
from tools.credential_files import (
|
|
get_credential_file_mounts,
|
|
iter_skills_files,
|
|
iter_cache_files,
|
|
)
|
|
|
|
for mount_entry in get_credential_file_mounts():
|
|
cred_mounts.append(
|
|
_modal.Mount.from_local_file(
|
|
mount_entry["host_path"],
|
|
remote_path=mount_entry["container_path"],
|
|
)
|
|
)
|
|
for entry in iter_skills_files():
|
|
cred_mounts.append(
|
|
_modal.Mount.from_local_file(
|
|
entry["host_path"],
|
|
remote_path=entry["container_path"],
|
|
)
|
|
)
|
|
cache_files = iter_cache_files()
|
|
for entry in cache_files:
|
|
cred_mounts.append(
|
|
_modal.Mount.from_local_file(
|
|
entry["host_path"],
|
|
remote_path=entry["container_path"],
|
|
)
|
|
)
|
|
except Exception as e:
|
|
logger.debug("Modal: could not load credential file mounts: %s", e)
|
|
|
|
self._worker.start()
|
|
|
|
async def _create_sandbox(image_spec: Any):
|
|
app = await _modal.App.lookup.aio("hermes-agent", create_if_missing=True)
|
|
create_kwargs = dict(sandbox_kwargs)
|
|
if cred_mounts:
|
|
existing_mounts = list(create_kwargs.pop("mounts", []))
|
|
existing_mounts.extend(cred_mounts)
|
|
create_kwargs["mounts"] = existing_mounts
|
|
sandbox = await _modal.Sandbox.create.aio(
|
|
"sleep", "infinity",
|
|
image=image_spec,
|
|
app=app,
|
|
timeout=int(create_kwargs.pop("timeout", 3600)),
|
|
**create_kwargs,
|
|
)
|
|
return app, sandbox
|
|
|
|
try:
|
|
target_image_spec = restored_snapshot_id or image
|
|
try:
|
|
effective_image = _resolve_modal_image(target_image_spec)
|
|
self._app, self._sandbox = self._worker.run_coroutine(
|
|
_create_sandbox(effective_image), timeout=300,
|
|
)
|
|
except Exception as exc:
|
|
if not restored_snapshot_id:
|
|
raise
|
|
logger.warning(
|
|
"Modal: failed to restore snapshot %s, retrying with base image: %s",
|
|
restored_snapshot_id[:20], exc,
|
|
)
|
|
_delete_direct_snapshot(self._task_id, restored_snapshot_id)
|
|
base_image = _resolve_modal_image(image)
|
|
self._app, self._sandbox = self._worker.run_coroutine(
|
|
_create_sandbox(base_image), timeout=300,
|
|
)
|
|
else:
|
|
if restored_snapshot_id and restored_from_legacy_key:
|
|
_store_direct_snapshot(self._task_id, restored_snapshot_id)
|
|
except Exception:
|
|
self._worker.stop()
|
|
raise
|
|
|
|
logger.info("Modal: sandbox created (task=%s)", self._task_id)
|
|
|
|
self._sync_manager = FileSyncManager(
|
|
get_files_fn=lambda: iter_sync_files("/root/.hermes"),
|
|
upload_fn=self._modal_upload,
|
|
delete_fn=self._modal_delete,
|
|
bulk_upload_fn=self._modal_bulk_upload,
|
|
bulk_download_fn=self._modal_bulk_download,
|
|
)
|
|
self._sync_manager.sync(force=True)
|
|
self.init_session()
|
|
|
|
def _modal_upload(self, host_path: str, remote_path: str) -> None:
|
|
"""Upload a single file via base64 piped through stdin."""
|
|
content = Path(host_path).read_bytes()
|
|
b64 = base64.b64encode(content).decode("ascii")
|
|
container_dir = str(Path(remote_path).parent)
|
|
cmd = (
|
|
f"mkdir -p {shlex.quote(container_dir)} && "
|
|
f"base64 -d > {shlex.quote(remote_path)}"
|
|
)
|
|
|
|
async def _write():
|
|
proc = await self._sandbox.exec.aio("bash", "-c", cmd)
|
|
offset = 0
|
|
chunk_size = self._STDIN_CHUNK_SIZE
|
|
while offset < len(b64):
|
|
proc.stdin.write(b64[offset:offset + chunk_size])
|
|
await proc.stdin.drain.aio()
|
|
offset += chunk_size
|
|
proc.stdin.write_eof()
|
|
await proc.stdin.drain.aio()
|
|
await proc.wait.aio()
|
|
|
|
self._worker.run_coroutine(_write(), timeout=30)
|
|
|
|
# Modal SDK stdin buffer limit (legacy server path). The command-router
|
|
# path allows 16 MB, but we must stay under the smaller 2 MB cap for
|
|
# compatibility. Chunks are written below this threshold and flushed
|
|
# individually via drain().
|
|
_STDIN_CHUNK_SIZE = 1 * 1024 * 1024 # 1 MB — safe for both transport paths
|
|
|
|
def _modal_bulk_upload(self, files: list[tuple[str, str]]) -> None:
|
|
"""Upload many files via tar archive piped through stdin.
|
|
|
|
Builds a gzipped tar archive in memory and streams it into a
|
|
``base64 -d | tar xzf -`` pipeline via the process's stdin,
|
|
avoiding the Modal SDK's 64 KB ``ARG_MAX_BYTES`` exec-arg limit.
|
|
"""
|
|
if not files:
|
|
return
|
|
|
|
buf = io.BytesIO()
|
|
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
|
|
for host_path, remote_path in files:
|
|
tar.add(host_path, arcname=remote_path.lstrip("/"))
|
|
payload = base64.b64encode(buf.getvalue()).decode("ascii")
|
|
|
|
parents = unique_parent_dirs(files)
|
|
mkdir_part = quoted_mkdir_command(parents)
|
|
cmd = f"{mkdir_part} && base64 -d | tar xzf - -C /"
|
|
|
|
async def _bulk():
|
|
proc = await self._sandbox.exec.aio("bash", "-c", cmd)
|
|
|
|
# Stream payload through stdin in chunks to stay under the
|
|
# SDK's per-write buffer limit (2 MB legacy / 16 MB router).
|
|
offset = 0
|
|
chunk_size = self._STDIN_CHUNK_SIZE
|
|
while offset < len(payload):
|
|
proc.stdin.write(payload[offset:offset + chunk_size])
|
|
await proc.stdin.drain.aio()
|
|
offset += chunk_size
|
|
|
|
proc.stdin.write_eof()
|
|
await proc.stdin.drain.aio()
|
|
|
|
exit_code = await proc.wait.aio()
|
|
if exit_code != 0:
|
|
stderr_text = await proc.stderr.read.aio()
|
|
raise RuntimeError(
|
|
f"Modal bulk upload failed (exit {exit_code}): {stderr_text}"
|
|
)
|
|
|
|
self._worker.run_coroutine(_bulk(), timeout=120)
|
|
|
|
def _modal_bulk_download(self, dest: Path) -> None:
|
|
"""Download remote .hermes/ as a tar archive.
|
|
|
|
Modal sandboxes always run as root, so /root/.hermes is hardcoded
|
|
(consistent with iter_sync_files call on line 269).
|
|
"""
|
|
async def _download():
|
|
proc = await self._sandbox.exec.aio(
|
|
"bash", "-c", "tar cf - -C / root/.hermes"
|
|
)
|
|
data = await proc.stdout.read.aio()
|
|
exit_code = await proc.wait.aio()
|
|
if exit_code != 0:
|
|
raise RuntimeError(f"Modal bulk download failed (exit {exit_code})")
|
|
return data
|
|
|
|
tar_bytes = self._worker.run_coroutine(_download(), timeout=120)
|
|
if isinstance(tar_bytes, str):
|
|
tar_bytes = tar_bytes.encode()
|
|
dest.write_bytes(tar_bytes)
|
|
|
|
def _modal_delete(self, remote_paths: list[str]) -> None:
|
|
"""Batch-delete remote files via exec."""
|
|
rm_cmd = quoted_rm_command(remote_paths)
|
|
|
|
async def _rm():
|
|
proc = await self._sandbox.exec.aio("bash", "-c", rm_cmd)
|
|
await proc.wait.aio()
|
|
|
|
self._worker.run_coroutine(_rm(), timeout=15)
|
|
|
|
def _before_execute(self) -> None:
|
|
"""Sync files to sandbox via FileSyncManager (rate-limited internally)."""
|
|
self._sync_manager.sync()
|
|
|
|
# ------------------------------------------------------------------
|
|
# Execution
|
|
# ------------------------------------------------------------------
|
|
|
|
def _run_bash(self, cmd_string: str, *, login: bool = False,
|
|
timeout: int = 120,
|
|
stdin_data: str | None = None):
|
|
"""Return a _ThreadedProcessHandle wrapping an async Modal sandbox exec."""
|
|
sandbox = self._sandbox
|
|
worker = self._worker
|
|
|
|
def cancel():
|
|
worker.run_coroutine(sandbox.terminate.aio(), timeout=15)
|
|
|
|
def exec_fn() -> tuple[str, int]:
|
|
async def _do():
|
|
args = ["bash"]
|
|
if login:
|
|
args.extend(["-l", "-c", cmd_string])
|
|
else:
|
|
args.extend(["-c", cmd_string])
|
|
process = await sandbox.exec.aio(*args, timeout=timeout)
|
|
stdout = await process.stdout.read.aio()
|
|
stderr = await process.stderr.read.aio()
|
|
exit_code = await process.wait.aio()
|
|
if isinstance(stdout, bytes):
|
|
stdout = stdout.decode("utf-8", errors="replace")
|
|
if isinstance(stderr, bytes):
|
|
stderr = stderr.decode("utf-8", errors="replace")
|
|
output = stdout
|
|
if stderr:
|
|
output = f"{stdout}\n{stderr}" if stdout else stderr
|
|
return output, exit_code
|
|
|
|
return worker.run_coroutine(_do(), timeout=timeout + 30)
|
|
|
|
return _ThreadedProcessHandle(exec_fn, cancel_fn=cancel)
|
|
|
|
def cleanup(self):
|
|
"""Snapshot the filesystem (if persistent) then stop the sandbox."""
|
|
if self._sandbox is None:
|
|
return
|
|
|
|
if self._sync_manager:
|
|
logger.info("Modal: syncing files from sandbox...")
|
|
self._sync_manager.sync_back()
|
|
|
|
if self._persistent:
|
|
try:
|
|
async def _snapshot():
|
|
img = await self._sandbox.snapshot_filesystem.aio()
|
|
return img.object_id
|
|
|
|
try:
|
|
snapshot_id = self._worker.run_coroutine(_snapshot(), timeout=60)
|
|
except Exception:
|
|
snapshot_id = None
|
|
|
|
if snapshot_id:
|
|
_store_direct_snapshot(self._task_id, snapshot_id)
|
|
logger.info(
|
|
"Modal: saved filesystem snapshot %s for task %s",
|
|
snapshot_id[:20], self._task_id,
|
|
)
|
|
except Exception as e:
|
|
logger.warning("Modal: filesystem snapshot failed: %s", e)
|
|
|
|
try:
|
|
self._worker.run_coroutine(self._sandbox.terminate.aio(), timeout=15)
|
|
except Exception:
|
|
pass
|
|
finally:
|
|
self._worker.stop()
|
|
self._sandbox = None
|
|
self._app = None
|