fix: harden sync_back — PID-suffix temp path, size cap, lifecycle guards

Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR #8018:

tools/environments/daytona.py
  - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls
    against the same sandbox don't collide on the remote temp path
  - Move sync_back() inside the cleanup lock and after the _sandbox-None
    guard, with its own try/except. Previously, a no-op cleanup (sandbox
    already cleared) still fired sync_back → 3-attempt retry storm against
    a nil sandbox (~6s of sleep). Now it short-circuits cleanly.

tools/environments/file_sync.py
  - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a
    tar larger than the limit. Protects against runaway sandboxes
    producing arbitrary-size archives.
  - Add 'nothing previously pushed' guard at the top of sync_back(). If
    _pushed_hashes and _synced_files are both empty, the FileSyncManager
    was never initialized from the host side — there is nothing coherent
    to sync back. Skips the retry/backoff machinery on uninitialized
    managers and eliminates test-suite slowdown from pre-existing cleanup
    tests that don't mock the sync layer.

tests/tools/test_file_sync_back.py
  - Update _make_manager helper to seed a _pushed_hashes entry by default
    so sync_back() exercises its real path. A seed_pushed_state=False
    opt-out is available for noop-path tests.
  - Add TestSyncBackSizeCap with positive and negative coverage of the
    new cap.

tests/tools/test_sync_back_backends.py
  - Update Daytona bulk download test to assert the PID-suffixed path
    pattern instead of the fixed /tmp/.hermes_sync.tar.
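
The updated test itself is not shown in the diff below; a pattern-based assertion along these lines would match the new path scheme (the regex and variable names are illustrative, not the actual test code):

```python
import os
import re

# The remote temp path is now /tmp/.hermes_sync.<pid>.tar rather than
# the old fixed /tmp/.hermes_sync.tar, so the test asserts a pattern.
PID_TAR_RE = re.compile(r"^/tmp/\.hermes_sync\.\d+\.tar$")

assert PID_TAR_RE.match(f"/tmp/.hermes_sync.{os.getpid()}.tar")
assert not PID_TAR_RE.match("/tmp/.hermes_sync.tar")  # old fixed path
```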
Teknium 2026-04-16 17:10:30 -07:00 committed by Teknium
parent d64446e315
commit 7fd508979e
4 changed files with 113 additions and 13 deletions

@@ -7,6 +7,7 @@ and resumed on next creation, preserving the filesystem across sessions.
 import logging
 import math
 import os
+import shlex
 import threading
 from pathlib import Path
@@ -170,13 +171,16 @@ class DaytonaEnvironment(BaseEnvironment):
     def _daytona_bulk_download(self, dest: Path) -> None:
         """Download remote .hermes/ as a tar archive."""
         rel_base = f"{self._remote_home}/.hermes".lstrip("/")
+        # PID-suffixed remote temp path avoids collisions if sync_back fires
+        # concurrently for the same sandbox (e.g. retry after partial failure).
+        remote_tar = f"/tmp/.hermes_sync.{os.getpid()}.tar"
         self._sandbox.process.exec(
-            f"tar cf /tmp/.hermes_sync.tar -C / {shlex.quote(rel_base)}"
+            f"tar cf {shlex.quote(remote_tar)} -C / {shlex.quote(rel_base)}"
         )
-        self._sandbox.fs.download_file("/tmp/.hermes_sync.tar", str(dest))
+        self._sandbox.fs.download_file(remote_tar, str(dest))
         # Clean up remote temp file
         try:
-            self._sandbox.process.exec("rm -f /tmp/.hermes_sync.tar")
+            self._sandbox.process.exec(f"rm -f {shlex.quote(remote_tar)}")
         except Exception:
             pass  # best-effort cleanup
@@ -227,13 +231,21 @@ class DaytonaEnvironment(BaseEnvironment):
         return _ThreadedProcessHandle(exec_fn, cancel_fn=cancel)

     def cleanup(self):
-        if self._sync_manager:
-            logger.info("Daytona: syncing files from sandbox...")
-            self._sync_manager.sync_back()
         with self._lock:
             if self._sandbox is None:
                 return
+            # Sync remote changes back to host before teardown. Running
+            # inside the lock (and after the _sandbox is None guard) avoids
+            # firing sync_back on an already-cleaned-up env, which would
+            # trigger a 3-attempt retry storm against a nil sandbox.
+            if self._sync_manager:
+                logger.info("Daytona: syncing files from sandbox...")
+                try:
+                    self._sync_manager.sync_back()
+                except Exception as e:
+                    logger.warning("Daytona: sync_back failed: %s", e)
             try:
                 if self._persistent:
                     self._sandbox.stop()