fix: harden sync_back — PID-suffix temp path, size cap, lifecycle guards

Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR #8018:

tools/environments/daytona.py
  - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls
    against the same sandbox don't collide on the remote temp path
  - Move sync_back() inside the cleanup lock and after the _sandbox-None
    guard, with its own try/except. Previously a no-op cleanup (sandbox
    already cleared) still fired sync_back, triggering a 3-attempt retry
    storm against a None sandbox (~6s of sleep). Now it short-circuits
    cleanly.
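
  A minimal sketch of the two daytona.py changes, using hypothetical
  names (remote_tar_path, EnvSketch, _cleanup_lock, _sync_back_fn) that
  stand in for whatever the real module uses:

    import os
    import threading

    def remote_tar_path(pid: int | None = None) -> str:
        # PID-suffixed remote temp path: concurrent sync_back calls
        # against the same sandbox each get their own tarball.
        return f"/tmp/.hermes_sync.{os.getpid() if pid is None else pid}.tar"

    class EnvSketch:
        def __init__(self, sandbox, sync_back_fn):
            self._sandbox = sandbox
            self._sync_back_fn = sync_back_fn
            self._cleanup_lock = threading.Lock()

        def cleanup(self) -> None:
            with self._cleanup_lock:
                if self._sandbox is None:
                    return  # no-op cleanup: never reaches sync_back
                try:
                    # Guarded so a sync_back failure cannot abort cleanup.
                    self._sync_back_fn()
                except Exception:
                    pass
                self._sandbox = None

  With this shape, a second cleanup() call finds _sandbox already None
  and returns before sync_back is ever invoked.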

tools/environments/file_sync.py
  - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a
    tar larger than the limit. Protects against runaway sandboxes
    producing arbitrary-size archives.
  - Add 'nothing previously pushed' guard at the top of sync_back(). If
    _pushed_hashes and _synced_files are both empty, the FileSyncManager
    was never initialized from the host side — there is nothing coherent
    to sync back. Skips the retry/backoff machinery on uninitialized
    managers and eliminates test-suite slowdown from pre-existing cleanup
    tests that don't mock the sync layer.
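
  The size cap can be sketched as a standalone predicate; the function
  name tar_within_cap is illustrative, and in the real code the check
  lives inline in FileSyncManager.sync_back:

    import os

    _SYNC_BACK_MAX_BYTES = 2 * 1024 * 1024 * 1024  # 2 GiB

    def tar_within_cap(path: str, cap: int = _SYNC_BACK_MAX_BYTES) -> bool:
        # Treat an unreadable tar as size 0 rather than failing the whole
        # sync_back; only a measurably oversized archive is refused.
        try:
            size = os.path.getsize(path)
        except OSError:
            size = 0
        return size <= cap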

tests/tools/test_file_sync_back.py
  - Update _make_manager helper to seed a _pushed_hashes entry by default
    so sync_back() exercises its real path. A seed_pushed_state=False
    opt-out is available for noop-path tests.
  - Add TestSyncBackSizeCap with positive and negative coverage of the
    new cap.

tests/tools/test_sync_back_backends.py
  - Update Daytona bulk download test to assert the PID-suffixed path
    pattern instead of the fixed /tmp/.hermes_sync.tar.
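
  A sketch of the updated assertion, assuming the test observes the
  remote path as a plain string and matches it with a regex:

    import os
    import re

    # Any PID yields a path of the form /tmp/.hermes_sync.<digits>.tar.
    _SYNC_TAR_RE = re.compile(r"^/tmp/\.hermes_sync\.\d+\.tar$")

    path = f"/tmp/.hermes_sync.{os.getpid()}.tar"
    assert _SYNC_TAR_RE.match(path)
    assert not _SYNC_TAR_RE.match("/tmp/.hermes_sync.tar")  # old fixed path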
Teknium 2026-04-16 17:10:30 -07:00 committed by Teknium
parent d64446e315
commit 7fd508979e
4 changed files with 113 additions and 13 deletions


@@ -95,6 +95,7 @@ def _sha256_file(path: str) -> str:
_SYNC_BACK_MAX_RETRIES = 3
_SYNC_BACK_BACKOFF = (2, 4, 8)  # seconds between retries
_SYNC_BACK_MAX_BYTES = 2 * 1024 * 1024 * 1024  # 2 GiB — refuse to extract larger tars


class FileSyncManager:
@@ -219,6 +220,13 @@ class FileSyncManager:
        if self._bulk_download_fn is None:
            return

        # Nothing was ever committed through this manager — the initial
        # push failed or never ran. Skip sync_back to avoid retry storms
        # against an uninitialized remote .hermes/ directory.
        if not self._pushed_hashes and not self._synced_files:
            logger.debug("sync_back: no prior push state — skipping")
            return

        lock_path = (hermes_home or get_hermes_home()) / ".sync.lock"
        lock_path.parent.mkdir(parents=True, exist_ok=True)
@@ -292,6 +300,19 @@ class FileSyncManager:
        with tempfile.NamedTemporaryFile(suffix=".tar") as tf:
            self._bulk_download_fn(Path(tf.name))

            # Defensive size cap: a misbehaving sandbox could produce an
            # arbitrarily large tar. Refuse to extract if it exceeds the cap.
            try:
                tar_size = os.path.getsize(tf.name)
            except OSError:
                tar_size = 0
            if tar_size > _SYNC_BACK_MAX_BYTES:
                logger.warning(
                    "sync_back: remote tar is %d bytes (cap %d) — skipping extraction",
                    tar_size, _SYNC_BACK_MAX_BYTES,
                )
                return

            with tempfile.TemporaryDirectory(prefix="hermes-sync-back-") as staging:
                with tarfile.open(tf.name) as tar:
                    tar.extractall(staging, filter="data")