feat(file-sync): sync remote changes back to host on teardown

Salvage of PR #8018 by @alt-glitch onto current main.

On sandbox teardown, FileSyncManager now downloads the remote .hermes/
directory, diffs against SHA-256 hashes of what was originally pushed,
and applies only changed files back to the host.
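
The diff step described above can be sketched roughly as follows. This is a minimal illustration, not the code in file_sync.py: the helper names (sha256_of, changed_files) and the shape of pushed_hashes (relative path -> hex digest recorded at push time) are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(remote_root: Path, pushed_hashes: dict[str, str]) -> list[str]:
    """List relative paths whose content differs from what was pushed.

    pushed_hashes is a hypothetical mapping of relative path -> SHA-256
    recorded when the file was uploaded. A file with no recorded hash is
    new on the remote, so it is reported without being hashed.
    """
    changed = []
    for path in sorted(remote_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(remote_root).as_posix()
        pushed = pushed_hashes.get(rel)
        if pushed is None or sha256_of(path) != pushed:
            changed.append(rel)
    return changed
```

Only the paths returned here would then be copied back to the host; unchanged files are left alone.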

Core (tools/environments/file_sync.py):
- sync_back(): orchestrates download -> unpack -> diff -> apply with:
  - Retry with exponential backoff (3 attempts, 2s/4s/8s)
  - SIGINT trap + defer (prevents partial writes on Ctrl-C)
  - fcntl.flock serialization (concurrent gateway sandboxes)
  - Last-write-wins conflict resolution with warning
  - New remote files pulled back via _infer_host_path prefix matching
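
A rough sketch of the retry behaviour listed above, under assumptions: retry_with_backoff and its parameters are hypothetical names, and this version sleeps only between attempts (2s, then 4s for 3 attempts). Whether the real sync_back() also waits after the final failure is not stated in this message.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay * 2**attempt: 2s, 4s, ...
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

Passing sleep as a parameter keeps the helper testable without real waits.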

Backends:
- SSH: _ssh_bulk_download — tar cf - piped over SSH
- Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read
- Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file
- All three call sync_back() at the top of cleanup()
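
For illustration, the SSH variant ("tar cf - piped over SSH") could look something like the sketch below. The function names, the host and remote_parent parameters, and the 120s timeout are assumptions for the example; the actual _ssh_bulk_download is not shown in this excerpt.

```python
import shlex
import subprocess
from pathlib import Path


def build_ssh_tar_command(host: str, remote_parent: str, dirname: str = ".hermes") -> list[str]:
    """Build an ssh argv that streams a tar of remote_parent/dirname to stdout."""
    remote_cmd = f"tar cf - -C {shlex.quote(remote_parent)} {shlex.quote(dirname)}"
    return ["ssh", host, remote_cmd]


def ssh_bulk_download(host: str, remote_parent: str, dest: Path) -> None:
    """Write the remote tar stream to dest; raises CalledProcessError on failure."""
    cmd = build_ssh_tar_command(host, remote_parent)
    with dest.open("wb") as out:
        # stdout goes straight to the file, so the archive is never held in memory.
        subprocess.run(cmd, stdout=out, check=True, timeout=120)
```

Quoting the remote paths matters because ssh hands the command string to the remote shell.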

Fixes applied during salvage (vs original PR #8018):

| # | Issue | Fix |
|---|-------|-----|
| C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None |
| W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError |
| W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer |
| W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports |
| W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root |
| S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) |
| S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) |
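
As an illustration of the C1 fix, a guarded import with a no-op fallback might look like this. It is a sketch only: sync_lock and lock_path are hypothetical names, not the _sync_back_locked implementation.

```python
import contextlib
from pathlib import Path

try:
    import fcntl  # POSIX only
except ImportError:  # e.g. Windows
    fcntl = None


@contextlib.contextmanager
def sync_lock(lock_path: Path):
    """Hold an exclusive flock on lock_path, or do nothing if fcntl is missing.

    Yields True when a lock is actually held, False when locking was skipped.
    """
    if fcntl is None:
        yield False
        return
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    with lock_path.open("a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield True
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```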

Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT
main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup).
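
The main/worker thread distinction matters because Python only allows signal.signal() to be called from the main thread. A minimal sketch of a SIGINT deferral that respects this, with defer_sigint as a hypothetical name rather than the helper used in file_sync.py:

```python
import contextlib
import signal
import threading


@contextlib.contextmanager
def defer_sigint():
    """Postpone Ctrl-C until the block exits, so writes are not cut short."""
    if threading.current_thread() is not threading.main_thread():
        # Handlers cannot be installed off the main thread; run unprotected.
        yield
        return

    pending = []

    def _handler(signum, frame):
        pending.append(signum)  # remember the interrupt instead of raising

    previous = signal.signal(signal.SIGINT, _handler)
    try:
        yield
    finally:
        # A handler installed outside Python is reported as None.
        signal.signal(signal.SIGINT, previous if previous is not None else signal.SIG_DFL)
        if pending:
            raise KeyboardInterrupt
```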

Based on #8018 by @alt-glitch.
kshitijk4poor 2026-04-12 11:18:29 +05:30 committed by Teknium
parent 764536b684
commit d64446e315
6 changed files with 1166 additions and 0 deletions


@@ -269,6 +269,7 @@ class ModalEnvironment(BaseEnvironment):
            upload_fn=self._modal_upload,
            delete_fn=self._modal_delete,
            bulk_upload_fn=self._modal_bulk_upload,
            bulk_download_fn=self._modal_bulk_download,
        )
        self._sync_manager.sync(force=True)
        self.init_session()
@@ -347,6 +348,27 @@ class ModalEnvironment(BaseEnvironment):
        self._worker.run_coroutine(_bulk(), timeout=120)

    def _modal_bulk_download(self, dest: Path) -> None:
        """Download remote .hermes/ as a tar archive.

        Modal sandboxes always run as root, so /root/.hermes is hardcoded
        (consistent with iter_sync_files call on line 269).
        """
        async def _download():
            proc = await self._sandbox.exec.aio(
                "bash", "-c", "tar cf - -C / root/.hermes"
            )
            data = await proc.stdout.read.aio()
            exit_code = await proc.wait.aio()
            if exit_code != 0:
                raise RuntimeError(f"Modal bulk download failed (exit {exit_code})")
            return data

        tar_bytes = self._worker.run_coroutine(_download(), timeout=120)
        if isinstance(tar_bytes, str):
            tar_bytes = tar_bytes.encode()
        dest.write_bytes(tar_bytes)

    def _modal_delete(self, remote_paths: list[str]) -> None:
        """Batch-delete remote files via exec."""
        rm_cmd = quoted_rm_command(remote_paths)
@@ -404,6 +426,10 @@ class ModalEnvironment(BaseEnvironment):
        if self._sandbox is None:
            return

        if self._sync_manager:
            logger.info("Modal: syncing files from sandbox...")
            self._sync_manager.sync_back()

        if self._persistent:
            try:
                async def _snapshot():