fix: Modal sandbox eval infra (9 fixes for TBLite baseline)

Fixes discovered while running TBLite baseline evaluation:

1. ephemeral_disk param not supported in modal 1.3.5 - check before passing
2. Modal legacy image builder requires working pip - add ensurepip fix via
   setup_dockerfile_commands to handle task images with broken pip
3. Host cwd leaked into Modal sandbox - add /home/ to host prefix check
4. Tilde ~ not expanded by subprocess.run(cwd=) in sandboxes - use /root
5. install_pipx must stay True for swerex-remote to be available

Dependencies also needed (not in this commit):
- git submodule update --init mini-swe-agent
- uv pip install swe-rex boto3
This commit is contained in:
dmahan93 2026-03-09 18:36:28 -05:00 committed by teknium1
parent 2c97bf3936
commit d7f4db53f5
3 changed files with 26 additions and 4 deletions

View file

@ -50,7 +50,7 @@ class ModalEnvironment(BaseEnvironment):
def __init__(
self,
image: str,
cwd: str = "~",
cwd: str = "/root",
timeout: int = 60,
modal_sandbox_kwargs: Optional[Dict[str, Any]] = None,
persistent_filesystem: bool = True,
@ -95,6 +95,7 @@ class ModalEnvironment(BaseEnvironment):
startup_timeout=180.0,
runtime_timeout=3600.0,
modal_sandbox_kwargs=sandbox_kwargs,
install_pipx=True, # Required: installs pipx + swe-rex runtime (swerex-remote)
)
def execute(self, command: str, cwd: str = "", *,