hermes-agent/tools/environments
Ben Barclay 82c157b267
fix(docker): clean up orphaned container when docker run fails (salvage #7440) (#39412)
When `docker run -d` fails after Docker has already created the container
object (e.g. exit 125 when the daemon isn't ready, or a timeout mid image
pull), the code raised before `self._container_id` was set — so the
container leaked permanently in "Created" state. Reported in #7439:
110+ orphaned containers accumulated over 3 days from hourly cron-
scheduled gateway sessions hitting a Docker Desktop startup race.

The orphan reaper added in #33645 (reap_orphan_containers) does NOT cover
this case: it filters `status=exited`, but a failed-create container is in
`Created` state, so it slips through and is never reaped.

Wrap the `docker run -d` call in try/except and `docker rm -f` the
container by its known name before re-raising.

Salvages #7440 by @Tranquil-Flow. Their branch predated the cross-process
reuse + labels rework on `main`, so a cherry-pick conflicted; reconstructed
the same intent (plus their two regression tests, adapted to mock the new
reuse `docker ps` probe) against current `main`.

Verified adversarially: reverted just the product change to origin/main's
`docker.py`, ran the two new tests -> both FAIL with
`assert 0 == 1 ("docker rm should be called once")`. With the fix applied,
both pass; full test_docker_environment.py is 65/65 green.

Closes #7440. Fixes #7439.

Co-authored-by: Evi Nova <66773372+Tranquil-Flow@users.noreply.github.com>
2026-06-05 10:19:08 +10:00
..
__init__.py remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
base.py fix(tools): don't compound-rewrite spawn_via_env background wrappers 2026-06-01 00:05:10 +05:30
daytona.py fix(daytona): migrate legacy-sandbox lookup to cursor-based list() (#24587) 2026-05-12 16:31:46 -07:00
docker.py fix(docker): clean up orphaned container when docker run fails (salvage #7440) (#39412) 2026-06-05 10:19:08 +10:00
file_sync.py fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes 2026-05-19 00:12:41 -07:00
local.py fix(security): narrow Bedrock subprocess strip to inference bearer token only 2026-05-29 01:48:08 -07:00
managed_modal.py feat(environments): unified spawn-per-call execution layer 2026-04-08 17:23:15 -07:00
modal.py fix(async): close unscheduled coroutines in all threadsafe bridges (#26584) 2026-05-15 14:00:01 -07:00
modal_utils.py fix(tools): don't compound-rewrite spawn_via_env background wrappers 2026-06-01 00:05:10 +05:30
singularity.py feat(environments): unified spawn-per-call execution layer 2026-04-08 17:23:15 -07:00
ssh.py fix(ssh): keep bulk sync extraction scoped to .hermes 2026-05-21 19:17:51 -07:00