mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-09 08:21:50 +00:00
Salvage of #36631 (@annguyenNous), rebased onto current main with regression tests added. Fixes #36266. When a persistent Docker sandbox container is removed out-of-band (idle reaper, `docker prune`, OOM kill, daemon restart), the gateway kept issuing `docker exec` against the dead container ID, returning "No such container" on every subsequent tool call — the agent was permanently blocked until the gateway process restarted. DockerEnvironment.execute() now detects the "No such container" / "is not running" error after a non-zero exit (gated on persist_across_processes) and calls _recreate_container(): it tries label-based reuse first, falls back to a fresh container replaying the same image + full all_run_args set, re-runs init_session(), and retries the command once. A genuine non-zero exit is NOT misclassified as container-gone. Differs from #36631 as submitted: adds the tests the original lacked. tests/tools/test_docker_environment.py covers _is_container_gone pattern matching (incl. the negative/control case), the recover-and-retry path, the persist_across_processes=False opt-out (no recovery), and the ordinary-failure passthrough (no spurious recreation). _make_dummy_env now forwards persist_across_processes. Verified: - Unit: 67/67 in test_docker_environment.py (4 new + existing). - Live E2E against the real docker daemon: started a persistent container, `docker rm -f`'d it out-of-band, and the next execute() transparently recreated a fresh container and succeeded; a follow-up command worked in the recovered container; a real `exit N` passed through without triggering recovery. Co-authored-by: annguyenNous <annguyenNous@users.noreply.github.com> |
||
|---|---|---|
| .. | ||
| __init__.py | ||
| base.py | ||
| daytona.py | ||
| docker.py | ||
| file_sync.py | ||
| local.py | ||
| managed_modal.py | ||
| modal.py | ||
| modal_utils.py | ||
| singularity.py | ||
| ssh.py | ||