hermes-agent/scripts
Teknium 9b55365f6f
fix(gateway,cron): close ephemeral agents + reap stale aux clients (salvage #13979) (#16598)
* fix: clean gateway auxiliary client caches on teardown

* fix(gateway): recover from stale pid files and close cron agents

Two issues were keeping the gateway from surviving long runs:

1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which
   refuses to unlink when the file's pid differs from our own. That
   safety check exists for the --replace atexit handoff, but it also
   applied to stale-record cleanup, so after a crashy exit the pid
   file was orphaned: `write_pid_file()`'s O_EXCL create then failed
   with `FileExistsError`, and systemd looped on "PID file race lost
   to another gateway instance". Unlink unconditionally from this
   helper since the caller has already verified the record is dead.

2. The cron scheduler never closed the ephemeral `AIAgent` it creates
   per tick, and never swept the process-global auxiliary-client
   cache. Over days of 10-minute ticks this leaked subprocesses and
   async httpx transports until the gateway hit EMFILE. Release the
   agent and call `cleanup_stale_async_clients()` in `run_job`'s
   outer `finally`, matching the gateway's own per-turn cleanup.

* chore(release): map bloodcarter@gmail.com -> bloodcarter

---------

Co-authored-by: bloodcarter <bloodcarter@gmail.com>
2026-04-27 07:41:42 -07:00
..
lib feat: lazy bootstrap node 2026-04-16 10:47:37 -05:00
whatsapp-bridge Update whatsapp-bridge package-lock.json 2026-04-22 18:16:08 -07:00
build_model_catalog.py feat(models): remote model catalog manifest for OpenRouter + Nous Portal (#16033) 2026-04-26 05:46:43 -07:00
build_skills_index.py feat(skills): centralized skills index — eliminate GitHub API calls for search/install 2026-04-12 16:39:04 -07:00
contributor_audit.py feat(ci): add contributor attribution check on PRs (#9376) 2026-04-13 21:13:08 -07:00
discord-voice-doctor.py fix(scripts): read gateway_voice_mode.json as UTF-8 2026-04-22 17:34:29 -07:00
hermes-gateway fix: prevent systemd restart storm on gateway connection failure 2026-03-21 09:26:39 -07:00
install.cmd feat: Windows native support via Git Bash 2026-03-02 22:03:29 -08:00
install.ps1 chore: defer WhatsApp bridge install to first use (#12992) 2026-04-20 04:55:33 -07:00
install.sh fix(install+update): add /usr/local/bin PATH guard for RHEL root non-login shells (#16191) 2026-04-26 12:22:37 -07:00
kill_modal.sh refactor: replace swe-rex with native Modal SDK for Modal backend (#3538) 2026-03-28 11:21:44 -07:00
profile-tui.py chore(tui): remove dead branch cleanup code 2026-04-26 21:54:24 -05:00
release.py fix(gateway,cron): close ephemeral agents + reap stale aux clients (salvage #13979) (#16598) 2026-04-27 07:41:42 -07:00
run_tests.sh test: make test env hermetic; enforce CI parity via scripts/run_tests.sh (#11577) 2026-04-17 06:09:09 -07:00
sample_and_compress.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00