mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-27 01:11:40 +00:00
fix(tools): bound _read_tracker sub-containers + prune _completion_consumed (#11839)
Two accretion-over-time leaks that compound over long CLI / gateway lifetimes. Both were flagged in the memory-leak audit.

## file_tools._read_tracker

_read_tracker[task_id] holds three sub-containers that grew unbounded:

* read_history: set of (path, offset, limit) tuples, 1 per unique read
* dedup: dict of (path, offset, limit) → mtime, same growth pattern
* read_timestamps: dict of resolved_path → mtime, 1 per unique path

A CLI session uses one stable task_id for its lifetime, so these were uncapped. A 10k-read session accumulated ~1.5MB of tracker state that the tool no longer needed (only the most recent reads are relevant for dedup, consecutive-loop detection, and write/patch external-edit warnings).

Fix: _cap_read_tracker_data() enforces hard caps on each container after every add. Defaults: read_history=500, dedup=1000, read_timestamps=1000. Eviction is insertion-order (Python 3.7+ dict guarantee) for the dicts; arbitrary for the set (which only feeds diagnostic summaries).

## process_registry._completion_consumed

Module-level set that recorded every session_id ever polled / waited / logged. No pruning. Each entry is ~20 bytes, so the absolute leak is small, but on a gateway processing thousands of background commands per day the set grows until process exit.

Fix: _prune_if_needed() now discards _completion_consumed entries alongside the session dict evictions it already performs (both the TTL-based prune and the LRU-over-cap prune). Adds a final belt-and-suspenders pass that drops any dangling entries whose session_id no longer appears in _running or _finished.
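The read-tracker cap described under file_tools._read_tracker can be illustrated with a short sketch. It assumes the tracker entry is a plain dict whose keys are the sub-container names read_history, dedup and read_timestamps; the helper names cap_read_tracker_data and _evict_oldest are hypothetical and this is not the actual file_tools code. It shows the two eviction policies from the commit message: oldest-first for dicts (relying on insertion order) and arbitrary for the set.

```python
# Hypothetical sketch of per-container caps. Helper names and dict keys are
# assumptions modeled on the commit message, not the real file_tools code.
from itertools import islice

# Default caps quoted in the commit message.
READ_HISTORY_CAP = 500
DEDUP_CAP = 1000
READ_TIMESTAMPS_CAP = 1000


def _evict_oldest(d: dict, cap: int) -> None:
    """Drop the oldest keys; dicts keep insertion order (Python 3.7+)."""
    excess = len(d) - cap
    if excess <= 0:
        return  # under cap: no work done
    # Materialize the victims first; deleting while iterating would raise.
    for key in list(islice(d, excess)):
        del d[key]


def cap_read_tracker_data(data: dict) -> None:
    """Enforce hard caps on one task's tracker entry; call after every add."""
    history = data.get("read_history")
    if history is not None:
        # A set has no order, so eviction here is arbitrary. That is acceptable
        # because read_history only feeds diagnostic summaries.
        while len(history) > READ_HISTORY_CAP:
            history.pop()
    for key, cap in (("dedup", DEDUP_CAP), ("read_timestamps", READ_TIMESTAMPS_CAP)):
        container = data.get(key)
        if container is not None:  # tolerate missing sub-containers
            _evict_oldest(container, cap)


if __name__ == "__main__":
    tracker = {"read_history": set(), "dedup": {}, "read_timestamps": {}}
    for i in range(1200):
        read_key = (f"/tmp/f{i}.txt", 0, 100)
        tracker["read_history"].add(read_key)
        tracker["dedup"][read_key] = float(i)
        tracker["read_timestamps"][f"/tmp/f{i}.txt"] = float(i)
        cap_read_tracker_data(tracker)
    print({k: len(v) for k, v in tracker.items()})
    # {'read_history': 500, 'dedup': 1000, 'read_timestamps': 1000}
```

Because eviction runs after every add, each call trims at most one entry per container in steady state, so the per-read cost stays constant once the caps are reached.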
Tests: tests/tools/test_accretion_caps.py (9 cases)

* Each container bound respected, oldest evicted
* No-op when under cap (no unnecessary work)
* Handles missing sub-containers without crashing
* Live read_file_tool path enforces caps end-to-end
* _completion_consumed pruned on TTL expiry
* _completion_consumed pruned on LRU eviction
* Dangling entries (no backing session) cleared

Broader suite: 3486 tests/tools + tests/cli pass. The single flake (test_alias_command_passes_args) reproduces on unchanged main (known cross-test pollution under suite-order load).
parent 0a83187801
commit 3f43aec15d
3 changed files with 266 additions and 0 deletions
@@ -970,12 +970,22 @@ class ProcessRegistry:
        ]
        for sid in expired:
            del self._finished[sid]
            self._completion_consumed.discard(sid)

        # If still over limit, remove oldest finished
        total = len(self._running) + len(self._finished)
        if total >= MAX_PROCESSES and self._finished:
            oldest_id = min(self._finished, key=lambda sid: self._finished[sid].started_at)
            del self._finished[oldest_id]
            self._completion_consumed.discard(oldest_id)

        # Drop any _completion_consumed entries whose sessions are no longer
        # tracked at all — belt-and-suspenders against module-lifetime growth
        # on process-registry lookup paths that don't reach the dict prunes.
        tracked = self._running.keys() | self._finished.keys()
        stale = self._completion_consumed - tracked
        if stale:
            self._completion_consumed -= stale

    # ----- Checkpoint (crash recovery) -----

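To see the three prune paths of this hunk working together, here is a self-contained toy. ToyRegistry, Session, MAX_PROCESSES=3, FINISHED_TTL_SECONDS and the TTL condition are all hypothetical stand-ins: the hunk does not show how the expired list is built, so the expiry rule below is an assumption. The over-cap and dangling-entry passes mirror the code shown above.

```python
# Toy model of the prune logic in the hunk. ToyRegistry, Session,
# MAX_PROCESSES and FINISHED_TTL_SECONDS are hypothetical stand-ins; the real
# expiry condition is not visible in the diff.
from dataclasses import dataclass
from typing import Dict, Set

MAX_PROCESSES = 3            # assumed cap, tiny for illustration
FINISHED_TTL_SECONDS = 60.0  # assumed TTL for finished sessions


@dataclass
class Session:
    started_at: float
    finished_at: float = 0.0


class ToyRegistry:
    def __init__(self) -> None:
        self._running: Dict[str, Session] = {}
        self._finished: Dict[str, Session] = {}
        self._completion_consumed: Set[str] = set()

    def _prune_if_needed(self, now: float) -> None:
        # TTL prune (assumed condition): finished sessions past the TTL.
        expired = [
            sid for sid, s in self._finished.items()
            if now - s.finished_at > FINISHED_TTL_SECONDS
        ]
        for sid in expired:
            del self._finished[sid]
            self._completion_consumed.discard(sid)

        # Over-cap prune: evict the finished session that started earliest.
        total = len(self._running) + len(self._finished)
        if total >= MAX_PROCESSES and self._finished:
            oldest_id = min(self._finished, key=lambda sid: self._finished[sid].started_at)
            del self._finished[oldest_id]
            self._completion_consumed.discard(oldest_id)

        # Dangling pass: dict key views support set operators, so this is a
        # plain set difference against every session still tracked.
        tracked = self._running.keys() | self._finished.keys()
        stale = self._completion_consumed - tracked
        if stale:
            self._completion_consumed -= stale


if __name__ == "__main__":
    reg = ToyRegistry()
    reg._running["r1"] = Session(started_at=5.0)
    reg._finished["old"] = Session(started_at=1.0, finished_at=90.0)
    reg._finished["new"] = Session(started_at=2.0, finished_at=95.0)
    reg._completion_consumed.update({"old", "new", "ghost"})
    reg._prune_if_needed(now=100.0)
    print(sorted(reg._completion_consumed))  # ['new']
```

In the demo, "old" leaves _completion_consumed through the over-cap eviction, while "ghost" (never backed by a session) is only caught by the final set-difference pass, which is why that pass is kept even though the two discard() calls cover the normal eviction paths.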