The memory tool schema already tells the model not to save task progress,
session outcomes, or completed-work logs. In practice, models still over-save
under completion bias, writing 'deployed X on DATE' or 'rotated Y on DATE'
entries that have no durable value and crowd out genuinely useful facts.
This adds a lightweight defense-in-depth backstop: advisory hygiene checks
that run on every 'add' and surface a 'hygiene_warning' field on the
response. Three checks:
1. Session-log phrasing: regex patterns for dated completion verbs
(deployed/rotated/retired/installed/completed/shipped/resolved/fixed
followed by YYYY-MM-DD), 'views (DATE)' snapshots, and 'as of DATE'
patterns. These are strong tells for ephemeral content.
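A minimal sketch of the session-log check follows. It assumes ISO YYYY-MM-DD dates and allows a short same-line gap between the verb and the date, so that 'deployed X on DATE' matches. The pattern list and the `looks_like_session_log` name are hypothetical and are not the regexes in this change.

```python
import re

# Hypothetical approximations of the three session-log tells; the regexes
# in the actual change may differ.
_COMPLETION_VERBS = r"(?:deployed|rotated|retired|installed|completed|shipped|resolved|fixed)"
_DATE = r"\d{4}-\d{2}-\d{2}"  # assumes ISO YYYY-MM-DD dates

_SESSION_LOG_PATTERNS = [
    # Completion verb followed, within 80 chars on the same line, by a date,
    # e.g. "deployed api-gateway on 2024-05-01".
    re.compile(rf"\b{_COMPLETION_VERBS}\b[^\n]{{0,80}}?\b{_DATE}\b", re.IGNORECASE),
    # "views (DATE)" snapshot, e.g. "dashboard views (2024-05-01): 1,234".
    re.compile(rf"\bviews\s*\(\s*{_DATE}\s*\)", re.IGNORECASE),
    # "as of DATE" qualifier, e.g. "as of 2024-05-01 the queue is empty".
    re.compile(rf"\bas of\s+{_DATE}\b", re.IGNORECASE),
]


def looks_like_session_log(content: str) -> bool:
    """Return True if any ephemeral-content pattern matches."""
    return any(p.search(content) for p in _SESSION_LOG_PATTERNS)
```

Undated completion phrasing such as "Fixed the flaky test in CI" does not match, which keeps false positives down at the cost of missing some session-log entries.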
2. Size pressure: warns once projected usage crosses 75% of the limit,
so pruning advice surfaces BEFORE the hard block at 100%.
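The size-pressure check reduces to comparing projected usage against a fraction of the limit. This sketch assumes usage and limit share one unit (characters here) and treats exactly 75% as crossing the threshold. `size_pressure_warning` and its message text are hypothetical.

```python
from typing import Optional

SIZE_WARN_FRACTION = 0.75  # warn threshold taken from the description


def size_pressure_warning(used: int, incoming: int, limit: int) -> Optional[str]:
    """Return advisory text when projected usage reaches the warn fraction.

    `used`, `incoming`, and `limit` are assumed to share one unit (e.g.
    characters); the real store may measure size differently.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    projected = used + incoming
    fraction = projected / limit
    if fraction < SIZE_WARN_FRACTION:
        return None
    return (
        f"memory at {fraction:.0%} of limit after this add; "
        "consider pruning or consolidating stale entries"
    )
```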
3. Near-duplicate detection: fuzzy-matches new content against existing
   entries using difflib's SequenceMatcher. A similarity of 0.75 or higher
   against any existing entry suggests the caller should use 'replace'
   instead of 'add'.
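Near-duplicate detection with difflib could look like the following. `find_near_duplicate` is a hypothetical helper. It scores each existing entry with `SequenceMatcher.ratio()` and returns the closest entry at or above 0.75, or None.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple

DUPLICATE_THRESHOLD = 0.75  # similarity threshold taken from the description


def find_near_duplicate(
    new: str, existing: Iterable[str]
) -> Optional[Tuple[str, float]]:
    """Return (entry, ratio) for the most similar entry >= threshold, else None."""
    best: Optional[Tuple[str, float]] = None
    for entry in existing:
        matcher = SequenceMatcher(None, new, entry)
        # real_quick_ratio() and quick_ratio() are upper bounds on ratio(),
        # so entries that fail them can be skipped without the full comparison.
        if (
            matcher.real_quick_ratio() < DUPLICATE_THRESHOLD
            or matcher.quick_ratio() < DUPLICATE_THRESHOLD
        ):
            continue
        ratio = matcher.ratio()
        if ratio >= DUPLICATE_THRESHOLD and (best is None or ratio > best[1]):
            best = (entry, ratio)
    return best
```

For long entries, `autojunk=False` may be worth passing to `SequenceMatcher`. By default it treats very frequent elements in sequences of 200 or more items as junk, which can lower scores on long prose.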
All checks are advisory (non-blocking) -- the entry is still persisted.
Existing injection/exfil hard-blocks (_scan_memory_content) are unchanged
and take priority. The schema's existing DO-NOT-SAVE guidance is also
unchanged; this is belt-and-suspenders runtime reinforcement.
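The toy store below illustrates the check ordering: hard blocks run first, advisory checks run second, and the entry is persisted either way. `ToyMemoryStore`, the response keys other than `hygiene_warning`, the injection placeholder, the simplified checks, and the '; '-joined warning format are all assumptions. It is not the real MemoryStore or _scan_memory_content.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass
class ToyMemoryStore:
    """Hypothetical stand-in for MemoryStore, showing check ordering only."""

    limit: int = 2000
    entries: List[str] = field(default_factory=list)

    def _used(self) -> int:
        return sum(len(e) for e in self.entries)

    def _hard_block_reason(self, content: str) -> Optional[str]:
        # Placeholder for _scan_memory_content: one trivial injection tell.
        if "ignore previous instructions" in content.lower():
            return "possible prompt injection"
        return None

    def _hygiene_warnings(self, content: str) -> List[str]:
        # Simplified versions of the three advisory checks.
        warnings = []
        if re.search(r"\b(?:deployed|rotated|fixed)\b.*?\d{4}-\d{2}-\d{2}", content, re.I):
            warnings.append("reads like a session log")
        if (self._used() + len(content)) / self.limit >= 0.75:
            warnings.append("memory above 75% of limit")
        if any(SequenceMatcher(None, content, e).ratio() >= 0.75 for e in self.entries):
            warnings.append("near-duplicate; consider 'replace'")
        return warnings

    def add(self, content: str) -> Dict[str, object]:
        # 1. Hard blocks take priority; nothing is persisted on failure.
        reason = self._hard_block_reason(content)
        if reason:
            return {"success": False, "error": reason}
        if self._used() + len(content) > self.limit:
            return {"success": False, "error": "memory full"}
        # 2. Advisory checks run before appending, so the duplicate
        #    check never compares the entry against itself.
        warnings = self._hygiene_warnings(content)
        # 3. Persist regardless of warnings; they only annotate the response.
        self.entries.append(content)
        response: Dict[str, object] = {"success": True}
        if warnings:
            response["hygiene_warning"] = "; ".join(warnings)
        return response
```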
Tests: 16 new (10 direct unit tests for _check_hygiene, 6 integration
tests via MemoryStore.add). All 49 memory tests pass.