fix(curator): defer first run and add --dry-run preview (#18373) (#18389)

* fix(curator): defer first run and add --dry-run preview (#18373)

Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.

Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
  False. The next real pass fires one full interval_hours later (7 days
  by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
  applying automatic transitions OR permitting the LLM to call
  skill_manage / terminal mv. A DRY-RUN banner is prepended to the
  prompt and the caller skips apply_automatic_transitions. State is
  NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
  --dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
  behavior and warn that hand-written SKILL.md files share the
  'agent-created' bucket, with guidance to pin or preview before the
  first pass.

Tests:
- test_first_run_defers replaces the old 'first run always eligible'
  assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
  path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
  banner injection, and apply_automatic_transitions skipping.

Fixes #18373.

* feat(curator): pre-run backup + rollback (#18373)

Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.

Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
  snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
  by the skills hub). Extract refuses absolute paths and .. components,
  and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
  pre-rollback safety snapshot FIRST, stages the current tree into
  .rollback-staging-<ts>/ so the extract lands in an empty dir, and
  cleans the staging dir on success. A failed extract restores the
  staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
  snapshot_skills(reason='pre-curator-run') before apply_automatic_
  transitions. Best-effort — a failed snapshot logs at debug and the
  run continues (a transient disk issue shouldn't silently disable
  curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
  rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
  with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
  .md table gets the new rows.

Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not

All curator tests: 210 passing locally.
This commit is contained in:
Teknium 2026-05-01 09:49:59 -07:00 committed by GitHub
parent c5b4c48165
commit 77c0bc6b13
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
9 changed files with 1226 additions and 20 deletions

View file

@ -5433,6 +5433,45 @@ def _find_stale_dashboard_pids() -> list[int]:
return dashboard_pids
def _print_curator_first_run_notice() -> None:
"""Print a short heads-up about the skill curator after `hermes update`.
Only fires when the curator is enabled AND has no recorded run yet, which
is exactly the window where the gateway ticker used to fire Curator
against a fresh skill library immediately after an update. We defer the
first real pass by one ``interval_hours``; this notice tells the user how
to preview or disable before then. Silent on steady state.
"""
try:
from agent import curator
except Exception:
return
try:
if not curator.is_enabled():
return
state = curator.load_state()
except Exception:
return
if state.get("last_run_at"):
# Curator has run before (real or already seeded) — no notice needed.
return
try:
hours = curator.get_interval_hours()
except Exception:
hours = 24 * 7
days = max(1, hours // 24)
print()
print(" Skill curator")
print(
f" Background skill maintenance is enabled. First pass is deferred "
f"~{days}d after installation; only agent-created skills are in "
f"scope and nothing is ever auto-deleted (archive is recoverable)."
)
print(" Preview now: hermes curator run --dry-run")
print(" Pause it: hermes curator pause")
print(" Docs: https://hermes-agent.nousresearch.com/docs/user-guide/features/curator")
def _kill_stale_dashboard_processes(
reason: str = "the running backend no longer matches the updated frontend",
) -> None:
@ -5670,6 +5709,10 @@ def _update_via_zip(args):
print()
print("✓ Update complete!")
try:
_print_curator_first_run_notice()
except Exception as e:
logger.debug("Curator first-run notice failed: %s", e)
_kill_stale_dashboard_processes()
@ -7109,6 +7152,15 @@ def _cmd_update_impl(args, gateway_mode: bool):
print()
print("✓ Update complete!")
# Curator first-run heads-up. Only prints when curator is enabled AND
# has never run — i.e. the window where the ticker would otherwise
# have fired against a fresh skill library. Kept silent on steady
# state so we don't nag.
try:
_print_curator_first_run_notice()
except Exception as e:
logger.debug("Curator first-run notice failed: %s", e)
# Repair RHEL-family root installs where /usr/local/bin isn't on PATH
# for non-login interactive shells. No-op on every other platform.
try: