hermes-agent/website/docs/user-guide/checkpoints-and-rollback.md
Teknium a0fedfbb1b
feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.

Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:

1. Each working directory got its own full shadow git repo — no object
   dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
   /rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
   without ever asking for /rollback.

Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.

Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
  refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
  per-project metadata (store/projects/<hash>.json with workdir +
  created_at + last_touch). On first v2 init, any pre-v2 per-directory
  shadow repos are auto-migrated into legacy-<timestamp>/ so the new
  store starts clean. _prune() now actually rewrites the per-project ref
  to the last max_snapshots commits and runs git gc --prune=now. New
  _enforce_size_cap() drops oldest commits round-robin across projects
  when the store exceeds max_total_size_mb. _drop_oversize_from_index()
  filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
  (status / list / prune / clear / clear-legacy) for managing the store
  outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
  auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
  Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
  *.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
  AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
  coverage for the pre-v2 prune path (per-directory shadow repos under
  base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
  shared store, new defaults, migration, and the new CLI;
  reference/cli-commands.md documents 'hermes checkpoints'.

E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
  7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
  --retention-days 0' removes its ref, index, and metadata; GC reclaims
  the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
  tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
  CLI without an agent running.

Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
2026-05-06 05:44:35 -07:00

9.7 KiB

sidebar_position sidebar_label title description
8 Checkpoints & Rollback Checkpoints and /rollback Filesystem safety nets for destructive operations using shadow git repos and automatic snapshots

Checkpoints and /rollback

Hermes Agent can automatically snapshot your project before destructive operations and restore it with a single command. Checkpoints are opt-in as of v2 — most users never use /rollback, and the shadow-store storage is non-trivial over time, so the default is off.

Enable checkpoints per-session with --checkpoints:

hermes chat --checkpoints

Or enable globally in ~/.hermes/config.yaml:

checkpoints:
  enabled: true

This safety net is powered by an internal Checkpoint Manager that keeps a single shared shadow git repository under ~/.hermes/checkpoints/store/ — your real project .git is never touched. Every project the agent works in shares the same store, so git's content-addressable object DB deduplicates across projects and across turns.

What Triggers a Checkpoint

Checkpoints are taken automatically before:

  • File toolswrite_file and patch
  • Destructive terminal commandsrm, rmdir, cp, install, mv, sed -i, truncate, dd, shred, output redirects (>), and git reset/clean/checkout

The agent creates at most one checkpoint per directory per turn, so long-running sessions don't spam snapshots.

Quick Reference

In-session slash commands:

Command Description
/rollback List all checkpoints with change stats
/rollback <N> Restore to checkpoint N (also undoes last chat turn)
/rollback diff <N> Preview diff between checkpoint N and current state
/rollback <N> <file> Restore a single file from checkpoint N

CLI for inspecting and managing the store outside a session:

Command Description
hermes checkpoints Show total size, project count, per-project breakdown
hermes checkpoints status Same as bare checkpoints
hermes checkpoints list Alias for status
hermes checkpoints prune Force a sweep: delete orphans/stale, GC, enforce size cap
hermes checkpoints clear Nuke the entire checkpoint base (asks first)
hermes checkpoints clear-legacy Delete only the legacy-* archives from v1 migration

How Checkpoints Work

At a high level:

  • Hermes detects when tools are about to modify files in your working tree.
  • Once per conversation turn (per directory), it:
    • Resolves a reasonable project root for the file.
    • Initialises or reuses the single shared shadow store at ~/.hermes/checkpoints/store/.
    • Stages into a per-project index, builds a tree, and commits to a per-project ref (refs/hermes/<project-hash>).
  • These per-project refs form a checkpoint history that you can inspect and restore via /rollback.
flowchart LR
  user["User command\n(hermes, gateway)"]
  agent["AIAgent\n(run_agent.py)"]
  tools["File & terminal tools"]
  cpMgr["CheckpointManager"]
  store["Shared shadow store\n~/.hermes/checkpoints/store/"]

  user --> agent
  agent -->|"tool call"| tools
  tools -->|"before mutate\nensure_checkpoint()"| cpMgr
  cpMgr -->|"git add/commit-tree/update-ref"| store
  cpMgr -->|"OK / skipped"| tools
  tools -->|"apply changes"| agent

Configuration

Configure in ~/.hermes/config.yaml:

checkpoints:
  enabled: false              # master switch (default: false — opt-in)
  max_snapshots: 20           # max checkpoints per project (enforced via ref rewrite + gc)
  max_total_size_mb: 500      # hard cap on total store size; oldest commits dropped
  max_file_size_mb: 10        # skip any single file larger than this

  # Auto-maintenance (on by default): sweep ~/.hermes/checkpoints/ at startup
  # and delete project entries whose working directory no longer exists
  # (orphans) or whose last_touch is older than retention_days. Runs at most
  # once per min_interval_hours, tracked via a .last_prune marker.
  auto_prune: true
  retention_days: 7
  delete_orphans: true
  min_interval_hours: 24

To disable everything:

checkpoints:
  enabled: false
  auto_prune: false

When enabled: false, the Checkpoint Manager is a no-op and never attempts git operations. When auto_prune: false, the store grows until you run hermes checkpoints prune manually.

Listing Checkpoints

From a CLI session:

/rollback

Hermes responds with a formatted list showing change statistics:

📸 Checkpoints for /path/to/project:

  1. 4270a8c  2026-03-16 04:36  before patch  (1 file, +1/-0)
  2. eaf4c1f  2026-03-16 04:35  before write_file
  3. b3f9d2e  2026-03-16 04:34  before terminal: sed -i s/old/new/ config.py  (1 file, +1/-1)

  /rollback <N>             restore to checkpoint N
  /rollback diff <N>        preview changes since checkpoint N
  /rollback <N> <file>      restore a single file from checkpoint N

Inspecting the Store from the Shell

hermes checkpoints

Sample output:

Checkpoint base: /home/you/.hermes/checkpoints
Total size:      142.3 MB
  store/         138.1 MB
  legacy-*       4.2 MB
Projects:        12

  WORKDIR                                                       COMMITS    LAST TOUCH  STATE
  /home/you/code/hermes-agent                                        20       2h ago  live
  /home/you/code/experiments/rl-runner                                8       1d ago  live
  /home/you/code/old-prototype                                        3       9d ago  orphan
  ...

Legacy archives (1):
  legacy-20260506-050616                           4.2 MB

Clear with: hermes checkpoints clear-legacy

Force a full sweep (ignores the 24h idempotency marker):

hermes checkpoints prune --retention-days 3 --max-size-mb 200

Previewing Changes with /rollback diff

Before committing to a restore, preview what has changed since a checkpoint:

/rollback diff 1

This shows a git diff stat summary followed by the actual diff.

Restoring with /rollback

/rollback 1

Behind the scenes, Hermes:

  1. Verifies the target commit exists in the shadow store.
  2. Takes a pre-rollback snapshot of the current state so you can "undo the undo" later.
  3. Restores tracked files in your working directory.
  4. Undoes the last conversation turn so the agent's context matches the restored filesystem state.

Single-File Restore

Restore just one file from a checkpoint without affecting the rest of the directory:

/rollback 1 src/broken_file.py

Safety and Performance Guards

  • Git availability — if git is not found on PATH, checkpoints are transparently disabled.
  • Directory scope — Hermes skips overly broad directories (root /, home $HOME).
  • Repository size — directories with more than 50,000 files are skipped.
  • Per-file size cap — files larger than max_file_size_mb (default 10 MB) are excluded from the snapshot. Prevents accidentally swallowing datasets, model weights, or generated media.
  • Total store size cap — when the store exceeds max_total_size_mb (default 500 MB), the oldest commit per project is dropped round-robin until under the cap.
  • Real pruningmax_snapshots is enforced by rewriting the per-project ref and running git gc --prune=now afterwards, so loose objects don't accumulate.
  • No-change snapshots — if there are no changes since the last snapshot, the checkpoint is skipped.
  • Non-fatal errors — all errors inside the Checkpoint Manager are logged at debug level; your tools continue to run.

Where Checkpoints Live

~/.hermes/checkpoints/
  ├── store/                 # single shared bare git repo
  │   ├── HEAD, objects/     # git internals (shared across projects)
  │   ├── refs/hermes/<hash> # per-project branch tip
  │   ├── indexes/<hash>     # per-project git index
  │   ├── projects/<hash>.json  # workdir + created_at + last_touch
  │   └── info/exclude
  ├── .last_prune            # auto-prune idempotency marker
  └── legacy-<ts>/           # archived pre-v2 per-project shadow repos

Each <hash> is derived from the absolute path of the working directory. You normally never need to touch these manually — use hermes checkpoints status / prune / clear instead.

Migration from v1

Before the v2 rewrite, each working directory got its own complete shadow git repo directly under ~/.hermes/checkpoints/<hash>/. That layout couldn't dedup objects across projects and had a documented no-op pruner — the store would grow without bound.

On first v2 run, any pre-v2 shadow repos are moved into ~/.hermes/checkpoints/legacy-<timestamp>/ so the new single-store layout starts clean. Old /rollback history is still reachable by manually inspecting the legacy archive with git; once you're confident you don't need it, run:

hermes checkpoints clear-legacy

to reclaim the space. Legacy archives are also swept by auto_prune after retention_days.

Best Practices

  • Enable checkpoints only when you need themhermes chat --checkpoints or per-profile enabled: true.
  • Use /rollback diff before restoring — preview what will change to pick the right checkpoint.
  • Use /rollback instead of git reset when you want to undo agent-driven changes only.
  • Check hermes checkpoints status occasionally if you use checkpoints regularly — shows which projects are active and what the store costs you.
  • Combine with Git worktrees for maximum safety — keep each Hermes session in its own worktree/branch, with checkpoints as an extra layer.

For running multiple agents in parallel on the same repo, see the guide on Git worktrees.