mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-08 03:01:47 +00:00
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.
Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:
1. Each working directory got its own full shadow git repo — no object
dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
/rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
without ever asking for /rollback.
Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.
Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
per-project metadata (store/projects/<hash>.json with workdir +
created_at + last_touch). On first v2 init, any pre-v2 per-directory
shadow repos are auto-migrated into legacy-<timestamp>/ so the new
store starts clean. _prune() now actually rewrites the per-project ref
to the last max_snapshots commits and runs git gc --prune=now. New
_enforce_size_cap() drops oldest commits round-robin across projects
when the store exceeds max_total_size_mb. _drop_oversize_from_index()
filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
(status / list / prune / clear / clear-legacy) for managing the store
outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
*.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
coverage for the pre-v2 prune path (per-directory shadow repos under
base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
shared store, new defaults, migration, and the new CLI;
reference/cli-commands.md documents 'hermes checkpoints'.
E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
--retention-days 0' removes its ref, index, and metadata; GC reclaims
the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
CLI without an agent running.
Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
249 lines
9.7 KiB
Markdown
249 lines
9.7 KiB
Markdown
---
|
|
sidebar_position: 8
|
|
sidebar_label: "Checkpoints & Rollback"
|
|
title: "Checkpoints and /rollback"
|
|
description: "Filesystem safety nets for destructive operations using shadow git repos and automatic snapshots"
|
|
---
|
|
|
|
# Checkpoints and `/rollback`
|
|
|
|
Hermes Agent can automatically snapshot your project before **destructive operations** and restore it with a single command. Checkpoints are **opt-in** as of v2 — most users never use `/rollback`, and the shadow-store storage is non-trivial over time, so the default is off.
|
|
|
|
Enable checkpoints per-session with `--checkpoints`:
|
|
|
|
```bash
|
|
hermes chat --checkpoints
|
|
```
|
|
|
|
Or enable globally in `~/.hermes/config.yaml`:
|
|
|
|
```yaml
|
|
checkpoints:
|
|
enabled: true
|
|
```
|
|
|
|
This safety net is powered by an internal **Checkpoint Manager** that keeps a single shared shadow git repository under `~/.hermes/checkpoints/store/` — your real project `.git` is never touched. Every project the agent works in shares the same store, so git's content-addressable object DB deduplicates across projects and across turns.
|
|
|
|
## What Triggers a Checkpoint
|
|
|
|
Checkpoints are taken automatically before:
|
|
|
|
- **File tools** — `write_file` and `patch`
|
|
- **Destructive terminal commands** — `rm`, `rmdir`, `cp`, `install`, `mv`, `sed -i`, `truncate`, `dd`, `shred`, output redirects (`>`), and `git reset`/`clean`/`checkout`
|
|
|
|
The agent creates **at most one checkpoint per directory per turn**, so long-running sessions don't spam snapshots.
|
|
|
|
## Quick Reference
|
|
|
|
In-session slash commands:
|
|
|
|
| Command | Description |
|
|
|---------|-------------|
|
|
| `/rollback` | List all checkpoints with change stats |
|
|
| `/rollback <N>` | Restore to checkpoint N (also undoes last chat turn) |
|
|
| `/rollback diff <N>` | Preview diff between checkpoint N and current state |
|
|
| `/rollback <N> <file>` | Restore a single file from checkpoint N |
|
|
|
|
CLI for inspecting and managing the store outside a session:
|
|
|
|
| Command | Description |
|
|
|---------|-------------|
|
|
| `hermes checkpoints` | Show total size, project count, per-project breakdown |
|
|
| `hermes checkpoints status` | Same as bare `checkpoints` |
|
|
| `hermes checkpoints list` | Alias for `status` |
|
|
| `hermes checkpoints prune` | Force a sweep: delete orphans/stale, GC, enforce size cap |
|
|
| `hermes checkpoints clear` | Nuke the entire checkpoint base (asks first) |
|
|
| `hermes checkpoints clear-legacy` | Delete only the `legacy-*` archives from v1 migration |
|
|
|
|
## How Checkpoints Work
|
|
|
|
At a high level:
|
|
|
|
- Hermes detects when tools are about to **modify files** in your working tree.
|
|
- Once per conversation turn (per directory), it:
|
|
- Resolves a reasonable project root for the file.
|
|
- Initialises or reuses the **single shared shadow store** at `~/.hermes/checkpoints/store/`.
|
|
- Stages into a per-project index, builds a tree, and commits to a per-project ref (`refs/hermes/<project-hash>`).
|
|
- These per-project refs form a checkpoint history that you can inspect and restore via `/rollback`.
|
|
|
|
```mermaid
|
|
flowchart LR
|
|
user["User command\n(hermes, gateway)"]
|
|
agent["AIAgent\n(run_agent.py)"]
|
|
tools["File & terminal tools"]
|
|
cpMgr["CheckpointManager"]
|
|
store["Shared shadow store\n~/.hermes/checkpoints/store/"]
|
|
|
|
user --> agent
|
|
agent -->|"tool call"| tools
|
|
tools -->|"before mutate\nensure_checkpoint()"| cpMgr
|
|
cpMgr -->|"git add/commit-tree/update-ref"| store
|
|
cpMgr -->|"OK / skipped"| tools
|
|
tools -->|"apply changes"| agent
|
|
```
|
|
|
|
## Configuration
|
|
|
|
Configure in `~/.hermes/config.yaml`:
|
|
|
|
```yaml
|
|
checkpoints:
|
|
enabled: false # master switch (default: false — opt-in)
|
|
max_snapshots: 20 # max checkpoints per project (enforced via ref rewrite + gc)
|
|
max_total_size_mb: 500 # hard cap on total store size; oldest commits dropped
|
|
max_file_size_mb: 10 # skip any single file larger than this
|
|
|
|
# Auto-maintenance (on by default): sweep ~/.hermes/checkpoints/ at startup
|
|
# and delete project entries whose working directory no longer exists
|
|
# (orphans) or whose last_touch is older than retention_days. Runs at most
|
|
# once per min_interval_hours, tracked via a .last_prune marker.
|
|
auto_prune: true
|
|
retention_days: 7
|
|
delete_orphans: true
|
|
min_interval_hours: 24
|
|
```
|
|
|
|
To disable everything:
|
|
|
|
```yaml
|
|
checkpoints:
|
|
enabled: false
|
|
auto_prune: false
|
|
```
|
|
|
|
When `enabled: false`, the Checkpoint Manager is a no-op and never attempts git operations. When `auto_prune: false`, the store grows until you run `hermes checkpoints prune` manually.
|
|
|
|
## Listing Checkpoints
|
|
|
|
From a CLI session:
|
|
|
|
```
|
|
/rollback
|
|
```
|
|
|
|
Hermes responds with a formatted list showing change statistics:
|
|
|
|
```text
|
|
📸 Checkpoints for /path/to/project:
|
|
|
|
1. 4270a8c 2026-03-16 04:36 before patch (1 file, +1/-0)
|
|
2. eaf4c1f 2026-03-16 04:35 before write_file
|
|
3. b3f9d2e 2026-03-16 04:34 before terminal: sed -i s/old/new/ config.py (1 file, +1/-1)
|
|
|
|
/rollback <N> restore to checkpoint N
|
|
/rollback diff <N> preview changes since checkpoint N
|
|
/rollback <N> <file> restore a single file from checkpoint N
|
|
```
|
|
|
|
## Inspecting the Store from the Shell
|
|
|
|
```bash
|
|
hermes checkpoints
|
|
```
|
|
|
|
Sample output:
|
|
|
|
```text
|
|
Checkpoint base: /home/you/.hermes/checkpoints
|
|
Total size: 142.3 MB
|
|
store/ 138.1 MB
|
|
legacy-* 4.2 MB
|
|
Projects: 12
|
|
|
|
WORKDIR COMMITS LAST TOUCH STATE
|
|
/home/you/code/hermes-agent 20 2h ago live
|
|
/home/you/code/experiments/rl-runner 8 1d ago live
|
|
/home/you/code/old-prototype 3 9d ago orphan
|
|
...
|
|
|
|
Legacy archives (1):
|
|
legacy-20260506-050616 4.2 MB
|
|
|
|
Clear with: hermes checkpoints clear-legacy
|
|
```
|
|
|
|
Force a full sweep (ignores the 24h idempotency marker):
|
|
|
|
```bash
|
|
hermes checkpoints prune --retention-days 3 --max-size-mb 200
|
|
```
|
|
|
|
## Previewing Changes with `/rollback diff`
|
|
|
|
Before committing to a restore, preview what has changed since a checkpoint:
|
|
|
|
```
|
|
/rollback diff 1
|
|
```
|
|
|
|
This shows a git diff stat summary followed by the actual diff.
|
|
|
|
## Restoring with `/rollback`
|
|
|
|
```
|
|
/rollback 1
|
|
```
|
|
|
|
Behind the scenes, Hermes:
|
|
|
|
1. Verifies the target commit exists in the shadow store.
|
|
2. Takes a **pre-rollback snapshot** of the current state so you can "undo the undo" later.
|
|
3. Restores tracked files in your working directory.
|
|
4. **Undoes the last conversation turn** so the agent's context matches the restored filesystem state.
|
|
|
|
## Single-File Restore
|
|
|
|
Restore just one file from a checkpoint without affecting the rest of the directory:
|
|
|
|
```
|
|
/rollback 1 src/broken_file.py
|
|
```
|
|
|
|
## Safety and Performance Guards
|
|
|
|
- **Git availability** — if `git` is not found on `PATH`, checkpoints are transparently disabled.
|
|
- **Directory scope** — Hermes skips overly broad directories (root `/`, home `$HOME`).
|
|
- **Repository size** — directories with more than 50,000 files are skipped.
|
|
- **Per-file size cap** — files larger than `max_file_size_mb` (default 10 MB) are excluded from the snapshot. Prevents accidentally swallowing datasets, model weights, or generated media.
|
|
- **Total store size cap** — when the store exceeds `max_total_size_mb` (default 500 MB), the oldest commit per project is dropped round-robin until under the cap.
|
|
- **Real pruning** — `max_snapshots` is enforced by rewriting the per-project ref and running `git gc --prune=now` afterwards, so loose objects don't accumulate.
|
|
- **No-change snapshots** — if there are no changes since the last snapshot, the checkpoint is skipped.
|
|
- **Non-fatal errors** — all errors inside the Checkpoint Manager are logged at debug level; your tools continue to run.
|
|
|
|
## Where Checkpoints Live
|
|
|
|
```text
|
|
~/.hermes/checkpoints/
|
|
├── store/ # single shared bare git repo
|
|
│ ├── HEAD, objects/ # git internals (shared across projects)
|
|
│ ├── refs/hermes/<hash> # per-project branch tip
|
|
│ ├── indexes/<hash> # per-project git index
|
|
│ ├── projects/<hash>.json # workdir + created_at + last_touch
|
|
│ └── info/exclude
|
|
├── .last_prune # auto-prune idempotency marker
|
|
└── legacy-<ts>/ # archived pre-v2 per-project shadow repos
|
|
```
|
|
|
|
Each `<hash>` is derived from the absolute path of the working directory. You normally never need to touch these manually — use `hermes checkpoints status` / `prune` / `clear` instead.
|
|
|
|
### Migration from v1
|
|
|
|
Before the v2 rewrite, each working directory got its own complete shadow git repo directly under `~/.hermes/checkpoints/<hash>/`. That layout couldn't dedup objects across projects and had a documented no-op pruner — the store would grow without bound.
|
|
|
|
On first v2 run, any pre-v2 shadow repos are moved into `~/.hermes/checkpoints/legacy-<timestamp>/` so the new single-store layout starts clean. Old `/rollback` history is still reachable by manually inspecting the legacy archive with `git`; once you're confident you don't need it, run:
|
|
|
|
```bash
|
|
hermes checkpoints clear-legacy
|
|
```
|
|
|
|
to reclaim the space. Legacy archives are also swept by `auto_prune` after `retention_days`.
|
|
|
|
## Best Practices
|
|
|
|
- **Enable checkpoints only when you need them** — `hermes chat --checkpoints` or per-profile `enabled: true`.
|
|
- **Use `/rollback diff` before restoring** — preview what will change to pick the right checkpoint.
|
|
- **Use `/rollback` instead of `git reset`** when you want to undo agent-driven changes only.
|
|
- **Check `hermes checkpoints status` occasionally** if you use checkpoints regularly — shows which projects are active and what the store costs you.
|
|
- **Combine with Git worktrees** for maximum safety — keep each Hermes session in its own worktree/branch, with checkpoints as an extra layer.
|
|
|
|
For running multiple agents in parallel on the same repo, see the guide on [Git worktrees](./git-worktrees.md).
|