mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-08 03:01:47 +00:00)
feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)

Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.

Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:

1. Each working directory got its own full shadow git repo — no object
   dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
   /rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
   without ever asking for /rollback.

Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.

Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
  refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
  per-project metadata (store/projects/<hash>.json with workdir +
  created_at + last_touch). On first v2 init, any pre-v2 per-directory
  shadow repos are auto-migrated into legacy-<timestamp>/ so the new
  store starts clean. _prune() now actually rewrites the per-project ref
  to the last max_snapshots commits and runs git gc --prune=now. New
  _enforce_size_cap() drops oldest commits round-robin across projects
  when the store exceeds max_total_size_mb. _drop_oversize_from_index()
  filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
  (status / list / prune / clear / clear-legacy) for managing the store
  outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
  auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
  Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
  *.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
  AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
  coverage for the pre-v2 prune path (per-directory shadow repos under
  base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
  shared store, new defaults, migration, and the new CLI;
  reference/cli-commands.md documents 'hermes checkpoints'.

E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
  7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
  --retention-days 0' removes its ref, index, and metadata; GC reclaims
  the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
  tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
  CLI without an agent running.

Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.

parent b045e7a2ba
commit a0fedfbb1b
10 changed files with 1965 additions and 715 deletions

@@ -54,6 +54,7 @@ hermes [global-options] <command> [subcommand/options]
| `hermes dump` | Copy-pasteable setup summary for support/debugging. |
| `hermes debug` | Debug tools — upload logs and system info for support. |
| `hermes backup` | Back up Hermes home directory to a zip file. |
| `hermes checkpoints` | Inspect / prune / clear `~/.hermes/checkpoints/` (the shadow store used by `/rollback`). Run with no args for a status overview. |
| `hermes import` | Restore a Hermes backup from a zip file. |
| `hermes logs` | View, tail, and filter agent/gateway/error log files. |
| `hermes config` | Show, edit, migrate, and query configuration files. |
@@ -579,6 +580,44 @@ hermes backup --quick # Quick state-only snapshot
hermes backup --quick --label "pre-upgrade" # Quick snapshot with label
```

## `hermes checkpoints`

```bash
hermes checkpoints [COMMAND]
```

Inspect and manage the shadow git store at `~/.hermes/checkpoints/` — the storage layer behind the in-session `/rollback` command. Safe to run any time; does not require the agent to be running.

| Subcommand | Description |
|------------|-------------|
| `status` (default) | Show total size, project count, and per-project breakdown. Bare `hermes checkpoints` is equivalent. |
| `list` | Alias for `status`. |
| `prune` | Force a cleanup sweep — delete orphan and stale projects, GC the store, enforce the size cap. Ignores the 24h idempotency marker. |
| `clear` | Delete the entire checkpoint base. Irreversible; asks for confirmation unless `-f`. |
| `clear-legacy` | Delete only the `legacy-<timestamp>/` archives produced by the v1→v2 migration. |

### Options

| Option | Subcommand | Description |
|--------|------------|-------------|
| `--limit N` | `status`, `list` | Max projects to list (default 20). |
| `--retention-days N` | `prune` | Drop projects whose `last_touch` is older than N days (default 7). |
| `--max-size-mb N` | `prune` | After the orphan/stale pass, drop the oldest commit per project until total store size ≤ N MB (default 500). |
| `--keep-orphans` | `prune` | Skip deleting projects whose working directory no longer exists. |
| `-f`, `--force` | `clear`, `clear-legacy` | Skip the confirmation prompt. |
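
To make the option-to-subcommand mapping concrete, here is a minimal sketch of how the tables above could be expressed with Python's standard `argparse`. It is not the contents of `hermes_cli/checkpoints.py`; the helper names `build_parser` and `parse` are hypothetical, and only the flags and defaults listed in the tables are taken from the documentation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented subcommands and options."""
    parser = argparse.ArgumentParser(prog="hermes checkpoints")
    sub = parser.add_subparsers(dest="command")

    # `status` and its alias `list` share the --limit option.
    for name in ("status", "list"):
        p = sub.add_parser(name)
        p.add_argument("--limit", type=int, default=20)

    prune = sub.add_parser("prune")
    prune.add_argument("--retention-days", type=int, default=7)
    prune.add_argument("--max-size-mb", type=int, default=500)
    prune.add_argument("--keep-orphans", action="store_true")

    # Both destructive subcommands accept -f/--force to skip the prompt.
    for name in ("clear", "clear-legacy"):
        p = sub.add_parser(name)
        p.add_argument("-f", "--force", action="store_true")

    return parser


def parse(argv):
    """Parse argv, treating a bare invocation as `status`."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        args = parser.parse_args(["status"])
    return args
```

In this sketch a bare invocation is re-parsed as `status` so that the `--limit` default is populated, which matches the documented "bare `hermes checkpoints` is equivalent" behaviour.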

### Examples

```bash
hermes checkpoints                            # status overview
hermes checkpoints prune --retention-days 3   # aggressive cleanup
hermes checkpoints prune --max-size-mb 200    # tighten size cap once
hermes checkpoints clear-legacy -f            # drop v1 archive dirs
hermes checkpoints clear -f                   # wipe everything
```

See [Checkpoints and `/rollback`](../user-guide/checkpoints-and-rollback.md) for the full architecture and the in-session commands.

## `hermes import`

```bash

@@ -7,9 +7,22 @@ description: "Filesystem safety nets for destructive operations using shadow git

# Checkpoints and `/rollback`

Hermes Agent automatically snapshots your project before **destructive operations** and lets you restore it with a single command. Checkpoints are **enabled by default** — there's zero cost when no file-mutating tools fire.
Hermes Agent can automatically snapshot your project before **destructive operations** and restore it with a single command. Checkpoints are **opt-in** as of v2 — most users never use `/rollback`, and the shadow-store storage is non-trivial over time, so the default is off.

This safety net is powered by an internal **Checkpoint Manager** that keeps a separate shadow git repository under `~/.hermes/checkpoints/` — your real project `.git` is never touched.
Enable checkpoints per-session with `--checkpoints`:

```bash
hermes chat --checkpoints
```

Or enable globally in `~/.hermes/config.yaml`:

```yaml
checkpoints:
  enabled: true
```

This safety net is powered by an internal **Checkpoint Manager** that keeps a single shared shadow git repository under `~/.hermes/checkpoints/store/` — your real project `.git` is never touched. Every project the agent works in shares the same store, so git's content-addressable object DB deduplicates across projects and across turns.
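
The deduplication follows from how git names objects: a blob's ID is a hash of its content plus a small header, so identical files in different projects map to the same object. The sketch below reproduces that ID in plain Python, assuming a store that uses git's default SHA-1 object format; `git_blob_id` is an illustrative helper, not part of Hermes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID git assigns to a blob with this content (SHA-1 format)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The same file content seen in two different projects yields one object ID,
# so a shared object database only needs to store it once.
shared_in_project_a = b"def helper():\n    return 42\n"
shared_in_project_b = b"def helper():\n    return 42\n"
unique_ids = {git_blob_id(shared_in_project_a), git_blob_id(shared_in_project_b)}
print(len(unique_ids))
```

The file path is not part of the blob ID, which is why the location of the file inside each project does not matter for deduplication.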

## What Triggers a Checkpoint

@@ -22,6 +35,8 @@ The agent creates **at most one checkpoint per directory per turn**, so long-run

## Quick Reference

In-session slash commands:

| Command | Description |
|---------|-------------|
| `/rollback` | List all checkpoints with change stats |
@@ -29,6 +44,17 @@ The agent creates **at most one checkpoint per directory per turn**, so long-run
| `/rollback diff <N>` | Preview diff between checkpoint N and current state |
| `/rollback <N> <file>` | Restore a single file from checkpoint N |

CLI for inspecting and managing the store outside a session:

| Command | Description |
|---------|-------------|
| `hermes checkpoints` | Show total size, project count, per-project breakdown |
| `hermes checkpoints status` | Same as bare `checkpoints` |
| `hermes checkpoints list` | Alias for `status` |
| `hermes checkpoints prune` | Force a sweep: delete orphans/stale, GC, enforce size cap |
| `hermes checkpoints clear` | Nuke the entire checkpoint base (asks first) |
| `hermes checkpoints clear-legacy` | Delete only the `legacy-*` archives from v1 migration |

## How Checkpoints Work

At a high level:
@@ -36,9 +62,9 @@ At a high level:
- Hermes detects when tools are about to **modify files** in your working tree.
- Once per conversation turn (per directory), it:
  - Resolves a reasonable project root for the file.
  - Initialises or reuses a **shadow git repo** tied to that directory.
  - Stages and commits the current state with a short, human‑readable reason.
- These commits form a checkpoint history that you can inspect and restore via `/rollback`.
  - Initialises or reuses the **single shared shadow store** at `~/.hermes/checkpoints/store/`.
  - Stages into a per-project index, builds a tree, and commits to a per-project ref (`refs/hermes/<project-hash>`).
- These per-project refs form a checkpoint history that you can inspect and restore via `/rollback`.
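
The v2 steps above (per-project index, tree, commit, per-project ref) can be sketched with git plumbing commands driven from Python. This is an illustration under stated assumptions, not the code in `tools/checkpoint_manager.py`: the `project_hash` scheme (truncated SHA-256 of the absolute path), the `snapshot` function, and the `hermes@localhost` identity are all hypothetical. Only the store layout, the `refs/hermes/<hash>` naming, and the add / commit-tree / update-ref sequence come from the documentation.

```python
import hashlib
import os
import subprocess
from pathlib import Path


def project_hash(workdir: Path) -> str:
    """Stable ID for a working directory (assumed scheme, not the real one)."""
    return hashlib.sha256(str(workdir.resolve()).encode()).hexdigest()[:16]


def snapshot(store: Path, workdir: Path, reason: str) -> str:
    """Commit the current state of workdir to refs/hermes/<hash> inside store."""
    store, workdir = store.resolve(), workdir.resolve()
    key = project_hash(workdir)
    if not (store / "HEAD").exists():
        store.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "init", "--bare", "--quiet", str(store)], check=True)
    (store / "indexes").mkdir(exist_ok=True)

    # Point git at the shared store, the project's files, and a private index.
    env = dict(
        os.environ,
        GIT_DIR=str(store),
        GIT_WORK_TREE=str(workdir),
        GIT_INDEX_FILE=str(store / "indexes" / key),
        GIT_AUTHOR_NAME="hermes", GIT_AUTHOR_EMAIL="hermes@localhost",
        GIT_COMMITTER_NAME="hermes", GIT_COMMITTER_EMAIL="hermes@localhost",
    )

    def git(*args: str, check: bool = True) -> str:
        result = subprocess.run(["git", *args], env=env, cwd=workdir,
                                check=check, capture_output=True, text=True)
        return result.stdout.strip()

    git("add", "-A")                      # stage into the per-project index
    tree = git("write-tree")              # build a tree object from that index
    ref = f"refs/hermes/{key}"
    parent = git("rev-parse", "--verify", "--quiet", ref, check=False)
    cmd = ["commit-tree", tree, "-m", reason]
    if parent:                            # chain onto the previous checkpoint
        cmd += ["-p", parent]
    commit = git(*cmd)
    git("update-ref", ref, commit)        # move the per-project ref to the new tip
    return commit
```

Because every project writes into the same object database and differs only in its index file and ref, two projects never see each other's history, yet identical file contents are stored once.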

```mermaid
flowchart LR
@@ -46,44 +72,46 @@ flowchart LR
  agent["AIAgent\n(run_agent.py)"]
  tools["File & terminal tools"]
  cpMgr["CheckpointManager"]
  shadowRepo["Shadow git repo\n~/.hermes/checkpoints/<hash>"]
  store["Shared shadow store\n~/.hermes/checkpoints/store/"]

  user --> agent
  agent -->|"tool call"| tools
  tools -->|"before mutate\nensure_checkpoint()"| cpMgr
  cpMgr -->|"git add/commit"| shadowRepo
  cpMgr -->|"git add/commit-tree/update-ref"| store
  cpMgr -->|"OK / skipped"| tools
  tools -->|"apply changes"| agent
```

## Configuration

Checkpoints are enabled by default. Configure in `~/.hermes/config.yaml`:
Configure in `~/.hermes/config.yaml`:

```yaml
checkpoints:
  enabled: true          # master switch (default: true)
  max_snapshots: 50      # max checkpoints per directory
  enabled: false         # master switch (default: false — opt-in)
  max_snapshots: 20      # max checkpoints per project (enforced via ref rewrite + gc)
  max_total_size_mb: 500 # hard cap on total store size; oldest commits dropped
  max_file_size_mb: 10   # skip any single file larger than this

  # Auto-maintenance (opt-in): sweep ~/.hermes/checkpoints/ at startup
  # and delete shadow repos whose working directory no longer exists
  # (orphans) or whose newest commit is older than retention_days.
  # Runs at most once per min_interval_hours, tracked via a
  # .last_prune marker inside ~/.hermes/checkpoints/.
  auto_prune: false      # default off — enable to reclaim disk
  # Auto-maintenance (on by default): sweep ~/.hermes/checkpoints/ at startup
  # and delete project entries whose working directory no longer exists
  # (orphans) or whose last_touch is older than retention_days. Runs at most
  # once per min_interval_hours, tracked via a .last_prune marker.
  auto_prune: true
  retention_days: 7
  delete_orphans: true   # delete repos whose workdir is gone
  delete_orphans: true
  min_interval_hours: 24
```

To disable:
To disable everything:

```yaml
checkpoints:
  enabled: false
  auto_prune: false
```

When disabled, the Checkpoint Manager is a no‑op and never attempts git operations.
When `enabled: false`, the Checkpoint Manager is a no-op and never attempts git operations. When `auto_prune: false`, the store grows until you run `hermes checkpoints prune` manually.
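
The auto-maintenance settings boil down to two decisions: whether a sweep is due at all, and whether a given project entry should go. The sketch below shows one way to express both. It assumes the `.last_prune` marker is compared by file modification time and that `last_touch` is stored as epoch seconds; neither detail is stated in the documentation, and the function names `prune_due` and `should_delete` are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def prune_due(marker: Path, min_interval_hours: float,
              now: Optional[float] = None) -> bool:
    """True if no sweep has run within the last min_interval_hours.

    Assumption: the marker's mtime records the time of the last sweep.
    """
    now = time.time() if now is None else now
    if not marker.exists():
        return True
    return now - marker.stat().st_mtime >= min_interval_hours * 3600


def should_delete(meta: dict, now: float, retention_days: int = 7,
                  delete_orphans: bool = True) -> bool:
    """Decide whether one project entry is an orphan or stale.

    Assumption: meta["last_touch"] is epoch seconds; meta["workdir"] is a path.
    """
    if delete_orphans and not Path(meta["workdir"]).exists():
        return True  # orphan: the working directory is gone
    return now - meta["last_touch"] > retention_days * 86400  # stale
```

With `delete_orphans=False` (the config equivalent of `--keep-orphans`), a project whose directory has vanished is only removed once it also exceeds `retention_days`.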

## Listing Checkpoints

@@ -107,12 +135,38 @@ Hermes responds with a formatted list showing change statistics:
  /rollback <N> <file>   restore a single file from checkpoint N
```

Each entry shows:
## Inspecting the Store from the Shell

- Short hash
- Timestamp
- Reason (what triggered the snapshot)
- Change summary (files changed, insertions/deletions)
```bash
hermes checkpoints
```

Sample output:

```text
Checkpoint base: /home/you/.hermes/checkpoints
Total size:      142.3 MB
  store/         138.1 MB
  legacy-*       4.2 MB
Projects:        12

  WORKDIR                                COMMITS  LAST TOUCH  STATE
  /home/you/code/hermes-agent            20       2h ago      live
  /home/you/code/experiments/rl-runner   8        1d ago      live
  /home/you/code/old-prototype           3        9d ago      orphan
  ...

Legacy archives (1):
  legacy-20260506-050616   4.2 MB

Clear with: hermes checkpoints clear-legacy
```

Force a full sweep (ignores the 24h idempotency marker):

```bash
hermes checkpoints prune --retention-days 3 --max-size-mb 200
```

## Previewing Changes with `/rollback diff`

@@ -122,49 +176,21 @@ Before committing to a restore, preview what has changed since a checkpoint:
/rollback diff 1
```

This shows a git diff stat summary followed by the actual diff:

```text
 test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test.py b/test.py
--- a/test.py
+++ b/test.py
@@ -1 +1 @@
-print('original content')
+print('modified content')
```

Long diffs are capped at 80 lines to avoid flooding the terminal.
This shows a git diff stat summary followed by the actual diff.

## Restoring with `/rollback`

Restore to a checkpoint by number:

```
/rollback 1
```

Behind the scenes, Hermes:

1. Verifies the target commit exists in the shadow repo.
2. Takes a **pre‑rollback snapshot** of the current state so you can "undo the undo" later.
1. Verifies the target commit exists in the shadow store.
2. Takes a **pre-rollback snapshot** of the current state so you can "undo the undo" later.
3. Restores tracked files in your working directory.
4. **Undoes the last conversation turn** so the agent's context matches the restored filesystem state.

On success:

```text
✅ Restored to checkpoint 4270a8c5: before patch
A pre-rollback snapshot was saved automatically.
(^_^)b Undid 4 message(s). Removed: "Now update test.py to ..."
4 message(s) remaining in history.
Chat turn undone to match restored file state.
```

The conversation undo ensures the agent doesn't "remember" changes that have been rolled back, avoiding confusion on the next turn.

## Single-File Restore

Restore just one file from a checkpoint without affecting the rest of the directory:

@@ -173,42 +199,51 @@ Restore just one file from a checkpoint without affecting the rest of the direct
/rollback 1 src/broken_file.py
```

This is useful when the agent made changes to multiple files but only one needs to be reverted.

## Safety and Performance Guards

To keep checkpointing safe and fast, Hermes applies several guardrails:

- **Git availability** — if `git` is not found on `PATH`, checkpoints are transparently disabled.
- **Directory scope** — Hermes skips overly broad directories (root `/`, home `$HOME`).
- **Repository size** — directories with more than 50,000 files are skipped to avoid slow git operations.
- **No‑change snapshots** — if there are no changes since the last snapshot, the checkpoint is skipped.
- **Non‑fatal errors** — all errors inside the Checkpoint Manager are logged at debug level; your tools continue to run.
- **Repository size** — directories with more than 50,000 files are skipped.
- **Per-file size cap** — files larger than `max_file_size_mb` (default 10 MB) are excluded from the snapshot. Prevents accidentally swallowing datasets, model weights, or generated media.
- **Total store size cap** — when the store exceeds `max_total_size_mb` (default 500 MB), the oldest commit per project is dropped round-robin until under the cap.
- **Real pruning** — `max_snapshots` is enforced by rewriting the per-project ref and running `git gc --prune=now` afterwards, so loose objects don't accumulate.
- **No-change snapshots** — if there are no changes since the last snapshot, the checkpoint is skipped.
- **Non-fatal errors** — all errors inside the Checkpoint Manager are logged at debug level; your tools continue to run.
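
The round-robin policy behind the total store size cap can be modelled in a few lines. The sketch below is a simplified model rather than the real `_enforce_size_cap()`: it treats each commit as having a fixed, independent size, whereas in a deduplicated store dropping a commit frees only the objects no other commit references, so a real implementation would presumably re-measure the store after garbage collection. Keeping at least the newest commit per project is also an assumption of this sketch.

```python
from collections import deque
from typing import Dict, List


def enforce_size_cap(projects: Dict[str, List[int]], cap: int) -> Dict[str, List[int]]:
    """Drop the oldest commit from each project in turn until total size <= cap.

    `projects` maps a project ID to its commit sizes, oldest first.
    Assumption: the newest commit of every project is always kept.
    """
    kept = {name: deque(sizes) for name, sizes in projects.items()}
    total = sum(sum(commits) for commits in kept.values())

    while total > cap:
        dropped = False
        for commits in kept.values():
            if total <= cap:
                break
            if len(commits) > 1:
                total -= commits.popleft()  # oldest commit goes first
                dropped = True
        if not dropped:
            break  # nothing left to drop without deleting a project's last commit

    return {name: list(commits) for name, commits in kept.items()}
```

Visiting projects in turn spreads the loss of history evenly, instead of emptying whichever project happens to be oldest or largest.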

## Where Checkpoints Live

All shadow repos live under:

```text
~/.hermes/checkpoints/
├── <hash1>/                     # shadow git repo for one working directory
├── <hash2>/
└── ...
├── store/                       # single shared bare git repo
│   ├── HEAD, objects/           # git internals (shared across projects)
│   ├── refs/hermes/<hash>       # per-project branch tip
│   ├── indexes/<hash>           # per-project git index
│   ├── projects/<hash>.json     # workdir + created_at + last_touch
│   └── info/exclude
├── .last_prune                  # auto-prune idempotency marker
└── legacy-<ts>/                 # archived pre-v2 per-project shadow repos
```

Each `<hash>` is derived from the absolute path of the working directory. Inside each shadow repo you'll find:
Each `<hash>` is derived from the absolute path of the working directory. You normally never need to touch these manually — use `hermes checkpoints status` / `prune` / `clear` instead.

- Standard git internals (`HEAD`, `refs/`, `objects/`)
- An `info/exclude` file containing a curated ignore list
- A `HERMES_WORKDIR` file pointing back to the original project root
### Migration from v1

You normally never need to touch these manually.
Before the v2 rewrite, each working directory got its own complete shadow git repo directly under `~/.hermes/checkpoints/<hash>/`. That layout couldn't dedup objects across projects and had a documented no-op pruner — the store would grow without bound.

On first v2 run, any pre-v2 shadow repos are moved into `~/.hermes/checkpoints/legacy-<timestamp>/` so the new single-store layout starts clean. Old `/rollback` history is still reachable by manually inspecting the legacy archive with `git`; once you're confident you don't need it, run:

```bash
hermes checkpoints clear-legacy
```

to reclaim the space. Legacy archives are also swept by `auto_prune` after `retention_days`.

## Best Practices

- **Leave checkpoints enabled** — they're on by default and have zero cost when no files are modified.
- **Enable checkpoints only when you need them** — `hermes chat --checkpoints` or per-profile `enabled: true`.
- **Use `/rollback diff` before restoring** — preview what will change to pick the right checkpoint.
- **Use `/rollback` instead of `git reset`** when you want to undo agent-driven changes only.
- **Check `hermes checkpoints status` occasionally** if you use checkpoints regularly — shows which projects are active and what the store costs you.
- **Combine with Git worktrees** for maximum safety — keep each Hermes session in its own worktree/branch, with checkpoints as an extra layer.

For running multiple agents in parallel on the same repo, see the guide on [Git worktrees](./git-worktrees.md).