hermes-agent/hermes_cli
Teknium a0fedfbb1b
feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.

Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:

1. Each working directory got its own full shadow git repo — no object
   dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
   /rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
   without ever asking for /rollback.

Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.

Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
  refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
  per-project metadata (store/projects/<hash>.json with workdir +
  created_at + last_touch). On first v2 init, any pre-v2 per-directory
  shadow repos are auto-migrated into legacy-<timestamp>/ so the new
  store starts clean. _prune() now actually rewrites the per-project ref
  to the last max_snapshots commits and runs git gc --prune=now. New
  _enforce_size_cap() drops oldest commits round-robin across projects
  when the store exceeds max_total_size_mb. _drop_oversize_from_index()
  filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
  (status / list / prune / clear / clear-legacy) for managing the store
  outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
  auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
  Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
  *.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
  AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
  coverage for the pre-v2 prune path (per-directory shadow repos under
  base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
  shared store, new defaults, migration, and the new CLI;
  reference/cli-commands.md documents 'hermes checkpoints'.

E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
  7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
  --retention-days 0' removes its ref, index, and metadata; GC reclaims
  the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
  tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
  CLI without an agent running.

Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
2026-05-06 05:44:35 -07:00
..
__init__.py fix(windows): enforce UTF-8 stdout/stderr to prevent UnicodeEncodeError crash 2026-05-03 16:58:25 -07:00
_parser.py refactor(cli): derive relaunch flag table from argparse introspection 2026-04-29 20:33:29 -07:00
auth.py feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path 2026-05-05 13:40:01 -07:00
auth_commands.py feat(nous): persist Nous OAuth across profiles via shared token store (#19712) 2026-05-04 04:54:55 -07:00
azure_detect.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
backup.py fix(backup): floor pre-update backup_keep to 1 so the new backup survives 2026-05-04 05:07:13 -07:00
banner.py fix(banner): show correct update status on nix-built hermes (#17550) 2026-04-30 07:03:00 +05:30
browser_connect.py fix(browser): address Copilot review on /browser connect 2026-04-28 22:11:10 -07:00
callbacks.py fix: ESC cancels secret/sudo prompts, clearer skip messaging (#9902) 2026-04-14 16:11:37 -07:00
checkpoints.py feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709) 2026-05-06 05:44:35 -07:00
claw.py fix(claw): handle missing dir in _scan_workspace_state 2026-05-05 06:08:14 -07:00
cli_output.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
clipboard.py feat: fix img pasting in new ink plus newline after tools 2026-04-11 13:14:32 -05:00
codex_models.py feat(codex): add gpt-5.5 and wire live model discovery into picker (#14720) 2026-04-23 13:32:43 -07:00
colors.py feat: respect NO_COLOR env var and TERM=dumb (#4079) 2026-03-30 17:07:21 -07:00
commands.py feat(telegram): /topic off + help + auth gate + screenshot debounce 2026-05-04 12:07:17 -07:00
completion.py fix: preserve profile name completion in dynamic shell completion 2026-04-14 10:45:42 -07:00
config.py feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709) 2026-05-06 05:44:35 -07:00
copilot_auth.py fix(copilot): exchange raw GitHub token for Copilot API JWT 2026-04-24 05:09:08 -07:00
cron.py feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern) (#19709) 2026-05-04 12:31:01 -07:00
curator.py feat(curator): add archive and prune subcommands (#20200) 2026-05-05 05:15:54 -07:00
curses_ui.py fix: treat ctrl-c as curses cancel 2026-05-04 01:36:44 -07:00
debug.py fix(debug): redact log content at upload time in hermes debug share 2026-05-03 11:42:20 -07:00
default_soul.py fix: reset default SOUL.md to baseline identity text (#3159) 2026-03-26 01:34:27 -07:00
dingtalk_auth.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
doctor.py fix(doctor): report Kanban worker tools as runtime-gated 2026-05-05 17:26:15 -07:00
dump.py refactor(env): use shared Hermes dotenv loader 2026-05-05 10:13:13 -07:00
env_loader.py refactor: consolidate symlink-safe atomic replace into shared helper 2026-04-28 04:58:22 -07:00
fallback_cmd.py feat(cli): add 'hermes fallback' command to manage fallback providers (#16052) 2026-04-26 06:19:04 -07:00
gateway.py fix(gateway): handle planned service stops 2026-05-04 16:00:49 -07:00
goals.py feat: /goal — persistent cross-turn goals (Ralph loop) (#18262) 2026-04-30 23:10:20 -07:00
hooks.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
kanban.py feat(kanban): surface task_runs.summary on dashboard cards + `kanban show` 2026-05-05 17:26:15 -07:00
kanban_db.py feat(kanban): surface task_runs.summary on dashboard cards + `kanban show` 2026-05-05 17:26:15 -07:00
kanban_diagnostics.py fix(kanban): unify failure counter across spawn/timeout/crash outcomes (#20410) 2026-05-05 13:55:37 -07:00
logs.py feat: component-separated logging with session context and filtering (#7991) 2026-04-11 17:23:36 -07:00
main.py feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709) 2026-05-06 05:44:35 -07:00
mcp_config.py refactor(config): migrate remaining 33 cfg_get call sites (#17311) 2026-04-29 04:03:03 -07:00
memory_setup.py fix(cli): decode .env as UTF-8 to avoid GBK crash on Windows 2026-05-02 01:40:31 -07:00
model_catalog.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
model_normalize.py feat(minimax-oauth): full integration with peer OAuth providers 2026-04-29 09:53:42 -07:00
model_switch.py fix(gateway): preserve model picker current context 2026-05-06 03:50:59 -07:00
models.py feat(models): add x-ai/grok-4.3 to OpenRouter + Nous Portal curated lists (#20497) 2026-05-05 19:15:10 -07:00
nous_subscription.py fix(cli): coerce use_gateway config flags in tool routing 2026-04-26 19:02:55 -07:00
oneshot.py fix(tui): honor launch toolsets (#17623) 2026-04-29 16:55:27 -07:00
pairing.py fix(pairing): handle null user_name in pairing list display 2026-04-23 02:34:11 -07:00
platforms.py feat: complete plugin platform parity — all 12 integration points 2026-04-29 21:56:51 -07:00
plugins.py feat(providers): make all 33 providers pluggable under plugins/model-providers/ 2026-05-05 13:40:01 -07:00
plugins_cmd.py feat(dashboard): add Plugins page with enable/disable, auth status, install/remove 2026-04-30 20:29:37 -04:00
profiles.py fix(profiles): keep validate_profile_name strict; callers normalize first 2026-05-04 04:44:37 -07:00
providers.py fix: prevent bare 'custom' slug in model.provider (#17478) 2026-04-30 04:32:11 -07:00
pty_bridge.py fix(pty): default TERM for resize probes 2026-05-04 02:38:54 -07:00
relaunch.py remove relaunch_chat 2026-04-29 20:33:29 -07:00
runtime_provider.py fix(fallback): let custom_providers shadow built-in aliases 2026-04-30 20:18:44 -07:00
setup.py fix(cli): sanitize bracketed paste markers during setup 2026-05-05 06:12:42 -07:00
skills_config.py refactor(config): migrate remaining 33 cfg_get call sites (#17311) 2026-04-29 04:03:03 -07:00
skills_hub.py feat(skills): install skills from a direct HTTP(S) URL (#16323) 2026-04-26 20:57:10 -07:00
skin_engine.py fix(tui): restore macOS copy behavior and theme polish (#17131) 2026-04-28 18:47:14 -05:00
slack_cli.py fix(paths): route achievements plugin + profile-tui through HERMES_HOME 2026-04-30 23:21:54 -07:00
status.py fix(status): add missing popular provider API keys to hermes status display 2026-05-04 05:14:13 -07:00
timeouts.py refactor(timeouts): drop redundant ImportError in except clause 2026-04-26 20:48:20 -07:00
tips.py docs: refresh stale platform/LOC/test counts; clarify gateway vs plugin platforms 2026-05-05 13:45:47 -07:00
tools_config.py fix(cli): sync use_gateway in _reconfigure_provider for tts, browser, and web 2026-05-04 02:33:55 -07:00
uninstall.py feat(uninstall): offer to remove named profiles when uninstalling from default 2026-04-18 19:18:13 -07:00
vercel_auth.py feat: add Vercel Sandbox backend 2026-04-29 07:22:33 -07:00
voice.py fix(tui): respect voice.record_key config (supersedes #19028, #19339) (#19835) 2026-05-04 15:49:28 -07:00
web_server.py feat(docker): launch dashboard as side-process via HERMES_DASHBOARD=1 2026-05-04 15:37:27 +10:00
webhook.py refactor(config): migrate remaining 33 cfg_get call sites (#17311) 2026-04-29 04:03:03 -07:00