hermes-agent/cron
Ben b01eee0c77 feat(cron): store-level CAS claim for multi-machine at-most-once fire
Phase 4C. claim_job_for_fire(job_id, *, claim_ttl_seconds=300) in cron/jobs.py:
under the existing _jobs_lock() file lock, claim a job for a single external
fire so that across N gateway replicas exactly ONE wins. Single-machine
deployments always win (unaffected).

Semantics:
- missing / disabled / paused job → False.
- a fresh fire_claim (younger than claim_ttl_seconds) already present → False
  (someone else holds it). Stale claim (crashed winner) → overwrite, so a job
  is never wedged forever.
- on win: stamp fire_claim={at, by:_machine_id()}; for recurring (cron/interval)
  advance next_run_at (mirrors advance_next_run's at-most-once bump so a stale
  re-delivery can't re-fire); one-shots keep next_run_at but the fresh claim
  blocks a duplicate retry for the same fire.
- mark_job_run now clears fire_claim on completion so a re-armed recurring job
  is claimable again next fire.

_machine_id() (HERMES_MACHINE_ID env, else hostname:pid) is attribution-only;
correctness is the file lock + fresh-claim check, not the id.

This is consumed by CronScheduler.fire_due (Phase 4B). tick is untouched — it
still uses advance_next_run, so the built-in single-machine path is unaffected.

Tests (real store, temp HERMES_HOME): claim-once-then-block + next_run advance,
one-shot no-double-claim, unknown→False, paused→False, stale-claim reclaimable,
mark_job_run clears the claim (recurring re-claimable). tests/cron/ 470 passed.
2026-06-18 14:34:34 +10:00
..
scripts fix(cron-recipes): pre-release hardening — honest cadences, strict slot names, surface-aware UX 2026-06-11 10:49:47 -07:00
__init__.py docs: clarify gateway service scopes (#1378) 2026-03-14 21:17:41 -07:00
blueprint_catalog.py docs: finish Automation Blueprints terminology rebrand (#44470) 2026-06-11 17:22:22 -04:00
jobs.py feat(cron): store-level CAS claim for multi-machine at-most-once fire 2026-06-18 14:34:34 +10:00
scheduler.py refactor(cron): extract run_one_job shared firing helper from tick 2026-06-18 14:26:29 +10:00
scheduler_provider.py feat(cron): additive CronScheduler hooks (on_jobs_changed/fire_due/reconcile) 2026-06-18 14:30:31 +10:00
suggestion_catalog.py fix(cron-recipes): pre-release hardening — honest cadences, strict slot names, surface-aware UX 2026-06-11 10:49:47 -07:00
suggestions.py refactor(cron): rebrand Cron Recipes -> Automation Blueprints 2026-06-11 10:49:47 -07:00