hermes-agent/cron
kshitijk4poor a4e61ddf04 fix(cron): fail closed when an unpinned job's provider drifts from creation snapshot (#44585)
An unpinned cron job follows the global default provider (config.yaml
model.default + resolve_runtime_provider). If that global state is changed
after the job is created — e.g. a temporary switch to a paid provider like
nous/claude-fable-5 — the job silently inherits it on its next tick and spends
real money. This is the reported $7.73 incident: a job created under a
free/default provider later inherited a temporary paid switch.

Fix (ask #1 only) preserves the legitimate "unpinned job should follow
model.default" use case by detecting *drift* rather than freezing the model:

- create_job (cron/jobs.py): for UNPINNED, agent-backed jobs (no explicit
  provider, not no_agent), snapshot the provider that resolution WOULD pick
  right now into a new optional `provider_snapshot` field, resolved via the
  same resolve_runtime_provider() path the ticker uses. Fail open to None on
  any resolution error, so job creation never breaks.

- run_job (cron/scheduler.py): right after runtime resolution, if the job has
  a provider_snapshot AND is unpinned AND the currently-resolved provider
  DIFFERS from the snapshot, fail closed for that run — make no paid call and
  deliver a loud, actionable alert naming both providers and telling the user
  to pin explicitly (`cronjob action=update job_id=.. provider=..`).
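
A minimal sketch of the creation-time snapshot described in the first bullet,
not the code in cron/jobs.py. It assumes a job is a plain dict and that
resolution is passed in as a zero-argument callable; `snapshot_provider` and
the `resolve` parameter are hypothetical names used only for illustration.

```python
from typing import Callable, Optional


def snapshot_provider(job: dict, resolve: Callable[[], str]) -> Optional[str]:
    """Return the provider to record at creation time, or None.

    `job` as a dict and `resolve` as a callable are assumptions of this
    sketch; the real create_job may represent both differently.
    """
    # Pinned jobs and no_agent jobs are skipped: only unpinned,
    # agent-backed jobs can drift with the global default.
    if job.get("provider") or job.get("no_agent"):
        return None
    try:
        # Ask what resolution WOULD pick right now.
        return resolve()
    except Exception:
        # Fail open: a resolution error must never break job creation.
        return None


def _broken_resolver() -> str:
    raise RuntimeError("config unreadable")


unpinned = {"name": "daily-digest"}
unpinned["provider_snapshot"] = snapshot_provider(unpinned, lambda: "free-default")

failed = snapshot_provider({"name": "other"}, _broken_resolver)
```

Here `unpinned` ends up with a snapshot, while `failed` is None, so that job
would later run with no guard, exactly like a pre-existing job.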

Back-compat: jobs with no snapshot (pre-existing jobs, no_agent jobs, or any
job whose creation-time resolution failed) behave exactly as before — the
guard only engages when a snapshot exists. Explicitly-pinned jobs (job.provider
set) are unaffected since they don't drift with global state.
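
The run-time decision, including the back-compat cases above, can be sketched
as a single predicate. This is an illustration under assumptions, not the code
in cron/scheduler.py: the job is modeled as a dict, and `drift_alert` is a
hypothetical helper that returns an alert string when the run should be
skipped and None when it may proceed.

```python
from typing import Optional


def drift_alert(job: dict, resolved_provider: str) -> Optional[str]:
    """Return an alert message if the run must fail closed, else None.

    Hypothetical helper; the dict-shaped job is an assumption of this sketch.
    """
    # Explicitly pinned jobs do not follow global state, so they never drift.
    if job.get("provider"):
        return None
    # No snapshot (pre-existing job, no_agent job, or failed creation-time
    # resolution): behave exactly as before.
    snapshot = job.get("provider_snapshot")
    if not snapshot:
        return None
    # Snapshot still matches what resolution picks now: run normally.
    if snapshot == resolved_provider:
        return None
    # Drift detected: the caller should make no paid call and deliver this.
    return (
        f"Cron job skipped: provider drifted from '{snapshot}' "
        f"to '{resolved_provider}'. Pin explicitly with "
        "`cronjob action=update job_id=.. provider=..`."
    )


alert = drift_alert({"provider_snapshot": "free-default"}, "paid-provider")
```

Returning a message instead of raising keeps the "skip this run and alert"
path separate from genuine errors; that is a design choice of the sketch, not
something the commit specifies.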

Tests: tests/cron/test_cron_provider_pin.py covers snapshot-matches (runs),
snapshot-differs (fail closed, no agent constructed), no-snapshot back-compat,
None-snapshot back-compat, explicitly-pinned (runs regardless), plus create_job
snapshot capture/skip/fail-open. The fail-closed test is load-bearing: it
fails if the guard is removed.

Asks #2-4 of issue #44585 (hard-stop a running job, gateway-stop containment,
fail-closed on provider mutation) are out of scope for this change.
2026-06-23 02:45:52 +05:30
scripts fix(cron-recipes): pre-release hardening — honest cadences, strict slot names, surface-aware UX 2026-06-11 10:49:47 -07:00
__init__.py docs: clarify gateway service scopes (#1378) 2026-03-14 21:17:41 -07:00
blueprint_catalog.py docs: finish Automation Blueprints terminology rebrand (#44470) 2026-06-11 17:22:22 -04:00
jobs.py fix(cron): fail closed when an unpinned job's provider drifts from creation snapshot (#44585) 2026-06-23 02:45:52 +05:30
scheduler.py fix(cron): fail closed when an unpinned job's provider drifts from creation snapshot (#44585) 2026-06-23 02:45:52 +05:30
scheduler_provider.py fix(cron): keep ticker alive on BaseException + heartbeat-aware status 2026-06-21 13:00:50 +05:30
suggestion_catalog.py fix(cron-recipes): pre-release hardening — honest cadences, strict slot names, surface-aware UX 2026-06-11 10:49:47 -07:00
suggestions.py fix(cron): anchor cron storage at the default root home (not the active profile) 2026-06-21 16:45:14 +05:30