hermes-agent/tests/cron
annguyenNous 07424da76f fix(cron): keep ticker alive on BaseException + heartbeat-aware status
The in-process cron ticker (cron/scheduler_provider.py) caught only
`Exception` and logged at DEBUG, so a `SystemExit`/`KeyboardInterrupt`
raised from a misbehaving provider SDK or agent retry path killed the
ticker thread silently. The gateway PROCESS stayed up, so `hermes cron
status` — which only checks `find_gateway_pids()` — kept reporting
"✓ jobs will fire automatically" while no jobs ever fired (#32612,
#32895).

This makes ticker death survivable and detectable:

- The ticker loop now catches `BaseException` and logs at ERROR with a
  traceback, so a single bad tick no longer tears the thread down and
  the failure is visible in the gateway log.
- The loop records a heartbeat (`cron/ticker_heartbeat`, epoch seconds)
  on startup and after every tick — best-effort, never raised into the
  loop. Both ticker entry points (the gateway and the desktop fallback
  in web_server.py) funnel through `InProcessCronScheduler.start`, so one
  heartbeat site covers both.
- `hermes cron status` now reads the heartbeat age: if the gateway is
  running but the heartbeat is stale (> 200s, i.e. several missed ~60s
  ticks), it reports the ticker as STALLED and suggests a restart instead
  of falsely claiming jobs will fire. A missing heartbeat (older build /
  never ran) is treated as "unknown", not "dead".

Adds tests for BaseException survival, per-iteration heartbeat recording,
heartbeat round-trip/age, staleness detection, and silent-write-failure.

Salvaged from #49660 (BaseException survival on current structure),
extended with the heartbeat + honest-status reporting that the earlier
(pre-refactor) watchdog PRs #35616 and #33849 proposed.

Fixes #32612
Fixes #32895

Co-authored-by: banditburai <promptsiren@gmail.com>
Co-authored-by: sweetcornna <96944678+sweetcornna@users.noreply.github.com>
2026-06-21 13:00:50 +05:30
..
__init__.py test: add unit tests for 8 modules (batch 2) 2026-02-26 13:54:20 +03:00
conftest.py fix(cron): resolve model.default + fail fast on missing model 2026-06-21 12:37:56 +05:30
test_blueprint_catalog.py docs: finish Automation Blueprints terminology rebrand (#44470) 2026-06-11 17:22:22 -04:00
test_claim_job_for_fire.py feat(cron): store-level CAS claim for multi-machine at-most-once fire 2026-06-18 14:34:34 +10:00
test_codex_execution_paths.py refactor(session-log): delete _save_session_log and all callers 2026-05-20 11:44:10 -07:00
test_compute_next_run_last_run_at.py fix(cron): use last_run_at as croniter base for cron jobs 2026-04-29 08:24:48 -07:00
test_cron_context_from.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_cron_inactivity_timeout.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_cron_no_agent.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_cron_prompt_injection_skill.py fix(cron): don't strict-scan script-injected output in no-skills jobs (#43223) 2026-06-10 08:27:24 +05:30
test_cron_script.py test(cron): make env-sanitize probe var deterministic 2026-06-20 00:22:55 +05:30
test_cron_workdir.py fix(cron): make sequential jobs non-blocking too + sweep MCP after jobs finish 2026-06-04 05:40:13 -07:00
test_cronjob_schema.py test(cron): guard schedule-required description text on CRONJOB_SCHEMA 2026-05-26 14:09:37 -07:00
test_file_permissions.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_jobs.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_jobs_changed_notify.py feat(cron): wire on_jobs_changed, cron.chronos config, docs + agent↔NAS contract 2026-06-18 15:11:32 +10:00
test_jobs_crossprocess_lock.py fix: complete cron jobs lock salvage 2026-06-15 06:29:00 -07:00
test_parallel_pool.py revert(cron): remove per-job profile support (PR #28124) (#43956) 2026-06-10 20:46:17 -07:00
test_rewrite_skill_refs.py fix(curator): rewrite cron job skill refs after consolidation (#18253) 2026-04-30 23:04:50 -07:00
test_run_one_job.py refactor(cron): extract run_one_job shared firing helper from tick 2026-06-18 14:26:29 +10:00
test_scheduler.py fix(cron): resolve model.default + fail fast on missing model 2026-06-21 12:37:56 +05:30
test_scheduler_mcp_init.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_scheduler_provider.py fix(cron): keep ticker alive on BaseException + heartbeat-aware status 2026-06-21 13:00:50 +05:30
test_suggestions.py test(cron): document consent-first self-learning suggestions 2026-06-20 23:23:47 -07:00