Phase 4E (E.1 + E.2). The inbound side of Chronos: NAS POSTs the agent when a
one-shot fires; the agent verifies a NAS-minted JWT and runs the job.
E.1 — plugins/cron/chronos/verify.py:
- verify_nas_fire_token(token, expected_audience, jwks_or_key, issuer): verifies
signature against the NAS JWKS (RS/ES family; symmetric rejected), aud == this
agent, exp/nbf, iss, and purpose == "cron_fire" (so a general agent JWT can't
be replayed against the fire endpoint). Returns claims or None; never raises.
Crypto delegated to PyJWT[crypto] (already a declared dep) — no hand-rolled
JWT, no new dependency. No key configured → refuse (never unsigned-decode a
security boundary).
- get_fire_verifier(): pluggable indirection so the DQ-4 escape hatch
(direct per-job cron-key) can swap in with no handler change.
E.2 — gateway/platforms/api_server.py:
- POST /api/cron/fire (registered only when _CRON_AVAILABLE). Authenticated by
the NAS-JWT via get_fire_verifier() — NOT API_SERVER_KEY (NAS holds no API
key; this is the only inbound that triggers remote job execution, so it gets
its own purpose-scoped check). Verifier args come from cron.chronos.* config.
401 on bad/missing/forged token. 400 on missing job_id. On success: 202 +
fire_due runs in the background (so a long agent turn never trips NAS's HTTP
timeout); the store CAS claim inside fire_due de-dupes a scheduler retry.
Tests:
- test_chronos_verify (11): REAL RS256 signing — valid→claims, wrong-aud,
missing/wrong purpose, expired, wrong-iss, tampered-signature (attacker key),
no-key-refuse, empty-token, JWKS-URL key resolution, get_fire_verifier.
- test_cron_fire_webhook (5): valid→202+fire, invalid→401+no-fire, missing
token→401, missing job_id→400, and fire path does NOT require API_SERVER_KEY.
api_server regression suites (214) green.
E.3 (NAS endpoints) is a separate cross-repo PR; the wire contract lands next
(docs/chronos-managed-cron-contract.md).
Phase 4D. The first non-default CronScheduler: plugins/cron/chronos/. Inert
unless cron.provider=chronos; resolve_cron_scheduler falls back to the built-in
if unavailable, so cron never loses its trigger.
Files:
- chronos/__init__.py — ChronosCronScheduler + register(ctx).
* is_available(): config-only, NO network (portal_url + callback_url + a
stored Nous access token via get_provider_auth_state). Returns False →
resolver falls back to built-in.
* start(): reconcile() then RETURN — no blocking loop, no 60s wake (DQ-1:
this is what makes scale-to-zero real; the machine wakes only on a
NAS→agent fire).
* _arm_one_shot(job): POST NAS provision {job_id, fire_at, agent_callback_url,
dedup_key=job_id:fire_at}. Agent owns the time → sub-minute fires survive
(no scheduler 1-minute floor).
* reconcile(): converge NAS arms toward jobs.json — arm missing/changed-time,
cancel orphaned, skip paused. Cold process rebuilds from jobs.json +
idempotent dedup_key.
* on_jobs_changed(): reconcile (re-arm/cancel the affected one-shot).
* fire_due(): ABC default (CAS claim + run_one_job) THEN re-arm the next
one-shot. Job gone (one-shot done / repeat-N exhausted) → no re-arm.
- chronos/_nas_client.py — thin HTTP wrapper for provision/cancel/list using
the agent's existing refresh-aware Nous token (resolve_nous_access_token).
Names no scheduler vendor; holds no scheduler creds.
- chronos/plugin.yaml — discovery metadata.
INVARIANT: zero "qstash"/"upstash" hits in plugins/cron, gateway, hermes_cli,
website/docs — the external scheduler is a NAS-internal detail, never named
agent-side.
Tests (13, all NAS mocked, zero network): is_available off-without-config +
on-with-config + makes-no-network; arm payload incl. sub-minute + noop without
next_run; reconcile arms-all / cancels-orphan / skips-paused / skips-already-
armed; fire_due re-arms next / no re-arm when job gone / no re-arm when claim
lost.
Phase 2 of the pluggable cron-scheduler refactor. Still no call-site changes;
this wires up provider SELECTION with a hard safety net.
Task 2.1: cron.provider config key (hermes_cli/config.py), empty = built-in.
Additive key — deep-merge picks it up into existing configs with no version
bump (verified: load_config() yields the key on a pre-existing config.yaml).
Task 2.2: plugins/cron/__init__.py — discovery machinery cloned near-verbatim
from plugins/memory/__init__.py, retargeted at CronScheduler /
register_cron_scheduler. Bundled (plugins/cron/<name>/) + user
(/plugins/<name>/) dirs, bundled wins collisions. The built-in is
NOT discovered here — it's core, so the fallback can't be removed.
Task 2.3: resolve_cron_scheduler() in cron/scheduler_provider.py — reads
cron.provider and ALWAYS degrades to built-in (missing / unavailable / load
error / typo all fall back with a warning). cron can never be left without a
trigger.
Deviation from plan: the plan's resolver snippet used cfg_get("cron.provider")
(dotted-string form). The real cfg_get signature is cfg_get(cfg, *keys,
default=) — corrected to cfg_get(load_config(), "cron", "provider", default=""),
matching plugins/memory/__init__.py:349. Tests monkeypatch load_config (not
cfg_get) so the real traversal runs.
Tests: default key empty, discovery returns list, unknown load returns None,
and the four resolver paths (empty→builtin, no-section→builtin,
unknown→builtin, unavailable→builtin, available→used). Full tests/cron/: 453
passed; config suite green (additive key, no migration break).