fix(cron): run due jobs in parallel to prevent serial tick starvation (#13021)

Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue #9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override

Based on PR #9169 by @VenomMoth1.

Fixes #9086
This commit is contained in:
Teknium 2026-04-20 11:53:07 -07:00 committed by GitHub
parent d587d62eba
commit c86915024e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
6 changed files with 259 additions and 81 deletions

View file

@ -359,11 +359,12 @@ def _describe_media_for_mirror(media_files):
def _get_cron_auto_delivery_target():
"""Return the cron scheduler's auto-delivery target for the current run, if any."""
platform = os.getenv("HERMES_CRON_AUTO_DELIVER_PLATFORM", "").strip().lower()
chat_id = os.getenv("HERMES_CRON_AUTO_DELIVER_CHAT_ID", "").strip()
from gateway.session_context import get_session_env
platform = get_session_env("HERMES_CRON_AUTO_DELIVER_PLATFORM", "").strip().lower()
chat_id = get_session_env("HERMES_CRON_AUTO_DELIVER_CHAT_ID", "").strip()
if not platform or not chat_id:
return None
thread_id = os.getenv("HERMES_CRON_AUTO_DELIVER_THREAD_ID", "").strip() or None
thread_id = get_session_env("HERMES_CRON_AUTO_DELIVER_THREAD_ID", "").strip() or None
return {
"platform": platform,
"chat_id": chat_id,