feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern) (#19709)

* feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern) Adds a no_agent=True option to the cronjob system. When enabled, the scheduler runs the attached script on schedule and delivers its stdout directly to the job's target — no LLM, no agent loop, no token spend. This is the classic bash-watchdog pattern (memory alert every 5 min, disk alert every 15 min, CI ping) reimplemented as a first-class Hermes primitive instead of a systemd timer + curl + bot token triplet living outside the system. ## What hermes cron create "every 5m" \ --no-agent \ --script memory-watchdog.sh \ --deliver telegram \ --name memory-watchdog Agent tool: cronjob(action='create', schedule='every 5m', script='memory-watchdog.sh', no_agent=True, deliver='telegram') Semantics: - Script stdout (trimmed) → delivered verbatim as the message - Empty stdout → silent tick (no delivery; watchdog pattern) - wakeAgent=false gate → silent tick (same gate LLM jobs use) - Non-zero exit/timeout → delivered as an error alert (broken watchdogs shouldn't fail silently) - No LLM ever invoked; no tokens spent; no provider fallback applied ## Implementation cron/jobs.py * create_job gains no_agent: bool = False * prompt becomes Optional (no_agent jobs don't need one) * Validation: no_agent=True requires a script at create time * Field roundtrips via load_jobs / save_jobs / update_job cron/scheduler.py * run_job: new short-circuit branch at the top that runs the script, wraps its output into the (success, doc, final_response, error) tuple downstream delivery already expects, and returns before any AIAgent import or construction * _run_job_script: picks interpreter by extension — .sh/.bash run under /bin/bash, anything else under sys.executable (Python). Shell support unlocks the bash-watchdog pattern without wrapping scripts in Python. Extension is explicit; we deliberately do NOT trust the file's own shebang. Path-containment guard (scripts dir) unchanged. tools/cronjob_tools.py * Schema: new no_agent boolean property with clear trigger guidance * cronjob() accepts no_agent and validates mode-specific shape: - no_agent=True requires script; prompt/skills optional - no_agent=False keeps the existing 'prompt or skill required' rule * update path rejects flipping no_agent=True on a job without a script * _format_job surfaces no_agent in list output * Handler lambda forwards no_agent from tool args hermes_cli/main.py, hermes_cli/cron.py * 'hermes cron create --no-agent' and edit's --no-agent / --agent pair for toggling at CLI parity with the agent tool * Existing --script help text updated to describe both modes * List / create / edit output now shows 'Mode: no-agent (...)' when set ## Tests tests/cron/test_cron_no_agent.py — 18 tests covering: * create_job: no_agent shape, validation, field persistence * update_job: flag roundtrip across reload * cronjob tool: schema validation, update toggling, mode-specific requirements, prompt-relaxation rule * run_job short-circuit: - success path delivers stdout verbatim - empty stdout → SILENT_MARKER (no delivery downstream) - wakeAgent=false gate → silent - script failure → error alert - run_job does NOT import AIAgent (verified via mock) * _run_job_script: - .sh executes via bash (no shebang required) - .bash executes via bash - .py still runs via sys.executable (regression) - path-traversal still blocked (security regression) All 18 new tests pass. 341/342 pre-existing cron tests still pass; the one failure (test_script_empty_output_noted) was already broken on main and is unrelated to this change. ## Docs website/docs/guides/cron-script-only.md — new dedicated guide covering the watchdog pattern, interpreter rules, delivery mapping, worked examples (memory / disk alerts), and the comparison table vs hermes send, regular LLM cron jobs, and OS-level cron. website/docs/user-guide/features/cron.md — new 'No-agent mode' section in the cron feature reference, cross-linked to the guide. website/docs/guides/automate-with-cron.md — new tip box pointing users to no-agent mode when they don't need LLM reasoning. ## Compatibility - Existing jobs: unchanged. no_agent defaults to False, existing code paths untouched until the flag is set. - Schema additive only; older jobs.json without the field load fine via .get() with False default. - New CLI flags are opt-in and don't alter existing flag behavior. * fix(cron): lazy-import AIAgent + SessionDB so no_agent ticks pay zero The unconditional `from run_agent import AIAgent` + SessionDB() init at the top of run_job() meant every no_agent tick still paid the full agent module load cost (~300ms + transitive imports + DB open) even though it never touched any of that machinery. Move both to live under the default (LLM) path, after the no_agent short-circuit has returned. Now a no_agent tick's sys.modules stays clean — verified end-to-end: assert 'run_agent' not in sys.modules # before run_job(no_agent_job) assert 'run_agent' not in sys.modules # after The existing mock-based unit test (test_run_job_no_agent_never_invokes_aiagent) kept passing because patch() replaces the class AFTER import; the leak was only visible via real subprocess-style verification. End-to-end demo confirmed: agent calls cronjob(no_agent=True) → script runs → stdout delivered → no LLM machinery loaded. * docs(cron): tighten no_agent tool schema — defaults, silent semantics, pick rule Previous description buried the important bits in one long sentence. Agents could plausibly miss three things an LLM-facing schema should make unmissable: 1. What the default is — now first sentence + JSON Schema `default: false` 2. What 'silent run' actually means for the user — now spelled out: 'nothing is sent to the user and they won't see anything happened' 3. When to pick True vs False — now a concrete decision rule with examples on both sides (watchdogs/metrics/pollers → True; summarize/draft/pick/rephrase → False) Also adds explicit 'prompt and skills are ignored when True' since the agent could otherwise still pass them out of habit. No behavior change — schema text only.
2026-05-07 02:51:50 +00:00 · 2026-05-04 12:31:01 -07:00 · 2026-05-04 12:31:01 -07:00 · 3db6b9cc87
commit 3db6b9cc87
parent d35efb9898
9 changed files with 823 additions and 17 deletions
--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@ -576,8 +576,18 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
    prevent arbitrary script execution via path traversal or absolute
    path injection.

+    Supported interpreters (chosen by file extension):
+
+    * ``.sh`` / ``.bash`` — run with ``/bin/bash``
+    * anything else — run with the current Python interpreter
+      (``sys.executable``), preserving the original behaviour for
+      Python-based pre-check and data-collection scripts.
+
+    Shell support lets ``no_agent=True`` jobs ship classic bash watchdogs
+    (the `memory-watchdog.sh` pattern) without wrapping them in Python.
+
    Args:
-        script_path: Path to a Python script.  Relative paths are resolved
+        script_path: Path to the script.  Relative paths are resolved
            against HERMES_HOME/scripts/.  Absolute and ~-prefixed paths
            are also validated to ensure they stay within the scripts dir.

@ -614,9 +624,19 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:

    script_timeout = _get_script_timeout()

+    # Pick an interpreter by extension.  Bash for .sh/.bash, Python for
+    # everything else.  We deliberately do NOT honour the file's own
+    # shebang: the scripts dir is trusted, but keeping the interpreter
+    # choice explicit here keeps the allowed surface small and auditable.
+    suffix = path.suffix.lower()
+    if suffix in (".sh", ".bash"):
+        argv = ["/bin/bash", str(path)]
+    else:
+        argv = [sys.executable, str(path)]
+
    try:
        result = subprocess.run(
-            [sys.executable, str(path)],
+            argv,
            capture_output=True,
            text=True,
            timeout=script_timeout,
@ -830,8 +850,120 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
    Returns:
        Tuple of (success, full_output_doc, final_response, error_message)
    """
+    job_id = job["id"]
+    job_name = job["name"]
+
+    # ---------------------------------------------------------------
+    # no_agent short-circuit — the script IS the job, no LLM involvement.
+    # ---------------------------------------------------------------
+    # This mirrors the classic "run a bash script on a timer, send its
+    # stdout to telegram" watchdog pattern. The agent path is skipped
+    # entirely: no AIAgent, no prompt, no tool loop, no token spend.
+    #
+    # We check this BEFORE importing run_agent / constructing SessionDB so
+    # a pure-script tick never pays for the agent machinery it isn't going
+    # to use. Keep this block self-contained.
+    #
+    # Semantics:
+    #   - script stdout (trimmed) → delivered verbatim as the final message
+    #   - empty stdout            → silent run (no delivery, success=True)
+    #   - non-zero exit / timeout → delivered as an error alert, success=False
+    #   - wakeAgent=false gate    → treated like empty stdout (silent), since
+    #                               the whole point of no_agent is that there
+    #                               is no agent to wake
+    if job.get("no_agent"):
+        script_path = job.get("script")
+        if not script_path:
+            err = "no_agent=True but no script is set for this job"
+            logger.error("Job '%s': %s", job_id, err)
+            return False, "", "", err
+
+        # Apply workdir if configured — lets scripts use predictable relative
+        # paths. For no_agent jobs this is just the subprocess cwd (not an
+        # agent TERMINAL_CWD bridge).
+        _job_workdir = (job.get("workdir") or "").strip() or None
+        _prior_cwd = None
+        if _job_workdir and Path(_job_workdir).is_dir():
+            _prior_cwd = os.getcwd()
+            try:
+                os.chdir(_job_workdir)
+            except OSError:
+                _prior_cwd = None
+
+        try:
+            ok, output = _run_job_script(script_path)
+        finally:
+            if _prior_cwd is not None:
+                try:
+                    os.chdir(_prior_cwd)
+                except OSError:
+                    pass
+
+        now_iso = _hermes_now().strftime("%Y-%m-%d %H:%M:%S")
+
+        if not ok:
+            # Script crashed / timed out / exited non-zero.  Deliver the
+            # error so the user knows the watchdog itself broke — silent
+            # failure for an alerting job is the worst-case outcome.
+            alert = (
+                f"⚠ Cron watchdog '{job_name}' script failed\n\n"
+                f"{output}\n\n"
+                f"Time: {now_iso}"
+            )
+            doc = (
+                f"# Cron Job: {job_name}\n\n"
+                f"**Job ID:** {job_id}\n"
+                f"**Run Time:** {now_iso}\n"
+                f"**Mode:** no_agent (script)\n"
+                f"**Status:** script failed\n\n"
+                f"{output}\n"
+            )
+            return False, doc, alert, output
+
+        # Honour the wakeAgent gate as a silent signal — `wakeAgent: false`
+        # means "nothing to report this tick", same as empty stdout.
+        if not _parse_wake_gate(output):
+            logger.info(
+                "Job '%s' (no_agent): wakeAgent=false gate — silent run", job_id
+            )
+            silent_doc = (
+                f"# Cron Job: {job_name}\n\n"
+                f"**Job ID:** {job_id}\n"
+                f"**Run Time:** {now_iso}\n"
+                f"**Mode:** no_agent (script)\n"
+                f"**Status:** silent (wakeAgent=false)\n"
+            )
+            return True, silent_doc, SILENT_MARKER, None
+
+        if not output.strip():
+            logger.info("Job '%s' (no_agent): empty stdout — silent run", job_id)
+            silent_doc = (
+                f"# Cron Job: {job_name}\n\n"
+                f"**Job ID:** {job_id}\n"
+                f"**Run Time:** {now_iso}\n"
+                f"**Mode:** no_agent (script)\n"
+                f"**Status:** silent (empty output)\n"
+            )
+            return True, silent_doc, SILENT_MARKER, None
+
+        doc = (
+            f"# Cron Job: {job_name}\n\n"
+            f"**Job ID:** {job_id}\n"
+            f"**Run Time:** {now_iso}\n"
+            f"**Mode:** no_agent (script)\n\n"
+            f"---\n\n"
+            f"{output}\n"
+        )
+        return True, doc, output, None
+
+    # ---------------------------------------------------------------
+    # Default (LLM) path — import and construct the agent machinery now
+    # that we know we actually need it. Doing these imports here instead of
+    # at module top keeps no_agent ticks from paying for AIAgent / SessionDB
+    # construction costs.
+    # ---------------------------------------------------------------
    from run_agent import AIAgent
-    
+
    # Initialize SQLite session store so cron job messages are persisted
    # and discoverable via session_search (same pattern as gateway/run.py).
    _session_db = None
@ -840,9 +972,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
        _session_db = SessionDB()
    except Exception as e:
        logger.debug("Job '%s': SQLite session store not available: %s", job.get("id", "?"), e)
-    
-    job_id = job["id"]
-    job_name = job["name"]

    # Wake-gate: if this job has a pre-check script, run it BEFORE building
    # the prompt so a ``{"wakeAgent": false}`` response can short-circuit