feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern) (#19709)

* feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern)

Adds a no_agent=True option to the cronjob system. When enabled, the scheduler runs the attached script on schedule and delivers its stdout directly to the job's target — no LLM, no agent loop, no token spend.

This is the classic bash-watchdog pattern (memory alert every 5 min, disk alert every 15 min, CI ping) reimplemented as a first-class Hermes primitive instead of a systemd timer + curl + bot token triplet living outside the system.

## What

    hermes cron create "every 5m" \
      --no-agent \
      --script memory-watchdog.sh \
      --deliver telegram \
      --name memory-watchdog

Agent tool:

    cronjob(action='create', schedule='every 5m', script='memory-watchdog.sh', no_agent=True, deliver='telegram')

Semantics:
- Script stdout (trimmed) → delivered verbatim as the message
- Empty stdout → silent tick (no delivery; watchdog pattern)
- wakeAgent=false gate → silent tick (same gate LLM jobs use)
- Non-zero exit/timeout → delivered as an error alert (broken watchdogs shouldn't fail silently)
- No LLM ever invoked; no tokens spent; no provider fallback applied

## Implementation

cron/jobs.py
* create_job gains no_agent: bool = False
* prompt becomes Optional (no_agent jobs don't need one)
* Validation: no_agent=True requires a script at create time
* Field round-trips via load_jobs / save_jobs / update_job

cron/scheduler.py
* run_job: new short-circuit branch at the top that runs the script, wraps its output into the (success, doc, final_response, error) tuple that downstream delivery already expects, and returns before any AIAgent import or construction
* _run_job_script: picks the interpreter by extension — .sh/.bash run under /bin/bash, anything else under sys.executable (Python). Shell support unlocks the bash-watchdog pattern without wrapping scripts in Python. The extension is explicit; we deliberately do NOT trust the file's own shebang. Path-containment guard (scripts dir) unchanged.

tools/cronjob_tools.py
* Schema: new no_agent boolean property with clear trigger guidance
* cronjob() accepts no_agent and validates mode-specific shape:
  - no_agent=True requires script; prompt/skills optional
  - no_agent=False keeps the existing 'prompt or skill required' rule
* update path rejects flipping no_agent=True on a job without a script
* _format_job surfaces no_agent in list output
* Handler lambda forwards no_agent from tool args

hermes_cli/main.py, hermes_cli/cron.py
* 'hermes cron create --no-agent', plus a --no-agent / --agent pair on 'hermes cron edit' for toggling, giving the CLI parity with the agent tool
* Existing --script help text updated to describe both modes
* List / create / edit output now shows 'Mode: no-agent (...)' when set

## Tests

tests/cron/test_cron_no_agent.py: 18 tests covering:
* create_job: no_agent shape, validation, field persistence
* update_job: flag round-trip across reload
* cronjob tool: schema validation, update toggling, mode-specific requirements, prompt-relaxation rule
* run_job short-circuit:
  - success path delivers stdout verbatim
  - empty stdout → SILENT_MARKER (no delivery downstream)
  - wakeAgent=false gate → silent
  - script failure → error alert
  - run_job does NOT import AIAgent (verified via mock)
* _run_job_script:
  - .sh executes via bash (no shebang required)
  - .bash executes via bash
  - .py still runs via sys.executable (regression)
  - path-traversal still blocked (security regression)

All 18 new tests pass. 341/342 pre-existing cron tests still pass; the one failure (test_script_empty_output_noted) was already broken on main and is unrelated to this change.

## Docs

website/docs/guides/cron-script-only.md: new dedicated guide covering the watchdog pattern, interpreter rules, delivery mapping, worked examples (memory / disk alerts), and a comparison table against hermes send, regular LLM cron jobs, and OS-level cron.
website/docs/user-guide/features/cron.md: new 'No-agent mode' section in the cron feature reference, cross-linked to the guide.

website/docs/guides/automate-with-cron.md: new tip box pointing users to no-agent mode when they don't need LLM reasoning.

## Compatibility

- Existing jobs: unchanged. no_agent defaults to False; existing code paths are untouched until the flag is set.
- Schema is additive only; older jobs.json files without the field load fine via .get() with a False default.
- New CLI flags are opt-in and don't alter existing flag behavior.

* fix(cron): lazy-import AIAgent + SessionDB so no_agent ticks pay zero

The unconditional `from run_agent import AIAgent` + SessionDB() init at the top of run_job() meant every no_agent tick still paid the full agent module load cost (~300ms + transitive imports + DB open) even though it never touched any of that machinery.

Move both to live under the default (LLM) path, after the no_agent short-circuit has returned. Now a no_agent tick's sys.modules stays clean — verified end-to-end:

    assert 'run_agent' not in sys.modules  # before
    run_job(no_agent_job)
    assert 'run_agent' not in sys.modules  # after

The existing mock-based unit test (test_run_job_no_agent_never_invokes_aiagent) kept passing because patch() replaces the class AFTER import; the leak was only visible via real subprocess-style verification.

End-to-end demo confirmed: agent calls cronjob(no_agent=True) → script runs → stdout delivered → no LLM machinery loaded.

* docs(cron): tighten no_agent tool schema — defaults, silent semantics, pick rule

The previous description buried the important bits in one long sentence. Agents could plausibly miss three things an LLM-facing schema should make unmissable:

1. What the default is — now the first sentence + JSON Schema `default: false`
2. What 'silent run' actually means for the user — now spelled out: 'nothing is sent to the user and they won't see anything happened'
3. When to pick True vs False — now a concrete decision rule with examples on both sides (watchdogs/metrics/pollers → True; summarize/draft/pick/rephrase → False)

Also adds an explicit 'prompt and skills are ignored when True', since the agent could otherwise still pass them out of habit.

No behavior change — schema text only.
2026-05-05 02:31:47 +00:00 · 2026-05-04 12:31:01 -07:00 · 2026-05-04 12:31:01 -07:00 · 3db6b9cc87
commit 3db6b9cc87
parent d35efb9898
9 changed files with 823 additions and 17 deletions
--- a/website/docs/guides/automate-with-cron.md
+++ b/website/docs/guides/automate-with-cron.md
@@ -14,6 +14,10 @@ For the full feature reference, see [Scheduled Tasks (Cron)](/docs/user-guide/fe
 Cron jobs run in fresh agent sessions with no memory of your current chat. Prompts must be **completely self-contained** — include everything the agent needs to know.
 :::

+:::tip Don't need the LLM? Use no-agent mode.
+For recurring watchdogs where the script already produces the exact message you want to send (memory alerts, disk alerts, CI pings, heartbeats), skip the LLM entirely with [script-only cron jobs](/docs/guides/cron-script-only). Zero tokens, same scheduler.
+:::
+
 ---

 ## Pattern 1: Website Change Monitor
--- a/website/docs/guides/cron-script-only.md
+++ b/website/docs/guides/cron-script-only.md
@@ -0,0 +1,194 @@
+---
+sidebar_position: 13
+title: "Script-Only Cron Jobs (No LLM)"
+description: "Classic watchdog cron jobs that skip the LLM entirely — a script runs on schedule and its stdout gets delivered to your messaging platform. Memory alerts, disk alerts, CI pings, periodic health checks."
+---
+
+# Script-Only Cron Jobs
+
+Sometimes you already know exactly what message you want to send. You don't need an agent to reason about it — you just need a script to run on a timer, and its output (if any) to land in Telegram / Discord / Slack / Signal.
+
+Hermes calls this **no-agent mode**. It's the cron system minus the LLM.
+
+```
+   ┌──────────────────┐          ┌──────────────────┐
+   │ scheduler tick   │  every   │ run script       │
+   │ (every N minutes)│ ───────▶ │ (bash or python) │
+   └──────────────────┘          └──────────────────┘
+                                          │
+                                          │ stdout
+                                          ▼
+                                 ┌──────────────────┐
+                                 │ delivery router  │
+                                 │ (telegram/disc…) │
+                                 └──────────────────┘
+```
+
+- **No LLM call.** Zero tokens, zero agent loop, zero model spend.
+- **Script is the job.** The script decides whether to alert. Emit output → message gets sent. Emit nothing → silent tick.
+- **Bash or Python.** `.sh` / `.bash` files run under `/bin/bash`; any other extension runs under the current Python interpreter. Anything in `~/.hermes/scripts/` is accepted.
+- **Same scheduler.** Lives in `cronjob` alongside LLM jobs — pausing, resuming, listing, logs, and delivery targeting all work the same way.
+
+## When to Use It
+
+Use no-agent mode for:
+
+- **Memory / disk / GPU watchdogs.** Run every 5 minutes, alert only when a threshold is breached.
+- **CI hooks.** Deploy finished → post the commit SHA. Build failed → send the last 100 lines of the log.
+- **Periodic metrics.** "Daily Stripe revenue at 9am" as a simple API call + pretty-print.
+- **External event pollers.** Check an API, alert on state change.
+- **Heartbeats.** Ping a dashboard every N minutes to prove the host is alive.
+
+Use a normal (LLM-driven) cron job when you need the agent to **decide** what to say — summarize a long document, pick interesting items from a feed, draft a human-friendly message. The no-agent path is for cases where the script's stdout already IS the message.
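Pollers that should alert only on a *change* need one extra ingredient. Each tick starts the script as a fresh process, so the previous state has to be remembered outside it, for example in a small file. The following is a minimal sketch under that assumption. `fetch_status` is a placeholder for your real API call, and the state-file location is illustrative, not a Hermes convention.

```python
#!/usr/bin/env python3
"""Hypothetical status-poller.py: alert only when a polled status changes."""
import json
import tempfile
from pathlib import Path
from typing import Optional

# Illustrative location; choose somewhere durable (survives reboots) for real use.
STATE_FILE = Path(tempfile.gettempdir()) / "status-poller.json"


def fetch_status() -> str:
    """Placeholder: replace with a real API call (e.g. urllib.request.urlopen)."""
    return "ok"


def check_once(current: str, state_file: Path) -> Optional[str]:
    """Record the current status; return an alert only if it differs from last time."""
    previous = None
    if state_file.exists():
        previous = json.loads(state_file.read_text()).get("status")
    state_file.write_text(json.dumps({"status": current}))
    if previous is None or previous == current:
        return None  # first run sets a baseline; an unchanged status stays silent
    return f"Status changed: {previous} → {current}"


if __name__ == "__main__":
    message = check_once(fetch_status(), STATE_FILE)
    if message:
        print(message)  # printing nothing keeps the tick silent
```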
+
+## Create One from the CLI
+
+```bash
+# 1. Write your script
+cat > ~/.hermes/scripts/memory-watchdog.sh <<'EOF'
+#!/usr/bin/env bash
+# Alert when RAM usage is over 85%. Silent otherwise.
+RAM_PCT=$(free | awk '/^Mem:/ {printf "%d", $3 * 100 / $2}')
+if [ "$RAM_PCT" -ge 85 ]; then
+  echo "⚠ RAM ${RAM_PCT}% on $(hostname)"
+fi
+# Empty stdout = silent run; no message sent.
+EOF
+chmod +x ~/.hermes/scripts/memory-watchdog.sh
+
+# 2. Schedule it
+hermes cron create "every 5m" \
+  --no-agent \
+  --script memory-watchdog.sh \
+  --deliver telegram \
+  --name "memory-watchdog"
+
+# 3. Verify
+hermes cron list
+hermes cron run <job_id>    # fire it once to test
+```
+
+That's the whole thing. No prompt, no skill, no model.
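Any extension other than `.sh` / `.bash` runs under the current Python interpreter, so the same check can also be written in Python. This is a minimal sketch that assumes a Linux host exposing `MemAvailable` in `/proc/meminfo`. The file name `memory-watchdog.py` is illustrative. The script counts `MemTotal - MemAvailable` as used, so its percentage can differ somewhat from the `free` arithmetic in the bash version.

```python
#!/usr/bin/env python3
"""Hypothetical memory-watchdog.py: print an alert only when RAM use is high."""
import socket
from pathlib import Path

THRESHOLD_PCT = 85  # illustrative, matching the bash example


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo lines such as 'MemTotal:  16318412 kB' into {name: kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def used_percent(info: dict) -> int:
    """RAM in use as a whole percentage, treating MemAvailable as free."""
    total = info["MemTotal"]
    return (total - info["MemAvailable"]) * 100 // total


def main() -> None:
    # On a host without /proc/meminfo this raises and the script exits non-zero,
    # which the scheduler reports as an error alert instead of staying silent.
    pct = used_percent(parse_meminfo(Path("/proc/meminfo").read_text()))
    if pct >= THRESHOLD_PCT:
        print(f"⚠ RAM {pct}% on {socket.gethostname()}")
    # Printing nothing is a silent tick: no message is sent.


if __name__ == "__main__":
    main()
```

Save it as `~/.hermes/scripts/memory-watchdog.py` and schedule it the same way, passing `--script memory-watchdog.py`.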
+
+## Create One from Chat
+
+You can also ask the agent to set one up conversationally. The `cronjob` tool now accepts a `no_agent` parameter:
+
+> *"Ping me on Telegram if RAM is over 85%, every 5 minutes."*
+
+The agent will:
+
+1. Write the check script to `~/.hermes/scripts/` via `write_file`.
+2. Call `cronjob(action='create', schedule='every 5m', script='memory-watchdog.sh', no_agent=true, deliver='telegram')`.
+
+This is the same scheduler the agent already uses for LLM-driven jobs; `no_agent=true` just picks the script-only code path.
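Before creating the job, `cronjob` checks that the arguments fit the chosen mode. The sketch below restates those rules as code. With `no_agent=True`, a script is required and any prompt or skills are ignored. Otherwise, the existing prompt-or-skill requirement applies. `validate_create_args` is a hypothetical name; this illustrates the rules and is not Hermes' implementation.

```python
from typing import Optional, Sequence


def validate_create_args(
    *,
    no_agent: bool = False,
    script: Optional[str] = None,
    prompt: Optional[str] = None,
    skills: Sequence[str] = (),
) -> None:
    """Raise ValueError if the arguments don't fit the chosen mode (hypothetical helper)."""
    if no_agent:
        # Script-only mode: the script *is* the job; prompt and skills are ignored.
        if not script:
            raise ValueError("no_agent=True requires a script")
        return
    # Default LLM mode keeps the 'prompt or skill required' rule.
    if not prompt and not skills:
        raise ValueError("LLM jobs require a prompt or at least one skill")
```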
+
+## How Script Output Maps to Delivery
+
+| Script behavior | Result |
+|-----------------|--------|
+| Exit 0, non-empty stdout | stdout is delivered verbatim |
+| Exit 0, empty stdout | Silent tick — no delivery |
+| Exit 0, stdout contains `{"wakeAgent": false}` on the last line | Silent tick (shared gate with LLM jobs) |
+| Non-zero exit code | Error alert is delivered (so a broken watchdog doesn't fail silently) |
+| Script timeout | Error alert is delivered |
+
+The "silent when empty" behavior is the key to the classic watchdog pattern: the script is free to run every minute, but the channel only sees a message when something actually needs attention.
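The table above amounts to a small decision function. The following is a hypothetical model of it; `classify_result` and the error-alert wording are assumptions, not Hermes' code. The function returns the text to deliver, or `None` for a silent tick. It treats the last line of the trimmed stdout as the `wakeAgent` gate line.

```python
import json
from typing import Optional


def classify_result(returncode: Optional[int], stdout: str, timed_out: bool = False) -> Optional[str]:
    """Map a finished script run to a message, or None for a silent tick (hypothetical)."""
    if timed_out:
        return "⚠ Cron script timed out"
    if returncode != 0:
        return f"⚠ Cron script failed with exit code {returncode}"
    text = stdout.strip()
    if not text:
        return None  # empty stdout: silent tick
    last_line = text.splitlines()[-1]
    try:
        gate = json.loads(last_line)
    except json.JSONDecodeError:
        gate = None  # an ordinary text line, not a gate
    if isinstance(gate, dict) and gate.get("wakeAgent") is False:
        return None  # explicit wakeAgent=false gate: silent tick
    return text  # deliver trimmed stdout verbatim
```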
+
+## Script Rules
+
+Scripts must live in `~/.hermes/scripts/`. This is enforced at both job-creation time and run time — absolute paths, `~/` expansion, and path-traversal patterns (`../`) are rejected. The same directory is shared with the pre-check script gate used by LLM jobs.
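One way to enforce this containment rule is a two-step check. First, reject anything that is not a plain relative name. Then confirm that the resolved path still lies inside the scripts directory. This sketch assumes Python 3.9+ for `Path.is_relative_to`. `resolve_script` is a hypothetical name, and Hermes' actual guard may differ in detail.

```python
from pathlib import Path, PurePath


def resolve_script(name: str, scripts_dir: Path) -> Path:
    """Return the script's path inside scripts_dir, or raise ValueError (hypothetical)."""
    parts = PurePath(name)
    # Reject absolute paths, '~' expansion and any '..' component outright.
    if parts.is_absolute() or name.startswith("~") or ".." in parts.parts:
        raise ValueError(f"script must be a relative name inside {scripts_dir}: {name!r}")
    base = scripts_dir.resolve()
    resolved = (base / name).resolve()
    # resolve() also follows symlinks, so a link pointing outside is rejected too.
    if not resolved.is_relative_to(base):
        raise ValueError(f"script escapes {scripts_dir}: {name!r}")
    return resolved
```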
+
+Interpreter choice is by file extension:
+
+| Extension | Interpreter |
+|-----------|-------------|
+| `.sh`, `.bash` | `/bin/bash` |
+| anything else | `sys.executable` (current Python) |
+
+We intentionally do NOT honour `#!/...` shebangs — keeping the interpreter set explicit and small reduces the surface the scheduler trusts.
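The extension rule can be sketched as a function that builds the argument list passed to `subprocess.run`. Because the interpreter is named explicitly, the first line of the file is never consulted. This is also why `.sh` scripts run without a shebang. `build_argv` and `run_script` are hypothetical names. The 120-second default mirrors the documented script timeout. The real scheduler also applies the path guard and output handling described on this page.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

BASH_SUFFIXES = {".sh", ".bash"}


def build_argv(script: Path) -> List[str]:
    """Choose the interpreter from the file extension; the shebang is never read."""
    if script.suffix in BASH_SUFFIXES:
        return ["/bin/bash", str(script)]
    return [sys.executable, str(script)]


def run_script(script: Path, timeout: Optional[float] = 120) -> subprocess.CompletedProcess:
    """Run the script with captured text output; raises subprocess.TimeoutExpired on timeout."""
    return subprocess.run(build_argv(script), capture_output=True, text=True, timeout=timeout)
```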
+
+## Schedule Syntax
+
+Same as all other cron jobs:
+
+```bash
+hermes cron create "every 5m"        # interval
+hermes cron create "every 2h"
+hermes cron create "0 9 * * *"       # standard cron: 9am daily
+hermes cron create "30m"             # one-shot: run once in 30 minutes
+```
+
+See the [cron feature reference](/docs/user-guide/features/cron) for the full syntax.
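The examples above differ in one easy-to-miss way: `every …` repeats, while a bare duration such as `30m` fires once. The hypothetical `classify_schedule` below distinguishes only the shapes shown in this section and assumes only the `m` and `h` units. It is not Hermes' parser; the cron feature reference remains the authority on the full grammar.

```python
import re
from typing import Tuple, Union

_UNIT_SECONDS = {"m": 60, "h": 3600}  # only the units used on this page
_DURATION = re.compile(r"(\d+)([mh])")


def classify_schedule(expr: str) -> Tuple[str, Union[int, str]]:
    """Return ('interval', secs), ('once', secs) or ('cron', expr) for the shapes above."""
    expr = expr.strip()
    m = re.fullmatch(r"every\s+" + _DURATION.pattern, expr)
    if m:
        return "interval", int(m.group(1)) * _UNIT_SECONDS[m.group(2)]
    m = _DURATION.fullmatch(expr)
    if m:
        return "once", int(m.group(1)) * _UNIT_SECONDS[m.group(2)]
    if len(expr.split()) == 5:
        return "cron", expr  # standard five-field cron expression
    raise ValueError(f"unrecognised schedule: {expr!r}")
```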
+
+## Delivery Targets
+
+`--deliver` accepts everything the gateway knows about. Some common shapes:
+
+```bash
+--deliver telegram                       # platform home channel
+--deliver telegram:-1001234567890        # specific chat
+--deliver telegram:-1001234567890:17585  # specific Telegram forum topic
+--deliver discord:#ops
+--deliver slack:#engineering
+--deliver signal:+15551234567
+--deliver local                          # just save to ~/.hermes/cron/output/
+```
+
+No running gateway is required at script-run time for bot-token platforms (Telegram, Discord, Slack, Signal, SMS, WhatsApp) — the tool calls each platform's REST endpoint directly using the credentials already in `~/.hermes/.env` / `~/.hermes/config.yaml`.
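The target strings above follow a `platform[:chat[:thread]]` shape. The sketch below illustrates what direct delivery involves. It splits a target and, for Telegram, builds the parameters of the Bot API `sendMessage` method: `chat_id`, `text`, and `message_thread_id` for forum topics. `Target`, `parse_target`, and `telegram_params` are hypothetical names. The sketch assumes chat identifiers contain no colons. It does not resolve a bare `telegram` to its home channel, and it sends nothing.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    platform: str
    chat: Optional[str] = None    # None means the platform's home channel
    thread: Optional[str] = None  # e.g. a Telegram forum topic id


def parse_target(spec: str) -> Target:
    """Split 'platform[:chat[:thread]]'; assumes chat ids contain no colons."""
    return Target(*spec.split(":", 2))


def telegram_params(target: Target, text: str) -> dict:
    """Parameters for the Telegram Bot API sendMessage method."""
    if target.platform != "telegram" or target.chat is None:
        raise ValueError("needs an explicit telegram:<chat_id>[:<topic_id>] target")
    params = {"chat_id": target.chat, "text": text}
    if target.thread is not None:
        params["message_thread_id"] = int(target.thread)
    return params
```

A real sender would POST these parameters to `https://api.telegram.org/bot<token>/sendMessage`.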
+
+## Editing and Lifecycle
+
+```bash
+hermes cron list                                    # see all jobs
+hermes cron pause <job_id>                          # stop firing, keep definition
+hermes cron resume <job_id>
+hermes cron edit <job_id> --schedule "every 10m"    # adjust cadence
+hermes cron edit <job_id> --agent                   # flip to LLM mode
+hermes cron edit <job_id> --no-agent --script …     # flip back
+hermes cron remove <job_id>                         # delete it
+```
+
+Everything that works on LLM jobs (pause, resume, manual trigger, delivery target changes) works on no-agent jobs too.
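The edit path also guards mode flips: a job can become script-only only if it ends up with a script. Below is a hypothetical sketch of that check on a job stored as a plain dict. Reading `no_agent` with `.get(..., False)` means jobs saved before the option existed are treated as LLM jobs. The field names follow the options on this page; the storage shape is an assumption.

```python
from typing import Any, Dict


def apply_update(job: Dict[str, Any], **changes: Any) -> Dict[str, Any]:
    """Return an updated copy of job, refusing no_agent=True without a script (hypothetical)."""
    updated = {**job, **changes}  # the original dict is left untouched
    if updated.get("no_agent", False) and not updated.get("script"):
        raise ValueError("cannot enable no_agent on a job without a script")
    return updated
```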
+
+## Worked Example: Disk Space Alert
+
+```bash
+cat > ~/.hermes/scripts/disk-alert.sh <<'EOF'
+#!/usr/bin/env bash
+# Alert when / or /home is over 90% full.
+THRESHOLD=90
+df -h / /home 2>/dev/null | awk -v t="$THRESHOLD" '
+  NR > 1 && $5+0 >= t {
+    printf "⚠ Disk %s full on %s\n", $5, $6
+  }
+'
+EOF
+chmod +x ~/.hermes/scripts/disk-alert.sh
+
+hermes cron create "*/15 * * * *" \
+  --no-agent \
+  --script disk-alert.sh \
+  --deliver telegram \
+  --name "disk-alert"
+```
+
+Silent while both filesystems are below 90%; otherwise it prints one line per filesystem at or above the threshold, and those lines arrive together as a single message.
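The same alert can be written in Python with `shutil.disk_usage`. This is a sketch only: the mount list and threshold are illustrative. The percentage here is `used / total`, so it can read slightly lower than `df`'s `Use%`, which is computed against the space available to unprivileged users.

```python
#!/usr/bin/env python3
"""Hypothetical disk-alert.py: one line per filesystem at or over the threshold."""
import shutil
from typing import Callable, Iterable, List

THRESHOLD_PCT = 90
MOUNTS = ["/", "/home"]


def over_threshold(mounts: Iterable[str], threshold: int,
                   usage: Callable = shutil.disk_usage) -> List[str]:
    lines = []
    for mount in mounts:
        try:
            du = usage(mount)
        except FileNotFoundError:
            continue  # like `df ... 2>/dev/null`: skip mounts that don't exist
        pct = du.used * 100 // du.total
        if pct >= threshold:
            lines.append(f"⚠ Disk {pct}% full on {mount}")
    return lines


if __name__ == "__main__":
    for line in over_threshold(MOUNTS, THRESHOLD_PCT):
        print(line)  # no lines printed means a silent tick
```

Missing mounts are skipped, mirroring `2>/dev/null` in the bash version. Any other error exits non-zero and is delivered as an error alert.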
+
+## Comparison with Other Patterns
+
+| Approach | What runs | When to use |
+|----------|-----------|-------------|
+| `hermes send` (one-shot) | Any shell command piping into it | Ad-hoc delivery or as the action of an external scheduler (systemd, launchd) |
+| `cronjob --no-agent` (this page) | Your script on Hermes' schedule | Recurring watchdogs / alerts / metrics that don't need reasoning |
+| `cronjob` (default, LLM) | Agent with optional pre-check script | When the message content requires reasoning over data |
+| OS cron + `hermes send` | Your script on the OS schedule | When Hermes might be unhealthy (the thing you're monitoring) |
+
+For critical system-health watchdogs that must fire *even when the gateway is down*, keep using OS-level cron + a plain `curl` or `hermes send` call — those run as independent OS processes and don't depend on Hermes being up. The in-gateway scheduler is the right choice when the thing being monitored is external.
+
+## Related
+
+- [Automate Anything with Cron](/docs/guides/automate-with-cron) — LLM-driven cron patterns.
+- [Scheduled Tasks (Cron) reference](/docs/user-guide/features/cron) — full schedule syntax, lifecycle, delivery routing.
+- [Pipe Script Output with `hermes send`](/docs/guides/pipe-script-output) — the one-shot counterpart for ad-hoc scripts.
+- [Gateway Internals](/docs/developer-guide/gateway-internals) — delivery-router internals.
--- a/website/docs/user-guide/features/cron.md
+++ b/website/docs/user-guide/features/cron.md
@@ -286,6 +286,30 @@ cron:

 Or set the `HERMES_CRON_SCRIPT_TIMEOUT` environment variable. The resolution order is: env var → config.yaml → 120s default.

+## No-agent mode (script-only jobs)
+
+For recurring jobs that don't need LLM reasoning — classic watchdogs, disk/memory alerts, heartbeats, CI pings — pass `no_agent=True` at creation time. The scheduler runs your script on schedule and delivers its stdout directly, skipping the agent entirely:
+
+```bash
+hermes cron create "every 5m" \
+  --no-agent \
+  --script memory-watchdog.sh \
+  --deliver telegram \
+  --name "memory-watchdog"
+```
+
+Semantics:
+
+- Script stdout (trimmed) → delivered verbatim as the message.
+- **Empty stdout → silent tick**, no delivery. This is the watchdog pattern: "only say something when something is wrong".
+- Non-zero exit or timeout → an error alert is delivered, so a broken watchdog can't fail silently.
+- `{"wakeAgent": false}` on the last line → silent tick (same gate LLM jobs use).
+- No tokens, no model, no provider fallback — the job never touches the inference layer.
+
+`.sh` / `.bash` files run under `/bin/bash`; anything else under the current Python interpreter (`sys.executable`). Scripts must live in `~/.hermes/scripts/` (same sandboxing rule as the pre-run script gate).
+
+See the [Script-Only Cron Jobs guide](/docs/guides/cron-script-only) for worked examples.
+
 ## Provider recovery

 Cron jobs inherit your configured fallback providers and credential pool rotation. If the primary API key is rate-limited or the provider returns an error, the cron agent can: