mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-07 02:51:50 +00:00
feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern) (#19709)
* feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern)
Adds a no_agent=True option to the cronjob system. When enabled, the
scheduler runs the attached script on schedule and delivers its stdout
directly to the job's target — no LLM, no agent loop, no token spend.
This is the classic bash-watchdog pattern (memory alert every 5 min,
disk alert every 15 min, CI ping) reimplemented as a first-class Hermes
primitive instead of a systemd timer + curl + bot token triplet living
outside the system.
## What
hermes cron create "every 5m" \
--no-agent \
--script memory-watchdog.sh \
--deliver telegram \
--name memory-watchdog
Agent tool:
cronjob(action='create',
schedule='every 5m',
script='memory-watchdog.sh',
no_agent=True,
deliver='telegram')
Semantics:
- Script stdout (trimmed) → delivered verbatim as the message
- Empty stdout → silent tick (no delivery; watchdog pattern)
- wakeAgent=false gate → silent tick (same gate LLM jobs use)
- Non-zero exit/timeout → delivered as an error alert
(broken watchdogs shouldn't fail silently)
- No LLM ever invoked; no tokens spent; no provider fallback applied
## Implementation
cron/jobs.py
* create_job gains no_agent: bool = False
* prompt becomes Optional (no_agent jobs don't need one)
* Validation: no_agent=True requires a script at create time
* Field roundtrips via load_jobs / save_jobs / update_job
cron/scheduler.py
* run_job: new short-circuit branch at the top that runs the script,
wraps its output into the (success, doc, final_response, error)
tuple downstream delivery already expects, and returns before any
AIAgent import or construction
* _run_job_script: picks interpreter by extension — .sh/.bash run
under /bin/bash, anything else under sys.executable (Python).
Shell support unlocks the bash-watchdog pattern without wrapping
scripts in Python. Extension is explicit; we deliberately do NOT
trust the file's own shebang. Path-containment guard (scripts dir)
unchanged.
tools/cronjob_tools.py
* Schema: new no_agent boolean property with clear trigger guidance
* cronjob() accepts no_agent and validates mode-specific shape:
- no_agent=True requires script; prompt/skills optional
- no_agent=False keeps the existing 'prompt or skill required' rule
* update path rejects flipping no_agent=True on a job without a script
* _format_job surfaces no_agent in list output
* Handler lambda forwards no_agent from tool args
hermes_cli/main.py, hermes_cli/cron.py
* 'hermes cron create --no-agent' and edit's --no-agent / --agent
pair for toggling at CLI parity with the agent tool
* Existing --script help text updated to describe both modes
* List / create / edit output now shows 'Mode: no-agent (...)' when set
## Tests
tests/cron/test_cron_no_agent.py — 18 tests covering:
* create_job: no_agent shape, validation, field persistence
* update_job: flag roundtrip across reload
* cronjob tool: schema validation, update toggling, mode-specific
requirements, prompt-relaxation rule
* run_job short-circuit:
- success path delivers stdout verbatim
- empty stdout → SILENT_MARKER (no delivery downstream)
- wakeAgent=false gate → silent
- script failure → error alert
- run_job does NOT import AIAgent (verified via mock)
* _run_job_script:
- .sh executes via bash (no shebang required)
- .bash executes via bash
- .py still runs via sys.executable (regression)
- path-traversal still blocked (security regression)
All 18 new tests pass. 341/342 pre-existing cron tests still pass; the
one failure (test_script_empty_output_noted) was already broken on main
and is unrelated to this change.
## Docs
website/docs/guides/cron-script-only.md — new dedicated guide covering
the watchdog pattern, interpreter rules, delivery mapping, worked
examples (memory / disk alerts), and the comparison table vs hermes send,
regular LLM cron jobs, and OS-level cron.
website/docs/user-guide/features/cron.md — new 'No-agent mode' section
in the cron feature reference, cross-linked to the guide.
website/docs/guides/automate-with-cron.md — new tip box pointing users
to no-agent mode when they don't need LLM reasoning.
## Compatibility
- Existing jobs: unchanged. no_agent defaults to False, existing code
paths untouched until the flag is set.
- Schema additive only; older jobs.json without the field load fine
via .get() with False default.
- New CLI flags are opt-in and don't alter existing flag behavior.
* fix(cron): lazy-import AIAgent + SessionDB so no_agent ticks pay zero
The unconditional `from run_agent import AIAgent` + SessionDB() init at
the top of run_job() meant every no_agent tick still paid the full agent
module load cost (~300ms + transitive imports + DB open) even though it
never touched any of that machinery.
Move both to live under the default (LLM) path, after the no_agent
short-circuit has returned. Now a no_agent tick's sys.modules stays
clean — verified end-to-end:
assert 'run_agent' not in sys.modules # before
run_job(no_agent_job)
assert 'run_agent' not in sys.modules # after
The existing mock-based unit test (test_run_job_no_agent_never_invokes_aiagent)
kept passing because patch() replaces the class AFTER import; the leak
was only visible via real subprocess-style verification. End-to-end
demo confirmed: agent calls cronjob(no_agent=True) → script runs →
stdout delivered → no LLM machinery loaded.
* docs(cron): tighten no_agent tool schema — defaults, silent semantics, pick rule
Previous description buried the important bits in one long sentence.
Agents could plausibly miss three things an LLM-facing schema should
make unmissable:
1. What the default is — now first sentence + JSON Schema `default: false`
2. What 'silent run' actually means for the user — now spelled out:
'nothing is sent to the user and they won't see anything happened'
3. When to pick True vs False — now a concrete decision rule with
examples on both sides (watchdogs/metrics/pollers → True;
summarize/draft/pick/rephrase → False)
Also adds explicit 'prompt and skills are ignored when True' since the
agent could otherwise still pass them out of habit.
No behavior change — schema text only.
This commit is contained in:
parent
d35efb9898
commit
3db6b9cc87
9 changed files with 823 additions and 17 deletions
|
|
@ -576,8 +576,18 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
|
|||
prevent arbitrary script execution via path traversal or absolute
|
||||
path injection.
|
||||
|
||||
Supported interpreters (chosen by file extension):
|
||||
|
||||
* ``.sh`` / ``.bash`` — run with ``/bin/bash``
|
||||
* anything else — run with the current Python interpreter
|
||||
(``sys.executable``), preserving the original behaviour for
|
||||
Python-based pre-check and data-collection scripts.
|
||||
|
||||
Shell support lets ``no_agent=True`` jobs ship classic bash watchdogs
|
||||
(the `memory-watchdog.sh` pattern) without wrapping them in Python.
|
||||
|
||||
Args:
|
||||
script_path: Path to a Python script. Relative paths are resolved
|
||||
script_path: Path to the script. Relative paths are resolved
|
||||
against HERMES_HOME/scripts/. Absolute and ~-prefixed paths
|
||||
are also validated to ensure they stay within the scripts dir.
|
||||
|
||||
|
|
@ -614,9 +624,19 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
|
|||
|
||||
script_timeout = _get_script_timeout()
|
||||
|
||||
# Pick an interpreter by extension. Bash for .sh/.bash, Python for
|
||||
# everything else. We deliberately do NOT honour the file's own
|
||||
# shebang: the scripts dir is trusted, but keeping the interpreter
|
||||
# choice explicit here keeps the allowed surface small and auditable.
|
||||
suffix = path.suffix.lower()
|
||||
if suffix in (".sh", ".bash"):
|
||||
argv = ["/bin/bash", str(path)]
|
||||
else:
|
||||
argv = [sys.executable, str(path)]
|
||||
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[sys.executable, str(path)],
|
||||
argv,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=script_timeout,
|
||||
|
|
@ -830,8 +850,120 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
|
|||
Returns:
|
||||
Tuple of (success, full_output_doc, final_response, error_message)
|
||||
"""
|
||||
job_id = job["id"]
|
||||
job_name = job["name"]
|
||||
|
||||
# ---------------------------------------------------------------
|
||||
# no_agent short-circuit — the script IS the job, no LLM involvement.
|
||||
# ---------------------------------------------------------------
|
||||
# This mirrors the classic "run a bash script on a timer, send its
|
||||
# stdout to telegram" watchdog pattern. The agent path is skipped
|
||||
# entirely: no AIAgent, no prompt, no tool loop, no token spend.
|
||||
#
|
||||
# We check this BEFORE importing run_agent / constructing SessionDB so
|
||||
# a pure-script tick never pays for the agent machinery it isn't going
|
||||
# to use. Keep this block self-contained.
|
||||
#
|
||||
# Semantics:
|
||||
# - script stdout (trimmed) → delivered verbatim as the final message
|
||||
# - empty stdout → silent run (no delivery, success=True)
|
||||
# - non-zero exit / timeout → delivered as an error alert, success=False
|
||||
# - wakeAgent=false gate → treated like empty stdout (silent), since
|
||||
# the whole point of no_agent is that there
|
||||
# is no agent to wake
|
||||
if job.get("no_agent"):
|
||||
script_path = job.get("script")
|
||||
if not script_path:
|
||||
err = "no_agent=True but no script is set for this job"
|
||||
logger.error("Job '%s': %s", job_id, err)
|
||||
return False, "", "", err
|
||||
|
||||
# Apply workdir if configured — lets scripts use predictable relative
|
||||
# paths. For no_agent jobs this is just the subprocess cwd (not an
|
||||
# agent TERMINAL_CWD bridge).
|
||||
_job_workdir = (job.get("workdir") or "").strip() or None
|
||||
_prior_cwd = None
|
||||
if _job_workdir and Path(_job_workdir).is_dir():
|
||||
_prior_cwd = os.getcwd()
|
||||
try:
|
||||
os.chdir(_job_workdir)
|
||||
except OSError:
|
||||
_prior_cwd = None
|
||||
|
||||
try:
|
||||
ok, output = _run_job_script(script_path)
|
||||
finally:
|
||||
if _prior_cwd is not None:
|
||||
try:
|
||||
os.chdir(_prior_cwd)
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
now_iso = _hermes_now().strftime("%Y-%m-%d %H:%M:%S")
|
||||
|
||||
if not ok:
|
||||
# Script crashed / timed out / exited non-zero. Deliver the
|
||||
# error so the user knows the watchdog itself broke — silent
|
||||
# failure for an alerting job is the worst-case outcome.
|
||||
alert = (
|
||||
f"⚠ Cron watchdog '{job_name}' script failed\n\n"
|
||||
f"{output}\n\n"
|
||||
f"Time: {now_iso}"
|
||||
)
|
||||
doc = (
|
||||
f"# Cron Job: {job_name}\n\n"
|
||||
f"**Job ID:** {job_id}\n"
|
||||
f"**Run Time:** {now_iso}\n"
|
||||
f"**Mode:** no_agent (script)\n"
|
||||
f"**Status:** script failed\n\n"
|
||||
f"{output}\n"
|
||||
)
|
||||
return False, doc, alert, output
|
||||
|
||||
# Honour the wakeAgent gate as a silent signal — `wakeAgent: false`
|
||||
# means "nothing to report this tick", same as empty stdout.
|
||||
if not _parse_wake_gate(output):
|
||||
logger.info(
|
||||
"Job '%s' (no_agent): wakeAgent=false gate — silent run", job_id
|
||||
)
|
||||
silent_doc = (
|
||||
f"# Cron Job: {job_name}\n\n"
|
||||
f"**Job ID:** {job_id}\n"
|
||||
f"**Run Time:** {now_iso}\n"
|
||||
f"**Mode:** no_agent (script)\n"
|
||||
f"**Status:** silent (wakeAgent=false)\n"
|
||||
)
|
||||
return True, silent_doc, SILENT_MARKER, None
|
||||
|
||||
if not output.strip():
|
||||
logger.info("Job '%s' (no_agent): empty stdout — silent run", job_id)
|
||||
silent_doc = (
|
||||
f"# Cron Job: {job_name}\n\n"
|
||||
f"**Job ID:** {job_id}\n"
|
||||
f"**Run Time:** {now_iso}\n"
|
||||
f"**Mode:** no_agent (script)\n"
|
||||
f"**Status:** silent (empty output)\n"
|
||||
)
|
||||
return True, silent_doc, SILENT_MARKER, None
|
||||
|
||||
doc = (
|
||||
f"# Cron Job: {job_name}\n\n"
|
||||
f"**Job ID:** {job_id}\n"
|
||||
f"**Run Time:** {now_iso}\n"
|
||||
f"**Mode:** no_agent (script)\n\n"
|
||||
f"---\n\n"
|
||||
f"{output}\n"
|
||||
)
|
||||
return True, doc, output, None
|
||||
|
||||
# ---------------------------------------------------------------
|
||||
# Default (LLM) path — import and construct the agent machinery now
|
||||
# that we know we actually need it. Doing these imports here instead of
|
||||
# at module top keeps no_agent ticks from paying for AIAgent / SessionDB
|
||||
# construction costs.
|
||||
# ---------------------------------------------------------------
|
||||
from run_agent import AIAgent
|
||||
|
||||
|
||||
# Initialize SQLite session store so cron job messages are persisted
|
||||
# and discoverable via session_search (same pattern as gateway/run.py).
|
||||
_session_db = None
|
||||
|
|
@ -840,9 +972,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
|
|||
_session_db = SessionDB()
|
||||
except Exception as e:
|
||||
logger.debug("Job '%s': SQLite session store not available: %s", job.get("id", "?"), e)
|
||||
|
||||
job_id = job["id"]
|
||||
job_name = job["name"]
|
||||
|
||||
# Wake-gate: if this job has a pre-check script, run it BEFORE building
|
||||
# the prompt so a ``{"wakeAgent": false}`` response can short-circuit
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue