mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-09 03:11:58 +00:00
feat(skills): watchers skill — poll RSS / HTTP JSON / GitHub via cron no-agent (#21881)
* feat(skills): watchers skill — poll RSS / HTTP JSON / GitHub via cron no-agent

Ships three reusable polling scripts plus a shared watermark helper as an optional skill. Users wire them into the existing cron (no_agent=True) mode rather than learning a new subsystem. Supersedes the closed PR #21497 (parallel watcher subsystem). Same value, zero new core surface.

## What ships

- optional-skills/devops/watchers/SKILL.md: pattern + three example cron commands
- optional-skills/devops/watchers/scripts/_watermark.py: shared helper (atomic state writes, bounded ID set, first-run baseline)
- optional-skills/devops/watchers/scripts/watch_rss.py: RSS 2.0 + Atom
- optional-skills/devops/watchers/scripts/watch_http_json.py: any JSON endpoint with configurable id_field / items_path / headers
- optional-skills/devops/watchers/scripts/watch_github.py: issues / pulls / releases / commits (uses GITHUB_TOKEN if present)

## Invariants enforced by the shared helper

- First run records baseline, emits nothing (never replays existing feed)
- Watermark file is <state_dir>/<name>.json, atomic replace on write
- Bounded to 500 IDs (configurable)
- Empty stdout when no new items — cron treats that as silent delivery

## Validation

- watch_rss.py against news.ycombinator.com/rss first run → empty stdout, watermark populated
- Removed one seen-id, second run → emitted exactly that item
- No DeprecationWarnings (ET element truth-value footgun dodged explicitly)

End-user pattern:

    hermes cron create my-feed --schedule "*/15 * * * *" --no-agent \
      --script $HERMES_HOME/skills/devops/watchers/scripts/watch_rss.py \
      --script-args "--name hn --url https://news.ycombinator.com/rss" \
      --deliver telegram

* docs(skills/watchers): tighten description to match peer optional skills
* docs(skills/watchers): align frontmatter + structure with peer optional skills
* docs(skills/watchers): gate to linux/macos (shell syntax in examples)
This commit is contained in:
parent 839cdd1b05
commit ea8e608821
5 changed files with 680 additions and 0 deletions
112	optional-skills/devops/watchers/SKILL.md	Normal file
@@ -0,0 +1,112 @@
---
name: watchers
description: Poll RSS, JSON APIs, and GitHub with watermark dedup.
version: 1.0.0
author: Hermes Agent
license: MIT
platforms: [linux, macos]
metadata:
  hermes:
    tags: [cron, polling, rss, github, http, automation, monitoring]
    category: devops
    requires_toolsets: [terminal]
    related_skills: []
---

# Watchers

Poll external sources on an interval and react only to new items. Three ready-made scripts plus a shared watermark helper; wire them into a cron job (or run them ad hoc from the terminal).

## When to Use

- User wants to watch an RSS/Atom feed and be notified of new entries
- User wants to watch a GitHub repo's issues / pulls / releases / commits
- User wants to poll an arbitrary JSON endpoint and get notified on new items
- User asks for "a watcher for X" or "notify me when X changes"

## Mental model

A watcher is just a script that:

1. Fetches data from the external source
2. Compares against a watermark file of previously seen IDs
3. Writes the new watermark back
4. Prints new items to stdout (or nothing on no-change)

The ready-made scripts below handle all four steps. The agent runs them via the terminal tool — from a cron job, a webhook, or an interactive chat — and reports what's new.
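The four steps can be sketched as one self-contained loop. This sketch is illustrative only: the fetch is stubbed and the state file lives in a hypothetical temp-dir path, whereas the shipped scripts get the same behavior (plus the bounded ID set) from the shared `_watermark.py` helper.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical state location for this demo; the real scripts use
# $HERMES_HOME/watcher-state/<name>.json via the shared helper.
STATE = Path(tempfile.gettempdir()) / "watcher-demo.json"

def fetch():
    """Step 1 (stubbed): a real watcher would hit an RSS / JSON / GitHub endpoint."""
    return [{"id": "a", "title": "first"}, {"id": "b", "title": "second"}]

def tick(items):
    first_run = not STATE.exists()
    seen = set() if first_run else set(json.loads(STATE.read_text())["seen_ids"])
    # Step 2: diff against the stored IDs; first run emits nothing (baseline).
    new = [] if first_run else [i for i in items if i["id"] not in seen]
    # Step 3: write the new watermark back atomically (tmp file + replace).
    tmp = STATE.with_suffix(".tmp")
    tmp.write_text(json.dumps({"seen_ids": sorted(seen | {i["id"] for i in items})}))
    os.replace(tmp, STATE)
    # Step 4: print new items; empty output means "stay silent".
    return "\n".join(f"## {i['title']}" for i in new)

STATE.unlink(missing_ok=True)
print(repr(tick(fetch())))                                    # first run: '' (baseline)
print(repr(tick(fetch() + [{"id": "c", "title": "third"}])))  # '## third'
```

Running it twice shows the contract: the first tick records a baseline and prints nothing; the second tick emits only the item that appeared since.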
## Ready-made scripts

All three live in `$HERMES_HOME/skills/devops/watchers/scripts/` once the skill is installed. Each reads `WATCHER_STATE_DIR` (defaults to `$HERMES_HOME/watcher-state/`) for its state file, keyed by the `--name` argument.

| Script | What it watches | Dedup key |
|---|---|---|
| `watch_rss.py` | RSS 2.0 or Atom feed URL | `<guid>` / `<id>` |
| `watch_http_json.py` | Any JSON endpoint returning a list of objects | Configurable id field |
| `watch_github.py` | GitHub issues / pulls / releases / commits for a repo | `id` / `sha` |

All three:

- First run records a baseline — never replays the existing feed
- Watermark is a bounded ID set (max 500) to cap memory
- Output format: `## <title>\n<url>\n\n<optional body>` per item
- Empty stdout on no new items — the caller treats that as silent
- Non-zero exit on fetch errors

## Usage

Run a watcher directly from the terminal tool:

```bash
python $HERMES_HOME/skills/devops/watchers/scripts/watch_rss.py \
    --name hn --url https://news.ycombinator.com/rss --max 5
```

Watch a GitHub repo (set `GITHUB_TOKEN` in `~/.hermes/.env` to avoid the 60 req/hr anonymous rate limit):

```bash
python $HERMES_HOME/skills/devops/watchers/scripts/watch_github.py \
    --name hermes-issues --repo NousResearch/hermes-agent --scope issues
```

Poll an arbitrary JSON API:

```bash
python $HERMES_HOME/skills/devops/watchers/scripts/watch_http_json.py \
    --name api --url https://api.example.com/events \
    --id-field event_id --items-path data.events
```

## Wiring into cron

Ask the agent to schedule a cron job with a prompt like:

> Every 15 minutes, run `watch_rss.py --name hn --url https://news.ycombinator.com/rss`. If it prints anything, summarize the headlines and deliver them. If it prints nothing, stay silent.

The agent invokes the script via the terminal tool inside the cron job's agent loop; no changes to cron's built-in `--script` flag are needed.

## State files

Every watcher writes `$HERMES_HOME/watcher-state/<name>.json`. Inspect it with:

```bash
cat $HERMES_HOME/watcher-state/hn.json
```

Reset a watcher (the next run records a fresh baseline and emits nothing):

```bash
rm $HERMES_HOME/watcher-state/hn.json
```

## Writing your own

All three scripts use the same template: load watermark, fetch, diff, save, emit. `scripts/_watermark.py` is the shared helper; import it to get atomic writes, a bounded ID set, and the first-run baseline for free. See any of the three reference scripts for how little boilerplate it takes.
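That template reads as little as this. A sketch, assuming the skill is installed so `_watermark` is importable; `fetch_somehow()` is a placeholder for your own fetch step, not a shipped function:

```python
#!/usr/bin/env python3
"""Minimal custom watcher built on the shared helper (template sketch)."""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent))  # same trick the reference scripts use
from _watermark import Watermark, format_items_as_markdown

items = fetch_somehow()                        # placeholder: list of {"id", "title", "url"} dicts
wm = Watermark.load("my-feed-name")            # loads (or creates) <state_dir>/my-feed-name.json
new_items = wm.filter_new(items, id_key="id")  # first run: records baseline, returns []
wm.save()                                      # atomic replace; bounded ID set
sys.stdout.write(format_items_as_markdown(new_items))  # empty string keeps cron silent
```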
## Common Pitfalls

1. **Printing a "no new items" header every tick.** Callers rely on empty stdout meaning silent. If you print anything on an empty delta, you spam the channel. The shipped scripts handle this; custom scripts must too.
2. **Expecting the first run to emit items.** It won't — the first run records a baseline. If you need an initial digest, remove individual IDs from the state file after the first run (those items replay on the next tick) or add a `--prime-with-latest N` flag in your own script.
3. **Unbounded watermark growth.** The shared helper caps at 500 IDs. Raise the cap for high-churn feeds; lower it on constrained filesystems.
4. **Putting the state dir where the agent's sandbox can't write.** `$HERMES_HOME/watcher-state/` is always writable. Docker/Modal backends may not see arbitrary host paths.
148	optional-skills/devops/watchers/scripts/_watermark.py	Executable file
@@ -0,0 +1,148 @@
"""Shared watermark helper used by the three watcher scripts.

A watermark is just a JSON file that records the IDs we've seen on previous
runs, so the next run only emits items we haven't seen before.

Contract:
- First run: record all IDs from the fetched batch, emit nothing.
- Subsequent runs: emit items whose ID isn't in the stored set.
- Bounded: keep at most `max_seen` IDs (default 500).
- Atomic: write to a .tmp file and rename, so a crashed script can't
  leave a half-written state file that permanently breaks dedup.

Import and use from any custom watcher script:

    from _watermark import Watermark

    wm = Watermark.load("my-feed-name")
    new_items = wm.filter_new(fetched_items, id_key="id")
    wm.save()
"""

from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional


def _state_dir() -> Path:
    """Where watermark files live — respects WATCHER_STATE_DIR override."""
    override = os.environ.get("WATCHER_STATE_DIR")
    if override:
        return Path(override)
    # Default: $HERMES_HOME/watcher-state/, falling back to ~/.hermes/watcher-state/.
    hermes_home = os.environ.get("HERMES_HOME") or str(Path.home() / ".hermes")
    return Path(hermes_home) / "watcher-state"


class Watermark:
    """Per-watcher state. Persisted to <state_dir>/<name>.json."""

    def __init__(self, name: str, *, max_seen: int = 500) -> None:
        if not name or not name.replace("-", "").replace("_", "").isalnum():
            raise ValueError(
                f"watermark name must be alphanumeric + '-'/'_' (got {name!r})"
            )
        self.name = name
        self.max_seen = max_seen
        self._path = _state_dir() / f"{name}.json"
        self._data: Dict[str, Any] = {"seen_ids": [], "first_run": True}

    @classmethod
    def load(cls, name: str, *, max_seen: int = 500) -> "Watermark":
        wm = cls(name, max_seen=max_seen)
        if wm._path.exists():
            try:
                wm._data = json.loads(wm._path.read_text(encoding="utf-8"))
                wm._data.setdefault("seen_ids", [])
                wm._data["first_run"] = False
            except (OSError, json.JSONDecodeError):
                # Corrupt state file — treat as a first run but don't crash.
                wm._data = {"seen_ids": [], "first_run": True}
        return wm

    @property
    def is_first_run(self) -> bool:
        return bool(self._data.get("first_run", True))

    @property
    def seen(self) -> List[str]:
        return list(self._data.get("seen_ids", []))

    def filter_new(
        self, items: Iterable[Dict[str, Any]], *, id_key: str = "id"
    ) -> List[Dict[str, Any]]:
        """Return items whose id isn't in the stored set.

        Side effect: updates the in-memory seen set with every id in the
        batch (so save() persists the full new watermark). On first run,
        records every id but returns an empty list (baseline, no replay).
        """
        existing = set(str(x) for x in self._data.get("seen_ids", []))
        was_first_run = self.is_first_run

        new_items: List[Dict[str, Any]] = []
        batch_ids: List[str] = []
        for item in items:
            ident = item.get(id_key)
            if ident is None:
                continue
            ident_str = str(ident)
            batch_ids.append(ident_str)
            if ident_str in existing:
                continue
            if was_first_run:
                continue  # record but don't emit
            new_items.append(item)

        combined = list(existing) + [i for i in batch_ids if i not in existing]
        if len(combined) > self.max_seen:
            combined = combined[-self.max_seen:]
        self._data["seen_ids"] = combined
        self._data["first_run"] = False
        return new_items

    def save(self) -> None:
        self._path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self._path.with_suffix(".tmp")
        tmp.write_text(
            json.dumps(self._data, indent=2, sort_keys=True),
            encoding="utf-8",
        )
        os.replace(tmp, self._path)


def format_items_as_markdown(
    items: List[Dict[str, Any]],
    *,
    title_key: str = "title",
    url_key: str = "url",
    body_key: Optional[str] = None,
    max_body_chars: int = 500,
) -> str:
    """Render a list of items as Markdown for cron delivery.

    One heading per item + its URL + optional snippet of body. Output is
    an empty string when items is empty — cron will then treat stdout as
    silent and skip delivery (existing behavior).
    """
    if not items:
        return ""
    lines: List[str] = []
    for item in items:
        title = (item.get(title_key) or "(no title)").strip()
        url = (item.get(url_key) or "").strip()
        lines.append(f"## {title}")
        if url:
            lines.append(url)
        if body_key:
            body = (item.get(body_key) or "").strip()
            if body:
                if len(body) > max_body_chars:
                    body = body[:max_body_chars].rstrip() + "…"
                lines.append("")
                lines.append(body)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
168	optional-skills/devops/watchers/scripts/watch_github.py	Executable file
@@ -0,0 +1,168 @@
#!/usr/bin/env python3
"""Watch GitHub activity — issues, pulls, releases, or commits — with dedup.

Usage (via cron with --no-agent):

    hermes cron create hermes-issues \\
        --schedule "*/5 * * * *" --no-agent \\
        --script "$HERMES_HOME/skills/devops/watchers/scripts/watch_github.py" \\
        --script-args "--name hermes-issues --repo NousResearch/hermes-agent --scope issues"

Set GITHUB_TOKEN (or GH_TOKEN) in ~/.hermes/.env to avoid the 60 req/hr
anonymous rate limit.

Scopes: issues | pulls | releases | commits. Or pass --search QUERY to
use the /search/issues endpoint instead of /repos/:owner/:repo/:scope.
"""

from __future__ import annotations

import argparse
import json
import os
import re
import sys
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent))
from _watermark import Watermark, format_items_as_markdown  # type: ignore


VALID_SCOPES = ("issues", "pulls", "releases", "commits")


def _flatten_commit(item):
    """Commit objects nest title/author/date under 'commit' — flatten for rendering."""
    commit = item.get("commit") or {}
    msg = (commit.get("message") or "").strip().splitlines()
    title = msg[0] if msg else ""
    body = "\n".join(msg[1:]).strip() if len(msg) > 1 else ""
    author = (item.get("author") or {}).get("login") or (commit.get("author") or {}).get("name", "")
    date = (commit.get("author") or {}).get("date", "")
    return {
        "id": item.get("sha", ""),
        "title": f"{title} ({author})" if author else title,
        "url": item.get("html_url"),
        "body": body,
        "created_at": date,
    }


def _flatten_issue_or_release(item):
    return {
        "id": str(item.get("id", "")),
        "title": item.get("title") or item.get("name") or "",
        "url": item.get("html_url") or item.get("url"),
        "body": (item.get("body") or "").strip(),
        "state": item.get("state"),
        "author": (item.get("user") or {}).get("login")
        or (item.get("author") or {}).get("login"),
        "created_at": item.get("created_at"),
    }


def main() -> int:
    p = argparse.ArgumentParser(description="Watch GitHub issues / pulls / releases / commits.")
    p.add_argument("--name", required=True, help="Watcher name (used for state file)")
    p.add_argument("--repo", default="",
                   help="owner/name of the repo (one of --repo or --search is required)")
    p.add_argument("--scope", default="issues", choices=VALID_SCOPES,
                   help="What to poll (default: issues)")
    p.add_argument("--search", default="",
                   help="GitHub issues search query (alternative to --repo/--scope)")
    p.add_argument("--per-page", type=int, default=30,
                   help="Results per page (default: 30, max: 100)")
    p.add_argument("--max", type=int, default=20,
                   help="Max new items to emit per tick (default: 20)")
    p.add_argument("--with-body", action="store_true",
                   help="Include issue/commit body as a snippet under each item")
    p.add_argument("--timeout", type=float, default=30.0,
                   help="HTTP timeout in seconds (default: 30)")
    args = p.parse_args()

    if not args.repo and not args.search:
        print("watch_github: one of --repo or --search is required", file=sys.stderr)
        return 2
    if args.repo and not re.fullmatch(r"[A-Za-z0-9._-]+/[A-Za-z0-9._-]+", args.repo):
        print(f"watch_github: --repo must be owner/name (got {args.repo!r})", file=sys.stderr)
        return 2

    # URL + flattening strategy.
    if args.search:
        url = (
            "https://api.github.com/search/issues"
            f"?q={urllib.parse.quote(args.search)}&per_page={args.per_page}"
        )
        flatten = _flatten_issue_or_release
        items_path = "items"
    elif args.scope == "commits":
        url = f"https://api.github.com/repos/{args.repo}/commits?per_page={args.per_page}"
        flatten = _flatten_commit
        items_path = ""
    else:
        url = (
            f"https://api.github.com/repos/{args.repo}/{args.scope}"
            f"?per_page={args.per_page}&state=all"
        )
        flatten = _flatten_issue_or_release
        items_path = ""

    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "Hermes-Watcher/1.0",
    }
    token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"

    req = urllib.request.Request(url)
    for k, v in headers.items():
        req.add_header(k, v)

    try:
        with urllib.request.urlopen(req, timeout=args.timeout) as resp:
            raw = resp.read()
    except urllib.error.HTTPError as e:
        print(f"watch_github: HTTP {e.code} from {url}", file=sys.stderr)
        return 2
    except (urllib.error.URLError, TimeoutError, OSError) as e:
        print(f"watch_github: network error: {e}", file=sys.stderr)
        return 2

    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as e:
        print(f"watch_github: response is not valid JSON: {e}", file=sys.stderr)
        return 2

    # Drill into items_path if needed (search endpoint returns {"items": [...]}).
    if items_path:
        data = data.get(items_path) if isinstance(data, dict) else None
    if not isinstance(data, list):
        print(f"watch_github: expected a list of items; got {type(data).__name__}",
              file=sys.stderr)
        return 2

    items = [flatten(i) for i in data if isinstance(i, dict)]
    # Drop any items that flattened without an ID (defensive).
    items = [i for i in items if i.get("id")]

    wm = Watermark.load(args.name)
    new_items = wm.filter_new(items, id_key="id")
    wm.save()

    if args.max > 0:
        new_items = new_items[: args.max]

    body_key = "body" if args.with_body else None
    output = format_items_as_markdown(new_items, body_key=body_key)
    if output:
        sys.stdout.write(output)
    return 0


if __name__ == "__main__":
    sys.exit(main())
131	optional-skills/devops/watchers/scripts/watch_http_json.py	Executable file
@@ -0,0 +1,131 @@
#!/usr/bin/env python3
"""Watch any JSON endpoint that returns a list of objects; dedup by ID field.

Usage (via cron with --no-agent):

    hermes cron create api-events \\
        --schedule "*/1 * * * *" --no-agent \\
        --script "$HERMES_HOME/skills/devops/watchers/scripts/watch_http_json.py" \\
        --script-args "--name api --url https://api.example.com/events \\
                       --id-field event_id --items-path data.events"

The response can be:
- a top-level JSON list (default), or
- a JSON object with a dotted ``--items-path`` pointing to the list.

Each item is deduped by ``--id-field`` (default "id").

Optional ``--header KEY:VALUE`` flags pass HTTP headers (repeatable).
"""

from __future__ import annotations

import argparse
import json
import sys
import urllib.error
import urllib.request
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent))
from _watermark import Watermark, format_items_as_markdown  # type: ignore


def _dig(obj, path: str):
    """Dotted-path lookup: _dig({'a': {'b': [1, 2]}}, 'a.b') → [1, 2]."""
    if not path:
        return obj
    cur = obj
    for part in path.split("."):
        if isinstance(cur, dict) and part in cur:
            cur = cur[part]
        else:
            return None
    return cur


def _parse_header(s: str):
    if ":" not in s:
        raise argparse.ArgumentTypeError(
            f"--header expects 'KEY: VALUE' (got {s!r})"
        )
    k, v = s.split(":", 1)
    return (k.strip(), v.strip())


def main() -> int:
    p = argparse.ArgumentParser(description="Poll a JSON endpoint.")
    p.add_argument("--name", required=True, help="Watcher name (used for state file)")
    p.add_argument("--url", required=True, help="JSON endpoint URL")
    p.add_argument("--id-field", default="id",
                   help="Field used to dedup items (default: 'id')")
    p.add_argument("--items-path", default="",
                   help="Dotted path to the list inside the JSON response (e.g. 'data.events')")
    p.add_argument("--title-field", default="title",
                   help="Field used as the item title in the rendered output (default: 'title')")
    p.add_argument("--url-field", default="url",
                   help="Field used as the item URL in the rendered output (default: 'url')")
    p.add_argument("--body-field", default="",
                   help="Optional body field to include as a snippet under each item")
    p.add_argument("--max", type=int, default=20,
                   help="Max new items to emit per tick (default: 20)")
    p.add_argument("--header", action="append", type=_parse_header, default=[],
                   metavar="KEY: VALUE",
                   help="HTTP header (repeatable)")
    p.add_argument("--timeout", type=float, default=20.0,
                   help="HTTP timeout in seconds (default: 20)")
    args = p.parse_args()

    req = urllib.request.Request(args.url, headers={"User-Agent": "Hermes-Watcher/1.0"})
    for k, v in args.header:
        req.add_header(k, v)

    try:
        with urllib.request.urlopen(req, timeout=args.timeout) as resp:
            raw = resp.read()
    except urllib.error.HTTPError as e:
        print(f"watch_http_json: HTTP {e.code} from {args.url}", file=sys.stderr)
        return 2
    except (urllib.error.URLError, TimeoutError, OSError) as e:
        print(f"watch_http_json: network error: {e}", file=sys.stderr)
        return 2

    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as e:
        print(f"watch_http_json: response is not valid JSON: {e}", file=sys.stderr)
        return 2

    items = _dig(data, args.items_path) if args.items_path else data
    if not isinstance(items, list):
        print(
            f"watch_http_json: items_path={args.items_path!r} did not resolve to a list "
            f"(got {type(items).__name__})",
            file=sys.stderr,
        )
        return 2

    # Keep only dicts — skip any bare strings / numbers so filter_new doesn't crash.
    items = [i for i in items if isinstance(i, dict)]

    wm = Watermark.load(args.name)
    new_items = wm.filter_new(items, id_key=args.id_field)
    wm.save()

    if args.max > 0:
        new_items = new_items[: args.max]

    body_key = args.body_field or None
    output = format_items_as_markdown(
        new_items,
        title_key=args.title_field,
        url_key=args.url_field,
        body_key=body_key,
    )
    if output:
        sys.stdout.write(output)
    return 0


if __name__ == "__main__":
    sys.exit(main())
121	optional-skills/devops/watchers/scripts/watch_rss.py	Executable file
@@ -0,0 +1,121 @@
#!/usr/bin/env python3
"""Watch an RSS 2.0 or Atom feed; print new items to stdout, silent on empty.

Usage (via cron with --no-agent):

    hermes cron create my-feed \\
        --schedule "*/15 * * * *" --no-agent \\
        --script "$HERMES_HOME/skills/devops/watchers/scripts/watch_rss.py" \\
        --script-args "--name hn --url https://news.ycombinator.com/rss"

First run records a baseline (emits nothing). Subsequent runs emit only
items whose <guid> / <id> isn't in the watermark.
"""

from __future__ import annotations

import argparse
import sys
import urllib.error
import urllib.request
from pathlib import Path
from xml.etree import ElementTree as ET

sys.path.insert(0, str(Path(__file__).parent))
from _watermark import Watermark, format_items_as_markdown  # type: ignore


def _strip_ns(tag: str) -> str:
    return tag.split("}", 1)[1] if "}" in tag else tag


def _parse_feed(xml_bytes: bytes):
    """Return a list of {id, title, url, summary} dicts.

    Handles both RSS 2.0 ``<item>`` and Atom ``<entry>``.
    """
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as e:
        print(f"watch_rss: invalid XML: {e}", file=sys.stderr)
        sys.exit(2)

    entries = []
    for item in root.iter():
        tag = _strip_ns(item.tag)
        if tag not in ("item", "entry"):
            continue
        # ElementTree Elements without children are *falsy* — use `is not None`.
        children = {_strip_ns(c.tag): c for c in item}

        guid_el = children.get("guid")
        if guid_el is None:
            guid_el = children.get("id")
        link_el = children.get("link")
        if link_el is not None:
            href = link_el.attrib.get("href") or (link_el.text or "").strip()
        else:
            href = ""
        guid = (guid_el.text or "").strip() if guid_el is not None else ""
        guid = guid or href
        if not guid:
            continue

        title_el = children.get("title")
        title = (title_el.text or "").strip() if title_el is not None else ""

        summ_el = children.get("description")
        if summ_el is None:
            summ_el = children.get("summary")
        summary = (summ_el.text or "").strip() if summ_el is not None else ""

        entries.append(
            {"id": guid, "title": title, "url": href, "summary": summary}
        )
    return entries


def main() -> int:
    p = argparse.ArgumentParser(description="Watch an RSS/Atom feed.")
    p.add_argument("--name", required=True, help="Watcher name (used for state file)")
    p.add_argument("--url", required=True, help="Feed URL")
    p.add_argument("--max", type=int, default=10,
                   help="Max new items to emit per tick (default: 10)")
    p.add_argument("--with-summary", action="store_true",
                   help="Include <description>/<summary> snippet under each item")
    p.add_argument("--timeout", type=float, default=20.0,
                   help="HTTP timeout in seconds (default: 20)")
    args = p.parse_args()

    try:
        req = urllib.request.Request(args.url, headers={"User-Agent": "Hermes-Watcher/1.0"})
        with urllib.request.urlopen(req, timeout=args.timeout) as resp:
            xml_bytes = resp.read()
    except urllib.error.HTTPError as e:
        print(f"watch_rss: HTTP {e.code} from {args.url}", file=sys.stderr)
        return 2
    except (urllib.error.URLError, TimeoutError, OSError) as e:
        print(f"watch_rss: network error: {e}", file=sys.stderr)
        return 2

    entries = _parse_feed(xml_bytes)

    wm = Watermark.load(args.name)
    new_items = wm.filter_new(entries, id_key="id")
    wm.save()

    # Cap emitted items (the watermark still records all seen IDs, so we don't
    # re-emit them next tick).
    if args.max > 0:
        new_items = new_items[: args.max]

    body_key = "summary" if args.with_summary else None
    output = format_items_as_markdown(new_items, body_key=body_key)
    if output:
        sys.stdout.write(output)
    # Empty stdout on no-new — cron treats that as silent.
    return 0


if __name__ == "__main__":
    sys.exit(main())