mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-29 06:31:32 +00:00
feat(skills-hub): health checks, freshness badge, and a watchdog cron (#32345)
Layered safety so the Skills Hub at /docs/skills stays in sync without silent rot. Three pieces:

1. build_skills_index.py: refuses to ship a degenerate index. EXPECTED_FLOORS per source (skills.sh ≥100, lobehub ≥100, clawhub ≥50, official ≥50, github ≥30, browse-sh ≥50) and MIN_TOTAL=1500. Any source collapsing to zero (the silent OpenAI breakage that hid for weeks) now fails the workflow loud; broken index never reaches the live site.

2. extract-skills.py + the React page: visible freshness signal. Sidecar website/src/data/skills-meta.json carries the index's generated_at timestamp, plus per-source counts. Skills Hub renders a 'Catalog refreshed N hours ago · auto-rebuilt twice daily' line under the hero copy. If the cron stalls, users see the staleness immediately.

3. .github/workflows/skills-index-freshness.yml: watchdog cron. Every 4 hours, fetches the live /docs/api/skills-index.json, validates shape, checks age (>26h is stale), checks the same per-source floors, and opens (or appends to) a GitHub issue when anything is off. The issue is title-prefixed [skills-index-watchdog] so subsequent failures append a comment instead of spamming new issues.

Net effect:
- A silent regression like 'OpenAI tap moved its skills' now fails the build instead of shipping a quietly broken catalog.
- A stuck cron (like the landingpage breakage that ran red for weeks) now files an issue within 4 hours.
- Users see how fresh the catalog is on the page itself.

Test plan:
- Local: built skills-meta.json from the live index → 'Catalog refreshed N minutes ago' rendered correctly in the static HTML.
- Probe logic dry-run against the live index: total=2456, all 6 sources above floor, age 0.1h, issues=NONE.
- Triggered skills-index.yml manually; both jobs green, deploy-site.yml dispatch fired.
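The watchdog's probe (shape, age, per-source floors, total) could look roughly like the sketch below. This is not the workflow's actual code. The function name probe_index, the "skills" list, and the per-entry "source" field are assumptions made for illustration; only generated_at, the floors, MIN_TOTAL, and the 26h threshold come from the message above. It also assumes generated_at is an ISO 8601 string with a numeric UTC offset.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Floors and thresholds as quoted in the commit message.
EXPECTED_FLOORS = {
    "skills.sh": 100, "lobehub": 100, "clawhub": 50,
    "official": 50, "github": 30, "browse-sh": 50,
}
MIN_TOTAL = 1500
MAX_AGE = timedelta(hours=26)


def probe_index(index, now):
    """Return a list of problem strings; an empty list means healthy.

    `now` must be timezone-aware. The "skills" and "source" keys are
    assumed names, not confirmed by the source.
    """
    # Shape check: bail out early if the payload is not usable at all.
    if not isinstance(index, dict) or not isinstance(index.get("skills"), list):
        return ["shape: expected an object with a 'skills' list"]

    issues = []

    # Age check: anything older than MAX_AGE counts as stale.
    try:
        generated_at = datetime.fromisoformat(index["generated_at"])
    except (KeyError, TypeError, ValueError):
        issues.append("shape: missing or unparseable 'generated_at'")
    else:
        if generated_at.tzinfo is None:
            # Assumption: naive timestamps are treated as UTC.
            generated_at = generated_at.replace(tzinfo=timezone.utc)
        age = now - generated_at
        if age > MAX_AGE:
            issues.append(f"stale: index is {age.total_seconds() / 3600:.1f}h old")

    # Floor checks: fold the 'skills-sh' label into 'skills.sh' first.
    skills = index["skills"]
    by_source = Counter(s.get("source") for s in skills if isinstance(s, dict))
    by_source["skills.sh"] += by_source.pop("skills-sh", 0)
    for src, floor in EXPECTED_FLOORS.items():
        if by_source[src] < floor:
            issues.append(f"{src}: {by_source[src]} < expected floor {floor}")

    if len(skills) < MIN_TOTAL:
        issues.append(f"total: {len(skills)} < expected floor {MIN_TOTAL}")
    return issues
```

A caller would fetch and parse the live JSON, pass it in with datetime.now(timezone.utc), and open or append to the watchdog issue only when the returned list is non-empty.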
parent cea87d9139
commit d8703e27f5
5 changed files with 273 additions and 8 deletions
@@ -322,6 +322,50 @@ def main():
         extra = f" ({resolved} resolved)" if resolved else ""
         print(f" {src}: {count}{extra}")
+
+    # Health check: catch silent breakage early. Every source listed below
+    # has historically returned at least `floor` entries; a zero (or near-
+    # zero) result almost certainly means a tap path moved, an API changed,
+    # or rate limiting kicked in. Failing here forces a human look before
+    # the broken index reaches the live docs.
+    EXPECTED_FLOORS = {
+        "skills.sh": 100,
+        "lobehub": 100,
+        "clawhub": 50,
+        "official": 50,
+        "github": 30,  # collapsed across all GitHub taps
+        "browse-sh": 50,
+    }
+    health_errors = []
+    for src, floor in EXPECTED_FLOORS.items():
+        # 'skills-sh' and 'skills.sh' are the same source; both labels exist.
+        count = by_source.get(src, 0)
+        if src == "skills.sh":
+            count = by_source.get("skills.sh", 0) + by_source.get("skills-sh", 0)
+        if count < floor:
+            health_errors.append(f" {src}: {count} < expected floor {floor}")
+
+    MIN_TOTAL = 1500
+    if len(deduped) < MIN_TOTAL:
+        health_errors.append(
+            f" total: {len(deduped)} < expected floor {MIN_TOTAL}"
+        )
+
+    if health_errors:
+        print(
+            "\nERROR: skills index health check failed — refusing to ship "
+            "a degenerate index. Investigate the following sources:",
+            file=sys.stderr,
+        )
+        for line in health_errors:
+            print(line, file=sys.stderr)
+        print(
+            "\nIf the drop is expected (e.g. a hub is genuinely shutting "
+            "down), lower the floor in scripts/build_skills_index.py "
+            "EXPECTED_FLOORS in the same PR.",
+            file=sys.stderr,
+        )
+        sys.exit(2)


 if __name__ == "__main__":
     main()
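The floor logic in the hunk above runs inline in main(), which makes it awkward to test without building a full index. One possible refactor, shown only as a sketch, pulls it into a pure function. The name check_health and its signature are hypothetical and not part of this commit; the comparisons and message wording mirror the hunk.

```python
def check_health(by_source, total, floors, min_total):
    """Return health-check error strings; an empty list means the index passes.

    by_source maps a source label to its entry count, total is the
    deduplicated entry count. Hypothetical helper, not in the commit.
    """
    errors = []
    for src, floor in floors.items():
        count = by_source.get(src, 0)
        # 'skills-sh' and 'skills.sh' label the same source, so sum both.
        if src == "skills.sh":
            count += by_source.get("skills-sh", 0)
        if count < floor:
            errors.append(f"{src}: {count} < expected floor {floor}")
    if total < min_total:
        errors.append(f"total: {total} < expected floor {min_total}")
    return errors
```

With this shape, main() would only print the returned lines to stderr and call sys.exit(2) when the list is non-empty, and a unit test could feed in a synthetic by_source mapping directly.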