feat(skills-hub): health checks, freshness badge, and a watchdog cron (#32345)

Layered safety so the Skills Hub at /docs/skills stays in sync without
silent rot. Three pieces:

1. build_skills_index.py — refuses to ship a degenerate index.
   EXPECTED_FLOORS per source (skills.sh ≥100, lobehub ≥100, clawhub ≥50,
   official ≥50, github ≥30, browse-sh ≥50) and MIN_TOTAL=1500. Any source
   falling below its floor, including a collapse to zero (the silent OpenAI
   breakage that hid for weeks), now fails the workflow loudly, so a broken
   index never reaches the live site.

2. extract-skills.py + the React page — visible freshness signal.
   Sidecar website/src/data/skills-meta.json carries the index's
   generated_at timestamp, plus per-source counts. Skills Hub renders a
   'Catalog refreshed N hours ago · auto-rebuilt twice daily' line under
   the hero copy. If the cron stalls, users see the staleness immediately.

3. .github/workflows/skills-index-freshness.yml — watchdog cron.
   Every 4 hours, fetches the live /docs/api/skills-index.json, validates
   shape, checks age (>26h is stale), checks the same per-source floors,
   and opens (or appends to) a GitHub issue when anything is off. The
   issue is title-prefixed [skills-index-watchdog] so subsequent failures
   append a comment instead of spamming new issues.
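
A minimal sketch of the watchdog's age check and its open-or-append
decision, written as pure functions so no GitHub API call is involved. The
26h limit and the title prefix come from the description above; the function
names, the ISO-8601 timestamp format, and the treatment of naive timestamps
as UTC are assumptions for illustration, not the workflow's actual code.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=26)             # from the commit message: >26h is stale
TITLE_PREFIX = "[skills-index-watchdog]"  # from the commit message


def staleness_problem(generated_at, now):
    """Return a problem string if the index is older than MAX_AGE, else None.

    Hypothetical helper: assumes generated_at is an ISO-8601 string.
    """
    built = datetime.fromisoformat(generated_at.replace("Z", "+00:00"))
    if built.tzinfo is None:
        built = built.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    age = now - built
    if age > MAX_AGE:
        hours = age.total_seconds() / 3600
        return f"index is stale: {hours:.1f}h old (limit 26h)"
    return None


def plan_report(problems, open_issue_titles):
    """Decide what to do with the findings: 'none', 'comment', or 'open'.

    Hypothetical helper: an already-open issue carrying the watchdog prefix
    gets a comment, so repeated failures do not create new issues.
    """
    if not problems:
        return "none"
    if any(title.startswith(TITLE_PREFIX) for title in open_issue_titles):
        return "comment"
    return "open"
```

Keeping the decision separate from the API call means the dedup rule can be
unit-tested without a token or network access.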

Net effect:
- A silent regression like 'OpenAI tap moved its skills' now fails the
  build instead of shipping a quietly broken catalog.
- A stuck cron (like the landingpage breakage that ran red for weeks) now
  files an issue within 4 hours.
- Users see how fresh the catalog is on the page itself.
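
For illustration, the freshness line could be derived from generated_at
roughly as follows. The real rendering lives in the React page; this is a
Python sketch of the same arithmetic, and the function name, the
minutes-versus-hours cutoff, and the pluralization are assumptions.

```python
from datetime import datetime, timezone


def freshness_label(generated_at, now=None):
    """Build the 'Catalog refreshed N ... ago' line (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    built = datetime.fromisoformat(generated_at.replace("Z", "+00:00"))
    if built.tzinfo is None:
        built = built.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    # Clamp at zero so minor clock skew never yields a negative age.
    minutes = max(0, int((now - built).total_seconds() // 60))
    if minutes < 60:
        n, unit = minutes, "minute"
    else:
        n, unit = minutes // 60, "hour"
    plural = "" if n == 1 else "s"
    return f"Catalog refreshed {n} {unit}{plural} ago · auto-rebuilt twice daily"
```

Because the label counts hours without a cap, a stalled cron shows up as an
ever-growing number rather than a reassuring default.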

Test plan:
- Local: built skills-meta.json from the live index → 'Catalog refreshed
  N minutes ago' rendered correctly in the static HTML.
- Probe logic dry-run against the live index: total=2456, all 6 sources
  above floor, age 0.1h — issues=NONE.
- Triggered skills-index.yml manually; both jobs green, deploy-site.yml
  dispatch fired.
Teknium 2026-05-25 23:10:45 -07:00 committed by GitHub
parent cea87d9139
commit d8703e27f5
5 changed files with 273 additions and 8 deletions


@@ -322,6 +322,50 @@ def main():
        extra = f" ({resolved} resolved)" if resolved else ""
        print(f" {src}: {count}{extra}")

    # Health check: catch silent breakage early. Every source listed below
    # has historically returned at least `floor` entries; a zero (or near-
    # zero) result almost certainly means a tap path moved, an API changed,
    # or rate limiting kicked in. Failing here forces a human look before
    # the broken index reaches the live docs.
    EXPECTED_FLOORS = {
        "skills.sh": 100,
        "lobehub": 100,
        "clawhub": 50,
        "official": 50,
        "github": 30,  # collapsed across all GitHub taps
        "browse-sh": 50,
    }
    health_errors = []
    for src, floor in EXPECTED_FLOORS.items():
        # 'skills-sh' and 'skills.sh' are the same source; both labels exist.
        count = by_source.get(src, 0)
        if src == "skills.sh":
            count = by_source.get("skills.sh", 0) + by_source.get("skills-sh", 0)
        if count < floor:
            health_errors.append(f" {src}: {count} < expected floor {floor}")
    MIN_TOTAL = 1500
    if len(deduped) < MIN_TOTAL:
        health_errors.append(
            f" total: {len(deduped)} < expected floor {MIN_TOTAL}"
        )
    if health_errors:
        print(
            "\nERROR: skills index health check failed — refusing to ship "
            "a degenerate index. Investigate the following sources:",
            file=sys.stderr,
        )
        for line in health_errors:
            print(line, file=sys.stderr)
        print(
            "\nIf the drop is expected (e.g. a hub is genuinely shutting "
            "down), lower the floor in scripts/build_skills_index.py "
            "EXPECTED_FLOORS in the same PR.",
            file=sys.stderr,
        )
        sys.exit(2)


if __name__ == "__main__":
    main()
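
The check in this hunk runs inline in main(), which makes it awkward to
exercise without building a real index. One possible refactor, shown only as
a sketch, pulls the same logic into a pure function that returns the error
list. The name check_health and its signature are hypothetical and are not
part of this commit; the floors and MIN_TOTAL are copied from the hunk.

```python
EXPECTED_FLOORS = {
    "skills.sh": 100,
    "lobehub": 100,
    "clawhub": 50,
    "official": 50,
    "github": 30,
    "browse-sh": 50,
}
MIN_TOTAL = 1500


def check_health(by_source, total):
    """Return health-check error lines; an empty list means the index is OK.

    Hypothetical helper: by_source maps source label to entry count and
    total is the number of deduplicated entries.
    """
    errors = []
    for src, floor in EXPECTED_FLOORS.items():
        count = by_source.get(src, 0)
        if src == "skills.sh":
            # Both labels refer to the same source, so their counts are summed.
            count += by_source.get("skills-sh", 0)
        if count < floor:
            errors.append(f"{src}: {count} < expected floor {floor}")
    if total < MIN_TOTAL:
        errors.append(f"total: {total} < expected floor {MIN_TOTAL}")
    return errors
```

With this shape, main() would only print the returned lines to stderr and
call sys.exit(2) when the list is non-empty, and a unit test can feed in a
by_source dict with one source zeroed out.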