hermes-agent/scripts
Teknium fb9f3a4ef9
fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748)
* fix(skills): pull full ClawHub catalog into the skills index

The website was showing 200 ClawHub skills out of 20k+ because
`ClawHubSource.search("")` for empty queries went straight to a single
unpaginated request. ClawHub's API caps any single page at 200 items and
returns a `nextCursor`; we grabbed page 1 and stopped, so the cached
index served from hermes-agent.nousresearch.com had a silent 99%
truncation.

End users never hit clawhub.ai directly (the index is rebuilt twice
daily by .github/workflows/skills-index.yml and served as a static JSON
on the docs site), so the cap-and-cache architecture is correct — it
just wasn't being filled.

Changes:
- `ClawHubSource.search(query="")` now routes through the existing
  `_load_catalog_index()` paginating walker instead of the unpaginated
  listing fallback (non-empty queries still hit the fast catalog search).
- `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live
  catalog is ~20k as of May 2026, with headroom for growth).
- `build_skills_index.py`: per-source crawl limits split out — ClawHub
  and LobeHub get 100k, others keep their effective caps.
- `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination
  regression hard-fails the CI build instead of silently shipping a
  degenerate index.

Test plan:
- New unit test `test_search_empty_query_paginates_full_catalog`
  exercises the cursor-following path with three mocked pages (450
  total items) and asserts all pages are walked.
- Existing 9 ClawHub tests + 127 broader skills_hub tests all pass.
- E2E against live ClawHub API: walker reached 9700+ skills across 49
  pages before this commit landed, paginating well past the previous
  50-page cap.

* fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k

E2E walk against live ClawHub API hit my initial 250-page cap at 49,698
skills with cursor=yes still pending. The catalog is roughly 2.5x larger
than the docstring estimate.

- max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None
  well before this in practice)
- SOURCE_LIMITS['clawhub'] 100k → 200k
- EXPECTED_FLOORS['clawhub'] 5000 → 20000
2026-05-28 01:42:19 -07:00
..
lib feat: lazy bootstrap node 2026-04-16 10:47:37 -05:00
tests fix(install.ps1): trim completion banner + strip em-dash in test 2026-05-16 22:55:12 -07:00
whatsapp-bridge chore(deps): bump protobufjs in /scripts/whatsapp-bridge (#28889) 2026-05-20 15:25:32 -04:00
benchmark_browser_eval.py perf(browser): route browser_console eval through supervisor's persistent CDP WS (180x faster) (#23226) 2026-05-10 07:37:55 -07:00
build_model_catalog.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
build_skills_index.py fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748) 2026-05-28 01:42:19 -07:00
check-windows-footguns.py fix(scripts): fix UnicodeEncodeError in footgun checker on Windows 2026-05-16 23:05:27 -07:00
contributor_audit.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
discord-voice-doctor.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
hermes-gateway fix: prevent systemd restart storm on gateway connection failure 2026-03-21 09:26:39 -07:00
install.cmd docs(windows): avoid piping installer directly into iex 2026-05-18 20:05:47 -07:00
install.ps1 fix(install.ps1): pin PortableGit instead of hitting rate-limited GitHub API (#28943) 2026-05-19 14:38:34 -07:00
install.sh fix(install): set world-readable uv python dirs for root FHS layout 2026-05-27 13:55:51 -07:00
install_psutil_android.py fix(install): also patch psutil on Termux fresh-install path 2026-05-09 17:53:15 -07:00
keystroke_diagnostic.py docs: add Windows-Specific Quirks section to hermes-agent skill + keystroke diagnostic 2026-05-08 14:27:40 -07:00
kill_modal.sh refactor: replace swe-rex with native Modal SDK for Modal backend (#3538) 2026-03-28 11:21:44 -07:00
lint_diff.py feat(ci): add typecheck (warnings only in CI) 2026-05-06 10:58:12 -04:00
profile-tui.py Merge remote-tracking branch 'origin/main' into fix/bundle-size 2026-05-11 16:01:04 -04:00
release.py chore(release): map MoonRay305 contributor email for #32759 salvage 2026-05-27 23:28:51 -07:00
run_tests.sh test: use subprocesses for each test file (#29016) 2026-05-21 16:40:04 +05:30
run_tests_parallel.py ci(docker): run tests/docker/ in build-amd64 against the freshly-built image 2026-05-25 12:40:57 +10:00
sample_and_compress.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
setup_open_webui.sh fix(install): use resolved python variable in setup_open_webui.sh 2026-05-16 22:54:22 -07:00