perf(skills-page): lazy-fetch the catalog instead of bundling 34MB into JS (#33809)

PR #33748 grew the live skills index from ~2k skills to ~69k, which made
the previous build-time bundling strategy untenable: the skills page's
JS chunk was about to balloon from ~1MB to ~35MB.  Initial page load
on mobile became unusable, search lagged on every keystroke against the
68k-item array, and JSON.parse blocked the main thread at startup.

Three changes:

1. extract-skills.py writes skills.json + skills-meta.json into
   website/static/api/ instead of website/src/data/.  Static-served by
   Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that
   already serves skills-index.json.

2. skills/index.tsx drops the static import and fetches both files in
   parallel on mount.  Loading state shows '…' for the count; failures
   surface a small error pill instead of blanking the page.

3. Search is debounced 150ms and runs against a precomputed lowercase
   haystack stamped onto each row at load time.  Before: array-join +
   toLowerCase per row per keystroke on a 68k array.  After: single
   .includes() per row, deferred until typing settles.

Validation:

| | before | after |
|---|---|---|
| skills.json location | src/data/ (bundled) | static/api/ (CDN) |
| Largest JS chunk | would be ~35MB at 68k skills | 659 KB |
| Initial page render | wait for full parse | immediate, fetch async |
| Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows |
| Debounce | none | 150ms |

Built locally for both en and zh-Hans locales; the 34MB skills.json now
lives in build/api/ and is served separately rather than inlined into
the page's bundle.

skills.json and skills-meta.json added to .gitignore — they were already
build artifacts, but the gitignore only listed skills-index.json before.
This commit is contained in:
Teknium 2026-05-28 03:41:43 -07:00 committed by GitHub
parent 6f9182cb34
commit a1eaad2fc0
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
4 changed files with 125 additions and 33 deletions

View file

@ -1,7 +1,8 @@
#!/usr/bin/env node
// Runs website/scripts/extract-skills.py and generate-llms-txt.py before
// docusaurus build/start so that:
// - website/src/data/skills.json (imported by src/pages/skills/index.tsx)
// - website/static/api/skills.json (lazy-fetched by src/pages/skills/index.tsx)
// - website/static/api/skills-meta.json (sidecar metadata for the Skills Hub)
// - website/static/llms.txt (agent-friendly short docs index)
// - website/static/llms-full.txt (full docs concat for LLM context)
// all exist without contributors remembering to run Python scripts manually.
@ -30,7 +31,7 @@ const scriptDir = dirname(fileURLToPath(import.meta.url));
const websiteDir = resolve(scriptDir, "..");
const extractScript = join(scriptDir, "extract-skills.py");
const llmsScript = join(scriptDir, "generate-llms-txt.py");
const outputFile = join(websiteDir, "src", "data", "skills.json");
const outputFile = join(websiteDir, "static", "api", "skills.json");
const unifiedIndexFile = join(websiteDir, "static", "api", "skills-index.json");
const UNIFIED_INDEX_URL =
"https://hermes-agent.nousresearch.com/docs/api/skills-index.json";