docs(website): dedicated page per bundled + optional skill (#14929)

Generates a dedicated Docusaurus page for every one of the 132 skills
(73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/.
Each page carries the skill's description, metadata (version, author, license,
dependencies, platform gating, tags, related skills cross-linked to their own
pages), and the complete SKILL.md body that Hermes loads at runtime.

Previously the two catalog pages just listed skills with a one-line blurb and
no way to see what the skill actually did — users had to go read the source
repo. Now every skill has a browsable, searchable, cross-linked reference in
the docs.

- website/scripts/generate-skill-docs.py — generator that reads skills/ and
  optional-skills/, writes per-skill pages, regenerates both catalog indexes,
  and rewrites the Skills section of sidebars.ts. Handles MDX escaping
  (outside fenced code blocks: curly braces, unsafe HTML-ish tags) and
  rewrites relative references/*.md links to point at the GitHub source.
- website/docs/reference/skills-catalog.md — regenerated; each row links to
  the new dedicated page.
- website/docs/reference/optional-skills-catalog.md — same.
- website/sidebars.ts — Skills section now has Bundled / Optional subtrees
  with one nested category per skill folder.
- .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator
  before docusaurus build so CI stays in sync with the source SKILL.md files.

Build verified locally with `npx docusaurus build`. Only remaining warnings
are pre-existing broken link/anchor issues in unrelated pages.
Teknium 2026-04-23 22:22:11 -07:00 committed by GitHub
parent eb93f88e1d
commit 0f6eabb890
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
139 changed files with 43523 additions and 306 deletions

---
title: "Llm Wiki"
sidebar_label: "Llm Wiki"
description: "Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base"
---
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
# Llm Wiki
Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base. Ingest sources, query compiled knowledge, and lint for consistency.
## Skill metadata
| | |
|---|---|
| Source | Bundled (installed by default) |
| Path | `skills/research/llm-wiki` |
| Version | `2.1.0` |
| Author | Hermes Agent |
| License | MIT |
| Tags | `wiki`, `knowledge-base`, `research`, `notes`, `markdown`, `rag-alternative` |
| Related skills | [`obsidian`](/docs/user-guide/skills/bundled/note-taking/note-taking-obsidian), [`arxiv`](/docs/user-guide/skills/bundled/research/research-arxiv) |
## Reference: full SKILL.md
:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::
# Karpathy's LLM Wiki
Build and maintain a persistent, compounding knowledge base as interlinked markdown files.
Based on [Andrej Karpathy's LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).
Unlike traditional RAG (which rediscovers knowledge from scratch per query), the wiki
compiles knowledge once and keeps it current. Cross-references are already there.
Contradictions have already been flagged. Synthesis reflects everything ingested.
**Division of labor:** The human curates sources and directs analysis. The agent
summarizes, cross-references, files, and maintains consistency.
## When This Skill Activates
Use this skill when the user:
- Asks to create, build, or start a wiki or knowledge base
- Asks to ingest, add, or process a source into their wiki
- Asks a question and an existing wiki is present at the configured path
- Asks to lint, audit, or health-check their wiki
- References their wiki, knowledge base, or "notes" in a research context
## Wiki Location
**Location:** Set via `WIKI_PATH` environment variable (e.g. in `~/.hermes/.env`).
If unset, defaults to `~/wiki`.
```bash
WIKI="${WIKI_PATH:-$HOME/wiki}"
```
The wiki is just a directory of markdown files — open it in Obsidian, VS Code, or
any editor. No database, no special tooling required.
## Architecture: Three Layers
```
wiki/
├── SCHEMA.md # Conventions, structure rules, domain config
├── index.md # Sectioned content catalog with one-line summaries
├── log.md # Chronological action log (append-only, rotated yearly)
├── raw/ # Layer 1: Immutable source material
│ ├── articles/ # Web articles, clippings
│ ├── papers/ # PDFs, arxiv papers
│ ├── transcripts/ # Meeting notes, interviews
│ └── assets/ # Images, diagrams referenced by sources
├── entities/ # Layer 2: Entity pages (people, orgs, products, models)
├── concepts/ # Layer 2: Concept/topic pages
├── comparisons/ # Layer 2: Side-by-side analyses
└── queries/ # Layer 2: Filed query results worth keeping
```
**Layer 1 — Raw Sources:** Immutable. The agent reads but never modifies these.
**Layer 2 — The Wiki:** Agent-owned markdown files. Created, updated, and
cross-referenced by the agent.
**Layer 3 — The Schema:** `SCHEMA.md` defines structure, conventions, and tag taxonomy.
## Resuming an Existing Wiki (CRITICAL — do this every session)
When the user has an existing wiki, **always orient yourself before doing anything**:
1. **Read `SCHEMA.md`** — understand the domain, conventions, and tag taxonomy.
2. **Read `index.md`** — learn what pages exist and their summaries.
3. **Scan recent `log.md`** — read the last 20-30 entries to understand recent activity.
```bash
WIKI="${WIKI_PATH:-$HOME/wiki}"
# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=<last 30 lines>
```
Only after orientation should you ingest, query, or lint. This prevents:
- Creating duplicate pages for entities that already exist
- Missing cross-references to existing content
- Contradicting the schema's conventions
- Repeating work already logged
For large wikis (100+ pages), also run a quick `search_files` for the topic
at hand before creating anything new.
## Initializing a New Wiki
When the user asks to create or start a wiki:
1. Determine the wiki path (from `$WIKI_PATH` env var, or ask the user; default `~/wiki`)
2. Create the directory structure above
3. Ask the user what domain the wiki covers — be specific
4. Write `SCHEMA.md` customized to the domain (see template below)
5. Write initial `index.md` with sectioned header
6. Write initial `log.md` with creation entry
7. Confirm the wiki is ready and suggest first sources to ingest
### SCHEMA.md Template
Adapt to the user's domain. The schema constrains agent behavior and ensures consistency:
```markdown
# Wiki Schema
## Domain
[What this wiki covers — e.g., "AI/ML research", "personal health", "startup intelligence"]
## Conventions
- File names: lowercase, hyphens, no spaces (e.g., `transformer-architecture.md`)
- Every wiki page starts with YAML frontmatter (see below)
- Use `[[wikilinks]]` to link between pages (minimum 2 outbound links per page)
- When updating a page, always bump the `updated` date
- Every new page must be added to `index.md` under the correct section
- Every action must be appended to `log.md`
- **Provenance markers:** On pages that synthesize 3+ sources, append `^[raw/articles/source-file.md]`
at the end of paragraphs whose claims come from a specific source. This lets a reader trace each
claim back without re-reading the whole raw file. Optional on single-source pages where the
`sources:` frontmatter is enough.
## Frontmatter
```yaml
---
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: entity | concept | comparison | query | summary
tags: [from taxonomy below]
sources: [raw/articles/source-name.md]
# Optional quality signals:
confidence: high | medium | low # how well-supported the claims are
contested: true # set when the page has unresolved contradictions
contradictions: [other-page-slug] # pages this one conflicts with
---
```
`confidence` and `contested` are optional but recommended for opinion-heavy or fast-moving
topics. Lint surfaces `contested: true` and `confidence: low` pages for review so weak claims
don't silently harden into accepted wiki fact.
### raw/ Frontmatter
Raw sources ALSO get a small frontmatter block so re-ingests can detect drift:
```yaml
---
source_url: https://example.com/article # original URL, if applicable
ingested: YYYY-MM-DD
sha256: <hex digest of the raw content below the frontmatter>
---
```
The `sha256:` lets a future re-ingest of the same URL skip processing when content is unchanged,
and flag drift when it has changed. Compute over the body only (everything after the closing
`---`), not the frontmatter itself.
## Tag Taxonomy
[Define 10-20 top-level tags for the domain. Add new tags here BEFORE using them.]
Example for AI/ML:
- Models: model, architecture, benchmark, training
- People/Orgs: person, company, lab, open-source
- Techniques: optimization, fine-tuning, inference, alignment, data
- Meta: comparison, timeline, controversy, prediction
Rule: every tag on a page must appear in this taxonomy. If a new tag is needed,
add it here first, then use it. This prevents tag sprawl.
## Page Thresholds
- **Create a page** when an entity/concept appears in 2+ sources OR is central to one source
- **Add to existing page** when a source mentions something already covered
- **DON'T create a page** for passing mentions, minor details, or things outside the domain
- **Split a page** when it exceeds ~200 lines — break into sub-topics with cross-links
- **Archive a page** when its content is fully superseded — move to `_archive/`, remove from index
## Entity Pages
One page per notable entity. Include:
- Overview / what it is
- Key facts and dates
- Relationships to other entities ([[wikilinks]])
- Source references
## Concept Pages
One page per concept or topic. Include:
- Definition / explanation
- Current state of knowledge
- Open questions or debates
- Related concepts ([[wikilinks]])
## Comparison Pages
Side-by-side analyses. Include:
- What is being compared and why
- Dimensions of comparison (table format preferred)
- Verdict or synthesis
- Sources
## Update Policy
When new information conflicts with existing content:
1. Check the dates — newer sources generally supersede older ones
2. If genuinely contradictory, note both positions with dates and sources
3. Mark the contradiction in frontmatter: `contradictions: [page-name]`
4. Flag for user review in the lint report
```
### index.md Template
The index is sectioned by type. Each entry is one line: wikilink + summary.
```markdown
# Wiki Index
> Content catalog. Every wiki page listed under its type with a one-line summary.
> Read this first to find relevant pages for any query.
> Last updated: YYYY-MM-DD | Total pages: N
## Entities
<!-- Alphabetical within section -->
## Concepts
## Comparisons
## Queries
```
**Scaling rule:** When any section exceeds 50 entries, split it into sub-sections
by first letter or sub-domain. When the index exceeds 200 entries total, create
a `_meta/topic-map.md` that groups pages by theme for faster navigation.
### log.md Template
```markdown
# Wiki Log
> Chronological record of all wiki actions. Append-only.
> Format: `## [YYYY-MM-DD] action | subject`
> Actions: ingest, update, query, lint, create, archive, delete
> When this file exceeds 500 entries, rotate: rename to log-YYYY.md, start fresh.
## [YYYY-MM-DD] create | Wiki initialized
- Domain: [domain]
- Structure created with SCHEMA.md, index.md, log.md
```
## Core Operations
### 1. Ingest
When the user provides a source (URL, file, paste), integrate it into the wiki:
① **Capture the raw source:**
- URL → use `web_extract` to get markdown, save to `raw/articles/`
- PDF → use `web_extract` (handles PDFs), save to `raw/papers/`
- Pasted text → save to appropriate `raw/` subdirectory
- Name the file descriptively: `raw/articles/karpathy-llm-wiki-2026.md`
- **Add raw frontmatter** (`source_url`, `ingested`, `sha256` of the body).
On re-ingest of the same URL: recompute the sha256, compare to the stored value —
skip if identical, flag drift and update if different. This is cheap enough to
do on every re-ingest and catches silent source changes.
② **Discuss takeaways** with the user — what's interesting, what matters for
the domain. (Skip this in automated/cron contexts — proceed directly.)
③ **Check what already exists** — search index.md and use `search_files` to find
existing pages for mentioned entities/concepts. This is the difference between
a growing wiki and a pile of duplicates.
④ **Write or update wiki pages:**
- **New entities/concepts:** Create pages only if they meet the Page Thresholds
in SCHEMA.md (2+ source mentions, or central to one source)
- **Existing pages:** Add new information, update facts, bump `updated` date.
When new info contradicts existing content, follow the Update Policy.
- **Cross-reference:** Every new or updated page must link to at least 2 other
pages via `[[wikilinks]]`. Check that existing pages link back.
- **Tags:** Only use tags from the taxonomy in SCHEMA.md
- **Provenance:** On pages synthesizing 3+ sources, append `^[raw/articles/source.md]`
markers to paragraphs whose claims trace to a specific source.
- **Confidence:** For opinion-heavy, fast-moving, or single-source claims, set
`confidence: medium` or `low` in frontmatter. Don't mark `high` unless the
claim is well-supported across multiple sources.
⑤ **Update navigation:**
- Add new pages to `index.md` under the correct section, alphabetically
- Update the "Total pages" count and "Last updated" date in index header
- Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Title`
- List every file created or updated in the log entry
⑥ **Report what changed** — list every file created or updated to the user.
A single source can trigger updates across 5-15 wiki pages. This is normal
and desired — it's the compounding effect.
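The hash bookkeeping in step ① can be sketched in a few lines. This is a minimal sketch, not the skill's prescribed implementation; `body_sha256` and `drifted` are hypothetical helper names, assuming the `---`-delimited raw frontmatter shown earlier:

```python
import hashlib

def body_sha256(raw_markdown: str) -> str:
    """Hash everything below the closing --- of the raw frontmatter."""
    parts = raw_markdown.split("---\n", 2)
    # If there is no frontmatter yet, hash the whole document
    body = parts[2] if len(parts) == 3 else raw_markdown
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def drifted(fetched_markdown: str, stored_sha256: str) -> bool:
    """True when re-fetched content no longer matches the stored hash."""
    return body_sha256(fetched_markdown) != stored_sha256
```

On re-ingest, compute `body_sha256` over the freshly fetched content and compare it to the stored `sha256:` field; skip processing on a match, flag drift otherwise.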
### 2. Query
When the user asks a question about the wiki's domain:
1. **Read `index.md`** to identify relevant pages. For wikis with 100+ pages, also
`search_files` across all `.md` files for key terms — the index alone may miss
relevant content.
2. **Read the relevant pages** using `read_file`.
3. **Synthesize an answer** from the compiled knowledge. Cite the wiki pages
you drew from: "Based on [[page-a]] and [[page-b]]..."
4. **File valuable answers back** — if the answer is a substantial comparison,
deep dive, or novel synthesis, create a page in `queries/` or `comparisons/`.
Don't file trivial lookups — only answers that would be painful to re-derive.
5. **Update `log.md`** with the query and whether it was filed.
### 3. Lint
When the user asks to lint, health-check, or audit the wiki:
**Orphan pages:** Find pages with no inbound `[[wikilinks]]` from other pages.
```python
# Use execute_code for this — programmatic scan across all wiki pages
import os, re
from collections import defaultdict

wiki = os.environ.get("WIKI_PATH", os.path.expanduser("~/wiki"))
inbound, pages = defaultdict(set), set()
# Scan all .md files in entities/, concepts/, comparisons/, queries/
for sec in ("entities", "concepts", "comparisons", "queries"):
    for root, _dirs, files in os.walk(os.path.join(wiki, sec)):
        for name in (n for n in files if n.endswith(".md")):
            pages.add(name[:-3])
            text = open(os.path.join(root, name), encoding="utf-8").read()
            # Extract all [[wikilinks]] to build the inbound link map
            for link in re.findall(r"\[\[([^\]|#]+)", text):
                inbound[link.strip()].add(name[:-3])
# Pages with zero inbound links from other pages are orphans
print(sorted(p for p in pages if not (inbound[p] - {p})))
```
**Broken wikilinks:** Find `[[links]]` that point to pages that don't exist.
**Index completeness:** Every wiki page should appear in `index.md`. Compare
the filesystem against index entries.
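A minimal sketch of this comparison. The `index_gaps` helper is hypothetical; it assumes index entries reference pages via `[[wikilinks]]`, as the index template prescribes:

```python
import os, re

def index_gaps(wiki: str) -> set:
    """Wiki pages on disk that index.md never mentions."""
    with open(os.path.join(wiki, "index.md"), encoding="utf-8") as fh:
        indexed = {m.strip() for m in re.findall(r"\[\[([^\]|#]+)", fh.read())}
    on_disk = set()
    for sec in ("entities", "concepts", "comparisons", "queries"):
        for _root, _dirs, files in os.walk(os.path.join(wiki, sec)):
            on_disk.update(f[:-3] for f in files if f.endswith(".md"))
    # Pages present on disk but absent from the index
    return on_disk - indexed
```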
**Frontmatter validation:** Every wiki page must have all required fields
(title, created, updated, type, tags, sources). Tags must be in the taxonomy.
**Stale content:** Pages whose `updated` date is >90 days older than the most
recent source that mentions the same entities.
**Contradictions:** Pages on the same topic with conflicting claims. Look for
pages that share tags/entities but state different facts. Surface all pages
with `contested: true` or `contradictions:` frontmatter for user review.
**Quality signals:** List pages with `confidence: low` and any page that cites
only a single source but has no confidence field set — these are candidates
for either finding corroboration or demoting to `confidence: medium`.
**Source drift:** For each file in `raw/` with a `sha256:` frontmatter, recompute
the hash and flag mismatches. Mismatches indicate the raw file was edited
(shouldn't happen — raw/ is immutable) or ingested from a URL that has since
changed. Not a hard error, but worth reporting.
**Page size:** Flag pages over 200 lines — candidates for splitting.
**Tag audit:** List all tags in use, flag any not in the SCHEMA.md taxonomy.
**Log rotation:** If log.md exceeds 500 entries, rotate it.
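A minimal rotation check, as a sketch; `rotate_log_if_needed` is a hypothetical helper that assumes log entries use the `## [YYYY-MM-DD]` heading format from the template:

```python
import datetime, os

def rotate_log_if_needed(wiki: str, max_entries: int = 500) -> bool:
    """Rename log.md to log-YYYY.md and start a fresh log once past the threshold."""
    log_path = os.path.join(wiki, "log.md")
    with open(log_path, encoding="utf-8") as fh:
        entries = sum(1 for line in fh if line.startswith("## ["))
    if entries <= max_entries:
        return False
    year = datetime.date.today().year
    os.replace(log_path, os.path.join(wiki, f"log-{year}.md"))
    with open(log_path, "w", encoding="utf-8") as fh:
        fh.write("# Wiki Log\n> Chronological record of all wiki actions. Append-only.\n")
    return True
```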
**Report findings** with specific file paths and suggested actions, grouped by
severity (broken links > orphans > source drift > contested pages > stale content > style issues).
**Append to log.md:** `## [YYYY-MM-DD] lint | N issues found`
## Working with the Wiki
### Searching
```bash
# Find pages by content
search_files "transformer" path="$WIKI" file_glob="*.md"
# Find pages by filename
search_files "*.md" target="files" path="$WIKI"
# Find pages by tag
search_files "tags:.*alignment" path="$WIKI" file_glob="*.md"
# Recent activity
read_file "$WIKI/log.md" offset=<last 20 lines>
```
### Bulk Ingest
When ingesting multiple sources at once, batch the updates:
1. Read all sources first
2. Identify all entities and concepts across all sources
3. Check existing pages for all of them (one search pass, not N)
4. Create/update pages in one pass (avoids redundant updates)
5. Update index.md once at the end
6. Write a single log entry covering the batch
### Archiving
When content is fully superseded or the domain scope changes:
1. Create `_archive/` directory if it doesn't exist
2. Move the page to `_archive/` with its original path (e.g., `_archive/entities/old-page.md`)
3. Remove from `index.md`
4. Update any pages that linked to it — replace wikilink with plain text + "(archived)"
5. Log the archive action
### Obsidian Integration
The wiki directory works as an Obsidian vault out of the box:
- `[[wikilinks]]` render as clickable links
- Graph View visualizes the knowledge network
- YAML frontmatter powers Dataview queries
- The `raw/assets/` folder holds images referenced via `![[image.png]]`
For best results:
- Set Obsidian's attachment folder to `raw/assets/`
- Enable "Wikilinks" in Obsidian settings (usually on by default)
- Install Dataview plugin for queries like `TABLE tags FROM "entities" WHERE contains(tags, "company")`
If using the Obsidian skill alongside this one, set `OBSIDIAN_VAULT_PATH` to the
same directory as the wiki path.
### Obsidian Headless (servers and headless machines)
On machines without a display, use `obsidian-headless` instead of the desktop app.
It syncs vaults via Obsidian Sync without a GUI — perfect for agents running on
servers that write to the wiki while Obsidian desktop reads it on another device.
**Setup:**
```bash
# Requires Node.js 22+
npm install -g obsidian-headless
# Login (requires Obsidian account with Sync subscription)
ob login --email <email> --password '<password>'
# Create a remote vault for the wiki
ob sync-create-remote --name "LLM Wiki"
# Connect the wiki directory to the vault
cd ~/wiki
ob sync-setup --vault "<vault-id>"
# Initial sync
ob sync
# Continuous sync (foreground — use systemd for background)
ob sync --continuous
```
**Continuous background sync via systemd:**
```ini
# ~/.config/systemd/user/obsidian-wiki-sync.service
[Unit]
Description=Obsidian LLM Wiki Sync
After=network-online.target
Wants=network-online.target
[Service]
ExecStart=/path/to/ob sync --continuous
WorkingDirectory=/home/user/wiki
Restart=on-failure
RestartSec=10
[Install]
WantedBy=default.target
```
```bash
systemctl --user daemon-reload
systemctl --user enable --now obsidian-wiki-sync
# Enable linger so sync survives logout:
sudo loginctl enable-linger $USER
```
This lets the agent write to `~/wiki` on a server while you browse the same
vault in Obsidian on your laptop/phone — changes appear within seconds.
## Pitfalls
- **Never modify files in `raw/`** — sources are immutable. Corrections go in wiki pages.
- **Always orient first** — read SCHEMA + index + recent log before any operation in a new session.
Skipping this causes duplicates and missed cross-references.
- **Always update index.md and log.md** — skipping this makes the wiki degrade. These are the
navigational backbone.
- **Don't create pages for passing mentions** — follow the Page Thresholds in SCHEMA.md. A name
appearing once in a footnote doesn't warrant an entity page.
- **Don't create pages without cross-references** — isolated pages are invisible. Every page must
link to at least 2 other pages.
- **Frontmatter is required** — it enables search, filtering, and staleness detection.
- **Tags must come from the taxonomy** — freeform tags decay into noise. Add new tags to SCHEMA.md
first, then use them.
- **Keep pages scannable** — a wiki page should be readable in 30 seconds. Split pages over
200 lines. Move detailed analysis to dedicated deep-dive pages.
- **Ask before mass-updating** — if an ingest would touch 10+ existing pages, confirm
the scope with the user first.
- **Rotate the log** — when log.md exceeds 500 entries, rename it `log-YYYY.md` and start fresh.
The agent should check log size during lint.
- **Handle contradictions explicitly** — don't silently overwrite. Note both claims with dates,
mark in frontmatter, flag for user review.
## Related Tools
[llm-wiki-compiler](https://github.com/atomicmemory/llm-wiki-compiler) is a Node.js CLI that
compiles sources into a concept wiki, inspired by the same Karpathy pattern. It's Obsidian-compatible,
so users who want a scheduled/CLI-driven compile pipeline can point it at the same vault this
skill maintains. Trade-offs: it owns page generation (replacing the agent's judgment on page
creation) and is tuned for small corpora. Use this skill when you want agent-in-the-loop curation;
use llm-wiki-compiler when you want a batch compile of a source directory.