Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-04-28 01:21:43 +00:00)
docs(website): dedicated page per bundled + optional skill (#14929)
Generates a full dedicated Docusaurus page for every one of the 132 skills
(73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/.
Each page carries the skill's description, metadata (version, author, license,
dependencies, platform gating, tags, related skills cross-linked to their own
pages), and the complete SKILL.md body that Hermes loads at runtime.
Previously the two catalog pages just listed skills with a one-line blurb and
no way to see what the skill actually did — users had to go read the source
repo. Now every skill has a browsable, searchable, cross-linked reference in
the docs.
- website/scripts/generate-skill-docs.py — generator that reads skills/ and
optional-skills/, writes per-skill pages, regenerates both catalog indexes,
and rewrites the Skills section of sidebars.ts. Handles MDX escaping
(outside fenced code blocks: curly braces, unsafe HTML-ish tags) and
rewrites relative references/*.md links to point at the GitHub source.
- website/docs/reference/skills-catalog.md — regenerated; each row links to
the new dedicated page.
- website/docs/reference/optional-skills-catalog.md — same.
- website/sidebars.ts — Skills section now has Bundled / Optional subtrees
with one nested category per skill folder.
- .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator
before docusaurus build so CI stays in sync with the source SKILL.md files.
Build verified locally with `npx docusaurus build`. Only remaining warnings
are pre-existing broken link/anchor issues in unrelated pages.
Parent: eb93f88e1d
Commit: 0f6eabb890
139 changed files with 43523 additions and 306 deletions

@@ -0,0 +1,299 @@

---
title: "Arxiv — Search and retrieve academic papers from arXiv using their free REST API"
sidebar_label: "Arxiv"
description: "Search and retrieve academic papers from arXiv using their free REST API"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Arxiv

Search and retrieve academic papers from arXiv using their free REST API. No API key needed. Search by keyword, author, category, or ID. Combine with web_extract or the ocr-and-documents skill to read full paper content.

## Skill metadata

| | |
|---|---|
| Source | Bundled (installed by default) |
| Path | `skills/research/arxiv` |
| Version | `1.0.0` |
| Author | Hermes Agent |
| License | MIT |
| Tags | `Research`, `Arxiv`, `Papers`, `Academic`, `Science`, `API` |
| Related skills | [`ocr-and-documents`](/docs/user-guide/skills/bundled/productivity/productivity-ocr-and-documents) |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# arXiv Research

Search and retrieve academic papers from arXiv via their free REST API. No API key, no dependencies — just curl.

## Quick Reference

| Action | Command |
|--------|---------|
| Search papers | `curl "https://export.arxiv.org/api/query?search_query=all:QUERY&max_results=5"` |
| Get specific paper | `curl "https://export.arxiv.org/api/query?id_list=2402.03300"` |
| Read abstract (web) | `web_extract(urls=["https://arxiv.org/abs/2402.03300"])` |
| Read full paper (PDF) | `web_extract(urls=["https://arxiv.org/pdf/2402.03300"])` |

## Searching Papers

The API returns Atom XML. Parse with `grep`/`sed` or pipe through `python3` for clean output.

### Basic search

```bash
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5"
```

### Clean output (parse XML to readable format)

```bash
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5&sortBy=submittedDate&sortOrder=descending" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom'}
root = ET.parse(sys.stdin).getroot()
for i, entry in enumerate(root.findall('a:entry', ns)):
    title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
    arxiv_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
    published = entry.find('a:published', ns).text[:10]
    authors = ', '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
    summary = entry.find('a:summary', ns).text.strip()[:200]
    cats = ', '.join(c.get('term') for c in entry.findall('a:category', ns))
    print(f'{i+1}. [{arxiv_id}] {title}')
    print(f'   Authors: {authors}')
    print(f'   Published: {published} | Categories: {cats}')
    print(f'   Abstract: {summary}...')
    print(f'   PDF: https://arxiv.org/pdf/{arxiv_id}')
    print()
"
```

## Search Query Syntax

| Prefix | Searches | Example |
|--------|----------|---------|
| `all:` | All fields | `all:transformer+attention` |
| `ti:` | Title | `ti:large+language+models` |
| `au:` | Author | `au:vaswani` |
| `abs:` | Abstract | `abs:reinforcement+learning` |
| `cat:` | Category | `cat:cs.AI` |
| `co:` | Comment | `co:accepted+NeurIPS` |

### Boolean operators

```
# AND (default when using +)
search_query=all:transformer+attention

# OR
search_query=all:GPT+OR+all:BERT

# AND NOT
search_query=all:language+model+ANDNOT+all:vision

# Exact phrase
search_query=ti:"chain+of+thought"

# Combined
search_query=au:hinton+AND+cat:cs.LG
```

## Sort and Pagination

| Parameter | Options |
|-----------|---------|
| `sortBy` | `relevance`, `lastUpdatedDate`, `submittedDate` |
| `sortOrder` | `ascending`, `descending` |
| `start` | Result offset (0-based) |
| `max_results` | Number of results (default 10, max 30000) |

```bash
# Latest 10 papers in cs.AI
curl -s "https://export.arxiv.org/api/query?search_query=cat:cs.AI&sortBy=submittedDate&sortOrder=descending&max_results=10"
```

## Fetching Specific Papers

```bash
# By arXiv ID
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300"

# Multiple papers
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300,2401.12345,2403.00001"
```

## BibTeX Generation

After fetching metadata for a paper, generate a BibTeX entry:

{% raw %}
```bash
curl -s "https://export.arxiv.org/api/query?id_list=1706.03762" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'}
root = ET.parse(sys.stdin).getroot()
entry = root.find('a:entry', ns)
if entry is None: sys.exit('Paper not found')
title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
authors = ' and '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
year = entry.find('a:published', ns).text[:4]
raw_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
cat = entry.find('arxiv:primary_category', ns)
primary = cat.get('term') if cat is not None else 'cs.LG'
last_name = entry.find('a:author', ns).find('a:name', ns).text.split()[-1]
print(f'@article{{{last_name}{year}_{raw_id.replace(\".\", \"\")},')
print(f'  title = {{{title}}},')
print(f'  author = {{{authors}}},')
print(f'  year = {{{year}}},')
print(f'  eprint = {{{raw_id}}},')
print(f'  archivePrefix = {{arXiv}},')
print(f'  primaryClass = {{{primary}}},')
print(f'  url = {{https://arxiv.org/abs/{raw_id}}}')
print('}')
"
```
{% endraw %}

## Reading Paper Content

After finding a paper, read it:

```
# Abstract page (fast, metadata + abstract)
web_extract(urls=["https://arxiv.org/abs/2402.03300"])

# Full paper (PDF → markdown via Firecrawl)
web_extract(urls=["https://arxiv.org/pdf/2402.03300"])
```

For local PDF processing, see the `ocr-and-documents` skill.

## Common Categories

| Category | Field |
|----------|-------|
| `cs.AI` | Artificial Intelligence |
| `cs.CL` | Computation and Language (NLP) |
| `cs.CV` | Computer Vision |
| `cs.LG` | Machine Learning |
| `cs.CR` | Cryptography and Security |
| `stat.ML` | Machine Learning (Statistics) |
| `math.OC` | Optimization and Control |
| `physics.comp-ph` | Computational Physics |

Full list: https://arxiv.org/category_taxonomy

## Helper Script

The `scripts/search_arxiv.py` script handles XML parsing and provides clean output:

```bash
python scripts/search_arxiv.py "GRPO reinforcement learning"
python scripts/search_arxiv.py "transformer attention" --max 10 --sort date
python scripts/search_arxiv.py --author "Yann LeCun" --max 5
python scripts/search_arxiv.py --category cs.AI --sort date
python scripts/search_arxiv.py --id 2402.03300
python scripts/search_arxiv.py --id 2402.03300,2401.12345
```

No dependencies — uses only Python stdlib.

---

## Semantic Scholar (Citations, Related Papers, Author Profiles)

arXiv doesn't provide citation data or recommendations. Use the **Semantic Scholar API** for that — free, no key needed for basic use (1 req/sec), returns JSON.

### Get paper details + citations

```bash
# By arXiv ID
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300?fields=title,authors,citationCount,referenceCount,influentialCitationCount,year,abstract" | python3 -m json.tool

# By Semantic Scholar paper ID or DOI
curl -s "https://api.semanticscholar.org/graph/v1/paper/DOI:10.1234/example?fields=title,citationCount"
```

### Get citations OF a paper (who cited it)

```bash
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/citations?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool
```

### Get references FROM a paper (what it cites)

```bash
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/references?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool
```

### Search papers (alternative to arXiv search, returns JSON)

```bash
curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=GRPO+reinforcement+learning&limit=5&fields=title,authors,year,citationCount,externalIds" | python3 -m json.tool
```

### Get paper recommendations

```bash
curl -s -X POST "https://api.semanticscholar.org/recommendations/v1/papers/" \
  -H "Content-Type: application/json" \
  -d '{"positivePaperIds": ["arXiv:2402.03300"], "negativePaperIds": []}' | python3 -m json.tool
```

### Author profile

```bash
curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=Yann+LeCun&fields=name,hIndex,citationCount,paperCount" | python3 -m json.tool
```

### Useful Semantic Scholar fields

`title`, `authors`, `year`, `abstract`, `citationCount`, `referenceCount`, `influentialCitationCount`, `isOpenAccess`, `openAccessPdf`, `fieldsOfStudy`, `publicationVenue`, `externalIds` (contains arXiv ID, DOI, etc.)

---

## Complete Research Workflow

1. **Discover**: `python scripts/search_arxiv.py "your topic" --sort date --max 10`
2. **Assess impact**: `curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID?fields=citationCount,influentialCitationCount"`
3. **Read abstract**: `web_extract(urls=["https://arxiv.org/abs/ID"])`
4. **Read full paper**: `web_extract(urls=["https://arxiv.org/pdf/ID"])`
5. **Find related work**: `curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID/references?fields=title,citationCount&limit=20"`
6. **Get recommendations**: POST to Semantic Scholar recommendations endpoint
7. **Track authors**: `curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=NAME"`

## Rate Limits

| API | Rate | Auth |
|-----|------|------|
| arXiv | ~1 req / 3 seconds | None needed |
| Semantic Scholar | 1 req / second | None (100/sec with API key) |
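
When looping over many IDs or queries, pace requests to stay under the arXiv limit above. A minimal sketch; the ID list and output filenames are illustrative:

```bash
# Fetch several papers politely: one request every 3+ seconds
for id in 2402.03300 1706.03762 2401.12345; do
  curl -s "https://export.arxiv.org/api/query?id_list=${id}" > "arxiv_${id}.xml"
  sleep 3
done
```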

## Notes

- arXiv returns Atom XML — use the helper script or parsing snippet for clean output
- Semantic Scholar returns JSON — pipe through `python3 -m json.tool` for readability
- arXiv IDs: old format (`hep-th/0601001`) vs new (`2402.03300`)
- PDF: `https://arxiv.org/pdf/{id}` — Abstract: `https://arxiv.org/abs/{id}`
- HTML (when available): `https://arxiv.org/html/{id}`
- For local PDF processing, see the `ocr-and-documents` skill

## ID Versioning

- `arxiv.org/abs/1706.03762` always resolves to the **latest** version
- `arxiv.org/abs/1706.03762v1` points to a **specific** immutable version
- When generating citations, preserve the version suffix you actually read to prevent citation drift (a later version may substantially change content)
- The API `<id>` field returns the versioned URL (e.g., `http://arxiv.org/abs/1706.03762v7`)

## Withdrawn Papers

Papers can be withdrawn after submission. When this happens:
- The `<summary>` field contains a withdrawal notice (look for "withdrawn" or "retracted")
- Metadata fields may be incomplete
- Always check the summary before treating a result as a valid paper
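
A minimal check before citing a result, using only `curl` and `grep` (the ID is illustrative):

```bash
# Look for a withdrawal/retraction notice anywhere in the paper's Atom entry
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300" \
  | grep -qiE "withdrawn|retracted" \
  && echo "Possible withdrawal notice - read the <summary> before citing" \
  || echo "No withdrawal notice found"
```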

@@ -0,0 +1,151 @@

---
title: "Blogwatcher — Monitor blogs and RSS/Atom feeds for updates using the blogwatcher-cli tool"
sidebar_label: "Blogwatcher"
description: "Monitor blogs and RSS/Atom feeds for updates using the blogwatcher-cli tool"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Blogwatcher

Monitor blogs and RSS/Atom feeds for updates using the blogwatcher-cli tool. Add blogs, scan for new articles, track read status, and filter by category.

## Skill metadata

| | |
|---|---|
| Source | Bundled (installed by default) |
| Path | `skills/research/blogwatcher` |
| Version | `2.0.0` |
| Author | JulienTant (fork of Hyaxia/blogwatcher) |
| License | MIT |
| Tags | `RSS`, `Blogs`, `Feed-Reader`, `Monitoring` |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# Blogwatcher

Track blog and RSS/Atom feed updates with the `blogwatcher-cli` tool. Supports automatic feed discovery, HTML scraping fallback, OPML import, and read/unread article management.

## Installation

Pick one method:

- **Go:** `go install github.com/JulienTant/blogwatcher-cli/cmd/blogwatcher-cli@latest`
- **Docker:** `docker run --rm -v blogwatcher-cli:/data ghcr.io/julientant/blogwatcher-cli`
- **Binary (Linux amd64):** `curl -sL https://github.com/JulienTant/blogwatcher-cli/releases/latest/download/blogwatcher-cli_linux_amd64.tar.gz | tar xz -C /usr/local/bin blogwatcher-cli`
- **Binary (Linux arm64):** `curl -sL https://github.com/JulienTant/blogwatcher-cli/releases/latest/download/blogwatcher-cli_linux_arm64.tar.gz | tar xz -C /usr/local/bin blogwatcher-cli`
- **Binary (macOS Apple Silicon):** `curl -sL https://github.com/JulienTant/blogwatcher-cli/releases/latest/download/blogwatcher-cli_darwin_arm64.tar.gz | tar xz -C /usr/local/bin blogwatcher-cli`
- **Binary (macOS Intel):** `curl -sL https://github.com/JulienTant/blogwatcher-cli/releases/latest/download/blogwatcher-cli_darwin_amd64.tar.gz | tar xz -C /usr/local/bin blogwatcher-cli`

All releases: https://github.com/JulienTant/blogwatcher-cli/releases

### Docker with persistent storage

By default the database lives at `~/.blogwatcher-cli/blogwatcher-cli.db`. In Docker this is lost on container restart. Use `BLOGWATCHER_DB` or a volume mount to persist it:

```bash
# Named volume (simplest)
docker run --rm -v blogwatcher-cli:/data -e BLOGWATCHER_DB=/data/blogwatcher-cli.db ghcr.io/julientant/blogwatcher-cli scan

# Host bind mount
docker run --rm -v /path/on/host:/data -e BLOGWATCHER_DB=/data/blogwatcher-cli.db ghcr.io/julientant/blogwatcher-cli scan
```

### Migrating from the original blogwatcher

If upgrading from `Hyaxia/blogwatcher`, move your database:

```bash
mv ~/.blogwatcher/blogwatcher.db ~/.blogwatcher-cli/blogwatcher-cli.db
```

The binary name changed from `blogwatcher` to `blogwatcher-cli`.

## Common Commands

### Managing blogs

- Add a blog: `blogwatcher-cli add "My Blog" https://example.com`
- Add with explicit feed: `blogwatcher-cli add "My Blog" https://example.com --feed-url https://example.com/feed.xml`
- Add with HTML scraping: `blogwatcher-cli add "My Blog" https://example.com --scrape-selector "article h2 a"`
- List tracked blogs: `blogwatcher-cli blogs`
- Remove a blog: `blogwatcher-cli remove "My Blog" --yes`
- Import from OPML: `blogwatcher-cli import subscriptions.opml`

### Scanning and reading

- Scan all blogs: `blogwatcher-cli scan`
- Scan one blog: `blogwatcher-cli scan "My Blog"`
- List unread articles: `blogwatcher-cli articles`
- List all articles: `blogwatcher-cli articles --all`
- Filter by blog: `blogwatcher-cli articles --blog "My Blog"`
- Filter by category: `blogwatcher-cli articles --category "Engineering"`
- Mark article read: `blogwatcher-cli read 1`
- Mark article unread: `blogwatcher-cli unread 1`
- Mark all read: `blogwatcher-cli read-all`
- Mark all read for a blog: `blogwatcher-cli read-all --blog "My Blog" --yes`

## Environment Variables

All flags can be set via environment variables with the `BLOGWATCHER_` prefix:

| Variable | Description |
|---|---|
| `BLOGWATCHER_DB` | Path to SQLite database file |
| `BLOGWATCHER_WORKERS` | Number of concurrent scan workers (default: 8) |
| `BLOGWATCHER_SILENT` | Only output "scan done" when scanning |
| `BLOGWATCHER_YES` | Skip confirmation prompts |
| `BLOGWATCHER_CATEGORY` | Default filter for articles by category |
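
For example, a non-interactive scheduled or agent-driven run might combine these. A minimal sketch; the database path is illustrative, and the boolean values are an assumption (the table above doesn't state the accepted format):

```bash
# Quiet, prompt-free scan against an explicit database location
BLOGWATCHER_DB="$HOME/data/blogwatcher-cli.db" \
BLOGWATCHER_SILENT=true \
BLOGWATCHER_YES=true \
blogwatcher-cli scan
```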

## Example Output

```
$ blogwatcher-cli blogs
Tracked blogs (1):

xkcd
  URL: https://xkcd.com
  Feed: https://xkcd.com/atom.xml
  Last scanned: 2026-04-03 10:30
```

```
$ blogwatcher-cli scan
Scanning 1 blog(s)...

xkcd
  Source: RSS | Found: 4 | New: 4

Found 4 new article(s) total!
```

```
$ blogwatcher-cli articles
Unread articles (2):

[1] [new] Barrel - Part 13
    Blog: xkcd
    URL: https://xkcd.com/3095/
    Published: 2026-04-02
    Categories: Comics, Science

[2] [new] Volcano Fact
    Blog: xkcd
    URL: https://xkcd.com/3094/
    Published: 2026-04-01
    Categories: Comics
```

## Notes

- Auto-discovers RSS/Atom feeds from blog homepages when no `--feed-url` is provided.
- Falls back to HTML scraping if RSS fails and `--scrape-selector` is configured.
- Categories from RSS/Atom feeds are stored and can be used to filter articles.
- Import blogs in bulk from OPML files exported by Feedly, Inoreader, NewsBlur, etc.
- Database stored at `~/.blogwatcher-cli/blogwatcher-cli.db` by default (override with `--db` or `BLOGWATCHER_DB`).
- Use `blogwatcher-cli <command> --help` to discover all flags and options.

@@ -0,0 +1,523 @@

---
title: "Llm Wiki — Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base"
sidebar_label: "Llm Wiki"
description: "Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Llm Wiki

Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base. Ingest sources, query compiled knowledge, and lint for consistency.

## Skill metadata

| | |
|---|---|
| Source | Bundled (installed by default) |
| Path | `skills/research/llm-wiki` |
| Version | `2.1.0` |
| Author | Hermes Agent |
| License | MIT |
| Tags | `wiki`, `knowledge-base`, `research`, `notes`, `markdown`, `rag-alternative` |
| Related skills | [`obsidian`](/docs/user-guide/skills/bundled/note-taking/note-taking-obsidian), [`arxiv`](/docs/user-guide/skills/bundled/research/research-arxiv) |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# Karpathy's LLM Wiki

Build and maintain a persistent, compounding knowledge base as interlinked markdown files.
Based on [Andrej Karpathy's LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

Unlike traditional RAG (which rediscovers knowledge from scratch per query), the wiki
compiles knowledge once and keeps it current. Cross-references are already there.
Contradictions have already been flagged. Synthesis reflects everything ingested.

**Division of labor:** The human curates sources and directs analysis. The agent
summarizes, cross-references, files, and maintains consistency.

## When This Skill Activates

Use this skill when the user:
- Asks to create, build, or start a wiki or knowledge base
- Asks to ingest, add, or process a source into their wiki
- Asks a question and an existing wiki is present at the configured path
- Asks to lint, audit, or health-check their wiki
- References their wiki, knowledge base, or "notes" in a research context

## Wiki Location

**Location:** Set via `WIKI_PATH` environment variable (e.g. in `~/.hermes/.env`).

If unset, defaults to `~/wiki`.

```bash
WIKI="${WIKI_PATH:-$HOME/wiki}"
```

The wiki is just a directory of markdown files — open it in Obsidian, VS Code, or
any editor. No database, no special tooling required.

## Architecture: Three Layers

```
wiki/
├── SCHEMA.md        # Conventions, structure rules, domain config
├── index.md         # Sectioned content catalog with one-line summaries
├── log.md           # Chronological action log (append-only, rotated yearly)
├── raw/             # Layer 1: Immutable source material
│   ├── articles/    # Web articles, clippings
│   ├── papers/      # PDFs, arxiv papers
│   ├── transcripts/ # Meeting notes, interviews
│   └── assets/      # Images, diagrams referenced by sources
├── entities/        # Layer 2: Entity pages (people, orgs, products, models)
├── concepts/        # Layer 2: Concept/topic pages
├── comparisons/     # Layer 2: Side-by-side analyses
└── queries/         # Layer 2: Filed query results worth keeping
```

**Layer 1 — Raw Sources:** Immutable. The agent reads but never modifies these.
**Layer 2 — The Wiki:** Agent-owned markdown files. Created, updated, and
cross-referenced by the agent.
**Layer 3 — The Schema:** `SCHEMA.md` defines structure, conventions, and tag taxonomy.

## Resuming an Existing Wiki (CRITICAL — do this every session)

When the user has an existing wiki, **always orient yourself before doing anything**:

① **Read `SCHEMA.md`** — understand the domain, conventions, and tag taxonomy.
② **Read `index.md`** — learn what pages exist and their summaries.
③ **Scan recent `log.md`** — read the last 20-30 entries to understand recent activity.

```bash
WIKI="${WIKI_PATH:-$HOME/wiki}"
# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=<last 30 lines>
```

Only after orientation should you ingest, query, or lint. This prevents:
- Creating duplicate pages for entities that already exist
- Missing cross-references to existing content
- Contradicting the schema's conventions
- Repeating work already logged

For large wikis (100+ pages), also run a quick `search_files` for the topic
at hand before creating anything new.

## Initializing a New Wiki

When the user asks to create or start a wiki:

1. Determine the wiki path (from `$WIKI_PATH` env var, or ask the user; default `~/wiki`)
2. Create the directory structure above
3. Ask the user what domain the wiki covers — be specific
4. Write `SCHEMA.md` customized to the domain (see template below)
5. Write initial `index.md` with sectioned header
6. Write initial `log.md` with creation entry
7. Confirm the wiki is ready and suggest first sources to ingest

### SCHEMA.md Template

Adapt to the user's domain. The schema constrains agent behavior and ensures consistency:

````markdown
# Wiki Schema

## Domain
[What this wiki covers — e.g., "AI/ML research", "personal health", "startup intelligence"]

## Conventions
- File names: lowercase, hyphens, no spaces (e.g., `transformer-architecture.md`)
- Every wiki page starts with YAML frontmatter (see below)
- Use `[[wikilinks]]` to link between pages (minimum 2 outbound links per page)
- When updating a page, always bump the `updated` date
- Every new page must be added to `index.md` under the correct section
- Every action must be appended to `log.md`
- **Provenance markers:** On pages that synthesize 3+ sources, append `^[raw/articles/source-file.md]`
  at the end of paragraphs whose claims come from a specific source. This lets a reader trace each
  claim back without re-reading the whole raw file. Optional on single-source pages where the
  `sources:` frontmatter is enough.

## Frontmatter
```yaml
---
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: entity | concept | comparison | query | summary
tags: [from taxonomy below]
sources: [raw/articles/source-name.md]
# Optional quality signals:
confidence: high | medium | low   # how well-supported the claims are
contested: true                   # set when the page has unresolved contradictions
contradictions: [other-page-slug] # pages this one conflicts with
---
```

`confidence` and `contested` are optional but recommended for opinion-heavy or fast-moving
topics. Lint surfaces `contested: true` and `confidence: low` pages for review so weak claims
don't silently harden into accepted wiki fact.

### raw/ Frontmatter

Raw sources ALSO get a small frontmatter block so re-ingests can detect drift:

```yaml
---
source_url: https://example.com/article # original URL, if applicable
ingested: YYYY-MM-DD
sha256: <hex digest of the raw content below the frontmatter>
---
```

The `sha256:` lets a future re-ingest of the same URL skip processing when content is unchanged,
and flag drift when it has changed. Compute over the body only (everything after the closing
`---`), not the frontmatter itself.

## Tag Taxonomy
[Define 10-20 top-level tags for the domain. Add new tags here BEFORE using them.]

Example for AI/ML:
- Models: model, architecture, benchmark, training
- People/Orgs: person, company, lab, open-source
- Techniques: optimization, fine-tuning, inference, alignment, data
- Meta: comparison, timeline, controversy, prediction

Rule: every tag on a page must appear in this taxonomy. If a new tag is needed,
add it here first, then use it. This prevents tag sprawl.

## Page Thresholds
- **Create a page** when an entity/concept appears in 2+ sources OR is central to one source
- **Add to existing page** when a source mentions something already covered
- **DON'T create a page** for passing mentions, minor details, or things outside the domain
- **Split a page** when it exceeds ~200 lines — break into sub-topics with cross-links
- **Archive a page** when its content is fully superseded — move to `_archive/`, remove from index

## Entity Pages
One page per notable entity. Include:
- Overview / what it is
- Key facts and dates
- Relationships to other entities ([[wikilinks]])
- Source references

## Concept Pages
One page per concept or topic. Include:
- Definition / explanation
- Current state of knowledge
- Open questions or debates
- Related concepts ([[wikilinks]])

## Comparison Pages
Side-by-side analyses. Include:
- What is being compared and why
- Dimensions of comparison (table format preferred)
- Verdict or synthesis
- Sources

## Update Policy
When new information conflicts with existing content:
1. Check the dates — newer sources generally supersede older ones
2. If genuinely contradictory, note both positions with dates and sources
3. Mark the contradiction in frontmatter: `contradictions: [page-name]`
4. Flag for user review in the lint report
````

### index.md Template

The index is sectioned by type. Each entry is one line: wikilink + summary.

```markdown
# Wiki Index

> Content catalog. Every wiki page listed under its type with a one-line summary.
> Read this first to find relevant pages for any query.
> Last updated: YYYY-MM-DD | Total pages: N

## Entities
<!-- Alphabetical within section -->

## Concepts

## Comparisons

## Queries
```

**Scaling rule:** When any section exceeds 50 entries, split it into sub-sections
by first letter or sub-domain. When the index exceeds 200 entries total, create
a `_meta/topic-map.md` that groups pages by theme for faster navigation.

### log.md Template

```markdown
# Wiki Log

> Chronological record of all wiki actions. Append-only.
> Format: `## [YYYY-MM-DD] action | subject`
> Actions: ingest, update, query, lint, create, archive, delete
> When this file exceeds 500 entries, rotate: rename to log-YYYY.md, start fresh.

## [YYYY-MM-DD] create | Wiki initialized
- Domain: [domain]
- Structure created with SCHEMA.md, index.md, log.md
```

## Core Operations

### 1. Ingest

When the user provides a source (URL, file, paste), integrate it into the wiki:

① **Capture the raw source:**
- URL → use `web_extract` to get markdown, save to `raw/articles/`
- PDF → use `web_extract` (handles PDFs), save to `raw/papers/`
- Pasted text → save to appropriate `raw/` subdirectory
- Name the file descriptively: `raw/articles/karpathy-llm-wiki-2026.md`
- **Add raw frontmatter** (`source_url`, `ingested`, `sha256` of the body).
  On re-ingest of the same URL: recompute the sha256, compare to the stored value —
  skip if identical, flag drift and update if different. This is cheap enough to
  do on every re-ingest and catches silent source changes (a minimal sketch of this
  check appears at the end of this section).

② **Discuss takeaways** with the user — what's interesting, what matters for
the domain. (Skip this in automated/cron contexts — proceed directly.)

③ **Check what already exists** — search index.md and use `search_files` to find
existing pages for mentioned entities/concepts. This is the difference between
a growing wiki and a pile of duplicates.

④ **Write or update wiki pages:**
- **New entities/concepts:** Create pages only if they meet the Page Thresholds
  in SCHEMA.md (2+ source mentions, or central to one source)
- **Existing pages:** Add new information, update facts, bump `updated` date.
  When new info contradicts existing content, follow the Update Policy.
- **Cross-reference:** Every new or updated page must link to at least 2 other
  pages via `[[wikilinks]]`. Check that existing pages link back.
- **Tags:** Only use tags from the taxonomy in SCHEMA.md
- **Provenance:** On pages synthesizing 3+ sources, append `^[raw/articles/source.md]`
  markers to paragraphs whose claims trace to a specific source.
- **Confidence:** For opinion-heavy, fast-moving, or single-source claims, set
  `confidence: medium` or `low` in frontmatter. Don't mark `high` unless the
  claim is well-supported across multiple sources.

⑤ **Update navigation:**
- Add new pages to `index.md` under the correct section, alphabetically
- Update the "Total pages" count and "Last updated" date in index header
- Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Title`
- List every file created or updated in the log entry

⑥ **Report what changed** — list every file created or updated to the user.

A single source can trigger updates across 5-15 wiki pages. This is normal
and desired — it's the compounding effect.
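
For the step ① drift check, a minimal sketch assuming the raw frontmatter layout from the SCHEMA.md template (the file path is illustrative):

```python
# Compare a raw source's stored sha256 against a freshly computed one (drift check)
import hashlib, pathlib, re

path = pathlib.Path.home() / "wiki" / "raw" / "articles" / "karpathy-llm-wiki-2026.md"
text = path.read_text(encoding="utf-8")

# Hash only the body after the closing '---' of the frontmatter
match = re.match(r"^---\n.*?\n---\n", text, flags=re.DOTALL)
frontmatter, body = (match.group(0), text[match.end():]) if match else ("", text)

stored = re.search(r"^sha256:\s*([0-9a-f]{64})", frontmatter, flags=re.MULTILINE)
current = hashlib.sha256(body.encode("utf-8")).hexdigest()

if stored and stored.group(1) == current:
    print("unchanged: skip re-ingest")
else:
    print("drift (or no stored hash): re-ingest and update the sha256 field")
```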

### 2. Query

When the user asks a question about the wiki's domain:

① **Read `index.md`** to identify relevant pages.
② **For wikis with 100+ pages**, also `search_files` across all `.md` files
for key terms — the index alone may miss relevant content.
③ **Read the relevant pages** using `read_file`.
④ **Synthesize an answer** from the compiled knowledge. Cite the wiki pages
you drew from: "Based on [[page-a]] and [[page-b]]..."
⑤ **File valuable answers back** — if the answer is a substantial comparison,
deep dive, or novel synthesis, create a page in `queries/` or `comparisons/`.
Don't file trivial lookups — only answers that would be painful to re-derive.
⑥ **Update log.md** with the query and whether it was filed.

### 3. Lint

When the user asks to lint, health-check, or audit the wiki:

① **Orphan pages:** Find pages with no inbound `[[wikilinks]]` from other pages.
```python
# Use execute_code for this — programmatic scan across all wiki pages
import os, re, glob
wiki = os.environ.get("WIKI_PATH", os.path.expanduser("~/wiki"))
# Scan all .md files in entities/, concepts/, comparisons/, queries/
pages = {os.path.splitext(os.path.basename(p))[0]: open(p, encoding="utf-8").read()
         for sub in ("entities", "concepts", "comparisons", "queries")
         for p in glob.glob(os.path.join(wiki, sub, "*.md"))}
# Extract all [[wikilinks]] — build the inbound link map
linked = {t.strip() for text in pages.values() for t in re.findall(r"\[\[([^\]|#]+)", text)}
# Pages with zero inbound links are orphans
print("Orphans:", sorted(set(pages) - linked) or "none")
```

② **Broken wikilinks:** Find `[[links]]` that point to pages that don't exist.

③ **Index completeness:** Every wiki page should appear in `index.md`. Compare
the filesystem against index entries.

④ **Frontmatter validation:** Every wiki page must have all required fields
(title, created, updated, type, tags, sources). Tags must be in the taxonomy.

⑤ **Stale content:** Pages whose `updated` date is >90 days older than the most
recent source that mentions the same entities.

⑥ **Contradictions:** Pages on the same topic with conflicting claims. Look for
pages that share tags/entities but state different facts. Surface all pages
with `contested: true` or `contradictions:` frontmatter for user review.

⑦ **Quality signals:** List pages with `confidence: low` and any page that cites
only a single source but has no confidence field set — these are candidates
for either finding corroboration or demoting to `confidence: medium`.

⑧ **Source drift:** For each file in `raw/` with a `sha256:` frontmatter, recompute
the hash and flag mismatches. Mismatches indicate the raw file was edited
(shouldn't happen — raw/ is immutable) or ingested from a URL that has since
changed. Not a hard error, but worth reporting.

⑨ **Page size:** Flag pages over 200 lines — candidates for splitting.

⑩ **Tag audit:** List all tags in use, flag any not in the SCHEMA.md taxonomy.

⑪ **Log rotation:** If log.md exceeds 500 entries, rotate it.

⑫ **Report findings** with specific file paths and suggested actions, grouped by
severity (broken links > orphans > source drift > contested pages > stale content > style issues).

⑬ **Append to log.md:** `## [YYYY-MM-DD] lint | N issues found`

## Working with the Wiki

### Searching

```bash
# Find pages by content
search_files "transformer" path="$WIKI" file_glob="*.md"

# Find pages by filename
search_files "*.md" target="files" path="$WIKI"

# Find pages by tag
search_files "tags:.*alignment" path="$WIKI" file_glob="*.md"

# Recent activity
read_file "$WIKI/log.md" offset=<last 20 lines>
```

### Bulk Ingest

When ingesting multiple sources at once, batch the updates:
1. Read all sources first
2. Identify all entities and concepts across all sources
3. Check existing pages for all of them (one search pass, not N)
4. Create/update pages in one pass (avoids redundant updates)
5. Update index.md once at the end
6. Write a single log entry covering the batch

### Archiving

When content is fully superseded or the domain scope changes:
1. Create `_archive/` directory if it doesn't exist
2. Move the page to `_archive/` with its original path (e.g., `_archive/entities/old-page.md`)
3. Remove from `index.md`
4. Update any pages that linked to it — replace wikilink with plain text + "(archived)"
5. Log the archive action

### Obsidian Integration

The wiki directory works as an Obsidian vault out of the box:
- `[[wikilinks]]` render as clickable links
- Graph View visualizes the knowledge network
- YAML frontmatter powers Dataview queries
- The `raw/assets/` folder holds images referenced via `![[image.png]]`

For best results:
- Set Obsidian's attachment folder to `raw/assets/`
- Enable "Wikilinks" in Obsidian settings (usually on by default)
- Install Dataview plugin for queries like `TABLE tags FROM "entities" WHERE contains(tags, "company")`

If using the Obsidian skill alongside this one, set `OBSIDIAN_VAULT_PATH` to the
same directory as the wiki path.

### Obsidian Headless (servers and headless machines)

On machines without a display, use `obsidian-headless` instead of the desktop app.
It syncs vaults via Obsidian Sync without a GUI — perfect for agents running on
servers that write to the wiki while Obsidian desktop reads it on another device.

**Setup:**
```bash
# Requires Node.js 22+
npm install -g obsidian-headless

# Login (requires Obsidian account with Sync subscription)
ob login --email <email> --password '<password>'

# Create a remote vault for the wiki
ob sync-create-remote --name "LLM Wiki"

# Connect the wiki directory to the vault
cd ~/wiki
ob sync-setup --vault "<vault-id>"

# Initial sync
ob sync

# Continuous sync (foreground — use systemd for background)
ob sync --continuous
```

**Continuous background sync via systemd:**
```ini
# ~/.config/systemd/user/obsidian-wiki-sync.service
[Unit]
Description=Obsidian LLM Wiki Sync
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/path/to/ob sync --continuous
WorkingDirectory=/home/user/wiki
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
```

```bash
systemctl --user daemon-reload
systemctl --user enable --now obsidian-wiki-sync
# Enable linger so sync survives logout:
sudo loginctl enable-linger $USER
```

This lets the agent write to `~/wiki` on a server while you browse the same
vault in Obsidian on your laptop/phone — changes appear within seconds.

## Pitfalls

- **Never modify files in `raw/`** — sources are immutable. Corrections go in wiki pages.
- **Always orient first** — read SCHEMA + index + recent log before any operation in a new session.
  Skipping this causes duplicates and missed cross-references.
- **Always update index.md and log.md** — skipping this makes the wiki degrade. These are the
  navigational backbone.
- **Don't create pages for passing mentions** — follow the Page Thresholds in SCHEMA.md. A name
  appearing once in a footnote doesn't warrant an entity page.
- **Don't create pages without cross-references** — isolated pages are invisible. Every page must
  link to at least 2 other pages.
- **Frontmatter is required** — it enables search, filtering, and staleness detection.
- **Tags must come from the taxonomy** — freeform tags decay into noise. Add new tags to SCHEMA.md
  first, then use them.
- **Keep pages scannable** — a wiki page should be readable in 30 seconds. Split pages over
  200 lines. Move detailed analysis to dedicated deep-dive pages.
- **Ask before mass-updating** — if an ingest would touch 10+ existing pages, confirm
  the scope with the user first.
- **Rotate the log** — when log.md exceeds 500 entries, rename it `log-YYYY.md` and start fresh.
  The agent should check log size during lint.
- **Handle contradictions explicitly** — don't silently overwrite. Note both claims with dates,
  mark in frontmatter, flag for user review.

## Related Tools

[llm-wiki-compiler](https://github.com/atomicmemory/llm-wiki-compiler) is a Node.js CLI that
compiles sources into a concept wiki with the same Karpathy inspiration. It's Obsidian-compatible,
so users who want a scheduled/CLI-driven compile pipeline can point it at the same vault this
skill maintains. Trade-offs: it owns page generation (replaces the agent's judgment on page
creation) and is tuned for small corpora. Use this skill when you want agent-in-the-loop curation;
use llmwiki when you want batch compile of a source directory.

@@ -0,0 +1,95 @@

---
title: "Polymarket — Query Polymarket prediction market data — search markets, get prices, orderbooks, and price history"
sidebar_label: "Polymarket"
description: "Query Polymarket prediction market data — search markets, get prices, orderbooks, and price history"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Polymarket

Query Polymarket prediction market data — search markets, get prices, orderbooks, and price history. Read-only via public REST APIs, no API key needed.

## Skill metadata

| | |
|---|---|
| Source | Bundled (installed by default) |
| Path | `skills/research/polymarket` |
| Version | `1.0.0` |
| Author | Hermes Agent + Teknium |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# Polymarket — Prediction Market Data

Query prediction market data from Polymarket using their public REST APIs.
All endpoints are read-only and require zero authentication.

See `references/api-endpoints.md` for the full endpoint reference with curl examples.

## When to Use

- User asks about prediction markets, betting odds, or event probabilities
- User wants to know "what are the odds of X happening?"
- User asks about Polymarket specifically
- User wants market prices, orderbook data, or price history
- User asks to monitor or track prediction market movements

## Key Concepts

- **Events** contain one or more **Markets** (1:many relationship)
- **Markets** are binary outcomes with Yes/No prices between 0.00 and 1.00
- Prices ARE probabilities: price 0.65 means the market thinks 65% likely
- `outcomePrices` field: JSON-encoded array like `["0.80", "0.20"]`
- `clobTokenIds` field: JSON-encoded array of two token IDs [Yes, No] for price/book queries
- `conditionId` field: hex string used for price history queries
- Volume is in USDC (US dollars)

## Three Public APIs

1. **Gamma API** at `gamma-api.polymarket.com` — Discovery, search, browsing
2. **CLOB API** at `clob.polymarket.com` — Real-time prices, orderbooks, history
3. **Data API** at `data-api.polymarket.com` — Trades, open interest

## Typical Workflow

When a user asks about prediction market odds:

1. **Search** using the Gamma API public-search endpoint with their query (sketched below)
2. **Parse** the response — extract events and their nested markets
3. **Present** market question, current prices as percentages, and volume
4. **Deep dive** if asked — use clobTokenIds for orderbook, conditionId for history
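
A minimal sketch of the search step; the exact `public-search` query parameters are an assumption, not taken from the skill's endpoint reference:

```bash
# Search Gamma for markets matching a topic (query parameter assumed to be ?q=)
curl -s "https://gamma-api.polymarket.com/public-search?q=bitcoin" | python3 -m json.tool | head -50
```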

## Presenting Results

Format prices as percentages for readability:
- outcomePrices `["0.652", "0.348"]` becomes "Yes: 65.2%, No: 34.8%"
- Always show the market question and probability
- Include volume when available

Example: `"Will X happen?" — 65.2% Yes ($1.2M volume)`

## Parsing Double-Encoded Fields

The Gamma API returns `outcomePrices`, `outcomes`, and `clobTokenIds` as JSON strings
inside JSON responses (double-encoded). When processing with Python, parse them with
`json.loads(market['outcomePrices'])` to get the actual array.
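
A minimal sketch of that decode step, shown on a trimmed, hypothetical market object rather than a live response:

```python
# Decode the double-encoded Gamma fields on a single market object
import json

market = {  # trimmed example; real market objects carry many more fields
    "question": "Will X happen?",
    "outcomes": '["Yes", "No"]',
    "outcomePrices": '["0.652", "0.348"]',
    "volume": "1200000",
}

outcomes = json.loads(market["outcomes"])
prices = [float(p) for p in json.loads(market["outcomePrices"])]

summary = ", ".join(f"{o}: {p:.1%}" for o, p in zip(outcomes, prices))
print(f'"{market["question"]}" - {summary} (${float(market["volume"]):,.0f} volume)')
```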

## Rate Limits

Generous — unlikely to be hit in normal usage:
- Gamma: 4,000 requests per 10 seconds (general)
- CLOB: 9,000 requests per 10 seconds (general)
- Data: 1,000 requests per 10 seconds (general)

## Limitations

- This skill is read-only — it does not support placing trades
- Trading requires wallet-based crypto authentication (EIP-712 signatures)
- Some new markets may have empty price history
- Geographic restrictions apply to trading but read-only data is globally accessible
File diff suppressed because it is too large