docs(website): dedicated page per bundled + optional skill (#14929)
Generates a full dedicated Docusaurus page for every one of the 132 skills
(73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/.
Each page carries the skill's description, metadata (version, author, license,
dependencies, platform gating, tags, related skills cross-linked to their own
pages), and the complete SKILL.md body that Hermes loads at runtime.
Previously the two catalog pages just listed skills with a one-line blurb and
no way to see what the skill actually did — users had to go read the source
repo. Now every skill has a browsable, searchable, cross-linked reference in
the docs.
- website/scripts/generate-skill-docs.py — generator that reads skills/ and
optional-skills/, writes per-skill pages, regenerates both catalog indexes,
and rewrites the Skills section of sidebars.ts. Handles MDX escaping
(outside fenced code blocks: curly braces, unsafe HTML-ish tags) and
rewrites relative references/*.md links to point at the GitHub source.
- website/docs/reference/skills-catalog.md — regenerated; each row links to
the new dedicated page.
- website/docs/reference/optional-skills-catalog.md — same.
- website/sidebars.ts — Skills section now has Bundled / Optional subtrees
with one nested category per skill folder.
- .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator
before docusaurus build so CI stays in sync with the source SKILL.md files.
Build verified locally with `npx docusaurus build`. Only remaining warnings
are pre-existing broken link/anchor issues in unrelated pages.
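
For context, the MDX-escaping pass described above works roughly like this; a minimal sketch with a hypothetical helper name, not the generator's actual code:

```python
import re

def escape_mdx(markdown: str) -> str:
    """Escape MDX-unsafe characters outside fenced code blocks (illustrative only)."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence   # toggle on opening/closing fences
            out.append(line)
            continue
        if in_fence:
            out.append(line)          # leave fenced code untouched
            continue
        # Curly braces would otherwise be parsed as JSX expressions by MDX.
        line = line.replace("{", "\\{").replace("}", "\\}")
        # Tag-like tokens such as <foo> would otherwise be parsed as JSX elements.
        line = re.sub(r"<(?=[A-Za-z/])", "&lt;", line)
        out.append(line)
    return "\n".join(out)
```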

Parent: eb93f88e1d
Commit: 0f6eabb890
139 changed files with 43523 additions and 306 deletions

website/docs/user-guide/skills/optional/mlops/mlops-pinecone.md (new file, 376 lines)

---
title: "Pinecone — Managed vector database for production AI applications"
sidebar_label: "Pinecone"
description: "Managed vector database for production AI applications"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Pinecone

Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.

## Skill metadata

| Field | Value |
|---|---|
| Source | Optional — install with `hermes skills install official/mlops/pinecone` |
| Path | `optional-skills/mlops/pinecone` |
| Version | `1.0.0` |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | `pinecone-client` |
| Tags | `RAG`, `Pinecone`, `Vector Database`, `Managed Service`, `Serverless`, `Hybrid Search`, `Production`, `Auto-Scaling`, `Low Latency`, `Recommendations` |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# Pinecone - Managed Vector Database

The vector database for production AI applications.

## When to use Pinecone

**Use when:**
- Need managed, serverless vector database
- Production RAG applications
- Auto-scaling required
- Low latency critical (<100ms)
- Don't want to manage infrastructure
- Need hybrid search (dense + sparse vectors)

**Metrics**:
- Fully managed SaaS
- Auto-scales to billions of vectors
- **p95 latency <100ms**
- 99.9% uptime SLA

**Use alternatives instead**:
- **Chroma**: Self-hosted, open-source
- **FAISS**: Offline, pure similarity search
- **Weaviate**: Self-hosted with more features

## Quick start

### Installation

```bash
pip install pinecone-client
```

### Basic usage

```python
from pinecone import Pinecone, ServerlessSpec

# Initialize
pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="my-index",
    dimension=1536,   # Must match embedding dimension
    metric="cosine",  # or "euclidean", "dotproduct"
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

# Connect to index
index = pc.Index("my-index")

# Upsert vectors
index.upsert(vectors=[
    {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "A"}},
    {"id": "vec2", "values": [0.3, 0.4, ...], "metadata": {"category": "B"}}
])

# Query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    include_metadata=True
)

print(results["matches"])
```

## Core operations

### Create index

```python
# Serverless (recommended)
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",  # or "gcp", "azure"
        region="us-east-1"
    )
)

# Pod-based (for consistent performance)
from pinecone import PodSpec

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1"
    )
)
```

### Upsert vectors

```python
# Single upsert
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # 1536 dimensions
        "metadata": {
            "text": "Document content",
            "category": "tutorial",
            "timestamp": "2025-01-01"
        }
    }
])

# Batch upsert (recommended)
vectors = [
    {"id": f"vec{i}", "values": embedding, "metadata": metadata}
    for i, (embedding, metadata) in enumerate(zip(embeddings, metadatas))
]

index.upsert(vectors=vectors, batch_size=100)
```

### Query vectors

```python
# Basic query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=10,
    include_metadata=True,
    include_values=False
)

# With metadata filtering
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    filter={"category": {"$eq": "tutorial"}}
)

# Namespace query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    namespace="production"
)

# Access results
for match in results["matches"]:
    print(f"ID: {match['id']}")
    print(f"Score: {match['score']}")
    print(f"Metadata: {match['metadata']}")
```

### Metadata filtering

```python
# Exact match
filter = {"category": "tutorial"}

# Comparison
filter = {"price": {"$gte": 100}}  # $gt, $gte, $lt, $lte, $ne

# Logical operators
filter = {
    "$and": [
        {"category": "tutorial"},
        {"difficulty": {"$lte": 3}}
    ]
}  # Also: $or

# In operator
filter = {"tags": {"$in": ["python", "ml"]}}
```

## Namespaces

```python
# Partition data by namespace
index.upsert(
    vectors=[{"id": "vec1", "values": [...]}],
    namespace="user-123"
)

# Query specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5
)

# List namespaces
stats = index.describe_index_stats()
print(stats['namespaces'])
```

## Hybrid search (dense + sparse)

```python
# Upsert with sparse vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense vector
        "sparse_values": {
            "indices": [10, 45, 123],   # Token IDs
            "values": [0.5, 0.3, 0.8]   # TF-IDF scores
        },
        "metadata": {"text": "..."}
    }
])

# Hybrid query
results = index.query(
    vector=[0.1, 0.2, ...],
    sparse_vector={
        "indices": [10, 45],
        "values": [0.5, 0.3]
    },
    top_k=5,
    alpha=0.5  # 0=sparse, 1=dense, 0.5=hybrid
)
```
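
Depending on the client version, `query()` may not accept an `alpha` argument directly; Pinecone's hybrid-search examples apply the dense/sparse weighting client-side before querying. A minimal sketch of that pattern, assuming `dense_embedding` and `sparse_embedding` were produced by your embedder (the helper name is illustrative, not part of the Pinecone API):

```python
def weight_by_alpha(dense, sparse, alpha):
    """Convex combination of dense and sparse query vectors (alpha=1 -> dense only)."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    weighted_dense = [v * alpha for v in dense]
    weighted_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return weighted_dense, weighted_sparse

dense_q, sparse_q = weight_by_alpha(dense_embedding, sparse_embedding, alpha=0.5)
results = index.query(vector=dense_q, sparse_vector=sparse_q, top_k=5, include_metadata=True)
```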

## LangChain integration

```python
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

# Create vector store
vectorstore = PineconeVectorStore.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    index_name="my-index"
)

# Query
results = vectorstore.similarity_search("query", k=5)

# With metadata filter
results = vectorstore.similarity_search(
    "query",
    k=5,
    filter={"category": "tutorial"}
)

# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
```

## LlamaIndex integration

```python
from pinecone import Pinecone
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Connect to Pinecone
pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")

# Create vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Use in LlamaIndex
from llama_index.core import StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```

## Index management

```python
# List indices
indexes = pc.list_indexes()

# Describe index
index_info = pc.describe_index("my-index")
print(index_info)

# Get index stats
stats = index.describe_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")
print(f"Namespaces: {stats['namespaces']}")

# Delete index
pc.delete_index("my-index")
```

## Delete vectors

```python
# Delete by ID
index.delete(ids=["vec1", "vec2"])

# Delete by metadata filter (pod-based indexes; not supported on serverless)
index.delete(filter={"category": "old"})

# Delete all vectors in a namespace
index.delete(delete_all=True, namespace="test")

# Delete all vectors in the default namespace
# (to remove the index itself, use pc.delete_index("my-index"))
index.delete(delete_all=True)
```

## Best practices

1. **Use serverless** - Auto-scaling, cost-effective
2. **Batch upserts** - More efficient (100-200 vectors per batch; see the sketch after this list)
3. **Add metadata** - Enables filtering
4. **Use namespaces** - Isolate data by user/tenant
5. **Monitor usage** - Check the Pinecone dashboard
6. **Optimize filters** - Index frequently filtered fields
7. **Test with the free tier** - 1 index, 100K vectors free
8. **Use hybrid search** - Better retrieval quality
9. **Set appropriate dimensions** - Match the embedding model
10. **Regular backups** - Export important data
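
Item 2 in practice: if you are not relying on the client's `batch_size` argument, a simple chunking loop keeps each request in the recommended 100-200 range. A sketch, assuming `all_vectors` is a list of vector dicts like those in the upsert examples above:

```python
def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

for batch in chunked(all_vectors, 150):
    index.upsert(vectors=batch, namespace="production")
```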

## Performance

| Operation | Latency | Notes |
|-----------|---------|-------|
| Upsert | ~50-100ms | Per batch |
| Query (p50) | ~50ms | Depends on index size |
| Query (p95) | ~100ms | SLA target |
| Metadata filter | +10-20ms | Additional overhead |

## Pricing (as of 2025)

**Serverless** (a worked storage estimate follows below):
- $0.096 per million read units
- $0.06 per million write units
- $0.06 per GB storage/month

**Free tier**:
- 1 serverless index
- 100K vectors (1536 dimensions)
- Great for prototyping
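
As a rough back-of-the-envelope check against the serverless storage rate above (a sketch only; real bills also count read/write units, metadata, and Pinecone's own storage accounting):

```python
# Approximate storage cost for 1M vectors at 1536 dims, float32 (4 bytes per value)
num_vectors = 1_000_000
dimension = 1536
gb_total = num_vectors * dimension * 4 / 1e9       # ~6.14 GB
monthly_storage_cost = gb_total * 0.06             # $0.06 per GB-month
print(f"{gb_total:.2f} GB ~= ${monthly_storage_cost:.2f}/month for storage alone")
```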

## Resources

- **Website**: https://www.pinecone.io
- **Docs**: https://docs.pinecone.io
- **Console**: https://app.pinecone.io
- **Pricing**: https://www.pinecone.io/pricing