mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-26 01:01:40 +00:00
docs(website): dedicated page per bundled + optional skill (#14929)
Generates a full dedicated Docusaurus page for every one of the 132 skills
(73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/.
Each page carries the skill's description, metadata (version, author, license,
dependencies, platform gating, tags, related skills cross-linked to their own
pages), and the complete SKILL.md body that Hermes loads at runtime.
Previously the two catalog pages just listed skills with a one-line blurb and
no way to see what the skill actually did — users had to go read the source
repo. Now every skill has a browsable, searchable, cross-linked reference in
the docs.
- website/scripts/generate-skill-docs.py — generator that reads skills/ and
optional-skills/, writes per-skill pages, regenerates both catalog indexes,
and rewrites the Skills section of sidebars.ts. Handles MDX escaping
(outside fenced code blocks: curly braces, unsafe HTML-ish tags) and
rewrites relative references/*.md links to point at the GitHub source.
- website/docs/reference/skills-catalog.md — regenerated; each row links to
the new dedicated page.
- website/docs/reference/optional-skills-catalog.md — same.
- website/sidebars.ts — Skills section now has Bundled / Optional subtrees
with one nested category per skill folder.
- .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator
before docusaurus build so CI stays in sync with the source SKILL.md files.
Build verified locally with `npx docusaurus build`. Only remaining warnings
are pre-existing broken link/anchor issues in unrelated pages.
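
The MDX-escaping pass described above can be sketched in a few lines (a simplified illustration with a hypothetical `escape_mdx` helper — the real logic lives in `website/scripts/generate-skill-docs.py` and also handles unsafe HTML-ish tags):

```python
FENCE = "`" * 3  # fenced-code delimiter

def escape_mdx(text: str) -> str:
    """Escape curly braces outside fenced code blocks so MDX does not
    treat them as JSX expressions (simplified sketch of the rule above)."""
    out, in_fence = [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence   # entering or leaving a fenced block
            out.append(line)
        elif in_fence:
            out.append(line)          # code blocks pass through untouched
        else:
            out.append(line.replace("{", "\\{").replace("}", "\\}"))
    return "\n".join(out)
```

Braces inside fences survive unescaped, which is why the generator has to track fence state rather than run a single global substitution.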
This commit is contained in:
parent eb93f88e1d
commit 0f6eabb890
139 changed files with 43523 additions and 306 deletions

@@ -0,0 +1,252 @@

---
title: "Bioinformatics — Gateway to 400+ bioinformatics skills from bioSkills and ClawBio"
sidebar_label: "Bioinformatics"
description: "Gateway to 400+ bioinformatics skills from bioSkills and ClawBio"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Bioinformatics

Gateway to 400+ bioinformatics skills from bioSkills and ClawBio. Covers genomics, transcriptomics, single-cell, variant calling, pharmacogenomics, metagenomics, structural biology, and more. Fetches domain-specific reference material on demand.

## Skill metadata

| | |
|---|---|
| Source | Optional — install with `hermes skills install official/research/bioinformatics` |
| Path | `optional-skills/research/bioinformatics` |
| Version | `1.0.0` |
| Platforms | linux, macos |
| Tags | `bioinformatics`, `genomics`, `sequencing`, `biology`, `research`, `science` |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# Bioinformatics Skills Gateway

Use when asked about bioinformatics, genomics, sequencing, variant calling, gene expression, single-cell analysis, protein structure, pharmacogenomics, metagenomics, phylogenetics, or any computational biology task.

This skill is a gateway to two open-source bioinformatics skill libraries. Instead of bundling hundreds of domain-specific skills, it indexes them and fetches what you need on demand.

## Sources

◆ **bioSkills** — 385 reference skills (code patterns, parameter guides, decision trees)
Repo: https://github.com/GPTomics/bioSkills
Format: SKILL.md per topic with code examples. Python/R/CLI.

◆ **ClawBio** — 33 runnable pipeline skills (executable scripts, reproducibility bundles)
Repo: https://github.com/ClawBio/ClawBio
Format: Python scripts with demos. Each analysis exports report.md + commands.sh + environment.yml.

## How to fetch and use a skill

1. Identify the domain and skill name from the index below.
2. Clone the relevant repo (shallow clone to save time):
```bash
# bioSkills (reference material)
git clone --depth 1 https://github.com/GPTomics/bioSkills.git /tmp/bioSkills

# ClawBio (runnable pipelines)
git clone --depth 1 https://github.com/ClawBio/ClawBio.git /tmp/ClawBio
```
3. Read the specific skill:
```bash
# bioSkills — each skill is at: <category>/<skill-name>/SKILL.md
cat /tmp/bioSkills/variant-calling/gatk-variant-calling/SKILL.md

# ClawBio — each skill is at: skills/<skill-name>/
cat /tmp/ClawBio/skills/pharmgx-reporter/README.md
```
4. Follow the fetched skill as reference material. These are NOT Hermes-format skills — treat them as expert domain guides. They contain correct parameters, proper tool flags, and validated pipelines.

## Skill Index by Domain

### Sequence Fundamentals
bioSkills:
sequence-io/ — read-sequences, write-sequences, format-conversion, batch-processing, compressed-files, fastq-quality, filter-sequences, paired-end-fastq, sequence-statistics
sequence-manipulation/ — seq-objects, reverse-complement, transcription-translation, motif-search, codon-usage, sequence-properties, sequence-slicing
ClawBio:
seq-wrangler — Sequence QC, alignment, and BAM processing (wraps FastQC, BWA, SAMtools)

### Read QC & Alignment
bioSkills:
read-qc/ — quality-reports, fastp-workflow, adapter-trimming, quality-filtering, umi-processing, contamination-screening, rnaseq-qc
read-alignment/ — bwa-alignment, star-alignment, hisat2-alignment, bowtie2-alignment
alignment-files/ — sam-bam-basics, alignment-sorting, alignment-filtering, bam-statistics, duplicate-handling, pileup-generation

### Variant Calling & Annotation
bioSkills:
variant-calling/ — gatk-variant-calling, deepvariant, variant-calling (bcftools), joint-calling, structural-variant-calling, filtering-best-practices, variant-annotation, variant-normalization, vcf-basics, vcf-manipulation, vcf-statistics, consensus-sequences, clinical-interpretation
ClawBio:
vcf-annotator — VEP + ClinVar + gnomAD annotation with ancestry-aware context
variant-annotation — Variant annotation pipeline

### Differential Expression (Bulk RNA-seq)
bioSkills:
differential-expression/ — deseq2-basics, edger-basics, batch-correction, de-results, de-visualization, timeseries-de
rna-quantification/ — alignment-free-quant (Salmon/kallisto), featurecounts-counting, tximport-workflow, count-matrix-qc
expression-matrix/ — counts-ingest, gene-id-mapping, metadata-joins, sparse-handling
ClawBio:
rnaseq-de — Full DE pipeline with QC, normalization, and visualization
diff-visualizer — Rich visualization and reporting for DE results

### Single-Cell RNA-seq
bioSkills:
single-cell/ — preprocessing, clustering, batch-integration, cell-annotation, cell-communication, doublet-detection, markers-annotation, trajectory-inference, multimodal-integration, perturb-seq, scatac-analysis, lineage-tracing, metabolite-communication, data-io
ClawBio:
scrna-orchestrator — Full Scanpy pipeline (QC, clustering, markers, annotation)
scrna-embedding — scVI-based latent embedding and batch integration

### Spatial Transcriptomics
bioSkills:
spatial-transcriptomics/ — spatial-data-io, spatial-preprocessing, spatial-domains, spatial-deconvolution, spatial-communication, spatial-neighbors, spatial-statistics, spatial-visualization, spatial-multiomics, spatial-proteomics, image-analysis

### Epigenomics
bioSkills:
chip-seq/ — peak-calling, differential-binding, motif-analysis, peak-annotation, chipseq-qc, chipseq-visualization, super-enhancers
atac-seq/ — atac-peak-calling, atac-qc, differential-accessibility, footprinting, motif-deviation, nucleosome-positioning
methylation-analysis/ — bismark-alignment, methylation-calling, dmr-detection, methylkit-analysis
hi-c-analysis/ — hic-data-io, tad-detection, loop-calling, compartment-analysis, contact-pairs, matrix-operations, hic-visualization, hic-differential
ClawBio:
methylation-clock — Epigenetic age estimation

### Pharmacogenomics & Clinical
bioSkills:
clinical-databases/ — clinvar-lookup, gnomad-frequencies, dbsnp-queries, pharmacogenomics, polygenic-risk, hla-typing, variant-prioritization, somatic-signatures, tumor-mutational-burden, myvariant-queries
ClawBio:
pharmgx-reporter — PGx report from 23andMe/AncestryDNA (12 genes, 31 SNPs, 51 drugs)
drug-photo — Photo of medication → personalized PGx dosage card (via vision)
clinpgx — ClinPGx API for gene-drug data and CPIC guidelines
gwas-lookup — Federated variant lookup across 9 genomic databases
gwas-prs — Polygenic risk scores from consumer genetic data
nutrigx_advisor — Personalized nutrition from consumer genetic data

### Population Genetics & GWAS
bioSkills:
population-genetics/ — association-testing (PLINK GWAS), plink-basics, population-structure, linkage-disequilibrium, scikit-allel-analysis, selection-statistics
causal-genomics/ — mendelian-randomization, fine-mapping, colocalization-analysis, mediation-analysis, pleiotropy-detection
phasing-imputation/ — haplotype-phasing, genotype-imputation, imputation-qc, reference-panels
ClawBio:
claw-ancestry-pca — Ancestry PCA against SGDP reference panel

### Metagenomics & Microbiome
bioSkills:
metagenomics/ — kraken-classification, metaphlan-profiling, abundance-estimation, functional-profiling, amr-detection, strain-tracking, metagenome-visualization
microbiome/ — amplicon-processing, diversity-analysis, differential-abundance, taxonomy-assignment, functional-prediction, qiime2-workflow
ClawBio:
claw-metagenomics — Shotgun metagenomics profiling (taxonomy, resistome, functional pathways)

### Genome Assembly & Annotation
bioSkills:
genome-assembly/ — hifi-assembly, long-read-assembly, short-read-assembly, metagenome-assembly, assembly-polishing, assembly-qc, scaffolding, contamination-detection
genome-annotation/ — eukaryotic-gene-prediction, prokaryotic-annotation, functional-annotation, ncrna-annotation, repeat-annotation, annotation-transfer
long-read-sequencing/ — basecalling, long-read-alignment, long-read-qc, clair3-variants, structural-variants, medaka-polishing, nanopore-methylation, isoseq-analysis

### Structural Biology & Chemoinformatics
bioSkills:
structural-biology/ — alphafold-predictions, modern-structure-prediction, structure-io, structure-navigation, structure-modification, geometric-analysis
chemoinformatics/ — molecular-io, molecular-descriptors, similarity-searching, substructure-search, virtual-screening, admet-prediction, reaction-enumeration
ClawBio:
struct-predictor — Local AlphaFold/Boltz/Chai structure prediction with comparison

### Proteomics
bioSkills:
proteomics/ — data-import, peptide-identification, protein-inference, quantification, differential-abundance, dia-analysis, ptm-analysis, proteomics-qc, spectral-libraries
ClawBio:
proteomics-de — Proteomics differential expression

### Pathway Analysis & Gene Networks
bioSkills:
pathway-analysis/ — go-enrichment, gsea, kegg-pathways, reactome-pathways, wikipathways, enrichment-visualization
gene-regulatory-networks/ — scenic-regulons, coexpression-networks, differential-networks, multiomics-grn, perturbation-simulation

### Immunoinformatics
bioSkills:
immunoinformatics/ — mhc-binding-prediction, epitope-prediction, neoantigen-prediction, immunogenicity-scoring, tcr-epitope-binding
tcr-bcr-analysis/ — mixcr-analysis, scirpy-analysis, immcantation-analysis, repertoire-visualization, vdjtools-analysis

### CRISPR & Genome Engineering
bioSkills:
crispr-screens/ — mageck-analysis, jacks-analysis, hit-calling, screen-qc, library-design, crispresso-editing, base-editing-analysis, batch-correction
genome-engineering/ — grna-design, off-target-prediction, hdr-template-design, base-editing-design, prime-editing-design

### Workflow Management
bioSkills:
workflow-management/ — snakemake-workflows, nextflow-pipelines, cwl-workflows, wdl-workflows
ClawBio:
repro-enforcer — Export any analysis as reproducibility bundle (Conda env + Singularity + checksums)
galaxy-bridge — Access 8,000+ Galaxy tools from usegalaxy.org

### Specialized Domains
bioSkills:
alternative-splicing/ — splicing-quantification, differential-splicing, isoform-switching, sashimi-plots, single-cell-splicing, splicing-qc
ecological-genomics/ — edna-metabarcoding, landscape-genomics, conservation-genetics, biodiversity-metrics, community-ecology, species-delimitation
epidemiological-genomics/ — pathogen-typing, variant-surveillance, phylodynamics, transmission-inference, amr-surveillance
liquid-biopsy/ — cfdna-preprocessing, ctdna-mutation-detection, fragment-analysis, tumor-fraction-estimation, methylation-based-detection, longitudinal-monitoring
epitranscriptomics/ — m6a-peak-calling, m6a-differential, m6anet-analysis, merip-preprocessing, modification-visualization
metabolomics/ — xcms-preprocessing, metabolite-annotation, normalization-qc, statistical-analysis, pathway-mapping, lipidomics, targeted-analysis, msdial-preprocessing
flow-cytometry/ — fcs-handling, gating-analysis, compensation-transformation, clustering-phenotyping, differential-analysis, cytometry-qc, doublet-detection, bead-normalization
systems-biology/ — flux-balance-analysis, metabolic-reconstruction, gene-essentiality, context-specific-models, model-curation
rna-structure/ — secondary-structure-prediction, ncrna-search, structure-probing

### Data Visualization & Reporting
bioSkills:
data-visualization/ — ggplot2-fundamentals, heatmaps-clustering, volcano-customization, circos-plots, genome-browser-tracks, interactive-visualization, multipanel-figures, network-visualization, upset-plots, color-palettes, specialized-omics-plots, genome-tracks
reporting/ — rmarkdown-reports, quarto-reports, jupyter-reports, automated-qc-reports, figure-export
ClawBio:
profile-report — Analysis profile reporting
data-extractor — Extract numerical data from scientific figure images (via vision)
lit-synthesizer — PubMed/bioRxiv search, summarization, citation graphs
pubmed-summariser — Gene/disease PubMed search with structured briefing

### Database Access
bioSkills:
database-access/ — entrez-search, entrez-fetch, entrez-link, blast-searches, local-blast, sra-data, geo-data, uniprot-access, batch-downloads, interaction-databases, sequence-similarity
ClawBio:
ukb-navigator — Semantic search across 12,000+ UK Biobank fields
clinical-trial-finder — Clinical trial discovery

### Experimental Design
bioSkills:
experimental-design/ — power-analysis, sample-size, batch-design, multiple-testing

### Machine Learning for Omics
bioSkills:
machine-learning/ — omics-classifiers, biomarker-discovery, survival-analysis, model-validation, prediction-explanation, atlas-mapping
ClawBio:
claw-semantic-sim — Semantic similarity index for disease literature (PubMedBERT)
omics-target-evidence-mapper — Aggregate target-level evidence across omics sources

## Environment Setup

These skills assume a bioinformatics workstation. Common dependencies:

```bash
# Python
pip install biopython pysam cyvcf2 pybedtools pyBigWig scikit-allel anndata scanpy mygene

# R/Bioconductor
Rscript -e 'BiocManager::install(c("DESeq2","edgeR","Seurat","clusterProfiler","methylKit"))'

# CLI tools (Ubuntu/Debian)
sudo apt install samtools bcftools ncbi-blast+ minimap2 bedtools

# CLI tools (macOS)
brew install samtools bcftools blast minimap2 bedtools

# Or via Conda (recommended for reproducibility)
conda install -c bioconda samtools bcftools blast minimap2 bedtools fastp kraken2
```

## Pitfalls

- The fetched skills are NOT in Hermes SKILL.md format. They use their own structure (bioSkills: code pattern cookbooks; ClawBio: README + Python scripts). Read them as expert reference material.
- bioSkills are reference guides — they show correct parameters and code patterns but aren't executable pipelines.
- ClawBio skills are executable — many have `--demo` flags and can be run directly.
- Both repos assume bioinformatics tools are installed. Check prerequisites before running pipelines.
- For ClawBio, run `pip install -r requirements.txt` in the cloned repo first.
- Genomic data files can be very large. Be mindful of disk space when downloading reference genomes, SRA datasets, or building indices.

@@ -0,0 +1,116 @@

---
title: "Domain Intel — Passive domain reconnaissance using Python stdlib"
sidebar_label: "Domain Intel"
description: "Passive domain reconnaissance using Python stdlib"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Domain Intel

Passive domain reconnaissance using Python stdlib. Subdomain discovery, SSL certificate inspection, WHOIS lookups, DNS records, domain availability checks, and bulk multi-domain analysis. No API keys required.

## Skill metadata

| | |
|---|---|
| Source | Optional — install with `hermes skills install official/research/domain-intel` |
| Path | `optional-skills/research/domain-intel` |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# Domain Intelligence — Passive OSINT

Passive domain reconnaissance using only Python stdlib.
**Zero dependencies. Zero API keys. Works on Linux, macOS, and Windows.**

## Helper script

This skill includes `scripts/domain_intel.py` — a complete CLI tool for all domain intelligence operations.

```bash
# Subdomain discovery via Certificate Transparency logs
python3 SKILL_DIR/scripts/domain_intel.py subdomains example.com

# SSL certificate inspection (expiry, cipher, SANs, issuer)
python3 SKILL_DIR/scripts/domain_intel.py ssl example.com

# WHOIS lookup (registrar, dates, name servers — 100+ TLDs)
python3 SKILL_DIR/scripts/domain_intel.py whois example.com

# DNS records (A, AAAA, MX, NS, TXT, CNAME)
python3 SKILL_DIR/scripts/domain_intel.py dns example.com

# Domain availability check (passive: DNS + WHOIS + SSL signals)
python3 SKILL_DIR/scripts/domain_intel.py available coolstartup.io

# Bulk analysis — multiple domains, multiple checks in parallel
python3 SKILL_DIR/scripts/domain_intel.py bulk example.com github.com google.com
python3 SKILL_DIR/scripts/domain_intel.py bulk example.com github.com --checks ssl,dns
```

`SKILL_DIR` is the directory containing this SKILL.md file. All output is structured JSON.

## Available commands

| Command | What it does | Data source |
|---------|-------------|-------------|
| `subdomains` | Find subdomains from certificate logs | crt.sh (HTTPS) |
| `ssl` | Inspect TLS certificate details | Direct TCP:443 to target |
| `whois` | Registration info, registrar, dates | WHOIS servers (TCP:43) |
| `dns` | A, AAAA, MX, NS, TXT, CNAME records | System DNS + Google DoH |
| `available` | Check if domain is registered | DNS + WHOIS + SSL signals |
| `bulk` | Run multiple checks on multiple domains | All of the above |
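
crt.sh returns one JSON record per certificate, and a record's `name_value` field can contain several newline-separated names, so subdomain discovery needs a flatten-and-dedupe pass. A minimal sketch of that step, using made-up sample data (this is not the `domain_intel.py` source):

```python
import json

# Hypothetical excerpt of a crt.sh JSON response (made-up data)
raw = json.loads('''[
  {"name_value": "www.example.com\\napi.example.com"},
  {"name_value": "api.example.com"},
  {"name_value": "*.example.com"}
]''')

subdomains = sorted({
    name.lstrip("*.")  # drop wildcard prefixes
    for record in raw
    for name in record["name_value"].splitlines()
})
print(subdomains)  # → ['api.example.com', 'example.com', 'www.example.com']
```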

## When to use this vs built-in tools

- **Use this skill** for infrastructure questions: subdomains, SSL certs, WHOIS, DNS records, availability
- **Use `web_search`** for general research about what a domain/company does
- **Use `web_extract`** to get the actual content of a webpage
- **Use `terminal` with `curl -I`** for a simple "is this URL reachable" check

| Task | Better tool | Why |
|------|-------------|-----|
| "What does example.com do?" | `web_extract` | Gets page content, not DNS/WHOIS data |
| "Find info about a company" | `web_search` | General research, not domain-specific |
| "Is this website safe?" | `web_search` | Reputation checks need web context |
| "Check if a URL is reachable" | `terminal` with `curl -I` | Simple HTTP check |
| "Find subdomains of X" | **This skill** | Only passive source for this |
| "When does the SSL cert expire?" | **This skill** | Built-in tools can't inspect TLS |
| "Who registered this domain?" | **This skill** | WHOIS data not in web search |
| "Is coolstartup.io available?" | **This skill** | Passive availability via DNS+WHOIS+SSL |

## Platform compatibility

Pure Python stdlib (`socket`, `ssl`, `urllib`, `json`, `concurrent.futures`).
Works identically on Linux, macOS, and Windows with no dependencies.

- **crt.sh queries** use HTTPS (port 443) — works behind most firewalls
- **WHOIS queries** use TCP port 43 — may be blocked on restrictive networks
- **DNS queries** use Google DoH (HTTPS) for MX/NS/TXT — firewall-friendly
- **SSL checks** connect to the target on port 443 — the only "active" operation

## Data sources

All queries are **passive** — no port scanning, no vulnerability testing:

- **crt.sh** — Certificate Transparency logs (subdomain discovery, HTTPS only)
- **WHOIS servers** — Direct TCP to 100+ authoritative TLD registrars
- **Google DNS-over-HTTPS** — MX, NS, TXT, CNAME resolution (firewall-friendly)
- **System DNS** — A/AAAA record resolution
- **SSL check** is the only "active" operation (TCP connection to target:443)

## Notes

- WHOIS queries use TCP port 43 — may be blocked on restrictive networks
- Some WHOIS servers redact registrant info (GDPR) — mention this to the user
- crt.sh can be slow for very popular domains (thousands of certs) — set reasonable expectations
- The availability check is heuristic-based (3 passive signals) — not authoritative like a registrar API
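
The heuristic can be pictured as three independent registered-domain signals; a minimal sketch of the combination logic (hypothetical helper, not the actual `domain_intel.py` implementation):

```python
def availability_verdict(has_dns: bool, has_whois_record: bool, has_ssl_cert: bool) -> str:
    """Combine three passive signals into a heuristic verdict.
    Any positive signal means the domain is registered; none means it is
    *probably* available — heuristic only, not a registrar-grade answer."""
    if any([has_dns, has_whois_record, has_ssl_cert]):
        return "registered"
    return "probably-available"

print(availability_verdict(True, True, False))    # registered
print(availability_verdict(False, False, False))  # probably-available
```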

---

*Contributed by [@FurkanL0](https://github.com/FurkanL0)*

@@ -0,0 +1,236 @@

---
title: "Drug Discovery — Pharmaceutical research assistant for drug discovery workflows"
sidebar_label: "Drug Discovery"
description: "Pharmaceutical research assistant for drug discovery workflows"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Drug Discovery

Pharmaceutical research assistant for drug discovery workflows. Search bioactive compounds on ChEMBL, calculate drug-likeness (Lipinski Ro5, QED, TPSA, synthetic accessibility), look up drug-drug interactions via OpenFDA, interpret ADMET profiles, and assist with lead optimization. Use for medicinal chemistry questions, molecule property analysis, clinical pharmacology, and open-science drug research.

## Skill metadata

| | |
|---|---|
| Source | Optional — install with `hermes skills install official/research/drug-discovery` |
| Path | `optional-skills/research/drug-discovery` |
| Version | `1.0.0` |
| Author | bennytimz |
| License | MIT |
| Tags | `science`, `chemistry`, `pharmacology`, `research`, `health` |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# Drug Discovery & Pharmaceutical Research

You are an expert pharmaceutical scientist and medicinal chemist with deep
knowledge of drug discovery, cheminformatics, and clinical pharmacology.
Use this skill for all pharma/chemistry research tasks.

## Core Workflows

### 1 — Bioactive Compound Search (ChEMBL)

Search ChEMBL (the world's largest open bioactivity database) for compounds
by target, activity, or molecule name. No API key required.

```bash
# Search compounds by target name (e.g. "EGFR", "COX-2", "ACE")
TARGET="$1"
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$TARGET")
curl -s "https://www.ebi.ac.uk/chembl/api/data/target/search?q=${ENCODED}&format=json" \
| python3 -c "
import json,sys
data=json.load(sys.stdin)
targets=data.get('targets',[])[:5]
for t in targets:
    print(f\"ChEMBL ID : {t.get('target_chembl_id')}\")
    print(f\"Name : {t.get('pref_name')}\")
    print(f\"Type : {t.get('target_type')}\")
    print()
"
```

```bash
# Get bioactivity data for a ChEMBL target ID
TARGET_ID="$1" # e.g. CHEMBL203
curl -s "https://www.ebi.ac.uk/chembl/api/data/activity?target_chembl_id=${TARGET_ID}&pchembl_value__gte=6&limit=10&format=json" \
| python3 -c "
import json,sys
data=json.load(sys.stdin)
acts=data.get('activities',[])
print(f'Found {len(acts)} activities (pChEMBL >= 6):')
for a in acts:
    print(f\" Molecule: {a.get('molecule_chembl_id')} | {a.get('standard_type')}: {a.get('standard_value')} {a.get('standard_units')} | pChEMBL: {a.get('pchembl_value')}\")
"
```

```bash
# Look up a specific molecule by ChEMBL ID
MOL_ID="$1" # e.g. CHEMBL25 (aspirin)
curl -s "https://www.ebi.ac.uk/chembl/api/data/molecule/${MOL_ID}?format=json" \
| python3 -c "
import json,sys
m=json.load(sys.stdin)
props=m.get('molecule_properties',{}) or {}
print(f\"Name : {m.get('pref_name','N/A')}\")
print(f\"SMILES : {m.get('molecule_structures',{}).get('canonical_smiles','N/A') if m.get('molecule_structures') else 'N/A'}\")
print(f\"MW : {props.get('full_mwt','N/A')} Da\")
print(f\"LogP : {props.get('alogp','N/A')}\")
print(f\"HBD : {props.get('hbd','N/A')}\")
print(f\"HBA : {props.get('hba','N/A')}\")
print(f\"TPSA : {props.get('psa','N/A')} Å²\")
print(f\"Ro5 violations: {props.get('num_ro5_violations','N/A')}\")
print(f\"QED : {props.get('qed_weighted','N/A')}\")
"
```

### 2 — Drug-Likeness Calculation (Lipinski Ro5 + Veber)

Assess any molecule against established oral bioavailability rules using
PubChem's free property API — no RDKit install needed.

```bash
COMPOUND="$1"
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$COMPOUND")
curl -s "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/${ENCODED}/property/MolecularWeight,XLogP,HBondDonorCount,HBondAcceptorCount,RotatableBondCount,TPSA,InChIKey/JSON" \
| python3 -c "
import json,sys
data=json.load(sys.stdin)
props=data['PropertyTable']['Properties'][0]
mw = float(props.get('MolecularWeight', 0))
logp = float(props.get('XLogP', 0))
hbd = int(props.get('HBondDonorCount', 0))
hba = int(props.get('HBondAcceptorCount', 0))
rot = int(props.get('RotatableBondCount', 0))
tpsa = float(props.get('TPSA', 0))
print('=== Lipinski Rule of Five (Ro5) ===')
print(f' MW {mw:.1f} Da {\"✓\" if mw<=500 else \"✗ VIOLATION (>500)\"}')
print(f' LogP {logp:.2f} {\"✓\" if logp<=5 else \"✗ VIOLATION (>5)\"}')
print(f' HBD {hbd} {\"✓\" if hbd<=5 else \"✗ VIOLATION (>5)\"}')
print(f' HBA {hba} {\"✓\" if hba<=10 else \"✗ VIOLATION (>10)\"}')
viol = sum([mw>500, logp>5, hbd>5, hba>10])
print(f' Violations: {viol}/4 {\"→ Likely orally bioavailable\" if viol<=1 else \"→ Poor oral bioavailability predicted\"}')
print()
print('=== Veber Oral Bioavailability Rules ===')
print(f' TPSA {tpsa:.1f} Å² {\"✓\" if tpsa<=140 else \"✗ VIOLATION (>140)\"}')
print(f' Rot. bonds {rot} {\"✓\" if rot<=10 else \"✗ VIOLATION (>10)\"}')
print(f' Both rules met: {\"Yes → good oral absorption predicted\" if tpsa<=140 and rot<=10 else \"No → reduced oral absorption\"}')
"
```
### 3 — Drug Interaction & Safety Lookup (OpenFDA)

```bash
DRUG="$1"
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$DRUG")
curl -s "https://api.fda.gov/drug/label.json?search=drug_interactions:\"${ENCODED}\"&limit=3" \
| python3 -c "
import json,sys
data=json.load(sys.stdin)
results=data.get('results',[])
if not results:
    print('No interaction data found in FDA labels.')
    sys.exit()
for r in results[:2]:
    brand=r.get('openfda',{}).get('brand_name',['Unknown'])[0]
    generic=r.get('openfda',{}).get('generic_name',['Unknown'])[0]
    interactions=r.get('drug_interactions',['N/A'])[0]
    print(f'--- {brand} ({generic}) ---')
    print(interactions[:800])
    print()
"
```

```bash
DRUG="$1"
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$DRUG")
curl -s "https://api.fda.gov/drug/event.json?search=patient.drug.medicinalproduct:\"${ENCODED}\"&count=patient.reaction.reactionmeddrapt.exact&limit=10" \
| python3 -c "
import json,sys
data=json.load(sys.stdin)
results=data.get('results',[])
if not results:
    print('No adverse event data found.')
    sys.exit()
print('Top adverse events reported:')
for r in results[:10]:
    print(f\" {r['count']:>5}x {r['term']}\")
"
```

### 4 — PubChem Compound Search

```bash
COMPOUND="$1"
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$COMPOUND")
CID=$(curl -s "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/${ENCODED}/cids/TXT" | head -1 | tr -d '[:space:]')
echo "PubChem CID: $CID"
curl -s "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/${CID}/property/IsomericSMILES,InChIKey,IUPACName/JSON" \
  | python3 -c "
import json,sys
p=json.load(sys.stdin)['PropertyTable']['Properties'][0]
print(f\"IUPAC Name : {p.get('IUPACName','N/A')}\")
print(f\"SMILES     : {p.get('IsomericSMILES','N/A')}\")
print(f\"InChIKey   : {p.get('InChIKey','N/A')}\")
"
```

### 5 — Target & Disease Literature (OpenTargets)

```bash
GENE="$1"
curl -s -X POST "https://api.platform.opentargets.org/api/v4/graphql" \
  -H "Content-Type: application/json" \
  -d "{\"query\":\"{ search(queryString: \\\"${GENE}\\\", entityNames: [\\\"target\\\"], page: {index: 0, size: 1}) { hits { id score object { ... on Target { id approvedSymbol approvedName associatedDiseases(page: {index: 0, size: 5}) { count rows { score disease { id name } } } } } } } }\"}" \
  | python3 -c "
import json,sys
data=json.load(sys.stdin)
hits=data.get('data',{}).get('search',{}).get('hits',[])
if not hits:
    print('Target not found.')
    sys.exit()
obj=hits[0]['object']
print(f\"Target: {obj.get('approvedSymbol')} — {obj.get('approvedName')}\")
assoc=obj.get('associatedDiseases',{})
print(f\"Associated with {assoc.get('count',0)} diseases. Top associations:\")
for row in assoc.get('rows',[]):
    print(f\" Score {row['score']:.3f} | {row['disease']['name']}\")
"
```

## Reasoning Guidelines

When analysing drug-likeness or molecular properties, always:

1. **State raw values first** — MW, LogP, HBD, HBA, TPSA, RotBonds
2. **Apply rule sets** — Ro5 (Lipinski), Veber, Ghose filter where relevant
3. **Flag liabilities** — metabolic hotspots, hERG risk, high TPSA for CNS penetration
4. **Suggest optimizations** — bioisosteric replacements, prodrug strategies, ring truncation
5. **Cite the source API** — ChEMBL, PubChem, OpenFDA, or OpenTargets
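
Steps 1 and 2 can be sketched as a small helper that takes the raw descriptors and reports rule-set compliance. The thresholds follow the standard Lipinski Ro5 and Veber definitions; the function name and example values are illustrative, not part of any API above:

```python
def drug_likeness(mw, logp, hbd, hba, tpsa, rot_bonds):
    """Report Lipinski (Ro5) and Veber violations from raw descriptors."""
    ro5 = {
        "MW <= 500": mw <= 500,
        "LogP <= 5": logp <= 5,
        "HBD <= 5": hbd <= 5,
        "HBA <= 10": hba <= 10,
    }
    veber = {
        "TPSA <= 140": tpsa <= 140,
        "RotBonds <= 10": rot_bonds <= 10,
    }
    return {
        "ro5_violations": [rule for rule, ok in ro5.items() if not ok],
        "veber_violations": [rule for rule, ok in veber.items() if not ok],
    }

# Ibuprofen-like descriptors: passes both rule sets (empty violation lists)
print(drug_likeness(mw=206.3, logp=3.5, hbd=1, hba=2, tpsa=37.3, rot_bonds=4))
```

Stating the raw values before the verdict keeps the reasoning auditable: a reader can re-check each threshold against the printed descriptors.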

For ADMET questions, reason through Absorption, Distribution, Metabolism, Excretion, Toxicity systematically. See references/ADMET_REFERENCE.md for detailed guidance.

## Important Notes

- All APIs are free, public, and require no authentication
- ChEMBL rate limits: add `sleep 1` between batch requests
- FDA data reflects reported adverse events, not necessarily causation
- Always recommend consulting a licensed pharmacist or physician for clinical decisions

## Quick Reference

| Task | API | Endpoint |
|------|-----|----------|
| Find target | ChEMBL | `/api/data/target/search?q=` |
| Get bioactivity | ChEMBL | `/api/data/activity?target_chembl_id=` |
| Molecule properties | PubChem | `/rest/pug/compound/name/{name}/property/` |
| Drug interactions | OpenFDA | `/drug/label.json?search=drug_interactions:` |
| Adverse events | OpenFDA | `/drug/event.json?search=...&count=reaction` |
| Gene-disease | OpenTargets | GraphQL POST `/api/v4/graphql` |
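
As a minimal sketch of how the ChEMBL rows compose into request URLs, the two endpoints can be wrapped in builders. The `www.ebi.ac.uk` host is the standard ChEMBL API base; the `format=json` parameter and the function names are assumptions for illustration:

```python
from urllib.parse import quote

CHEMBL_BASE = "https://www.ebi.ac.uk/chembl/api/data"

def target_search_url(query):
    """URL for the 'Find target' endpoint."""
    return f"{CHEMBL_BASE}/target/search?q={quote(query)}&format=json"

def activity_url(target_chembl_id, limit=20):
    """URL for the 'Get bioactivity' endpoint."""
    return (f"{CHEMBL_BASE}/activity?target_chembl_id={quote(target_chembl_id)}"
            f"&limit={limit}&format=json")

print(target_search_url("EGFR"))
print(activity_url("CHEMBL203"))
```

Building the URLs through `quote()` keeps gene symbols or IDs with unusual characters from breaking the query string.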

@@ -0,0 +1,254 @@

---
title: "Duckduckgo Search — Free web search via DuckDuckGo — text, news, images, videos"
sidebar_label: "Duckduckgo Search"
description: "Free web search via DuckDuckGo — text, news, images, videos"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Duckduckgo Search

Free web search via DuckDuckGo — text, news, images, videos. No API key needed. Prefer the `ddgs` CLI when installed; use the Python DDGS library only after verifying that `ddgs` is available in the current runtime.

## Skill metadata

| | |
|---|---|
| Source | Optional — install with `hermes skills install official/research/duckduckgo-search` |
| Path | `optional-skills/research/duckduckgo-search` |
| Version | `1.3.0` |
| Author | gamedevCloudy |
| License | MIT |
| Tags | `search`, `duckduckgo`, `web-search`, `free`, `fallback` |
| Related skills | [`arxiv`](/docs/user-guide/skills/bundled/research/research-arxiv) |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# DuckDuckGo Search

Free web search using DuckDuckGo. **No API key required.**

Preferred when `web_search` is unavailable or unsuitable (for example when `FIRECRAWL_API_KEY` is not set). Can also be used as a standalone search path when DuckDuckGo results are specifically desired.

## Detection Flow

Check what is actually available before choosing an approach:

```bash
# Check CLI availability
command -v ddgs >/dev/null && echo "DDGS_CLI=installed" || echo "DDGS_CLI=missing"
```

Decision tree:
1. If `ddgs` CLI is installed, prefer `terminal` + `ddgs`
2. If `ddgs` CLI is missing, do not assume `execute_code` can import `ddgs`
3. If the user wants DuckDuckGo specifically, install `ddgs` first in the relevant environment
4. Otherwise fall back to built-in web/browser tools
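
The decision tree collapses into a small helper; the function name and the two marker strings are illustrative:

```shell
# Pick a search path based on whether a given CLI exists on PATH
choose_search_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "use-ddgs-cli"
  else
    echo "fallback-web-tools"
  fi
}

choose_search_path ddgs   # prints use-ddgs-cli only when ddgs is installed
```

Branching on `command -v` up front avoids discovering the missing CLI halfway through a multi-step research task.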

Important runtime note:
- Terminal and `execute_code` are separate runtimes
- A successful shell install does not guarantee `execute_code` can import `ddgs`
- Never assume third-party Python packages are preinstalled inside `execute_code`

## Installation

Install `ddgs` only when DuckDuckGo search is specifically needed and the runtime does not already provide it.

```bash
# Python package + CLI entrypoint
pip install ddgs

# Verify CLI
ddgs --help
```

If a workflow depends on Python imports, verify that same runtime can import `ddgs` before using `from ddgs import DDGS`.

## Method 1: CLI Search (Preferred)

Use the `ddgs` command via `terminal` when it exists. This is the preferred path because it avoids assuming the `execute_code` sandbox has the `ddgs` Python package installed.

```bash
# Text search
ddgs text -q "python async programming" -m 5

# News search
ddgs news -q "artificial intelligence" -m 5

# Image search
ddgs images -q "landscape photography" -m 10

# Video search
ddgs videos -q "python tutorial" -m 5

# With region filter
ddgs text -q "best restaurants" -m 5 -r us-en

# Recent results only (d=day, w=week, m=month, y=year)
ddgs text -q "latest AI news" -m 5 -t w

# JSON output for parsing
ddgs text -q "fastapi tutorial" -m 5 -o json
```

### CLI Flags

| Flag | Description | Example |
|------|-------------|---------|
| `-q` | Query — **required** | `-q "search terms"` |
| `-m` | Max results | `-m 5` |
| `-r` | Region | `-r us-en` |
| `-t` | Time limit | `-t w` (week) |
| `-s` | Safe search | `-s off` |
| `-o` | Output format | `-o json` |
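
With `-o json`, the output can be piped straight into a short parser. The sketch below works on a captured sample; it assumes the JSON is an array of result objects with the `title`/`href`/`body` keys described for `text()` results — verify the real output shape of your `ddgs` version before relying on it:

```python
import json

# Sample of the assumed `ddgs text -o json` shape (not live output)
raw = '''[
  {"title": "FastAPI", "href": "https://fastapi.tiangolo.com/", "body": "FastAPI framework..."},
  {"title": "FastAPI Tutorial", "href": "https://fastapi.tiangolo.com/tutorial/", "body": "Step by step..."}
]'''

results = json.loads(raw)
lines = [f"{r['title']} -> {r['href']}" for r in results]
print("\n".join(lines))
```

In a real pipeline, `raw` would come from `sys.stdin.read()` after `ddgs text -q "..." -o json | python3 parse.py`.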

## Method 2: Python API (Only After Verification)

Use the `DDGS` class in `execute_code` or another Python runtime only after verifying that `ddgs` is installed there. Do not assume `execute_code` includes third-party packages by default.

Safe wording:
- "Use `execute_code` with `ddgs` after installing or verifying the package if needed"

Avoid saying:
- "`execute_code` includes `ddgs`"
- "DuckDuckGo search works by default in `execute_code`"

**Important:** `max_results` must always be passed as a **keyword argument** — positional usage raises an error on all methods.

### Text Search

Best for: general research, companies, documentation.

```python
from ddgs import DDGS

with DDGS() as ddgs:
    for r in ddgs.text("python async programming", max_results=5):
        print(r["title"])
        print(r["href"])
        print(r.get("body", "")[:200])
        print()
```

Returns: `title`, `href`, `body`

### News Search

Best for: current events, breaking news, latest updates.

```python
from ddgs import DDGS

with DDGS() as ddgs:
    for r in ddgs.news("AI regulation 2026", max_results=5):
        print(r["date"], "-", r["title"])
        print(r.get("source", ""), "|", r["url"])
        print(r.get("body", "")[:200])
        print()
```

Returns: `date`, `title`, `body`, `url`, `image`, `source`

### Image Search

Best for: visual references, product images, diagrams.

```python
from ddgs import DDGS

with DDGS() as ddgs:
    for r in ddgs.images("semiconductor chip", max_results=5):
        print(r["title"])
        print(r["image"])
        print(r.get("thumbnail", ""))
        print(r.get("source", ""))
        print()
```

Returns: `title`, `image`, `thumbnail`, `url`, `height`, `width`, `source`

### Video Search

Best for: tutorials, demos, explainers.

```python
from ddgs import DDGS

with DDGS() as ddgs:
    for r in ddgs.videos("FastAPI tutorial", max_results=5):
        print(r["title"])
        print(r.get("content", ""))
        print(r.get("duration", ""))
        print(r.get("provider", ""))
        print(r.get("published", ""))
        print()
```

Returns: `title`, `content`, `description`, `duration`, `provider`, `published`, `statistics`, `uploader`

### Quick Reference

| Method | Use When | Key Fields |
|--------|----------|------------|
| `text()` | General research, companies | title, href, body |
| `news()` | Current events, updates | date, title, source, body, url |
| `images()` | Visuals, diagrams | title, image, thumbnail, url |
| `videos()` | Tutorials, demos | title, content, duration, provider |

## Workflow: Search then Extract

DuckDuckGo returns titles, URLs, and snippets — not full page content. To get full page content, search first and then extract the most relevant URL with `web_extract`, browser tools, or curl.

CLI example:

```bash
ddgs text -q "fastapi deployment guide" -m 3 -o json
```

Python example, only after verifying `ddgs` is installed in that runtime:

```python
from ddgs import DDGS

with DDGS() as ddgs:
    results = list(ddgs.text("fastapi deployment guide", max_results=3))
    for r in results:
        print(r["title"], "->", r["href"])
```

Then extract the best URL with `web_extract` or another content-retrieval tool.
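
One way to choose which hit to extract is a small ranking helper over the returned result dicts; the helper name and the preferred-domain list are illustrative:

```python
def pick_url(results, preferred=("tiangolo.com", "readthedocs.io")):
    """Return the first result URL on a preferred domain, else the first result."""
    for r in results:
        if any(domain in r["href"] for domain in preferred):
            return r["href"]
    return results[0]["href"] if results else None

sample = [
    {"title": "Random blog", "href": "https://example.com/fastapi"},
    {"title": "Official docs", "href": "https://fastapi.tiangolo.com/deployment/"},
]
print(pick_url(sample))  # -> https://fastapi.tiangolo.com/deployment/
```

Preferring known-good documentation domains usually beats blindly extracting the top-ranked hit.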

## Limitations

- **Rate limiting**: DuckDuckGo may throttle after many rapid requests. Add a short delay between searches if needed.
- **No content extraction**: `ddgs` returns snippets, not full page content. Use `web_extract`, browser tools, or curl for the full article/page.
- **Results quality**: Generally good but less configurable than Firecrawl's search.
- **Availability**: DuckDuckGo may block requests from some cloud IPs. If searches return empty, try different keywords or wait a few seconds.
- **Field variability**: Return fields may vary between results or `ddgs` versions. Use `.get()` for optional fields to avoid `KeyError`.
- **Separate runtimes**: A successful `ddgs` install in terminal does not automatically mean `execute_code` can import it.
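
The rate-limiting and empty-result caveats can be handled with a small retry wrapper around whichever search callable is in use. The wrapper name and delay are illustrative; pass it any function that maps a query to a result list (e.g. a closure over `ddgs.text`):

```python
import time

def search_with_retry(search_fn, query, attempts=3, delay=2.0):
    """Call search_fn(query); pause and retry when it returns no results."""
    for attempt in range(attempts):
        results = search_fn(query)
        if results:
            return results
        if attempt < attempts - 1:
            time.sleep(delay)  # back off before retrying a possibly rate-limited search
    return []

# Demo with a stub that fails once, then succeeds
calls = {"n": 0}
def flaky(query):
    calls["n"] += 1
    return [] if calls["n"] == 1 else [{"title": query}]

print(search_with_retry(flaky, "python async", delay=0.01))
```

Injecting the search function keeps the retry logic runtime-agnostic: the same wrapper works for the CLI path (via a subprocess closure) or the Python API.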

## Troubleshooting

| Problem | Likely Cause | What To Do |
|---------|--------------|------------|
| `ddgs: command not found` | CLI not installed in the shell environment | Install `ddgs`, or use built-in web/browser tools instead |
| `ModuleNotFoundError: No module named 'ddgs'` | Python runtime does not have the package installed | Do not use Python DDGS there until that runtime is prepared |
| Search returns nothing | Temporary rate limiting or poor query | Wait a few seconds, retry, or adjust the query |
| CLI works but `execute_code` import fails | Terminal and `execute_code` are different runtimes | Keep using CLI, or separately prepare the Python runtime |

## Pitfalls

- **`max_results` is keyword-only**: `ddgs.text("query", 5)` raises an error. Use `ddgs.text("query", max_results=5)`.
- **Do not assume the CLI exists**: Check `command -v ddgs` before using it.
- **Do not assume `execute_code` can import `ddgs`**: `from ddgs import DDGS` may fail with `ModuleNotFoundError` unless that runtime was prepared separately.
- **Package name**: The package is `ddgs` (previously `duckduckgo-search`). Install with `pip install ddgs`.
- **Don't confuse `-q` and `-m`** (CLI): `-q` is for the query, `-m` is for max results count.
- **Empty results**: If `ddgs` returns nothing, it may be rate-limited. Wait a few seconds and retry.

## Validated With

Validated examples against `ddgs==9.11.2` semantics. Skill guidance now treats CLI availability and Python import availability as separate concerns so the documented workflow matches actual runtime behavior.

@@ -0,0 +1,231 @@

---
title: "Gitnexus Explorer"
sidebar_label: "Gitnexus Explorer"
description: "Index a codebase with GitNexus and serve an interactive knowledge graph via web UI + Cloudflare tunnel"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Gitnexus Explorer

Index a codebase with GitNexus and serve an interactive knowledge graph via web UI + Cloudflare tunnel.

## Skill metadata

| | |
|---|---|
| Source | Optional — install with `hermes skills install official/research/gitnexus-explorer` |
| Path | `optional-skills/research/gitnexus-explorer` |
| Version | `1.0.0` |
| Author | Hermes Agent + Teknium |
| License | MIT |
| Tags | `gitnexus`, `code-intelligence`, `knowledge-graph`, `visualization` |
| Related skills | [`native-mcp`](/docs/user-guide/skills/bundled/mcp/mcp-native-mcp), [`codebase-inspection`](/docs/user-guide/skills/bundled/github/github-codebase-inspection) |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# GitNexus Explorer

Index any codebase into a knowledge graph and serve an interactive web UI for exploring symbols, call chains, clusters, and execution flows. Tunneled via Cloudflare for remote access.

## When to Use

- User wants to visually explore a codebase's architecture
- User asks for a knowledge graph / dependency graph of a repo
- User wants to share an interactive codebase explorer with someone

## Prerequisites

- **Node.js** (v18+) — required for GitNexus and the proxy
- **git** — repo must have a `.git` directory
- **cloudflared** — for tunneling (auto-installed to ~/.local/bin if missing)

## Size Warning

The web UI renders all nodes in the browser. Repos under ~5,000 files work well. Large repos (30k+ nodes) will be sluggish or crash the browser tab. The CLI/MCP tools work at any scale — only the web visualization has this limit.
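
A quick pre-check before opening the web UI is to count files under the repo; the helper name is illustrative, and the 5,000 cutoff mirrors the warning above:

```shell
# Count regular files under a directory, skipping .git metadata
file_count() {
  find "$1" -path '*/.git' -prune -o -type f -print | wc -l | tr -d ' '
}

n=$(file_count .)
if [ "$n" -gt 5000 ]; then
  echo "large repo ($n files): web UI may struggle, CLI/MCP tools still fine"
else
  echo "ok ($n files)"
fi
```

File count is only a proxy for node count, but it is cheap enough to run before committing to a full index-and-serve cycle.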

## Steps

### 1. Clone and Build GitNexus (one-time setup)

```bash
GITNEXUS_DIR="${GITNEXUS_DIR:-$HOME/.local/share/gitnexus}"

if [ ! -d "$GITNEXUS_DIR/gitnexus-web/dist" ]; then
  git clone https://github.com/abhigyanpatwari/GitNexus.git "$GITNEXUS_DIR"
  cd "$GITNEXUS_DIR/gitnexus-shared" && npm install && npm run build
  cd "$GITNEXUS_DIR/gitnexus-web" && npm install
fi
```

### 2. Patch the Web UI for Remote Access

The web UI defaults to `localhost:4747` for API calls. Patch it to use same-origin so it works through a tunnel/proxy:

**File: `$GITNEXUS_DIR/gitnexus-web/src/config/ui-constants.ts`**
Change:
```typescript
export const DEFAULT_BACKEND_URL = 'http://localhost:4747';
```
To:
```typescript
export const DEFAULT_BACKEND_URL = typeof window !== 'undefined' && window.location.hostname !== 'localhost' ? window.location.origin : 'http://localhost:4747';
```

**File: `$GITNEXUS_DIR/gitnexus-web/vite.config.ts`**
Add `allowedHosts: true` inside the `server: { }` block (only needed if running dev mode instead of production build):
```typescript
server: {
  allowedHosts: true,
  // ... existing config
},
```

Then build the production bundle:
```bash
cd "$GITNEXUS_DIR/gitnexus-web" && npx vite build
```

### 3. Index the Target Repo

```bash
cd /path/to/target-repo
npx gitnexus analyze --skip-agents-md
rm -rf .claude/  # remove Claude Code-specific artifacts
```

Add `--embeddings` for semantic search (slower — minutes instead of seconds).

The index lives in `.gitnexus/` inside the repo (auto-gitignored).

### 4. Create the Proxy Script

Write this to a file (e.g., `$GITNEXUS_DIR/proxy.mjs`). It serves the production web UI and proxies `/api/*` to the GitNexus backend — same origin, no CORS issues, no sudo, no nginx.

```javascript
import http from 'node:http';
import fs from 'node:fs';
import path from 'node:path';

const API_PORT = parseInt(process.env.API_PORT || '4747');
const DIST_DIR = process.argv[2] || './dist';
const PORT = parseInt(process.argv[3] || '8888');

const MIME = {
  '.html': 'text/html', '.js': 'application/javascript', '.css': 'text/css',
  '.json': 'application/json', '.png': 'image/png', '.svg': 'image/svg+xml',
  '.ico': 'image/x-icon', '.woff2': 'font/woff2', '.woff': 'font/woff',
  '.wasm': 'application/wasm',
};

function proxyToApi(req, res) {
  const opts = {
    hostname: '127.0.0.1', port: API_PORT,
    path: req.url, method: req.method, headers: req.headers,
  };
  const proxy = http.request(opts, (upstream) => {
    res.writeHead(upstream.statusCode, upstream.headers);
    upstream.pipe(res, { end: true });
  });
  proxy.on('error', () => { res.writeHead(502); res.end('Backend unavailable'); });
  req.pipe(proxy, { end: true });
}

function serveStatic(req, res) {
  let filePath = path.join(DIST_DIR, req.url === '/' ? 'index.html' : req.url.split('?')[0]);
  if (!fs.existsSync(filePath)) filePath = path.join(DIST_DIR, 'index.html');
  const ext = path.extname(filePath);
  const mime = MIME[ext] || 'application/octet-stream';
  try {
    const data = fs.readFileSync(filePath);
    res.writeHead(200, { 'Content-Type': mime, 'Cache-Control': 'public, max-age=3600' });
    res.end(data);
  } catch { res.writeHead(404); res.end('Not found'); }
}

http.createServer((req, res) => {
  if (req.url.startsWith('/api')) proxyToApi(req, res);
  else serveStatic(req, res);
}).listen(PORT, () => console.log(`GitNexus proxy on http://localhost:${PORT}`));
```

### 5. Start the Services

```bash
# Terminal 1: GitNexus backend API
npx gitnexus serve &

# Terminal 2: Proxy (web UI + API on one port)
node "$GITNEXUS_DIR/proxy.mjs" "$GITNEXUS_DIR/gitnexus-web/dist" 8888 &
```

Verify: `curl -s http://localhost:8888/api/repos` should return the indexed repo(s).

### 6. Tunnel with Cloudflare (optional — for remote access)

```bash
# Install cloudflared if needed (no sudo)
if ! command -v cloudflared &>/dev/null; then
  mkdir -p ~/.local/bin
  curl -sL https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 \
    -o ~/.local/bin/cloudflared
  chmod +x ~/.local/bin/cloudflared
  export PATH="$HOME/.local/bin:$PATH"
fi

# Start tunnel (--config /dev/null avoids conflicts with existing named tunnels)
cloudflared tunnel --config /dev/null --url http://localhost:8888 --no-autoupdate --protocol http2
```

The tunnel URL (e.g., `https://random-words.trycloudflare.com`) is printed to stderr. Share it — anyone with the link can explore the graph.
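
Since the URL goes to stderr, a pipeline like `cloudflared ... 2>&1 | grep -oE 'https://[a-z0-9-]+\.trycloudflare\.com'` can capture it for scripting. A self-contained sketch of just the extraction step, where the log line is a sample of the expected format rather than live output:

```shell
# Extract the quick-tunnel hostname from a captured cloudflared log line
LOG_LINE="2026-01-01T00:00:00Z INF |  https://random-words.trycloudflare.com  |"
TUNNEL_URL=$(echo "$LOG_LINE" | grep -oE 'https://[a-z0-9-]+\.trycloudflare\.com')
echo "$TUNNEL_URL"
```

Capturing the URL into a variable makes it easy to paste into a chat message or a follow-up `curl` health check.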

### 7. Cleanup

```bash
# Stop services
pkill -f "gitnexus serve"
pkill -f "proxy.mjs"
pkill -f cloudflared

# Remove index from the target repo
cd /path/to/target-repo
npx gitnexus clean
rm -rf .claude/
```
## Pitfalls

- **`--config /dev/null` is required for cloudflared** if the user has an existing named tunnel config at `~/.cloudflared/config.yml`. Without it, the catch-all ingress rule in the config returns 404 for all quick tunnel requests.

- **Production build is mandatory for tunneling.** The Vite dev server blocks non-localhost hosts by default (`allowedHosts`). The production build + Node proxy avoids this entirely.

- **The web UI does NOT create `.claude/` or `CLAUDE.md`.** Those are created by `npx gitnexus analyze`. Use `--skip-agents-md` to suppress the markdown files, then `rm -rf .claude/` for the rest. These are Claude Code integrations that hermes-agent users don't need.

- **Browser memory limit.** The web UI loads the entire graph into browser memory. Repos with 5k+ files may be sluggish. 30k+ files will likely crash the tab.

- **Embeddings are optional.** `--embeddings` enables semantic search but takes minutes on large repos. Skip it for quick exploration; add it if you want natural language queries via the AI chat panel.

- **Multiple repos.** `gitnexus serve` serves ALL indexed repos. Index several repos, start serve once, and the web UI lets you switch between them.

@@ -0,0 +1,408 @@

---
title: "Parallel Cli"
sidebar_label: "Parallel Cli"
description: "Optional vendor skill for Parallel CLI — agent-native web search, extraction, deep research, enrichment, FindAll, and monitoring"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Parallel Cli

Optional vendor skill for Parallel CLI — agent-native web search, extraction, deep research, enrichment, FindAll, and monitoring. Prefer JSON output and non-interactive flows.

## Skill metadata

| | |
|---|---|
| Source | Optional — install with `hermes skills install official/research/parallel-cli` |
| Path | `optional-skills/research/parallel-cli` |
| Version | `1.1.0` |
| Author | Hermes Agent |
| License | MIT |
| Tags | `Research`, `Web`, `Search`, `Deep-Research`, `Enrichment`, `CLI` |
| Related skills | [`duckduckgo-search`](/docs/user-guide/skills/optional/research/research-duckduckgo-search), [`mcporter`](/docs/user-guide/skills/optional/mcp/mcp-mcporter) |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# Parallel CLI

Use `parallel-cli` when the user explicitly wants Parallel, or when a terminal-native workflow would benefit from Parallel's vendor-specific stack for web search, extraction, deep research, enrichment, entity discovery, or monitoring.

This is an optional third-party workflow, not a Hermes core capability.

Important expectations:
- Parallel is a paid service with a free tier, not a fully free local tool.
- It overlaps with Hermes native `web_search` / `web_extract`, so do not prefer it by default for ordinary lookups.
- Prefer this skill when the user mentions Parallel specifically or needs capabilities like Parallel's enrichment, FindAll, or monitor workflows.

`parallel-cli` is designed for agents:
- JSON output via `--json`
- Non-interactive command execution
- Async long-running jobs with `--no-wait`, `status`, and `poll`
- Context chaining with `--previous-interaction-id`
- Search, extract, research, enrichment, entity discovery, and monitoring in one CLI
## When to use it

Prefer this skill when:
- The user explicitly mentions Parallel or `parallel-cli`
- The task needs richer workflows than a simple one-shot search/extract pass
- You need async deep research jobs that can be launched and polled later
- You need structured enrichment, FindAll entity discovery, or monitoring

Prefer Hermes native `web_search` / `web_extract` for quick one-off lookups when Parallel is not specifically requested.
## Installation

Try the least invasive install path available for the environment.

### Homebrew

```bash
brew install parallel-web/tap/parallel-cli
```

### npm

```bash
npm install -g parallel-web-cli
```

### Python package

```bash
pip install "parallel-web-tools[cli]"
```

### Standalone installer

```bash
curl -fsSL https://parallel.ai/install.sh | bash
```

If you want an isolated Python install, `pipx` can also work:

```bash
pipx install "parallel-web-tools[cli]"
pipx ensurepath
```
## Authentication

Interactive login:

```bash
parallel-cli login
```

Headless / SSH / CI:

```bash
parallel-cli login --device
```

API key environment variable:

```bash
export PARALLEL_API_KEY="***"
```

Verify current auth status:

```bash
parallel-cli auth
```

If auth requires browser interaction, run with `pty=true`.
## Core rule set

1. Always prefer `--json` when you need machine-readable output.
2. Prefer explicit arguments and non-interactive flows.
3. For long-running jobs, use `--no-wait` and then `status` / `poll`.
4. Cite only URLs returned by the CLI output.
5. Save large JSON outputs to a temp file when follow-up questions are likely.
6. Use background processes only for genuinely long-running workflows; otherwise run in foreground.
7. Prefer Hermes native tools unless the user wants Parallel specifically or needs Parallel-only workflows.
## Quick reference

```text
parallel-cli
├── auth
├── login
├── logout
├── search
├── extract / fetch
├── research run|status|poll|processors
├── enrich run|status|poll|plan|suggest|deploy
├── findall run|ingest|status|poll|result|enrich|extend|schema|cancel
└── monitor create|list|get|update|delete|events|event-group|simulate
```
## Common flags and patterns

Commonly useful flags:
- `--json` for structured output
- `--no-wait` for async jobs
- `--previous-interaction-id <id>` for follow-up tasks that reuse earlier context
- `--max-results <n>` for search result count
- `--mode one-shot|agentic` for search behavior
- `--include-domains domain1.com,domain2.com`
- `--exclude-domains domain1.com,domain2.com`
- `--after-date YYYY-MM-DD`

Read from stdin when convenient:

```bash
echo "What is the latest funding for Anthropic?" | parallel-cli search - --json
echo "Research question" | parallel-cli research run - --json
```
## Search

Use for current web lookups with structured results.

```bash
parallel-cli search "What is Anthropic's latest AI model?" --json
parallel-cli search "SEC filings for Apple" --include-domains sec.gov --json
parallel-cli search "bitcoin price" --after-date 2026-01-01 --max-results 10 --json
parallel-cli search "latest browser benchmarks" --mode one-shot --json
parallel-cli search "AI coding agent enterprise reviews" --mode agentic --json
```

Useful constraints:
- `--include-domains` to narrow trusted sources
- `--exclude-domains` to strip noisy domains
- `--after-date` for recency filtering
- `--max-results` when you need broader coverage

If you expect follow-up questions, save output:

```bash
parallel-cli search "latest React 19 changes" --json -o /tmp/react-19-search.json
```

When summarizing results:
- lead with the answer
- include dates, names, and concrete facts
- cite only returned sources
- avoid inventing URLs or source titles
## Extraction

Use to pull clean content or markdown from a URL.

```bash
parallel-cli extract https://example.com --json
parallel-cli extract https://company.com --objective "Find pricing info" --json
parallel-cli extract https://example.com --full-content --json
parallel-cli fetch https://example.com --json
```

Use `--objective` when the page is broad and you only need one slice of information.
## Deep research

Use for deeper multi-step research tasks that may take time.

Common processor tiers:
- `lite` / `base` for faster, cheaper passes
- `core` / `pro` for more thorough synthesis
- `ultra` for the heaviest research jobs

### Synchronous

```bash
parallel-cli research run \
  "Compare the leading AI coding agents by pricing, model support, and enterprise controls" \
  --processor core \
  --json
```

### Async launch + poll

```bash
parallel-cli research run \
  "Compare the leading AI coding agents by pricing, model support, and enterprise controls" \
  --processor ultra \
  --no-wait \
  --json

parallel-cli research status trun_xxx --json
parallel-cli research poll trun_xxx --json
parallel-cli research processors --json
```

### Context chaining / follow-up

```bash
parallel-cli research run "What are the top AI coding agents?" --json
parallel-cli research run \
  "What enterprise controls does the top-ranked one offer?" \
  --previous-interaction-id trun_xxx \
  --json
```

Recommended Hermes workflow:
1. launch with `--no-wait --json`
2. capture the returned run/task ID
3. if the user wants to continue other work, keep moving
4. later call `status` or `poll`
5. summarize the final report with citations from the returned sources
|
||||
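The launch-and-poll workflow can be sketched in Python. The commands are taken from the examples above, but the JSON field names (`run_id`, `status`) and terminal status values are assumptions — inspect the real `--json` output before relying on them:

```python
import json
import subprocess
import time

def parse_run_id(payload: str) -> str:
    """Pull the run/task ID out of the launch response.
    The "run_id" key is an assumption -- check the actual --json output."""
    return json.loads(payload)["run_id"]

def launch_and_poll(topic: str, processor: str = "ultra", interval: int = 30) -> dict:
    # 1. Launch without waiting and capture the ID.
    launch = subprocess.run(
        ["parallel-cli", "research", "run", topic,
         "--processor", processor, "--no-wait", "--json"],
        capture_output=True, text=True, check=True,
    )
    run_id = parse_run_id(launch.stdout)
    # 2-4. Periodically poll until the run reaches a terminal state.
    while True:
        status = subprocess.run(
            ["parallel-cli", "research", "status", run_id, "--json"],
            capture_output=True, text=True, check=True,
        )
        result = json.loads(status.stdout)
        if result.get("status") in ("completed", "failed"):  # status values assumed
            return result
        time.sleep(interval)
```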
## Enrichment

Use when the user has CSV/JSON/tabular inputs and wants additional columns inferred from web research.

### Suggest columns

```bash
parallel-cli enrich suggest "Find the CEO and annual revenue" --json
```

### Plan a config

```bash
parallel-cli enrich plan -o config.yaml
```

### Inline data

```bash
parallel-cli enrich run \
  --data '[{"company": "Anthropic"}, {"company": "Mistral"}]' \
  --intent "Find headquarters and employee count" \
  --json
```

### Non-interactive file run

```bash
parallel-cli enrich run \
  --source-type csv \
  --source companies.csv \
  --target enriched.csv \
  --source-columns '[{"name": "company", "description": "Company name"}]' \
  --intent "Find the CEO and annual revenue"
```

### YAML config run

```bash
parallel-cli enrich run config.yaml
```

### Status / polling

```bash
parallel-cli enrich status <task_group_id> --json
parallel-cli enrich poll <task_group_id> --json
```

Use explicit JSON arrays for column definitions when operating non-interactively.
Validate the output file before reporting success.
## FindAll

Use for web-scale entity discovery when the user wants a discovered dataset rather than a short answer.

```bash
parallel-cli findall run "Find AI coding agent startups with enterprise offerings" --json
parallel-cli findall run "AI startups in healthcare" -n 25 --json
parallel-cli findall status <run_id> --json
parallel-cli findall poll <run_id> --json
parallel-cli findall result <run_id> --json
parallel-cli findall schema <run_id> --json
```

This is a better fit than ordinary search when the user wants a discovered set of entities that can be reviewed, filtered, or enriched later.

## Monitor

Use for ongoing change detection over time.

```bash
parallel-cli monitor list --json
parallel-cli monitor get <monitor_id> --json
parallel-cli monitor events <monitor_id> --json
parallel-cli monitor delete <monitor_id> --json
```

Creation is usually the sensitive part because cadence and delivery matter:

```bash
parallel-cli monitor create --help
```

Use this when the user wants recurring tracking of a page or source rather than a one-time fetch.

## Recommended Hermes usage patterns

### Fast answer with citations
1. Run `parallel-cli search ... --json`
2. Parse titles, URLs, dates, excerpts
3. Summarize with inline citations from the returned URLs only
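Step 2 can be sketched as a small parsing helper. The `results`, `title`, `url`, `date`, and `excerpt` keys are assumptions about the payload shape — verify them against the real `--json` output:

```python
import json

def summarize_sources(payload: str, limit: int = 5) -> list[dict]:
    """Extract citation-ready fields from search output.
    Field names are assumed; missing fields come back as None."""
    results = json.loads(payload).get("results", [])
    return [
        {k: r.get(k) for k in ("title", "url", "date", "excerpt")}
        for r in results[:limit]
    ]
```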
### URL investigation
1. Run `parallel-cli extract URL --json`
2. If needed, rerun with `--objective` or `--full-content`
3. Quote or summarize the extracted markdown

### Long research workflow
1. Run `parallel-cli research run ... --no-wait --json`
2. Store the returned ID
3. Continue other work or periodically poll
4. Summarize the final report with citations

### Structured enrichment workflow
1. Inspect the input file and columns
2. Use `enrich suggest` or provide explicit enriched columns
3. Run `enrich run`
4. Poll for completion if needed
5. Validate the output file before reporting success

## Error handling and exit codes

The CLI documents these exit codes:
- `0` success
- `2` bad input
- `3` auth error
- `4` API error
- `5` timeout
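A small helper makes the documented codes actionable in scripts. Treating timeouts as the only retryable failure is a policy choice here, not CLI behavior:

```python
# Exit-code meanings documented by the CLI.
EXIT_CODES = {
    0: "success",
    2: "bad input",
    3: "auth error",
    4: "API error",
    5: "timeout",
}

def classify_exit(returncode: int) -> str:
    """Map a parallel-cli return code to its documented meaning."""
    return EXIT_CODES.get(returncode, f"unknown exit code {returncode}")

def should_retry(returncode: int) -> bool:
    """Only retry timeouts automatically; auth and input errors need a human."""
    return returncode == 5
```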
If you hit auth errors:
1. check `parallel-cli auth`
2. confirm `PARALLEL_API_KEY` or run `parallel-cli login` / `parallel-cli login --device`
3. verify `parallel-cli` is on `PATH`

## Maintenance

Check current auth / install state:

```bash
parallel-cli auth
parallel-cli --help
```

Update commands:

```bash
parallel-cli update
pip install --upgrade parallel-web-tools
parallel-cli config auto-update-check off
```

## Pitfalls

- Do not omit `--json` unless the user explicitly wants human-formatted output.
- Do not cite sources not present in the CLI output.
- `login` may require PTY/browser interaction.
- Prefer foreground execution for short tasks; do not overuse background processes.
- For large result sets, save JSON to `/tmp/*.json` instead of stuffing everything into context.
- Do not silently choose Parallel when Hermes native tools are already sufficient.
- Remember this is a vendor workflow that usually requires account auth and paid usage beyond the free tier.

website/docs/user-guide/skills/optional/research/research-qmd.md
---
title: "Qmd"
sidebar_label: "Qmd"
description: "Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking"
---

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

# Qmd

Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration.

## Skill metadata

| | |
|---|---|
| Source | Optional — install with `hermes skills install official/research/qmd` |
| Path | `optional-skills/research/qmd` |
| Version | `1.0.0` |
| Author | Hermes Agent + Teknium |
| License | MIT |
| Platforms | macos, linux |
| Tags | `Search`, `Knowledge-Base`, `RAG`, `Notes`, `MCP`, `Local-AI` |
| Related skills | [`obsidian`](/docs/user-guide/skills/bundled/note-taking/note-taking-obsidian), [`native-mcp`](/docs/user-guide/skills/bundled/mcp/mcp-native-mcp), [`arxiv`](/docs/user-guide/skills/bundled/research/research-arxiv) |

## Reference: full SKILL.md

:::info
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
:::

# QMD — Query Markup Documents

Local, on-device search engine for personal knowledge bases. Indexes markdown notes, meeting transcripts, documentation, and any text-based files, then provides hybrid search combining keyword matching, semantic understanding, and LLM-powered reranking — all running locally with no cloud dependencies.

Created by [Tobi Lütke](https://github.com/tobi/qmd). MIT licensed.

## When to Use

- User asks to search their notes, docs, knowledge base, or meeting transcripts
- User wants to find something across a large collection of markdown/text files
- User wants semantic search ("find notes about X concept") not just keyword grep
- User has already set up qmd collections and wants to query them
- User asks to set up a local knowledge base or document search system
- Keywords: "search my notes", "find in my docs", "knowledge base", "qmd"

## Prerequisites

### Node.js >= 22 (required)

```bash
# Check version
node --version  # must be >= 22

# macOS — install or upgrade via Homebrew
brew install node@22

# Linux — use NodeSource or nvm
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs
# or with nvm:
nvm install 22 && nvm use 22
```

### SQLite with Extension Support (macOS only)

macOS system SQLite lacks extension loading. Install via Homebrew:

```bash
brew install sqlite
```

### Install qmd

```bash
npm install -g @tobilu/qmd
# or with Bun:
bun install -g @tobilu/qmd
```

First run auto-downloads 3 local GGUF models (~2GB total):

| Model | Purpose | Size |
|-------|---------|------|
| embeddinggemma-300M-Q8_0 | Vector embeddings | ~300MB |
| qwen3-reranker-0.6b-q8_0 | Result reranking | ~640MB |
| qmd-query-expansion-1.7B | Query expansion | ~1.1GB |

### Verify Installation

```bash
qmd --version
qmd status
```

## Quick Reference

| Command | What It Does | Speed |
|---------|-------------|-------|
| `qmd search "query"` | BM25 keyword search (no models) | ~0.2s |
| `qmd vsearch "query"` | Semantic vector search (1 model) | ~3s |
| `qmd query "query"` | Hybrid + reranking (all 3 models) | ~2-3s warm, ~19s cold |
| `qmd get <docid>` | Retrieve full document content | instant |
| `qmd multi-get "glob"` | Retrieve multiple files | instant |
| `qmd collection add <path> --name <n>` | Add a directory as a collection | instant |
| `qmd context add <path> "description"` | Add context metadata to improve retrieval | instant |
| `qmd embed` | Generate/update vector embeddings | varies |
| `qmd status` | Show index health and collection info | instant |
| `qmd mcp` | Start MCP server (stdio) | persistent |
| `qmd mcp --http --daemon` | Start MCP server (HTTP, warm models) | persistent |

## Setup Workflow

### 1. Add Collections

Point qmd at directories containing your documents:

```bash
# Add a notes directory
qmd collection add ~/notes --name notes

# Add project docs
qmd collection add ~/projects/myproject/docs --name project-docs

# Add meeting transcripts
qmd collection add ~/meetings --name meetings

# List all collections
qmd collection list
```

### 2. Add Context Descriptions

Context metadata helps the search engine understand what each collection contains. This significantly improves retrieval quality:

```bash
qmd context add qmd://notes "Personal notes, ideas, and journal entries"
qmd context add qmd://project-docs "Technical documentation for the main project"
qmd context add qmd://meetings "Meeting transcripts and action items from team syncs"
```

### 3. Generate Embeddings

```bash
qmd embed
```

This processes all documents in all collections and generates vector embeddings. Re-run after adding new documents or collections.

### 4. Verify

```bash
qmd status  # shows index health, collection stats, model info
```

## Search Patterns

### Fast Keyword Search (BM25)

Best for: exact terms, code identifiers, names, known phrases. No models loaded — near-instant results.

```bash
qmd search "authentication middleware"
qmd search "handleError async"
```

### Semantic Vector Search

Best for: natural language questions, conceptual queries. Loads embedding model (~3s first query).

```bash
qmd vsearch "how does the rate limiter handle burst traffic"
qmd vsearch "ideas for improving onboarding flow"
```

### Hybrid Search with Reranking (Best Quality)

Best for: important queries where quality matters most. Uses all 3 models — query expansion, parallel BM25+vector, reranking.

```bash
qmd query "what decisions were made about the database migration"
```

### Structured Multi-Mode Queries

Combine different search types in a single query for precision:

```bash
# BM25 for exact term + vector for concept
qmd query $'lex: rate limiter\nvec: how does throttling work under load'

# With query expansion
qmd query $'expand: database migration plan\nlex: "schema change"'
```

### Query Syntax (lex/BM25 mode)

| Syntax | Effect | Example |
|--------|--------|---------|
| `term` | Prefix match | `perf` matches "performance" |
| `"phrase"` | Exact phrase | `"rate limiter"` |
| `-term` | Exclude term | `performance -sports` |

### HyDE (Hypothetical Document Embeddings)

For complex topics, write what you expect the answer to look like:

```bash
qmd query $'hyde: The migration plan involves three phases. First, we add the new columns without dropping the old ones. Then we backfill data. Finally we cut over and remove legacy columns.'
```

### Scoping to Collections

```bash
qmd search "query" --collection notes
qmd query "query" --collection project-docs
```

### Output Formats

```bash
qmd search "query" --json              # JSON output (best for parsing)
qmd search "query" --limit 5           # Limit results
qmd get "#abc123"                      # Get by document ID
qmd get "path/to/file.md"              # Get by file path
qmd get "file.md:50" -l 100            # Get specific line range
qmd multi-get "journals/*.md" --json   # Batch retrieve by glob
```

## MCP Integration (Recommended)

qmd exposes an MCP server that provides search tools directly to Hermes Agent via the native MCP client. This is the preferred integration — once configured, the agent gets qmd tools automatically without needing to load this skill.

### Option A: Stdio Mode (Simple)

Add to `~/.hermes/config.yaml`:

```yaml
mcp_servers:
  qmd:
    command: "qmd"
    args: ["mcp"]
    timeout: 30
    connect_timeout: 45
```

This registers tools: `mcp_qmd_search`, `mcp_qmd_vsearch`, `mcp_qmd_deep_search`, `mcp_qmd_get`, `mcp_qmd_status`.

**Tradeoff:** Models load on first search call (~19s cold start), then stay warm for the session. Acceptable for occasional use.

### Option B: HTTP Daemon Mode (Fast, Recommended for Heavy Use)

Start the qmd daemon separately — it keeps models warm in memory:

```bash
# Start daemon (persists across agent restarts)
qmd mcp --http --daemon

# Runs on http://localhost:8181 by default
```

Then configure Hermes Agent to connect via HTTP:

```yaml
mcp_servers:
  qmd:
    url: "http://localhost:8181/mcp"
    timeout: 30
```

**Tradeoff:** Uses ~2GB RAM while running, but every query is fast (~2-3s). Best for users who search frequently.

### Keeping the Daemon Running

#### macOS (launchd)

```bash
cat > ~/Library/LaunchAgents/com.qmd.daemon.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.qmd.daemon</string>
  <key>ProgramArguments</key>
  <array>
    <string>qmd</string>
    <string>mcp</string>
    <string>--http</string>
    <string>--daemon</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/qmd-daemon.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/qmd-daemon.log</string>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.qmd.daemon.plist
```

#### Linux (systemd user service)

```bash
mkdir -p ~/.config/systemd/user

cat > ~/.config/systemd/user/qmd-daemon.service << 'EOF'
[Unit]
Description=QMD MCP Daemon
After=network.target

[Service]
ExecStart=qmd mcp --http --daemon
Restart=on-failure
RestartSec=10
Environment=PATH=/usr/local/bin:/usr/bin:/bin

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable --now qmd-daemon
systemctl --user status qmd-daemon
```

### MCP Tools Reference

Once connected, these tools are available as `mcp_qmd_*`:

| MCP Tool | Maps To | Description |
|----------|---------|-------------|
| `mcp_qmd_search` | `qmd search` | BM25 keyword search |
| `mcp_qmd_vsearch` | `qmd vsearch` | Semantic vector search |
| `mcp_qmd_deep_search` | `qmd query` | Hybrid search + reranking |
| `mcp_qmd_get` | `qmd get` | Retrieve document by ID or path |
| `mcp_qmd_status` | `qmd status` | Index health and stats |

The MCP tools accept structured JSON queries for multi-mode search:

```json
{
  "searches": [
    {"type": "lex", "query": "authentication middleware"},
    {"type": "vec", "query": "how user login is verified"}
  ],
  "collections": ["project-docs"],
  "limit": 10
}
```

## CLI Usage (Without MCP)

When MCP is not configured, use qmd directly via terminal:

```
terminal(command="qmd query 'what was decided about the API redesign' --json", timeout=30)
```

For setup and management tasks, always use terminal:

```
terminal(command="qmd collection add ~/Documents/notes --name notes")
terminal(command="qmd context add qmd://notes 'Personal research notes and ideas'")
terminal(command="qmd embed")
terminal(command="qmd status")
```

## How the Search Pipeline Works

Understanding the internals helps choose the right search mode:

1. **Query Expansion** — A fine-tuned 1.7B model generates 2 alternative queries. The original gets 2x weight in fusion.
2. **Parallel Retrieval** — BM25 (SQLite FTS5) and vector search run simultaneously across all query variants.
3. **RRF Fusion** — Reciprocal Rank Fusion (k=60) merges results. Top-rank bonus: #1 gets +0.05, #2-3 get +0.02.
4. **LLM Reranking** — qwen3-reranker scores top 30 candidates (0.0-1.0).
5. **Position-Aware Blending** — Ranks 1-3: 75% retrieval / 25% reranker. Ranks 4-10: 60/40. Ranks 11+: 40/60 (trusts reranker more for long tail).
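Steps 3 and 5 can be sketched in Python to make the scoring concrete. This is an illustration of the documented constants, not qmd's actual implementation:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Merge several ranked lists with Reciprocal Rank Fusion,
    plus the documented top-rank bonus (+0.05 for #1, +0.02 for #2-3)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
            if rank == 1:
                scores[doc_id] += 0.05
            elif rank <= 3:
                scores[doc_id] += 0.02
    return scores

def blend(rank: int, retrieval_score: float, rerank_score: float) -> float:
    """Position-aware blending: trust retrieval near the top,
    the reranker for the long tail."""
    if rank <= 3:
        w = 0.75
    elif rank <= 10:
        w = 0.60
    else:
        w = 0.40
    return w * retrieval_score + (1 - w) * rerank_score
```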
**Smart Chunking:** Documents are split at natural break points (headings, code blocks, blank lines) targeting ~900 tokens with 15% overlap. Code blocks are never split mid-block.
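A greedy illustration of that chunking idea, approximating tokens as words. This is a sketch of the general technique, not qmd's actual chunker:

```python
def chunk_markdown(text: str, target_tokens: int = 900, overlap: float = 0.15) -> list[str]:
    """Accumulate blocks until the token budget or a heading boundary,
    then start the next chunk with an overlapping tail of blocks."""
    blocks = [b for b in text.split("\n\n") if b.strip()]
    chunks, current, count = [], [], 0
    for block in blocks:
        words = len(block.split())
        if current and (count + words > target_tokens or block.startswith("#")):
            chunks.append("\n\n".join(current))
            keep = max(1, int(len(current) * overlap))
            current = current[-keep:]  # carry overlap into the next chunk
            count = sum(len(b.split()) for b in current)
        current.append(block)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```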
## Best Practices

1. **Always add context descriptions** — `qmd context add` dramatically improves retrieval accuracy. Describe what each collection contains.
2. **Re-embed after adding documents** — `qmd embed` must be re-run when new files are added to collections.
3. **Use `qmd search` for speed** — when you need fast keyword lookup (code identifiers, exact names), BM25 is instant and needs no models.
4. **Use `qmd query` for quality** — when the question is conceptual or the user needs the best possible results, use hybrid search.
5. **Prefer MCP integration** — once configured, the agent gets native tools without needing to load this skill each time.
6. **Daemon mode for frequent users** — if the user searches their knowledge base regularly, recommend the HTTP daemon setup.
7. **First query in structured search gets 2x weight** — put the most important/certain query first when combining lex and vec.

## Troubleshooting

### "Models downloading on first run"
Normal — qmd auto-downloads ~2GB of GGUF models on first use. This is a one-time operation.

### Cold start latency (~19s)
This happens when models aren't loaded in memory. Solutions:
- Use HTTP daemon mode (`qmd mcp --http --daemon`) to keep warm
- Use `qmd search` (BM25 only) when models aren't needed
- MCP stdio mode loads models on first search, stays warm for session

### macOS: "unable to load extension"
Install Homebrew SQLite: `brew install sqlite`
Then ensure it's on PATH before system SQLite.

### "No collections found"
Run `qmd collection add <path> --name <name>` to add directories, then `qmd embed` to index them.

### Embedding model override (CJK/multilingual)
Set `QMD_EMBED_MODEL` environment variable for non-English content:
```bash
export QMD_EMBED_MODEL="your-multilingual-model"
```

## Data Storage

- **Index & vectors:** `~/.cache/qmd/index.sqlite`
- **Models:** Auto-downloaded to local cache on first run
- **No cloud dependencies** — everything runs locally

## References

- [GitHub: tobi/qmd](https://github.com/tobi/qmd)
- [QMD Changelog](https://github.com/tobi/qmd/blob/main/CHANGELOG.md)
---
|
||||
title: "Scrapling"
|
||||
sidebar_label: "Scrapling"
|
||||
description: "Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python"
|
||||
---
|
||||
|
||||
{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
|
||||
|
||||
# Scrapling
|
||||
|
||||
Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python.
|
||||
|
||||
## Skill metadata
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Source | Optional — install with `hermes skills install official/research/scrapling` |
|
||||
| Path | `optional-skills/research/scrapling` |
|
||||
| Version | `1.0.0` |
|
||||
| Author | FEUAZUR |
|
||||
| License | MIT |
|
||||
| Tags | `Web Scraping`, `Browser`, `Cloudflare`, `Stealth`, `Crawling`, `Spider` |
|
||||
| Related skills | [`duckduckgo-search`](/docs/user-guide/skills/optional/research/research-duckduckgo-search), [`domain-intel`](/docs/user-guide/skills/optional/research/research-domain-intel) |
|
||||
|
||||
## Reference: full SKILL.md
|
||||
|
||||
:::info
|
||||
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
|
||||
:::
|
||||
|
||||
# Scrapling
|
||||
|
||||
[Scrapling](https://github.com/D4Vinci/Scrapling) is a web scraping framework with anti-bot bypass, stealth browser automation, and a spider framework. It provides three fetching strategies (HTTP, dynamic JS, stealth/Cloudflare) and a full CLI.
|
||||
|
||||
**This skill is for educational and research purposes only.** Users must comply with local/international data scraping laws and respect website Terms of Service.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Scraping static HTML pages (faster than browser tools)
|
||||
- Scraping JS-rendered pages that need a real browser
|
||||
- Bypassing Cloudflare Turnstile or bot detection
|
||||
- Crawling multiple pages with a spider
|
||||
- When the built-in `web_extract` tool does not return the data you need
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
pip install "scrapling[all]"
|
||||
scrapling install
|
||||
```
|
||||
|
||||
Minimal install (HTTP only, no browser):
|
||||
```bash
|
||||
pip install scrapling
|
||||
```
|
||||
|
||||
With browser automation only:
|
||||
```bash
|
||||
pip install "scrapling[fetchers]"
|
||||
scrapling install
|
||||
```
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Approach | Class | Use When |
|
||||
|----------|-------|----------|
|
||||
| HTTP | `Fetcher` / `FetcherSession` | Static pages, APIs, fast bulk requests |
|
||||
| Dynamic | `DynamicFetcher` / `DynamicSession` | JS-rendered content, SPAs |
|
||||
| Stealth | `StealthyFetcher` / `StealthySession` | Cloudflare, anti-bot protected sites |
|
||||
| Spider | `Spider` | Multi-page crawling with link following |
|
||||
|
||||
## CLI Usage
|
||||
|
||||
### Extract Static Page
|
||||
|
||||
```bash
|
||||
scrapling extract get 'https://example.com' output.md
|
||||
```
|
||||
|
||||
With CSS selector and browser impersonation:
|
||||
|
||||
```bash
|
||||
scrapling extract get 'https://example.com' output.md \
|
||||
--css-selector '.content' \
|
||||
--impersonate 'chrome'
|
||||
```
|
||||
|
||||
### Extract JS-Rendered Page
|
||||
|
||||
```bash
|
||||
scrapling extract fetch 'https://example.com' output.md \
|
||||
--css-selector '.dynamic-content' \
|
||||
--disable-resources \
|
||||
--network-idle
|
||||
```
|
||||
|
||||
### Extract Cloudflare-Protected Page
|
||||
|
||||
```bash
|
||||
scrapling extract stealthy-fetch 'https://protected-site.com' output.html \
|
||||
--solve-cloudflare \
|
||||
--block-webrtc \
|
||||
--hide-canvas
|
||||
```
|
||||
|
||||
### POST Request
|
||||
|
||||
```bash
|
||||
scrapling extract post 'https://example.com/api' output.json \
|
||||
--json '{"query": "search term"}'
|
||||
```
|
||||
|
||||
### Output Formats
|
||||
|
||||
The output format is determined by the file extension:
|
||||
- `.html` -- raw HTML
|
||||
- `.md` -- converted to Markdown
|
||||
- `.txt` -- plain text
|
||||
- `.json` / `.jsonl` -- JSON
|
||||
|
||||
## Python: HTTP Scraping
|
||||
|
||||
### Single Request
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import Fetcher
|
||||
|
||||
page = Fetcher.get('https://quotes.toscrape.com/')
|
||||
quotes = page.css('.quote .text::text').getall()
|
||||
for q in quotes:
|
||||
print(q)
|
||||
```
|
||||
|
||||
### Session (Persistent Cookies)
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import FetcherSession
|
||||
|
||||
with FetcherSession(impersonate='chrome') as session:
|
||||
page = session.get('https://example.com/', stealthy_headers=True)
|
||||
links = page.css('a::attr(href)').getall()
|
||||
for link in links[:5]:
|
||||
sub = session.get(link)
|
||||
print(sub.css('h1::text').get())
|
||||
```
|
||||
|
||||
### POST / PUT / DELETE
|
||||
|
||||
```python
|
||||
page = Fetcher.post('https://api.example.com/data', json={"key": "value"})
|
||||
page = Fetcher.put('https://api.example.com/item/1', data={"name": "updated"})
|
||||
page = Fetcher.delete('https://api.example.com/item/1')
|
||||
```
|
||||
|
||||
### With Proxy
|
||||
|
||||
```python
|
||||
page = Fetcher.get('https://example.com', proxy='http://user:pass@proxy:8080')
|
||||
```
|
||||
|
||||
## Python: Dynamic Pages (JS-Rendered)
|
||||
|
||||
For pages that require JavaScript execution (SPAs, lazy-loaded content):
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import DynamicFetcher
|
||||
|
||||
page = DynamicFetcher.fetch('https://example.com', headless=True)
|
||||
data = page.css('.js-loaded-content::text').getall()
|
||||
```
|
||||
|
||||
### Wait for Specific Element
|
||||
|
||||
```python
|
||||
page = DynamicFetcher.fetch(
|
||||
'https://example.com',
|
||||
wait_selector=('.results', 'visible'),
|
||||
network_idle=True,
|
||||
)
|
||||
```
|
||||
|
||||
### Disable Resources for Speed
|
||||
|
||||
Blocks fonts, images, media, stylesheets (~25% faster):
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import DynamicSession
|
||||
|
||||
with DynamicSession(headless=True, disable_resources=True, network_idle=True) as session:
|
||||
page = session.fetch('https://example.com')
|
||||
items = page.css('.item::text').getall()
|
||||
```
|
||||
|
||||
### Custom Page Automation
|
||||
|
||||
```python
|
||||
from playwright.sync_api import Page
|
||||
from scrapling.fetchers import DynamicFetcher
|
||||
|
||||
def scroll_and_click(page: Page):
|
||||
page.mouse.wheel(0, 3000)
|
||||
page.wait_for_timeout(1000)
|
||||
page.click('button.load-more')
|
||||
page.wait_for_selector('.extra-results')
|
||||
|
||||
page = DynamicFetcher.fetch('https://example.com', page_action=scroll_and_click)
|
||||
results = page.css('.extra-results .item::text').getall()
|
||||
```
|
||||
|
||||
## Python: Stealth Mode (Anti-Bot Bypass)
|
||||
|
||||
For Cloudflare-protected or heavily fingerprinted sites:
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import StealthyFetcher
|
||||
|
||||
page = StealthyFetcher.fetch(
|
||||
'https://protected-site.com',
|
||||
headless=True,
|
||||
solve_cloudflare=True,
|
||||
block_webrtc=True,
|
||||
hide_canvas=True,
|
||||
)
|
||||
content = page.css('.protected-content::text').getall()
|
||||
```
|
||||
|
||||
### Stealth Session
|
||||
|
||||
```python
|
||||
from scrapling.fetchers import StealthySession
|
||||
|
||||
with StealthySession(headless=True, solve_cloudflare=True) as session:
|
||||
page1 = session.fetch('https://protected-site.com/page1')
|
||||
page2 = session.fetch('https://protected-site.com/page2')
|
||||
```
|
||||
|
||||
## Element Selection
|
||||
|
||||
All fetchers return a `Selector` object with these methods:
|
||||
|
||||
### CSS Selectors
|
||||
|
||||
```python
|
||||
page.css('h1::text').get() # First h1 text
|
||||
page.css('a::attr(href)').getall() # All link hrefs
|
||||
page.css('.quote .text::text').getall() # Nested selection
|
||||
```
|
||||
|
||||
### XPath
|
||||
|
||||
```python
|
||||
page.xpath('//div[@class="content"]/text()').getall()
|
||||
page.xpath('//a/@href').getall()
|
||||
```
|
||||
|
||||
### Find Methods
|
||||
|
||||
```python
|
||||
page.find_all('div', class_='quote') # By tag + attribute
|
||||
page.find_by_text('Read more', tag='a') # By text content
|
||||
page.find_by_regex(r'\$\d+\.\d{2}') # By regex pattern
|
||||
```
|
||||
|
||||
### Similar Elements
|
||||
|
||||
Find elements with similar structure (useful for product listings, etc.):
|
||||
|
||||
```python
|
||||
first_product = page.css('.product')[0]
|
||||
all_similar = first_product.find_similar()
|
||||
```
|
||||
|
||||
### Navigation
|
||||
|
||||
```python
|
||||
el = page.css('.target')[0]
|
||||
el.parent # Parent element
|
||||
el.children # Child elements
|
||||
el.next_sibling # Next sibling
|
||||
el.prev_sibling # Previous sibling
|
||||
```

## Python: Spider Framework

For multi-page crawling with link following:

```python
from scrapling.spiders import Spider, Request, Response


class QuotesSpider(Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]
    concurrent_requests = 10
    download_delay = 1

    async def parse(self, response: Response):
        for quote in response.css('.quote'):
            yield {
                "text": quote.css('.text::text').get(),
                "author": quote.css('.author::text').get(),
                "tags": quote.css('.tag::text').getall(),
            }

        next_page = response.css('.next a::attr(href)').get()
        if next_page:
            yield response.follow(next_page)


result = QuotesSpider().start()
print(f"Scraped {len(result.items)} quotes")
result.items.to_json("quotes.json")
```
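
When `response.follow` is given a relative link like the `.next a` href, it resolves it against the current page URL. The resolution rules are the standard ones from `urllib.parse.urljoin`, sketched here outside the library (URLs are illustrative, and this is not Scrapling's actual implementation):

```python
from urllib.parse import urljoin

# How a relative next-page link resolves against the current page URL.
# A leading slash makes the path root-relative; without it, the path is
# joined onto the current page's directory.
base = "https://quotes.toscrape.com/page/1/"
print(urljoin(base, "/page/2/"))  # https://quotes.toscrape.com/page/2/
print(urljoin(base, "page/2/"))   # https://quotes.toscrape.com/page/1/page/2/
```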

### Multi-Session Spider

Route requests to different fetcher types:

```python
from scrapling.fetchers import FetcherSession, AsyncStealthySession


class SmartSpider(Spider):
    name = "smart"
    start_urls = ["https://example.com/"]

    def configure_sessions(self, manager):
        manager.add("fast", FetcherSession(impersonate="chrome"))
        manager.add("stealth", AsyncStealthySession(headless=True), lazy=True)

    async def parse(self, response: Response):
        for link in response.css('a::attr(href)').getall():
            if "protected" in link:
                yield Request(link, sid="stealth")
            else:
                yield Request(link, sid="fast", callback=self.parse)
```

### Pause/Resume Crawling

```python
spider = QuotesSpider(crawldir="./crawl_checkpoint")
spider.start()  # Ctrl+C to pause, re-run to resume from checkpoint
```

## Pitfalls

- **Browser install required**: run `scrapling install` after `pip install` -- without it, `DynamicFetcher` and `StealthyFetcher` will fail
- **Timeouts**: the `DynamicFetcher`/`StealthyFetcher` timeout is in **milliseconds** (default 30000); the `Fetcher` timeout is in **seconds**
- **Cloudflare bypass**: `solve_cloudflare=True` adds 5-15 seconds to fetch time -- only enable it when needed
- **Resource usage**: `StealthyFetcher` runs a real browser -- limit concurrent usage
- **Legal**: always check robots.txt and the website's ToS before scraping. This library is for educational and research purposes
- **Python version**: requires Python 3.10+
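
One hypothetical way to guard against the seconds-vs-milliseconds mixup noted above is to keep a single unit in your own code and convert at the call site (`to_ms` is not part of Scrapling, just a local convention):

```python
# Hypothetical helper: hold timeouts in seconds everywhere, convert only
# for the fetchers that expect milliseconds.
def to_ms(seconds: float) -> int:
    return int(seconds * 1000)

# Fetcher.get(url, timeout=30)                   # seconds
# StealthyFetcher.fetch(url, timeout=to_ms(30))  # 30000 milliseconds
print(to_ms(30))  # 30000
```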