skills: consolidate mlops redundancies (gguf+llama-cpp, grpo+trl, guidance→optional) (#11965)

Three tightly-scoped built-in skill consolidations to reduce redundancy in
the available_skills listing injected into every system prompt:

1. gguf-quantization → llama-cpp (merged)
   GGUF is llama.cpp's format; two skills covered the same toolchain. The
   merged llama-cpp skill keeps the full K-quant table + imatrix workflow
   from gguf and the ROCm/benchmarks/supported-models sections from the
   original llama-cpp. All 5 reference files preserved.

2. grpo-rl-training → fine-tuning-with-trl (folded in)
   GRPO isn't a framework, it's a trainer inside TRL. Moved the 17KB
   deep-dive SKILL.md to references/grpo-training.md and the working
   template to templates/basic_grpo_training.py. TRL's GRPO workflow
   section now points to both. Atropos skill's related_skills updated.

3. guidance → optional-skills/mlops/
   Dropped from built-in. Outlines (still built-in) covers the same
   structured-generation ground with wider adoption. Listed in the
   optional catalog for users who specifically want Guidance.

Net: 3 fewer built-in skill lines in every system prompt, zero content
loss. Contributor authorship preserved via git rename detection.
This commit is contained in:
Teknium 2026-04-17 21:36:40 -07:00 committed by GitHub
parent 598cba62ad
commit 73bccc94c7
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
15 changed files with 470 additions and 889 deletions

View file

@ -98,6 +98,7 @@ The largest optional category — covers the full ML pipeline from data curation
| **chroma** | Open-source embedding database. Store embeddings and metadata, perform vector and full-text search. Simple 4-function API for RAG and semantic search. |
| **faiss** | Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). |
| **flash-attention** | Optimize transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Supports PyTorch SDPA, flash-attn library, H100 FP8, and sliding window. |
| **guidance** | Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance — Microsoft Research's constrained generation framework. |
| **hermes-atropos-environments** | Build, test, and debug Hermes Agent RL environments for Atropos training. Covers the HermesAgentBaseEnv interface, reward functions, agent loop integration, and evaluation. |
| **huggingface-tokenizers** | Fast Rust-based tokenizers for research and production. Tokenizes 1GB in under 20 seconds. Supports BPE, WordPiece, and Unigram algorithms. |
| **instructor** | Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, and stream partial results. |