docs(skills): salvage dropped trigger content into skill bodies

For 14 of 74 compressed skills, the original description contained
trigger keywords, technique counts, attribution, or use-case phrases
not covered by the existing body content. This commit prepends a
'When to use' / 'What's inside' block near the top of each affected
skill so the agent still has the full context when the skill is loaded.

Skills salvaged:
- codex, ascii-video, creative-ideation, excalidraw, manim-video, p5js
- gif-search, heartmula, youtube-content
- lm-evaluation-harness, obliteratus, vllm, axolotl
- powerpoint

The remaining 60 skills were verified to already cover the dropped
content in their existing body sections (When to Use, overview, intro
prose) or to have short descriptions fully captured by the new
compressed form.
Authored by Teknium on 2026-04-26 21:50:26 -07:00; committed by Teknium
parent e3921e7ca4
commit 9f1b1977bc
14 changed files with 66 additions and 1 deletion


@@ -13,6 +13,10 @@ metadata:
# OBLITERATUS Skill
## What's inside
9 CLI methods, 28 analysis modules, 116 model presets across 5 compute tiers, tournament evaluation, and telemetry-driven recommendations.
Remove refusal behaviors (guardrails) from open-weight LLMs without retraining or fine-tuning. Uses mechanistic interpretability techniques — including diff-in-means, SVD, whitened SVD, LEACE concept erasure, SAE decomposition, Bayesian kernel projection, and more — to identify and surgically excise refusal directions from model weights while preserving reasoning capabilities.
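A minimal NumPy sketch of the diff-in-means idea only, not OBLITERATUS's implementation; the layer choice, prompt sets, shapes, and data below are placeholders:

```python
import numpy as np

# Placeholder activations captured at one hidden layer, shape (n_prompts, d_model).
# Which layer and token position to use is a choice the real tooling makes for you.
acts_refusing = np.random.randn(128, 4096)
acts_complying = np.random.randn(128, 4096)

# Diff-in-means: the candidate refusal direction is the difference of mean activations.
direction = acts_refusing.mean(axis=0) - acts_complying.mean(axis=0)
direction /= np.linalg.norm(direction)

# Excise the direction from a weight matrix that writes into the residual stream
# (shape d_model x d_in): project it out of every output column.
W = np.random.randn(4096, 11008)
W_ablated = W - np.outer(direction, direction @ W)
```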
**License warning:** OBLITERATUS is AGPL-3.0. NEVER import it as a Python library. Always invoke via CLI (`obliteratus` command) or subprocess. This keeps Hermes Agent's MIT license clean.
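To respect that constraint, shell out to the CLI rather than importing anything; a hedged sketch in which the `--help` invocation is only a stand-in, so check the real CLI for its subcommands and flags:

```python
import subprocess

# Invoke the AGPL-licensed tool as a separate process instead of importing it.
result = subprocess.run(
    ["obliteratus", "--help"],  # stand-in invocation; real subcommands/flags may differ
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```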


@@ -13,6 +13,10 @@ metadata:
# vLLM - High-Performance LLM Serving
## When to use
Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.
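For example, once a server is running (e.g. `vllm serve <model>` in recent vLLM releases), any OpenAI-compatible client can talk to it; the URL, API key, and model name below are illustrative:

```python
from openai import OpenAI

# Point the standard OpenAI client at a local vLLM server (placeholder URL/model).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello from vLLM"}],
)
print(resp.choices[0].message.content)
```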
## Quick start
vLLM achieves up to 24x higher throughput than standard Hugging Face Transformers serving through PagedAttention (block-based KV-cache management) and continuous batching (mixing prefill and decode requests).
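A minimal offline-inference sketch using the documented `LLM` / `SamplingParams` API; the model name and keyword arguments are illustrative:

```python
from vllm import LLM, SamplingParams

# Load any Hugging Face model id; tensor_parallel_size > 1 shards across GPUs.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=1)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
```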