feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap (#3934)

* feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap

Map active skills to Telegram's slash command menu so users can discover and invoke skills directly. Three changes:

1. Telegram menu now includes active skill commands alongside built-in commands, capped at 100 entries (Telegram Bot API limit). Overflow commands remain callable but hidden from the picker. Logged at startup when cap is hit.

2. New /commands [page] gateway command for paginated browsing of all commands + skills. /help now shows first 10 skill commands and points to /commands for the full list.

3. When a user types a slash command that matches a disabled or uninstalled skill, they get actionable guidance:
   - Disabled: 'Enable it with: hermes skills config'
   - Optional (not installed): 'Install with: hermes skills install official/<path>'

Built on ideas from PR #3921 by @kshitijk4poor.

* chore: move 21 niche skills to optional-skills

Move specialized/niche skills from built-in (skills/) to optional (optional-skills/) to reduce the default skill count. Users can install them with:

  hermes skills install official/<category>/<name>

Moved skills (21):
- mlops: accelerate, chroma, faiss, flash-attention, hermes-atropos-environments, huggingface-tokenizers, instructor, lambda-labs, llava, nemo-curator, pinecone, pytorch-lightning, qdrant, saelens, simpo, slime, tensorrt-llm, torchtitan
- research: domain-intel, duckduckgo-search
- devops: inference-sh cli

Built-in skills: 96 → 75
Optional skills: 22 → 43

* fix: only include repo built-in skills in Telegram menu, not user-installed

User-installed skills (from hub or manually added) stay accessible via /skills and by typing the command directly, but don't get registered in the Telegram slash command picker. Only skills whose SKILL.md is under the repo's skills/ directory are included in the menu.

This keeps the Telegram menu focused on the curated built-in set while user-installed skills remain discoverable through /skills and /commands.
2026-04-26 01:01:40 +00:00 · 2026-03-30 10:57:30 -07:00 · 2026-03-30 10:57:30 -07:00 · 5ceed021dc
commit 5ceed021dc
parent 97d6813f51
73 changed files with 163 additions and 4 deletions
--- a/optional-skills/mlops/huggingface-tokenizers/references/integration.md
+++ b/optional-skills/mlops/huggingface-tokenizers/references/integration.md
@@ -0,0 +1,637 @@
+# Transformers Integration
+
+Complete guide to using HuggingFace Tokenizers with the Transformers library.
+
+## AutoTokenizer
+
+The easiest way to load tokenizers.
+
+### Loading pretrained tokenizers
+
+```python
+from transformers import AutoTokenizer
+
+# Load from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+
+# Check if using fast tokenizer (Rust-based)
+print(tokenizer.is_fast)  # True
+
+# Access underlying tokenizers.Tokenizer
+if tokenizer.is_fast:
+    fast_tokenizer = tokenizer.backend_tokenizer
+    print(type(fast_tokenizer))  # <class 'tokenizers.Tokenizer'>
+```
+
+### Fast vs slow tokenizers
+
+| Feature                  | Fast (Rust)    | Slow (Python) |
+|--------------------------|----------------|---------------|
+| Speed                    | 5-10× faster   | Baseline      |
+| Alignment tracking       | ✅ Full support | ❌ Limited     |
+| Batch processing         | ✅ Optimized    | ⚠️ Slower      |
+| Offset mapping           | ✅ Yes          | ❌ No          |
+| Installation             | `tokenizers`   | Built-in      |
+
+**Always use fast tokenizers when available.**
+
+### Check available tokenizers
+
+```python
+from transformers import TOKENIZER_MAPPING
+
+# List all fast tokenizers
+for config_class, (slow, fast) in TOKENIZER_MAPPING.items():
+    if fast is not None:
+        print(f"{config_class.__name__}: {fast.__name__}")
+```
+
+## PreTrainedTokenizerFast
+
+Wrap custom tokenizers for transformers.
+
+### Convert custom tokenizer
+
+```python
+from tokenizers import Tokenizer
+from tokenizers.models import BPE
+from tokenizers.pre_tokenizers import Whitespace
+from tokenizers.trainers import BpeTrainer
+from transformers import PreTrainedTokenizerFast
+
+# Train custom tokenizer
+tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
+tokenizer.pre_tokenizer = Whitespace()  # split into words before learning BPE merges
+trainer = BpeTrainer(
+    vocab_size=30000,
+    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"]
+)
+tokenizer.train(files=["corpus.txt"], trainer=trainer)
+
+# Save tokenizer
+tokenizer.save("my-tokenizer.json")
+
+# Wrap for transformers
+transformers_tokenizer = PreTrainedTokenizerFast(
+    tokenizer_file="my-tokenizer.json",
+    unk_token="[UNK]",
+    sep_token="[SEP]",
+    pad_token="[PAD]",
+    cls_token="[CLS]",
+    mask_token="[MASK]"
+)
+
+# Save in transformers format
+transformers_tokenizer.save_pretrained("my-tokenizer")
+```
+
+**Result**: Directory with `tokenizer.json` + `tokenizer_config.json` + `special_tokens_map.json`
+
+### Use like any transformers tokenizer
+
+```python
+# Load
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("my-tokenizer")
+
+# Encode with all transformers features
+outputs = tokenizer(
+    "Hello world",
+    padding="max_length",
+    truncation=True,
+    max_length=128,
+    return_tensors="pt"
+)
+
+print(outputs.keys())
+# dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
+```
+
+## Special tokens
+
+### Default special tokens
+
+| Model Family | CLS/BOS | SEP/EOS       | PAD     | UNK     | MASK    |
+|--------------|---------|---------------|---------|---------|---------|
+| BERT         | [CLS]   | [SEP]         | [PAD]   | [UNK]   | [MASK]  |
+| GPT-2        | <\|endoftext\|> | <\|endoftext\|> | - (none by default) | <\|endoftext\|> | -       |
+| RoBERTa      | <s>     | </s>          | <pad>   | <unk>   | <mask>  |
+| T5           | -       | </s>          | <pad>   | <unk>   | -       |
+
+### Adding special tokens
+
+```python
+# Add new special tokens
+special_tokens_dict = {
+    "additional_special_tokens": ["<|image|>", "<|video|>", "<|audio|>"]
+}
+
+num_added_tokens = tokenizer.add_special_tokens(special_tokens_dict)
+print(f"Added {num_added_tokens} tokens")
+
+# Resize model embeddings (`model` is the already-loaded model that uses this tokenizer)
+model.resize_token_embeddings(len(tokenizer))
+
+# Use new tokens
+text = "This is an image: <|image|>"
+tokens = tokenizer.encode(text)
+```
+
+### Adding regular tokens
+
+```python
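+# Add domain-specific tokens as regular tokens (special_tokens=False is the default)
+new_tokens = ["COVID-19", "mRNA", "vaccine"]
+num_added = tokenizer.add_tokens(new_tokens)
+
+# Regular added tokens are still subject to normalization (e.g. lowercasing)
+# and are kept when decoding with skip_special_tokens=True.
+
+# Alternative: register them as special tokens (never split, not normalized).
+# Use one call or the other: tokens already in the vocabulary are skipped.
+# num_added = tokenizer.add_tokens(new_tokens, special_tokens=True)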
+```
+
+## Encoding and decoding
+
+### Basic encoding
+
+```python
+# Single sentence
+text = "Hello, how are you?"
+encoded = tokenizer(text)
+
+print(encoded)
+# {'input_ids': [101, 7592, 1010, 2129, 2024, 2017, 1029, 102],
+#  'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0],
+#  'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1]}
+```
+
+### Batch encoding
+
+```python
+# Multiple sentences (padding=True pads to the longest sequence in the batch)
+texts = ["Hello world", "How are you?", "I am fine"]
+encoded = tokenizer(texts, padding=True, truncation=True, max_length=10)
+
+print(encoded['input_ids'])
+# [[101, 7592, 2088, 102, 0, 0],
+#  [101, 2129, 2024, 2017, 1029, 102],
+#  [101, 1045, 2572, 2986, 102, 0]]
+```
+
+### Return tensors
+
+```python
+# Return PyTorch tensors
+outputs = tokenizer("Hello world", return_tensors="pt")
+print(outputs['input_ids'].shape)  # torch.Size([1, 4]): [CLS] hello world [SEP]
+
+# Return TensorFlow tensors
+outputs = tokenizer("Hello world", return_tensors="tf")
+
+# Return NumPy arrays
+outputs = tokenizer("Hello world", return_tensors="np")
+
+# Return lists (default)
+outputs = tokenizer("Hello world", return_tensors=None)
+```
+
+### Decoding
+
+```python
+# Decode token IDs
+ids = [101, 7592, 2088, 102]
+text = tokenizer.decode(ids)
+print(text)  # "[CLS] hello world [SEP]"
+
+# Skip special tokens
+text = tokenizer.decode(ids, skip_special_tokens=True)
+print(text)  # "hello world"
+
+# Batch decode
+batch_ids = [[101, 7592, 102], [101, 2088, 102]]
+texts = tokenizer.batch_decode(batch_ids, skip_special_tokens=True)
+print(texts)  # ["hello", "world"]
+```
+
+## Padding and truncation
+
+### Padding strategies
+
+```python
+# Pad to max length in batch
+tokenizer(texts, padding="longest")
+
+# Pad to a fixed length (max_length; falls back to the model's maximum if omitted)
+tokenizer(texts, padding="max_length", max_length=128)
+
+# No padding
+tokenizer(texts, padding=False)
+
+# Pad to multiple of value (for efficient computation)
+tokenizer(texts, padding="max_length", max_length=128, pad_to_multiple_of=8)
+# Result: length will be 128 (already multiple of 8)
+```
+
+### Truncation strategies
+
+```python
+# Truncate to max length
+tokenizer(text, truncation=True, max_length=10)
+
+# Only truncate first sequence (for pairs)
+tokenizer(text1, text2, truncation="only_first", max_length=20)
+
+# Only truncate second sequence
+tokenizer(text1, text2, truncation="only_second", max_length=20)
+
+# Truncate longest first (default for pairs)
+tokenizer(text1, text2, truncation="longest_first", max_length=20)
+
+# No truncation (default): over-long inputs are kept whole and may exceed the model's limit
+tokenizer(text, truncation=False)
+```
+
+### Stride for long documents
+
+```python
+# For documents longer than max_length
+text = "Very long document " * 1000
+
+# Encode with overlap
+encodings = tokenizer(
+    text,
+    max_length=512,
+    stride=128,          # Overlap between chunks
+    truncation=True,
+    return_overflowing_tokens=True,
+    return_offsets_mapping=True
+)
+
+# Get all chunks
+num_chunks = len(encodings['input_ids'])
+print(f"Split into {num_chunks} chunks")
+
+# Each chunk overlaps by stride tokens
+for i, chunk in enumerate(encodings['input_ids']):
+    print(f"Chunk {i}: {len(chunk)} tokens")
+```
+
+**Use case**: Long document QA, sliding window inference
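+
+The overlap arithmetic is easy to misread, so here is a minimal pure-Python sketch of the idea: each window holds at most `max_length` tokens and starts `max_length - stride` tokens after the previous one. The helper `sliding_windows` is hypothetical and deliberately ignores special tokens, which the real tokenizer also budgets for when sizing chunks.
+
+```python
+def sliding_windows(ids, max_length, stride):
+    """Split ids into chunks of at most max_length that overlap by stride tokens."""
+    if not 0 <= stride < max_length:
+        raise ValueError("stride must be in [0, max_length)")
+    step = max_length - stride  # how far each new window advances
+    chunks = []
+    start = 0
+    while True:
+        chunks.append(ids[start:start + max_length])
+        if start + max_length >= len(ids):
+            break  # this window reached the end of the input
+        start += step
+    return chunks
+
+
+chunks = sliding_windows(list(range(10)), max_length=4, stride=2)
+print(chunks)
+# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
+```
+
+Consecutive chunks share their last and first `stride` tokens, so an answer that straddles a chunk boundary still appears whole in at least one chunk, provided it is no longer than `stride`.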
+
+## Alignment and offsets
+
+### Offset mapping
+
+```python
+# Get character offsets for each token
+encoded = tokenizer("Hello, world!", return_offsets_mapping=True)
+
+for token, (start, end) in zip(
+    encoded.tokens(),
+    encoded['offset_mapping']  # flat list of (start, end) pairs for a single input
+):
+    print(f"{token:10s} → [{start:2d}, {end:2d})")
+
+# Output (bert-base-uncased lowercases tokens; offsets index the original text):
+# [CLS]      → [ 0,  0)
+# hello      → [ 0,  5)
+# ,          → [ 5,  6)
+# world      → [ 7, 12)
+# !          → [12, 13)
+# [SEP]      → [ 0,  0)
+```
+
+### Word IDs
+
+```python
+# Get word index for each token
+encoded = tokenizer("Hello world", return_offsets_mapping=True)
+word_ids = encoded.word_ids()
+
+print(word_ids)
+# [None, 0, 1, None]
+# None = special token, 0 = first word, 1 = second word
+```
+
+**Use case**: Token classification (NER, POS tagging)
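+
+For token classification, word-level labels usually have to be spread over sub-tokens. A minimal sketch, assuming the common convention of labelling only the first sub-token of each word and masking the rest with `-100` (the default `ignore_index` of PyTorch's cross-entropy loss). The helper `align_labels` and the sample values are hypothetical.
+
+```python
+def align_labels(word_ids, word_labels, ignore_index=-100):
+    """Give each word's label to its first sub-token and mask everything else."""
+    aligned = []
+    previous = None
+    for word_id in word_ids:
+        if word_id is None:
+            aligned.append(ignore_index)          # special token
+        elif word_id != previous:
+            aligned.append(word_labels[word_id])  # first sub-token of a word
+        else:
+            aligned.append(ignore_index)          # continuation sub-token
+        previous = word_id
+    return aligned
+
+
+# Shape of encoded.word_ids() for a 3-word input whose second word was
+# split into two sub-tokens (illustrative values, not real tokenizer output)
+word_ids = [None, 0, 1, 1, 2, None]
+labels = align_labels(word_ids, word_labels=[0, 3, 0])
+print(labels)
+# [-100, 0, 3, -100, 0, -100]
+```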
+
+### Character to token mapping
+
+```python
+text = "Machine learning is awesome"
+encoded = tokenizer(text, return_offsets_mapping=True)
+
+# Find token for character position
+char_pos = 8  # "l" in "learning"
+token_idx = encoded.char_to_token(char_pos)
+
+print(f"Character {char_pos} is in token {token_idx}: {encoded.tokens()[token_idx]}")
+# Character 8 is in token 2: learning
+```
+
+**Use case**: Question answering (map answer character span to tokens)
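+
+When a whole answer span has to be mapped, the offsets can be scanned directly. A minimal sketch using a hypothetical helper `char_span_to_token_span`; it relies only on the offsets list, shown here with the values from the "Hello, world!" offset mapping example.
+
+```python
+def char_span_to_token_span(offsets, start_char, end_char):
+    """Return (first, last) token indices overlapping [start_char, end_char), or None."""
+    covered = [
+        i for i, (start, end) in enumerate(offsets)
+        if start < end                                # skip special tokens, which map to (0, 0)
+        and start < end_char and end > start_char     # token overlaps the character span
+    ]
+    if not covered:
+        return None
+    return covered[0], covered[-1]
+
+
+# Offsets for "Hello, world!": [CLS] hello , world ! [SEP]
+offsets = [(0, 0), (0, 5), (5, 6), (7, 12), (12, 13), (0, 0)]
+span = char_span_to_token_span(offsets, 7, 12)  # characters of "world"
+print(span)
+# (3, 3)
+```
+
+A `None` result means the characters fall between tokens (for example, on whitespace), which is worth handling explicitly when building QA training labels.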
+
+### Sequence pairs
+
+```python
+# Encode sentence pair
+encoded = tokenizer("Question here", "Answer here", return_offsets_mapping=True)
+
+# Get sequence IDs (which sequence each token belongs to)
+sequence_ids = encoded.sequence_ids()
+print(sequence_ids)
+# [None, 0, 0, None, 1, 1, None]
+# None = special token, 0 = question, 1 = answer
+```
+
+## Model integration
+
+### Use with transformers models
+
+```python
+from transformers import AutoModel, AutoTokenizer
+import torch
+
+# Load model and tokenizer
+model = AutoModel.from_pretrained("bert-base-uncased")
+tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+
+# Tokenize
+text = "Hello world"
+inputs = tokenizer(text, return_tensors="pt")
+
+# Forward pass
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# Get embeddings
+last_hidden_state = outputs.last_hidden_state
+print(last_hidden_state.shape)  # [1, seq_len, hidden_size]
+```
+
+### Custom model with custom tokenizer
+
+```python
+from transformers import BertConfig, BertModel
+
+# Train custom tokenizer (special tokens are part of the trained vocabulary)
+from tokenizers import Tokenizer, models, trainers
+tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
+trainer = trainers.BpeTrainer(vocab_size=30000, special_tokens=["[UNK]", "[PAD]"])
+tokenizer.train(files=["data.txt"], trainer=trainer)
+
+# Wrap for transformers
+from transformers import PreTrainedTokenizerFast
+fast_tokenizer = PreTrainedTokenizerFast(
+    tokenizer_object=tokenizer,
+    unk_token="[UNK]",
+    pad_token="[PAD]"
+)
+
+# Create model whose embedding table matches the actual vocabulary
+config = BertConfig(
+    vocab_size=len(fast_tokenizer),
+    pad_token_id=fast_tokenizer.pad_token_id
+)
+model = BertModel(config)
+
+# Use together
+inputs = fast_tokenizer("Hello world", return_tensors="pt")
+outputs = model(**inputs)
+```
+
+### Save and load together
+
+```python
+# Save both
+model.save_pretrained("my-model")
+tokenizer.save_pretrained("my-model")
+
+# Directory structure:
+# my-model/
+#   ├── config.json
+#   ├── model.safetensors   (pytorch_model.bin in older transformers versions)
+#   ├── tokenizer.json
+#   ├── tokenizer_config.json
+#   └── special_tokens_map.json
+
+# Load both
+from transformers import AutoModel, AutoTokenizer
+
+model = AutoModel.from_pretrained("my-model")
+tokenizer = AutoTokenizer.from_pretrained("my-model")
+```
+
+## Advanced features
+
+### Multimodal tokenization
+
+```python
+from transformers import AutoTokenizer
+
+# LLaVA-style (image + text)
+tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")
+
+# Add image placeholder token (returns 0 if the checkpoint already defines it)
+tokenizer.add_special_tokens({"additional_special_tokens": ["<image>"]})
+
+# Use in prompt
+text = "Describe this image: <image>"
+inputs = tokenizer(text, return_tensors="pt")
+```
+
+### Template formatting
+
+```python
+# Chat template
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "Hello!"},
+    {"role": "assistant", "content": "Hi! How can I help?"},
+    {"role": "user", "content": "What's the weather?"}
+]
+
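+# Apply chat template (only if the tokenizer defines one)
+if getattr(tokenizer, "chat_template", None):
+    text = tokenizer.apply_chat_template(
+        messages, tokenize=False, add_generation_prompt=True
+    )
+    # Templates usually insert their own special tokens, so don't add them twice
+    inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False)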
+```
+
+### Custom template
+
+```python
+from transformers import PreTrainedTokenizerFast
+
+tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
+
+# Define chat template (Jinja); text is emitted from {{ }} so whitespace stays exact
+tokenizer.chat_template = (
+    "{%- for message in messages %}"
+    "{%- if message['role'] == 'system' %}"
+    "{{ 'System: ' + message['content'] + '\\n' }}"
+    "{%- elif message['role'] == 'user' %}"
+    "{{ 'User: ' + message['content'] + '\\n' }}"
+    "{%- elif message['role'] == 'assistant' %}"
+    "{{ 'Assistant: ' + message['content'] + '\\n' }}"
+    "{%- endif %}"
+    "{%- endfor %}"
+    "{{ 'Assistant:' }}"
+)
+
+# Use template
+text = tokenizer.apply_chat_template(messages, tokenize=False)
+```
+
+## Performance optimization
+
+### Batch processing
+
+```python
+# Process large datasets efficiently
+from datasets import load_dataset
+
+dataset = load_dataset("imdb", split="train[:1000]")
+
+# Tokenize in batches
+def tokenize_function(examples):
+    return tokenizer(
+        examples["text"],
+        padding="max_length",
+        truncation=True,
+        max_length=512
+    )
+
+# Map over dataset (batched)
+tokenized_dataset = dataset.map(
+    tokenize_function,
+    batched=True,
+    batch_size=1000,
+    num_proc=4  # Parallel processing
+)
+```
+
+### Caching
+
+```python
+# cache_dir controls where downloaded tokenizer files are stored on disk
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "bert-base-uncased",
+    use_fast=True,
+    cache_dir="./cache"
+)
+
+# Memoize results for inputs that repeat (e.g. at inference time)
+from functools import lru_cache
+
+@lru_cache(maxsize=10000)
+def cached_tokenize(text):
+    return tuple(tokenizer.encode(text))
+
+# Reuses cached results for repeated inputs
+```
+
+### Memory efficiency
+
+```python
+# For very large datasets, use streaming
+from datasets import load_dataset
+
+# "pile" stands in for any streamable dataset ID with a "text" column
+dataset = load_dataset("pile", split="train", streaming=True)
+
+def process_batch(batch):
+    # Tokenize
+    tokens = tokenizer(batch["text"], truncation=True, max_length=512)
+
+    # Process tokens...
+
+    return tokens
+
+# Process in chunks (memory efficient)
+for batch in dataset.batch(batch_size=1000):
+    processed = process_batch(batch)
+```
+
+## Troubleshooting
+
+### Issue: Tokenizer not fast
+
+**Symptom**:
+```python
+tokenizer.is_fast  # False
+```
+
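+**Solution**: Make sure the `tokenizers` library is installed and the tokenizer is not loaded with `use_fast=False`. Some models only ship a slow tokenizer, in which case `is_fast` stays `False`.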
+```bash
+pip install tokenizers
+```
+
+### Issue: Special tokens not working
+
+**Symptom**: Special tokens are split into subwords
+
+**Solution**: Add as special tokens, not regular tokens
+```python
+# Wrong
+tokenizer.add_tokens(["<|image|>"])
+
+# Correct
+tokenizer.add_special_tokens({"additional_special_tokens": ["<|image|>"]})
+```
+
+### Issue: Offset mapping not available
+
+**Symptom**:
+```python
+tokenizer("text", return_offsets_mapping=True)
+# Raises NotImplementedError when the tokenizer is a slow (Python) tokenizer
+```
+
+**Solution**: Use fast tokenizer
+```python
+from transformers import AutoTokenizer
+
+# Load fast version
+tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
+```
+
+### Issue: Padding inconsistent
+
+**Symptom**: Some sequences padded, others not
+
+**Solution**: Specify padding strategy
+```python
+# Explicit padding
+tokenizer(
+    texts,
+    padding="max_length",  # or "longest"
+    max_length=128
+)
+```
+
+## Best practices
+
+1. **Always use fast tokenizers**:
+   - 5-10× faster
+   - Full alignment tracking
+   - Better batch processing
+
+2. **Save tokenizer with model**:
+   - Ensures reproducibility
+   - Prevents version mismatches
+
+3. **Use batch processing for datasets**:
+   - Tokenize with `.map(batched=True)`
+   - Set `num_proc` for parallelism
+
+4. **Enable caching for repeated inputs**:
+   - Use `lru_cache` for inference
+   - Cache tokenizer files with `cache_dir`
+
+5. **Handle special tokens properly**:
+   - Use `add_special_tokens()` for never-split tokens
+   - Resize embeddings after adding tokens
+
+6. **Test alignment for downstream tasks**:
+   - Verify `offset_mapping` is correct
+   - Test `char_to_token()` on samples
+
+7. **Version control tokenizer config**:
+   - Save `tokenizer_config.json`
+   - Document custom templates
+   - Track vocabulary changes