feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap (#3934)

* feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap

  Map active skills to Telegram's slash command menu so users can discover
  and invoke skills directly.

  Three changes:

  1. Telegram menu now includes active skill commands alongside built-in
     commands, capped at 100 entries (Telegram Bot API limit). Overflow
     commands remain callable but hidden from the picker. Logged at startup
     when cap is hit.

  2. New /commands [page] gateway command for paginated browsing of all
     commands + skills. /help now shows first 10 skill commands and points
     to /commands for the full list.

  3. When a user types a slash command that matches a disabled or
     uninstalled skill, they get actionable guidance:
     - Disabled: 'Enable it with: hermes skills config'
     - Optional (not installed): 'Install with: hermes skills install official/<path>'

  Built on ideas from PR #3921 by @kshitijk4poor.

* chore: move 21 niche skills to optional-skills

  Move specialized/niche skills from built-in (skills/) to optional
  (optional-skills/) to reduce the default skill count. Users can install
  them with:

    hermes skills install official/<category>/<name>

  Moved skills (21):
  - mlops: accelerate, chroma, faiss, flash-attention,
    hermes-atropos-environments, huggingface-tokenizers, instructor,
    lambda-labs, llava, nemo-curator, pinecone, pytorch-lightning, qdrant,
    saelens, simpo, slime, tensorrt-llm, torchtitan
  - research: domain-intel, duckduckgo-search
  - devops: inference-sh cli

  Built-in skills: 96 → 75
  Optional skills: 22 → 43

* fix: only include repo built-in skills in Telegram menu, not user-installed

  User-installed skills (from hub or manually added) stay accessible via
  /skills and by typing the command directly, but don't get registered in
  the Telegram slash command picker. Only skills whose SKILL.md is under
  the repo's skills/ directory are included in the menu.

  This keeps the Telegram menu focused on the curated built-in set while
  user-installed skills remain discoverable through /skills and /commands.
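
The menu cap and the paginated /commands listing described in this message can be illustrated with a minimal sketch. This is not the gateway's actual code: the helper names `cap_menu_commands` and `paginate`, the built-ins-first ordering, and the page size of 20 are all assumptions made for illustration. Only the 100-entry cap comes from the message itself.

```python
import logging
from typing import List, Sequence, Tuple

logger = logging.getLogger(__name__)

TELEGRAM_MENU_CAP = 100  # cap cited in the commit message (Telegram Bot API limit)


def cap_menu_commands(
    builtin: Sequence[str],
    skills: Sequence[str],
    cap: int = TELEGRAM_MENU_CAP,
) -> Tuple[List[str], List[str]]:
    """Split commands into (visible, hidden) for the menu picker.

    Assumption: built-in commands are listed first, so skills overflow first.
    Hidden commands would still be callable when typed directly.
    """
    seen = set(builtin)
    ordered = list(builtin) + [s for s in skills if s not in seen]
    visible, hidden = ordered[:cap], ordered[cap:]
    if hidden:
        # Mirrors the "logged at startup when cap is hit" behaviour.
        logger.warning("Menu cap of %d hit; %d commands hidden", cap, len(hidden))
    return visible, hidden


def paginate(
    items: Sequence[str], page: int, per_page: int = 20
) -> Tuple[List[str], int, int]:
    """Return (items on the page, clamped page number, total pages).

    Pages are 1-indexed; out-of-range pages are clamped rather than rejected.
    """
    total_pages = max(1, -(-len(items) // per_page))  # ceiling division
    page = min(max(page, 1), total_pages)
    start = (page - 1) * per_page
    return list(items[start:start + per_page]), page, total_pages
```

Clamping the page number, rather than raising an error, is one possible design choice: a user who types `/commands 99` would simply see the last page.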
2026-04-26 01:01:40 +00:00 · 2026-03-30 10:57:30 -07:00 · 2026-03-30 10:57:30 -07:00 · 5ceed021dc
commit 5ceed021dc
parent 97d6813f51
73 changed files with 163 additions and 4 deletions
--- a/optional-skills/mlops/hermes-atropos-environments/references/atropos-base-env.md
+++ b/optional-skills/mlops/hermes-atropos-environments/references/atropos-base-env.md
@@ -0,0 +1,65 @@
+# Atropos BaseEnv Reference
+
+Source: `atroposlib/envs/base.py` (~2124 lines)
+
+## Abstract Methods (MUST implement)
+
+| Method | Signature | Description |
+|--------|-----------|-------------|
+| `get_next_item()` | `async def get_next_item(self) -> Item` | Return next item for trajectory. Return None to pause. |
+| `evaluate()` | `async def evaluate(self, *args, **kwargs)` | Called every steps_per_eval steps. |
+| `setup()` | `async def setup(self)` | Called once at start. Load datasets, init models. |
+| `collect_trajectory()` | `async def collect_trajectory(self, item) -> Tuple[Optional[ScoredDataItem], List[Item]]` | Single rollout. Or override collect_trajectories instead. |
+
+## Overridable Methods
+
+| Method | Default Behavior | Override When |
+|--------|-----------------|---------------|
+| `collect_trajectories()` | Runs collect_trajectory group_size times in parallel | Batch generation, MCTS, coupled rollouts |
+| `wandb_log()` | Logs completion lengths, rollout table, perf stats | Add custom metrics (always call super) |
+| `config_init()` | Returns (env_config_cls(), ServerBaseline()) | Custom defaults + server configs |
+| `postprocess_histories()` | Passthrough | Final processing before sending to trainer |
+| `save_checkpoint()` | Saves JSON to checkpoint_dir | Custom serialization |
+| `cleanup()` | No-op | Release resources after each rollout |
+
+## ScoredDataGroup Structure
+
+```python
+ScoredDataGroup = TypedDict with:
+    tokens:             List[List[int]]       # Token IDs per rollout
+    masks:              List[List[int]]       # -100=prompt, token_id=completion
+    scores:             List[float]           # Score per rollout
+    advantages:         Optional[...]         # Per-token advantages
+    ref_logprobs:       Optional[...]         # Reference model logprobs
+    messages:           Optional[...]         # OpenAI-format messages
+    inference_logprobs: Optional[...]         # Inference logprobs
+```
+
+## BaseEnvConfig Key Fields
+
+| Field | Default | Description |
+|-------|---------|-------------|
+| `group_size` | 4 | Responses grouped for scoring |
+| `steps_per_eval` | 100 | Steps between evaluations |
+| `max_token_length` | 2048 | Max token length for generations |
+| `total_steps` | 1000 | Total training steps |
+| `use_wandb` | True | Enable wandb logging |
+| `tokenizer_name` | DeepHermes-3 | Tokenizer for token encoding |
+| `ensure_scores_are_not_same` | True | Skip groups with identical scores |
+| `worker_timeout` | 600 | Task timeout seconds |
+
+## Data Flow
+
+```
+env_manager() → add_train_workers() → handle_env()
+    → collect_trajectories() → postprocess_histories()
+    → handle_send_to_api() → training server
+```
+
+## Atropos Environment Statistics (82 environments analyzed)
+
+- 95% implement setup, collect_trajectories, evaluate, get_next_item
+- 76% override wandb_log
+- 54% have custom config class
+- Most use collect_trajectories (plural), not collect_trajectory (singular)
+- Common reward patterns: LLM-judge (~40), regex-extract (~35), code-exec (~12)
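
The `masks` convention in the ScoredDataGroup structure above (-100 for prompt positions, the token ID for completion positions) and the `ensure_scores_are_not_same` field can be illustrated with a small sketch. It uses plain dicts and does not import `atroposlib`. The helpers `build_tokens_and_masks`, `build_group`, and `has_distinct_scores` are hypothetical names introduced here and are not part of the library.

```python
from typing import Dict, List, Sequence, Tuple

PROMPT_MASK = -100  # per the reference: -100 marks prompt positions


def build_tokens_and_masks(
    prompt_ids: Sequence[int], completion_ids: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and completion; mask the prompt with -100."""
    tokens = list(prompt_ids) + list(completion_ids)
    masks = [PROMPT_MASK] * len(prompt_ids) + list(completion_ids)
    return tokens, masks


def build_group(
    rollouts: Sequence[Tuple[Sequence[int], Sequence[int]]],
    scores: Sequence[float],
) -> Dict[str, list]:
    """Assemble the three required fields for a group of rollouts."""
    if len(rollouts) != len(scores):
        raise ValueError("need exactly one score per rollout")
    pairs = [build_tokens_and_masks(p, c) for p, c in rollouts]
    return {
        "tokens": [tokens for tokens, _ in pairs],
        "masks": [masks for _, masks in pairs],
        "scores": [float(s) for s in scores],
    }


def has_distinct_scores(scores: Sequence[float]) -> bool:
    """True if the group has at least two different scores.

    Sketches the idea behind ensure_scores_are_not_same: a group whose
    rollouts all score the same would be skipped.
    """
    return len(set(scores)) > 1
```

For example, `build_tokens_and_masks([1, 2, 3], [7, 8])` yields tokens `[1, 2, 3, 7, 8]` and masks `[-100, -100, -100, 7, 8]`.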