mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-26 01:01:40 +00:00

feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap (#3934)

* feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap

Map active skills to Telegram's slash command menu so users can
discover and invoke skills directly. Three changes:

1. Telegram menu now includes active skill commands alongside built-in
   commands, capped at 100 entries (Telegram Bot API limit). Overflow
   commands remain callable but hidden from the picker. Logged at
   startup when cap is hit.

2. New /commands [page] gateway command for paginated browsing of all
   commands + skills. /help now shows first 10 skill commands and
   points to /commands for the full list.

3. When a user types a slash command that matches a disabled or
   uninstalled skill, they get actionable guidance:
   - Disabled: 'Enable it with: hermes skills config'
   - Optional (not installed): 'Install with: hermes skills install official/<path>'

Built on ideas from PR #3921 by @kshitijk4poor.

* chore: move 21 niche skills to optional-skills

Move specialized/niche skills from built-in (skills/) to optional
(optional-skills/) to reduce the default skill count. Users can
install them with: hermes skills install official/<category>/<name>

Moved skills (21):
- mlops: accelerate, chroma, faiss, flash-attention,
  hermes-atropos-environments, huggingface-tokenizers, instructor,
  lambda-labs, llava, nemo-curator, pinecone, pytorch-lightning,
  qdrant, saelens, simpo, slime, tensorrt-llm, torchtitan
- research: domain-intel, duckduckgo-search
- devops: inference-sh cli

Built-in skills: 96 → 75
Optional skills: 22 → 43

* fix: only include repo built-in skills in Telegram menu, not user-installed

User-installed skills (from hub or manually added) stay accessible via
/skills and by typing the command directly, but don't get registered
in the Telegram slash command picker. Only skills whose SKILL.md is
under the repo's skills/ directory are included in the menu.

This keeps the Telegram menu focused on the curated built-in set while
user-installed skills remain discoverable through /skills and /commands.

2026-03-30 10:57:30 -07:00


Atropos BaseEnv Reference

Source: `atroposlib/envs/base.py` (~2124 lines)

Abstract Methods (MUST implement)

| Method | Signature | Description |
|---|---|---|
| `get_next_item()` | `async def get_next_item(self) -> Item` | Return the next item to roll out. Return `None` to pause. |
| `evaluate()` | `async def evaluate(self, *args, **kwargs)` | Called every `steps_per_eval` steps. |
| `setup()` | `async def setup(self)` | Called once at start. Load datasets, initialize models. |
| `collect_trajectory()` | `async def collect_trajectory(self, item) -> Tuple[Optional[ScoredDataItem], List[Item]]` | Single rollout. Alternatively, override `collect_trajectories()` instead. |
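To make the contract concrete, here is a minimal self-contained sketch of the four required methods. It deliberately does not import atroposlib: `ToyEnv`, `run_demo`, the `_generate` placeholder, and the question/answer item format are all hypothetical, and a real environment would subclass `BaseEnv` and call an inference server instead of adding numbers.

```python
import asyncio
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins for atroposlib's Item / ScoredDataItem types.
Item = Dict[str, str]
ScoredDataItem = Dict[str, object]


class ToyEnv:
    """Mimics the BaseEnv abstract-method contract; not a real BaseEnv subclass."""

    async def setup(self) -> None:
        # Called once at start: load a tiny in-memory "dataset".
        self.data: List[Item] = [
            {"question": "2+2", "answer": "4"},
            {"question": "3+5", "answer": "8"},
        ]
        self.index = 0

    async def get_next_item(self) -> Optional[Item]:
        # None tells the caller there is nothing to roll out right now.
        if self.index >= len(self.data):
            return None
        item = self.data[self.index]
        self.index += 1
        return item

    async def _generate(self, item: Item) -> str:
        # Placeholder for an inference-server call: just add the operands.
        a, b = item["question"].split("+")
        return str(int(a) + int(b))

    async def collect_trajectory(
        self, item: Item
    ) -> Tuple[Optional[ScoredDataItem], List[Item]]:
        # One rollout -> (scored item or None, backlog of extra items).
        completion = await self._generate(item)
        score = 1.0 if completion == item["answer"] else 0.0
        return {"completion": completion, "score": score}, []

    async def evaluate(self, *args, **kwargs) -> float:
        # Would run every steps_per_eval steps; here it just returns accuracy.
        correct = 0
        for item in self.data:
            if await self._generate(item) == item["answer"]:
                correct += 1
        return correct / len(self.data)


async def run_demo() -> Tuple[List[ScoredDataItem], float]:
    env = ToyEnv()
    await env.setup()
    results: List[ScoredDataItem] = []
    while (item := await env.get_next_item()) is not None:
        scored, _backlog = await env.collect_trajectory(item)
        if scored is not None:
            results.append(scored)
    return results, await env.evaluate()
```

Calling `asyncio.run(run_demo())` drains the two items and then runs the evaluation pass.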

Overridable Methods

| Method | Default Behavior | Override When |
|---|---|---|
| `collect_trajectories()` | Runs `collect_trajectory()` `group_size` times in parallel | Batch generation, MCTS, coupled rollouts |
| `wandb_log()` | Logs completion lengths, rollout table, perf stats | Adding custom metrics (always call `super()`) |
| `config_init()` | Returns `(env_config_cls(), ServerBaseline())` | Custom defaults + server configs |
| `postprocess_histories()` | Passthrough | Final processing before sending to the trainer |
| `save_checkpoint()` | Saves JSON to `checkpoint_dir` | Custom serialization |
| `cleanup()` | No-op | Releasing resources after each rollout |
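The default `collect_trajectories()` behavior described above (fan out `collect_trajectory()` `group_size` times, then gather) can be sketched with `asyncio.gather`. This is an illustrative reimplementation under that description, not atroposlib's code: `collect_trajectories_sketch` and `demo_rollout` are hypothetical, and dropping `None` results plus concatenating backlogs is an assumption about how the pieces are merged.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional, Tuple

Item = Any
Scored = dict
RolloutFn = Callable[[Item], Awaitable[Tuple[Optional[Scored], List[Item]]]]


async def collect_trajectories_sketch(
    collect_trajectory: RolloutFn, item: Item, group_size: int
) -> Tuple[List[Scored], List[Item]]:
    """Run collect_trajectory group_size times concurrently and merge results."""
    results = await asyncio.gather(
        *(collect_trajectory(item) for _ in range(group_size))
    )
    group: List[Scored] = []
    backlog: List[Item] = []
    for scored, extra_items in results:
        if scored is not None:  # a None result means the rollout was dropped
            group.append(scored)
        backlog.extend(extra_items)
    return group, backlog


async def demo_rollout(item: str) -> Tuple[Optional[Scored], List[Item]]:
    # Hypothetical rollout: scores by item length and queues nothing extra.
    return {"item": item, "score": float(len(item))}, []
```

Overriding `collect_trajectories()` directly is what lets batch generation or MCTS replace this independent fan-out with rollouts that share state.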

ScoredDataGroup Structure

ScoredDataGroup = TypedDict with:
    tokens:             List[List[int]]       # Token IDs per rollout
    masks:              List[List[int]]       # -100=prompt, token_id=completion
    scores:             List[float]           # Score per rollout
    advantages:         Optional[...]         # Per-token advantages
    ref_logprobs:       Optional[...]         # Reference model logprobs
    messages:           Optional[...]         # OpenAI-format messages
    inference_logprobs: Optional[...]         # Inference logprobs
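The `tokens`/`masks` convention above (prompt positions masked with `-100`, completion positions carrying their token IDs) can be shown with a small builder. `ScoredDataGroupSketch`, `tokens_and_mask`, and `build_group` are hypothetical helpers, the token IDs are made-up integers rather than tokenizer output, and only a subset of the fields is modeled; the elided `Optional[...]` inner types are left as `Any`.

```python
from typing import Any, List, Optional, Tuple, TypedDict

PROMPT_MASK = -100  # mask value for prompt positions, per the structure above


class ScoredDataGroupSketch(TypedDict):
    # Subset of the fields listed above; inner types of optionals are elided.
    tokens: List[List[int]]
    masks: List[List[int]]
    scores: List[float]
    advantages: Optional[Any]
    ref_logprobs: Optional[Any]


def tokens_and_mask(
    prompt_ids: List[int], completion_ids: List[int]
) -> Tuple[List[int], List[int]]:
    """Concatenate prompt + completion; mask prompt positions with -100."""
    tokens = prompt_ids + completion_ids
    mask = [PROMPT_MASK] * len(prompt_ids) + list(completion_ids)
    return tokens, mask


def build_group(
    rollouts: List[Tuple[List[int], List[int], float]]
) -> ScoredDataGroupSketch:
    """rollouts: (prompt_ids, completion_ids, score) for each response."""
    group: ScoredDataGroupSketch = {
        "tokens": [], "masks": [], "scores": [],
        "advantages": None, "ref_logprobs": None,
    }
    for prompt_ids, completion_ids, score in rollouts:
        tokens, mask = tokens_and_mask(prompt_ids, completion_ids)
        group["tokens"].append(tokens)
        group["masks"].append(mask)
        group["scores"].append(score)
    return group
```

Each rollout contributes one row to `tokens`, `masks`, and `scores`, so the three lists stay index-aligned.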

BaseEnvConfig Key Fields

| Field | Default | Description |
|---|---|---|
| `group_size` | 4 | Number of responses generated per item and scored as a group |
| `steps_per_eval` | 100 | Training steps between evaluations |
| `max_token_length` | 2048 | Maximum token length for generations |
| `total_steps` | 1000 | Total training steps |
| `use_wandb` | True | Enable wandb logging |
| `tokenizer_name` | DeepHermes-3 | Tokenizer used for token encoding |
| `ensure_scores_are_not_same` | True | Skip groups in which all scores are identical |
| `worker_timeout` | 600 | Task timeout (seconds) |
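A sketch of how `ensure_scores_are_not_same` might gate a group. `EnvConfigSketch` is a hypothetical dataclass mirroring the defaults above (the real `BaseEnvConfig` has more fields; `tokenizer_name` is omitted because the table abbreviates it). `group_advantages` is an illustration only: it assumes group-mean-relative advantages, under which a group with identical scores carries no training signal, which is a plausible rationale for the check rather than a documented one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EnvConfigSketch:
    # Hypothetical mirror of the table's defaults, not atroposlib's class.
    group_size: int = 4
    steps_per_eval: int = 100
    max_token_length: int = 2048
    total_steps: int = 1000
    use_wandb: bool = True
    ensure_scores_are_not_same: bool = True
    worker_timeout: float = 600.0  # seconds


def keep_group(scores: List[float], cfg: EnvConfigSketch) -> bool:
    """Return False for groups the ensure_scores_are_not_same check would skip."""
    if cfg.ensure_scores_are_not_same and len(set(scores)) <= 1:
        return False
    return True


def group_advantages(scores: List[float]) -> List[float]:
    # Assumed mean-centered advantages: identical scores give all zeros.
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]
```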

Data Flow

env_manager() → add_train_workers() → handle_env()
    → collect_trajectories() → postprocess_histories()
    → handle_send_to_api() → training server
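The per-item path of the flow above can be walked through with a toy worker. `PipelineSketch` and `run_pipeline` are hypothetical: method names mirror the stages, but the bodies are stand-ins (fixed scores, an in-memory list in place of the training server). Two further assumptions: backlog items are simply re-queued, and the demo stops on `None`, whereas a real worker would pause and retry.

```python
import asyncio
from typing import Any, Dict, List, Optional


class PipelineSketch:
    """Toy walk through get_next_item -> collect_trajectories ->
    postprocess_histories -> handle_send_to_api (not atroposlib code)."""

    def __init__(self, items: List[str], group_size: int = 2):
        self.items = list(items)
        self.group_size = group_size
        self.sent: List[Dict[str, Any]] = []  # stands in for the training server

    async def get_next_item(self) -> Optional[str]:
        return self.items.pop(0) if self.items else None

    async def collect_trajectories(self, item: str):
        # Fake group: one score per response, no backlog items.
        scores = [float(i) for i in range(self.group_size)]
        return {"item": item, "scores": scores}, []

    async def postprocess_histories(self, group):
        return group  # default behavior: passthrough

    async def handle_send_to_api(self, group) -> None:
        self.sent.append(group)

    async def handle_env(self) -> bool:
        """One worker step; returns False when there is nothing to do."""
        item = await self.get_next_item()
        if item is None:
            return False  # a real worker would pause, not stop
        group, backlog = await self.collect_trajectories(item)
        self.items.extend(backlog)  # assumption: backlog is re-queued
        group = await self.postprocess_histories(group)
        if group is not None:
            await self.handle_send_to_api(group)
        return True


async def run_pipeline(p: PipelineSketch) -> List[Dict[str, Any]]:
    while await p.handle_env():
        pass
    return p.sent
```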

Atropos Environment Statistics (82 environments analyzed)

- 95% implement `setup`, `collect_trajectories`, `evaluate`, and `get_next_item`
- 76% override `wandb_log`
- 54% have a custom config class
- Most use `collect_trajectories` (plural), not `collect_trajectory` (singular)
- Common reward patterns: LLM-judge (~40), regex-extract (~35), code-exec (~12)
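For the regex-extract pattern, a minimal sketch of a binary reward. The answer formats (a final `\boxed{...}`, else an `Answer: ...` line) and the helper names `extract_answer` and `regex_reward` are assumptions for illustration; individual environments define their own extraction rules.

```python
import re
from typing import List, Optional

# Hypothetical answer formats: last \boxed{...}, else last "Answer: ..." line.
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")
_ANSWER = re.compile(r"Answer:\s*(.+)")


def extract_answer(text: str) -> Optional[str]:
    boxed = _BOXED.findall(text)
    if boxed:
        return boxed[-1].strip()
    answers = _ANSWER.findall(text)
    if answers:
        return answers[-1].strip()
    return None  # no parsable answer


def regex_reward(completions: List[str], gold: str) -> List[float]:
    """1.0 if the extracted answer matches the gold answer, else 0.0."""
    return [1.0 if extract_answer(c) == gold.strip() else 0.0 for c in completions]
```

Completions with no parsable answer score 0.0 under this scheme.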
