# Hermes-Agent

Hermes-Agent is an agent harness for LLMs with an interactive CLI.

## Development Environment

**IMPORTANT**: Always use the virtual environment if it exists:

```bash
source venv/bin/activate  # Before running any Python commands
```

## Project Structure

- `hermes` - CLI launcher script (run with `./hermes`)
- `cli.py` - Interactive CLI with Rich UI, prompt_toolkit, and animated spinners
- `cli-config.yaml` - CLI configuration (model, terminal, toolsets, personalities)
- `tools/` - Individual tool implementations (web, terminal, browser, vision, etc.)
- `tools/__init__.py` - Exports all tools for importing
- `model_tools.py` - Consolidates tool schemas and handlers for the agent
- `toolsets.py` - Groups tools into logical toolsets (web, terminal, browser, etc.)
- `toolset_distributions.py` - Probability-based tool selection for data generation
- `run_agent.py` - Primary agent runner with the AIAgent class and KawaiiSpinner
- `batch_runner.py` - Parallel batch processing with checkpointing
- `tests/` - Test scripts

## File Dependency Chain

```
tools/*.py → tools/__init__.py → model_tools.py → toolsets.py → toolset_distributions.py
                                       ↑
run_agent.py ──────────────────────────┘

cli.py → run_agent.py (uses AIAgent with quiet_mode=True)
batch_runner.py → run_agent.py + toolset_distributions.py
```

Always keep the tools, `model_tools.py`, and `toolsets.py` consistent when changing any of them.

## CLI Architecture (cli.py)

The interactive CLI uses:

- **Rich** - For the welcome banner and styled panels
- **prompt_toolkit** - For a fixed input area with history and `patch_stdout`
- **KawaiiSpinner** (in run_agent.py) - Animated feedback during API calls and tool execution

Key components:

- `HermesCLI` class - Main CLI controller with commands and the conversation loop
- `load_cli_config()` - Loads `cli-config.yaml` and sets environment variables for the terminal
- `build_welcome_banner()` - Displays the ASCII art logo, tools, and skills summary
- `/commands` - Processes user commands such as `/help`, `/clear`, `/personality`, etc.

The CLI uses `quiet_mode=True` when creating AIAgent to suppress verbose logging and enable kawaii-style feedback instead.

### Adding CLI Commands

1. Add to the `COMMANDS` dict with a description
2. Add a handler in the `process_command()` method
3. For persistent settings, use `save_config_value()` to update `cli-config.yaml`

## Adding a New Tool

Follow this strict order to maintain consistency (a registration sketch follows the list):

1. Create `tools/your_tool.py` with:
   - A handler function (sync or async) returning a JSON string via `json.dumps()`
   - A `check_*_requirements()` function to verify dependencies (e.g., API keys)
   - A schema definition following the OpenAI function-calling format
2. Export in `tools/__init__.py`:
   - Import the handler and check function
   - Add them to the `__all__` list
3. Register in `model_tools.py`:
   - Create a `get_*_tool_definitions()` function or add to an existing one
   - Add routing in the `handle_function_call()` dispatcher
   - Update `get_all_tool_names()` with the tool name
   - Update the `get_toolset_for_tool()` mapping
   - Update `get_available_toolsets()` and `check_toolset_requirements()`
4. Add to a toolset in `toolsets.py`:
   - Add to an existing toolset or create a new one in the TOOLSETS dict
5. Optionally add to `toolset_distributions.py` for batch processing
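For orientation, here is a minimal sketch of what steps 2-4 could look like for a hypothetical `example_tool`. The file and function names follow the conventions listed above, but the exact signatures and bodies in the real files are assumptions, not the repo's actual code.

```python
# Hypothetical registration sketch for steps 2-4; signatures are assumptions.

# --- tools/__init__.py (step 2) ---
from .example_tool import example_tool, check_example_requirements
__all__ = ["example_tool", "check_example_requirements"]  # append to the existing list

# --- model_tools.py (step 3) ---
from tools import example_tool

def get_example_tool_definitions() -> list[dict]:
    """OpenAI function-calling schema for the example tool."""
    return [{
        "type": "function",
        "function": {
            "name": "example_tool",
            "description": "Do one example thing.",
            "parameters": {
                "type": "object",
                "properties": {"param": {"type": "string"}},
                "required": ["param"],
            },
        },
    }]

def handle_function_call(name: str, arguments: dict, task_id=None) -> str:
    if name == "example_tool":  # routing for the new tool; other tools elided
        return example_tool(**arguments, task_id=task_id)
    raise ValueError(f"Unknown tool: {name}")

# --- toolsets.py (step 4) ---
TOOLSETS = {
    "example": ["example_tool"],  # add to an existing toolset or create a new one
}
```

Each snippet belongs in the file named in its comment; keeping all three in sync is exactly the consistency requirement from the dependency chain above.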
## Tool Implementation Pattern

```python
# tools/example_tool.py
import json
import os

def check_example_requirements() -> bool:
    """Check if required API keys/dependencies are available."""
    return bool(os.getenv("EXAMPLE_API_KEY"))

def example_tool(param: str, task_id: str | None = None) -> str:
    """Execute the tool and return a JSON string result."""
    try:
        result = {"success": True, "data": "..."}
        return json.dumps(result, ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)}, ensure_ascii=False)
```

All tool handlers MUST return a JSON string. Never return raw dicts.

## Stateful Tools

Tools that maintain state (terminal, browser) require:

- A `task_id` parameter for session isolation between concurrent tasks
- A `cleanup_*()` function to release resources
- Cleanup is called automatically in run_agent.py after the conversation completes

## Environment Variables

API keys are loaded from the `.env` file in the repo root:

- `OPENROUTER_API_KEY` - Main LLM API access (primary provider)
- `FIRECRAWL_API_KEY` - Web search/extract tools
- `BROWSERBASE_API_KEY` / `BROWSERBASE_PROJECT_ID` - Browser automation
- `FAL_KEY` - Image generation (FLUX model)
- `NOUS_API_KEY` - Vision and Mixture-of-Agents tools

Terminal tool configuration (can also be set in `cli-config.yaml`):

- `TERMINAL_ENV` - Backend: local, docker, singularity, modal, or ssh
- `TERMINAL_CWD` - Working directory
- `TERMINAL_SSH_HOST`, `TERMINAL_SSH_USER`, `TERMINAL_SSH_KEY` - For the SSH backend

## Agent Loop (run_agent.py)

The AIAgent class handles:

- Processing the enabled toolsets to provide to the model
- Piping prompts to the agent
- Looping LLM calls while tools are invoked, until a natural-language response is produced
- Returning the final response

It uses an OpenAI-compatible API (primarily OpenRouter) via the OpenAI Python SDK.

## Reasoning Model Support

For models that support chain-of-thought reasoning:

- Extract `reasoning_content` from API responses
- Store it in `assistant_msg["reasoning"]` for trajectory export
- Pass it back via the `reasoning_content` field on subsequent turns

## Trajectory Format

Conversations are saved in ShareGPT format for training:

```json
{"from": "system", "value": "System prompt with ..."}
{"from": "human", "value": "User message"}
{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
{"from": "gpt", "value": "Final response"}
```

Tool calls use `<tool_call>` XML tags, tool responses use `<tool_response>` tags, and reasoning uses `<think>` tags.

## Batch Processing (batch_runner.py)

For processing multiple prompts:

- Parallel execution with multiprocessing
- Content-based resume for fault tolerance (matches on prompt text, not indices)
- Toolset distributions control probabilistic tool availability per prompt
- Output: `data/<name>/trajectories.jsonl` (combined) + individual batch files

## Logging

When trajectories are saved, the enabled tools are restructured into the system prompt, so the stored format is suitable for later training use.

## Skills System

Skills are on-demand knowledge documents the agent can load. They live in the `skills/` directory:

```
skills/
├── mlops/                  # Category folder
│   ├── axolotl/            # Skill folder
│   │   ├── SKILL.md        # Main instructions (required)
│   │   ├── references/     # Additional docs, API specs
│   │   └── templates/      # Output formats, configs
│   └── vllm/
│       └── SKILL.md
└── example-skill/
    └── SKILL.md
```

**Progressive disclosure** (token-efficient; a sketch of step 2 follows the list):

1. `skills_categories()` - List category names (~50 tokens)
2. `skills_list(category)` - Name + description per skill (~3k tokens)
3. `skill_view(name)` - Full content + tags + linked files
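As a rough illustration of step 2, here is a hypothetical sketch of how `skills_list` could assemble its listing from each skill's frontmatter (the format is specified below). The real implementation lives in `tools/skills_tool.py` and may differ; PyYAML is assumed as the parser.

```python
# Hypothetical sketch of skills_list(category); the real version is in
# tools/skills_tool.py and may differ. Assumes PyYAML for the frontmatter.
import json
from pathlib import Path

import yaml

def skills_list(category: str) -> str:
    """Return name + description for each skill in a category as a JSON string."""
    listing = []
    for skill_md in sorted(Path("skills", category).glob("*/SKILL.md")):
        text = skill_md.read_text(encoding="utf-8")
        if text.startswith("---"):  # frontmatter sits between '---' markers
            meta = yaml.safe_load(text.split("---", 2)[1]) or {}
            listing.append({
                "name": meta.get("name", skill_md.parent.name),
                "description": meta.get("description", ""),
            })
    return json.dumps({"skills": listing}, ensure_ascii=False)
```

Returning a JSON string keeps the sketch consistent with the tool handler pattern above.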
SKILL.md files use YAML frontmatter:

```yaml
---
name: skill-name
description: Brief description for listing
tags: [tag1, tag2]
related_skills: [other-skill]
version: 1.0.0
---

# Skill Content...
```

Tool files: `tools/skills_tool.py` → `model_tools.py` → `toolsets.py`
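Because the listing step depends on well-formed frontmatter, a small validator can catch mistakes when adding skills by hand. The helper below is purely hypothetical (it is not part of the repo) and only checks the two fields the listing relies on, again assuming PyYAML.

```python
# Hypothetical SKILL.md frontmatter validator; not part of the repo.
import sys
from pathlib import Path

import yaml

REQUIRED = ("name", "description")  # fields skills_list relies on

def validate_skill(skill_md: Path) -> list[str]:
    """Return a list of problems found in one SKILL.md file."""
    text = skill_md.read_text(encoding="utf-8")
    if not text.startswith("---"):
        return ["missing YAML frontmatter block"]
    meta = yaml.safe_load(text.split("---", 2)[1]) or {}
    return [f"missing required field: {field}" for field in REQUIRED if field not in meta]

if __name__ == "__main__":
    for path in sorted(Path("skills").glob("**/SKILL.md")):
        for problem in validate_skill(path):
            print(f"{path}: {problem}", file=sys.stderr)
```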