hermes-agent/README.md
Jai Suphavadeeprasit e698b7e0e5 changes
2025-10-10 18:04:22 -04:00

# Hermes Agent
An AI agent with advanced tool-calling capabilities, real-time logging, and extensible toolsets.
## Features
- 🤖 **Multi-model Support**: Works with Claude, GPT-4, and other OpenAI-compatible models
- 🔧 **Rich Tool Library**: Web search, content extraction, vision analysis, terminal execution, and more
- 📊 **Real-time Logging**: WebSocket-based logging system for monitoring agent execution
- 🖥️ **Desktop UI**: Modern PySide6 frontend with real-time event streaming
- 🎯 **Flexible Toolsets**: Predefined toolset combinations for different use cases
- 💾 **Trajectory Saving**: Save conversation flows for training and analysis
- 🔄 **Auto-retry**: Built-in error handling and retry logic
## Quick Start
### Installation
```bash
pip install -r requirements.txt
```
### Basic Usage
```bash
python run_agent.py \
  --enabled_toolsets web \
  --query "Search for the latest AI news"
```
### With Real-time Logging
```bash
# Terminal 1: Start API endpoint server
python api_endpoint/logging_server.py
# Terminal 2: Run agent
python run_agent.py \
  --enabled_toolsets web \
  --enable_websocket_logging \
  --query "Your question here"
```
### With Desktop UI (Recommended)
The easiest way to use Hermes Agent is through the desktop UI:
```bash
# One-command launch (starts server + UI)
cd ui && ./start_hermes_ui.sh
# Or manually:
# Terminal 1: Start server
python api_endpoint/logging_server.py
# Terminal 2: Start UI
python ui/hermes_ui.py
```
The UI provides:
- 🖱️ Point-and-click query submission
- 🎛️ Easy model and tool selection
- 📊 Real-time event visualization
- 🔄 Automatic WebSocket connection
- 📝 Session history
## Project Structure
```
Hermes-Agent/
├── run_agent.py                  # Main agent runner
├── model_tools.py                # Tool definitions and handling
├── toolsets.py                   # Predefined toolset combinations
├── requirements.txt              # Python dependencies
├── ui/                           # Desktop UI ⭐ NEW
│   ├── hermes_ui.py              # PySide6 desktop application
│   ├── start_hermes_ui.sh        # UI launcher script
│   └── test_ui_flow.py           # UI integration tests
├── tools/                        # Tool implementations
│   ├── web_tools.py              # Web search, extract, crawl
│   ├── vision_tools.py           # Image analysis
│   ├── terminal_tool.py          # Command execution
│   ├── image_generation_tool.py
│   └── ...
├── api_endpoint/                 # FastAPI + WebSocket logging endpoint
│   ├── logging_server.py         # WebSocket server + Agent API ⭐ ENHANCED
│   ├── websocket_logger.py       # Client library
│   ├── README.md                 # API endpoint docs
│   └── ...
├── logs/                         # Log files
│   └── realtime/                 # WebSocket session logs
└── tests/                        # Test files
```
## Available Toolsets
### Basic Toolsets
- **web**: Web search, extract, and crawl
- **terminal**: Command execution
- **vision**: Image analysis
- **creative**: Image generation
- **reasoning**: Mixture of agents
### Composite Toolsets
- **research**: Web + vision tools
- **development**: Web + terminal + vision
- **analysis**: Web + vision + reasoning
- **full_stack**: All tools enabled
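
As a rough illustration of how composite toolsets could expand into the basic ones listed above, here is a minimal sketch. The `BASIC_TOOLSETS` and `COMPOSITE_TOOLSETS` names and the `resolve_toolsets` helper are hypothetical; the real definitions live in `toolsets.py` and may be organized differently. It also assumes that `full_stack` means "every basic toolset".

```python
# Hypothetical mapping that mirrors the lists above; toolsets.py may differ.
BASIC_TOOLSETS = {"web", "terminal", "vision", "creative", "reasoning"}

COMPOSITE_TOOLSETS = {
    "research": ["web", "vision"],
    "development": ["web", "terminal", "vision"],
    "analysis": ["web", "vision", "reasoning"],
    "full_stack": sorted(BASIC_TOOLSETS),  # assumption: "all tools" = all basic toolsets
}


def resolve_toolsets(spec: str) -> list[str]:
    """Expand a comma-separated toolset string into sorted basic toolset names."""
    resolved = set()
    for name in (part.strip() for part in spec.split(",")):
        if not name:
            continue
        if name in BASIC_TOOLSETS:
            resolved.add(name)
        elif name in COMPOSITE_TOOLSETS:
            resolved.update(COMPOSITE_TOOLSETS[name])
        else:
            raise ValueError(f"unknown toolset: {name}")
    return sorted(resolved)


print(resolve_toolsets("research,terminal"))  # ['terminal', 'vision', 'web']
```

Overlapping toolsets are deduplicated, so `research,terminal` resolves to the same set as `development`.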
### Usage Examples
```bash
# Research with web and vision
python run_agent.py --enabled_toolsets research --query "..."
# Development with terminal access
python run_agent.py --enabled_toolsets development --query "..."
# Combine multiple toolsets
python run_agent.py --enabled_toolsets web,vision --query "..."
```
## Real-time Logging System
Monitor your agent's execution in real time through the FastAPI WebSocket endpoint, which uses a **persistent connection pool** architecture.
### Architecture
The logging system uses a **singleton WebSocket connection** that persists across multiple agent runs:
- **No timeouts** - connection stays alive indefinitely
- **No reconnection overhead** - connect once, reuse forever
- **Parallel execution** - multiple agents share one connection
- **Production-ready** - graceful shutdown with signal handlers
See [`api_endpoint/PERSISTENT_CONNECTION_GUIDE.md`](api_endpoint/PERSISTENT_CONNECTION_GUIDE.md) for technical details.
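
The singleton idea can be sketched in a few lines. This is not the project's implementation: `FakeConnection`, `get_connection`, and the `ws://localhost:8000/ws` URL are all placeholders chosen for illustration, and the fake connection only records messages rather than opening a socket.

```python
import threading


class FakeConnection:
    """Stand-in for a WebSocket connection; records messages instead of sending them."""

    def __init__(self, url: str):
        self.url = url
        self.sent = []

    def send(self, message: str) -> None:
        self.sent.append(message)


_connection = None
_lock = threading.Lock()


def get_connection(url: str = "ws://localhost:8000/ws") -> FakeConnection:
    """Return the shared connection, creating it on first use only."""
    global _connection
    with _lock:
        if _connection is None:
            _connection = FakeConnection(url)  # hypothetical URL
        return _connection


# Two "agents" asking for a connection receive the same object.
first = get_connection()
second = get_connection()
first.send("agent 1: started")
second.send("agent 2: started")
print(first is second, len(first.sent))  # True 2
```

The lock guards only the creation step, so concurrent callers cannot end up with two separate connections.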
### Features
- Track all API calls and responses
- **Persistent connection** - one WebSocket for all sessions
- Monitor tool executions with parameters and timing
- Capture errors and completion status
- REST API for querying sessions
- Real-time WebSocket broadcasting
### Documentation
See [`api_endpoint/README.md`](api_endpoint/README.md) for complete documentation.
### Quick Start
```bash
# Start API endpoint server
python api_endpoint/logging_server.py
# Run agent with logging
python run_agent.py --enable_websocket_logging --query "..."
# View logs
curl http://localhost:8000/sessions
```
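
The same `/sessions` query can be made from Python with the standard library. The sketch below assumes the endpoint returns a JSON body, which this README does not spell out; `fetch_sessions` and its `opener` parameter are hypothetical names, and the canned payload in the offline demonstration is made up.

```python
import io
import json
import urllib.request


def fetch_sessions(base_url="http://localhost:8000", opener=urllib.request.urlopen):
    """GET <base_url>/sessions and return the decoded JSON body."""
    with opener(f"{base_url}/sessions") as response:
        return json.loads(response.read().decode("utf-8"))


# Offline demonstration with a canned response (the payload shape is invented).
def fake_opener(url):
    return io.BytesIO(b'{"sessions": []}')


print(fetch_sessions(opener=fake_opener))  # {'sessions': []}

# Against a running logging server:
# sessions = fetch_sessions()
```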
## Configuration
### Environment Variables
Create a `.env` file in the project root:
```bash
# API Keys
ANTHROPIC_API_KEY=your_key_here
FIRECRAWL_API_KEY=your_key_here
NOUS_API_KEY=your_key_here
FAL_KEY=your_key_here
# Optional
WEB_TOOLS_DEBUG=true # Enable web tools debug logging
```
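
Before a run it can help to confirm that the keys are visible to the process. The following is a minimal sketch: the key names are copied from the example above, but treating all four as mandatory is an assumption, and `missing_keys` is a hypothetical helper. It inspects the process environment only, so the `.env` file must already have been loaded by whatever mechanism the project uses.

```python
import os

# Key names from the .env example; whether each one is mandatory is an assumption.
EXPECTED_KEYS = ["ANTHROPIC_API_KEY", "FIRECRAWL_API_KEY", "NOUS_API_KEY", "FAL_KEY"]


def missing_keys(env=None):
    """Return the expected key names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in EXPECTED_KEYS if not env.get(key)]


missing = missing_keys()
if missing:
    print("Missing keys:", ", ".join(missing))
else:
    print("All expected keys are set.")
```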
### Command-Line Options
```bash
python run_agent.py --help
```
Key options:
- `--query`: Your question/task
- `--model`: Model to use (default: claude-sonnet-4-5-20250929)
- `--enabled_toolsets`: Toolsets to enable
- `--max_turns`: Maximum conversation turns
- `--enable_websocket_logging`: Enable real-time logging
- `--verbose`: Verbose debug output
- `--save_trajectories`: Save conversation trajectories
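
If you launch the agent from another Python script, the options above can be assembled programmatically. This is a sketch only: `build_command` is a hypothetical helper, and it assumes that `--enable_websocket_logging`, `--verbose`, and `--save_trajectories` are bare switches that take no value.

```python
import shlex
import sys


def build_command(query, toolsets=None, model=None, max_turns=None,
                  websocket_logging=False, verbose=False, save_trajectories=False):
    """Assemble an argv list for run_agent.py from the options listed above."""
    cmd = [sys.executable, "run_agent.py", "--query", query]
    if toolsets:
        cmd += ["--enabled_toolsets", ",".join(toolsets)]
    if model:
        cmd += ["--model", model]
    if max_turns is not None:
        cmd += ["--max_turns", str(max_turns)]
    # Assumption: the boolean options are switches without a value.
    if websocket_logging:
        cmd.append("--enable_websocket_logging")
    if verbose:
        cmd.append("--verbose")
    if save_trajectories:
        cmd.append("--save_trajectories")
    return cmd


cmd = build_command("Search for the latest AI news", toolsets=["web"], max_turns=5)
print(shlex.join(cmd[1:]))  # shell-quoted form, without the interpreter path
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when the query contains spaces or quotes.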
## Parallel Execution
The persistent connection pool enables true parallel agent execution. Multiple agents can run simultaneously, all sharing the same WebSocket connection for logging.
### Test Parallel Execution
```bash
python test_parallel_execution.py
```
This script runs three tests:
1. **Sequential** - baseline (3 queries one after another)
2. **Parallel** - 3 queries simultaneously
3. **High Concurrency** - 10 queries simultaneously
**Expected Results:**
- ⚡ ~3x speedup with parallel execution
- ✅ All queries logged to same connection
- ✅ No connection timeouts or errors
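
To see where a speedup of this kind comes from, the sketch below times three simulated queries run one after another and then concurrently. `fake_query` is a stand-in that only sleeps; it is not part of `test_parallel_execution.py`, and real agent runs will vary with model latency and rate limits.

```python
import asyncio
import time


async def fake_query(delay: float = 0.05) -> str:
    """Stand-in for one agent run; sleeps instead of calling a model."""
    await asyncio.sleep(delay)
    return "done"


async def run_sequential(n: int) -> float:
    """Await n queries one at a time and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        await fake_query()
    return time.perf_counter() - start


async def run_parallel(n: int) -> float:
    """Await n queries concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_query() for _ in range(n)))
    return time.perf_counter() - start


async def compare(n: int = 3):
    return await run_sequential(n), await run_parallel(n)


sequential, parallel = asyncio.run(compare())
print(f"sequential: {sequential:.2f}s, parallel: {parallel:.2f}s")
```

Because the simulated work is pure waiting, the concurrent run takes roughly as long as a single query, which is the effect the tests above are meant to measure.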
### Custom Parallel Code
```python
import asyncio

from run_agent import AIAgent


async def main():
    agent1 = AIAgent(enable_websocket_logging=True)
    agent2 = AIAgent(enable_websocket_logging=True)

    # Run in parallel - both use the shared connection!
    results = await asyncio.gather(
        agent1.run_conversation("Query 1"),
        agent2.run_conversation("Query 2"),
    )
    return results


asyncio.run(main())
```
## Examples
### Investment Research
```bash
python run_agent.py \
  --enabled_toolsets web \
  --query "Find publicly traded companies in renewable energy"
```
### Code Analysis
```bash
python run_agent.py \
  --enabled_toolsets development \
  --query "Analyze the codebase and suggest improvements"
```
### Image Analysis
```bash
python run_agent.py \
  --enabled_toolsets vision \
  --query "Analyze this chart and explain the trends"
```
## Development
### Adding New Tools
1. Create the tool in the `tools/` directory
2. Register it in `model_tools.py`
3. Add it to the appropriate toolset in `toolsets.py`
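
The steps above might look roughly like the following. Everything here is illustrative: `word_count_tool`, `WORD_COUNT_SCHEMA`, `TOOL_HANDLERS`, and `handle_tool_call` are hypothetical names, the schema uses the OpenAI function-calling format as an assumption, and the actual registration mechanism in `model_tools.py` may look quite different.

```python
import json


# Step 1 (hypothetical tool): a real one would live in its own module under tools/.
def word_count_tool(text: str) -> str:
    """Return a JSON string with the number of whitespace-separated words."""
    return json.dumps({"word_count": len(text.split())})


# Step 2: a schema in the OpenAI function-calling format (assumed, not confirmed here).
WORD_COUNT_SCHEMA = {
    "type": "function",
    "function": {
        "name": "word_count",
        "description": "Count the words in a piece of text.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string", "description": "Text to analyze"}},
            "required": ["text"],
        },
    },
}

# Hypothetical registry standing in for whatever model_tools.py actually uses.
TOOL_HANDLERS = {"word_count": word_count_tool}


def handle_tool_call(name: str, arguments: dict) -> str:
    """Dispatch a tool call by name; unknown names yield a JSON error payload."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return handler(**arguments)


print(handle_tool_call("word_count", {"text": "hello agent world"}))  # {"word_count": 3}
```

Step 3 would then add the tool's name to one of the toolsets in `toolsets.py` so that `--enabled_toolsets` can select it.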
### Running Tests
```bash
# Test web tools
python tests/test_web_tools.py
# Test API endpoint / logging
cd api_endpoint
./test_websocket_logging.sh
```
## License
MIT License - see LICENSE file for details
## Contributing
Contributions welcome! Please open an issue or PR.
## Support
For questions or issues:
1. Check documentation in `api_endpoint/`
2. Review example usage in this README
3. Open a GitHub issue
---
Built with ❤️ for advanced AI agent workflows