
# Hermes Agent

An AI agent with tool calling, real-time logging, and extensible toolsets.

## Features

- 🤖 **Multi-model Support**: Works with Claude, GPT-4, and other OpenAI-compatible models
- 🔧 **Rich Tool Library**: Web search, content extraction, vision analysis, terminal execution, and more
- 📊 **Real-time Logging**: WebSocket-based logging system for monitoring agent execution
- 🖥️ **Desktop UI**: Modern PySide6 frontend with real-time event streaming
- 🎯 **Flexible Toolsets**: Predefined toolset combinations for different use cases
- 💾 **Trajectory Saving**: Save conversation flows for training and analysis
- 🔄 **Auto-retry**: Built-in error handling and retry logic

## Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Basic Usage

```bash
python run_agent.py \
  --enabled_toolsets web \
  --query "Search for the latest AI news"
```

### With Real-time Logging

```bash
# Terminal 1: Start API endpoint server
python api_endpoint/logging_server.py

# Terminal 2: Run agent
python run_agent.py \
  --enabled_toolsets web \
  --enable_websocket_logging \
  --query "Your question here"
```

### With Desktop UI (Recommended)

The easiest way to use Hermes Agent is through the desktop UI:

```bash
# One-command launch (starts server + UI)
cd ui && ./start_hermes_ui.sh

# Or manually:
# Terminal 1: Start server
python api_endpoint/logging_server.py

# Terminal 2: Start UI
python ui/hermes_ui.py
```

The UI provides:
- 🖱️ Point-and-click query submission
- 🎛️ Easy model and tool selection
- 📊 Real-time event visualization
- 🔄 Automatic WebSocket connection
- 📝 Session history

## Project Structure

```
Hermes-Agent/
├── run_agent.py              # Main agent runner
├── model_tools.py            # Tool definitions and handling
├── toolsets.py               # Predefined toolset combinations
├── requirements.txt          # Python dependencies
│
├── ui/                      # Desktop UI ⭐ NEW
│   ├── hermes_ui.py         # PySide6 desktop application
│   ├── start_hermes_ui.sh   # UI launcher script
│   └── test_ui_flow.py      # UI integration tests
│
├── tools/                    # Tool implementations
│   ├── web_tools.py         # Web search, extract, crawl
│   ├── vision_tools.py      # Image analysis
│   ├── terminal_tool.py     # Command execution
│   ├── image_generation_tool.py
│   └── ...
│
├── api_endpoint/            # FastAPI + WebSocket logging endpoint
│   ├── logging_server.py    # WebSocket server + Agent API ⭐ ENHANCED
│   ├── websocket_logger.py  # Client library
│   ├── README.md           # API endpoint docs
│   └── ...
│
├── logs/                    # Log files
│   └── realtime/           # WebSocket session logs
│
└── tests/                   # Test files
```

## Available Toolsets

### Basic Toolsets
- **web**: Web search, extract, and crawl
- **terminal**: Command execution
- **vision**: Image analysis
- **creative**: Image generation
- **reasoning**: Mixture of agents

### Composite Toolsets
- **research**: Web + vision tools
- **development**: Web + terminal + vision
- **analysis**: Web + vision + reasoning
- **full_stack**: All tools enabled
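
To make the composition concrete, here is a minimal sketch of how a toolset spec could expand into basic toolsets. The names `BASIC_TOOLSETS`, `COMPOSITE_TOOLSETS`, and `resolve_toolsets` are hypothetical and chosen for illustration; the actual definitions live in `toolsets.py` and may be organized differently. Mapping `full_stack` to every basic toolset is an assumption based on the description above.

```python
# Illustrative only: hypothetical names, not the actual contents of toolsets.py.
BASIC_TOOLSETS = {"web", "terminal", "vision", "creative", "reasoning"}

COMPOSITE_TOOLSETS = {
    "research": ["web", "vision"],
    "development": ["web", "terminal", "vision"],
    "analysis": ["web", "vision", "reasoning"],
    # Assumption: "all tools enabled" means every basic toolset.
    "full_stack": sorted(BASIC_TOOLSETS),
}


def resolve_toolsets(spec: str) -> list[str]:
    """Expand a comma-separated spec into a de-duplicated list of basic toolsets."""
    resolved: list[str] = []
    for name in (part.strip() for part in spec.split(",")):
        if not name:
            continue
        if name in BASIC_TOOLSETS:
            members = [name]
        elif name in COMPOSITE_TOOLSETS:
            members = COMPOSITE_TOOLSETS[name]
        else:
            raise ValueError(f"Unknown toolset: {name!r}")
        for member in members:
            if member not in resolved:
                resolved.append(member)
    return resolved
```

With this sketch, `resolve_toolsets("development,analysis")` yields each basic toolset once, in first-seen order, even though both composites include `web` and `vision`.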

### Usage Examples

```bash
# Research with web and vision
python run_agent.py --enabled_toolsets research --query "..."

# Development with terminal access
python run_agent.py --enabled_toolsets development --query "..."

# Combine multiple toolsets
python run_agent.py --enabled_toolsets web,vision --query "..."
```

## Real-time Logging System

Monitor your agent's execution in real time through the FastAPI WebSocket endpoint, which is built around a **persistent connection** that is shared across sessions.

### Architecture

The logging system uses a **singleton WebSocket connection** that persists across multiple agent runs:
- ✅ **No timeouts** - the connection is kept alive between agent runs
- ✅ **No reconnection overhead** - connect once, then reuse the connection for later runs
- ✅ **Parallel execution** - multiple agents share one connection
- ✅ **Production-ready** - graceful shutdown with signal handlers

See [`api_endpoint/PERSISTENT_CONNECTION_GUIDE.md`](api_endpoint/PERSISTENT_CONNECTION_GUIDE.md) for technical details.
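
The singleton idea can be sketched in a few lines. This is not the code in `api_endpoint/websocket_logger.py`; `FakeConnection`, `get_shared_connection`, and the `ws://localhost:8000/ws` URL are assumptions made so the example runs without a server.

```python
import threading


class FakeConnection:
    """Stand-in for a real WebSocket client so the sketch runs without a server."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.sent: list[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


_connection = None
_lock = threading.Lock()


def get_shared_connection(url: str = "ws://localhost:8000/ws") -> FakeConnection:
    """Create the connection on first use, then hand back the same object."""
    global _connection
    with _lock:  # guard against two threads creating a connection at once
        if _connection is None:
            _connection = FakeConnection(url)
        return _connection
```

Every caller receives the same object, so agents running in the same process log through one connection instead of opening their own.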

### Features
- Track all API calls and responses
- **Persistent connection** - one WebSocket for all sessions
- Monitor tool executions with parameters and timing
- Capture errors and completion status
- REST API for querying sessions
- Real-time WebSocket broadcasting
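
As a rough illustration of what "tool executions with parameters and timing" could look like, the sketch below wraps a tool function and records one event per call. The event field names, the `EVENTS` list, and the `log_tool_call` decorator are hypothetical; the real logger may use a different schema and sends events to the server instead of a local list.

```python
import functools
import time

EVENTS: list[dict] = []  # a real setup would send these to the logging server


def log_tool_call(func):
    """Record the tool name, keyword parameters, duration, and status of each call."""

    @functools.wraps(func)
    def wrapper(**params):
        start = time.perf_counter()
        event = {"tool": func.__name__, "params": params}
        try:
            result = func(**params)
            event["status"] = "ok"
            return result
        except Exception as exc:
            event["status"] = "error"
            event["error"] = str(exc)
            raise
        finally:
            # Runs on both success and failure, so every call is timed.
            event["duration_s"] = time.perf_counter() - start
            EVENTS.append(event)

    return wrapper


@log_tool_call
def web_search(query: str) -> str:
    # Placeholder tool body for the sketch.
    return f"results for {query}"
```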

### Documentation
See [`api_endpoint/README.md`](api_endpoint/README.md) for complete documentation.

### Quick Start
```bash
# Start API endpoint server
python api_endpoint/logging_server.py

# Run agent with logging
python run_agent.py --enable_websocket_logging --query "..."

# View logs
curl http://localhost:8000/sessions
```

## Configuration

### Environment Variables

Create a `.env` file in the project root:

```bash
# API Keys
ANTHROPIC_API_KEY=your_key_here
FIRECRAWL_API_KEY=your_key_here
NOUS_API_KEY=your_key_here
FAL_KEY=your_key_here

# Optional
WEB_TOOLS_DEBUG=true  # Enable web tools debug logging
```
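
A quick way to catch a missing key before a run is to parse the file and check for the variables listed above. The helper names below are hypothetical, and the parser is a deliberately simplified `.env` reader (it treats everything after `#` as a comment, so it would mishandle values that contain `#`). Which keys are actually required for a given toolset is not stated here, so the sketch simply reports any that are absent or empty.

```python
API_KEYS = ["ANTHROPIC_API_KEY", "FIRECRAWL_API_KEY", "NOUS_API_KEY", "FAL_KEY"]


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop full-line and inline comments
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the API keys that are absent or empty, in the order listed above."""
    return [key for key in API_KEYS if not env.get(key)]
```

For example, `missing_keys(parse_env_text(open(".env").read()))` returns an empty list when all four keys are set.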

### Command-Line Options

```bash
python run_agent.py --help
```

Key options:
- `--query`: Your question/task
- `--model`: Model to use (default: claude-sonnet-4-5-20250929)
- `--enabled_toolsets`: Toolsets to enable
- `--max_turns`: Maximum conversation turns
- `--enable_websocket_logging`: Enable real-time logging
- `--verbose`: Verbose debug output
- `--save_trajectories`: Save conversation trajectories
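
For readers scripting around the CLI, the options above can be mirrored with `argparse` as follows. This is an illustrative reconstruction, not the parser in `run_agent.py`, which may use a different library, types, or defaults. Treating the three switches as boolean flags and leaving `--max_turns` without a default are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the run_agent.py options")
    parser.add_argument("--query", help="Your question/task")
    parser.add_argument("--model", default="claude-sonnet-4-5-20250929", help="Model to use")
    parser.add_argument("--enabled_toolsets", default="", help="Comma-separated toolsets")
    # Assumption: no default is documented, so None means "use the agent's own limit".
    parser.add_argument("--max_turns", type=int, default=None, help="Maximum conversation turns")
    parser.add_argument("--enable_websocket_logging", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--save_trajectories", action="store_true")
    return parser
```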

## Parallel Execution

Because the logging connection is persistent and shared, multiple agents can run simultaneously and log through the same WebSocket connection.

### Test Parallel Execution

```bash
python test_parallel_execution.py
```

This script runs three tests:
1. **Sequential** - baseline (3 queries one after another)
2. **Parallel** - 3 queries simultaneously
3. **High Concurrency** - 10 queries simultaneously

**Expected Results:**
- ⚡ ~3x speedup for the 3-query parallel test relative to the sequential baseline
- ✅ All queries logged over the same connection
- ✅ No connection timeouts or errors
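
The speedup comes from overlapping I/O waits: while one query waits on a network response, the others can proceed. The sketch below reproduces that effect with `asyncio.sleep` standing in for an agent run. `fake_query` and the other helpers are hypothetical and do not call any model; real speedups depend on API latency and rate limits.

```python
import asyncio
import time


async def fake_query(name: str, delay: float = 0.05) -> str:
    """Stand-in for an agent run; sleeps instead of calling a model."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_sequential(names: list[str]) -> float:
    """Run the queries one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for name in names:
        await fake_query(name)
    return time.perf_counter() - start


async def run_parallel(names: list[str]) -> float:
    """Run the queries concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_query(name) for name in names))
    return time.perf_counter() - start


def compare(names: list[str]) -> tuple[float, float]:
    """Return (sequential_seconds, parallel_seconds) for the same queries."""
    sequential = asyncio.run(run_sequential(names))
    parallel = asyncio.run(run_parallel(names))
    return sequential, parallel
```

With three queries, the sequential time is roughly the sum of the delays, while the parallel time is close to a single delay.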

### Custom Parallel Code

```python
import asyncio
from run_agent import AIAgent

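async def main():
    # Each agent logs through the shared WebSocket connection.
    agent1 = AIAgent(enable_websocket_logging=True)
    agent2 = AIAgent(enable_websocket_logging=True)

    # Run both conversations concurrently. This assumes run_conversation
    # is a coroutine; asyncio.gather returns results in argument order.
    results = await asyncio.gather(
        agent1.run_conversation("Query 1"),
        agent2.run_conversation("Query 2"),
    )
    for result in results:
        print(result)

asyncio.run(main())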
```
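
The snippet above awaits `run_conversation` directly, which only works if it is a coroutine. If an agent call is blocking instead, one option is to push each call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below uses `blocking_run`, a hypothetical stand-in for a synchronous agent call, so it runs without any API keys.

```python
import asyncio
import time


def blocking_run(query: str) -> str:
    """Hypothetical stand-in for a synchronous agent call."""
    time.sleep(0.05)
    return f"answer to {query}"


async def run_all(queries: list[str]) -> list[str]:
    # Each blocking call runs in a worker thread, so the waits overlap.
    tasks = [asyncio.to_thread(blocking_run, query) for query in queries]
    # gather returns results in the same order as the input queries.
    return await asyncio.gather(*tasks)


def main() -> list[str]:
    return asyncio.run(run_all(["Query 1", "Query 2"]))
```

Threads suit this case because the work is dominated by waiting on network responses rather than CPU time.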

## Examples

### Investment Research
```bash
python run_agent.py \
  --enabled_toolsets web \
  --query "Find publicly traded companies in renewable energy"
```

### Code Analysis
```bash
python run_agent.py \
  --enabled_toolsets development \
  --query "Analyze the codebase and suggest improvements"
```

### Image Analysis
```bash
python run_agent.py \
  --enabled_toolsets vision \
  --query "Analyze this chart and explain the trends"
```

## Development

### Adding New Tools

1. Create tool in `tools/` directory
2. Register in `model_tools.py`
3. Add to appropriate toolset in `toolsets.py`
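
The three steps can be sketched end to end as follows. Everything here is hypothetical: the `word_count` tool, the `text_utils` toolset, the registry dictionaries, and the `register_tool` and `dispatch` helpers are invented for illustration, and the OpenAI-style function schema is an assumption about how tools are described to the model. Check `model_tools.py` and `toolsets.py` for the conventions the project actually uses.

```python
import json

# Hypothetical registries; the real modules may be organized differently.
TOOL_DEFINITIONS: dict[str, dict] = {}
TOOL_HANDLERS: dict = {}
TOOLSETS: dict[str, list[str]] = {}


def word_count(text: str) -> str:
    """Step 1: the tool itself (would live in tools/)."""
    return json.dumps({"words": len(text.split())})


def register_tool(name: str, description: str, parameters: dict, handler) -> None:
    """Step 2: record an OpenAI-style function schema and its handler."""
    TOOL_DEFINITIONS[name] = {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": parameters},
    }
    TOOL_HANDLERS[name] = handler


register_tool(
    "word_count",
    "Count the words in a piece of text.",
    {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
    word_count,
)

# Step 3: expose the tool through a toolset.
TOOLSETS.setdefault("text_utils", []).append("word_count")


def dispatch(name: str, arguments: str) -> str:
    """Route a model tool call (JSON-encoded arguments) to its handler."""
    return TOOL_HANDLERS[name](**json.loads(arguments))
```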

### Running Tests

```bash
# Test web tools
python tests/test_web_tools.py

# Test API endpoint / logging
cd api_endpoint
./test_websocket_logging.sh
```

## License

MIT License - see LICENSE file for details

## Contributing

Contributions welcome! Please open an issue or PR.

## Support

For questions or issues:
1. Check documentation in `api_endpoint/`
2. Review example usage in this README
3. Open a GitHub issue

---

Built with ❤️ for advanced AI agent workflows