mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-01 01:51:44 +00:00
- Added detailed descriptions for new skills categories: Machine Learning Operations and Note Taking. - Introduced a new Obsidian skill with commands for reading, listing, searching, creating, and appending notes. - Enhanced the skills tool to load and display category descriptions from DESCRIPTION.md files, improving user guidance and discovery of available skills.
360 lines
13 KiB
Markdown
360 lines
13 KiB
Markdown
# Hermes Agent - Future Improvements
|
|
|
|
> Ideas for enhancing the agent's capabilities, generated from self-analysis of the codebase.
|
|
|
|
---
|
|
|
|
## 🚨 HIGH PRIORITY - Immediate Fixes
|
|
|
|
These items need to be addressed ASAP:
|
|
|
|
### 1. SUDO Breaking Terminal Tool 🔐
|
|
- [ ] **Problem:** SUDO commands break the terminal tool execution
|
|
- [ ] **Fix:** Handle password prompts / TTY requirements gracefully
|
|
- [ ] **Options:**
|
|
- Configure passwordless sudo for specific commands
|
|
- Detect sudo and warn user / request alternative approach
|
|
- Use `sudo -S` with stdin handling if password can be provided securely
|
|
|
|
### 2. Fix `browser_get_images` Tool 🖼️
|
|
- [ ] **Problem:** `browser_get_images` tool is broken/not working correctly
|
|
- [ ] **Debug:** Investigate what's failing - selector issues? async timing?
|
|
- [ ] **Fix:** Ensure it properly extracts image URLs and alt text from pages
|
|
|
|
### 3. Better Action Logging for Debugging 📝
|
|
- [ ] **Problem:** Need better logging of agent actions for debugging
|
|
- [ ] **Implementation:**
|
|
- Log all tool calls with inputs/outputs
|
|
- Timestamps for each action
|
|
- Structured log format (JSON?) for easy parsing
|
|
- Log levels (DEBUG, INFO, ERROR)
|
|
- Option to write to file vs stdout
|
|
|
|
### 4. Stream Thinking Summaries in Real-Time 💭
|
|
- [ ] **Problem:** Thinking/reasoning summaries not shown while streaming
|
|
- [ ] **Implementation:**
|
|
- Use streaming API to show thinking summaries as they're generated
|
|
- Display intermediate reasoning before final response
|
|
- Let user see the agent "thinking" in real-time
|
|
|
|
---
|
|
|
|
## 1. Context Management
|
|
|
|
**Problem:** Context grows unbounded during long conversations. Trajectory compression exists for training data post-hoc, but live conversations lack intelligent context management.
|
|
|
|
**Ideas:**
|
|
- [ ] **Incremental summarization** - Compress old tool outputs on-the-fly during conversations
|
|
- Trigger when context exceeds threshold (e.g., 80% of max tokens)
|
|
- Preserve recent turns fully, summarize older tool responses
|
|
- Could reuse logic from `trajectory_compressor.py`
|
|
|
|
- [ ] **Semantic memory retrieval** - Vector store for long conversation recall
|
|
- Embed important facts/findings as conversation progresses
|
|
- Retrieve relevant memories when needed instead of keeping everything in context
|
|
- Consider lightweight solutions: ChromaDB, FAISS, or even a simple embedding cache
|
|
|
|
- [ ] **Working vs. episodic memory** distinction
|
|
- Working memory: Current task state, recent tool results (always in context)
|
|
- Episodic memory: Past findings, tried approaches (retrieved on demand)
|
|
- Clear eviction policies for each
|
|
|
|
**Files to modify:** `run_agent.py` (add memory manager), possibly new `tools/memory_tool.py`
|
|
|
|
---
|
|
|
|
## 2. Self-Reflection & Course Correction 🔄
|
|
|
|
**Problem:** Current retry logic handles malformed outputs but not semantic failures. Agent doesn't reason about *why* something failed.
|
|
|
|
**Ideas:**
|
|
- [ ] **Meta-reasoning after failures** - When a tool returns an error or unexpected result:
|
|
```
|
|
Tool failed → Reflect: "Why did this fail? What assumptions were wrong?"
|
|
→ Adjust approach → Retry with new strategy
|
|
```
|
|
- Could be a lightweight LLM call or structured self-prompt
|
|
|
|
- [ ] **Planning/replanning module** - For complex multi-step tasks:
|
|
- Generate plan before execution
|
|
- After each step, evaluate: "Am I on track? Should I revise the plan?"
|
|
- Store plan in working memory, update as needed
|
|
|
|
- [ ] **Approach memory** - Remember what didn't work:
|
|
- "I tried X for this type of problem and it failed because Y"
|
|
- Prevents repeating failed strategies in the same conversation
|
|
|
|
**Files to modify:** `run_agent.py` (add reflection hooks in tool loop), new `tools/reflection_tool.py`
|
|
|
|
---
|
|
|
|
## 3. Tool Composition & Learning 🔧
|
|
|
|
**Problem:** Tools are atomic. Complex tasks require repeated manual orchestration of the same tool sequences.
|
|
|
|
**Ideas:**
|
|
- [ ] **Macro tools / Tool chains** - Define reusable tool sequences:
|
|
```yaml
|
|
research_topic:
|
|
description: "Deep research on a topic"
|
|
steps:
|
|
- web_search: {query: "$topic"}
|
|
- web_extract: {urls: "$search_results.urls[:3]"}
|
|
- summarize: {content: "$extracted"}
|
|
```
|
|
- Could be defined in skills or a new `macros/` directory
|
|
- Agent can invoke macro as single tool call
|
|
|
|
- [ ] **Tool failure patterns** - Learn from failures:
|
|
- Track: tool, input pattern, error type, what worked instead
|
|
- Before calling a tool, check: "Has this pattern failed before?"
|
|
- Persistent across sessions (stored in skills or separate DB)
|
|
|
|
- [ ] **Parallel tool execution** - When tools are independent, run concurrently:
|
|
- Detect independence (no data dependencies between calls)
|
|
- Use `asyncio.gather()` for parallel execution
|
|
- Already have async support in some tools, just need orchestration
|
|
|
|
**Files to modify:** `model_tools.py`, `toolsets.py`, new `tool_macros.py`
|
|
|
|
---
|
|
|
|
## 4. Dynamic Skills Expansion 📚
|
|
|
|
**Problem:** Skills system is elegant but static. Skills must be manually created and added.
|
|
|
|
**Ideas:**
|
|
- [ ] **Skill acquisition from successful tasks** - After completing a complex task:
|
|
- "This approach worked well. Save as a skill?"
|
|
- Extract: goal, steps taken, tools used, key decisions
|
|
- Generate SKILL.md automatically
|
|
- Store in user's skills directory
|
|
|
|
- [ ] **Skill templates** - Common patterns that can be parameterized:
|
|
```markdown
|
|
# Debug {language} Error
|
|
1. Reproduce the error
|
|
2. Search for error message: `web_search("{error_message} {language}")`
|
|
3. Check common causes: {common_causes}
|
|
4. Apply fix and verify
|
|
```
|
|
|
|
- [ ] **Skill chaining** - Combine skills for complex workflows:
|
|
- Skills can reference other skills as dependencies
|
|
- "To do X, first apply skill Y, then skill Z"
|
|
- Directed graph of skill dependencies
|
|
|
|
**Files to modify:** `tools/skills_tool.py`, `skills/` directory structure, new `skill_generator.py`
|
|
|
|
---
|
|
|
|
## 5. Task Continuation Hints 🎯
|
|
|
|
**Problem:** Could be more helpful by suggesting logical next steps.
|
|
|
|
**Ideas:**
|
|
- [ ] **Suggest next steps** - At end of a task, suggest logical continuations:
|
|
- "Code is written. Want me to also write tests / docs / deploy?"
|
|
- Based on common workflows for task type
|
|
- Non-intrusive, just offer options
|
|
|
|
**Files to modify:** `run_agent.py`, response generation logic
|
|
|
|
---
|
|
|
|
## 6. Uncertainty & Honesty Calibration 🎚️
|
|
|
|
**Problem:** Sometimes confidently wrong. Should be better calibrated about what I know vs. don't know.
|
|
|
|
**Ideas:**
|
|
- [ ] **Source attribution** - Track where information came from:
|
|
- "According to the docs I just fetched..." vs "From my training data (may be outdated)..."
|
|
- Let user assess reliability themselves
|
|
|
|
- [ ] **Cross-reference high-stakes claims** - Self-check for made-up details:
|
|
- When stakes are high, verify with tools before presenting as fact
|
|
- "Let me verify that before you act on it..."
|
|
|
|
**Files to modify:** `run_agent.py`, response generation logic
|
|
|
|
---
|
|
|
|
## 7. Resource Awareness & Efficiency 💰
|
|
|
|
**Problem:** No awareness of costs, time, or resource usage. Could be smarter about efficiency.
|
|
|
|
**Ideas:**
|
|
- [ ] **Tool result caching** - Don't repeat identical operations:
|
|
- Cache web searches, extractions within a session
|
|
- Invalidation based on time-sensitivity of query
|
|
- Hash-based lookup: same input → cached output
|
|
|
|
- [ ] **Lazy evaluation** - Don't fetch everything upfront:
|
|
- Get summaries first, full content only if needed
|
|
- "I found 5 relevant pages. Want me to deep-dive on any?"
|
|
|
|
**Files to modify:** `model_tools.py`, new `resource_tracker.py`
|
|
|
|
---
|
|
|
|
## 8. Collaborative Problem Solving 🤝
|
|
|
|
**Problem:** Interaction is command/response. Complex problems benefit from dialogue.
|
|
|
|
**Ideas:**
|
|
- [ ] **Assumption surfacing** - Make implicit assumptions explicit:
|
|
- "I'm assuming you want Python 3.11+. Correct?"
|
|
- "This solution assumes you have sudo access..."
|
|
- Let user correct before going down wrong path
|
|
|
|
- [ ] **Checkpoint & confirm** - For high-stakes operations:
|
|
- "About to delete 47 files. Here's the list - proceed?"
|
|
- "This will modify your database. Want a backup first?"
|
|
- Configurable threshold for when to ask
|
|
|
|
**Files to modify:** `run_agent.py`, system prompt configuration
|
|
|
|
---
|
|
|
|
## 9. Project-Local Context 💾
|
|
|
|
**Problem:** Valuable context lost between sessions.
|
|
|
|
**Ideas:**
|
|
- [ ] **Project awareness** - Remember project-specific context:
|
|
- Store `.hermes/context.md` in project directory
|
|
- "This is a Django project using PostgreSQL"
|
|
- Coding style preferences, deployment setup, etc.
|
|
- Load automatically when working in that directory
|
|
|
|
- [ ] **Handoff notes** - Leave notes for future sessions:
|
|
- Write to `.hermes/notes.md` in project
|
|
- "TODO for next session: finish implementing X"
|
|
- "Known issues: Y doesn't work on Windows"
|
|
|
|
**Files to modify:** New `project_context.py`, auto-load in `run_agent.py`
|
|
|
|
---
|
|
|
|
## 10. Graceful Degradation & Robustness 🛡️
|
|
|
|
**Problem:** When things go wrong, recovery is limited. Should fail gracefully.
|
|
|
|
**Ideas:**
|
|
- [ ] **Fallback chains** - When primary approach fails, have backups:
|
|
- `web_extract` fails → try `browser_navigate` → try `web_search` for cached version
|
|
- Define fallback order per tool type
|
|
|
|
- [ ] **Partial progress preservation** - Don't lose work on failure:
|
|
- Long task fails midway → save what we've got
|
|
- "I completed 3/5 steps before the error. Here's what I have..."
|
|
|
|
- [ ] **Self-healing** - Detect and recover from bad states:
|
|
- Browser stuck → close and retry
|
|
- Terminal hung → timeout and reset
|
|
|
|
**Files to modify:** `model_tools.py`, tool implementations, new `fallback_manager.py`
|
|
|
|
---
|
|
|
|
## 11. Tools & Skills Wishlist 🧰
|
|
|
|
*Things that would need new tool implementations (can't do well with current tools):*
|
|
|
|
### High-Impact
|
|
|
|
- [ ] **Audio/Video Transcription** 🎬
|
|
- Transcribe audio files, podcasts, YouTube videos
|
|
- Extract key moments from video
|
|
- Currently blind to multimedia content
|
|
- *Could potentially use whisper via terminal, but native tool would be cleaner*
|
|
|
|
- [ ] **Diagram Rendering** 📊
|
|
- Render Mermaid/PlantUML to actual images
|
|
- Can generate the code, but rendering requires external service or tool
|
|
- "Show me how these components connect" → actual visual diagram
|
|
|
|
### Medium-Impact
|
|
|
|
- [ ] **Canvas / Visual Workspace** 🖼️
|
|
- Agent-controlled visual panel for rendering interactive UI
|
|
- Inspired by OpenClaw's Canvas feature
|
|
- **Capabilities:**
|
|
- `present` / `hide` - Show/hide the canvas panel
|
|
- `navigate` - Load HTML files or URLs into the canvas
|
|
- `eval` - Execute JavaScript in the canvas context
|
|
- `snapshot` - Capture the rendered UI as an image
|
|
- **Use cases:**
|
|
- Display generated HTML/CSS/JS previews
|
|
- Show interactive data visualizations (charts, graphs)
|
|
- Render diagrams (Mermaid → rendered output)
|
|
- Present structured information in rich format
|
|
- A2UI-style component system for structured agent UI
|
|
- **Implementation options:**
|
|
- Electron-based panel for CLI
|
|
- WebSocket-connected web app
|
|
- VS Code webview extension
|
|
- *Would let agent "show" things rather than just describe them*
|
|
|
|
- [ ] **Document Generation** 📄
|
|
- Create styled PDFs, Word docs, presentations
|
|
- *Can do basic PDF via terminal tools, but limited*
|
|
|
|
- [ ] **Diff/Patch Tool** 📝
|
|
- Surgical code modifications with preview
|
|
- "Change line 45-50 to X" without rewriting whole file
|
|
- Show diffs before applying
|
|
- *Can use `diff`/`patch` but a native tool would be safer*
|
|
|
|
### Skills to Create
|
|
|
|
- [ ] **Domain-specific skill packs:**
|
|
- DevOps/Infrastructure (Terraform, K8s, AWS)
|
|
- Data Science workflows (EDA, model training)
|
|
- Security/pentesting procedures
|
|
|
|
- [ ] **Framework-specific skills:**
|
|
- React/Vue/Angular patterns
|
|
- Django/Rails/Express conventions
|
|
- Database optimization playbooks
|
|
|
|
- [ ] **Troubleshooting flowcharts:**
|
|
- "Docker container won't start" → decision tree
|
|
- "Production is slow" → systematic diagnosis
|
|
|
|
---
|
|
|
|
## Priority Order (Suggested)
|
|
|
|
1. **Memory & Context Management** - Biggest impact on complex tasks
|
|
2. **Self-Reflection** - Improves reliability and reduces wasted tool calls
|
|
3. **Project-Local Context** - Practical win, keeps useful info across sessions
|
|
4. **Tool Composition** - Quality of life, builds on other improvements
|
|
5. **Dynamic Skills** - Force multiplier for repeated tasks
|
|
|
|
---
|
|
|
|
## Removed Items (Unrealistic)
|
|
|
|
The following were removed because they're architecturally impossible:
|
|
|
|
- ~~Proactive suggestions / Prefetching~~ - Agent only runs on user request, can't interject
|
|
- ~~Session save/restore across conversations~~ - Agent doesn't control session persistence
|
|
- ~~User preference learning across sessions~~ - Same issue
|
|
- ~~Clipboard integration~~ - No access to user's local system clipboard
|
|
- ~~Voice/TTS playback~~ - Can generate audio but can't play it to user
|
|
- ~~Set reminders~~ - No persistent background execution
|
|
|
|
The following were removed because they're **already possible**:
|
|
|
|
- ~~HTTP/API Client~~ → Use `curl` or Python `requests` in terminal
|
|
- ~~Structured Data Manipulation~~ → Use `pandas` in terminal
|
|
- ~~Git-Native Operations~~ → Use `git` CLI in terminal
|
|
- ~~Symbolic Math~~ → Use `SymPy` in terminal
|
|
- ~~Code Quality Tools~~ → Run linters (`eslint`, `black`, `mypy`) in terminal
|
|
- ~~Testing Framework~~ → Run `pytest`, `jest`, etc. in terminal
|
|
- ~~Translation~~ → LLM handles this fine, or use translation APIs
|
|
|
|
---
|
|
|
|
*Last updated: $(date +%Y-%m-%d)* 🤖
|