mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-26 06:01:49 +00:00)
feat(skills): add autoresearch skill for autonomous research orchestration
Add a new research skill that enables continuous, self-directed research with a two-loop architecture:
- Inner loop: Rapid experiment iteration with measurable outcomes
- Outer loop: Periodic synthesis, pattern discovery, and direction setting

Features:
- Research workspace templates (state, findings, log)
- Example project (LoRA rank study)
- Configuration options for loop intervals and auto-commit
- Integration with existing research skills (arxiv, paper-writing)

Updated research/DESCRIPTION.md to include autoresearch in the skill overview.
parent 939d2b37d1
commit 762b582704
5 changed files with 482 additions and 1 deletion
@@ -1,3 +1,31 @@
---
-description: Skills for academic research, paper discovery, literature review, domain reconnaissance, market data, content monitoring, and scientific knowledge retrieval.
+description: Skills for academic research, paper discovery, literature review, domain reconnaissance, market data, content monitoring, and scientific knowledge retrieval. Includes autoresearch for autonomous, continuous research with iterative experimentation.
---

## Skill Overview

| Skill | Purpose | Best For |
|-------|---------|----------|
| `autoresearch` | Autonomous research orchestration | Continuous experimentation, hypothesis testing, benchmark optimization |
| `arxiv` | Search academic papers | Literature surveys, paper discovery |
| `research-paper-writing` | Write publication-ready papers | Final paper generation |
| `blogwatcher` | Monitor research blogs | Staying current with new developments |
| `llm-wiki` | LLM knowledge base | Quick reference on models and techniques |
| `polymarket` | Prediction market data | Research on market trends and predictions |

## Getting Started with Research

**Quick literature search:**
```
/arxiv "transformer attention mechanisms"
```

**Start autonomous research:**
```
/autoresearch "Does LoRA rank affect convergence speed?"
```

**Write a paper:**
```
/research-paper-writing
```

200
skills/research/autoresearch/SKILL.md
Normal file
@@ -0,0 +1,200 @@
---
name: autoresearch
description: Autonomous research orchestration for AI coding agents. Run continuous, self-directed research with a two-loop architecture — rapid inner-loop experiments and periodic outer-loop synthesis. Ideal for literature surveys, hypothesis testing, benchmark optimization, and iterative discovery. No human hand-holding required.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [Research, Autonomous, Experiments, ML, AI, Literature, Hypothesis, Benchmark, Optimization]
    related_skills: [arxiv, research-paper-writing, web-search, notebooklm]
    config:
      autoresearch.loop_interval_minutes:
        description: "Interval between autonomous research loops (in minutes)"
        default: 20
      autoresearch.max_iterations:
        description: "Maximum number of inner-loop experiments before forced reflection"
        default: 10
      autoresearch.auto_commit:
        description: "Automatically git-commit research milestones"
        default: true
---

# Autoresearch

**Autonomous research orchestration for AI coding agents.**

Run continuous, self-directed research with a two-loop architecture:
- **Inner Loop**: Rapid experiment iteration with clear measurable outcomes
- **Outer Loop**: Periodic synthesis, pattern discovery, and direction setting

Ideal for literature surveys, hypothesis testing, benchmark optimization, mechanistic interpretability studies, and any research requiring iterative experimentation.

## When to Use

| Scenario | Use Autoresearch? |
|----------|-------------------|
| "I want to explore X and see what works" | ✅ Yes |
| "Does technique Y improve metric Z?" | ✅ Yes |
| "What's the state of the art for problem W?" | ✅ Yes (bootstrap + literature) |
| "Train a model with specific hyperparameters" | ❌ Use domain skills directly |
| "Run a single evaluation" | ❌ Use evaluation skills directly |

## Quick Start

```bash
# Start a research project
/autoresearch "Does LoRA rank affect convergence speed on small datasets?"

# Or with the research tool
research_init(project="lora-rank-study", question="Does LoRA rank affect convergence speed?")
```

## The Two-Loop Architecture

```
BOOTSTRAP (once)
↓
INNER LOOP (fast, repeating) → Run experiments → Measure → Record → Learn
↓ (every N experiments or when stuck)
OUTER LOOP (reflective) → Synthesize → New hypotheses → Decide direction
↓
CONCLUDE → Write findings → Generate report
```

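The control flow in the diagram can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's implementation: `two_loop`, `run_experiment`, and `reflect` are hypothetical names, `max_iterations` mirrors the `autoresearch.max_iterations` setting, and "stuck" is approximated here as "no hypotheses left in the queue".

```python
from typing import Callable, List


def two_loop(
    hypotheses: List[str],
    run_experiment: Callable[[str], float],
    reflect: Callable[[List[float]], str],
    max_iterations: int = 10,
) -> List[str]:
    """Run inner-loop experiments and reflect every `max_iterations` runs.

    Returns the ordered event trace so the control flow can be inspected.
    """
    trace = ["bootstrap"]
    queue = list(hypotheses)
    results: List[float] = []
    since_reflection = 0
    while queue:
        hypothesis = queue.pop(0)
        # Inner loop: run the experiment and record its metric.
        results.append(run_experiment(hypothesis))
        trace.append(f"experiment:{hypothesis}")
        since_reflection += 1
        # Outer loop: triggered after N experiments or when the queue is empty.
        if since_reflection >= max_iterations or not queue:
            direction = reflect(results)
            trace.append(f"reflect:{direction}")
            since_reflection = 0
            if direction == "CONCLUDE":
                break
    trace.append("conclude")
    return trace
```

A real agent would let `reflect` push new hypotheses onto the queue when it decides to deepen or broaden; the sketch leaves that out to keep the two loops visible.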
### Inner Loop: Experiment Fast

1. Pick highest-priority untested hypothesis
2. Write protocol (what change, what prediction, why)
3. **Lock it**: Commit to git BEFORE running
4. Run experiment (invoke domain skill)
5. Sanity check results (converged? baseline correct?)
6. Measure proxy metric
7. Record in `experiments/{hypothesis-slug}/`
8. Update `research-state.yaml`
9. If stuck → search literature or brainstorm

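
Steps 1 and 8 can be illustrated against the `hypotheses` mapping from `templates/research-state.yaml`. The sketch below is hypothetical: the function names are invented for illustration, and it assumes that `priority: 1` means most urgent, which the template does not state.

```python
from typing import Optional


def pick_next_hypothesis(state: dict) -> Optional[str]:
    """Return the ID of the highest-priority untested hypothesis, or None."""
    untested = [
        (fields["priority"], hyp_id)
        for hyp_id, fields in state.get("hypotheses", {}).items()
        if fields["status"] == "untested"
    ]
    if not untested:
        # Nothing left to test: time to search literature or brainstorm.
        return None
    # Assumption: the smallest priority number is the most urgent.
    return min(untested)[1]


def mark_running(state: dict, hyp_id: str) -> None:
    """Update the state so the next session knows which experiment is in flight."""
    state["hypotheses"][hyp_id]["status"] = "running"
    state["continuity"]["current_experiment"] = hyp_id
```

The state is treated as a plain dictionary, as it would be after parsing the YAML file; writing it back to disk is left out.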
### Outer Loop: Step Back and Synthesize

1. Review all results since last reflection
2. Cluster by type: what worked? what didn't?
3. Ask WHY — identify mechanisms
4. Update `findings.md` with current understanding
5. Search literature if results surprise you
6. Generate new hypotheses if warranted
7. Decide direction: DEEPEN / BROADEN / PIVOT / CONCLUDE

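
Step 2 (clustering into what worked and what did not) can be sketched over entries shaped like the commented `trajectory` example in `templates/research-state.yaml`. The function name and the rule "beats its baseline" are assumptions made for illustration; a real reflection would also weigh noise and replication.

```python
from typing import Dict, List


def cluster_results(
    trajectory: List[dict], optimization_direction: str = "minimize"
) -> Dict[str, List[str]]:
    """Split experiments into those that beat their baseline and those that did not."""
    clusters: Dict[str, List[str]] = {"worked": [], "did_not_work": []}
    for entry in trajectory:
        delta = entry["metric_value"] - entry["baseline"]
        # A lower value is better when minimizing, a higher one when maximizing.
        improved = delta < 0 if optimization_direction == "minimize" else delta > 0
        key = "worked" if improved else "did_not_work"
        clusters[key].append(entry["hypothesis"])
    return clusters
```

An experiment that exactly ties its baseline lands in `did_not_work`, which errs on the side of not claiming an improvement.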
## Workspace Structure

```
{project}/
├── research-state.yaml        # Central state tracking
├── research-log.md            # Decision timeline
├── findings.md                # Evolving narrative synthesis
├── literature/                # Papers, survey notes
├── src/                       # Reusable code (utils, plotting)
├── data/                      # Raw result data
├── experiments/               # Per-hypothesis work
│   └── {hypothesis-slug}/
│       ├── protocol.md        # What, why, and prediction
│       ├── code/              # Experiment-specific code
│       ├── results/           # Raw outputs, metrics
│       └── analysis.md        # What we learned
├── to_human/                  # Progress presentations
└── paper/                     # Final paper (optional)
```

## Research Discipline

### Lock Before You Run

Always commit your protocol to git BEFORE executing:

```bash
git add experiments/H001-protocol.md
git commit -m "research(protocol): H001 — cosine warmup improves convergence"
# THEN run the experiment
```

This creates temporal proof your plan existed before results.

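One way to enforce this ordering is to make the experiment callable reachable only after both git commands succeed. The sketch below is an assumption-laden illustration, not part of the skill: `lock_commands` and `lock_then_run` are hypothetical helpers, and the `runner` parameter exists only so the ordering can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List


def lock_commands(protocol_path: str, message: str) -> List[List[str]]:
    """Build the git commands that commit a protocol before its experiment runs."""
    return [
        ["git", "add", protocol_path],
        ["git", "commit", "-m", message],
    ]


def lock_then_run(
    protocol_path: str,
    message: str,
    experiment: Callable[[], None],
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Commit the protocol first, then start the experiment."""
    for argv in lock_commands(protocol_path, message):
        # check=True makes subprocess.run raise CalledProcessError on failure,
        # so the experiment below never starts without a committed protocol.
        runner(argv, check=True)
    experiment()
```

Passing the commit message in whole keeps the helper independent of any particular message convention.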
### Confirmatory vs Exploratory

| Type | Definition | Trust Level |
|------|------------|-------------|
| **Confirmatory** | Matches your locked protocol | High |
| **Exploratory** | Discovered during execution | Medium — needs replication |

### Negative Results Are Progress

A refuted hypothesis tells you something. Log what it rules out and what it suggests.

## Commands

| Command | Description |
|---------|-------------|
| `/autoresearch <question>` | Initialize and start research project |
| `/research-status` | Show current state and progress |
| `/research-pause` | Pause autonomous loops |
| `/research-resume` | Resume autonomous loops |
| `/research-report` | Generate progress presentation |
| `/research-conclude` | Finalize and write paper |

## Configuration

Add to `~/.hermes/config.yaml`:

```yaml
autoresearch:
  loop_interval_minutes: 20        # How often to check progress
  max_iterations: 10               # Experiments before forced reflection
  auto_commit: true                # Auto-commit milestones
  default_workspace: "./research"  # Where to create projects
```

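How such a section might be merged with defaults can be sketched as follows. This is illustrative only: `AutoresearchConfig` and `load_config` are hypothetical, the first three defaults come from the skill's frontmatter, and `"./research"` is assumed as a default because it is the value shown in the example above. Parsing the YAML file itself is left out; the function takes an already-parsed dictionary.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AutoresearchConfig:
    loop_interval_minutes: int = 20
    max_iterations: int = 10
    auto_commit: bool = True
    default_workspace: str = "./research"  # assumed default, see lead-in


def load_config(raw: dict) -> AutoresearchConfig:
    """Build a config from the parsed `autoresearch` mapping, falling back to defaults."""
    section = raw.get("autoresearch") or {}
    known = {f.name for f in fields(AutoresearchConfig)}
    unknown = set(section) - known
    if unknown:
        # Fail loudly on typos instead of silently ignoring them.
        raise ValueError(f"unknown autoresearch options: {sorted(unknown)}")
    return AutoresearchConfig(**section)
```

Rejecting unknown keys is a design choice of the sketch: a misspelled option such as `max_iteration` would otherwise leave the default in place without any warning.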
## Integration with Other Skills

| Research Phase | Skills to Invoke |
|----------------|------------------|
| Literature search | `arxiv`, `web-search`, `notebooklm` |
| Data preparation | `data-science` tools |
| Model training | `mlops`, domain-specific skills |
| Evaluation | `evaluating-llms-harness`, custom evals |
| Paper writing | `research-paper-writing` |
| Progress reports | Built-in report generation |

## Example: LoRA Rank Study

```
User: /autoresearch "Does LoRA rank affect convergence speed on small datasets?"

Agent:
1. Bootstraps: Searches arxiv for LoRA papers
2. Forms hypotheses: H1 (rank 4), H2 (rank 8), H3 (rank 16)
3. Inner loop: Trains 3 models, records convergence steps
4. Outer loop: Notices rank 8 converges fastest
5. Deepens: Tests rank 6, 10, 12
6. Concludes: Generates report with trajectory plot
```

## Best Practices

1. **Start simple**: First experiment should run in <30 minutes
2. **Define metrics upfront**: Lock evaluation criteria before running
3. **Return to literature**: When stuck or surprised, search papers
4. **Commit frequently**: Git history is your research log
5. **Show your work**: Generate progress reports for human review
6. **Never idle**: If blocked, diagnose, fix, or pivot — but keep moving

## References

- Inspired by Andrej Karpathy's autoresearch methodology
- Compatible with agentskills.io open standard
- Built-in templates from `templates/` directory

## See Also

- `templates/research-state.yaml` — State tracking template
- `templates/findings.md` — Synthesis template
- `templates/research-log.md` — Decision log template
- `examples/` — Example research projects

93
skills/research/autoresearch/templates/findings.md
Normal file
@@ -0,0 +1,93 @@
# Findings: {{PROJECT_NAME}}

**Research Question:** {{RESEARCH_QUESTION}}

**Last Updated:** {{LAST_UPDATED}}

---

## Current Understanding

### What We Know So Far

[Summarize the current state of knowledge. 2-4 paragraphs.]

### Key Patterns and Insights

| Pattern | Evidence | Confidence |
|---------|----------|------------|
| [Pattern 1] | [Which experiments support this] | High/Medium/Low |
| [Pattern 2] | [Which experiments support this] | High/Medium/Low |

### Mechanistic Understanding

[If applicable: What mechanisms explain the results?]

---

## Lessons and Constraints

### What Works

- [Specific finding with context]
- [Another finding]

### What Doesn't Work

- [Failed approach and why]
- [Constraint discovered]

### Critical Parameters

| Parameter | Sweet Spot | Why |
|-----------|------------|-----|
| [Param 1] | [Value/range] | [Explanation] |

---

## Open Questions

### High Priority

1. [Question that would change the story if answered]
2. [Another critical question]

### Medium Priority

1. [Nice to know but not blocking]

### Answered

1. ~~[Question]~~ → Answer: [Brief answer with evidence]

---

## Narrative Arc

[For paper writing: What's the story? What would the abstract say?]

### Contribution Sketch

[1-2 sentences on what this research contributes]

### Implications

[Who cares? Why does this matter?]

---

## Next Steps

### Immediate (Next 1-3 Experiments)

- [ ] [Specific experiment]
- [ ] [Another experiment]

### Medium Term

- [ ] [Broader direction]

### If Current Direction Fails

- [Pivot option 1]
- [Pivot option 2]

90
skills/research/autoresearch/templates/research-log.md
Normal file
@@ -0,0 +1,90 @@
# Research Log: {{PROJECT_NAME}}

**Research Question:** {{RESEARCH_QUESTION}}

---

## Bootstrap Phase

### {{DATE}} — Project Initialization

- **Action:** Created workspace, initialized state files
- **Research Question:** {{RESEARCH_QUESTION}}
- **Initial Thoughts:** [What makes this interesting?]

### {{DATE}} — Literature Search

- **Sources:** arxiv, semantic scholar, web search
- **Key Papers:**
  - [Paper 1] — [Key finding relevant to question]
  - [Paper 2] — [Key finding]
- **Gap Identified:** [What's missing in existing work?]

### {{DATE}} — Hypothesis Formation

- **H001:** [Description] → Prediction: [Specific prediction]
- **H002:** [Description] → Prediction: [Specific prediction]
- **H003:** [Description] → Prediction: [Specific prediction]

---

## Inner Loop Log

### {{DATE}} — Experiment H001

- **Hypothesis:** H001
- **Protocol:** [What was changed, what was predicted]
- **Git Commit:** `research(protocol): H001 — [description]`
- **Status:** [Running/Completed/Failed]
- **Results:**
  - Metric: [Value]
  - Baseline: [Value]
  - Delta: [+/-X]
- **Interpretation:** [What this means]
- **Next Action:** [Continue/Adjust/Pivot]

### {{DATE}} — Experiment H002

[Same format...]

---

## Outer Loop Log

### {{DATE}} — Reflection #1 (After {{N}} experiments)

- **Experiments Reviewed:** H001-H00N
- **Patterns Observed:**
  - [Pattern 1]
  - [Pattern 2]
- **Updated Understanding:** [New insights]
- **Direction Decision:** [DEEPEN/BROADEN/PIVOT/CONCLUDE]
- **Rationale:** [Why this direction?]
- **New Hypotheses:**
  - H00N+1: [Description]
  - H00N+2: [Description]

---

## Direction Changes

### {{DATE}} — PIVOT: [New Direction]

- **From:** [Old direction/assumption]
- **To:** [New direction]
- **Trigger:** [What result/surprise caused this?]
- **New Research Question:** [If changed]

---

## Conclusion

### {{DATE}} — Research Concluded

- **Final Status:** [Completed/Partial/Abandoned]
- **Key Findings:**
  1. [Finding 1]
  2. [Finding 2]
- **Contribution:** [What this adds to the field]
- **Limitations:** [What we didn't test/couldn't conclude]
- **Future Work:** [What someone should do next]

70
skills/research/autoresearch/templates/research-state.yaml
Normal file
@@ -0,0 +1,70 @@
# Research State
# Central tracking file for autoresearch project
# Updated automatically by the agent — do not edit manually

project:
  name: "{{PROJECT_NAME}}"
  created_at: "{{CREATED_AT}}"
  research_question: "{{RESEARCH_QUESTION}}"
  status: "bootstrapping"  # bootstrapping | active | paused | concluding | completed

bootstrap:
  literature_searched: false
  initial_hypotheses_formed: false
  evaluation_metric_defined: false
  baseline_established: false

loops:
  inner_loop_count: 0
  outer_loop_count: 0
  last_inner_loop_at: null
  last_outer_loop_at: null

direction:
  current: "explore"  # explore | deepen | broaden | pivot | conclude
  rationale: "Initial exploration phase"
  next_milestone: "Complete first 3 experiments"

hypotheses:
  # Example structure — replace with actual hypotheses
  H001:
    description: "{{HYPOTHESIS_DESCRIPTION}}"
    status: "untested"  # untested | running | completed | refuted | supported
    prediction: "{{PREDICTION}}"
    priority: 1
    created_at: "{{CREATED_AT}}"
    completed_at: null
    result_summary: null
    experiment_slug: null

metrics:
  primary: "{{PRIMARY_METRIC}}"  # e.g., "val_loss", "accuracy", "convergence_steps"
  baseline_value: null
  target_value: null
  current_best: null
  optimization_direction: "minimize"  # minimize | maximize

trajectory:
  # Auto-populated from experiments
  # - experiment_id: run_001
  #   hypothesis: H001
  #   metric_value: 0.847
  #   baseline: 0.812
  #   delta: "+0.035"
  #   wall_time_min: 23
  #   change_summary: "Added cosine annealing"

resources:
  literature_papers: []
  related_work_notes: null
  code_references: []

continuity:
  last_session_at: "{{CREATED_AT}}"
  next_scheduled_loop: null
  current_experiment: null
  pending_tasks: []

notes: |
  # Agent notes — context for next session
  # What was I doing? What's next?