hermes-agent/skills/research/autoresearch/SKILL.md
Howard Li 762b582704 feat(skills): add autoresearch skill for autonomous research orchestration
Add a new research skill that enables continuous, self-directed research
with a two-loop architecture:

- Inner loop: Rapid experiment iteration with measurable outcomes
- Outer loop: Periodic synthesis, pattern discovery, and direction setting

Features:
- Research workspace templates (state, findings, log)
- Example project (LoRA rank study)
- Configuration options for loop intervals and auto-commit
- Integration with existing research skills (arxiv, paper-writing)

Updated research/DESCRIPTION.md to include autoresearch in the skill overview.
2026-04-11 12:28:41 -07:00


---
name: autoresearch
description: Autonomous research orchestration for AI coding agents. Run continuous, self-directed research with a two-loop architecture — rapid inner-loop experiments and periodic outer-loop synthesis. Ideal for literature surveys, hypothesis testing, benchmark optimization, and iterative discovery. No human hand-holding required.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [Research, Autonomous, Experiments, ML, AI, Literature, Hypothesis, Benchmark, Optimization]
    related_skills: [arxiv, research-paper-writing, web-search, notebooklm]
    config:
      autoresearch.loop_interval_minutes:
        description: "Interval between autonomous research loops (in minutes)"
        default: 20
      autoresearch.max_iterations:
        description: "Maximum number of inner-loop experiments before forced reflection"
        default: 10
      autoresearch.auto_commit:
        description: "Automatically git-commit research milestones"
        default: true
---
# Autoresearch

**Autonomous research orchestration for AI coding agents.**

Run continuous, self-directed research with a two-loop architecture:

- **Inner Loop**: Rapid experiment iteration with clear measurable outcomes
- **Outer Loop**: Periodic synthesis, pattern discovery, and direction setting

Ideal for literature surveys, hypothesis testing, benchmark optimization, mechanistic interpretability studies, and any research requiring iterative experimentation.

## When to Use
| Scenario | Use Autoresearch? |
|----------|-------------------|
| "I want to explore X and see what works" | ✅ Yes |
| "Does technique Y improve metric Z?" | ✅ Yes |
| "What's the state of the art for problem W?" | ✅ Yes (bootstrap + literature) |
| "Train a model with specific hyperparameters" | ❌ Use domain skills directly |
| "Run a single evaluation" | ❌ Use evaluation skills directly |
## Quick Start
```bash
# Start a research project
/autoresearch "Does LoRA rank affect convergence speed on small datasets?"
# Or with the research tool
research_init(project="lora-rank-study", question="Does LoRA rank affect convergence speed?")
```
## The Two-Loop Architecture
```
BOOTSTRAP (once)
INNER LOOP (fast, repeating) → Run experiments → Measure → Record → Learn
↓ (every N experiments or when stuck)
OUTER LOOP (reflective) → Synthesize → New hypotheses → Decide direction
CONCLUDE → Write findings → Generate report
```
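
The control flow in the diagram can be sketched in Python as below. This is a minimal illustration, not the skill's actual implementation: `LoopState`, `run_two_loops`, and the `run_experiment` and `reflect` callables are hypothetical names, and the in-memory state stands in for `research-state.yaml`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    """Hypothetical in-memory stand-in for research-state.yaml."""
    pending: List[str]                          # untested hypotheses, highest priority first
    results: List[dict] = field(default_factory=list)
    reflections: int = 0                        # number of outer-loop passes so far


def run_two_loops(
    state: LoopState,
    run_experiment: Callable[[str], dict],
    reflect: Callable[[LoopState], str],
    max_iterations: int = 10,
) -> LoopState:
    """Alternate inner-loop experiments with outer-loop reflection.

    reflect(state) returns DEEPEN, BROADEN, PIVOT, or CONCLUDE and may
    append new hypotheses to state.pending.
    """
    since_reflection = 0
    while True:
        stuck = not state.pending
        if stuck or since_reflection >= max_iterations:
            # Outer loop: triggered every N experiments or when stuck.
            state.reflections += 1
            since_reflection = 0
            direction = reflect(state)
            if direction == "CONCLUDE" or not state.pending:
                return state                    # nothing left to test
            continue
        # Inner loop: run the next hypothesis and record the result.
        hypothesis = state.pending.pop(0)
        state.results.append(run_experiment(hypothesis))
        since_reflection += 1
```

The sketch also stops when reflection produces no new hypotheses, which avoids looping forever on an empty queue.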
### Inner Loop: Experiment Fast
1. Pick highest-priority untested hypothesis
2. Write protocol (what change, what prediction, why)
3. **Lock it**: Commit to git BEFORE running
4. Run experiment (invoke domain skill)
5. Sanity check results (converged? baseline correct?)
6. Measure proxy metric
7. Record in `experiments/{hypothesis-slug}/`
8. Update `research-state.yaml`
9. If stuck → search literature or brainstorm
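
Steps 1 and 8 above (pick a hypothesis, then update state) might look like the following sketch. The record fields `status`, `priority`, and `metric`, and the convention that a larger priority number is more urgent, are assumptions for illustration; the real `research-state.yaml` schema is defined by the template.

```python
from typing import List, Optional


def pick_next_hypothesis(hypotheses: List[dict]) -> Optional[dict]:
    """Return the untested hypothesis with the highest priority, or None."""
    untested = [h for h in hypotheses if h.get("status") == "untested"]
    if not untested:
        return None  # nothing left: time to search literature or brainstorm
    # Assumption: a larger "priority" number means more urgent.
    return max(untested, key=lambda h: h["priority"])


def record_result(hypothesis: dict, metric: float, sane: bool) -> dict:
    """Return an updated copy of the hypothesis record.

    Runs that fail the sanity check are flagged for a rerun instead of
    being counted as evidence.
    """
    updated = dict(hypothesis)
    if sane:
        updated["status"] = "tested"
        updated["metric"] = metric
    else:
        updated["status"] = "needs-rerun"
    return updated
```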
### Outer Loop: Step Back and Synthesize
1. Review all results since last reflection
2. Cluster by type: what worked? what didn't?
3. Ask WHY — identify mechanisms
4. Update `findings.md` with current understanding
5. Search literature if results surprise you
6. Generate new hypotheses if warranted
7. Decide direction: DEEPEN / BROADEN / PIVOT / CONCLUDE
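
Step 2 (clustering results by outcome) can be sketched as below. The `prediction_met` field and the three labels are assumptions made for this example; the direction decision in step 7 is left to the agent's judgment and is not modeled here.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_by_outcome(results: List[dict]) -> Dict[str, List[str]]:
    """Group experiment ids by whether the locked prediction held.

    Assumes each result has an "id" and a "prediction_met" field that is
    True, False, or None (None meaning the run was inconclusive).
    """
    clusters = defaultdict(list)
    for result in results:
        met = result.get("prediction_met")
        if met is True:
            label = "supported"
        elif met is False:
            label = "refuted"
        else:
            label = "inconclusive"
        clusters[label].append(result["id"])
    return dict(clusters)
```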
## Workspace Structure
```
{project}/
├── research-state.yaml       # Central state tracking
├── research-log.md           # Decision timeline
├── findings.md               # Evolving narrative synthesis
├── literature/               # Papers, survey notes
├── src/                      # Reusable code (utils, plotting)
├── data/                     # Raw result data
├── experiments/              # Per-hypothesis work
│   └── {hypothesis-slug}/
│       ├── protocol.md       # What, why, and prediction
│       ├── code/             # Experiment-specific code
│       ├── results/          # Raw outputs, metrics
│       └── analysis.md       # What we learned
├── to_human/                 # Progress presentations
└── paper/                    # Final paper (optional)
```
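
For reference, the layout above could be scaffolded with a few lines of Python. This is a sketch only: `scaffold_project` and `scaffold_experiment` are hypothetical helpers, and they create empty files, whereas the skill itself presumably fills them from the files under `templates/`.

```python
from pathlib import Path

TOP_LEVEL_DIRS = ["literature", "src", "data", "experiments", "to_human", "paper"]
TOP_LEVEL_FILES = ["research-state.yaml", "research-log.md", "findings.md"]


def scaffold_project(root: Path) -> Path:
    """Create the top-level workspace directories and empty tracking files."""
    root.mkdir(parents=True, exist_ok=True)
    for name in TOP_LEVEL_DIRS:
        (root / name).mkdir(exist_ok=True)
    for name in TOP_LEVEL_FILES:
        (root / name).touch(exist_ok=True)  # leaves existing files untouched
    return root


def scaffold_experiment(root: Path, slug: str) -> Path:
    """Create experiments/{slug}/ with its protocol, code, results, and analysis."""
    experiment = root / "experiments" / slug
    for sub in ("code", "results"):
        (experiment / sub).mkdir(parents=True, exist_ok=True)
    for name in ("protocol.md", "analysis.md"):
        (experiment / name).touch(exist_ok=True)
    return experiment
```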
## Research Discipline
### Lock Before You Run
Always commit your protocol to git BEFORE executing:
```bash
git add experiments/H001/protocol.md
git commit -m "research(protocol): H001 — cosine warmup improves convergence"
# THEN run the experiment
```
The commit timestamp is a record that the plan existed before any results did.
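
An agent can enforce this ordering in code by refusing to start the experiment unless the commit succeeds. The sketch below is one way to do that; `lock_then_run` is a hypothetical helper, and the `run` parameter exists only so the git calls can be swapped out in tests.

```python
import subprocess
from typing import Any, Callable


def lock_then_run(
    protocol_path: str,
    message: str,
    experiment: Callable[[], Any],
    run: Callable[..., Any] = subprocess.run,
) -> Any:
    """Commit the protocol file, then run the experiment.

    With check=True, subprocess.run raises CalledProcessError when git
    exits non-zero, so the experiment never starts without a locked protocol.
    """
    run(["git", "add", protocol_path], check=True)
    run(["git", "commit", "-m", message], check=True)
    return experiment()
```

Note that `git commit` also exits non-zero when there is nothing to commit, so a protocol that was already committed unchanged would need separate handling.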
### Confirmatory vs Exploratory
| Type | Definition | Trust Level |
|------|------------|-------------|
| **Confirmatory** | Tests a prediction written in the locked protocol before the run | High |
| **Exploratory** | Pattern noticed during or after execution, not in the protocol | Medium (needs replication) |
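
One simple way to apply this distinction when logging results is to tag each finding by whether its metric was named in the locked protocol. The function and the idea of a `locked_metrics` set are assumptions for illustration, not part of the skill's documented interface.

```python
from typing import Set

# Trust levels mirror the table above.
TRUST_LEVEL = {
    "confirmatory": "high",
    "exploratory": "medium (needs replication)",
}


def classify_finding(metric: str, locked_metrics: Set[str]) -> str:
    """Label a finding by whether its metric was fixed before the run."""
    return "confirmatory" if metric in locked_metrics else "exploratory"
```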
### Negative Results Are Progress
A refuted hypothesis narrows the search space. Log what it rules out and what it suggests trying next.
## Commands
| Command | Description |
|---------|-------------|
| `/autoresearch <question>` | Initialize and start research project |
| `/research-status` | Show current state and progress |
| `/research-pause` | Pause autonomous loops |
| `/research-resume` | Resume autonomous loops |
| `/research-report` | Generate progress presentation |
| `/research-conclude` | Finalize and write paper |
## Configuration
Add to `~/.hermes/config.yaml`:
```yaml
autoresearch:
  loop_interval_minutes: 20        # How often to check progress
  max_iterations: 10               # Experiments before forced reflection
  auto_commit: true                # Auto-commit milestones
  default_workspace: "./research"  # Where to create projects
```
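
The sketch below shows how these settings and their defaults could be represented once the YAML has been parsed into a dictionary. `AutoresearchConfig` is a hypothetical class, and whether Hermes validates keys this way is an assumption; the defaults are the ones listed above.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AutoresearchConfig:
    loop_interval_minutes: int = 20          # how often to check progress
    max_iterations: int = 10                 # experiments before forced reflection
    auto_commit: bool = True                 # auto-commit milestones
    default_workspace: str = "./research"    # where to create projects

    @classmethod
    def from_mapping(cls, raw: Optional[Mapping[str, Any]]) -> "AutoresearchConfig":
        """Build a config from parsed YAML, filling in defaults for missing keys."""
        section = dict((raw or {}).get("autoresearch") or {})
        known = {f.name for f in fields(cls)}
        unknown = set(section) - known
        if unknown:
            # Fail loudly on typos rather than silently ignoring them.
            raise ValueError(f"unknown autoresearch keys: {sorted(unknown)}")
        return cls(**section)
```

Keys that are omitted keep their defaults, so a config file only needs to list the values it changes.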
## Integration with Other Skills
| Research Phase | Skills to Invoke |
|----------------|------------------|
| Literature search | `arxiv`, `web-search`, `notebooklm` |
| Data preparation | `data-science` tools |
| Model training | `mlops`, domain-specific skills |
| Evaluation | `evaluating-llms-harness`, custom evals |
| Paper writing | `research-paper-writing` |
| Progress reports | Built-in report generation |
## Example: LoRA Rank Study
```
User: /autoresearch "Does LoRA rank affect convergence speed on small datasets?"
Agent:
1. Bootstraps: Searches arxiv for LoRA papers
2. Forms hypotheses: H1 (rank 4), H2 (rank 8), H3 (rank 16)
3. Inner loop: Trains 3 models, records convergence steps
4. Outer loop: Notices rank 8 converges fastest
5. Deepens: Tests rank 6, 10, 12
6. Concludes: Generates report with trajectory plot
```
## Best Practices
1. **Start simple**: First experiment should run in <30 minutes
2. **Define metrics upfront**: Lock evaluation criteria before running
3. **Return to literature**: When stuck or surprised, search papers
4. **Commit frequently**: Git history is your research log
5. **Show your work**: Generate progress reports for human review
6. **Never idle**: If blocked, diagnose, fix, or pivot, but keep moving
## References
- Inspired by Andrej Karpathy's autoresearch methodology
- Compatible with agentskills.io open standard
- Built-in templates from `templates/` directory
## See Also
- `templates/research-state.yaml`: State tracking template
- `templates/findings.md`: Synthesis template
- `templates/research-log.md`: Decision log template
- `examples/`: Example research projects