mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-28 06:21:33 +00:00
Add a new research skill that enables continuous, self-directed research with a two-loop architecture:

- Inner loop: Rapid experiment iteration with measurable outcomes
- Outer loop: Periodic synthesis, pattern discovery, and direction setting

Features:

- Research workspace templates (state, findings, log)
- Example project (LoRA rank study)
- Configuration options for loop intervals and auto-commit
- Integration with existing research skills (arxiv, paper-writing)

Updated research/DESCRIPTION.md to include autoresearch in the skill overview.
---
name: autoresearch
description: Autonomous research orchestration for AI coding agents. Run continuous, self-directed research with a two-loop architecture (rapid inner-loop experiments and periodic outer-loop synthesis). Ideal for literature surveys, hypothesis testing, benchmark optimization, and iterative discovery. No human hand-holding required.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [Research, Autonomous, Experiments, ML, AI, Literature, Hypothesis, Benchmark, Optimization]
    related_skills: [arxiv, research-paper-writing, web-search, notebooklm]
    config:
      autoresearch.loop_interval_minutes:
        description: "Interval between autonomous research loops (in minutes)"
        default: 20
      autoresearch.max_iterations:
        description: "Maximum number of inner-loop experiments before forced reflection"
        default: 10
      autoresearch.auto_commit:
        description: "Automatically git-commit research milestones"
        default: true
---

# Autoresearch

**Autonomous research orchestration for AI coding agents.**

Run continuous, self-directed research with a two-loop architecture:

- **Inner Loop**: Rapid experiment iteration with clear, measurable outcomes
- **Outer Loop**: Periodic synthesis, pattern discovery, and direction setting

Ideal for literature surveys, hypothesis testing, benchmark optimization, mechanistic interpretability studies, and any research requiring iterative experimentation.

## When to Use

| Scenario | Use Autoresearch? |
|----------|-------------------|
| "I want to explore X and see what works" | ✅ Yes |
| "Does technique Y improve metric Z?" | ✅ Yes |
| "What's the state of the art for problem W?" | ✅ Yes (bootstrap + literature) |
| "Train a model with specific hyperparameters" | ❌ Use domain skills directly |
| "Run a single evaluation" | ❌ Use evaluation skills directly |

## Quick Start

```bash
# Start a research project
/autoresearch "Does LoRA rank affect convergence speed on small datasets?"

# Or with the research tool
research_init(project="lora-rank-study", question="Does LoRA rank affect convergence speed?")
```

## The Two-Loop Architecture

```
BOOTSTRAP (once)
    ↓
INNER LOOP (fast, repeating) → Run experiments → Measure → Record → Learn
    ↓ (every N experiments or when stuck)
OUTER LOOP (reflective) → Synthesize → New hypotheses → Decide direction
    ↓
CONCLUDE → Write findings → Generate report
```

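The control flow in the diagram can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: `run_research`, `run_experiment`, and `reflect` are hypothetical names, and the "when stuck" trigger is left out for brevity. Only the "every N experiments" hand-off (`max_iterations`) is modelled.

```python
from typing import Callable, List, Tuple


def run_research(
    hypotheses: List[str],
    run_experiment: Callable[[str], float],
    reflect: Callable[[List[Tuple[str, float]]], str],
    max_iterations: int = 10,
) -> List[str]:
    """Return a trace of phases visited (hypothetical driver, for illustration)."""
    trace = ["BOOTSTRAP"]
    results: List[Tuple[str, float]] = []
    since_reflection = 0
    queue = list(hypotheses)

    while queue:
        # Inner loop: run one experiment and record its metric.
        hypothesis = queue.pop(0)
        results.append((hypothesis, run_experiment(hypothesis)))
        trace.append(f"INNER:{hypothesis}")
        since_reflection += 1

        # Outer loop: reflect every max_iterations experiments,
        # or when no hypotheses remain.
        if since_reflection >= max_iterations or not queue:
            decision = reflect(results)
            trace.append(f"OUTER:{decision}")
            since_reflection = 0
            if decision == "CONCLUDE":
                break

    trace.append("CONCLUDE")
    return trace
```

A real driver would also let the outer loop add new hypotheses to the queue; here the queue is fixed so the hand-off logic stays easy to follow.
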
### Inner Loop: Experiment Fast

1. Pick the highest-priority untested hypothesis
2. Write the protocol (what change, what prediction, why)
3. **Lock it**: Commit to git BEFORE running
4. Run the experiment (invoke a domain skill)
5. Sanity-check the results (converged? baseline correct?)
6. Measure the proxy metric
7. Record in `experiments/{hypothesis-slug}/`
8. Update `research-state.yaml`
9. If stuck → search literature or brainstorm

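Step 1 of the list above can be sketched as a small selection function. The `Hypothesis` dataclass and its fields are assumptions made for illustration; the actual schema of `research-state.yaml` is defined by the skill's templates and may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    slug: str
    priority: int        # assumed convention: higher number runs first
    tested: bool = False


def next_hypothesis(hypotheses: List[Hypothesis]) -> Optional[Hypothesis]:
    """Return the highest-priority untested hypothesis, or None if all are tested."""
    untested = [h for h in hypotheses if not h.tested]
    if not untested:
        return None
    return max(untested, key=lambda h: h.priority)
```

Returning `None` when nothing is left untested gives the caller a natural point to hand control to the outer loop.
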
### Outer Loop: Step Back and Synthesize

1. Review all results since the last reflection
2. Cluster by type: what worked? what didn't?
3. Ask WHY: identify mechanisms
4. Update `findings.md` with current understanding
5. Search literature if results surprise you
6. Generate new hypotheses if warranted
7. Decide direction: DEEPEN / BROADEN / PIVOT / CONCLUDE

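Steps 2 and 7 of the list above lend themselves to a short sketch: group results by outcome, and constrain the final decision to the four named directions. The function names and the `(slug, supported)` result shape are hypothetical. How a direction is actually chosen is left to the agent's judgment and is not modelled here.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Direction(Enum):
    DEEPEN = "DEEPEN"
    BROADEN = "BROADEN"
    PIVOT = "PIVOT"
    CONCLUDE = "CONCLUDE"


def cluster_results(results: Iterable[Tuple[str, bool]]) -> Dict[str, List[str]]:
    """Group hypothesis slugs by whether the experiment supported the prediction."""
    clusters: Dict[str, List[str]] = {"worked": [], "did_not_work": []}
    for slug, supported in results:
        clusters["worked" if supported else "did_not_work"].append(slug)
    return clusters


def parse_direction(text: str) -> Direction:
    """Validate a free-text decision; raises ValueError for anything else."""
    return Direction(text.strip().upper())
```

Restricting the decision to an enum means a malformed or ambiguous outer-loop decision fails loudly instead of silently continuing.
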
## Workspace Structure

```
{project}/
├── research-state.yaml    # Central state tracking
├── research-log.md        # Decision timeline
├── findings.md            # Evolving narrative synthesis
├── literature/            # Papers, survey notes
├── src/                   # Reusable code (utils, plotting)
├── data/                  # Raw result data
├── experiments/           # Per-hypothesis work
│   └── {hypothesis-slug}/
│       ├── protocol.md    # What, why, and prediction
│       ├── code/          # Experiment-specific code
│       ├── results/       # Raw outputs, metrics
│       └── analysis.md    # What we learned
├── to_human/              # Progress presentations
└── paper/                 # Final paper (optional)
```

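For readers who want to set up this layout by hand, the tree above can be created with a few lines of `pathlib`. This is a sketch only: `scaffold_workspace` and `scaffold_experiment` are hypothetical helpers, and the files are created empty rather than from the skill's templates.

```python
from pathlib import Path

TOP_LEVEL_FILES = ("research-state.yaml", "research-log.md", "findings.md")
TOP_LEVEL_DIRS = ("literature", "src", "data", "experiments", "to_human", "paper")


def scaffold_workspace(root: Path) -> Path:
    """Create the top-level project layout; safe to call more than once."""
    root.mkdir(parents=True, exist_ok=True)
    for name in TOP_LEVEL_DIRS:
        (root / name).mkdir(exist_ok=True)
    for name in TOP_LEVEL_FILES:
        (root / name).touch(exist_ok=True)
    return root


def scaffold_experiment(root: Path, slug: str) -> Path:
    """Create experiments/{slug}/ with its protocol, code, results, and analysis."""
    exp = root / "experiments" / slug
    (exp / "code").mkdir(parents=True, exist_ok=True)
    (exp / "results").mkdir(exist_ok=True)
    for name in ("protocol.md", "analysis.md"):
        (exp / name).touch(exist_ok=True)
    return exp
```

Both helpers are idempotent, so re-running them on an existing project does not overwrite any files.
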
## Research Discipline

### Lock Before You Run

Always commit your protocol to git BEFORE executing:

```bash
# Stage and commit the protocol for hypothesis H001
git add experiments/H001/protocol.md
git commit -m "research(protocol): H001 - cosine warmup improves convergence"
# THEN run the experiment
```

This creates a timestamped record that your plan existed before the results did.

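One way to enforce this rule before launching a run is to check that the protocol file has no uncommitted changes. The sketch below assumes the protocol lives at `experiments/{hypothesis-slug}/protocol.md` as in the workspace layout; `is_clean` and `protocol_is_locked` are hypothetical helpers, not part of the skill. It relies on `git status --porcelain`, which prints nothing for a tracked file with no changes.

```python
import subprocess
from pathlib import Path


def is_clean(porcelain_output: str) -> bool:
    """True when `git status --porcelain` reported nothing for the path."""
    return porcelain_output.strip() == ""


def protocol_is_locked(repo: Path, protocol: str) -> bool:
    """Check that the protocol file exists and has no pending or untracked changes."""
    if not (repo / protocol).is_file():
        return False
    status = subprocess.run(
        ["git", "status", "--porcelain", "--", protocol],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if repo is not a git repository
    )
    return is_clean(status.stdout)
```

A caveat: a file matched by `.gitignore` also produces empty output, so this check assumes the protocol path is not ignored.
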
### Confirmatory vs Exploratory

| Type | Definition | Trust Level |
|------|------------|-------------|
| **Confirmatory** | Matches your locked protocol | High |
| **Exploratory** | Discovered during execution | Medium (needs replication) |

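The distinction in the table can be expressed as a small classifier. This is an illustrative sketch under two assumptions: the protocol commit time and run start time are available as `datetime` values, and the caller knows whether the finding was predicted in the protocol. `classify_finding` is a hypothetical name.

```python
from datetime import datetime


def classify_finding(
    protocol_committed_at: datetime,
    run_started_at: datetime,
    predicted_in_protocol: bool,
) -> str:
    """Label a finding 'confirmatory' only if it was predicted in a protocol
    that was committed before the run started; otherwise 'exploratory'."""
    locked_before_run = protocol_committed_at < run_started_at
    if locked_before_run and predicted_in_protocol:
        return "confirmatory"
    return "exploratory"
```

Anything that fails either condition falls back to "exploratory", which matches the table's lower trust level and its call for replication.
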
### Negative Results Are Progress

A refuted hypothesis tells you something. Log what it rules out and what it suggests.

## Commands

| Command | Description |
|---------|-------------|
| `/autoresearch <question>` | Initialize and start a research project |
| `/research-status` | Show current state and progress |
| `/research-pause` | Pause autonomous loops |
| `/research-resume` | Resume autonomous loops |
| `/research-report` | Generate a progress presentation |
| `/research-conclude` | Finalize and write the paper |

## Configuration

Add to `~/.hermes/config.yaml`:

```yaml
autoresearch:
  loop_interval_minutes: 20        # How often to check progress
  max_iterations: 10               # Experiments before forced reflection
  auto_commit: true                # Auto-commit milestones
  default_workspace: "./research"  # Where to create projects
```

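To make the relationship between user settings and defaults concrete, here is a sketch of how an already-parsed config could be merged with the defaults shown above. `resolve_config` is a hypothetical helper, not the agent's actual loader. The `default_workspace` default is taken from the example block, since the front matter does not declare one.

```python
from typing import Any, Dict

DEFAULTS: Dict[str, Any] = {
    "loop_interval_minutes": 20,
    "max_iterations": 10,
    "auto_commit": True,
    "default_workspace": "./research",  # assumed default, from the example above
}


def resolve_config(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay the user's `autoresearch` section on the defaults.

    Unknown keys raise KeyError so that typos are caught early.
    """
    section = user_config.get("autoresearch") or {}
    unknown = set(section) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown autoresearch options: {sorted(unknown)}")
    return {**DEFAULTS, **section}
```

For example, `resolve_config({"autoresearch": {"max_iterations": 5}})` keeps every default except `max_iterations`.
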
## Integration with Other Skills

| Research Phase | Skills to Invoke |
|----------------|------------------|
| Literature search | `arxiv`, `web-search`, `notebooklm` |
| Data preparation | `data-science` tools |
| Model training | `mlops`, domain-specific skills |
| Evaluation | `evaluating-llms-harness`, custom evals |
| Paper writing | `research-paper-writing` |
| Progress reports | Built-in report generation |

## Example: LoRA Rank Study

```
User: /autoresearch "Does LoRA rank affect convergence speed on small datasets?"

Agent:
1. Bootstraps: Searches arxiv for LoRA papers
2. Forms hypotheses: H1 (rank 4), H2 (rank 8), H3 (rank 16)
3. Inner loop: Trains 3 models, records convergence steps
4. Outer loop: Notices rank 8 converges fastest
5. Deepens: Tests rank 6, 10, 12
6. Concludes: Generates report with trajectory plot
```

## Best Practices

1. **Start simple**: The first experiment should run in under 30 minutes
2. **Define metrics upfront**: Lock evaluation criteria before running
3. **Return to literature**: When stuck or surprised, search papers
4. **Commit frequently**: Git history is your research log
5. **Show your work**: Generate progress reports for human review
6. **Never idle**: If blocked, diagnose, fix, or pivot, but keep moving

## References

- Inspired by Andrej Karpathy's autoresearch methodology
- Compatible with the agentskills.io open standard
- Built-in templates from the `templates/` directory

## See Also

- `templates/research-state.yaml`: State tracking template
- `templates/findings.md`: Synthesis template
- `templates/research-log.md`: Decision log template
- `examples/`: Example research projects