hermes-agent/skills/research/autoresearch/SKILL.md
Howard Li 762b582704 feat(skills): add autoresearch skill for autonomous research orchestration
Add a new research skill that enables continuous, self-directed research
with a two-loop architecture:

- Inner loop: Rapid experiment iteration with measurable outcomes
- Outer loop: Periodic synthesis, pattern discovery, and direction setting

Features:
- Research workspace templates (state, findings, log)
- Example project (LoRA rank study)
- Configuration options for loop intervals and auto-commit
- Integration with existing research skills (arxiv, paper-writing)

Updated research/DESCRIPTION.md to include autoresearch in the skill overview.
2026-04-11 12:28:41 -07:00

name: autoresearch
description: "Autonomous research orchestration for AI coding agents. Run continuous, self-directed research with a two-loop architecture: rapid inner-loop experiments and periodic outer-loop synthesis. Ideal for literature surveys, hypothesis testing, benchmark optimization, and iterative discovery. No human hand-holding required."
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [Research, Autonomous, Experiments, ML, AI, Literature, Hypothesis, Benchmark, Optimization]
    related_skills: [arxiv, research-paper-writing, web-search, notebooklm]
    config:
      autoresearch.loop_interval_minutes:
        description: Interval between autonomous research loops (in minutes)
        default: 20
      autoresearch.max_iterations:
        description: Maximum number of inner-loop experiments before forced reflection
        default: 10
      autoresearch.auto_commit:
        description: Automatically git-commit research milestones
        default: true

Autoresearch

Autonomous research orchestration for AI coding agents.

Run continuous, self-directed research with a two-loop architecture:

  • Inner Loop: Rapid experiment iteration with clear measurable outcomes
  • Outer Loop: Periodic synthesis, pattern discovery, and direction setting

Ideal for literature surveys, hypothesis testing, benchmark optimization, mechanistic interpretability studies, and any research requiring iterative experimentation.

When to Use

| Scenario | Use Autoresearch? |
| --- | --- |
| "I want to explore X and see what works" | Yes |
| "Does technique Y improve metric Z?" | Yes |
| "What's the state of the art for problem W?" | Yes (bootstrap + literature) |
| "Train a model with specific hyperparameters" | Use domain skills directly |
| "Run a single evaluation" | Use evaluation skills directly |

Quick Start

# Start a research project
/autoresearch "Does LoRA rank affect convergence speed on small datasets?"

# Or with the research tool
research_init(project="lora-rank-study", question="Does LoRA rank affect convergence speed?")

The Two-Loop Architecture

BOOTSTRAP (once)
  ↓
INNER LOOP (fast, repeating) → Run experiments → Measure → Record → Learn
  ↓ (every N experiments or when stuck)
OUTER LOOP (reflective) → Synthesize → New hypotheses → Decide direction
  ↓
CONCLUDE → Write findings → Generate report
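
The control flow in the diagram can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: `run_two_loops`, its callback parameters, and the `max_reflections` safety cap are hypothetical names introduced here, and a `None` return from the experiment callback is an assumed way to signal "stuck".

```python
from typing import Callable, List, Optional


def run_two_loops(
    run_experiment: Callable[[int], Optional[float]],
    reflect: Callable[[List[float]], str],
    max_iterations: int = 10,   # experiments before forced reflection
    max_reflections: int = 5,   # hypothetical cap so the sketch always terminates
) -> List[float]:
    """Alternate fast inner-loop experiments with outer-loop reflection."""
    results: List[float] = []
    for _ in range(max_reflections):
        # Inner loop: run experiments until the budget is spent or we are stuck.
        for _ in range(max_iterations):
            metric = run_experiment(len(results))
            if metric is None:  # assumption: None means "stuck", reflect early
                break
            results.append(metric)
        # Outer loop: synthesize everything so far and decide the direction.
        if reflect(results) == "CONCLUDE":
            break
    return results
```

The inner loop only runs and records; every decision about what to do next is deferred to `reflect`, which mirrors the split between the two loops.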

Inner Loop: Experiment Fast

  1. Pick highest-priority untested hypothesis
  2. Write protocol (what change, what prediction, why)
  3. Lock it: Commit to git BEFORE running
  4. Run experiment (invoke domain skill)
  5. Sanity check results (converged? baseline correct?)
  6. Measure proxy metric
  7. Record in experiments/{hypothesis-slug}/
  8. Update research-state.yaml
  9. If stuck → search literature or brainstorm
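
Steps 1 and 8 above (pick a hypothesis, then update state) might look like the sketch below. The schema of research-state.yaml is not specified here, so the `Hypothesis` fields, the status strings, and the "higher metric is better" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    slug: str
    priority: int              # assumption: larger number = more urgent
    status: str = "untested"   # assumption: "untested" | "supported" | "refuted"


def pick_next(hypotheses: List[Hypothesis]) -> Optional[Hypothesis]:
    """Return the highest-priority untested hypothesis, or None if none remain."""
    untested = [h for h in hypotheses if h.status == "untested"]
    return max(untested, key=lambda h: h.priority, default=None)


def record_outcome(h: Hypothesis, metric: float, baseline: float) -> None:
    """Mark the hypothesis supported or refuted (assumes higher metric is better)."""
    h.status = "supported" if metric > baseline else "refuted"
```

When `pick_next` returns `None`, every hypothesis has been tested, which is a natural point to hand control to the outer loop.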

Outer Loop: Step Back and Synthesize

  1. Review all results since last reflection
  2. Cluster by type: what worked? what didn't?
  3. Ask WHY — identify mechanisms
  4. Update findings.md with current understanding
  5. Search literature if results surprise you
  6. Generate new hypotheses if warranted
  7. Decide direction: DEEPEN / BROADEN / PIVOT / CONCLUDE
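
Steps 2 and 7 (cluster results, then decide direction) can be illustrated with a small sketch. The four direction names come from the list above; the decision rule itself is a placeholder heuristic invented for this example, and a real agent would weigh mechanisms and literature rather than counts alone.

```python
from enum import Enum
from typing import Dict, List


class Direction(Enum):
    DEEPEN = "deepen"
    BROADEN = "broaden"
    PIVOT = "pivot"
    CONCLUDE = "conclude"


def cluster_by_outcome(results: Dict[str, str]) -> Dict[str, List[str]]:
    """Group hypothesis slugs by outcome label, e.g. 'supported' or 'refuted'."""
    clusters: Dict[str, List[str]] = {}
    for slug, outcome in results.items():
        clusters.setdefault(outcome, []).append(slug)
    return clusters


def decide_direction(clusters: Dict[str, List[str]], untested_left: int) -> Direction:
    """Placeholder heuristic (an assumption, not the skill's actual rule)."""
    has_support = bool(clusters.get("supported"))
    if has_support:
        # Something worked: keep digging while ideas remain, else wrap up.
        return Direction.DEEPEN if untested_left > 0 else Direction.CONCLUDE
    # Nothing worked: try the remaining ideas, or change the question.
    return Direction.BROADEN if untested_left > 0 else Direction.PIVOT
```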

Workspace Structure

{project}/
├── research-state.yaml       # Central state tracking
├── research-log.md           # Decision timeline
├── findings.md               # Evolving narrative synthesis
├── literature/               # Papers, survey notes
├── src/                      # Reusable code (utils, plotting)
├── data/                     # Raw result data
├── experiments/              # Per-hypothesis work
│   └── {hypothesis-slug}/
│       ├── protocol.md       # What, why, and prediction
│       ├── code/             # Experiment-specific code
│       ├── results/          # Raw outputs, metrics
│       └── analysis.md       # What we learned
├── to_human/                 # Progress presentations
└── paper/                    # Final paper (optional)
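
The layout above could be created with a short script like this one. The directory and file names are taken from the tree; the function names `scaffold_project` and `scaffold_experiment` are hypothetical, and the skill's own templates may populate these files differently.

```python
from pathlib import Path

TOP_LEVEL_DIRS = ["literature", "src", "data", "experiments", "to_human", "paper"]
TOP_LEVEL_FILES = ["research-state.yaml", "research-log.md", "findings.md"]
EXPERIMENT_DIRS = ["code", "results"]
EXPERIMENT_FILES = ["protocol.md", "analysis.md"]


def scaffold_project(root: Path) -> None:
    """Create the top-level workspace directories and empty tracking files."""
    for name in TOP_LEVEL_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in TOP_LEVEL_FILES:
        (root / name).touch(exist_ok=True)


def scaffold_experiment(root: Path, slug: str) -> Path:
    """Create experiments/{slug}/ with its subdirectories and empty notes."""
    exp = root / "experiments" / slug
    for name in EXPERIMENT_DIRS:
        (exp / name).mkdir(parents=True, exist_ok=True)
    for name in EXPERIMENT_FILES:
        (exp / name).touch(exist_ok=True)
    return exp
```

Both functions are idempotent, so calling them on an existing project leaves current files untouched.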

Research Discipline

Lock Before You Run

Always commit your protocol to git BEFORE executing:

git add experiments/H001/protocol.md
git commit -m "research(protocol): H001 — cosine warmup improves convergence"
# THEN run the experiment

This gives you a timestamped record that the plan existed before any results did.
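
An agent could enforce this rule with a guard that refuses to run until the protocol file is committed and unmodified. The sketch below is one possible check, not part of the skill: `is_locked` and `protocol_is_locked` are hypothetical helpers, and they rely only on the standard `git ls-files --error-unmatch` and `git status --porcelain` commands.

```python
import subprocess


def is_locked(tracked: bool, porcelain_status: str) -> bool:
    """A protocol is locked if git tracks it and reports no pending changes."""
    return tracked and porcelain_status.strip() == ""


def protocol_is_locked(protocol_path: str, repo_dir: str = ".") -> bool:
    """Ask git whether the protocol file is committed and clean."""
    # Exit code 0 means the path is known to git's index.
    tracked = subprocess.run(
        ["git", "ls-files", "--error-unmatch", "--", protocol_path],
        cwd=repo_dir, capture_output=True, text=True,
    ).returncode == 0
    # Empty output means nothing staged, modified, or untracked for this path.
    status = subprocess.run(
        ["git", "status", "--porcelain", "--", protocol_path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return is_locked(tracked, status)
```

A file that has been staged with `git add` but not yet committed still shows up in the porcelain output, so it counts as unlocked.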

Confirmatory vs Exploratory

| Type | Definition | Trust Level |
| --- | --- | --- |
| Confirmatory | Matches your locked protocol | High |
| Exploratory | Discovered during execution | Medium (needs replication) |
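
One simple way to apply this distinction is to tag each finding by whether its metric was named in the locked protocol. The sketch below assumes a protocol can be reduced to a set of metric names, which is a simplification introduced here; `classify_finding` is a hypothetical helper.

```python
from typing import Set

# Trust levels as given in the table above.
TRUST = {
    "confirmatory": "high",
    "exploratory": "medium (needs replication)",
}


def classify_finding(metric: str, locked_metrics: Set[str]) -> str:
    """Label a finding by whether its metric was committed before the run."""
    return "confirmatory" if metric in locked_metrics else "exploratory"
```

For example, if the locked protocol names only `steps_to_converge`, an unexpected effect on `final_loss` would be tagged exploratory and queued for replication.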

Negative Results Are Progress

A refuted hypothesis narrows the search space. Log what it rules out and what it suggests trying next.

Commands

| Command | Description |
| --- | --- |
| /autoresearch <question> | Initialize and start research project |
| /research-status | Show current state and progress |
| /research-pause | Pause autonomous loops |
| /research-resume | Resume autonomous loops |
| /research-report | Generate progress presentation |
| /research-conclude | Finalize and write paper |

Configuration

Add to ~/.hermes/config.yaml:

autoresearch:
  loop_interval_minutes: 20      # Minutes between autonomous research loops
  max_iterations: 10             # Experiments before forced reflection
  auto_commit: true              # Auto-commit milestones
  default_workspace: "./research" # Where to create projects
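
For illustration, the settings above could be validated and merged with their defaults as in the sketch below. How Hermes actually loads config.yaml is not described here, so `AutoresearchConfig` and `from_mapping` are hypothetical, and the input is assumed to be a dict already parsed from YAML.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class AutoresearchConfig:
    # Defaults mirror the values shown in the configuration above.
    loop_interval_minutes: int = 20
    max_iterations: int = 10
    auto_commit: bool = True
    default_workspace: str = "./research"

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "AutoresearchConfig":
        """Build a config from parsed YAML, filling in defaults for missing keys."""
        section = raw.get("autoresearch") or {}
        unknown = set(section) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown autoresearch keys: {sorted(unknown)}")
        cfg = cls(**section)
        if cfg.loop_interval_minutes <= 0 or cfg.max_iterations <= 0:
            raise ValueError("loop_interval_minutes and max_iterations must be positive")
        return cfg
```

Rejecting unknown keys catches typos such as `max_iteration` early, instead of silently falling back to the default.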

Integration with Other Skills

| Research Phase | Skills to Invoke |
| --- | --- |
| Literature search | arxiv, web-search, notebooklm |
| Data preparation | data-science tools |
| Model training | mlops, domain-specific skills |
| Evaluation | evaluating-llms-harness, custom evals |
| Paper writing | research-paper-writing |
| Progress reports | Built-in report generation |

Example: LoRA Rank Study

User: /autoresearch "Does LoRA rank affect convergence speed on small datasets?"

Agent: 
1. Bootstraps: Searches arxiv for LoRA papers
2. Forms hypotheses: H1 (rank 4), H2 (rank 8), H3 (rank 16)
3. Inner loop: Trains 3 models, records convergence steps
4. Outer loop: Notices rank 8 converges fastest
5. Deepens: Tests rank 6, 10, 12
6. Concludes: Generates report with trajectory plot

Best Practices

  1. Start simple: First experiment should run in <30 minutes
  2. Define metrics upfront: Lock evaluation criteria before running
  3. Return to literature: When stuck or surprised, search papers
  4. Commit frequently: Git history is your research log
  5. Show your work: Generate progress reports for human review
  6. Never idle: If blocked, diagnose, fix, or pivot — but keep moving

References

  • Inspired by Andrej Karpathy's autoresearch methodology
  • Compatible with agentskills.io open standard
  • Built-in templates from templates/ directory

See Also

  • templates/research-state.yaml — State tracking template
  • templates/findings.md — Synthesis template
  • templates/research-log.md — Decision log template
  • examples/ — Example research projects