mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-27 06:11:40 +00:00

Howard Li 762b582704 feat(skills): add autoresearch skill for autonomous research orchestration

Add a new research skill that enables continuous, self-directed research
with a two-loop architecture:

- Inner loop: Rapid experiment iteration with measurable outcomes
- Outer loop: Periodic synthesis, pattern discovery, and direction setting

Features:
- Research workspace templates (state, findings, log)
- Example project (LoRA rank study)
- Configuration options for loop intervals and auto-commit
- Integration with existing research skills (arxiv, paper-writing)

Updated research/DESCRIPTION.md to include autoresearch in the skill overview.

2026-04-11 12:28:41 -07:00

7.1 KiB

name	description	version	author	license	metadata
autoresearch	Autonomous research orchestration for AI coding agents. Run continuous, self-directed research with a two-loop architecture: rapid inner-loop experiments and periodic outer-loop synthesis. Ideal for literature surveys, hypothesis testing, benchmark optimization, and iterative discovery. No human hand-holding required.	1.0.0	Hermes Agent	MIT	hermes

Autoresearch

Autonomous research orchestration for AI coding agents.

Run continuous, self-directed research with a two-loop architecture:

- Inner Loop: Rapid experiment iteration with clear, measurable outcomes
- Outer Loop: Periodic synthesis, pattern discovery, and direction setting

Ideal for literature surveys, hypothesis testing, benchmark optimization, mechanistic interpretability studies, and any research requiring iterative experimentation.

When to Use

Scenario	Use Autoresearch?
"I want to explore X and see what works"	✅ Yes
"Does technique Y improve metric Z?"	✅ Yes
"What's the state of the art for problem W?"	✅ Yes (bootstrap + literature)
"Train a model with specific hyperparameters"	❌ Use domain skills directly
"Run a single evaluation"	❌ Use evaluation skills directly

Quick Start

# Start a research project
/autoresearch "Does LoRA rank affect convergence speed on small datasets?"

# Or with the research tool
research_init(project="lora-rank-study", question="Does LoRA rank affect convergence speed?")

The Two-Loop Architecture

BOOTSTRAP (once)
  ↓
INNER LOOP (fast, repeating) → Run experiments → Measure → Record → Learn
  ↓ (every N experiments or when stuck)
OUTER LOOP (reflective) → Synthesize → New hypotheses → Decide direction
  ↓
CONCLUDE → Write findings → Generate report
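
The diagram can be read as a simple control loop. What follows is a minimal Python sketch of that flow under stated assumptions, not the skill's actual implementation: `run_two_loops`, `run_experiment`, and `reflect` are hypothetical names, and treating an empty hypothesis queue as "stuck" is an assumption made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

Result = Tuple[str, float]                        # (hypothesis, measured metric)
Reflection = Tuple[str, List[str]]                # (direction, new hypotheses)


def run_two_loops(
    hypotheses: Iterable[str],
    run_experiment: Callable[[str], float],
    reflect: Callable[[List[Result]], Reflection],
    max_iterations: int = 10,
) -> List[Result]:
    """Run experiments one by one, pausing to reflect every max_iterations runs."""
    queue = deque(hypotheses)                     # output of the bootstrap phase
    results: List[Result] = []
    since_reflection = 0

    while queue:
        # Inner loop: run one experiment, measure, record.
        hypothesis = queue.popleft()
        results.append((hypothesis, run_experiment(hypothesis)))
        since_reflection += 1

        # Outer loop: every N experiments, or when nothing is left to try.
        if since_reflection >= max_iterations or not queue:
            direction, new_hypotheses = reflect(results)
            since_reflection = 0
            if direction == "CONCLUDE":
                break
            queue.extend(new_hypotheses)

    return results                                # input to the conclude phase
```

The `reflect` callback is where synthesis would happen; returning `"CONCLUDE"` ends the run, and any other direction may add new hypotheses to the queue.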

Inner Loop: Experiment Fast

1. Pick the highest-priority untested hypothesis
2. Write a protocol (what change, what prediction, why)
3. Lock it: commit to git BEFORE running
4. Run the experiment (invoke a domain skill)
5. Sanity-check the results (converged? baseline correct?)
6. Measure the proxy metric
7. Record in experiments/{hypothesis-slug}/
8. Update research-state.yaml
9. If stuck → search the literature or brainstorm
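
The first two steps (pick a hypothesis, write its protocol) could be sketched as follows. The `Hypothesis` fields, the "larger priority wins" convention, and the protocol layout are assumptions for illustration; the source does not specify the schema of research-state.yaml or protocol.md.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    slug: str                 # directory name under experiments/
    statement: str
    priority: int             # assumed convention: larger means more urgent
    status: str = "untested"


def pick_next(hypotheses: List[Hypothesis]) -> Optional[Hypothesis]:
    """Return the highest-priority untested hypothesis, or None if none remain."""
    untested = [h for h in hypotheses if h.status == "untested"]
    return max(untested, key=lambda h: h.priority, default=None)


def render_protocol(h: Hypothesis, change: str, prediction: str, why: str) -> str:
    """Build the text of protocol.md: what changes, what is predicted, and why."""
    return (
        f"# Protocol: {h.slug}\n\n"
        f"Hypothesis: {h.statement}\n"
        f"Change: {change}\n"
        f"Prediction: {prediction}\n"
        f"Why: {why}\n"
    )
```

Keeping the prediction in the protocol text is what makes the later comparison against results meaningful.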

Outer Loop: Step Back and Synthesize

1. Review all results since the last reflection
2. Cluster by type: what worked? what didn't?
3. Ask WHY: identify mechanisms
4. Update findings.md with current understanding
5. Search the literature if results surprise you
6. Generate new hypotheses if warranted
7. Decide direction: DEEPEN / BROADEN / PIVOT / CONCLUDE
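
The clustering and direction steps might look like the sketch below. The outcome labels (`"supported"`, `"refuted"`) and the decision rule are toy assumptions, not the skill's logic; in practice the agent weighs mechanisms and literature before choosing, and CONCLUDE is left to that judgment.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_by_outcome(results: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group hypothesis slugs by outcome label, preserving input order."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for slug, outcome in results:
        clusters[outcome].append(slug)
    return dict(clusters)


def suggest_direction(clusters: Dict[str, List[str]]) -> str:
    """Toy rule: dig into what worked, pivot if everything failed, else widen."""
    if clusters.get("supported"):
        return "DEEPEN"
    if clusters.get("refuted"):
        return "PIVOT"
    return "BROADEN"
```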

Workspace Structure

{project}/
├── research-state.yaml       # Central state tracking
├── research-log.md           # Decision timeline
├── findings.md               # Evolving narrative synthesis
├── literature/               # Papers, survey notes
├── src/                      # Reusable code (utils, plotting)
├── data/                     # Raw result data
├── experiments/              # Per-hypothesis work
│   └── {hypothesis-slug}/
│       ├── protocol.md       # What, why, and prediction
│       ├── code/             # Experiment-specific code
│       ├── results/          # Raw outputs, metrics
│       └── analysis.md       # What we learned
├── to_human/                 # Progress presentations
└── paper/                    # Final paper (optional)
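
As an illustration, the layout above can be created with a few lines of Python. This is a sketch only: the function names are hypothetical, files are created empty, and the skill's own templates (if used) would supply real content.

```python
from pathlib import Path

TOP_LEVEL_FILES = ("research-state.yaml", "research-log.md", "findings.md")
TOP_LEVEL_DIRS = ("literature", "src", "data", "experiments", "to_human", "paper")


def scaffold_project(root: Path) -> Path:
    """Create the project directory with its top-level files and folders."""
    root.mkdir(parents=True, exist_ok=True)
    for name in TOP_LEVEL_DIRS:
        (root / name).mkdir(exist_ok=True)
    for name in TOP_LEVEL_FILES:
        (root / name).touch(exist_ok=True)       # empty placeholder file
    return root


def scaffold_experiment(root: Path, slug: str) -> Path:
    """Create experiments/{slug}/ with protocol, code, results, and analysis."""
    exp = root / "experiments" / slug
    for name in ("code", "results"):
        (exp / name).mkdir(parents=True, exist_ok=True)
    for name in ("protocol.md", "analysis.md"):
        (exp / name).touch(exist_ok=True)
    return exp
```

Both functions are idempotent, so re-running them on an existing workspace leaves existing files untouched.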

Research Discipline

Lock Before You Run

Always commit your protocol to git BEFORE executing:

git add experiments/H001/protocol.md
git commit -m "research(protocol): H001 — cosine warmup improves convergence"
# THEN run the experiment

The commit timestamp is a record that your plan existed before the results did.
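
A pre-run guard can enforce this rule. The sketch below assumes a protocol counts as "locked" when it appears in git history and has no uncommitted changes; `protocol_is_locked` is a hypothetical helper, and the `git` parameter exists only so the check can be exercised without a real repository.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

GitRunner = Callable[[Sequence[str], Path], subprocess.CompletedProcess]


def _git(args: Sequence[str], cwd: Path) -> subprocess.CompletedProcess:
    """Run a git subcommand in cwd and capture its text output."""
    return subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)


def protocol_is_locked(repo: Path, protocol: str, git: GitRunner = _git) -> bool:
    """True if the protocol file is committed and has no pending edits."""
    # A commit hash on stdout means the file appears in history.
    log = git(["log", "-1", "--format=%H", "--", protocol], repo)
    if log.returncode != 0 or not log.stdout.strip():
        return False
    # Empty porcelain output means no staged or unstaged changes to the file.
    status = git(["status", "--porcelain", "--", protocol], repo)
    return status.returncode == 0 and not status.stdout.strip()
```

Calling `protocol_is_locked(Path("."), "experiments/H001/protocol.md")` before launching a run would refuse to proceed on an uncommitted or edited protocol.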

Confirmatory vs Exploratory

Type	Definition	Trust Level
Confirmatory	Matches your locked protocol	High
Exploratory	Discovered during execution	Medium — needs replication
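
One way to apply the table mechanically is to tag each finding by whether its metric was named in the locked protocol. This is a sketch under that assumption; `Finding` and `classify` are hypothetical names, and "matches your locked protocol" may mean more than a metric name in practice.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Finding:
    metric: str
    kind: str      # "confirmatory" or "exploratory"
    trust: str     # trust level from the table above


def classify(metric: str, locked_metrics: FrozenSet[str]) -> Finding:
    """Tag a finding by whether its metric was declared before the run."""
    if metric in locked_metrics:
        return Finding(metric, "confirmatory", "high")
    return Finding(metric, "exploratory", "medium, needs replication")
```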

Negative Results Are Progress

A refuted hypothesis narrows the search space. Log what it rules out and what it suggests trying next.

Commands

Command	Description
`/autoresearch <question>`	Initialize and start research project
`/research-status`	Show current state and progress
`/research-pause`	Pause autonomous loops
`/research-resume`	Resume autonomous loops
`/research-report`	Generate progress presentation
`/research-conclude`	Finalize and write paper

Configuration

Add to ~/.hermes/config.yaml:

autoresearch:
  loop_interval_minutes: 20      # How often to check progress
  max_iterations: 10             # Experiments before forced reflection
  auto_commit: true              # Auto-commit milestones
  default_workspace: "./research" # Where to create projects
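
For illustration, these settings could be validated after the YAML file has been parsed into a dictionary (for example by PyYAML's `yaml.safe_load`). The sketch below is an assumption-laden example: `AutoresearchConfig` and `load_config` are hypothetical, and using the example values as defaults is a guess, since the source does not state the real defaults.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class AutoresearchConfig:
    loop_interval_minutes: int = 20       # how often to check progress
    max_iterations: int = 10              # experiments before forced reflection
    auto_commit: bool = True              # auto-commit milestones
    default_workspace: str = "./research"


def load_config(parsed: Mapping[str, Any]) -> AutoresearchConfig:
    """Build a config from the parsed file, rejecting unrecognized keys."""
    section = parsed.get("autoresearch") or {}
    known = {f.name for f in fields(AutoresearchConfig)}
    unknown = set(section) - known
    if unknown:
        raise ValueError(f"unknown autoresearch keys: {sorted(unknown)}")
    return AutoresearchConfig(**section)
```

Rejecting unknown keys catches typos such as `max_iteration` early instead of silently falling back to a default.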

Integration with Other Skills

Research Phase	Skills to Invoke
Literature search	`arxiv`, `web-search`, `notebooklm`
Data preparation	`data-science` tools
Model training	`mlops`, domain-specific skills
Evaluation	`evaluating-llms-harness`, custom evals
Paper writing	`research-paper-writing`
Progress reports	Built-in report generation

Example: LoRA Rank Study

User: /autoresearch "Does LoRA rank affect convergence speed on small datasets?"

Agent: 
1. Bootstraps: Searches arxiv for LoRA papers
2. Forms hypotheses: H1 (rank 4), H2 (rank 8), H3 (rank 16)
3. Inner loop: Trains 3 models, records convergence steps
4. Outer loop: Notices rank 8 converges fastest
5. Deepens: Tests rank 6, 10, 12
6. Concludes: Generates report with trajectory plot
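
"Records convergence steps" in step 3 needs a concrete definition of convergence. One possible choice, assumed here and not stated by the source, is the first training step at which the loss drops to a fixed threshold. The function name is hypothetical.

```python
from typing import Optional, Sequence


def steps_to_converge(losses: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based step where loss first reaches threshold, else None."""
    for step, loss in enumerate(losses, start=1):
        if loss <= threshold:
            return step
    return None
```

Whatever definition is used, it belongs in the locked protocol so that the threshold cannot be tuned after seeing the curves.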

Best Practices

- Start simple: First experiment should run in <30 minutes
- Define metrics upfront: Lock evaluation criteria before running
- Return to literature: When stuck or surprised, search papers
- Commit frequently: Git history is your research log
- Show your work: Generate progress reports for human review
- Never idle: If blocked, diagnose, fix, or pivot, but keep moving

References

- Inspired by Andrej Karpathy's autoresearch methodology
- Compatible with the agentskills.io open standard
- Built-in templates from the templates/ directory
