Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-27 06:11:40 +00:00)
Add a new research skill that enables continuous, self-directed research with a two-loop architecture:
- Inner loop: Rapid experiment iteration with measurable outcomes
- Outer loop: Periodic synthesis, pattern discovery, and direction setting
Features:
- Research workspace templates (state, findings, log)
- Example project (LoRA rank study)
- Configuration options for loop intervals and auto-commit
- Integration with existing research skills (arxiv, paper-writing)
Updated research/DESCRIPTION.md to include autoresearch in the skill overview.
7.1 KiB
| name | description | version | author | license | metadata |
|---|---|---|---|---|---|
| autoresearch | Autonomous research orchestration for AI coding agents. Run continuous, self-directed research with a two-loop architecture: rapid inner-loop experiments and periodic outer-loop synthesis. Ideal for literature surveys, hypothesis testing, benchmark optimization, and iterative discovery. No human hand-holding required. | 1.0.0 | Hermes Agent | MIT | |
Autoresearch
Autonomous research orchestration for AI coding agents.
Run continuous, self-directed research with a two-loop architecture:
- Inner Loop: Rapid experiment iteration with clear measurable outcomes
- Outer Loop: Periodic synthesis, pattern discovery, and direction setting
Ideal for literature surveys, hypothesis testing, benchmark optimization, mechanistic interpretability studies, and any research requiring iterative experimentation.
When to Use
| Scenario | Use Autoresearch? |
|---|---|
| "I want to explore X and see what works" | ✅ Yes |
| "Does technique Y improve metric Z?" | ✅ Yes |
| "What's the state of the art for problem W?" | ✅ Yes (bootstrap + literature) |
| "Train a model with specific hyperparameters" | ❌ Use domain skills directly |
| "Run a single evaluation" | ❌ Use evaluation skills directly |
Quick Start
# Start a research project
/autoresearch "Does LoRA rank affect convergence speed on small datasets?"
# Or with the research tool
research_init(project="lora-rank-study", question="Does LoRA rank affect convergence speed?")
The Two-Loop Architecture
BOOTSTRAP (once)
↓
INNER LOOP (fast, repeating) → Run experiments → Measure → Record → Learn
↓ (every N experiments or when stuck)
OUTER LOOP (reflective) → Synthesize → New hypotheses → Decide direction
↓
CONCLUDE → Write findings → Generate report
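The control flow in the diagram above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: `run_research`, `run_experiment`, `reflect`, and `reflect_every` are hypothetical names, and the "when stuck" trigger is left out for brevity.

```python
from typing import Callable, List, Tuple


def run_research(
    hypotheses: List[str],
    run_experiment: Callable[[str], float],
    reflect: Callable[[List[Tuple[str, float]]], str],
    reflect_every: int = 3,
) -> List[str]:
    """Run the inner loop, reflecting every `reflect_every` experiments.

    Returns a trace of phase names so the control flow is easy to inspect.
    """
    trace = ["bootstrap"]  # bootstrap happens exactly once
    results: List[Tuple[str, float]] = []
    queue = list(hypotheses)
    since_reflection = 0

    while queue:
        hypothesis = queue.pop(0)
        # Inner loop: run one experiment and record its metric.
        results.append((hypothesis, run_experiment(hypothesis)))
        trace.append("inner")
        since_reflection += 1

        # Outer loop: reflect every N experiments, or when nothing is left to test.
        if since_reflection >= reflect_every or not queue:
            decision = reflect(results)
            trace.append(f"outer:{decision}")
            since_reflection = 0
            if decision == "CONCLUDE":
                break

    trace.append("conclude")
    return trace
```

In this sketch the outer loop only decides whether to stop. A fuller version would also let `reflect` push new hypotheses onto the queue.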
Inner Loop: Experiment Fast
- Pick highest-priority untested hypothesis
- Write protocol (what change, what prediction, why)
- Lock it: Commit to git BEFORE running
- Run experiment (invoke domain skill)
- Sanity check results (converged? baseline correct?)
- Measure proxy metric
- Record in experiments/{hypothesis-slug}/
- Update research-state.yaml
- If stuck → search literature or brainstorm
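The first step of the inner loop, picking the highest-priority untested hypothesis, might look like the sketch below. The `Hypothesis` fields and the convention that a larger `priority` means more important are assumptions made for illustration. The actual schema of research-state.yaml is not shown in this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    slug: str
    priority: int              # assumed convention: larger means more important
    status: str = "untested"   # hypothetical values: "untested", "supported", "refuted"


def pick_next(hypotheses: List[Hypothesis]) -> Optional[Hypothesis]:
    """Return the highest-priority hypothesis that has not been tested yet."""
    untested = [h for h in hypotheses if h.status == "untested"]
    return max(untested, key=lambda h: h.priority, default=None)


def record_result(hypothesis: Hypothesis, supported: bool) -> None:
    """Mark a hypothesis as tested so it is not picked again."""
    hypothesis.status = "supported" if supported else "refuted"
```

Returning `None` when nothing is left untested gives the caller a natural signal to hand over to the outer loop.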
Outer Loop: Step Back and Synthesize
- Review all results since last reflection
- Cluster by type: what worked? what didn't?
- Ask WHY — identify mechanisms
- Update findings.md with current understanding
- Search literature if results surprise you
- Generate new hypotheses if warranted
- Decide direction: DEEPEN / BROADEN / PIVOT / CONCLUDE
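As a rough sketch of the review and clustering steps, the function below groups everything recorded since the last reflection by outcome. The result dictionaries with `slug` and `outcome` keys are a hypothetical shape. The direction itself is left to the agent's judgment, so the code only checks that the chosen value is one of the four named above.

```python
from collections import defaultdict
from typing import Dict, List

DIRECTIONS = ("DEEPEN", "BROADEN", "PIVOT", "CONCLUDE")


def cluster_results(results: List[dict], since: int) -> Dict[str, List[str]]:
    """Group the slugs of results recorded at index `since` or later by outcome."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for result in results[since:]:
        clusters[result["outcome"]].append(result["slug"])
    return dict(clusters)


def validate_direction(direction: str) -> str:
    """Reject anything that is not one of the four documented directions."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    return direction
```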
Workspace Structure
{project}/
├── research-state.yaml # Central state tracking
├── research-log.md # Decision timeline
├── findings.md # Evolving narrative synthesis
├── literature/ # Papers, survey notes
├── src/ # Reusable code (utils, plotting)
├── data/ # Raw result data
├── experiments/ # Per-hypothesis work
│ └── {hypothesis-slug}/
│ ├── protocol.md # What, why, and prediction
│ ├── code/ # Experiment-specific code
│ ├── results/ # Raw outputs, metrics
│ └── analysis.md # What we learned
├── to_human/ # Progress presentations
└── paper/ # Final paper (optional)
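To make the layout concrete, here is a small sketch that creates the same tree with `pathlib`. The function names are hypothetical and the files are created empty. The skill's own templates presumably fill them in.

```python
from pathlib import Path

TOP_LEVEL_FILES = ("research-state.yaml", "research-log.md", "findings.md")
TOP_LEVEL_DIRS = ("literature", "src", "data", "experiments", "to_human", "paper")


def scaffold_project(root: Path) -> Path:
    """Create the top-level workspace files and directories (safe to re-run)."""
    root.mkdir(parents=True, exist_ok=True)
    for name in TOP_LEVEL_DIRS:
        (root / name).mkdir(exist_ok=True)
    for name in TOP_LEVEL_FILES:
        (root / name).touch(exist_ok=True)
    return root


def scaffold_experiment(root: Path, slug: str) -> Path:
    """Create experiments/{slug}/ with protocol.md, code/, results/, analysis.md."""
    exp = root / "experiments" / slug
    for name in ("code", "results"):
        (exp / name).mkdir(parents=True, exist_ok=True)
    for name in ("protocol.md", "analysis.md"):
        (exp / name).touch(exist_ok=True)
    return exp
```

Both functions are idempotent, so calling them on an existing project does not overwrite anything.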
Research Discipline
Lock Before You Run
Always commit your protocol to git BEFORE executing:
git add experiments/H001-protocol.md
git commit -m "research(protocol): H001 — cosine warmup improves convergence"
# THEN run the experiment
This gives you a timestamped record that the plan existed before any results did.
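An agent could enforce this rule mechanically by refusing to run until the protocol file is committed and unmodified. The helper below is a sketch under that assumption. `is_locked` is a hypothetical name, and the `run` parameter exists only so the check can be exercised without a real repository.

```python
import subprocess
from typing import Callable


def is_locked(protocol: str, run: Callable = subprocess.run) -> bool:
    """Return True if `protocol` is tracked by git and has no uncommitted changes."""
    # `git ls-files --error-unmatch` exits non-zero when the path is not tracked.
    tracked = run(
        ["git", "ls-files", "--error-unmatch", protocol],
        capture_output=True,
        text=True,
    )
    if tracked.returncode != 0:
        return False

    # `git status --porcelain` prints nothing for a path that is committed and clean;
    # a staged-but-uncommitted or modified file produces a status line.
    status = run(
        ["git", "status", "--porcelain", "--", protocol],
        capture_output=True,
        text=True,
    )
    return status.stdout.strip() == ""
```

Calling `is_locked("experiments/H001-protocol.md")` from the project root before launching an experiment would catch both a forgotten `git add` and a forgotten `git commit`.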
Confirmatory vs Exploratory
| Type | Definition | Trust Level |
|---|---|---|
| Confirmatory | Matches your locked protocol | High |
| Exploratory | Discovered during execution | Medium — needs replication |
Negative Results Are Progress
A refuted hypothesis tells you something. Log what it rules out and what it suggests.
Commands
| Command | Description |
|---|---|
| /autoresearch <question> | Initialize and start research project |
| /research-status | Show current state and progress |
| /research-pause | Pause autonomous loops |
| /research-resume | Resume autonomous loops |
| /research-report | Generate progress presentation |
| /research-conclude | Finalize and write paper |
Configuration
Add to ~/.hermes/config.yaml:
autoresearch:
loop_interval_minutes: 20 # How often to check progress
max_iterations: 10 # Experiments before forced reflection
auto_commit: true # Auto-commit milestones
default_workspace: "./research" # Where to create projects
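For illustration, the sketch below shows how these four keys might be read once the YAML has been parsed into a dictionary, and how `max_iterations` could gate the outer loop. Treating the values shown above as defaults is an assumption, and `AutoresearchConfig`, `load_config`, and `reflection_due` are hypothetical names rather than part of the skill.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class AutoresearchConfig:
    # Assumed defaults: the values from the example configuration.
    loop_interval_minutes: int = 20
    max_iterations: int = 10
    auto_commit: bool = True
    default_workspace: str = "./research"


def load_config(raw: Dict[str, Any]) -> AutoresearchConfig:
    """Build a config from an already-parsed config.yaml mapping."""
    section = raw.get("autoresearch") or {}
    known = {f.name for f in fields(AutoresearchConfig)}
    unknown = set(section) - known
    if unknown:
        raise ValueError(f"unknown autoresearch keys: {sorted(unknown)}")
    return AutoresearchConfig(**section)


def reflection_due(experiments_since_reflection: int, cfg: AutoresearchConfig) -> bool:
    """True once enough experiments have run to force an outer-loop reflection."""
    return experiments_since_reflection >= cfg.max_iterations
```

Rejecting unknown keys is a design choice here: a misspelled option fails loudly instead of being silently ignored.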
Integration with Other Skills
| Research Phase | Skills to Invoke |
|---|---|
| Literature search | arxiv, web-search, notebooklm |
| Data preparation | data-science tools |
| Model training | mlops, domain-specific skills |
| Evaluation | evaluating-llms-harness, custom evals |
| Paper writing | research-paper-writing |
| Progress reports | Built-in report generation |
Example: LoRA Rank Study
User: /autoresearch "Does LoRA rank affect convergence speed on small datasets?"
Agent:
1. Bootstraps: Searches arxiv for LoRA papers
2. Forms hypotheses: H1 (rank 4), H2 (rank 8), H3 (rank 16)
3. Inner loop: Trains 3 models, records convergence steps
4. Outer loop: Notices rank 8 converges fastest
5. Deepens: Tests rank 6, 10, 12
6. Concludes: Generates report with trajectory plot
Best Practices
- Start simple: First experiment should run in <30 minutes
- Define metrics upfront: Lock evaluation criteria before running
- Return to literature: When stuck or surprised, search papers
- Commit frequently: Git history is your research log
- Show your work: Generate progress reports for human review
- Never idle: If blocked, diagnose, fix, or pivot — but keep moving
References
- Inspired by Andrej Karpathy's autoresearch methodology
- Compatible with agentskills.io open standard
- Built-in templates from the templates/ directory
See Also
- templates/research-state.yaml: State tracking template
- templates/findings.md: Synthesis template
- templates/research-log.md: Decision log template
- examples/: Example research projects