From 762b582704ec396fe7b3307cc224ab68a99b81ea Mon Sep 17 00:00:00 2001 From: Howard Li Date: Sat, 11 Apr 2026 12:28:41 -0700 Subject: [PATCH 1/2] feat(skills): add autoresearch skill for autonomous research orchestration Add a new research skill that enables continuous, self-directed research with a two-loop architecture: - Inner loop: Rapid experiment iteration with measurable outcomes - Outer loop: Periodic synthesis, pattern discovery, and direction setting Features: - Research workspace templates (state, findings, log) - Example project (LoRA rank study) - Configuration options for loop intervals and auto-commit - Integration with existing research skills (arxiv, paper-writing) Updated research/DESCRIPTION.md to include autoresearch in the skill overview. --- skills/research/DESCRIPTION.md | 30 ++- skills/research/autoresearch/SKILL.md | 200 ++++++++++++++++++ .../autoresearch/templates/findings.md | 93 ++++++++ .../autoresearch/templates/research-log.md | 90 ++++++++ .../templates/research-state.yaml | 70 ++++++ 5 files changed, 482 insertions(+), 1 deletion(-) create mode 100644 skills/research/autoresearch/SKILL.md create mode 100644 skills/research/autoresearch/templates/findings.md create mode 100644 skills/research/autoresearch/templates/research-log.md create mode 100644 skills/research/autoresearch/templates/research-state.yaml diff --git a/skills/research/DESCRIPTION.md b/skills/research/DESCRIPTION.md index a54c16906..775b49e2e 100644 --- a/skills/research/DESCRIPTION.md +++ b/skills/research/DESCRIPTION.md @@ -1,3 +1,31 @@ --- -description: Skills for academic research, paper discovery, literature review, domain reconnaissance, market data, content monitoring, and scientific knowledge retrieval. +description: Skills for academic research, paper discovery, literature review, domain reconnaissance, market data, content monitoring, and scientific knowledge retrieval. 
Includes autoresearch for autonomous, continuous research with iterative experimentation. --- + +## Skill Overview + +| Skill | Purpose | Best For | +|-------|---------|----------| +| `autoresearch` | Autonomous research orchestration | Continuous experimentation, hypothesis testing, benchmark optimization | +| `arxiv` | Search academic papers | Literature surveys, paper discovery | +| `research-paper-writing` | Write publication-ready papers | Final paper generation | +| `blogwatcher` | Monitor research blogs | Staying current with new developments | +| `llm-wiki` | LLM knowledge base | Quick reference on models and techniques | +| `polymarket` | Prediction market data | Research on market trends and predictions | + +## Getting Started with Research + +**Quick literature search:** +``` +/arxiv "transformer attention mechanisms" +``` + +**Start autonomous research:** +``` +/autoresearch "Does LoRA rank affect convergence speed?" +``` + +**Write a paper:** +``` +/research-paper-writing +``` diff --git a/skills/research/autoresearch/SKILL.md b/skills/research/autoresearch/SKILL.md new file mode 100644 index 000000000..7ef674c45 --- /dev/null +++ b/skills/research/autoresearch/SKILL.md @@ -0,0 +1,200 @@ +--- +name: autoresearch +description: Autonomous research orchestration for AI coding agents. Run continuous, self-directed research with a two-loop architecture — rapid inner-loop experiments and periodic outer-loop synthesis. Ideal for literature surveys, hypothesis testing, benchmark optimization, and iterative discovery. No human hand-holding required.
+version: 1.0.0 +author: Hermes Agent +license: MIT +metadata: + hermes: + tags: [Research, Autonomous, Experiments, ML, AI, Literature, Hypothesis, Benchmark, Optimization] + related_skills: [arxiv, research-paper-writing, web-search, notebooklm] + config: + autoresearch.loop_interval_minutes: + description: "Interval between autonomous research loops (in minutes)" + default: 20 + autoresearch.max_iterations: + description: "Maximum number of inner-loop experiments before forced reflection" + default: 10 + autoresearch.auto_commit: + description: "Automatically git-commit research milestones" + default: true +--- + +# Autoresearch + +**Autonomous research orchestration for AI coding agents.** + +Run continuous, self-directed research with a two-loop architecture: +- **Inner Loop**: Rapid experiment iteration with clear measurable outcomes +- **Outer Loop**: Periodic synthesis, pattern discovery, and direction setting + +Ideal for literature surveys, hypothesis testing, benchmark optimization, mechanistic interpretability studies, and any research requiring iterative experimentation. + +## When to Use + +| Scenario | Use Autoresearch? | +|----------|-------------------| +| "I want to explore X and see what works" | ✅ Yes | +| "Does technique Y improve metric Z?" | ✅ Yes | +| "What's the state of the art for problem W?" | ✅ Yes (bootstrap + literature) | +| "Train a model with specific hyperparameters" | ❌ Use domain skills directly | +| "Run a single evaluation" | ❌ Use evaluation skills directly | + +## Quick Start + +```bash +# Start a research project +/autoresearch "Does LoRA rank affect convergence speed on small datasets?" 
+ +# Or with the research tool +research_init(project="lora-rank-study", question="Does LoRA rank affect convergence speed?") +``` + +## The Two-Loop Architecture + +``` +BOOTSTRAP (once) + ↓ +INNER LOOP (fast, repeating) → Run experiments → Measure → Record → Learn + ↓ (every N experiments or when stuck) +OUTER LOOP (reflective) → Synthesize → New hypotheses → Decide direction + ↓ +CONCLUDE → Write findings → Generate report +``` + +### Inner Loop: Experiment Fast + +1. Pick highest-priority untested hypothesis +2. Write protocol (what change, what prediction, why) +3. **Lock it**: Commit to git BEFORE running +4. Run experiment (invoke domain skill) +5. Sanity check results (converged? baseline correct?) +6. Measure proxy metric +7. Record in `experiments/{hypothesis-slug}/` +8. Update `research-state.yaml` +9. If stuck → search literature or brainstorm + +### Outer Loop: Step Back and Synthesize + +1. Review all results since last reflection +2. Cluster by type: what worked? what didn't? +3. Ask WHY — identify mechanisms +4. Update `findings.md` with current understanding +5. Search literature if results surprise you +6. Generate new hypotheses if warranted +7. 
Decide direction: DEEPEN / BROADEN / PIVOT / CONCLUDE + +## Workspace Structure + +``` +{project}/ +├── research-state.yaml # Central state tracking +├── research-log.md # Decision timeline +├── findings.md # Evolving narrative synthesis +├── literature/ # Papers, survey notes +├── src/ # Reusable code (utils, plotting) +├── data/ # Raw result data +├── experiments/ # Per-hypothesis work +│ └── {hypothesis-slug}/ +│ ├── protocol.md # What, why, and prediction +│ ├── code/ # Experiment-specific code +│ ├── results/ # Raw outputs, metrics +│ └── analysis.md # What we learned +├── to_human/ # Progress presentations +└── paper/ # Final paper (optional) +``` + +## Research Discipline + +### Lock Before You Run + +Always commit your protocol to git BEFORE executing: + +```bash +git add experiments/h001-cosine-warmup/protocol.md +git commit -m "research(protocol): H001 — cosine warmup improves convergence" +# THEN run the experiment +``` + +This creates temporal proof that your plan existed before the results did. + +### Confirmatory vs Exploratory + +| Type | Definition | Trust Level | +|------|------------|-------------| +| **Confirmatory** | Matches your locked protocol | High | +| **Exploratory** | Discovered during execution | Medium — needs replication | + +### Negative Results Are Progress + +A refuted hypothesis tells you something. Log what it rules out and what it suggests.
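A refuted hypothesis can be recorded directly in `research-state.yaml` so the ruled-out search space stays visible to later loops. A minimal sketch using the fields from `templates/research-state.yaml` (the H002 values here are illustrative, not from a real run):

```yaml
hypotheses:
  H002:
    description: "Lower rank (r=4) converges faster on small datasets"
    status: "refuted"
    prediction: "r=4 reaches 90% of peak accuracy in fewer steps than r=8"
    priority: 2
    created_at: "2026-04-11"
    completed_at: "2026-04-11"
    result_summary: >
      Refuted: r=4 needed noticeably more steps than the r=8 baseline.
      Rules out "smaller adapters converge faster"; suggests testing higher ranks.
    experiment_slug: "h002-low-rank"
```

The `result_summary` states both what was ruled out and what to try next, which is exactly what an outer-loop reflection needs.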
+ +## Commands + +| Command | Description | +|---------|-------------| +| `/autoresearch <question>` | Initialize and start a research project | +| `/research-status` | Show current state and progress | +| `/research-pause` | Pause autonomous loops | +| `/research-resume` | Resume autonomous loops | +| `/research-report` | Generate progress presentation | +| `/research-conclude` | Finalize and write paper | + +## Configuration + +Add to `~/.hermes/config.yaml`: + +```yaml +autoresearch: + loop_interval_minutes: 20 # How often to check progress + max_iterations: 10 # Experiments before forced reflection + auto_commit: true # Auto-commit milestones + default_workspace: "./research" # Where to create projects +``` + +## Integration with Other Skills + +| Research Phase | Skills to Invoke | +|----------------|------------------| +| Literature search | `arxiv`, `web-search`, `notebooklm` | +| Data preparation | `data-science` tools | +| Model training | `mlops`, domain-specific skills | +| Evaluation | `evaluating-llms-harness`, custom evals | +| Paper writing | `research-paper-writing` | +| Progress reports | Built-in report generation | + +## Example: LoRA Rank Study + +``` +User: /autoresearch "Does LoRA rank affect convergence speed on small datasets?" + +Agent: +1. Bootstraps: Searches arxiv for LoRA papers +2. Forms hypotheses: H1 (rank 16), H2 (rank 4), H3 (rank 8) +3. Inner loop: Trains 3 models, records convergence steps +4. Outer loop: Notices higher rank converges faster +5. Deepens: Tests rank 32 and 64 to find the saturation point +6. Concludes: Generates report with trajectory plot +``` + +## Best Practices + +1. **Start simple**: First experiment should run in <30 minutes +2. **Define metrics upfront**: Lock evaluation criteria before running +3. **Return to literature**: When stuck or surprised, search papers +4. **Commit frequently**: Git history is your research log +5. **Show your work**: Generate progress reports for human review +6. 
**Never idle**: If blocked, diagnose, fix, or pivot — but keep moving + +## References + +- Inspired by Andrej Karpathy's autoresearch methodology +- Compatible with agentskills.io open standard +- Built-in templates from `templates/` directory + +## See Also + +- `templates/research-state.yaml` — State tracking template +- `templates/findings.md` — Synthesis template +- `templates/research-log.md` — Decision log template +- `examples/` — Example research projects diff --git a/skills/research/autoresearch/templates/findings.md b/skills/research/autoresearch/templates/findings.md new file mode 100644 index 000000000..9f2cae034 --- /dev/null +++ b/skills/research/autoresearch/templates/findings.md @@ -0,0 +1,93 @@ +# Findings: {{PROJECT_NAME}} + +**Research Question:** {{RESEARCH_QUESTION}} + +**Last Updated:** {{LAST_UPDATED}} + +--- + +## Current Understanding + +### What We Know So Far + +[Summarize the current state of knowledge. 2-4 paragraphs.] + +### Key Patterns and Insights + +| Pattern | Evidence | Confidence | +|---------|----------|------------| +| [Pattern 1] | [Which experiments support this] | High/Medium/Low | +| [Pattern 2] | [Which experiments support this] | High/Medium/Low | + +### Mechanistic Understanding + +[If applicable: What mechanisms explain the results?] + +--- + +## Lessons and Constraints + +### What Works + +- [Specific finding with context] +- [Another finding] + +### What Doesn't Work + +- [Failed approach and why] +- [Constraint discovered] + +### Critical Parameters + +| Parameter | Sweet Spot | Why | +|-----------|------------|-----| +| [Param 1] | [Value/range] | [Explanation] | + +--- + +## Open Questions + +### High Priority + +1. [Question that would change the story if answered] +2. [Another critical question] + +### Medium Priority + +1. [Nice to know but not blocking] + +### Answered + +1. ~~[Question]~~ → Answer: [Brief answer with evidence] + +--- + +## Narrative Arc + +[For paper writing: What's the story? 
What would the abstract say?] + +### Contribution Sketch + +[1-2 sentences on what this research contributes] + +### Implications + +[Who cares? Why does this matter?] + +--- + +## Next Steps + +### Immediate (Next 1-3 Experiments) + +- [ ] [Specific experiment] +- [ ] [Another experiment] + +### Medium Term + +- [ ] [Broader direction] + +### If Current Direction Fails + +- [Pivot option 1] +- [Pivot option 2] diff --git a/skills/research/autoresearch/templates/research-log.md b/skills/research/autoresearch/templates/research-log.md new file mode 100644 index 000000000..ad6faae65 --- /dev/null +++ b/skills/research/autoresearch/templates/research-log.md @@ -0,0 +1,90 @@ +# Research Log: {{PROJECT_NAME}} + +**Research Question:** {{RESEARCH_QUESTION}} + +--- + +## Bootstrap Phase + +### {{DATE}} — Project Initialization + +- **Action:** Created workspace, initialized state files +- **Research Question:** {{RESEARCH_QUESTION}} +- **Initial Thoughts:** [What makes this interesting?] + +### {{DATE}} — Literature Search + +- **Sources:** arxiv, semantic scholar, web search +- **Key Papers:** + - [Paper 1] — [Key finding relevant to question] + - [Paper 2] — [Key finding] +- **Gap Identified:** [What's missing in existing work?] + +### {{DATE}} — Hypothesis Formation + +- **H001:** [Description] → Prediction: [Specific prediction] +- **H002:** [Description] → Prediction: [Specific prediction] +- **H003:** [Description] → Prediction: [Specific prediction] + +--- + +## Inner Loop Log + +### {{DATE}} — Experiment H001 + +- **Hypothesis:** H001 +- **Protocol:** [What was changed, what was predicted] +- **Git Commit:** `research(protocol): H001 — [description]` +- **Status:** [Running/Completed/Failed] +- **Results:** + - Metric: [Value] + - Baseline: [Value] + - Delta: [+/-X] +- **Interpretation:** [What this means] +- **Next Action:** [Continue/Adjust/Pivot] + +### {{DATE}} — Experiment H002 + +[Same format...] 
+ +--- + +## Outer Loop Log + +### {{DATE}} — Reflection #1 (After {{N}} experiments) + +- **Experiments Reviewed:** H001-H00N +- **Patterns Observed:** + - [Pattern 1] + - [Pattern 2] +- **Updated Understanding:** [New insights] +- **Direction Decision:** [DEEPEN/BROADEN/PIVOT/CONCLUDE] +- **Rationale:** [Why this direction?] +- **New Hypotheses:** + - H00N+1: [Description] + - H00N+2: [Description] + +--- + +## Direction Changes + +### {{DATE}} — PIVOT: [New Direction] + +- **From:** [Old direction/assumption] +- **To:** [New direction] +- **Trigger:** [What result/surprise caused this?] +- **New Research Question:** [If changed] + +--- + +## Conclusion + +### {{DATE}} — Research Concluded + +- **Final Status:** [Completed/Partial/Abandoned] +- **Key Findings:** + 1. [Finding 1] + 2. [Finding 2] +- **Contribution:** [What this adds to the field] +- **Limitations:** [What we didn't test/couldn't conclude] +- **Future Work:** [What someone should do next] diff --git a/skills/research/autoresearch/templates/research-state.yaml b/skills/research/autoresearch/templates/research-state.yaml new file mode 100644 index 000000000..9a7d30a44 --- /dev/null +++ b/skills/research/autoresearch/templates/research-state.yaml @@ -0,0 +1,70 @@ +# Research State +# Central tracking file for autoresearch project +# Updated automatically by the agent — do not edit manually + +project: + name: "{{PROJECT_NAME}}" + created_at: "{{CREATED_AT}}" + research_question: "{{RESEARCH_QUESTION}}" + status: "bootstrapping" # bootstrapping | active | paused | concluding | completed + +bootstrap: + literature_searched: false + initial_hypotheses_formed: false + evaluation_metric_defined: false + baseline_established: false + +loops: + inner_loop_count: 0 + outer_loop_count: 0 + last_inner_loop_at: null + last_outer_loop_at: null + +direction: + current: "explore" # explore | deepen | broaden | pivot | conclude + rationale: "Initial exploration phase" + next_milestone: "Complete first 3 experiments" 
+ +hypotheses: + # Example structure — replace with actual hypotheses + H001: + description: "{{HYPOTHESIS_DESCRIPTION}}" + status: "untested" # untested | running | completed | refuted | supported + prediction: "{{PREDICTION}}" + priority: 1 + created_at: "{{CREATED_AT}}" + completed_at: null + result_summary: null + experiment_slug: null + +metrics: + primary: "{{PRIMARY_METRIC}}" # e.g., "val_loss", "accuracy", "convergence_steps" + baseline_value: null + target_value: null + current_best: null + optimization_direction: "minimize" # minimize | maximize + +trajectory: + # Auto-populated from experiments + # - experiment_id: run_001 + # hypothesis: H001 + # metric_value: 0.847 + # baseline: 0.812 + # delta: "+0.035" + # wall_time_min: 23 + # change_summary: "Added cosine annealing" + +resources: + literature_papers: [] + related_work_notes: null + code_references: [] + +continuity: + last_session_at: "{{CREATED_AT}}" + next_scheduled_loop: null + current_experiment: null + pending_tasks: [] + +notes: | + # Agent notes — context for next session + # What was I doing? What's next? 
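The `loops` counters in the state template above are what gate the switch from inner-loop experimentation to outer-loop reflection. A minimal sketch of that check, assuming a helper script (not shipped with the skill) that reads the counters with standard tools:

```shell
#!/bin/sh
# Illustrative only: decide the next phase from research-state.yaml counters.

# A tiny stand-in state file for the sketch.
cat > research-state.yaml <<'EOF'
loops:
  inner_loop_count: 10
  outer_loop_count: 0
EOF

max_iterations=10  # mirrors the autoresearch.max_iterations config default

# Pull the counter out with grep/awk (a real agent might use a YAML parser).
inner=$(grep 'inner_loop_count:' research-state.yaml | awk '{print $2}')

if [ "$inner" -ge "$max_iterations" ]; then
  phase="outer"  # forced reflection: synthesize, update findings.md, pick a direction
else
  phase="inner"  # keep running experiments
fi

echo "next phase: $phase"
```

With the counters shown this prints `next phase: outer`, i.e. the agent has hit `max_iterations` and must reflect before experimenting further.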
From c77175b7f76d726e2380cded8ac425049bd28fa1 Mon Sep 17 00:00:00 2001 From: Howard Li Date: Sat, 11 Apr 2026 12:28:50 -0700 Subject: [PATCH 2/2] docs(autoresearch): add example research project and README Add LoRA rank convergence study example demonstrating: - Bootstrap phase with literature search - Hypothesis formation and testing - Inner/outer loop workflow - Progress tracking and findings synthesis --- .../research/autoresearch/examples/README.md | 34 +++++++++ .../autoresearch/examples/lora-rank-study.md | 74 +++++++++++++++++++ 2 files changed, 108 insertions(+) create mode 100644 skills/research/autoresearch/examples/README.md create mode 100644 skills/research/autoresearch/examples/lora-rank-study.md diff --git a/skills/research/autoresearch/examples/README.md b/skills/research/autoresearch/examples/README.md new file mode 100644 index 000000000..c61a5ba9e --- /dev/null +++ b/skills/research/autoresearch/examples/README.md @@ -0,0 +1,34 @@ +# Autoresearch Examples + +This directory contains example research projects using the autoresearch methodology. + +## Available Examples + +### `lora-rank-study.md` + +**Question:** Does LoRA rank affect convergence speed on small datasets? + +**Type:** Benchmark optimization, hyperparameter study + +**Skills Used:** +- `arxiv` — Literature search +- `mlops` — Model training +- `tensorboard` — Experiment tracking + +**Key Takeaway:** Higher rank improves convergence speed up to a point (r=16), then diminishing returns. + +--- + +## Creating Your Own Research + +1. Start with `/autoresearch "your question"` +2. Follow the two-loop architecture +3. Commit protocols before running +4. 
Generate progress reports with `/research-report` + +## Tips from Examples + +- **Start small:** First experiment should complete in <30 minutes +- **Define metrics upfront:** Know what you're measuring before you start +- **Document surprises:** Negative results are progress too +- **Show your work:** Progress reports help humans follow along diff --git a/skills/research/autoresearch/examples/lora-rank-study.md b/skills/research/autoresearch/examples/lora-rank-study.md new file mode 100644 index 000000000..8ad3ddfd8 --- /dev/null +++ b/skills/research/autoresearch/examples/lora-rank-study.md @@ -0,0 +1,74 @@ +# LoRA Rank Convergence Study + +**Research Question:** Does LoRA rank affect convergence speed on small datasets? + +## Bootstrap + +### Literature + +Key papers: +- Hu et al. (2021) — LoRA: Low-Rank Adaptation of Large Language Models +- Valipour et al. (2023) — DyLoRA: Parameter-Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation + +Gap: Most papers focus on final performance, not convergence dynamics. 
+ +### Hypotheses + +- **H1:** Higher rank (r=16) converges faster but may overfit on small data +- **H2:** Lower rank (r=4) converges slower but generalizes better +- **H3:** There's an optimal rank (r=8) that balances speed and generalization + +## Experiments + +### H001 — Baseline (r=8) + +```bash +# Protocol: Train with rank 8, measure convergence steps to 90% of max accuracy +# Prediction: Baseline behavior, ~50 steps to converge +``` + +**Results:** +- Convergence steps: 47 +- Final accuracy: 0.892 +- Wall time: 12 min + +### H002 — Low Rank (r=4) + +**Results:** +- Convergence steps: 68 (+45% vs baseline) +- Final accuracy: 0.887 (-0.6%) + +### H003 — High Rank (r=16) + +**Results:** +- Convergence steps: 41 (-13% vs baseline) +- Final accuracy: 0.894 (+0.2%) + +## Outer Loop #1 + +**Pattern:** Higher rank → faster convergence, minimal overfit on this dataset + +**Decision:** DEEPEN — Test r=32 and r=64 to find saturation point + +### H004 — Very High Rank (r=32) + +**Results:** +- Convergence steps: 38 (-7% vs r=16) +- Final accuracy: 0.891 (-0.3% vs r=16) +- **Diminishing returns observed** + +### H005 — Optimal Search (r=6, r=10, r=12) + +[Running...] + +## Current Findings + +1. Convergence speed improves with rank up to r=16, then plateaus +2. Final accuracy is relatively stable across ranks (±0.5%) +3. For small datasets, r=8-12 appears optimal (speed vs compute tradeoff) + +## Next Steps + +- Complete H005-H007 +- Test on different dataset sizes (generalization) +- Write up findings
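The study's convergence metric ("steps to 90% of max accuracy", per the H001 protocol) can be computed from a plain step/accuracy log. A sketch with invented numbers (the log format and the awk helper are illustrative, not artifacts of the study):

```shell
#!/bin/sh
# Illustrative helper: first training step whose accuracy reaches 90% of the best.

cat > metrics.log <<'EOF'
10 0.40
20 0.65
30 0.78
40 0.85
50 0.89
EOF

result=$(awk '
  { step[NR] = $1; acc[NR] = $2; if ($2 > best) best = $2 }
  END {
    target = 0.9 * best
    for (i = 1; i <= NR; i++)
      if (acc[i] >= target) { print "converged at step " step[i]; exit }
  }
' metrics.log)

echo "$result"
```

With the toy log above, the best accuracy is 0.89, the 90% target is 0.801, and the first step at or above it is step 40, so this prints `converged at step 40`.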