This commit is contained in:
Oysterguard 2026-04-24 19:26:33 -05:00 committed by GitHub
commit bb11bc76a0
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
7 changed files with 590 additions and 1 deletion


@ -1,3 +1,31 @@
---
description: Skills for academic research, paper discovery, literature review, domain reconnaissance, market data, content monitoring, and scientific knowledge retrieval. Includes autoresearch for autonomous, continuous research with iterative experimentation.
---
## Skill Overview
| Skill | Purpose | Best For |
|-------|---------|----------|
| `autoresearch` | Autonomous research orchestration | Continuous experimentation, hypothesis testing, benchmark optimization |
| `arxiv` | Search academic papers | Literature surveys, paper discovery |
| `research-paper-writing` | Write publication-ready papers | Final paper generation |
| `blogwatcher` | Monitor research blogs | Staying current with new developments |
| `llm-wiki` | LLM knowledge base | Quick reference on models and techniques |
| `polymarket` | Prediction market data | Research on market trends and predictions |
## Getting Started with Research
**Quick literature search:**
```
/arxiv "transformer attention mechanisms"
```
**Start autonomous research:**
```
/autoresearch "Does LoRA rank affect convergence speed?"
```
**Write a paper:**
```
/research-paper-writing
```


@ -0,0 +1,200 @@
---
name: autoresearch
description: Autonomous research orchestration for AI coding agents. Run continuous, self-directed research with a two-loop architecture — rapid inner-loop experiments and periodic outer-loop synthesis. Ideal for literature surveys, hypothesis testing, benchmark optimization, and iterative discovery. No human hand-holding required.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [Research, Autonomous, Experiments, ML, AI, Literature, Hypothesis, Benchmark, Optimization]
    related_skills: [arxiv, research-paper-writing, web-search, notebooklm]
config:
  autoresearch.loop_interval_minutes:
    description: "Interval between autonomous research loops (in minutes)"
    default: 20
  autoresearch.max_iterations:
    description: "Maximum number of inner-loop experiments before forced reflection"
    default: 10
  autoresearch.auto_commit:
    description: "Automatically git-commit research milestones"
    default: true
---
# Autoresearch
**Autonomous research orchestration for AI coding agents.**
Run continuous, self-directed research with a two-loop architecture:
- **Inner Loop**: Rapid experiment iteration with clear measurable outcomes
- **Outer Loop**: Periodic synthesis, pattern discovery, and direction setting
Ideal for literature surveys, hypothesis testing, benchmark optimization, mechanistic interpretability studies, and any research requiring iterative experimentation.
## When to Use
| Scenario | Use Autoresearch? |
|----------|-------------------|
| "I want to explore X and see what works" | ✅ Yes |
| "Does technique Y improve metric Z?" | ✅ Yes |
| "What's the state of the art for problem W?" | ✅ Yes (bootstrap + literature) |
| "Train a model with specific hyperparameters" | ❌ Use domain skills directly |
| "Run a single evaluation" | ❌ Use evaluation skills directly |
## Quick Start
```bash
# Start a research project
/autoresearch "Does LoRA rank affect convergence speed on small datasets?"
# Or with the research tool
research_init(project="lora-rank-study", question="Does LoRA rank affect convergence speed?")
```
## The Two-Loop Architecture
```
BOOTSTRAP (once)
INNER LOOP (fast, repeating) → Run experiments → Measure → Record → Learn
↓ (every N experiments or when stuck)
OUTER LOOP (reflective) → Synthesize → New hypotheses → Decide direction
CONCLUDE → Write findings → Generate report
```
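The interleaving of the two loops can be sketched as a plain shell control flow. This is illustrative only: the counts (`N`, `total`) and the echoed messages are placeholders, not the skill's real implementation.

```shell
# Control-flow sketch only: N, total, and the messages are illustrative.
N=3          # inner experiments per outer-loop reflection (assumed)
total=6      # total inner-loop experiments in this sketch
reflections=0

i=1
while [ "$i" -le "$total" ]; do
  echo "inner: experiment $i (run, measure, record)"
  # Every N experiments, step back for an outer-loop reflection
  if [ $((i % N)) -eq 0 ]; then
    echo "outer: reflection after $i experiments (synthesize, set direction)"
    reflections=$((reflections + 1))
  fi
  i=$((i + 1))
done
echo "conclude: write findings, generate report"
```

With these numbers the sketch performs two reflections before concluding; in practice the outer loop also fires whenever the inner loop gets stuck.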
### Inner Loop: Experiment Fast
1. Pick highest-priority untested hypothesis
2. Write protocol (what change, what prediction, why)
3. **Lock it**: Commit to git BEFORE running
4. Run experiment (invoke domain skill)
5. Sanity check results (converged? baseline correct?)
6. Measure proxy metric
7. Record in `experiments/{hypothesis-slug}/`
8. Update `research-state.yaml`
9. If stuck → search literature or brainstorm
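A minimal sketch of one inner-loop iteration as shell commands. The hypothesis slug, protocol text, and recorded metric are all hypothetical, and the git step is shown commented out because it assumes an initialized repository.

```shell
# Hypothetical hypothesis slug and workspace paths
slug="H001-cosine-warmup"
mkdir -p "experiments/$slug/code" "experiments/$slug/results"

# Steps 1-2: write the protocol (what change, what prediction, why)
cat > "experiments/$slug/protocol.md" <<'EOF'
## Change
Switch the LR schedule to cosine warmup.
## Prediction
Converges in <= 45 steps (baseline: 47).
## Rationale
Warmup should stabilize early updates.
EOF

# Step 3: lock it, committing BEFORE running (requires a git repo)
# git add "experiments/$slug/protocol.md"
# git commit -m "research(protocol): $slug"

# Steps 4-7: run the experiment (placeholder), then record the result
echo "convergence_steps: 44" > "experiments/$slug/results/metrics.yaml"
```

After this, step 8 updates `research-state.yaml` with the new result before picking the next hypothesis.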
### Outer Loop: Step Back and Synthesize
1. Review all results since last reflection
2. Cluster by type: what worked? what didn't?
3. Ask WHY — identify mechanisms
4. Update `findings.md` with current understanding
5. Search literature if results surprise you
6. Generate new hypotheses if warranted
7. Decide direction: DEEPEN / BROADEN / PIVOT / CONCLUDE
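The outcome of a reflection lands in the decision log. A sketch of one such entry, where the date, pattern, and DEEPEN decision are illustrative:

```shell
# Append an outer-loop entry to the decision timeline (content is illustrative)
cat >> research-log.md <<'EOF'
### 2026-04-25 -- Reflection #1 (After 3 experiments)
- Experiments Reviewed: H001-H003
- Pattern Observed: higher rank -> faster convergence
- Direction Decision: DEEPEN
- Rationale: effect is consistent; find the saturation point
EOF
```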
## Workspace Structure
```
{project}/
├── research-state.yaml # Central state tracking
├── research-log.md # Decision timeline
├── findings.md # Evolving narrative synthesis
├── literature/ # Papers, survey notes
├── src/ # Reusable code (utils, plotting)
├── data/ # Raw result data
├── experiments/ # Per-hypothesis work
│ └── {hypothesis-slug}/
│ ├── protocol.md # What, why, and prediction
│ ├── code/ # Experiment-specific code
│ ├── results/ # Raw outputs, metrics
│ └── analysis.md # What we learned
├── to_human/ # Progress presentations
└── paper/ # Final paper (optional)
```
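The layout above can be scaffolded with a few commands; the project name here is a placeholder, and per-hypothesis directories under `experiments/` are created later as hypotheses are formed:

```shell
proj="my-research"   # placeholder project name
# Create the directory skeleton
mkdir -p "$proj/literature" "$proj/src" "$proj/data" \
         "$proj/experiments" "$proj/to_human" "$proj/paper"
# Seed the three tracking files
touch "$proj/research-state.yaml" "$proj/research-log.md" "$proj/findings.md"
```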
## Research Discipline
### Lock Before You Run
Always commit your protocol to git BEFORE executing:
```bash
git add experiments/H001-protocol.md
git commit -m "research(protocol): H001 — cosine warmup improves convergence"
# THEN run the experiment
```
This creates temporal proof your plan existed before results.
### Confirmatory vs Exploratory
| Type | Definition | Trust Level |
|------|------------|-------------|
| **Confirmatory** | Matches your locked protocol | High |
| **Exploratory** | Discovered during execution | Medium — needs replication |
### Negative Results Are Progress
A refuted hypothesis tells you something. Log what it rules out and what it suggests.
## Commands
| Command | Description |
|---------|-------------|
| `/autoresearch <question>` | Initialize and start research project |
| `/research-status` | Show current state and progress |
| `/research-pause` | Pause autonomous loops |
| `/research-resume` | Resume autonomous loops |
| `/research-report` | Generate progress presentation |
| `/research-conclude` | Finalize and write paper |
## Configuration
Add to `~/.hermes/config.yaml`:
```yaml
autoresearch:
  loop_interval_minutes: 20        # How often to check progress
  max_iterations: 10               # Experiments before forced reflection
  auto_commit: true                # Auto-commit milestones
  default_workspace: "./research"  # Where to create projects
```
## Integration with Other Skills
| Research Phase | Skills to Invoke |
|----------------|------------------|
| Literature search | `arxiv`, `web-search`, `notebooklm` |
| Data preparation | `data-science` tools |
| Model training | `mlops`, domain-specific skills |
| Evaluation | `evaluating-llms-harness`, custom evals |
| Paper writing | `research-paper-writing` |
| Progress reports | Built-in report generation |
## Example: LoRA Rank Study
```
User: /autoresearch "Does LoRA rank affect convergence speed on small datasets?"
Agent:
1. Bootstraps: Searches arxiv for LoRA papers
2. Forms hypotheses: H1 (rank 4), H2 (rank 8), H3 (rank 16)
3. Inner loop: Trains 3 models, records convergence steps
4. Outer loop: Notices rank 8 converges fastest
5. Deepens: Tests rank 6, 10, 12
6. Concludes: Generates report with trajectory plot
```
## Best Practices
1. **Start simple**: First experiment should run in <30 minutes
2. **Define metrics upfront**: Lock evaluation criteria before running
3. **Return to literature**: When stuck or surprised, search papers
4. **Commit frequently**: Git history is your research log
5. **Show your work**: Generate progress reports for human review
6. **Never idle**: If blocked, diagnose, fix, or pivot — but keep moving
## References
- Inspired by Andrej Karpathy's autoresearch methodology
- Compatible with agentskills.io open standard
- Built-in templates from `templates/` directory
## See Also
- `templates/research-state.yaml` — State tracking template
- `templates/findings.md` — Synthesis template
- `templates/research-log.md` — Decision log template
- `examples/` — Example research projects


@ -0,0 +1,34 @@
# Autoresearch Examples
This directory contains example research projects using the autoresearch methodology.
## Available Examples
### `lora-rank-study.md`
**Question:** Does LoRA rank affect convergence speed on small datasets?
**Type:** Benchmark optimization, hyperparameter study
**Skills Used:**
- `arxiv` — Literature search
- `mlops` — Model training
- `tensorboard` — Experiment tracking
**Key Takeaway:** Higher rank improves convergence speed up to a point (r=16), after which returns diminish.
---
## Creating Your Own Research
1. Start with `/autoresearch "your question"`
2. Follow the two-loop architecture
3. Commit protocols before running
4. Generate progress reports with `/research-report`
## Tips from Examples
- **Start small:** First experiment should complete in <30 minutes
- **Define metrics upfront:** Know what you're measuring before you start
- **Document surprises:** Negative results are progress too
- **Show your work:** Progress reports help humans follow along


@ -0,0 +1,74 @@
# LoRA Rank Convergence Study
**Research Question:** Does LoRA rank affect convergence speed on small datasets?
## Bootstrap
### Literature
Key papers:
- Hu et al. (2021) — LoRA: Low-Rank Adaptation of Large Language Models
- Valipour et al. (2023) — DyLoRA: Parameter-Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Gap: Most papers focus on final performance, not convergence dynamics.
### Hypotheses
- **H1:** Higher rank (r=16) converges faster but may overfit on small data
- **H2:** Lower rank (r=4) converges slower but generalizes better
- **H3:** There's an optimal rank (r=8) that balances speed and generalization
## Experiments
### H001 — Baseline (r=8)
```bash
# Protocol: Train with rank 8, measure convergence steps to 90% of max accuracy
# Prediction: Baseline behavior, ~50 steps to converge
```
**Results:**
- Convergence steps: 47
- Final accuracy: 0.892
- Wall time: 12 min
### H002 — Low Rank (r=4)
**Results:**
- Convergence steps: 68 (+45% vs baseline)
- Final accuracy: 0.887 (-0.6%)
### H003 — High Rank (r=16)
**Results:**
- Convergence steps: 41 (-13% vs baseline)
- Final accuracy: 0.894 (+0.2%)
## Outer Loop #1
**Pattern:** Higher rank → faster convergence, minimal overfit on this dataset
**Decision:** DEEPEN — Test r=32 and r=64 to find saturation point
### H004 — Very High Rank (r=32)
**Results:**
- Convergence steps: 38 (-7% vs r=16)
- Final accuracy: 0.891 (-0.3%)
- **Diminishing returns observed**
### H005 — Optimal Search (r=6, r=10, r=12)
[Running...]
## Current Findings
1. Convergence speed improves with rank up to r=16, then plateaus
2. Final accuracy relatively stable across ranks (±0.5%)
3. For small datasets, r=8-12 appears optimal (speed vs compute tradeoff)
## Next Steps
- Complete H005-H007
- Test on different dataset sizes (generalization)
- Write up findings


@ -0,0 +1,93 @@
# Findings: {{PROJECT_NAME}}
**Research Question:** {{RESEARCH_QUESTION}}
**Last Updated:** {{LAST_UPDATED}}
---
## Current Understanding
### What We Know So Far
[Summarize the current state of knowledge. 2-4 paragraphs.]
### Key Patterns and Insights
| Pattern | Evidence | Confidence |
|---------|----------|------------|
| [Pattern 1] | [Which experiments support this] | High/Medium/Low |
| [Pattern 2] | [Which experiments support this] | High/Medium/Low |
### Mechanistic Understanding
[If applicable: What mechanisms explain the results?]
---
## Lessons and Constraints
### What Works
- [Specific finding with context]
- [Another finding]
### What Doesn't Work
- [Failed approach and why]
- [Constraint discovered]
### Critical Parameters
| Parameter | Sweet Spot | Why |
|-----------|------------|-----|
| [Param 1] | [Value/range] | [Explanation] |
---
## Open Questions
### High Priority
1. [Question that would change the story if answered]
2. [Another critical question]
### Medium Priority
1. [Nice to know but not blocking]
### Answered
1. ~~[Question]~~ → Answer: [Brief answer with evidence]
---
## Narrative Arc
[For paper writing: What's the story? What would the abstract say?]
### Contribution Sketch
[1-2 sentences on what this research contributes]
### Implications
[Who cares? Why does this matter?]
---
## Next Steps
### Immediate (Next 1-3 Experiments)
- [ ] [Specific experiment]
- [ ] [Another experiment]
### Medium Term
- [ ] [Broader direction]
### If Current Direction Fails
- [Pivot option 1]
- [Pivot option 2]


@ -0,0 +1,90 @@
# Research Log: {{PROJECT_NAME}}
**Research Question:** {{RESEARCH_QUESTION}}
---
## Bootstrap Phase
### {{DATE}} — Project Initialization
- **Action:** Created workspace, initialized state files
- **Research Question:** {{RESEARCH_QUESTION}}
- **Initial Thoughts:** [What makes this interesting?]
### {{DATE}} — Literature Search
- **Sources:** arxiv, semantic scholar, web search
- **Key Papers:**
- [Paper 1] — [Key finding relevant to question]
- [Paper 2] — [Key finding]
- **Gap Identified:** [What's missing in existing work?]
### {{DATE}} — Hypothesis Formation
- **H001:** [Description] → Prediction: [Specific prediction]
- **H002:** [Description] → Prediction: [Specific prediction]
- **H003:** [Description] → Prediction: [Specific prediction]
---
## Inner Loop Log
### {{DATE}} — Experiment H001
- **Hypothesis:** H001
- **Protocol:** [What was changed, what was predicted]
- **Git Commit:** `research(protocol): H001 — [description]`
- **Status:** [Running/Completed/Failed]
- **Results:**
- Metric: [Value]
- Baseline: [Value]
- Delta: [+/-X]
- **Interpretation:** [What this means]
- **Next Action:** [Continue/Adjust/Pivot]
### {{DATE}} — Experiment H002
[Same format...]
---
## Outer Loop Log
### {{DATE}} — Reflection #1 (After {{N}} experiments)
- **Experiments Reviewed:** H001-H00N
- **Patterns Observed:**
- [Pattern 1]
- [Pattern 2]
- **Updated Understanding:** [New insights]
- **Direction Decision:** [DEEPEN/BROADEN/PIVOT/CONCLUDE]
- **Rationale:** [Why this direction?]
- **New Hypotheses:**
- H00N+1: [Description]
- H00N+2: [Description]
---
## Direction Changes
### {{DATE}} — PIVOT: [New Direction]
- **From:** [Old direction/assumption]
- **To:** [New direction]
- **Trigger:** [What result/surprise caused this?]
- **New Research Question:** [If changed]
---
## Conclusion
### {{DATE}} — Research Concluded
- **Final Status:** [Completed/Partial/Abandoned]
- **Key Findings:**
1. [Finding 1]
2. [Finding 2]
- **Contribution:** [What this adds to the field]
- **Limitations:** [What we didn't test/couldn't conclude]
- **Future Work:** [What someone should do next]


@ -0,0 +1,70 @@
# Research State
# Central tracking file for autoresearch project
# Updated automatically by the agent — do not edit manually
project:
  name: "{{PROJECT_NAME}}"
  created_at: "{{CREATED_AT}}"
  research_question: "{{RESEARCH_QUESTION}}"
  status: "bootstrapping"  # bootstrapping | active | paused | concluding | completed

bootstrap:
  literature_searched: false
  initial_hypotheses_formed: false
  evaluation_metric_defined: false
  baseline_established: false

loops:
  inner_loop_count: 0
  outer_loop_count: 0
  last_inner_loop_at: null
  last_outer_loop_at: null

direction:
  current: "explore"  # explore | deepen | broaden | pivot | conclude
  rationale: "Initial exploration phase"
  next_milestone: "Complete first 3 experiments"

hypotheses:
  # Example structure — replace with actual hypotheses
  H001:
    description: "{{HYPOTHESIS_DESCRIPTION}}"
    status: "untested"  # untested | running | completed | refuted | supported
    prediction: "{{PREDICTION}}"
    priority: 1
    created_at: "{{CREATED_AT}}"
    completed_at: null
    result_summary: null
    experiment_slug: null

metrics:
  primary: "{{PRIMARY_METRIC}}"  # e.g., "val_loss", "accuracy", "convergence_steps"
  baseline_value: null
  target_value: null
  current_best: null
  optimization_direction: "minimize"  # minimize | maximize

trajectory:
  # Auto-populated from experiments
  # - experiment_id: run_001
  #   hypothesis: H001
  #   metric_value: 0.847
  #   baseline: 0.812
  #   delta: "+0.035"
  #   wall_time_min: 23
  #   change_summary: "Added cosine annealing"

resources:
  literature_papers: []
  related_work_notes: null
  code_references: []

continuity:
  last_session_at: "{{CREATED_AT}}"
  next_scheduled_loop: null
  current_experiment: null
  pending_tasks: []

notes: |
  # Agent notes — context for next session
  # What was I doing? What's next?