Thin wrapper around Imbue's darwinian_evolver (AGPL-3.0, subprocess-only). Ships a working OpenRouter driver (parrot_openrouter.py), a snapshot inspector (show_snapshot.py), and a custom-problem template. SKILL.md has a 58-char description and Pitfalls sourced from actually running the loop: non-viable seed trap, Azure content filter killing runs, loop.run() being a generator, nested-pickle snapshots, and aggressive default concurrency.

Salvaged from #12719 by @Bihruze. The original PR shipped 12,289 LOC across 61 files (29 Python modules, FastAPI dashboard, VS Code extension, benchmark hub, marketplace, etc.), which was far beyond the scope of the underlying issue (#336). This version stays at the ~700-LOC scope that the issue actually asked for. Authorship of the original effort is credited via an AUTHOR_MAP entry and the SKILL.md author field.

Verified end-to-end: seed 'Say {{ phrase }}' (score 0.000) evolved into 'Please repeat the following phrase exactly as it is, without any modifications or additional formatting: {{ phrase }}' (score 0.750) across 3 iterations on gpt-4o-mini via OpenRouter.

Co-authored-by: Bihruze <98262967+Bihruze@users.noreply.github.com>
| name | description | version | author | license | platforms | metadata |
|---|---|---|---|---|---|---|
| darwinian-evolver | Evolve prompts/regex/SQL/code with Imbue's evolution loop. | 0.1.0 | Bihruze (Asahi0x), Hermes Agent | MIT | | |
Darwinian Evolver
Run Imbue's darwinian_evolver — an LLM-driven evolutionary search loop — to optimize a prompt, regex, SQL query, or small code snippet against a fitness function.
Status: thin wrapper around the upstream tool. The skill installs it, walks the
agent through writing a Problem definition (organism + evaluator + mutator),
and drives the loop via the upstream CLI or a small custom Python driver.
License: the upstream tool is AGPL-3.0. The skill ONLY ever invokes it
via the upstream CLI or a subprocess/uv run call (mere aggregation). Do NOT
import upstream classes into Hermes itself.
When to Use
- User says "optimize this prompt", "evolve a regex for X", "auto-improve this code/SQL", "search for a better instruction".
- You have a scorer (exact match, regex pass-rate, unit test, LLM-judge, runtime metric) AND a starting candidate (organism). If you don't have a scorer, stop and define one first — that's the hard part.
- Cost is OK: a typical run is 50–500 LLM calls. On gpt-4o-mini that's pennies; on Claude Sonnet it can be a few dollars.
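Since the scorer is the prerequisite everything else depends on, here is a minimal sketch of one: an exact-match pass rate that maps a candidate to a score in [0, 1]. The `pass_rate` helper and the test cases are hypothetical and not part of the skill or the upstream tool.

```python
from typing import Callable, Sequence


def pass_rate(candidate: Callable[[str], str],
              cases: Sequence[tuple[str, str]]) -> float:
    """Fraction of (input, expected) cases the candidate gets exactly right."""
    if not cases:
        raise ValueError("need at least one test case to score against")
    passed = 0
    for given, expected in cases:
        try:
            if candidate(given) == expected:
                passed += 1
        except Exception:
            # A crashing candidate simply fails that case.
            pass
    return passed / len(cases)


# Hypothetical candidate: upper-cases its input. The third case expects
# lowercase output, so the candidate fails exactly one of four cases.
cases = [("a", "A"), ("b", "B"), ("c", "c"), ("d", "D")]
score = pass_rate(str.upper, cases)
print(score)  # 0.75
```

If you cannot write something this small for your target, the search has nothing to climb.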
Do not use this when:
- The optimization target is differentiable (use gradient descent / DSPy).
- You only need to try 2–3 variants — just write them by hand.
- The fitness signal is purely subjective with no measurable criterion.
Prerequisites
- Python ≥3.11
- `git`, `uv` (or `pip`)
- One of: `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, or `OPENAI_API_KEY`
The skill ships a small parrot_openrouter.py driver that uses OPENROUTER_API_KEY
via the OpenAI SDK, so any model on OpenRouter works. The upstream CLI itself
hardcodes Anthropic and needs ANTHROPIC_API_KEY.
Install (One-Time)
Run via the terminal tool:
mkdir -p ~/.hermes/cache/darwinian-evolver && cd ~/.hermes/cache/darwinian-evolver
[ -d darwinian_evolver ] || git clone --depth 1 https://github.com/imbue-ai/darwinian_evolver.git
cd darwinian_evolver && uv sync
Verify:
cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver \
&& uv run darwinian_evolver --help | head -5
Quick Start — The Built-In Parrot Example
Tiny smoke test (requires ANTHROPIC_API_KEY):
cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver
uv run darwinian_evolver parrot \
--num_iterations 2 \
--num_parents_per_iteration 2 \
--mutator_concurrency 2 --evaluator_concurrency 2 \
--output_dir /tmp/parrot_demo
Outputs:
- `/tmp/parrot_demo/snapshots/iteration_N.pkl`: pickled population per iteration
- `/tmp/parrot_demo/<jsonl>`: per-iteration JSON log (path printed at end)
Open ~/.hermes/cache/darwinian-evolver/darwinian_evolver/darwinian_evolver/lineage_visualizer.html
in a browser and load the JSON log to see the evolutionary tree.
Quick Start — OpenRouter Driver (No Anthropic Key)
The skill ships scripts/parrot_openrouter.py — same parrot problem, but the
LLM call goes through OpenRouter so any provider works.
# From wherever the skill is installed:
SKILL_DIR=~/.hermes/skills/research/darwinian-evolver
DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver
cd "$DE_DIR" && \
EVOLVER_MODEL='openai/gpt-4o-mini' \
uv run --with openai python "$SKILL_DIR/scripts/parrot_openrouter.py" \
--num_iterations 3 --num_parents_per_iteration 2 \
--output_dir /tmp/parrot_or
Inspect the result with scripts/show_snapshot.py:
uv run --with openai python "$SKILL_DIR/scripts/show_snapshot.py" \
/tmp/parrot_or/snapshots/iteration_3.pkl
Expected output: 7 evolved prompt templates ranked by score, with the best
landing around 0.6–0.8 (the seed Say {{ phrase }} scored 0.000).
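For reading a snapshot by hand, the Pitfalls section describes the file as a dict whose `population_snapshot` value is itself pickled bytes. The sketch below shows that two-step load on a stand-in file built in memory. Only the `population_snapshot` key comes from this document; the `load_population` helper and the contents of the inner object are assumptions, and real snapshots hold `Organism` instances whose class must be importable.

```python
import pickle
import tempfile
from pathlib import Path


def load_population(snapshot_path: Path):
    """Unpickle the outer dict, then unpickle the inner population bytes."""
    with snapshot_path.open("rb") as fh:
        outer = pickle.load(fh)
    return pickle.loads(outer["population_snapshot"])


# Stand-in snapshot: the inner payload here is a plain list of dicts,
# which is NOT the upstream layout, only an illustration of the nesting.
fake_population = [{"prompt_template": "Say {{ phrase }}", "score": 0.0}]
outer = {"population_snapshot": pickle.dumps(fake_population)}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "iteration_0.pkl"
    path.write_bytes(pickle.dumps(outer))
    restored = load_population(path)

print(restored[0]["prompt_template"])  # Say {{ phrase }}
```

Unpickling executes code from the file, so only load snapshots that your own runs produced.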
Defining a Custom Problem
The skill ships templates/custom_problem_template.py — copy, edit, run.
Three things you must define:
- `Organism`: a Pydantic `BaseModel` subclass holding the artifact being evolved (`prompt_template: str`, `regex_pattern: str`, `sql_query: str`, `code_block: str`, etc.). Add a `run(*args)` method that exercises it.
- `Evaluator`: `.evaluate(organism) -> EvaluationResult(score=..., trainable_failure_cases=[...], holdout_failure_cases=[...], is_viable=True)`.
  - `score` is in `[0, 1]`. Higher is better.
  - `trainable_failure_cases`: what the mutator sees. Include enough context (input, expected, actual) for the LLM to diagnose.
  - `holdout_failure_cases`: kept out of the mutator's view. Use these to detect overfitting.
  - `is_viable=True` unless the organism is completely broken (raises, returns None, etc.). A 0-score viable organism is fine; it just gets down-weighted in parent selection.
- `Mutator`: `.mutate(organism, failure_cases, learning_log_entries) -> list[Organism]`. Typically: build an LLM prompt that includes the current organism, a failure case, and an ask to propose a fix; parse the LLM's response; return a new `Organism`. Return `[]` on parse failure; the loop handles it.
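To make the organism and evaluator shapes concrete, here is a minimal sketch for a regex problem. It deliberately uses local dataclass stand-ins rather than the upstream classes (a real problem would subclass the upstream types inside a user-side driver), and the field names are taken from the list above. Scoring over the train and holdout cases together is an assumption of this sketch, not something the upstream tool prescribes.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EvaluationResult:  # stand-in mirroring the fields named above
    score: float
    trainable_failure_cases: list = field(default_factory=list)
    holdout_failure_cases: list = field(default_factory=list)
    is_viable: bool = True


@dataclass
class RegexOrganism:  # stand-in for a Pydantic BaseModel subclass
    regex_pattern: str

    def run(self, text: str) -> bool:
        return re.fullmatch(self.regex_pattern, text) is not None


class RegexEvaluator:
    def __init__(self, train, holdout):
        self.train = train      # failures from these are shown to the mutator
        self.holdout = holdout  # failures from these are kept hidden

    def _failures(self, organism, cases):
        failures = []
        for text, expected in cases:
            actual = organism.run(text)
            if actual != expected:
                failures.append(
                    {"input": text, "expected": expected, "actual": actual}
                )
        return failures

    def evaluate(self, organism) -> EvaluationResult:
        try:
            re.compile(organism.regex_pattern)
        except re.error:
            # Completely broken organism: mark it non-viable.
            return EvaluationResult(score=0.0, is_viable=False)
        train_fail = self._failures(organism, self.train)
        holdout_fail = self._failures(organism, self.holdout)
        total = len(self.train) + len(self.holdout)
        score = 1.0 - (len(train_fail) + len(holdout_fail)) / total
        return EvaluationResult(score, train_fail, holdout_fail, True)


evaluator = RegexEvaluator(
    train=[("2024-01-31", True), ("2024-1-31", False), ("hello", False)],
    holdout=[("1999-12-01", True)],
)
result = evaluator.evaluate(RegexOrganism(r"\d{4}-\d+-\d{2}"))
print(result.score)                    # 0.75
print(result.trainable_failure_cases)  # the "2024-1-31" false positive
```

Each failure case carries input, expected, and actual, which is the context a mutator prompt needs to propose a targeted fix.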
Then write a driver script that wires Problem(initial_organism, evaluator, [mutators])
into EvolveProblemLoop and iterates over loop.run(num_iterations=N) — the
shipped scripts/parrot_openrouter.py is the reference.
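Because the driver has to iterate over `loop.run(...)`, it helps to see why calling it alone does nothing. The sketch below uses a hypothetical stand-in generator named `run`, not the upstream `EvolveProblemLoop`, to show that no work happens until the result is consumed.

```python
from typing import Iterator

calls: list[int] = []


def run(num_iterations: int) -> Iterator[int]:
    """Stand-in generator: yields one snapshot id per iteration."""
    for i in range(1, num_iterations + 1):
        calls.append(i)  # pretend this is an expensive mutate + evaluate step
        yield i


pending = run(num_iterations=3)   # creates the generator, runs nothing
ran_before_iterating = list(calls)

snapshots = [snap for snap in pending]  # iterating is what drives the work

print(ran_before_iterating)  # []
print(calls)                 # [1, 2, 3]
```

A driver that calls the run method and discards the result exits immediately with no snapshots written, which is easy to mistake for a fast successful run.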
Hyperparameters That Actually Matter
| flag | default | when to change |
|---|---|---|
| `--num_iterations` | 5 | bump to 10–20 once you trust the evaluator |
| `--num_parents_per_iteration` | 4 | drop to 2 for cheap exploration |
| `--mutator_concurrency` | 10 | drop to 2–4 to avoid rate limits |
| `--evaluator_concurrency` | 10 | same; evaluator hits the LLM too |
| `--batch_size` | 1 | raise to 3–5 once your mutator handles multiple failures |
| `--verify_mutations` | off | turn on once mutator is wasteful (>10× cost saving on later runs per Imbue) |
| `--midpoint_score` | p75 | leave alone unless scores cluster |
| `--sharpness` | 10 | leave alone |
Pitfalls
- Initial organism must be viable: set `is_viable=True` in your `EvaluationResult` even on a 0-score seed. The loop refuses non-viable organisms because they imply the loop has nothing to evolve from.
- Provider content filters kill runs. Azure-backed OpenRouter models reject phrases like "ignore previous instructions" with HTTP 400. Wrap the LLM call in `try/except` and return `f"<LLM_ERROR: {e}>"`; the evolver will just score that organism 0 and move on.
- `loop.run()` is a generator: calling it doesn't run anything until you iterate. Use `for snap in loop.run(num_iterations=N):`.
- Snapshots are nested pickles. `iteration_N.pkl` contains a dict with `population_snapshot` (more pickled bytes). To unpickle you must have the `Organism` class importable under the same dotted path it was pickled at.
- Concurrency defaults are aggressive. 10/10 will hit rate limits on most providers. Start with 2/2.
- CLI is hardcoded to Anthropic. `uv run darwinian_evolver <problem>` reaches for `ANTHROPIC_API_KEY` and uses Claude Sonnet. To use any other provider, write a driver like `parrot_openrouter.py`.
- AGPL. Never `from darwinian_evolver import ...` inside Hermes core. Custom driver scripts under `~/.hermes/skills/...` are user-side and fine.
- No PyPI package. `pip install darwinian-evolver` will pull the wrong thing. Always install from the GitHub repo.
Verification
After install + a parrot run, exit code 0 from this is sufficient:
DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver
ls "$DE_DIR/darwinian_evolver/lineage_visualizer.html" >/dev/null && \
cd "$DE_DIR" && uv run darwinian_evolver --help >/dev/null && \
echo "darwinian-evolver: OK"