Evolution Convergence
Mutation Effectiveness
Best Prompt: Detailed Metrics
Prompt Leaderboard
| Rank | Generation | Composite Score | Prompt Preview | Tags |
|---|---|---|---|---|
| 1 | Gen 9 | | "Think step by step and analyze: Explain ML... Explain your reasoning." | CoT, mutated |
| 2 | Gen 8 | | "You are an expert assistant. Step 1: Understand. Step 2: Apply..." | structured, elite |
| 3 | Gen 10 | | "### Task Explain ML... ### Instructions - Be accurate..." | structured, crossover |
| 4 | Gen 7 | | "As a knowledgeable AI, provide an expert response to: Explain ML..." | template |
| 5 | Gen 5 | | "Let's think through this carefully... Question: Explain ML..." | few-shot, mutated |
Evolution Pipeline
[Pipeline diagram; step titles were not captured. Surviving step captions, in order: "of diverse prompts", "through the model", "reasoning, coherence", "selection", "crossover operators".]
Evolutionary Prompt Optimization: Documentation
What is Evolutionary Prompt Optimization?
Evolutionary Prompt Optimization (EPO) applies Genetic Algorithm principles to the problem of prompt engineering for LLMs. A population of prompt candidates is maintained across generations. Each candidate is scored using a fitness function that measures output quality. High-scoring prompts reproduce (are selected as parents), and new offspring are created through mutation and crossover. Over generations, the population converges toward prompts that consistently elicit high-quality responses.
Fitness Function
The composite fitness score combines three metrics: accuracy × 0.40 + reasoning_quality × 0.35 + coherence × 0.25. Accuracy measures keyword overlap with expected answers. Reasoning quality scores the use of logical connectives, structured steps, and response depth. Coherence evaluates transition word usage, sentence variety, and paragraph structure. All three scores are in [0, 1], and because the weights sum to 1, the composite score is also in [0, 1].
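The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names `composite_score` and `keyword_accuracy` are hypothetical, and the keyword-overlap rule (fraction of expected keywords present in the response) is an assumed reading of "keyword overlap".

```python
# Weights taken from the formula in the text; they sum to 1.0.
WEIGHTS = {"accuracy": 0.40, "reasoning_quality": 0.35, "coherence": 0.25}


def composite_score(accuracy: float, reasoning_quality: float, coherence: float) -> float:
    """Weighted sum of the three metrics, each expected to lie in [0, 1]."""
    metrics = {
        "accuracy": accuracy,
        "reasoning_quality": reasoning_quality,
        "coherence": coherence,
    }
    for name, value in metrics.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in metrics.items())


def keyword_accuracy(response: str, expected_keywords: list[str]) -> float:
    """Assumed accuracy metric: fraction of expected keywords found in the response."""
    if not expected_keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in text)
    return hits / len(expected_keywords)
```

For example, metric values of 0.8, 0.6, and 0.4 combine to 0.32 + 0.21 + 0.10 = 0.63.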
Mutation Operators
Five mutation operators are applied stochastically: prefix_injection prepends instruction prefixes; suffix_modification appends reasoning directives; instruction_paraphrase replaces instruction verbs with synonyms; structure_mutation toggles markdown formatting; and temperature_word_swap randomly substitutes words from a paraphrase pool. In addition, the crossover operator recombines sentence halves from two parents to create two offspring.
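As a rough illustration of how such operators might look, the sketch below implements two of the five mutations and the crossover step. The prefix and suffix pools, the sentence-splitting rule, and the midpoint cut are all assumptions made for the example; the source does not specify them.

```python
import random
import re

# Hypothetical pools; the real operator vocabularies are not given in the text.
PREFIXES = ["Think step by step.", "You are an expert assistant."]
SUFFIXES = ["Explain your reasoning.", "Be accurate and concise."]


def prefix_injection(prompt: str, rng: random.Random) -> str:
    """Prepend a randomly chosen instruction prefix."""
    return f"{rng.choice(PREFIXES)} {prompt}"


def suffix_modification(prompt: str, rng: random.Random) -> str:
    """Append a randomly chosen reasoning directive."""
    return f"{prompt} {rng.choice(SUFFIXES)}"


def mutate(prompt: str, rate: float, rng: random.Random) -> str:
    """Apply one randomly chosen operator with probability `rate`."""
    if rng.random() >= rate:
        return prompt
    operator = rng.choice([prefix_injection, suffix_modification])
    return operator(prompt, rng)


def split_sentences(prompt: str) -> list[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [part for part in re.split(r"(?<=[.!?])\s+", prompt.strip()) if part]


def crossover(parent_a: str, parent_b: str) -> tuple[str, str]:
    """Swap sentence halves between two parents, yielding two offspring."""
    a, b = split_sentences(parent_a), split_sentences(parent_b)
    mid_a, mid_b = len(a) // 2, len(b) // 2
    child_1 = " ".join(a[:mid_a] + b[mid_b:])
    child_2 = " ".join(b[:mid_b] + a[mid_a:])
    return child_1, child_2
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing operator effectiveness across experiments.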
Selection Strategy
Elitist selection preserves the top elite_fraction (default 25%) of prompts unchanged each generation, preventing fitness regression. The remaining slots are filled via tournament selection: k candidates are sampled randomly and the best is chosen. This balances exploitation of high-fitness prompts with exploration of the broader population.
Adaptive Mutation Rate
The mutation rate decays linearly with the generation index using the formula rate = 0.7 - 0.6 × (gen / max_gen), which gives 0.7 at gen = 0 and 0.1 at gen = max_gen. This follows the exploration-exploitation trade-off: high mutation early explores diverse prompt structures, while low mutation late refines the best candidates found.
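The decay schedule is simple enough to write directly. In this sketch the function name `mutation_rate` is hypothetical, and clamping the progress ratio to [0, 1] is an added safeguard that the text does not mention.

```python
def mutation_rate(gen: int, max_gen: int, start: float = 0.7, end: float = 0.1) -> float:
    """Linear decay: start - (start - end) * (gen / max_gen)."""
    if max_gen <= 0:
        raise ValueError("max_gen must be positive")
    # Clamp so out-of-range generation indices cannot push the rate outside [end, start].
    progress = min(max(gen / max_gen, 0.0), 1.0)
    return start - (start - end) * progress


if __name__ == "__main__":
    for gen in range(0, 11, 2):
        print(gen, round(mutation_rate(gen, 10), 2))
```

With `max_gen = 10`, the rate at generation 5 is 0.7 - 0.6 × 0.5 = 0.4, halfway between the two endpoints.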
Convergence
The system tracks per-generation best, average, and worst scores. Convergence is reached when best scores plateau across consecutive generations. The RankingSystem stores the all-time best prompt and can export the full evolution history as JSON for analysis or visualization.
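One possible way to detect a plateau and export the history is sketched below. None of this reflects the actual RankingSystem interface: the helper names, the `patience` and `tol` parameters, and the JSON field names are assumptions chosen for illustration.

```python
import json


def summarize_generation(gen: int, scores: list[float]) -> dict:
    """Per-generation best, average, and worst composite scores."""
    return {
        "generation": gen,
        "best": max(scores),
        "average": sum(scores) / len(scores),
        "worst": min(scores),
    }


def has_converged(best_scores: list[float], patience: int = 3, tol: float = 1e-3) -> bool:
    """True if the best score improved by less than `tol` over the last `patience` generations."""
    if len(best_scores) <= patience:
        return False
    return best_scores[-1] - best_scores[-1 - patience] < tol


def export_history(history: list[dict]) -> str:
    """Serialize the evolution history to a JSON string."""
    return json.dumps(history, indent=2)
```

A caller would append one `summarize_generation` record per generation, pass the list of `best` values to `has_converged` to decide whether to stop early, and write the `export_history` output to a file for later plotting.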