DeepEval Guide — Part 17: Prompts & Prompt Optimization
The Prompt Class
A Prompt in DeepEval is a first-class object that holds your prompt
template, model settings, and output configuration. It enables versioning,
evaluation tracking, and automated optimization.
Creating Prompts
from deepeval.prompt import Prompt, PromptMessage
# Text-based prompt
prompt_text = Prompt(
    alias="My Prompt",
    text_template="You are a helpful assistant. Answer: {input}",
)
# Message-based prompt (chat format)
prompt_msgs = Prompt(
    alias="Chat Prompt",
    messages_template=[
        PromptMessage(role="system", content="You are a helpful assistant."),
    ],
)
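The `{input}` placeholder in `text_template` uses Python format-string syntax, which `prompt.interpolate(...)` fills in at call time (as shown later in the optimization example). A minimal stand-in using plain `str.format` illustrates the substitution (the `render` helper is hypothetical, not a DeepEval API):

```python
# Stand-in for Prompt.interpolate(): templates use Python
# format-string placeholders that are filled at call time.
template = "You are a helpful assistant. Answer: {input}"

def render(template: str, **values: str) -> str:
    """Substitute {placeholder} slots with the given values."""
    return template.format(**values)

print(render(template, input="What is RAG?"))
# → You are a helpful assistant. Answer: What is RAG?
```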
Loading Prompts
from deepeval.prompt import Prompt
# From Confident AI cloud
prompt = Prompt(alias="My Prompt")
prompt.pull(version="00.00.01")
# From local JSON or TXT file
prompt = Prompt()
prompt.load(file_path="prompt.json")
Model & Output Settings
from deepeval.prompt import Prompt, ModelSettings, ModelProvider, OutputType
prompt = Prompt(
    alias="Structured Prompt",
    text_template="Answer: {input}",
    model_settings=ModelSettings(
        provider=ModelProvider.OPEN_AI,
        name="gpt-4o",
        temperature=0.7,
        max_tokens=500,
    ),
    output_type=OutputType.SCHEMA,
    output_schema=MyPydanticModel,  # any BaseModel subclass
)
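`MyPydanticModel` above stands for any Pydantic `BaseModel` subclass; its schema is what constrains the structured output. A hypothetical schema for a Q&A response (the field names are illustrative, not part of DeepEval) might look like:

```python
from pydantic import BaseModel

# Hypothetical output schema: field names are illustrative only.
class MyPydanticModel(BaseModel):
    answer: str
    confidence: float

# Instances validate against the schema on construction.
result = MyPydanticModel(answer="RAG combines retrieval with generation.", confidence=0.9)
print(result.answer, result.confidence)
```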
Evaluating Prompts
Pass the prompt in the hyperparameters argument so results are tracked against the prompt version that produced them:
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
evaluate(
    test_cases=[LLMTestCase(input="...", actual_output="...")],
    metrics=[AnswerRelevancyMetric()],
    hyperparameters={"prompt": prompt},
)
For component-level evaluation, use update_llm_span(prompt=prompt)
inside an @observe(type="llm") span (see Part 12).
Prompt Optimization
DeepEval's PromptOptimizer automatically rewrites prompts using
metric-driven feedback. Instead of manually tweaking prompts, the
optimizer evaluates candidates against your goldens and metrics.
Quick Start
from deepeval.dataset import Golden
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.prompt import Prompt
from deepeval.optimizer import PromptOptimizer
prompt = Prompt(text_template="Respond to the query: {input}")

async def model_callback(prompt: Prompt, golden: Golden) -> str:
    text = prompt.interpolate(input=golden.input)
    return await your_llm(text)  # your_llm: your own async LLM call

optimizer = PromptOptimizer(
    metrics=[AnswerRelevancyMetric()],
    model_callback=model_callback,
)
optimized = optimizer.optimize(
    prompt=prompt,
    goldens=[Golden(input="What is RAG?", expected_output="...")],
)
print(optimized.text_template)
Three Algorithms
1. GEPA (Genetic-Pareto) — Default
Evolutionary optimization with Pareto selection. Maintains a diverse pool of candidate prompts and uses metric feedback for targeted mutations.
from deepeval.optimizer.algorithms import GEPA
optimizer = PromptOptimizer(
    algorithm=GEPA(iterations=10, pareto_size=5, minibatch_size=8),
    metrics=[...],
    model_callback=model_callback,
)
Steps: split goldens → select parent from Pareto frontier → collect metric feedback → LLM rewrites prompt → accept if improved → final selection by aggregate score.
Best for: Diverse problem types where no single prompt excels.
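The accept-if-improved core of the loop can be sketched as a toy hill-climb in pure Python. This is not DeepEval's implementation: `score` stands in for metric evaluation, `mutate` for the feedback-driven LLM rewrite, and the real GEPA tracks a Pareto frontier across goldens rather than a single scalar.

```python
import random

def optimize(prompt: str, score, mutate, iterations: int = 10, seed: int = 0):
    """Toy hill-climb: keep a candidate prompt only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = prompt, score(prompt)
    for _ in range(iterations):
        child = mutate(best, rng)      # stands in for the LLM rewrite
        child_score = score(child)     # stands in for metric feedback
        if child_score > best_score:   # accept only if improved
            best, best_score = child, child_score
    return best, best_score

# Placeholder objective: score rewards longer prompts, mutation appends detail.
best, s = optimize(
    "Respond to the query: {input}",
    score=lambda p: len(p),
    mutate=lambda p, rng: p + rng.choice([" Be concise.", " Cite sources."]),
)
print(best, s)
```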
2. MIPROv2 (Bayesian Optimization)
Jointly optimizes instructions AND few-shot demonstrations using Bayesian Optimization (Optuna TPE sampler).
from deepeval.optimizer.algorithms import MIPROV2
optimizer = PromptOptimizer(
    algorithm=MIPROV2(num_candidates=10, num_trials=20, num_demo_sets=5),
    metrics=[...],
    model_callback=model_callback,
)
How it works:
1. Proposal phase: generate N instruction candidates and M demo sets
2. Optimization phase: Bayesian search over (instruction, demo_set) combinations using minibatch scoring
3. Return the best combination with demos rendered inline
Best for: Tasks where few-shot examples significantly improve output quality (complex reasoning, formatting, code generation).
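The joint (instruction, demo_set) search can be illustrated with exhaustive scoring over a tiny grid. The candidates and the `score` function are placeholders, and the real algorithm samples the space with Optuna's TPE sampler rather than enumerating it:

```python
import itertools

# Hypothetical candidates: N instructions x M demo sets.
instructions = [
    "Answer briefly: {input}",
    "Think step by step, then answer: {input}",
]
demo_sets = [
    ["Q: 2+2? A: 4"],
    ["Q: capital of France? A: Paris", "Q: 2+2? A: 4"],
]

def score(instruction: str, demos: list) -> float:
    """Placeholder for minibatch metric scoring of one combination."""
    return len(instruction) * 0.1 + len(demos)  # toy objective

# Search every (instruction, demo_set) pair; TPE would sample instead.
best = max(itertools.product(instructions, demo_sets),
           key=lambda pair: score(*pair))

# Render the winning demos inline above the instruction.
final_prompt = "\n".join(best[1]) + "\n" + best[0]
print(final_prompt)
```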
3. COPRO (Cooperative Prompt Optimization)
Bounded-population, zero-shot algorithm. Proposes multiple child prompts cooperatively from shared feedback on each iteration.
from deepeval.optimizer.algorithms import COPRO
optimizer = PromptOptimizer(
    algorithm=COPRO(population_size=4, proposals_per_step=4),
    metrics=[...],
    model_callback=model_callback,
)
How it works:
1. Epsilon-greedy parent selection from a bounded population
2. Shared feedback → multiple child proposals per iteration
3. Accept children that improve; prune if the population exceeds its limit
4. Periodic full evaluation of the best candidate
Best for: Fast exploration with multiple candidates per iteration.
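Epsilon-greedy parent selection (step 1) fits in a few lines: with probability epsilon, pick a random population member to explore; otherwise exploit the current best. A pure-Python sketch, not DeepEval's code:

```python
import random

def select_parent(population, scores, epsilon=0.2, rng=None):
    """Epsilon-greedy: explore a random candidate with probability
    epsilon, otherwise exploit the highest-scoring one."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.choice(population)               # explore
    return max(population, key=scores.__getitem__)  # exploit

population = ["prompt A", "prompt B", "prompt C"]
scores = {"prompt A": 0.6, "prompt B": 0.8, "prompt C": 0.4}
print(select_parent(population, scores, epsilon=0.0))  # → prompt B
```

With epsilon=0 the choice is fully greedy; a small epsilon keeps weaker candidates occasionally in play so the bounded population stays diverse.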
Algorithm Comparison
| Aspect | GEPA | MIPROv2 | COPRO |
|---|---|---|---|
| Strategy | Pareto evolutionary | Bayesian (TPE) | Cooperative bounded |
| Few-shot demos | No | Yes (jointly optimized) | No |
| Diversity | Pareto frontier | Diverse tips + demo sets | Multi-child proposals |
| Dependencies | None | optuna | None |
| Default | ✅ | — | — |