GEMINI LAB
Advanced · 2026-04-22

Gemini × DSPy: Retire from Prompt Craftsmanship — Automated Prompt Optimization

A hands-on implementation guide for combining Stanford's DSPy framework with Gemini to end the era of hand-written prompts. Covers Signatures, Modules, Optimizers, LLM-as-a-Judge metrics, and production pipelines — all with working code.

gemini · dspy · prompt-engineering · optimization · llm · python


The first time I tried DSPy, I was honestly skeptical — "another framework?" But when I compared a prompt I'd hand-tuned for three days against what DSPy's MIPROv2 produced automatically in twenty minutes, the automated version was measurably better. A combination of few-shot examples I never would have considered emerged from the gradient-free search on its own.

That was the moment I had to admit: my instincts as a "prompt craftsman" no longer applied. This article builds on that experience to walk through a Gemini × DSPy prompt-optimization pipeline, from zero to production. If you know basic Python and have touched the Gemini API once or twice, you should be able to copy and run the code as you go.

This piece is written for developers who recognize the following situations. You're tweaking prompts every week but quality has plateaued. You switched to Gemini from OpenAI or Anthropic and the same prompt doesn't reproduce the quality. You pick few-shot examples by hand and secretly know that your choices are mostly intuition. For all of these, DSPy offers a path from "prompt engineering" to "prompt computation."

One caveat up front: DSPy is not magic. Preparing labeled data and designing a metric function remain your work as a human. If anything, the quality of those two steps determines how good the results will be. I'll cover both, with the pitfalls I hit along the way.

A note on what's inside: this is not a whirlwind tour of every DSPy feature. I've picked the subset that matters when you're shipping a Gemini-powered product in the real world. Realistically, you'll copy much of this code, adjust it for your own tasks, and rerun compile on your own data. If you do that and come back in a week, I'd expect you to be measurably ahead of where you are today.

Where Hand-Written Prompting Breaks Down

I maintain around twenty prompts for Gemini Lab — title generation, summarization, tagging, and so on. At first, I tweaked each Flash prompt by hand. Classic A/B: ship the one that wins.

This approach collapses whenever two or three of the following happen at once:

  1. The input distribution changes (say, a new article genre appears).
  2. You swap models (Flash → Pro, or Gemini 2.5 → 3.1).
  3. You add evaluation axes (not just accuracy, but also length and tone).

With hand-tuning, the moment any of these hits, your accumulated tweaks go stale overnight. And every change triggers another round of A/B. DSPy replaces that cost with a simple premise: "give me a metric and some data, and I'll do the rest."
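
To make the "metric and some data" premise concrete, here is a minimal sketch in plain Python. The (example, pred, trace=None) argument order follows the convention DSPy metrics use. Everything else is an assumption for illustration: the `article` and `title` fields, the 60-character cap, the 0.7/0.3 weights, and the name `title_metric` are hypothetical and not taken from the Gemini Lab pipeline.

```python
from types import SimpleNamespace

# Hypothetical labeled data: each record pairs an input with a reference output.
trainset = [
    SimpleNamespace(article="DSPy compiles prompts from data.", title="DSPy Compiles Prompts"),
    SimpleNamespace(article="Gemini Flash is fast and cheap.", title="Gemini Flash Overview"),
]

def title_metric(example, pred, trace=None):
    """Score a predicted title between 0.0 and 1.0.

    Combines two evaluation axes: word overlap with the reference title
    and a hard length cap (both weights are illustrative assumptions).
    """
    ref_words = set(example.title.lower().split())
    pred_words = set(pred.title.lower().split())
    overlap = len(ref_words & pred_words) / max(len(ref_words), 1)
    length_ok = 1.0 if len(pred.title) <= 60 else 0.0
    return 0.7 * overlap + 0.3 * length_ok

# Stand-in for a model prediction; a real run would score the module's output.
prediction = SimpleNamespace(title="DSPy Compiles Prompts")
score = title_metric(trainset[0], prediction)
print(round(score, 2))
```

Adding an evaluation axis (item 3 in the list) then means adding a term to the metric, not rewriting prompts by hand.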

DSPy (Declarative Self-improving Python) is a framework from Stanford NLP that lets you treat LLM calls as callable functions. Its core insight is that prompts shouldn't be hand-written; they should be optimized from data, using abstractions that feel a lot like PyTorch's nn.Module.

What clicked for me was thinking of DSPy as a bridge between "prompt engineering" and "compilers." You write prompts in a high-level language; the Optimizer compiles them into a form the machine can execute efficiently. With that lens, writing prompts end-to-end by hand is a bit like hand-writing the assembly a C compiler would generate: technically possible, but slow, and rarely optimal.

A secondary benefit I didn't appreciate until I'd been using DSPy for a month: prompts expressed as Signatures are testable as functions. Writing a pytest suite against a DSPy Module looks almost exactly like testing a plain Python function — given this input, expect this structure of output. That alone transforms prompt work from "look at the model response and decide if it feels right" to something you can cover in CI. If your team already values testing, this maps surprisingly well to existing habits.
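
As a rough sketch of what such a test can look like: the check below asserts only the structure of a prediction, and a stub stands in for the model so the test runs offline. `fake_predictor` and `check_translation_structure` are hypothetical names introduced here; in a real suite you would pass a DSPy predictor instead of the stub.

```python
from types import SimpleNamespace

def fake_predictor(japanese: str) -> SimpleNamespace:
    """Stub standing in for a DSPy predictor; returns an object with an `english` field."""
    return SimpleNamespace(english="The era of writing prompts by hand is over.")

def check_translation_structure(predictor) -> None:
    """Assert the shape of the output rather than its exact wording."""
    result = predictor(japanese="プロンプトを手書きする時代は終わった。")
    assert isinstance(result.english, str)
    assert result.english.strip(), "translation must not be empty"
    # A crude language check: an English translation should be ASCII-only.
    assert result.english.isascii(), "output still contains non-ASCII text"

def test_translate_returns_english_string():
    # pytest collects functions whose names start with test_.
    check_translation_structure(fake_predictor)

# Direct call so the sketch also runs without pytest.
test_translate_returns_english_string()
```

Because the check takes the predictor as an argument, the same assertions can run against the stub in CI and against the live model in a slower, opt-in test job.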

Building the First Pipeline

Let's set up the minimum working configuration. Start with installation and the Gemini connection.

# Dependencies. dspy-ai >= 2.6 supports Gemini via LiteLLM out of the box
pip install -U "dspy-ai>=2.6.0" google-generativeai litellm python-dotenv

Put your Gemini API key in .env.

# .env
GEMINI_API_KEY=YOUR_GEMINI_API_KEY

The cleanest way to reach Gemini from DSPy is through LiteLLM — prefix the model name with gemini/ and LiteLLM routes the request to Google AI Studio's API.

# setup_dspy.py — the smallest possible DSPy × Gemini example
import os
import dspy
from dotenv import load_dotenv
 
load_dotenv()
 
# LiteLLM model name. Keep temperature at 0 for reproducibility.
lm = dspy.LM(
    model="gemini/gemini-2.5-flash",
    api_key=os.environ["GEMINI_API_KEY"],
    temperature=0.0,
    max_tokens=1024,
)
dspy.configure(lm=lm)
 
# The simplest possible Signature (the input/output declaration)
class Translate(dspy.Signature):
    """Translate Japanese text into natural English."""
    japanese = dspy.InputField(desc="Japanese source text")
    english = dspy.OutputField(desc="English translation")
 
predictor = dspy.Predict(Translate)
result = predictor(japanese="プロンプトを手書きする時代は終わった。")
 
print(result.english)
# Expected: "The era of writing prompts by hand is over."

If this runs, your DSPy × Gemini setup is working. Notice something important: there is no "prompt string" anywhere in this code. DSPy generates the actual prompt from the Signature's docstring and field definitions, and the framework manages the payload it sends.
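
To build intuition for how a declaration can become a prompt, here is a toy renderer in plain Python. It is not DSPy's actual template or adapter logic; `Field` and `render_prompt` are hypothetical names, and the layout of the rendered text is an assumption made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Field:
    """One declared field: a name plus a human-readable description."""
    name: str
    desc: str

def render_prompt(instruction, inputs, outputs, values):
    """Assemble a prompt string from a declarative signature (toy version)."""
    lines = [instruction, "", "Inputs:"]
    for f in inputs:
        lines.append(f"- {f.name} ({f.desc}): {values[f.name]}")
    lines.append("")
    lines.append("Respond with these fields:")
    for f in outputs:
        lines.append(f"- {f.name}: {f.desc}")
    return "\n".join(lines)

# Same declaration as the Translate signature, expressed as plain data.
prompt = render_prompt(
    instruction="Translate Japanese text into natural English.",
    inputs=[Field("japanese", "Japanese source text")],
    outputs=[Field("english", "English translation")],
    values={"japanese": "プロンプトを手書きする時代は終わった。"},
)
print(prompt)
```

The point of the sketch is the direction of the dependency: the docstring and field descriptions are the source of truth, and the prompt text is derived output that an optimizer is free to rewrite.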

When you need to see what DSPy actually sent to the model, use dspy.inspect_history(n=1). You'll lean on this during debugging, so it's worth getting familiar with it early.

One gotcha: even with temperature=0.0, Gemini 2.5 Flash doesn't always return byte-identical outputs. Google doesn't guarantee a stable tie-break when multiple tokens have the same top probability. If you need strict reproducibility, call the same input 3–5 times and vote, or set the seed parameter (available in Gemini 3.1+). DSPy also adds internal retries, which introduce additional variability — keep this in mind when writing snapshot tests in CI.
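
The call-several-times-and-vote idea can be sketched as follows. `majority_vote` and `flaky_model` are hypothetical names, and the stub simulates a model whose output occasionally flips. One assumption to verify before using this with a real dspy.LM: as far as I know DSPy caches identical LM calls by default, so repeated calls may return the same cached response unless caching is turned off.

```python
from collections import Counter
from typing import Callable

def majority_vote(call: Callable[[str], str], text: str, n: int = 5) -> str:
    """Call `call` n times on the same input and return the most common output."""
    outputs = [call(text).strip() for _ in range(n)]
    # On a tie, most_common keeps the output that appeared first.
    winner, _count = Counter(outputs).most_common(1)[0]
    return winner

# Stub simulating a model that returns a different wording once in five calls.
_replies = iter(["over.", "over.", "finished.", "over.", "over."])

def flaky_model(text: str) -> str:
    return next(_replies)

print(majority_vote(flaky_model, "プロンプトを手書きする時代は終わった。"))
# prints "over."
```

Exact-match voting works for short, constrained outputs such as labels or tags; for free-form text, two valid answers rarely match byte for byte, so a snapshot test is better off comparing structure than wording.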

WHAT YOU'LL LEARN

  • Engineers who keep tweaking prompts yet see quality plateau will learn to switch to DSPy's Optimizers and let measured data drive the improvements
  • You'll understand DSPy's three-layer design (Signature, Module, Optimizer) and be able to build production pipelines that compose Gemini 2.5 Pro and Flash as interchangeable components
  • You can bring battle-tested patterns into today's work: when to use BootstrapFewShot vs. MIPROv2, how to author LLM-as-a-Judge metrics, and how to keep costs under control