API / SDK · 2026-06-26 · Advanced

Reliable Text-in-Image with Gemini 3.1 Flash Image — an OCR-Verified Pipeline

After the preview shutdown, the GA gemini-3.1-flash-image still occasionally garbles text baked into images. Here is a generate -> read-back-verify -> regenerate/composite pipeline, with working code and an unattended retry budget.

When you automate image generation as an indie developer — banners for a wallpaper app, OGP thumbnails for a blog — you eventually hit a quietly maddening wall. The composition looks great, but the Japanese line you wanted baked into the image, something like "Free this week," comes out subtly wrong. One character swapped for a similar one, a diacritic dropped, the second line illegible. You don't notice at a glance; you notice when you zoom in.

On June 25, 2026 the preview models gemini-3.1-flash-image-preview and gemini-3-pro-image-preview were shut down, and the GA versions gemini-3.1-flash-image (Flash Image) and gemini-3-pro-image (Pro Image) took their place. Text rendering improves with each generation. Even so, in an unattended setup that publishes whatever comes out, an occasional garble will sooner or later turn into an incident. For a while I caught the bad images by checking every output by eye, which defeated half the point of automating.

What I landed on is a two-stage approach: don't trust the output, read the image back with a model to confirm the characters, and if it fails, either regenerate or switch to compositing. This article records that verification-gated pipeline, thresholds and retry design included.
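Once a vision model or OCR step has returned the text it sees in the image, the "confirm the characters" step reduces to a string comparison. The following is a minimal sketch of such a gate, not necessarily the exact check used in this pipeline. The names normalize, match_ratio, and passes_gate are hypothetical, and the 0.95 threshold is a placeholder to tune against your own images.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical threshold; tune it against real read-back results.
MATCH_THRESHOLD = 0.95


def normalize(text: str) -> str:
    """NFKC-fold the text and drop whitespace so layout differences are ignored.

    Note: NFKC also folds full-width and half-width variants together,
    so width differences will not be detected by this check.
    """
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if not ch.isspace())


def match_ratio(expected: str, read_back: str) -> float:
    """Similarity in [0, 1] between the requested text and the text read back."""
    return SequenceMatcher(None, normalize(expected), normalize(read_back)).ratio()


def passes_gate(expected: str, read_back: str,
                threshold: float = MATCH_THRESHOLD) -> bool:
    """True when the read-back text is close enough to the requested text."""
    return match_ratio(expected, read_back) >= threshold
```

With this measure, a single substituted character in a five-character headline scores 0.8, so it fails the placeholder threshold, while a read-back that differs only in spacing scores 1.0.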

After the shutdown: which model should draw the text

First, pick the model that draws. The two GA options trade text accuracy against speed and cost differently. Here is my rough feel for my use case (a few short Japanese characters on wallpaper-app announcement banners and blog OGP images).

| Aspect | gemini-3.1-flash-image (Flash Image) | gemini-3-pro-image (Pro Image) |
|---|---|---|
| Accuracy of short Japanese text | Production-usable; stable up to a few lines | Higher; holds up on multi-line, smaller text |
| Speed per image | Fast (seconds) | Somewhat slower |
| Approx. cost per image | Low | A few times Flash |
| Video-to-image | Supported (gemini-3.1-flash-image only) | Not supported |
| Workhorse for unattended bulk runs | Make this the default | Reserve for regeneration on failure |

Personally, defaulting to Flash Image and only promoting an image to Pro Image after it fails verification twice in a row gave me the best cost-effectiveness. Running every image through Pro Image from the start reduces garbling but inflates cost to several times Flash, which undercuts the point of unattended runs. I think the realistic way to win on text accuracy is "promote only the suspicious ones," not "use the premium model for everything."
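The promotion policy above can be sketched as a small retry loop. This is an illustration under stated assumptions, not the article's actual implementation: pick_model and generate_verified are hypothetical names, the generate and verify callables are injected so the loop stays independent of any SDK, and the budget of four attempts is a placeholder.

```python
from typing import Callable, Optional

MODEL_FAST = "gemini-3.1-flash-image"
MODEL_STRONG = "gemini-3-pro-image"


def pick_model(consecutive_failures: int, promote_after: int = 2) -> str:
    """Use the fast model until verification has failed `promote_after` times in a row."""
    return MODEL_STRONG if consecutive_failures >= promote_after else MODEL_FAST


def generate_verified(
    headline: str,
    generate: Callable[[str, str], bytes],  # (model, headline) -> image bytes
    verify: Callable[[bytes, str], bool],   # (image bytes, headline) -> passed?
    max_attempts: int = 4,                  # hypothetical retry budget
) -> Optional[bytes]:
    """Return the first image that passes verification, or None when the budget runs out."""
    failures = 0
    for _ in range(max_attempts):
        image = generate(pick_model(failures), headline)
        if verify(image, headline):
            return image
        failures += 1  # the loop exits on success, so failures are always consecutive
    return None  # caller decides the fallback, for example compositing the text
```

Capping max_attempts bounds the worst-case spend per image: with these placeholder values, at most two Flash Image calls and two Pro Image calls.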

How to prompt the model to actually draw the text

The first lever for text-in-image is how you prompt. Three rules I keep:

Pass the target string as text to copy verbatim

Don't bury it inside a description; hand over the exact characters as a quoted string that must not be changed. A vague request such as "a title that feels like ~" leaves room for paraphrasing and for wrong character conversions.

Pin character count, line count, and placement with numbers

Instead of "large in the center," constrain layout numerically: "one line at 25% from the top, max 8 characters." Long or multi-line text raises the failure rate on its own, so keep the text payload short.

import os

from google import genai
from google.genai import types

# Read the API key from the environment rather than hard-coding it in source.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

MODEL_FAST = "gemini-3.1-flash-image"
MODEL_STRONG = "gemini-3-pro-image"

def build_prompt(headline: str) -> str:
    # Pass the desired characters as a verbatim, do-not-alter constraint
    return (
        "Generate a simple vertical (9:16) announcement banner image.\n"
        "Calm indigo background with a soft washi-paper texture.\n"
        f"At 25% from the top, draw EXACTLY these characters, one horizontal line, large: '{headline}'\n"
        "- Max 10 characters. Avoid decorative fonts; high-legibility gothic.\n"
        "- Draw NO other characters, no alphanumerics, no logo, no signature.\n"
        "- Do not omit diacritics or small kana."
    )

def generate_image(model: str, headline: str) -> bytes:
    resp = client.models.generate_content(
        model=model,
        contents=build_prompt(headline),
        config=types.GenerateContentConfig(response_modalities=["Image"]),
    )
    # A blocked or empty response may carry no candidates or no content.
    if not resp.candidates or resp.candidates[0].content is None:
        raise RuntimeError("No candidates returned")
    for part in resp.candidates[0].content.parts or []:
        blob = part.inline_data
        # mime_type and data are optional fields, so guard both before use.
        if blob and blob.data and (blob.mime_type or "").startswith("image/"):
            return blob.data  # raw image bytes; mime_type tells you the format
    raise RuntimeError("No image part returned")
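The numeric limits in the prompt can also be enforced before any API call is made, so an over-long headline fails fast instead of costing a generation. A minimal sketch, assuming a hypothetical validate_headline helper whose limit mirrors the "Max 10 characters" line in the prompt:

```python
import unicodedata

MAX_CHARS = 10  # mirrors the "Max 10 characters" constraint in the prompt


def validate_headline(headline: str, max_chars: int = MAX_CHARS) -> str:
    """Return a cleaned headline, or raise ValueError if it breaks the prompt's limits."""
    # NFC composes sequences such as a kana followed by a combining voiced mark
    # into one code point, so len() matches the visible character count more closely.
    cleaned = unicodedata.normalize("NFC", headline).strip()
    if not cleaned:
        raise ValueError("headline is empty")
    if "\n" in cleaned or "\r" in cleaned:
        raise ValueError("headline must be a single line")
    if "'" in cleaned:
        # The prompt wraps the headline in single quotes, so one inside would break the quote.
        raise ValueError("headline must not contain a single quote")
    if len(cleaned) > max_chars:
        raise ValueError(f"headline has {len(cleaned)} characters, max is {max_chars}")
    return cleaned
```

Calling validate_headline on the input and passing the result to build_prompt keeps the prompt's stated limits and the actual payload in agreement.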

The third rule is the negative instruction in the prompt: "draw no other characters" helps suppress the stray English signatures and random logos that otherwise creep in. With Japanese text in particular, unrequested Latin letters tend to sneak in as decoration, so I include this line almost every time.
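The same rule can be checked after generation. Assuming some read-back step has already returned the text visible in the image, a sketch like the following flags ASCII letters or digits that the headline never asked for. The stray_characters helper is hypothetical, and it only looks at ASCII alphanumerics, so it does not catch stray kana or kanji.

```python
def stray_characters(expected: str, read_back: str) -> list[str]:
    """Return ASCII letters and digits in read_back that do not appear in expected."""
    allowed = set(expected)
    return [
        ch for ch in read_back
        if ch.isascii() and ch.isalnum() and ch not in allowed
    ]
```

An empty list means no unrequested alphanumerics were read back; a non-empty list is a reason to treat the image as failed even if the headline itself matches.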
