API / SDK | 2026-07-05 | Intermediate

Designing Batch Image Costs with Nano Banana 2 Lite: Decide by Measuring

How to fold the fastest, cheapest image model, Nano Banana 2 Lite, into high-volume generation: measuring per-image cost, a two-tier setup with a quality model, and retry handling grounded in real numbers.


When I set out to make a few hundred background images for a wallpaper app, the first wall I hit was not quality. It was generation that never seemed to finish, plus the size of the bill waiting at the end of the month. Each image takes only seconds, but running hundreds through a top-tier model pushes both time and cost well past what you pictured. Nano Banana 2 Lite, newly added to the Gemini family, fits exactly this "many, fast, cheap" demand. It launched as the fastest and lowest-cost Gemini image model.

That does not mean you should push everything onto the cheapest model. A fast, cheap model trades that speed for weaknesses in certain areas. Here is the pattern I use as an indie developer when mass-producing images: measure the cost, then decide where to split the work.

Hold per-image cost as a formula, not a guess

The most dangerous move in batch cost design is running production on a vague "probably about this much." Start by breaking per-image cost into three parts: the model fee for one generation, the regeneration rate for images you had to redo, and the yield of images you accepted versus threw away.

Discarded images cost money too. Miss that, and effective cost can run nearly double your estimate. As a formula:

effective cost / accepted image = per-call cost × (1 + regen rate) ÷ accept rate
 
e.g. per call = X, regen rate = 8%, accept rate = 70%:
    effective = X × 1.08 ÷ 0.70 ≈ X × 1.54
 
So what you thought was "X per image" actually costs
about 1.5× that on an accepted basis.
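
As a minimal sketch, the formula above can be written as a small Python function. The function name and the validation checks are my own additions for illustration, and the per-call cost is passed in as a placeholder (1.0 stands in for X), not a real price.

```python
def effective_cost_per_accepted(per_call_cost: float,
                                regen_rate: float,
                                accept_rate: float) -> float:
    """Effective cost of one accepted image.

    per_call_cost: fee for a single generation call (placeholder value, not a real price)
    regen_rate:    fraction of calls that had to be redone, e.g. 0.08
    accept_rate:   fraction of generated images you actually keep, e.g. 0.70
    """
    if regen_rate < 0:
        raise ValueError("regen_rate must not be negative")
    if not 0 < accept_rate <= 1:
        raise ValueError("accept_rate must be in the range (0, 1]")
    return per_call_cost * (1 + regen_rate) / accept_rate


# Reproduce the worked example with X = 1.0 as a stand-in.
multiplier = effective_cost_per_accepted(1.0, regen_rate=0.08, accept_rate=0.70)
print(f"{multiplier:.2f}")  # 1.54
```

Passing 1.0 as the cost returns the multiplier itself, which is handy when the real fee is not yet known.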

The point here is not the absolute value of X. Model fees get revised, and your prompt and resolution shift the figure too. Instead, generate the first 50 images for real and capture two measured values yourself: the regeneration rate and the accept rate. Once you have those two, monthly cost is just the effective per-image cost multiplied by the number of images.
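
One way to capture those two values is to log every image in the 50-image trial and compute the rates from the log. The sketch below assumes a hypothetical `Trial` record with two flags. How you define "regenerated" and "accepted" is your call, and the definitions in the comments are only my assumption.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One image from the trial run (hypothetical record layout)."""
    regenerated: bool  # the first output was unusable and had to be redone
    accepted: bool     # the final image passed your review


def measure_rates(trials: list[Trial]) -> tuple[float, float]:
    """Return (regen_rate, accept_rate) measured over the trial run."""
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    regen_rate = sum(t.regenerated for t in trials) / n
    accept_rate = sum(t.accepted for t in trials) / n
    return regen_rate, accept_rate


# Made-up trial data: 50 images, 4 regenerated, 35 accepted.
trials = [Trial(regenerated=i < 4, accepted=i < 35) for i in range(50)]
regen_rate, accept_rate = measure_rates(trials)

# Cost multiplier per accepted image, relative to the per-call fee.
multiplier = (1 + regen_rate) / accept_rate
print(f"regen={regen_rate:.2f} accept={accept_rate:.2f} multiplier={multiplier:.2f}")
```

Fifty samples give only a rough estimate, so it is worth re-measuring once the real batch is running.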

Split the work into two tiers: a cheap model and a quality model

The speed and low cost of Nano Banana 2 Lite pay off most in the stage where you produce a large batch of rough drafts. Meanwhile, the single hero image you put up front, or anything where broken detail is unacceptable, is often cheaper in the end to hand to a higher-quality model — because you redo it less.

So I use a two-tier setup: generate many candidates with Lite, then finish only the chosen ones on a quality model, or accept them as-is. The code below is a skeleton that generates a candidate pool with Lite, applies a cheap score to cull, and passes only survivors to the next stage.

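import os
from google import genai
 
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
 
# Model ID as written in this article; confirm the exact ID in the current model list.
LITE_MODEL = "nano-banana-2-lite"   # fastest, cheapest model, for bulk candidate generation
 
def generate_candidates(prompt: str, n: int) -> list[bytes]:
    """Generate up to n candidate images with the Lite model."""
    images = []
    for _ in range(n):
        # Assumes the model is served through generate_images. Some Gemini image
        # models are called through generate_content instead, so check the API reference.
        resp = client.models.generate_images(
            model=LITE_MODEL,
            prompt=prompt,
            config={"number_of_images": 1},
        )
        # A response can come back with no image; skip it instead of crashing.
        if resp.generated_images:
            images.append(resp.generated_images[0].image.image_bytes)
    return images
 
def keep_or_drop(image_bytes: bytes) -> bool:
    """Minimal accept/reject: a first-pass filter on size only."""
    # Cull extremely small outputs (a sign of a broken generation)
    return len(image_bytes) > 40_000  # tune the threshold against real data
 
def run_batch(prompt: str, want: int) -> list[bytes]:
    """Collect up to `want` accepted images, spending at most want * 3 calls."""
    kept, attempts = [], 0
    while len(kept) < want and attempts < want * 3:  # always cap to avoid an infinite loop
        n = min(want - len(kept), want * 3 - attempts)  # never request past the cap
        attempts += n  # count every call, including ones that return no image
        for img in generate_candidates(prompt, n):
            if keep_or_drop(img):
                kept.append(img)
    return kept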

Always place the attempts < want * 3 cap. When a prompt is hard and the accept rate is very low, an uncapped batch spins forever and time and cost run open-ended. Hitting the cap is the signal to stop and revisit the prompt.

Do not turn failures into silent retries

In high-volume generation, transient errors and empty responses always occur at some rate. Drop in careless unlimited retries and, during an outage, retries avalanche and only the bill grows. Give retries a cap and a wait, and record failed inputs rather than discarding them.

Failure type | Common reaction | Recommended handling
Transient rate limit | Immediate infinite retry | Exponential backoff, up to 3 tries
Empty / broken output | Accept it unnoticed | Cull with the filter; regenerate once only
Prompt-driven low accept rate | Brute-force more images | Stop the batch and fix the prompt
Interruption mid-run | Restart from scratch | Save accepted images; resume from where you left off
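
The first row of the table, a capped retry with exponential backoff plus a record of failed inputs, could look roughly like this. It is a sketch under stated assumptions: `call_with_backoff` and `generate_with_log` are hypothetical helper names, `generate` stands for whatever function wraps your real API call, and the broad `except Exception` should be narrowed to the transient error types your SDK actually raises.

```python
import time
from typing import Callable, Optional


def call_with_backoff(fn: Callable[[], bytes],
                      max_tries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[bytes]:
    """Call fn up to max_tries times, doubling the wait after each failure.

    Returns None when every try fails, so the caller can record the input.
    """
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception:  # assumption: narrow this to transient errors only
            if attempt == max_tries - 1:
                return None  # cap reached: give up without waiting again
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return None


def generate_with_log(prompts: list[str],
                      generate: Callable[[str], bytes],
                      failed_log: list[str],
                      sleep: Callable[[float], None] = time.sleep) -> dict[str, bytes]:
    """Generate one image per prompt and append prompts that failed to failed_log."""
    results: dict[str, bytes] = {}
    for prompt in prompts:
        image = call_with_backoff(lambda: generate(prompt), sleep=sleep)
        if image is None:
            failed_log.append(prompt)  # keep the input so it can be retried later
        else:
            results[prompt] = image
    return results
```

The `sleep` parameter is injectable only so the wait can be stubbed out in tests. In production the default `time.sleep` applies.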

Resuming from an interruption is a high-impact design in mass generation. If you accepted 250 of 300 and then errors on the last 50 force you to throw it all out and restart, that is wasted time and cost. Save accepted images incrementally and you only fill in the remainder.
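
A minimal sketch of that incremental save, assuming accepted images are written to a local directory as sequentially numbered files. The `resume_batch` name, the `img_NNNN.png` naming scheme, and the `make_images` callable are all hypothetical; `make_images(n)` stands for your own generate-and-filter step.

```python
from pathlib import Path
from typing import Callable, Iterable


def resume_batch(out_dir: Path,
                 want: int,
                 make_images: Callable[[int], Iterable[bytes]]) -> int:
    """Fill out_dir up to `want` images, keeping whatever an earlier run saved.

    Returns the number of images in out_dir when the call ends.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # Count finished files only; leftover .tmp files do not match this pattern.
    done = len(list(out_dir.glob("img_*.png")))
    if done >= want:
        return done  # nothing left to fill in

    for data in make_images(want - done):  # ask only for the remainder
        if done >= want:
            break
        final = out_dir / f"img_{done:04d}.png"
        tmp = final.with_name(final.name + ".tmp")
        tmp.write_bytes(data)  # write to a temp name first...
        tmp.replace(final)     # ...then rename, so a crash leaves no half-written image
        done += 1
    return done
```

Calling it again after a crash re-counts the saved files and requests only what is missing, so the 250 accepted images in the example above would be kept.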

Watch the migration and deprecation schedule too

Image models turn over fast. In fact, some image-generation models already have retirement dates announced. Hard-code a single model name because it is cheapest today, and you touch the whole codebase every time one is retired. Keep the model name as a config value in one place so switching is a one-line change, and the next revision or new model arrives without a scramble.

In my case, I keep the model name and a placeholder per-call cost in a small config file, and at the start of each month I update only that value and the measured accept rate. That alone lets me see instantly how the monthly cost changes with the image count.
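
Such a config could be as small as the sketch below. The field names, the JSON layout, and the inclusion of a regeneration rate are my assumptions for illustration, and the per-call cost of 1.0 is a placeholder, not a real price.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageBatchConfig:
    """The only place where the model name and cost inputs live (hypothetical layout)."""
    model: str
    per_call_cost: float  # placeholder; update when fees are revised
    accept_rate: float    # measured, updated monthly
    regen_rate: float     # measured, updated monthly

    def monthly_cost(self, images_per_month: int) -> float:
        """Projected cost of producing this many accepted images in a month."""
        per_accepted = self.per_call_cost * (1 + self.regen_rate) / self.accept_rate
        return images_per_month * per_accepted


def load_config(text: str) -> ImageBatchConfig:
    """Parse the config from JSON text (in practice, the contents of a small file)."""
    return ImageBatchConfig(**json.loads(text))


# Inline stand-in for the config file.
CONFIG_TEXT = """
{"model": "nano-banana-2-lite", "per_call_cost": 1.0,
 "accept_rate": 0.70, "regen_rate": 0.08}
"""

cfg = load_config(CONFIG_TEXT)
for count in (300, 600):
    print(cfg.model, count, round(cfg.monthly_cost(count), 2))
```

Switching models then means editing one string in the config, and the rest of the code reads `cfg.model`.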

Where to put the dividing line

Nano Banana 2 Lite belongs in the lead role for the stage that produces many candidates fast and cheap. Decide the split up front: the hero image and anything intolerant of breakage go to the quality model, and the rough drafts that come before them go to Lite. That way you give up neither cost nor quality.

As a next step, pick one generation flow that you currently run entirely on a high-quality model and have never measured through to final acceptance. Swap only its candidate stage to Lite, then measure the regeneration and accept rates over 50 images. With those two numbers in hand, you can decide how far to lean on Lite by measurement, not by feel.

I hope it helps your build. Thanks for reading to the end.
