Gemini Lab
API / SDK · 2026-07-05 · Intermediate

Splitting Bulk Image Generation Cost in Two with Nano Banana 2 Lite: A Draft-and-Render Design

A two-tier cost design that routes first-pass generation to Nano Banana 2 Lite and final renders to the standard Nano Banana 2, with a minimal Python router you can adapt.

Tags: Nano Banana 2 Lite, image generation, Gemini API, cost design, indie dev

Whenever I set out to generate a batch of image assets for a wallpaper app, the first question is never "how many do I need" but "how many will I throw away." If I want 100 keepers, I usually generate 300 to 400 and discard the ones with broken composition or the wrong mood. Every one of those discards lands directly on the bill, which is the painful part of doing this as a solo developer.

Nano Banana 2 Lite, which became available in July 2026, is positioned as the fastest and lowest-cost image model in the Gemini lineup. It is tempting to conclude "just route everything to the cheaper model." But after actually testing it, what I settled on was a two-tier setup: Lite for the first pass, and the standard Nano Banana 2 for the final render of anything that gets accepted. This post records how I split the two, plus a minimal router that actually runs.

Discarded Images Don't Need Top-Tier Quality

Most of the images produced in a bulk first pass are never accepted. A person or a machine filters out the ones with broken composition or the wrong atmosphere. Spending the standard model's resolution and detail on those means paying the highest unit price for exactly the images you throw away.

What a first pass needs is enough fidelity to decide accept or reject, not delivery quality. Lite's speed and price fit that job precisely. In my wallpaper generation, all I want to see in the first pass is the color direction and rough composition, and Lite's output rarely left me unable to make that call.

The accepted image, on the other hand, ships and gets used for a long time. That single frame is worth rebuilding with the standard model. The heart of the two-tier idea is this: the "images you plan to discard" and the "one you keep" deserve different unit prices.

Cut It Into Three Stages: Draft, Screen, Final Render

A two-tier setup is easiest to build when you split the work into three stages.

In the first pass, generate candidates with Lite at three to four times the number you need. In the screening stage, run a mechanical filter first (resolution, aspect ratio, brightness skew, duplicates), then send only the survivors to a human eye or a Vision model. In the final render, pass the same instructions (prompt and seed) from an accepted candidate to the standard model and rebuild it at delivery quality.

What matters across these three stages is whether you can share instructions between the first pass and the final render. If you keep the accepted candidate's prompt and seed, the final render becomes "reproduce the same intent at higher quality." When that link is broken, the atmosphere you accepted can't be reproduced at render time, and you end up doing the work twice.

A Minimal Two-Tier Router

Carving the stages into functions makes the whole flow easier to reason about. Below is a minimal setup using the Gemini API Python SDK, with Pillow for the mechanical screen. The model IDs are pulled out as constants that can be overridden through environment variables, since the exact IDs vary by environment.

import os
from google import genai
 
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
 
# Set model IDs to match availability
MODEL_DRAFT = os.environ.get("MODEL_DRAFT", "nano-banana-2-lite")
MODEL_FINAL = os.environ.get("MODEL_FINAL", "nano-banana-2")
 
 
def generate_draft(prompt: str, seed: int) -> bytes:
    """First pass: one candidate for screening, made with Lite."""
    res = client.models.generate_images(
        model=MODEL_DRAFT,
        prompt=prompt,
        config={"number_of_images": 1, "seed": seed},
    )
    return res.generated_images[0].image.image_bytes
 
 
def machine_screen(image: bytes) -> bool:
    """Mechanical filter. Reject the bulk here."""
    from PIL import Image
    import io
 
    img = Image.open(io.BytesIO(image)).convert("RGB")
    w, h = img.size
    if w < 512 or h < 512:
        return False
    # Reject outputs with extreme brightness skew
    grayscale = img.convert("L")
    mean = sum(grayscale.getdata()) / (w * h)
    if mean < 20 or mean > 235:
        return False
    return True
 
 
def render_final(prompt: str, seed: int) -> bytes:
    """Final render: reproduce the accepted candidate with the standard model."""
    res = client.models.generate_images(
        model=MODEL_FINAL,
        prompt=prompt,
        config={"number_of_images": 1, "seed": seed},
    )
    return res.generated_images[0].image.image_bytes
 
 
def run_batch(prompt: str, want: int, oversample: float = 3.0) -> list[bytes]:
    """First pass -> mechanical screen -> final render only what's accepted."""
    finals: list[bytes] = []
    tried = 0
    target_drafts = int(want * oversample)
    for seed in range(target_drafts):
        tried += 1
        draft = generate_draft(prompt, seed)
        if not machine_screen(draft):
            continue
        # Insert human or Vision review here. If it passes, render final.
        finals.append(render_final(prompt, seed))
        if len(finals) >= want:
            break
    print(f"drafts={tried} finals={len(finals)}")
    return finals

The key is that the seed is shared between the first pass and the final render. Passing the same seed and prompt is meant to let the standard model reproduce something close to the composition you saw in Lite. A seed is not guaranteed to map to the same image across two different models, so compare each final render against the draft you accepted. machine_screen is deliberately kept to cheap checks, acting as a front filter that forwards only the candidates worth a human or Vision review.
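The mechanical filter described earlier also lists aspect ratio and duplicates, which machine_screen does not cover. As a rough sketch using only the standard library, those two checks might look like the following. The names aspect_ok and DuplicateFilter, the 9:16 target, and the 2% tolerance are illustrative placeholders rather than anything from the SDK, and the hash comparison only catches byte-identical outputs, not near-duplicates.

```python
import hashlib


def aspect_ok(width: int, height: int,
              target: float = 9 / 16, tolerance: float = 0.02) -> bool:
    """True when width/height is within a relative tolerance of the target ratio.

    The 9:16 default is an assumed portrait wallpaper ratio; change it to suit.
    """
    if width <= 0 or height <= 0:
        return False
    return abs(width / height - target) <= tolerance * target


class DuplicateFilter:
    """Remembers SHA-256 digests of images seen so far (exact matches only)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, image: bytes) -> bool:
        """Return False if these exact bytes were already seen, else record them."""
        digest = hashlib.sha256(image).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

In run_batch, one DuplicateFilter would be created before the loop, and is_new(draft) checked right after machine_screen(draft) so that repeated outputs never reach the review step.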

How the Cost Changes

The payoff of the two-tier setup depends on the unit-price gap and the acceptance rate. Suppose the first-pass unit price is a quarter of the standard model's and you need 100 keepers. The first pass generates three times that, 300 drafts, and about 35% of them are accepted, which is enough to cover the 100. The spend breaks down like this:

| Approach | Standard model calls | Lite calls | Relative cost (standard unit = 1) |
|---|---|---|---|
| Generate everything with the standard model | 300 | 0 | 300 |
| Two-tier (Lite first pass + 100 standard finals) | 100 | 300 | 100 + 300×0.25 = 175 |

By this estimate, going two-tier alone cuts relative cost from 300 to 175, roughly a 42% reduction. The lower the acceptance rate and the higher the discard count, the greater the advantage of running the first pass cheaply. Conversely, in a workflow where nearly everything is accepted, the gap narrows. Once the acceptance rate plus the Lite-to-standard price ratio exceeds 1, the two-tier setup costs more than using the standard model alone, because every accepted image is paid for twice.
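To rerun this estimate with your own numbers, the arithmetic can be wrapped in a few small functions. This is only a sketch of the relative-cost model used in this section: the function names are hypothetical, the standard model's unit price is fixed at 1, and lite_ratio is the assumed Lite-to-standard price ratio, not a published price.

```python
def single_tier_cost(drafts: int) -> float:
    """Every draft is generated with the standard model (unit price = 1)."""
    return float(drafts)


def two_tier_cost(drafts: int, finals: int, lite_ratio: float) -> float:
    """Drafts are generated with Lite; only accepted ones are re-rendered."""
    return finals + drafts * lite_ratio


def savings(drafts: int, finals: int, lite_ratio: float) -> float:
    """Fraction saved versus single-tier; negative means two-tier costs more."""
    return 1 - two_tier_cost(drafts, finals, lite_ratio) / single_tier_cost(drafts)


# The scenario from this section: 300 drafts, 100 finals, Lite at a quarter.
print(two_tier_cost(300, 100, 0.25))      # 175.0
print(f"{savings(300, 100, 0.25):.1%}")   # 41.7%
```

Plugging in a high-acceptance workflow, for example savings(110, 100, 0.25), returns a negative value, which is the case where a single tier is the cheaper choice under these assumptions.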

These figures are relative values based on an assumed unit-price ratio. Actual prices shift with availability, so I'd recommend recalculating for your own workload by reconciling a countTokens-style estimate against the real bill. For the broader topic of capping spend, I've written separately about building guardrails so a Gemini API bill never catches you off guard.

Where Not to Over-Route to Lite

If you let Lite handle the final render too just because it's cheap, you'll regret it at delivery quality. My rule is simple: I split by "does this output reach the user's device as-is." What reaches them goes to the standard model; what gets discarded before it reaches them goes to Lite.

The other thing I watch is reproducibility between the first pass and the final render. Without keeping the seed and prompt, you can't rebuild the accepted atmosphere at render time, and the look you approved in the Lite draft is lost. Before adopting a two-tier setup, put a mechanism in place to log your generation instructions; it makes later rebuilds far easier.
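One lightweight way to keep those instructions is an append-only JSON Lines file, one record per draft. The sketch below is an assumption about how such a log could look, not part of the router above: GenerationRecord, its fields, and the helper names are all hypothetical, and it uses only the standard library.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class GenerationRecord:
    """What is needed to re-run one draft: prompt, seed, model, and verdict."""
    prompt: str
    seed: int
    draft_model: str
    accepted: bool
    created_at: float = field(default_factory=time.time)


def append_record(path: Path, record: GenerationRecord) -> None:
    """Append one record as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def load_records(path: Path) -> list[GenerationRecord]:
    """Read every non-empty line of the log back into records."""
    with path.open(encoding="utf-8") as f:
        return [GenerationRecord(**json.loads(line)) for line in f if line.strip()]


def acceptance_rate(records: list[GenerationRecord]) -> float:
    """Share of logged drafts that were accepted (0.0 for an empty log)."""
    if not records:
        return 0.0
    return sum(r.accepted for r in records) / len(records)
```

Because each accepted record carries its prompt and seed, the final render can be driven from the log instead of from memory, and acceptance_rate gives the number that the cost estimate depends on.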

Wrapping Up

The cost of bulk image generation is determined not by how many you generate, but by "what unit price you pay for the ones you discard." Placing Nano Banana 2 Lite on the first pass and the standard Nano Banana 2 on the post-acceptance final render is a straightforward way to implement that split in unit prices.

As a next step, start by measuring the acceptance rate of your own workflow. The lower it is, the greater the effect of routing the first pass to a cheaper model. Once a mechanism for recording seeds and prompts is in place, the two-tier setup drops right onto your existing pipeline.

I'm still tuning my own read on the unit-price ratio, but the idea of running a discard-heavy first pass cheaply feels like something I'll keep using in indie cost design for a long time. Thank you for reading.
