Whenever I set out to generate a batch of image assets for a wallpaper app, the first question is never "how many do I need," but "how many will I throw away." If I want 100 keepers, I usually generate 300 to 400 and discard the ones with broken composition or the wrong mood. Every one of those discards lands directly on the bill, which is the painful part of doing this as a solo developer.
Nano Banana 2 Lite, which became available in July 2026, is positioned as the fastest and lowest-cost image model in the Gemini lineup. It is tempting to conclude "just route everything to the cheaper model." But after actually testing it, what I settled on was a two-tier setup: Lite for the first pass, and the standard Nano Banana 2 for the final render of anything that gets accepted. This post records how I split the two, plus a minimal router that actually runs.
Discarded Images Don't Need Top-Tier Quality
Most of the images produced in a bulk first pass are never accepted. A person or a machine filters out the ones with broken composition or the wrong atmosphere. Spending the standard model's resolution and detail on those means paying the highest unit price for exactly the images you throw away.
What a first pass needs is enough fidelity to decide accept or reject, not delivery quality. Lite's speed and price fit that job precisely. In my wallpaper generation, all I want to see in the first pass is the color direction and rough composition, and Lite's output rarely left me unable to make that call.
The accepted image, on the other hand, ships and gets used for a long time. That single frame is worth rebuilding with the standard model. The heart of the two-tier idea is this: the "images you plan to discard" and the "one you keep" deserve different unit prices.
Cut It Into Three Stages: Draft, Screen, Final Render
A two-tier setup is easiest to build when you split the work into three stages.
In the first pass, generate candidates with Lite at three to four times the number you need. In the screening stage, run a mechanical filter first (resolution, aspect ratio, brightness skew, duplicates), then send only the survivors to a human eye or a Vision model. In the final render, pass the same instructions (prompt and seed) from an accepted candidate to the standard model and rebuild it at delivery quality.
What matters across these three stages is whether you can share instructions between the first pass and the final render. If you keep the accepted candidate's prompt and seed, the final render becomes "reproduce the same intent at higher quality." When that link is broken, the atmosphere you accepted can't be reproduced at render time, and you end up doing the work twice.
A Minimal Two-Tier Router
Carving the stages into functions makes the whole flow easier to reason about. Below is a minimal setup using the Gemini API Python SDK. The model IDs are pulled out as constants since they vary by environment.
```python
import io
import os

from google import genai
from PIL import Image

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Set model IDs to match availability
MODEL_DRAFT = os.environ.get("MODEL_DRAFT", "nano-banana-2-lite")
MODEL_FINAL = os.environ.get("MODEL_FINAL", "nano-banana-2")


def generate_draft(prompt: str, seed: int) -> bytes:
    """First pass: one candidate for screening, made with Lite."""
    res = client.models.generate_images(
        model=MODEL_DRAFT,
        prompt=prompt,
        config={"number_of_images": 1, "seed": seed},
    )
    return res.generated_images[0].image.image_bytes


def machine_screen(image: bytes) -> bool:
    """Mechanical filter. Reject the bulk here."""
    img = Image.open(io.BytesIO(image)).convert("RGB")
    w, h = img.size
    if w < 512 or h < 512:
        return False
    # Reject outputs with extreme brightness skew
    grayscale = img.convert("L")
    mean = sum(grayscale.getdata()) / (w * h)
    if mean < 20 or mean > 235:
        return False
    return True


def render_final(prompt: str, seed: int) -> bytes:
    """Final render: reproduce the accepted candidate with the standard model."""
    res = client.models.generate_images(
        model=MODEL_FINAL,
        prompt=prompt,
        config={"number_of_images": 1, "seed": seed},
    )
    return res.generated_images[0].image.image_bytes


def run_batch(prompt: str, want: int, oversample: float = 3.0) -> list[bytes]:
    """First pass -> mechanical screen -> final render only what's accepted."""
    finals: list[bytes] = []
    tried = 0
    target_drafts = int(want * oversample)
    for seed in range(target_drafts):
        tried += 1
        draft = generate_draft(prompt, seed)
        if not machine_screen(draft):
            continue
        # Insert human or Vision review here. If it passes, render final.
        finals.append(render_final(prompt, seed))
        if len(finals) >= want:
            break
    print(f"drafts={tried} finals={len(finals)}")
    return finals
```

The key is that `seed` is shared between the first pass and the final render. Passing the same seed and prompt lets the standard model reproduce, in a close form, the composition you saw in Lite. `machine_screen` is deliberately kept to cheap checks, acting as a front filter that forwards only the candidates worth a human or Vision review.
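The screening stage described earlier also lists aspect ratio and duplicates, which the router above does not check. Here is a minimal sketch of those two checks. It assumes each draft has already been reduced to an 8x8 grayscale thumbnail, for example with Pillow's `img.convert("L").resize((8, 8)).getdata()`. The names `average_hash`, `hamming`, `aspect_ok`, and `DuplicateFilter`, along with the 9:16 target and the distance threshold of 5, are illustrative choices of mine, not part of any SDK.

```python
def average_hash(pixels: list[int]) -> int:
    """Build a 64-bit hash from 64 grayscale values (an 8x8 thumbnail)."""
    if len(pixels) != 64:
        raise ValueError("expected 64 grayscale values from an 8x8 thumbnail")
    mean = sum(pixels) / 64
    bits = 0
    for value in pixels:
        # One bit per pixel: 1 if brighter than the thumbnail's mean
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def aspect_ok(width: int, height: int,
              target: float = 9 / 16, tolerance: float = 0.02) -> bool:
    """Accept only images close to the target width/height ratio (assumed 9:16)."""
    return abs(width / height - target) <= tolerance


class DuplicateFilter:
    """Remembers hashes seen so far and rejects near-identical drafts."""

    def __init__(self, max_distance: int = 5) -> None:
        self.seen: list[int] = []
        self.max_distance = max_distance

    def is_new(self, image_hash: int) -> bool:
        if any(hamming(image_hash, s) <= self.max_distance for s in self.seen):
            return False
        self.seen.append(image_hash)
        return True
```

The linear scan over `seen` is fine for a few hundred drafts per batch. The threshold is a guess that needs tuning against your own outputs: too low and near-duplicates slip through, too high and distinct compositions get dropped.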
How the Cost Changes
The payoff of the two-tier setup depends on the unit-price gap and the acceptance rate. Suppose the first-pass unit price is a quarter of the standard model's, you need 100 keepers, the first pass generates three times that (300 images), and about 35% of those are accepted, which yields roughly 100 finals. The single-model baseline generates the same 300 images, all at the standard price. The spend breaks down like this:
| Approach | Standard model calls | Lite calls | Relative cost (standard unit = 1) |
|---|---|---|---|
| Generate everything with the standard model | 300 | 0 | 300 |
| Two-tier (Lite first pass + 100 standard finals) | 100 | 300 | 100 + 300×0.25 = 175 |
By this estimate, the two-tier setup alone cuts relative cost from 300 to 175, a reduction of roughly 42%. The lower the acceptance rate and the higher the discard count, the greater the advantage of running the first pass cheaply. Conversely, in a workflow where nearly everything is accepted, the gap narrows and two tiers buy you little.
These figures are relative values based on an assumed unit-price ratio. Actual prices shift with availability, so I'd recommend recalculating for your own workload by reconciling a `count_tokens`-style estimate against the real bill. For the broader topic of capping spend, I've written separately about building guardrails so a Gemini API bill never catches you off guard.
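The arithmetic in the table can be written as a small calculator so you can plug in your own numbers. This is a sketch under the same assumptions as the table: cost is measured in standard-model units, and `lite_ratio` is the assumed Lite price as a fraction of the standard price. The function names are mine.

```python
def all_standard_cost(drafts: int) -> float:
    """Every candidate is generated with the standard model (unit price 1)."""
    return float(drafts)


def two_tier_cost(drafts: int, finals: int, lite_ratio: float) -> float:
    """Drafts at the Lite price, plus one standard render per accepted image."""
    return finals + drafts * lite_ratio


def saving(drafts: int, finals: int, lite_ratio: float) -> float:
    """Fraction saved by two-tier relative to the all-standard baseline."""
    return 1 - two_tier_cost(drafts, finals, lite_ratio) / all_standard_cost(drafts)


if __name__ == "__main__":
    print(two_tier_cost(300, 100, 0.25))     # 175.0
    print(round(saving(300, 100, 0.25), 3))  # 0.417
```

Rearranging the same formula, two tiers come out ahead whenever `finals / drafts < 1 - lite_ratio`. With the assumed ratio of 0.25, that means any acceptance rate below 75%.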
Where Not to Over-Route to Lite
If you let Lite handle the final render too just because it's cheap, you'll regret it at delivery quality. My rule is simple: I split by "does this output reach the user's device as-is." What reaches them goes to the standard model; what gets discarded before it reaches them goes to Lite.
The other thing I watch is reproducibility between the first pass and the final render. Without keeping the seed and prompt, you can't rebuild the accepted atmosphere at render time, and the good quality you saw in Lite disappears. Before adopting a two-tier setup, put a mechanism in place to log your generation instructions first; it makes later rebuilds far easier.
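A logging mechanism for generation instructions can be as small as an append-only JSON Lines file. The sketch below uses only the standard library. The `GenerationRecord` fields and the helper names are hypothetical, so adapt them to whatever your pipeline already tracks.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class GenerationRecord:
    prompt: str
    seed: int
    model: str
    accepted: bool


def append_record(path: str, record: GenerationRecord) -> None:
    """Append one record as a single JSON line."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def load_accepted(path: str) -> list[GenerationRecord]:
    """Read the log back and keep only the accepted drafts."""
    accepted: list[GenerationRecord] = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = GenerationRecord(**json.loads(line))
            if record.accepted:
                accepted.append(record)
    return accepted
```

Because each line is self-contained, a crash mid-batch loses at most the record being written, and the accepted entries can be replayed into the final render at any later time.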
Wrapping Up
The cost of bulk image generation is determined not by how many you generate, but by "what unit price you pay for the ones you discard." Placing Nano Banana 2 Lite on the first pass and the standard Nano Banana 2 on the post-acceptance final render is a straightforward way to implement that split in unit prices.
As a next step, start by measuring the acceptance rate of your own workflow. The lower it is, the greater the effect of routing the first pass to a cheaper model. Once a mechanism for recording seeds and prompts is in place, the two-tier setup drops right onto your existing pipeline.
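Measuring the acceptance rate also tells you how many drafts to request. Here is a minimal sketch, assuming you have counted accepted and drafted images from a previous batch. Both function names are illustrative.

```python
import math


def acceptance_rate(accepted: int, drafted: int) -> float:
    """Share of drafts that survived screening in a past batch."""
    if drafted <= 0:
        raise ValueError("drafted must be positive")
    return accepted / drafted


def drafts_needed(want: int, rate: float) -> int:
    """Drafts to generate so the expected number of keepers reaches `want`."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return math.ceil(want / rate)
```

At a 35% acceptance rate, 100 keepers need about 286 drafts on average, which is why an oversample factor of 3 leaves only a small margin. This is an expected value, so individual batches will land above or below it.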
I'm still tuning my own read on the unit-price ratio, but the idea of running a discard-heavy first pass cheaply feels like something I'll keep using in indie cost design for a long time. Thank you for reading.