Gemini 3.5 Flash Went GA, So I Reshuffled Which Model Does What

On the morning of June 23rd I opened the Gemini API changelog the way I usually do, and there it was: 3.5 Flash had gone GA. What caught my eye was how the performance was framed: it was positioned as beating the previously higher-tier 3.1 Pro on nearly every benchmark while running four times faster than other frontier models.

The first thing I opened next was not the rest of the release notes. It was my own private note titled "which model handles which task." If a fast, cheap model can now handle work I used to reserve for a higher tier, the assignment table is worth redrawing. These are my notes from that redraw.

I assign a specific model to each task

Once you run a few apps as an indie developer, there is never just one place where you call Gemini. A job that sorts the overnight user reviews for a wallpaper app before I wake up. Drafting release notes. A light review of a code diff. A multilingual check of store copy. Each one wants something different.

Review sorting is high-volume and each judgment is light, so speed and price per call dominate. Code review, on the other hand, is low-volume but expensive to get wrong, so I would rather have something smart even if it is a little slower. That is why I assign models by the nature of the task and keep the mapping in a single table. Price, speed, and intelligence usually form a three-way standoff in which you can pick only two.

What made this update interesting is that one corner of that standoff gave way. If the Flash model, supposedly the fast, cheap one, is said to beat the higher-tier Pro on benchmarks, then the assumption that "fast means lower-tier" deserves a second look.

Test for "fast and smart," not "fast, therefore lower-tier"

That said, swapping assignments on the strength of benchmark numbers alone is not how I like to work. I want to try it on my own tasks and confirm the output quality does not drop before I move anything. So rather than mechanically generating a list of swap candidates, I started by sorting tasks into "intelligence-first" and "speed-first."
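That first sort can be sketched roughly as follows. The `TaskProfile` class, the `miss_cost` ratings, and the threshold are illustrative assumptions, not measured values; the only idea taken from the text is that a task where misses hurt goes into the intelligence-first bucket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    name: str
    miss_cost: int  # subjective 1 (harmless) to 5 (painful); an assumed scale


def classify(profile: TaskProfile, threshold: int = 4) -> str:
    """Label a task by what it needs most from a model."""
    # A costly miss puts the task in the intelligence-first bucket;
    # everything else is treated as speed-first.
    if profile.miss_cost >= threshold:
        return "intelligence-first"
    return "speed-first"


# Hypothetical ratings for the four tasks mentioned in this post.
PROFILES = [
    TaskProfile("review_triage", miss_cost=1),
    TaskProfile("store_copy_check", miss_cost=2),
    TaskProfile("release_notes", miss_cost=3),
    TaskProfile("code_review", miss_cost=5),
]

buckets = {p.name: classify(p) for p in PROFILES}
for name, bucket in buckets.items():
    print(f"{name}: {bucket}")
```

Only the speed-first bucket is a pool of candidates for a faster model; the intelligence-first tasks wait until they have been checked by hand.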

I keep the decision of which model a task uses inside a small routing function. If model names are hard-coded all over the codebase, every migration like this turns into a hunt.

import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Decide the model per task type in one place.
# A migration becomes a single edit to this dictionary.
MODEL_FOR_TASK = {
    "review_triage": "gemini-3.5-flash",  # high volume, light calls -> favor speed
    "store_copy_check": "gemini-3.5-flash",
    "code_review": "gemini-3.1-pro",       # misses hurt -> keep the higher tier for now
    "release_notes": "gemini-3.5-flash",   # moved from Pro to Flash this time
}

# Fallback for task names that are missing from the table.
DEFAULT_MODEL = "gemini-3.5-flash"

def run_task(task: str, prompt: str) -> str:
    """Send the prompt to whichever model the table assigns to this task."""
    model = MODEL_FOR_TASK.get(task, DEFAULT_MODEL)
    response = client.models.generate_content(model=model, contents=prompt)
    # response.text can be None when no text part comes back,
    # so normalize to an empty string to match the declared return type.
    return response.text or ""

# Example: drafting release notes (the task I reassigned to Flash)
draft = run_task("release_notes", "Turn the following changes into a calm, plain bulleted list: ...")
print(draft)
# Expected output: a bulleted release-note draft, returned in a few hundred ms to a couple of seconds

With this shape, a reassignment is just a one-line edit to the dictionary. Moving release_notes from Pro to Flash was, in practice, exactly that single line. I compared a week's worth of output, confirmed the tone and the level of detail barely changed, and moved it with confidence.
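A side-by-side comparison like that can be collected with a small harness. This is a sketch under assumptions: `compare_outputs` and the two stand-in generators are hypothetical names, and the length ratio and character-level similarity are only crude signals for deciding which pairs to read. Judging tone still means reading the pairs by eye.

```python
from difflib import SequenceMatcher
from typing import Callable


def compare_outputs(
    prompts: list[str],
    current: Callable[[str], str],
    candidate: Callable[[str], str],
) -> list[dict]:
    """Run both generators on the same prompts and collect pairs for manual review."""
    rows = []
    for prompt in prompts:
        old_text = current(prompt)
        new_text = candidate(prompt)
        rows.append({
            "prompt": prompt,
            "current": old_text,
            "candidate": new_text,
            # Rough proxy for level of detail: relative output length.
            "length_ratio": len(new_text) / max(len(old_text), 1),
            # Crude character-level overlap; it says nothing about tone.
            "similarity": SequenceMatcher(None, old_text, new_text).ratio(),
        })
    return rows


# Stand-in generators so the sketch runs without any API calls.
def fake_pro(prompt: str) -> str:
    return "- Fixed a crash in the list view\n- Faster startup"


def fake_flash(prompt: str) -> str:
    return "- Fixed a crash in the list view\n- Startup is faster"


rows = compare_outputs(["Summarize this week's changes"], fake_pro, fake_flash)
for row in rows:
    print(f"length_ratio={row['length_ratio']:.2f} similarity={row['similarity']:.2f}")
```

In real use the two callables would wrap the same API call with the current and the candidate model name, fed with prompts saved from actual runs.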

The one thing I was careful about

I did not push everything onto Flash. I left code review on the higher-tier model for a while longer.

The reason is that what I want from that task is not speed but the willingness to stop and think. A while back I was chasing a bug in a wallpaper app where the list view crashed only after a particular action; the diff itself was small, but the condition that caused the crash lived outside the diff. Questioning that kind of assumption, one the code itself does not show, takes the ability to step back and trace the logic more than the ability to answer quickly. Even with Flash reported to win on benchmarks, I am keeping code review on Pro until I have run a few reviews of comparable difficulty by hand and confirmed that the misses do not increase.
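One way to make that hand check explicit is a small tally per model. This is a hypothetical sketch: `ReviewTally`, `safe_to_switch`, the minimum of five cases, and the recorded outcomes below are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReviewTally:
    """Hand-recorded results for one model on comparable review cases."""
    cases: int = 0
    misses: int = 0

    def record(self, missed: bool) -> None:
        self.cases += 1
        if missed:
            self.misses += 1

    @property
    def miss_rate(self) -> float:
        return self.misses / self.cases if self.cases else 0.0


def safe_to_switch(current: ReviewTally, candidate: ReviewTally, min_cases: int = 5) -> bool:
    """True only when both tallies have enough cases and the candidate misses no more often."""
    if min(current.cases, candidate.cases) < min_cases:
        return False  # too little evidence either way
    return candidate.miss_rate <= current.miss_rate


# Made-up outcomes: True means the model missed the real problem.
pro = ReviewTally()
flash = ReviewTally()
for missed in [False, False, True, False, False]:
    pro.record(missed)
for missed in [False, True, True, False, False]:
    flash.record(missed)

print(safe_to_switch(current=pro, candidate=flash))  # prints False (0.4 > 0.2)
```

With only a handful of cases this is a sanity gate rather than a statistical test, which is why it refuses to answer below `min_cases`.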

Every time a new model ships I am tempted to switch over wholesale, but I try to test it task by task, starting with whatever is most painful to get wrong. News that a fast model got smarter is genuinely welcome, yet if you swap everything in that glow, a single case that happens to be hard will bite you. Reassigning one card at a time, checking as you go, turns out to be the fastest route in the end.
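The ordering itself is easy to write down. In this sketch the `MISS_COST` ratings are hypothetical numbers and the function names are made up; the only point is that the queue is sorted by how much a miss hurts and handed out one task at a time.

```python
from typing import Optional

# Hypothetical 1-5 ratings of how painful a miss is for each task.
MISS_COST = {
    "review_triage": 1,
    "store_copy_check": 2,
    "release_notes": 3,
    "code_review": 5,
}


def evaluation_order(miss_cost: dict[str, int]) -> list[str]:
    """Tasks sorted so the most painful-to-get-wrong one is tested first."""
    return sorted(miss_cost, key=lambda task: miss_cost[task], reverse=True)


def next_to_test(miss_cost: dict[str, int], already_tested: set[str]) -> Optional[str]:
    """Return the single next task to evaluate, or None when all are done."""
    for task in evaluation_order(miss_cost):
        if task not in already_tested:
            return task
    return None


print(evaluation_order(MISS_COST))
print(next_to_test(MISS_COST, already_tested={"code_review"}))
```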

One more thing: these generational turnovers move quickly, so it is safer to track them alongside the deprecation schedule. I open the Gemini changelog and the list of deprecated models every other week.
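Because the assignments live in one table, the biweekly check can be partly mechanical. The sketch below assumes shutdown dates are copied by hand from the deprecation list into a dictionary; the dates shown are obvious placeholders and not taken from any real schedule, and `models_needing_attention` is a hypothetical helper.

```python
from datetime import date

MODEL_FOR_TASK = {
    "review_triage": "gemini-3.5-flash",
    "code_review": "gemini-3.1-pro",
}

# PLACEHOLDER dates for illustration only; not from any real deprecation schedule.
SHUTDOWN_DATES = {
    "gemini-3.1-pro": date(2099, 1, 1),
}


def models_needing_attention(
    model_for_task: dict[str, str],
    shutdown_dates: dict[str, date],
    today: date,
    warn_days: int = 60,
) -> list[tuple[str, str, int]]:
    """List (task, model, days_left) for assignments close to a recorded shutdown date."""
    flagged = []
    for task, model in model_for_task.items():
        shutdown = shutdown_dates.get(model)
        if shutdown is None:
            continue  # no shutdown date recorded for this model
        days_left = (shutdown - today).days
        if days_left <= warn_days:
            flagged.append((task, model, days_left))
    return flagged


flagged = models_needing_attention(MODEL_FOR_TASK, SHUTDOWN_DATES, today=date(2098, 12, 1))
print(flagged)  # [('code_review', 'gemini-3.1-pro', 31)]
```

Passing `today` in explicitly keeps the check easy to test; a scheduled job would pass `date.today()`.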

Wrapping up

If you also call Gemini across several tasks, start by gathering your model names into a single table. Before you even ask whether a given reassignment is wise, having each migration cost only a one-line edit is what pays off most during a fast turnover like this one.

Thank you for reading.