GEMINI LABJP
MODEL — Gemini 3.5 Flash is GA, Google's most intelligent model for sustained frontier performance on agentic and coding tasksAGENTS — Managed Agents in the Gemini API enter public preview, running autonomous agents in Google-hosted isolated Linux sandboxesIMAGE — Nano Banana 2 (Gemini 3.1 Flash Image) and Nano Banana Pro (Gemini 3 Pro Image) are now GAIMAGE — Video-to-image generation arrives: pass a video as context to create thumbnails, posters, and infographics (3.1 Flash Image only)DEPRECATION — gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shut down on June 25; migrate to GASTUDIO — Gemini 3 is available across the Gemini app, AI Studio, and Vertex AIMODEL — Gemini 3.5 Flash is GA, Google's most intelligent model for sustained frontier performance on agentic and coding tasksAGENTS — Managed Agents in the Gemini API enter public preview, running autonomous agents in Google-hosted isolated Linux sandboxesIMAGE — Nano Banana 2 (Gemini 3.1 Flash Image) and Nano Banana Pro (Gemini 3 Pro Image) are now GAIMAGE — Video-to-image generation arrives: pass a video as context to create thumbnails, posters, and infographics (3.1 Flash Image only)DEPRECATION — gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shut down on June 25; migrate to GASTUDIO — Gemini 3 is available across the Gemini app, AI Studio, and Vertex AI
Articles/Updates
Updates/2026-05-06Intermediate

Gemini API Developer Update for May 2026 — What Changed and What You Should Do

A developer-focused roundup of Gemini API changes in May 2026. Covers Gemini 3.2 impressions, the June Gemini 2.0 Flash deprecation deadline, and what to prepare before Google I/O 2026.

Gemini updateGemini 3.26Gemini API144Google I/O 20263May 2026indie developer11

A month ago I wrote a recap of Gemini 3's rollout in April, and honestly the pace has not slowed down since. If anything, May is shaping up to be an even denser month for anyone building with the Gemini API.

Three things are demanding my attention right now: Gemini 3.2 becoming generally available on the API, the hard deadline for Gemini 2.0 Flash deprecation at the end of June, and Google I/O 2026 looming just a couple of weeks away. This month isn't just about trying new features — it's also about making sure nothing breaks when the deprecation deadline hits, and positioning yourself to move fast when I/O drops new capabilities.

Here is what I think every developer working with Gemini should know and act on this month.

Gemini 3.2 Is Now Generally Available on the API

After a limited staged rollout that began in late April, Gemini 3.2 is now accessible to all developers through Google AI Studio and the Gemini API. The model IDs are gemini-3.2-pro and gemini-3.2-flash. If you have been watching the preview closely, the general availability release is largely feature-equivalent — what changed is the removal of the waitlist and the move to standard quota limits.

Compared to Gemini 3.1 Pro, the improvement I noticed most in day-to-day use is code generation quality. Specifically for Swift and Kotlin, the model now suggests patterns that align better with current SDK versions, rather than occasionally defaulting to older or deprecated APIs. For me as someone building iOS and Android apps daily, that translates directly to fewer edits after copy-pasting generated code.

Long-context handling has also improved. The so-called "middle compression" effect — where information in the middle of a very long context window gets underweighted relative to the start and end — is less pronounced in 3.2. If you are feeding entire codebases or long technical documents into context, you should notice modestly better recall for content in those middle sections. It is not fully solved, but it is a meaningful step.

Here is what the simplest migration looks like:

import google.generativeai as genai
 
# Switching to Gemini 3.2 is a one-line change
genai.configure(api_key="YOUR_API_KEY")
 
model = genai.GenerativeModel(
    model_name="gemini-3.2-pro",   # was: "gemini-3.1-pro"
    generation_config={
        "temperature": 0.7,
        "top_p": 0.95,
        "max_output_tokens": 8192,
    }
)
 
response = model.generate_content(
    "Implement an async image download cache in Swift using URLSession, "
    "actor, and async/await. Support both in-memory and persistent disk cache."
)
print(response.text)

The migration cost is essentially zero for most users — swap the model ID and your existing code keeps working. The output format and safety behavior are compatible. That said, gemini-3.2-flash is still labeled as preview, so I would do a proper test pass before promoting it to production.

For a detailed comparison of when to choose 3.2 Pro versus 3.1 Pro, or whether the Flash variant is ready for your workload, the Gemini 3.2 complete guide goes into this in depth.

The June Gemini 2.0 Flash Deadline — Act Now

This is the most time-sensitive item on the May agenda. The Gemini 2.0 Flash family — gemini-2.0-flash, gemini-2.0-flash-exp, and related variants — will be deprecated at the end of June 2026. If you have any of those model IDs in a live production service, you have roughly seven weeks before requests start failing with a model-not-found error.

Google's recommended migration paths are gemini-2.5-flash for cost-sensitive workloads and gemini-3-flash or gemini-3.1-flash for workloads where output quality and instruction following matter more. The performance gap between 2.5 Flash and the Gemini 3 Flash variants has grown over the past few months, so if you are currently on 2.0 Flash because it was fast and cheap, I would at minimum test gemini-3-flash before assuming 2.5 Flash is the right landing spot.

Start by checking whether you are affected at all:

# Find all references to deprecated model IDs across your codebase
grep -r "gemini-2.0-flash\|gemini-2.0-flash-exp" .   --include="*.py" --include="*.ts" --include="*.js" --include="*.env"   --include="*.yaml" --include="*.toml" -l

The -l flag just lists filenames, which is a good first pass to understand the scope. If nothing comes back, you are in the clear. If files do appear, the actual fix is usually a single-line change per file.

I had one place in my own app still referencing 2.0 Flash. I switched it to gemini-3-flash last week and saw no meaningful difference in latency or output quality for that particular use case, which was light document summarization. Your mileage may vary depending on task type.

Three Things to Do Before Google I/O 2026

Google I/O 2026 is expected in mid-to-late May. In past years, significant API capabilities and new models have been made available at or immediately after the keynote — which means developers who have their tooling in order can start experimenting on day one, while others spend a week on setup. Three lightweight preparation tasks are worth doing this week.

1. Review and potentially expand your Google AI Studio quota

Post-I/O demand spikes are predictable. Quota expansion requests can take several business days to process. Submitting a request now, even as a precaution, means you will have headroom available when the rush hits. You can check your current free and paid quotas in Google AI Studio under the "Usage" section in the left sidebar. Pay particular attention to requests-per-minute limits for any model you are planning to evaluate aggressively.

2. Set up Vertex AI authentication if you have not already

Some capabilities — particularly enterprise-grade features and certain experimental models — land on Vertex AI before or alongside the Gemini API. Setting up Application Default Credentials (ADC) now takes about five minutes and eliminates a common stumbling block:

# Authenticate and configure your project
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
 
# Enable the APIs you will need
gcloud services enable aiplatform.googleapis.com
gcloud services enable generativelanguage.googleapis.com
 
# Verify your credentials are working
gcloud auth application-default print-access-token

If you already have this set up, it is worth checking that your service account still has the necessary IAM roles, particularly if you rotated credentials recently.

3. Confirm your Google AI subscription tier

Early access to newly announced models is frequently tied to Google AI Pro or Ultra subscriptions. If there is a model or capability you are planning to evaluate right after I/O, verifying your subscription now means you will not find out at an inconvenient moment that you need to upgrade first. Check at Google AI subscription settings.

Smaller Changes Worth Knowing About

A few updates from early May that are easy to miss but affect day-to-day API work.

Implicit caching hit rate improvement: The automatic cache-hit detection logic has been refined. If your requests share a long common prefix — a fixed system prompt or a large reference document prepended to every call — you should see a modest improvement in cache hit rates. This reduces effective cost without any configuration change on your end. The improvement is most noticeable for prompts where the shared prefix exceeds roughly 32,000 tokens.

thinking_budget behavior change in Gemini 3.x: Thinking Mode now applies more dynamic allocation when thinking_budget is not explicitly set. In practice, the model may consume more tokens on harder tasks compared to the previous behavior. If you are running Thinking Mode at scale and have not checked your billing dashboard for May yet, it is worth a look. Setting an explicit thinking_budget value restores the previous deterministic behavior.

Lyria 3 Pro and Veo 3.1 Lite now under SLA: Both APIs moved from preview to general availability with formal SLA coverage this month. If you were holding off on integrating AI-generated music or low-cost video generation into a production service because of stability concerns, that barrier is removed. The Lyria 3 Pro music generation guide covers the practical details.

The One Thing to Do This Month

If you only have time for one task, make it the Gemini 2.0 Flash migration check. Run that grep, confirm whether you are affected, and if so, schedule the fix. The consequence of not doing it is a live service breaking in late June, and the fix itself is typically trivial.

Once that is handled, spend fifteen minutes swapping one of your existing integrations to Gemini 3.2 and running it through some real inputs. Getting a feel for where 3.2 differs from 3.1 before I/O puts you in a much better position to evaluate the new announcements when they drop.

For a preview of what I am watching for at I/O this year, the Google I/O 2026 Gemini preview article covers my current thinking. The Gemini developer ecosystem is moving fast — the best way to keep up is to make the basic maintenance work habitual so that trying new things stays easy.

Share

Thank You for Reading

Gemini Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.

  • Copy-paste ready implementation code
  • New advanced guides published daily
  • $5/mo or $10 for lifetime access
View Membership →

If you found this article helpful, a small tip ($1.50) would mean a lot to us. Your support helps keep this site ad-free and covers server and hosting costs.

Related Articles

Updates2026-05-05
Two Weeks Until Google I/O 2026: What Gemini API Developers Should Prepare Right Now
With Google I/O 2026 just around the corner, here's what developers running Gemini API in production should do this week — from pinning model versions and recording baselines to tracking deprecation timelines before the announcements hit.
Updates2026-06-12
Gemini Lab This Week: An Outage, Two Migration Deadlines, and Four Posts to Read Before June 25
Editor's notes on four posts for a turbulent week: surviving the Gemini outage, migrating off the preview image models before June 25, fixing the outputs schema removal, and structuring App Store rejection replies.
Updates2026-06-11
Google I/O 2026 Preview — What I'm Watching for in Gemini This Year
Google I/O 2026 is approaching. Based on current Gemini development trends and past announcement patterns, here's what I'm personally expecting — no guarantees.
📚RECOMMENDED BOOKS
Build a Large Language Model (From Scratch)
Sebastian Raschka
LLM Dev
Prompt Engineering for LLMs
Berryman & Ziegler
Prompting
AI Engineering
Chip Huyen
AI Eng
* Contains affiliate links
See all →