Gemini Lab
API / SDK · 2026-06-16 · Advanced

Don't Break When the Default Model Moves: A Startup Capability-Probing Layer for Gemini

Pinning a model name breaks on deprecation; trusting the default breaks when the weights swap silently. This is the design I settled on: probe what the served model can actually do at startup, then build every request from that answer. Includes runnable Python.

One morning I opened the nightly job log and found that a structured-output call that had been working the night before was returning a finish_reason and an empty body. Not a line of my code had changed. What had changed was on the other side of the API: the model being handed to me as the default.

In June 2026, Gemini 3.5 Flash reached general availability, and on some surfaces the feature-management toggle was removed entirely. Even when you think you've said "use this model," in practice you now have to assume that the default drifts upward. Pin the model name and you fall over on the deprecation date. Don't pin it, and one morning the behavior shifts without you knowing.

Both break. So the idea here is to ask, exactly once at startup, what the model being served right now actually accepts and returns. I lived through that morning several times in my own automation as an indie developer, and this is the design I eventually settled on.

Pinning and default-reliance break for different reasons

Before adding any defense, it's worth being precise about why both common stances fail. Skip this and you just stack symptomatic patches.

Pinning a model name is the correct call for reproducibility. Fix gemini-2.5-flash and the same weights answer today and tomorrow. But that stability lasts only as long as the model is served. Once the shutdown date in a deprecation notice arrives, the pinned code stops on that date with a 404 NOT_FOUND. The price of stability is that you now own the lifecycle yourself.
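One way to make the time-axis failure loud rather than mysterious is to translate the not-found error into a dedicated exception at the call site. The sketch below is a minimal illustration, not this article's implementation: ModelRetired, is_not_found, and call_pinned are hypothetical names, and it assumes the SDK's error object exposes an HTTP-style code and/or a status string, which is worth confirming against the SDK version you run.

```python
class ModelRetired(RuntimeError):
    """Raised when the pinned model is no longer served."""


def is_not_found(exc: Exception) -> bool:
    # Assumption: the raised error carries an HTTP-style `code` and/or a
    # `status` string. Read both defensively so a missing attribute is not fatal.
    code = getattr(exc, "code", None)
    status = str(getattr(exc, "status", "") or "")
    return code == 404 or status == "NOT_FOUND"


def call_pinned(send, model: str):
    """Run `send(model)`; `send` is a hypothetical callable that issues one request."""
    try:
        return send(model)
    except Exception as exc:
        if is_not_found(exc):
            # Turn the generic API error into something an operator can act on.
            raise ModelRetired(f"pinned model {model!r} is no longer served") from exc
        raise  # anything else (quota, network, 5xx) propagates unchanged
```

Catching ModelRetired at startup lets the job stop with a clear message, or hand over to whatever fallback you have, instead of surfacing as an unexplained 404 in a nightly log.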

Relying on the default is the opposite. Defer to an alias like gemini-flash-latest, or to an unspecified default, and you don't fall over on deprecation. Instead the weights swap under you without notice. The prompt is identical, but the typical output length, the default thinking behavior, and how strictly structured output obeys the schema all drift quietly. Because nothing throws, you only find out from the next day's log.
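Since a silent swap throws nothing, one cheap tripwire is to remember which concrete version answered last time and compare on the next run. The sketch below is an illustration under assumptions: check_served_version and the state-file layout are hypothetical, and it presumes the response exposes some version string for the model that actually answered (recent google-genai responses appear to carry a model_version field, but verify that for your SDK version).

```python
import json
from pathlib import Path
from typing import Optional


def check_served_version(state_file: Path, alias: str, served: Optional[str]) -> bool:
    """Record the version served behind `alias`; return True if it changed.

    `served` is whatever version string the last response reported
    (assumption: the SDK exposes one). The first run has no baseline,
    so it records the value and returns False.
    """
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    previous = state.get(alias)
    state[alias] = served
    state_file.write_text(json.dumps(state))
    return previous is not None and previous != served
```

A True result does not say what changed in behavior, only that the weights behind the alias are probably no longer the ones you tested against. That is the signal to re-run the probes and any regression checks.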

So the question isn't "which is right." Pinning breaks along the time axis; deferring breaks along the behavior axis. As long as you pick one, the other axis is left undefended.

Ask the served model, once, at startup

The smallest way to close the undefended axis is to send tiny test calls at app startup and decide only on the facts that come back. The question is not whether the docs say "supported," but whether the request is actually accepted, with your key, against the model assigned to you right now.

from google import genai
from google.genai import types
import time, logging
 
log = logging.getLogger("capability")
 
class Capabilities:
    def __init__(self, model: str):
        self.model = model
        self.thinking = False          # accepts a thinking directive?
        self.thinking_param = None     # "level" or "budget"
        self.structured = False        # actually obeys response_schema?
        self.multimodal = False        # accepts image input?
        self.probed_at = 0.0
 
def _try(call):
    """Swallow exceptions: return the value on success, None on failure."""
    try:
        return call()
    except Exception as e:                       # absorb SDK / API version differences
        log.info("probe miss: %s", type(e).__name__)
        return None

Two design choices live here. First, detection is exception-based. Field names may shift across SDK versions, but the fact that an unaccepted request raises does not. Sending it and seeing whether it's accepted depends far less on my assumptions than inspecting field presence with hasattr.

Second, detection is split per feature. Rather than a coarse "this model is newer," I check independently whether thinking goes through, whether structured output obeys the schema, and whether an image is accepted, each with its own small call. Models roll out feature by feature, so a single bundled verdict will almost always be wrong somewhere.
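The two choices above can be sketched as a single probe function. This is a minimal sketch under stated assumptions, not the article's full implementation: Capabilities and _try are restated compactly (as a dataclass) so the block runs on its own, while probe, _obeys_schema, and the send_thinking / send_structured / send_image callables are hypothetical. In real use each send_* would wrap one small generate_content request. Those wrappers are left out because the exact config fields depend on the SDK version, which is the very thing the probe avoids assuming.

```python
import json
import logging
import time
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("capability")


@dataclass
class Capabilities:
    model: str
    thinking: bool = False
    thinking_param: Optional[str] = None   # "level" or "budget"
    structured: bool = False
    multimodal: bool = False
    probed_at: float = 0.0


def _try(call):
    """Return the call's value on success, None if it raises."""
    try:
        return call()
    except Exception as e:
        log.info("probe miss: %s", type(e).__name__)
        return None


def _obeys_schema(raw) -> bool:
    # Hypothetical probe schema: a JSON object with one boolean field "ok".
    # An empty or non-JSON body counts as "does not obey".
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return False
    return isinstance(data, dict) and isinstance(data.get("ok"), bool)


def probe(model, send_thinking, send_structured, send_image) -> Capabilities:
    """Run one tiny request per feature and record only what was accepted."""
    caps = Capabilities(model)
    # Try each way of expressing a thinking directive; keep the first accepted.
    for param in ("level", "budget"):
        if _try(lambda: send_thinking(model, param)) is not None:
            caps.thinking, caps.thinking_param = True, param
            break
    # Acceptance alone is not enough here: the body must match the schema.
    raw = _try(lambda: send_structured(model))
    caps.structured = raw is not None and _obeys_schema(raw)
    caps.multimodal = _try(lambda: send_image(model)) is not None
    caps.probed_at = time.time()
    return caps
```

Each feature gets its own verdict, so a model that accepts a thinking budget but rejects image input ends up with thinking=True and multimodal=False rather than one blanket answer. The structured check also looks at the returned body, which is what catches the empty-response case from the opening.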
