API / SDK · 2026-06-14 · Advanced

Controlling Image Tokens with the Gemini API media_resolution Setting — Tuning Batch Image Classification by Measurement

media_resolution, introduced in the Gemini 3 line, sets how many tokens each image input consumes, at one of three levels. Using measurements from a real batch-classification job, this guide shows how to balance cost and accuracy by assigning the right tier to each task.

I run a wallpaper app as a solo developer, and a daily job auto-categorizes newly added images. One day, while looking at the billing breakdown, I noticed that what was piling up wasn't output or reasoning tokens but input tokens, far more than I expected given how little text I was sending. The reason was simple: I had been sending every image at the highest resolution without thinking about how many tokens a single image costs.

media_resolution, introduced in the Gemini 3 line, is exactly the parameter for controlling that per-image token cost. Most cost-optimization writeups focus on caching or model routing, but for multimodal-heavy workloads the input image resolution tier is often the biggest lever you have. In this article, using real measurements from my wallpaper classification pipeline, I walk through how to assign tiers per task so that cost drops without sacrificing accuracy.

What media_resolution Is — The Setting That Decides Image Token Cost

media_resolution controls how many tokens Gemini uses internally to represent an input image or video frame. Conceptually the value has three levels: low, medium, and high. The lower the level, the fewer tokens per image; the higher the level, the more fine detail the model can read.

The key idea is that this is not a "reduce image quality" setting but a "choose the granularity of the representation handed to the model" setting. Even a coarse tier conveys big-picture features just fine: overall composition, dominant colors, the rough type of subject. But reading small embedded text, or telling apart subtly different patterns, requires a higher tier. In other words, the right tier is determined by what information in the image your task actually needs.
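As a rough illustration of that idea, the sketch below maps each task to the coarsest tier assumed to carry the information it needs. The task names and their assignments are hypothetical placeholders rather than measured results, and the tier strings simply mirror the three levels described above.

```python
# Hypothetical task names and tier assignments, for illustration only.
# Real assignments should come from measuring accuracy per tier.
TIER_ORDER = ["low", "medium", "high"]

TASK_TIERS = {
    "dominant_color": "low",     # big-picture feature
    "subject_category": "low",   # rough type of subject
    "pattern_variant": "high",   # subtly different patterns
    "embedded_text": "high",     # small text inside the image
}


def tier_for(task: str, default: str = "high") -> str:
    """Return the tier assigned to a task, falling back to the most detailed tier."""
    tier = TASK_TIERS.get(task, default)
    if tier not in TIER_ORDER:
        raise ValueError(f"unknown tier: {tier!r}")
    return tier


print(tier_for("dominant_color"))
print(tier_for("unlisted_task"))
```

Defaulting unknown tasks to the most detailed tier is a deliberate choice in this sketch: an unmapped task costs more than necessary, but it never silently loses detail.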

One caveat: the exact token count per tier varies with the model version and the image aspect ratio. Rather than trusting published ballpark figures, it's more reliable to measure on your own workload. The next section builds that harness.

Measure First — Read Per-Tier Tokens via usage_metadata

Before optimizing, capture the current state as numbers. Gemini API responses include usage_metadata, whose prompt_token_count is the input-side token consumption. Send the same image and the same prompt while changing only the tier, and you can compare the tier's effect in isolation.

from google import genai
from google.genai import types
 
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")
 
MODEL = "gemini-3.5-flash"  # pin the version in production
 
TIERS = {
    "low": types.MediaResolution.MEDIA_RESOLUTION_LOW,
    "medium": types.MediaResolution.MEDIA_RESOLUTION_MEDIUM,
    "high": types.MediaResolution.MEDIA_RESOLUTION_HIGH,
}
 
def measure_tokens(image_bytes: bytes, prompt: str) -> dict[str, int]:
    """Measure input tokens per tier with the same image and prompt."""
    result = {}
    img_part = types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg")
    for name, tier in TIERS.items():
        resp = client.models.generate_content(
            model=MODEL,
            contents=[img_part, prompt],
            config=types.GenerateContentConfig(
                media_resolution=tier,
                temperature=0,
            ),
        )
        result[name] = resp.usage_metadata.prompt_token_count
    return result
 
with open("sample_wallpaper.jpg", "rb") as f:
    print(measure_tokens(f.read(), "Answer this image's category in one word."))

Run this across a handful of representative images and the gap between tiers becomes obvious at a glance. In my pipeline, input tokens per image differed by a factor of several between the low and high tiers (the absolute values shift with model and image size, so always measure in your own environment). Once you're processing thousands of images a day in batch, that multiplier shows up directly on the bill.
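To turn a handful of per-image results into a single comparison, something like the following sketch could average the tokens per tier and project the daily difference. The sample numbers and the 5,000 images per day are made-up placeholders, not my measurements; the input is assumed to be a list of dictionaries shaped like the output of measure_tokens.

```python
from statistics import mean


def summarize(measurements: list[dict[str, int]]) -> dict[str, float]:
    """Average input tokens per tier across several images."""
    tiers = measurements[0].keys()
    return {t: mean(m[t] for m in measurements) for t in tiers}


def daily_input_tokens(tokens_per_image: float, images_per_day: int) -> float:
    """Project per-image input tokens onto a daily batch."""
    return tokens_per_image * images_per_day


# Placeholder numbers, NOT real measurements.
samples = [
    {"low": 100, "medium": 300, "high": 900},
    {"low": 120, "medium": 340, "high": 1100},
]

avg = summarize(samples)
ratio = avg["high"] / avg["low"]
saved = daily_input_tokens(avg["high"] - avg["low"], images_per_day=5000)

print(avg)
print(f"high/low ratio: {ratio:.2f}")
print(f"input tokens saved per day at low: {saved:,.0f}")
```

Multiplying the saved tokens by your model's current input price gives the cost difference; the price is left out here because it changes by model and over time.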

Why change only the tier when measuring? Because if you also change the prompt or the output schema at the same time, you can't isolate which factor moved the number. Move one variable at a time. It sounds tedious, but this is the single principle that saved me the most time in cost investigations.
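One way to enforce the one-variable rule in code is to describe each run as a small config object and refuse to compare two runs that differ in more than one field. This is a minimal sketch under assumptions: RunConfig, its field names, and single_changed_field are hypothetical helpers and not part of the Gemini SDK.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical experiment settings; field names are illustrative.
    tier: str
    prompt: str
    output_schema: str


def changed_fields(a: RunConfig, b: RunConfig) -> list[str]:
    """List the names of the fields whose values differ between two runs."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def single_changed_field(baseline: RunConfig, variant: RunConfig) -> str:
    """Return the one field that changed, or raise if zero or several changed."""
    diff = changed_fields(baseline, variant)
    if len(diff) != 1:
        raise ValueError(f"expected exactly one changed field, got {diff}")
    return diff[0]


baseline = RunConfig(
    tier="high",
    prompt="Answer this image's category in one word.",
    output_schema="text",
)
variant = replace(baseline, tier="low")

print(single_changed_field(baseline, variant))
```

Calling single_changed_field before comparing token counts makes an accidental second change, such as an edited prompt, fail loudly instead of quietly skewing the numbers.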

WHAT YOU'LL LEARN
A reproducible harness that measures, via usage_metadata, how the three media_resolution levels change image token consumption
A measurement protocol and decision criteria for finding the lowest tier that still preserves accuracy, task by task
A record of moving from a flat HIGH setting to per-task assignment and meaningfully shrinking the classification pipeline's input tokens