Gemini Lab
API / SDK · 2026-06-14 · Advanced

How a Deep Think Verification Step Tripled My API Bill, and How thinking_level Got It Back

After wiring API-accessible Gemini 3 Deep Think into my output-verification step, my projected monthly cost jumped roughly 3x. Here is the implementation record of capping it with thinking_level and a cost guardrail, then settling on a two-stage design with Flash.

Tags: gemini, deep-think, gemini-api, cost, reasoning

The week after I added Gemini 3 Deep Think to my verification step, my projected API bill came out at about three times its usual level.

As an indie developer, I automate article generation for four technical blogs (the Dolice Labs sites), and I run a pre-publish step that mechanically checks each draft for overstated claims and broken code examples. I used to have Flash do the grading. When the June 2026 update opened up partial API access to Deep Think, I simply swapped it in, hoping for sharper judgments.

The accuracy did improve. The cost increase, however, went well beyond what I expected. Deep Think runs a long internal reasoning pass before it answers, and those reasoning tokens are billed on top of the input and the visible output. Verification runs dozens of times a day, so the per-call difference compounds quickly.

This is the record of how I reined that in with thinking_level and a cost guardrail, and eventually landed on a two-stage design with Flash. The goal was to treat Deep Think as a smart-but-expensive tool and draw a clear line around where it actually earns its cost.

Why Deep Think verification gets expensive

For an ordinary model call, you can estimate cost roughly as "input tokens + output tokens." But a deep reasoning model like Deep Think unfolds a long chain of thought before writing its final answer. That thinking is computation, and it is billable.

Verification turned out to be a poor fit for this. A question like "does this article overstate anything?" looks, to Deep Think, like a problem worth pondering, so it digs in on its own. The output I need is just a short "OK" or "needs revision," yet the reasoning leading up to it balloons.

In other words, Deep Think's verification cost is not something you can infer from the short output. The hidden reasoning tokens are the main act, and unless you control them, your per-call price never stabilizes.
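To make that concrete, here is a minimal sketch of a per-call estimate that treats reasoning tokens as their own term. The Pricing class, both function names, and every number in it are placeholder assumptions for illustration, not published Deep Think rates or measured token counts; substitute the figures from the current pricing page and your own usage logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per one million tokens (placeholders, not real rates)."""
    input_per_m: float
    output_per_m: float
    thinking_per_m: float


def estimate_call_cost(input_tokens: int, output_tokens: int,
                       thinking_tokens: int, pricing: Pricing) -> float:
    """Estimated USD cost of one call, with reasoning tokens as a separate term."""
    return (
        input_tokens * pricing.input_per_m
        + output_tokens * pricing.output_per_m
        + thinking_tokens * pricing.thinking_per_m
    ) / 1_000_000


def estimate_monthly_cost(cost_per_call: float, calls_per_day: int, days: int = 30) -> float:
    """Projects a per-call cost over a month of daily verification runs."""
    return cost_per_call * calls_per_day * days


# Placeholder inputs: same article and same short verdict, different reasoning depth.
pricing = Pricing(input_per_m=1.0, output_per_m=10.0, thinking_per_m=10.0)
shallow = estimate_call_cost(3000, 100, 500, pricing)
deep = estimate_call_cost(3000, 100, 8000, pricing)

print(f"shallow call: ${shallow:.4f}, deep call: ${deep:.4f}")
print(f"deep, 40 calls/day for 30 days: ${estimate_monthly_cost(deep, 40):.2f}")
```

With the input and output held constant, the thinking term is the only thing that moves the estimate, which is why the short verdict tells you so little about the price.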

Capping reasoning depth with thinking_level

The first thing that helped was putting a ceiling on the depth of reasoning itself. The Gemini 3 family lets you set how hard the model thinks via thinking_level. For a task like verification, where the correct answer is short and the criteria are clear, you do not need maximum deliberation.

from google import genai
from google.genai import types
 
# Reads GEMINI_API_KEY from the environment
client = genai.Client()
 
def verify_article(article_text: str) -> str:
    """Verifies an article and returns a short result containing JUDGE: OK / JUDGE: REVISE."""
    prompt = (
        "You are a copy editor doing fact-checking for technical articles. "
        "Decide whether the following article contains overstatement, broken code "
        "examples, or clear factual errors. Write 'JUDGE: OK' on the first line if "
        "it is fine, or 'JUDGE: REVISE' if it needs changes, then give the reason "
        "in at most three lines.\n\n---\n" + article_text
    )
 
    response = client.models.generate_content(
        model="gemini-3-deep-think",
        contents=prompt,
        config=types.GenerateContentConfig(
            # Verification does not need deep deliberation, so pin it to low
            thinking_config=types.ThinkingConfig(thinking_level="low"),
            # Keep the output short to avoid wasted long-form text
            max_output_tokens=200,
        ),
    )
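    # response.text is None when no text part comes back (for example, if the
    # token limit is reached first), so fall back to an empty string.
    return response.text or ""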
 
print(verify_article("(article body here)"))
# Example expected output:
#   JUDGE: REVISE
#   The third code example calls client.generate(); it should be client.models.generate_content().

Leaving thinking_level at high was the single biggest cause of the cost blow-up. Dropping it to low barely changed verification accuracy, yet sharply reduced the thinking tokens per call. A design or math problem that genuinely warrants deep reasoning is nothing like a task that just returns a short verdict; the amount of thinking they need is completely different.
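One way to check this kind of difference on your own traffic is to log the token breakdown of each response. The sketch below assumes the response carries a usage_metadata object with prompt_token_count, candidates_token_count, and thoughts_token_count, which is how the google-genai SDK reports usage for thinking models; whether Deep Think fills in thoughts_token_count the same way is an assumption. The helper name token_breakdown is hypothetical, and the stand-in response at the bottom uses made-up numbers only to show the shape of the result.

```python
from types import SimpleNamespace


def token_breakdown(response) -> dict:
    """Extracts input, output, and thinking token counts from a response.

    Missing or None fields are reported as 0 so the result is always summable.
    """
    usage = getattr(response, "usage_metadata", None)

    def count(field_name: str) -> int:
        return getattr(usage, field_name, None) or 0

    return {
        "input": count("prompt_token_count"),
        "output": count("candidates_token_count"),
        "thinking": count("thoughts_token_count"),
    }


# Stand-in object with made-up numbers; in practice pass the object returned
# by client.models.generate_content(...).
fake_response = SimpleNamespace(
    usage_metadata=SimpleNamespace(
        prompt_token_count=3000,
        candidates_token_count=40,
        thoughts_token_count=1200,
    )
)
print(token_breakdown(fake_response))
```

Logging this per call at each thinking_level gives you a like-for-like comparison instead of a guess based on the length of the verdict.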

In this situation I prefer to make low the default and only raise specific "gray" cases to high, which I'll cover below.
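A minimal sketch of that default-low, escalate-on-gray routing follows. The definition of "gray" used here (the low pass returns no parseable JUDGE line) is an assumption made for illustration, as are the names parse_verdict and verify_with_escalation. The judge argument is a hypothetical callable that wraps the model call and takes the thinking level as its second parameter, which keeps the routing logic testable without network access.

```python
from typing import Callable, Optional, Tuple

# (article_text, thinking_level) -> raw model reply
Judge = Callable[[str, str], str]


def parse_verdict(reply: str) -> Optional[str]:
    """Returns 'OK' or 'REVISE' if the first non-empty line is a valid JUDGE line, else None."""
    for line in reply.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.upper().startswith("JUDGE:"):
            verdict = line.split(":", 1)[1].strip().upper()
            return verdict if verdict in ("OK", "REVISE") else None
        return None  # first non-empty line is not a JUDGE line
    return None


def verify_with_escalation(article_text: str, judge: Judge) -> Tuple[str, str]:
    """Runs the low pass first and re-runs at high only when the verdict is unclear.

    Returns (verdict, level_used); verdict is 'OK', 'REVISE', or 'UNKNOWN'.
    """
    verdict = parse_verdict(judge(article_text, "low"))
    if verdict is not None:
        return verdict, "low"
    verdict = parse_verdict(judge(article_text, "high"))
    return (verdict or "UNKNOWN"), "high"


def stub_judge(article_text: str, thinking_level: str) -> str:
    """Stand-in for the real model call: vague at low, decisive at high."""
    if thinking_level == "low":
        return "Hard to say without more context."
    return "JUDGE: REVISE\nThe second code example does not import types."


print(verify_with_escalation("(article body here)", stub_judge))
```

Because the high pass only runs when the low pass fails to commit, the expensive setting is paid for on a fraction of drafts rather than on every call.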
