GEMINI LABJP
SIRI — WWDC 2026 confirms the revamped Siri runs on a Google Gemini model, though it won't ship in the EU at iOS 27 due to the DMAFLASH3.5 — Gemini 3.5 Flash is now GA, the top Flash model for sustained frontier performance on agentic and coding tasksIMAGE-GA — Gemini 3.1 Flash Image and 3.1 Pro Image are GA as native visual models; the preview versions shut down Jun 25MANAGED-AGENTS — Managed Agents launch in public preview in the Gemini API, running autonomous agents in Google-hosted isolated Linux sandboxesFILE-SEARCH — File Search now supports multimodal search, with native image embedding and retrieval via gemini-embedding-2DEPRECATION — gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shut down Jun 25 — migrate to the GA models soonSIRI — WWDC 2026 confirms the revamped Siri runs on a Google Gemini model, though it won't ship in the EU at iOS 27 due to the DMAFLASH3.5 — Gemini 3.5 Flash is now GA, the top Flash model for sustained frontier performance on agentic and coding tasksIMAGE-GA — Gemini 3.1 Flash Image and 3.1 Pro Image are GA as native visual models; the preview versions shut down Jun 25MANAGED-AGENTS — Managed Agents launch in public preview in the Gemini API, running autonomous agents in Google-hosted isolated Linux sandboxesFILE-SEARCH — File Search now supports multimodal search, with native image embedding and retrieval via gemini-embedding-2DEPRECATION — gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shut down Jun 25 — migrate to the GA models soon
Articles/Advanced
Advanced/2026-04-16Advanced

Controlling Gemini 2.5 Pro's Thinking — Thinking Budget and Reasoning-Aware Prompt Design

A deep dive into Gemini 2.5 Pro's Thinking feature and internal reasoning process. Covers Thinking Budget configuration, optimal values by task type, extracting thinking_parts for quality verification, and prompt design patterns that maximize reasoning quality.

Gemini 2.5 Pro24Thinking Budget2reasoning7prompt design4Gemini API181AI quality

Premium Article

A Model Where You Can Configure How Much It Thinks

What makes Gemini 2.5 Pro fundamentally different from other models is this: you can control how deeply the model reasons before returning an answer.

That's the thinking_budget parameter. Set it to 0 and you get an immediate response (Thinking OFF). Push it to 24576 tokens and the model works through the problem internally before answering. The same prompt can produce very different output quality depending on this setting — I ran systematic tests, and the differences are striking.

What Is Thinking Budget?

thinking_budget is a Gemini 2.5 Pro-specific parameter that sets the maximum number of tokens the model can use for internal reasoning.

The key distinction: it's a maximum, not a guarantee. Simple questions get resolved with minimal reasoning; hard problems consume the full budget. Setting a high value is effectively saying "take as long as you need."

On cost — being straightforward here: Thinking tokens are billed at the same rate as regular tokens. Running a complex problem with thinking_budget=24576 will cost noticeably more. Use lower values for simple tasks where the extra reasoning isn't worth it.

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN
Measured latency, cost, and quality scores at each Budget setting (5-run averages)
Three production issues I actually hit with the conservative workarounds I settled on
Budget allocation rules from an indie developer running 50M+ download apps
Secure payment via Stripe · Cancel anytime
Share

Thank You for Reading

Gemini Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.

  • Copy-paste ready implementation code
  • New advanced guides published daily
  • $5/mo or $10 for lifetime access
View Membership →

Related Articles

Advanced2026-04-11
Gemini Advanced Reasoning: Practical Strategies for Solving Complex Problems
A systematic guide to unlocking Gemini Advanced's full reasoning and analysis capabilities — covering Deep Research, multimodal reasoning, code analysis, and mathematical reasoning with real-world prompt strategies and examples.
Advanced2026-04-07
Gemini 2.5 Flash Thinking — Integrating Thought Traces and Advanced Reasoning into Production Systems
A complete guide to using Gemini 2.5 Flash Thinking's thought trace API in production. Covers thinking budget control, streaming thought display, multi-turn reasoning chains, cost optimization, and robust fallback strategies.
Advanced2026-04-06
Gemini 2.5 Pro Business Masterclass: Thinking, Long Context, and Multimodal for Advanced Users
An advanced guide to unlocking Gemini 2.5 Pro's full business potential — Thinking mode for complex decisions, 1M-token context for large document analysis, multimodal for data interpretation, and API automation design. Includes production-ready prompt frameworks.
📚RECOMMENDED BOOKS
Build a Large Language Model (From Scratch)
Sebastian Raschka
LLM Dev
Prompt Engineering for LLMs
Berryman & Ziegler
Prompting
AI Engineering
Chip Huyen
AI Eng
* Contains affiliate links
See all →