GEMINI LAB
API / SDK · 2026-06-12 · Intermediate

Reverse-Engineering Empty Gemini API Responses with finish_reason — Triage, Retry Classification, and Monitoring

An empty response.text has three distinct failure layers — candidates, prompt_feedback, and finish_reason. Production code for detecting thinking-token starvation, classifying what is worth retrying, and monitoring your empty-response rate.

Tags: gemini-api, finish-reason, troubleshooting, error-handling, python, typescript

Earlier this month, Gemini went through one of its largest outages to date. My own pipelines failed in waves that morning, and while reading back through the logs after recovery, one thing stood out.

The calls that raised exceptions were easy to find. Alerts fired, stack traces landed where they should. What took far longer to notice were the calls sitting right next to them — HTTP 200, technically successful, with an empty response.text. An empty string slips through a pipeline quietly. Downstream steps run as if nothing happened, and the user ends up staring at a blank screen. In practice, this failure mode is worse than the loud one.

An empty response does not mean the Gemini API is broken. The response almost always carries a signal explaining why generation stopped; the missing piece is code that actually reads it. This article walks through that reading in three stages (a triage flow, a retry classification, and monitoring), with Python (google-genai) as the primary SDK and Node/TypeScript (@google/genai) alongside.

An "empty response" has three distinct layers — where to look before response.text

response.text is a convenience helper. Internally it just collects the text parts from candidates[0].content.parts and joins them. So when it comes back empty, the actual breakage lives in one of three different places:

  1. candidates itself is empty — your input was blocked before generation even started. Look at prompt_feedback.block_reason
  2. candidates exists but parts is empty — generation was cut off mid-flight. Look at finish_reason
  3. parts exists but carries no text — nothing is broken. The response contains non-text parts such as function_call, inline_data, or thought parts

I run a small function that separates these three layers immediately after every production call.

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

def triage(resp) -> str:
    """Identify which layer an empty response belongs to. The return value doubles as a log key."""
    # Layer 1: no candidates -> the input was blocked upstream
    if not resp.candidates:
        block = getattr(resp.prompt_feedback, "block_reason", None)
        return f"input_blocked:{getattr(block, 'name', block)}"

    cand = resp.candidates[0]
    # Prefer the enum's name (e.g. "MAX_TOKENS") so log keys stay short and stable
    reason = getattr(cand, "finish_reason", None)
    finish = getattr(reason, "name", None) or str(reason or "UNKNOWN")

    # Layer 2: no parts -> finish_reason holds the cutoff reason
    parts = getattr(getattr(cand, "content", None), "parts", None) or []
    if not parts:
        return f"no_parts:{finish}"

    # Layer 3: parts exist but no text -> a different kind of part came back.
    # Thought parts are skipped so that a thought-only response still counts as empty.
    text = "".join(
        (getattr(p, "text", "") or "")
        for p in parts
        if not getattr(p, "thought", False)
    )
    if not text:
        kinds = []
        for p in parts:
            if getattr(p, "function_call", None):
                kinds.append("function_call")
            elif getattr(p, "inline_data", None):
                kinds.append("inline_data")
            elif getattr(p, "thought", False):
                kinds.append("thought")
            else:
                kinds.append("unknown")
        return f"non_text_parts:{finish}:{'+'.join(kinds)}"

    return "ok"

Streaming this return value into your logs means that the moment an "empty-looking" response arrives, you already know which layer failed. During the outage, having this one function in place was the difference between an hour of guesswork and reading a single log line.
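
As one way to turn those log keys into a number you can watch, the sketch below tallies them in process and reports an empty-response rate. It is a minimal illustration rather than the article's own monitoring setup: `EmptyResponseStats` and its method names are hypothetical, and it assumes every call's triage key (the string returned by `triage`) is passed to `record`.

```python
from collections import Counter


class EmptyResponseStats:
    """Hypothetical helper: tallies triage keys and reports the empty-response rate."""

    def __init__(self) -> None:
        self.total = 0
        self.by_key = Counter()

    def record(self, key: str) -> None:
        """Record one call. `key` is a triage string such as 'ok' or 'no_parts:MAX_TOKENS'."""
        self.total += 1
        self.by_key[key] += 1

    def empty_rate(self) -> float:
        """Fraction of recorded calls whose triage key was anything other than 'ok'."""
        if self.total == 0:
            return 0.0
        return (self.total - self.by_key["ok"]) / self.total


# Example keys; in production these would come from triage(resp) after each call.
stats = EmptyResponseStats()
for key in ["ok", "ok", "no_parts:MAX_TOKENS", "input_blocked:SAFETY"]:
    stats.record(key)

print(stats.empty_rate())           # 2 of the 4 recorded calls were not "ok"
print(stats.by_key.most_common(3))  # which keys dominate
```

Note that this counts healthy layer-3 responses (for example a function call) as empty too. If that skews the rate for your workload, one option is to exclude keys that start with `non_text_parts:` before computing it.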

A finish_reason lookup table — which values are worth retrying as-is

When layer 2 is the culprit, finish_reason (finishReason in the Node SDK) tells you why generation stopped. The practical question is not what each value means in the abstract. It is a single yes-or-no: does retrying with no changes stand any chance of a different outcome? Hammering a value that always answers the same way burns quota and returns nothing.

Value                   | Typical cause                                         | Retry as-is?
------------------------+-------------------------------------------------------+---------------------------------
STOP                    | Normal completion. If empty, suspect non-text parts   | No need (check layer 3)
MAX_TOKENS              | Output budget exhausted, including by thinking tokens | Pointless (fix config first)
SAFETY                  | Output tripped the safety filter                      | Pointless (fix settings or input)
RECITATION              | Excessive overlap with training data                  | Pointless (fix the prompt)
LANGUAGE                | Unsupported language                                  | Pointless
BLOCKLIST               | Hit a forbidden-terms list                            | Pointless
PROHIBITED_CONTENT      | Prohibited content detected                           | Pointless
SPII                    | Sensitive personal information detected               | Pointless
MALFORMED_FUNCTION_CALL | Broken tool-call generation                           | Conditional (fix the schema)
OTHER / UNSPECIFIED     | Internal or unclassified error                        | Yes (with backoff)

Scan the table and the pattern jumps out: the only family where a plain retry genuinely helps is OTHER. Everything else falls into either "repair the config or input, then come back" or "fail fast, because nothing will change." Those three groups become the skeleton of the retry classifier we will build below.
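
The three groups can be sketched as a small lookup, shown here only as an illustration of the table and not as the article's own classifier. `RetryAction`, `classify`, and `backoff_delay` are hypothetical names. Splitting the "Pointless" rows between repair and fail-fast follows the table's parenthetical notes, and treating unknown values like OTHER is an assumption you may prefer to reverse.

```python
import random
from enum import Enum


class RetryAction(Enum):
    RETRY = "retry"          # a plain retry with backoff may succeed
    REPAIR = "repair"        # change config, prompt, or schema, then retry
    FAIL_FAST = "fail_fast"  # nothing will change; surface the error
    DONE = "done"            # normal completion; inspect the parts instead


# Mapping derived from the lookup table above.
_ACTIONS = {
    "STOP": RetryAction.DONE,
    "MAX_TOKENS": RetryAction.REPAIR,
    "SAFETY": RetryAction.REPAIR,
    "RECITATION": RetryAction.REPAIR,
    "MALFORMED_FUNCTION_CALL": RetryAction.REPAIR,
    "LANGUAGE": RetryAction.FAIL_FAST,
    "BLOCKLIST": RetryAction.FAIL_FAST,
    "PROHIBITED_CONTENT": RetryAction.FAIL_FAST,
    "SPII": RetryAction.FAIL_FAST,
    "OTHER": RetryAction.RETRY,
    "UNSPECIFIED": RetryAction.RETRY,
    "FINISH_REASON_UNSPECIFIED": RetryAction.RETRY,
}


def classify(finish_reason: str) -> RetryAction:
    """Map a finish_reason string to an action.

    Accepts both "MAX_TOKENS" and "FinishReason.MAX_TOKENS" spellings.
    """
    name = finish_reason.rsplit(".", 1)[-1].upper()
    # Assumption: values missing from the table are treated like OTHER.
    return _ACTIONS.get(name, RetryAction.RETRY)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


print(classify("FinishReason.MAX_TOKENS"))  # RetryAction.REPAIR
print(classify("OTHER"))                    # RetryAction.RETRY
```

Only `RetryAction.RETRY` should feed a retry loop, sleeping for `backoff_delay(attempt)` between attempts and giving up after a small fixed number of tries.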

One value that confuses people the first time: STOP with an empty text. Almost always this is layer 3 in disguise. When function calling is enabled and the model decides to invoke a tool, parts contains only a function_call and text is empty. That response is perfectly healthy.
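
To make that case concrete, here is a minimal sketch that pulls the function_call parts out of a response before anyone treats the empty text as a failure. `function_calls_in` is a hypothetical helper, `get_weather` is an invented tool name, and the stand-in object only mimics the attribute layout described above (candidates, content, parts). A real call would return an SDK response object instead.

```python
from types import SimpleNamespace


def function_calls_in(resp) -> list:
    """Return the function_call payloads found in the first candidate's parts."""
    if not getattr(resp, "candidates", None):
        return []
    content = getattr(resp.candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    return [p.function_call for p in parts if getattr(p, "function_call", None)]


# Stand-in shaped like a STOP response whose only part is a tool call.
fake = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            finish_reason="STOP",
            content=SimpleNamespace(
                parts=[
                    SimpleNamespace(
                        text=None,
                        function_call=SimpleNamespace(
                            name="get_weather", args={"city": "Tokyo"}
                        ),
                    )
                ]
            ),
        )
    ]
)

calls = function_calls_in(fake)
if calls:
    # Healthy response: dispatch the tool call instead of reporting empty text.
    print([c.name for c in calls])
```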
