GEMINI LAB
API / SDK · 2026-06-26 · Advanced

When Gemini's Safety Filter Silently Drops Legitimate Output — Field Notes on Catching False Positives Without Turning Everything Off

Field notes on handling Gemini API false positives in production without disabling every category. Separating input blocks from output blocks, instrumenting per-category false-positive rates, and recovering by relaxing only the offending category.

gemini-api · safety-filter · production · observability · error-handling


Most safety-filter questions are about how to switch everything off. In production the painful case is the opposite: you can't turn it all off, you don't want to, and yet legitimate requests still get dropped now and then, quietly. Often you notice late. The call comes back with finishReason set to SAFETY and nothing in the body. With the legacy google-generativeai SDK, reading response.text raises an exception on the spot; the newer google-genai SDK returns None instead, which tends to fail one step further downstream. Either way, the user is left staring at an empty card.

As an indie developer, I run four technical blogs on an automated publishing pipeline, and an unattended batch ran straight into this. A small fragment of a code snippet or an error message I'd handed over as source material leaned toward DANGEROUS_CONTENT, and generation stopped midway. With no human watching, nobody noticed until I read the logs the next morning. These notes are a record of the mechanism I built so that, instead of "set every category to OFF and move on," I could pick out the false positives and fall back to the safe side.
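Before any diagnostics, the empty card itself can be contained. A minimal sketch, assuming the google-genai response shape (resp.candidates[0].content.parts, each part with an optional text and thought field): the helper walks the parts directly with getattr instead of reading response.text, so neither a prompt block (no candidates) nor a SAFETY stop (no parts) raises, and the caller gets None to branch on. text_or_none, render_card, and the fallback message are hypothetical names; the SimpleNamespace objects only stand in for real responses so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Any, Optional


def text_or_none(resp: Any) -> Optional[str]:
    """Join the text parts of the first candidate, or return None.

    Uses getattr throughout so a blocked prompt (no candidates) or a
    SAFETY stop (candidate without parts) never raises.
    """
    candidates = getattr(resp, "candidates", None) or []
    if not candidates:
        return None  # input-side block: nothing was generated
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    # Skip parts flagged as thoughts (assumption: thinking-model output).
    texts = [
        p.text
        for p in parts
        if getattr(p, "text", None) and not getattr(p, "thought", False)
    ]
    return "".join(texts) if texts else None


def render_card(resp: Any, fallback: str = "This answer could not be generated.") -> str:
    """Hypothetical UI helper: show a fallback message instead of an empty card."""
    text = text_or_none(resp)
    return text if text else fallback


# Stand-in objects shaped like SDK responses, for a local demo only.
blocked_prompt = SimpleNamespace(candidates=None)
safety_stop = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=None))])
ok = SimpleNamespace(
    candidates=[SimpleNamespace(content=SimpleNamespace(parts=[SimpleNamespace(text="Hello")]))]
)

print(render_card(blocked_prompt))  # This answer could not be generated.
print(render_card(ok))              # Hello
```

The thought filter is an assumption about thinking models that return thought-summary parts; for other models it simply never triggers.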

First, pin down which side blocked it — in one line

The safety filter inspects both the input (prompt) and the output (generated result). These two have entirely different causes and fixes, yet in the logs they both read as "blocked," which is the awkward part. The first thing to do is record, mechanically and every time, which side failed.

When the input is blocked, no candidate is produced and the detail lives in prompt_feedback.block_reason. When the output is blocked, a candidate exists but finish_reason becomes SAFETY and the body is empty. With the newer google-genai SDK, the split looks like this:

from google import genai
from google.genai import types
 
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")
 
def classify_block(resp):
    """Return where the block happened: input / output / none."""
    pf = getattr(resp, "prompt_feedback", None)
    if pf and getattr(pf, "block_reason", None):
        return "INPUT_BLOCKED", str(pf.block_reason)
 
    if not resp.candidates:
        # Rare: zero candidates and no prompt_feedback either
        return "NO_CANDIDATE", "unknown"
 
    cand = resp.candidates[0]
    if cand.finish_reason == types.FinishReason.SAFETY:
        return "OUTPUT_BLOCKED", "SAFETY"
 
    return "OK", str(cand.finish_reason)

Run this immediately after every production generation call and write the INPUT_BLOCKED / OUTPUT_BLOCKED result straight into a log field. That alone lets you count "fix the prompt" cases and "revisit the output threshold" cases separately later. In my experience, logs that skip this distinction can only ever say "blocks are up" — which gives you nothing to act on.
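A minimal sketch of that log field, assuming classify_block above supplies the (side, reason) pair and that your log pipeline accepts one JSON object per line. block_log_line, tally_actions, and the block_side / block_reason field names are hypothetical. The tally shows why the field pays off: the same records split straight into the two follow-up actions.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def block_log_line(request_id: str, model: str, side: str, reason: str) -> str:
    """Serialize one generation outcome as a single JSON log line.

    `side` and `reason` are the pair returned by classify_block(); the
    field names are illustrative, not a required schema.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "model": model,
        "block_side": side,  # OK / INPUT_BLOCKED / OUTPUT_BLOCKED / NO_CANDIDATE
        "block_reason": reason,
    }
    return json.dumps(record, ensure_ascii=False)


# Which follow-up each blocked side calls for.
ACTION_BY_SIDE = {
    "INPUT_BLOCKED": "fix the prompt",
    "OUTPUT_BLOCKED": "revisit the output threshold",
}


def tally_actions(lines: Iterable[str]) -> Counter:
    """Count blocked calls by the follow-up action they call for."""
    counts: Counter = Counter()
    for line in lines:
        side = json.loads(line).get("block_side")
        action = ACTION_BY_SIDE.get(side)
        if action:
            counts[action] += 1
    return counts
```

In the pipeline it would sit right after the call: side, reason = classify_block(resp), then logger.info(block_log_line(request_id, model_name, side, reason)), where logger, request_id, and model_name are whatever your code already has. NO_CANDIDATE is deliberately left out of the tally so the rare unexplained case isn't silently folded into either bucket.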

Make false positives visible as a per-category rate

Decide threshold changes by gut feeling and you'll usually either loosen too far or freeze and do nothing. What you actually need is measured data: which category, at what probability, contributes how much to the blocks.

Each candidate and the prompt feedback carry safety_ratings, where every element holds category, probability (NEGLIGIBLE / LOW / MEDIUM / HIGH), and blocked (boolean). Flatten that into structured logs and aggregate by category.

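from collections import Counter
 
def _name(value):
    """Return the bare enum name, e.g. 'MEDIUM'.
 
    The SDK enums are str-based Enums, so str() yields 'HarmProbability.MEDIUM',
    which would never match the ("MEDIUM", "HIGH") check in summarize().
    """
    if value is None:
        return "UNKNOWN"
    return getattr(value, "name", str(value))
 
def extract_ratings(resp):
    """Flatten input-side and output-side safety_ratings into
    (side, category, probability, blocked) rows."""
    rows = []
    pf = getattr(resp, "prompt_feedback", None)
    if pf and getattr(pf, "safety_ratings", None):
        for r in pf.safety_ratings:
            rows.append(("input", _name(r.category), _name(r.probability), bool(r.blocked)))
    for cand in (resp.candidates or []):
        for r in (cand.safety_ratings or []):
            rows.append(("output", _name(r.category), _name(r.probability), bool(r.blocked)))
    return rows
 
def summarize(logged_rows):
    """Surface per-category false-positive tendencies from accumulated rows."""
    blocked = Counter()      # ratings that actually triggered a block
    medium_plus = Counter()  # ratings at MEDIUM or HIGH, blocked or not
    for _side, cat, prob, was_blocked in logged_rows:
        if was_blocked:
            blocked[cat] += 1
        if prob in ("MEDIUM", "HIGH"):
            medium_plus[cat] += 1
    return blocked, medium_plus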

The point here is to watch not only the blocked count but also the distribution of MEDIUM ratings — the ones that stop short of a block but sit near the line. If MEDIUM is piling up in one category, that category is a reserve army: holding for now, but one slight shift in input away from dropping. Sudden spikes in production usually happen the moment that reserve crosses the threshold. Keep it as a rate and you'll see it coming before it becomes an incident.
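summarize() returns raw counts, while the argument above is about rates. A minimal sketch of the conversion, assuming rows shaped like extract_ratings output (side, category, probability, blocked) and using every rating seen for a category in the window as the denominator. It strips any "HarmProbability."-style prefix, so it works whether names were logged bare or through str() on the SDK enum. category_rates, watchlist, and the 0.05 watch threshold are hypothetical placeholders to tune against your own traffic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str, bool]  # (side, category, probability, blocked)

NEAR_LINE = {"MEDIUM", "HIGH"}


@dataclass
class CategoryRate:
    total: int = 0
    blocked: int = 0
    near_line: int = 0

    @property
    def blocked_rate(self) -> float:
        return self.blocked / self.total if self.total else 0.0

    @property
    def near_line_rate(self) -> float:
        return self.near_line / self.total if self.total else 0.0


def _bare(name: str) -> str:
    """'HarmProbability.MEDIUM' -> 'MEDIUM'; bare names pass through."""
    return name.rsplit(".", 1)[-1]


def category_rates(rows: Iterable[Row], side: str = "output") -> Dict[str, CategoryRate]:
    """Per-category blocked and MEDIUM+ rates for one side (input or output)."""
    rates: Dict[str, CategoryRate] = defaultdict(CategoryRate)
    for row_side, cat, prob, was_blocked in rows:
        if row_side != side:
            continue
        stats = rates[_bare(cat)]
        stats.total += 1
        stats.blocked += int(was_blocked)
        if _bare(prob) in NEAR_LINE:
            stats.near_line += 1
    return dict(rates)


def watchlist(rates: Dict[str, CategoryRate], near_line_threshold: float = 0.05) -> List[str]:
    """Categories whose MEDIUM+ share reaches the (placeholder) watch threshold."""
    return sorted(c for c, r in rates.items() if r.near_line_rate >= near_line_threshold)
```

Running it per batch or per day and comparing against the previous window is what turns "blocks are up" into a specific category whose MEDIUM+ share is climbing.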

Note that probability is a safety-policy likelihood, not a measure of whether the output is correct. Read "it's LOW, so the content must be fine" and you'll misjudge. The filter is looking at policy fit, not factual accuracy.
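The summary's recovery step, relaxing only the offending category instead of flipping everything to OFF, can be sketched as a one-notch ladder. This is an illustration under stated assumptions, not the recovery function the article describes: current settings are a plain dict of category name to threshold name, the offending categories are the output-side ratings with blocked set, and the ladder deliberately stops at BLOCK_ONLY_HIGH so a retry never reaches BLOCK_NONE or OFF. The entries use the REST safetySettings shape (category and threshold strings); with google-genai the same pairs would go into types.SafetySetting. offending_categories, relax_offending, and LADDER are hypothetical names.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Strictest to loosest. BLOCK_NONE and OFF are intentionally absent:
# a retry may loosen a category, but never switch it off.
LADDER = ["BLOCK_LOW_AND_ABOVE", "BLOCK_MEDIUM_AND_ABOVE", "BLOCK_ONLY_HIGH"]


def offending_categories(rows: Iterable[Tuple[str, str, str, bool]]) -> Set[str]:
    """Output-side categories that actually blocked, from extract_ratings-style rows."""
    return {
        cat.rsplit(".", 1)[-1]  # tolerate 'HarmCategory.' prefixes from str()
        for side, cat, _prob, blocked in rows
        if side == "output" and blocked
    }


def relax_offending(current: Dict[str, str], offending: Iterable[str]) -> List[Dict[str, str]]:
    """Loosen only the offending categories by one notch; keep the rest as-is.

    Categories already at the loosest rung, or set to a threshold outside the
    ladder, are left unchanged. Categories missing from `current` are not added.
    """
    offending = set(offending)
    settings = []
    for category, threshold in current.items():
        if category in offending and threshold in LADDER:
            idx = LADDER.index(threshold)
            threshold = LADDER[min(idx + 1, len(LADDER) - 1)]
        settings.append({"category": category, "threshold": threshold})
    return settings
```

If the returned settings equal the current ones, there is nothing left to relax; surface the block instead of retrying. Categories absent from current are left alone because their effective threshold is the model default, which this sketch does not try to guess.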

