Gemini Lab
API / SDK · 2026-06-24 · Advanced

When a Deploy Drops the Webhook: Reconciling Gemini Long-Running Operations with a Belt-and-Suspenders Design

Even after you move from polling to Webhooks, events still get dropped during deploys and transient 5xx windows. Here is how I double up Gemini long-running operations with an operation ledger and a low-frequency reconciliation poller so a missing terminal event never goes unnoticed.

Tags: Gemini API, Webhooks, Long-Running Operations, Batch API, Reliability


One morning, a nightly batch I run for a personal project simply hadn't written its results back. The logs were clear: the Batch API job itself had succeeded. But the webhook that announced completion arrived at the exact moment a redeploy was rolling out. The receiving Worker swapped out for a fraction of a second, and that single delivery slipped through.

This was shortly after I had retired polling in favor of Webhooks. The move to event-driven was the obvious one — no more wasteful status checks — but it quietly introduced a new failure mode: if you miss the event, nobody notices.

This is a record of the belt-and-suspenders design I built so that never happens again. The example uses Gemini long-running operations (Batch API and slow generation jobs), but the skeleton transfers to any system that receives external event notifications.

Reframe dropped events as the normal case, not an anomaly

Webhooks are usually described as at-least-once delivery. In practice that means "the sender keeps retrying until a delivery is acknowledged or its retry budget runs out," not "exactly one delivery will always land." The sender retries a few times, but if your endpoint keeps failing, it exhausts the retry budget and the event is gone.

For an indie developer running everything alone, this is not hypothetical. In my environment, Cloudflare Workers redeploys run more than ten times a day. Each one opens a window of a few hundred milliseconds where the receiver is shaky. Cold starts and transient 5xx responses pile onto that. The odds of every sender retry landing inside such a window are low, but not zero. Run a few dozen operations a day and one or two will go quietly missing each month.
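To put a rough number on that, here is a back-of-envelope sketch. The inputs are illustrative assumptions: "a few dozen" is read as 36 operations a day and "one or two" as 1.5 misses a month. The function names are hypothetical helpers, not part of any SDK.

```typescript
// Back-of-envelope estimate of silently dropped terminal events.
// All inputs are illustrative assumptions, not measurements.

// Expected number of lost terminal events per month.
function expectedMissesPerMonth(
  opsPerDay: number,
  dropProbability: number, // chance one operation's event is lost after all retries
  daysPerMonth = 30,
): number {
  return opsPerDay * daysPerMonth * dropProbability;
}

// Inverse: the per-operation drop probability implied by an observed miss count.
function impliedDropProbability(
  missesPerMonth: number,
  opsPerDay: number,
  daysPerMonth = 30,
): number {
  return missesPerMonth / (opsPerDay * daysPerMonth);
}

// 36 operations a day and 1.5 misses a month.
const p = impliedDropProbability(1.5, 36);
console.log(p.toFixed(5)); // 0.00139
console.log(expectedMissesPerMonth(36, p).toFixed(1)); // 1.5
```

Under those assumptions the implied loss rate is about 0.14%, roughly one operation in 720. That is low enough to pass unnoticed in testing and high enough to show up every month in production.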

The point is to stop banishing this to exception handling as a "rare failure." Once you go event-driven, dropped events are normal behavior the design must absorb. So you keep Webhooks as the primary path, and guarantee recovery through a separate reconciliation path. I treat that doubling-up as a premise from day one.

The shape: split a fast path from a slow path

The design has three parts.

Part | Role | Trigger
Operation ledger | The single source of truth for the state of every submitted operation | On job submission
Webhook receiver (fast path) | Receives terminal events immediately and closes the ledger entry | Notification from Gemini
Reconciliation poller (slow path) | Scans for unfinished entries and recovers any dropped terminal event | Periodic cron
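As a minimal sketch of the first row, the ledger could look something like the following. The field names, the `OperationLedger` class, and the in-memory `Map` are all assumptions for illustration. A real deployment would back this with a persistent store, and none of these names come from the Gemini API.

```typescript
// Hypothetical ledger shape; field names are illustrative only.
type OperationState = "pending" | "succeeded" | "failed";
type ConfirmSource = "webhook" | "reconciler";

interface LedgerEntry {
  operationName: string; // identifier returned when the job was submitted
  state: OperationState;
  submittedAt: number; // epoch milliseconds
  closedAt?: number;
  confirmedBy?: ConfirmSource;
}

// In-memory stand-in for a persistent key-value store.
class OperationLedger {
  private entries = new Map<string, LedgerEntry>();

  // Called on job submission: every operation gets a pending entry up front.
  recordSubmission(operationName: string, now: number): LedgerEntry {
    const existing = this.entries.get(operationName);
    if (existing) return existing; // recording the same submission twice is a no-op
    const entry: LedgerEntry = { operationName, state: "pending", submittedAt: now };
    this.entries.set(operationName, entry);
    return entry;
  }

  get(operationName: string): LedgerEntry | undefined {
    return this.entries.get(operationName);
  }

  // What the slow path scans: entries that have not reached a terminal state.
  unfinished(): LedgerEntry[] {
    return Array.from(this.entries.values()).filter((e) => e.state === "pending");
  }
}

const ledger = new OperationLedger();
ledger.recordSubmission("op-1", Date.now());
console.log(ledger.unfinished().length); // 1
```

Writing the entry at submission time, before any notification can arrive, is what lets the slow path find operations whose webhook never showed up.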

The key is to manage the ledger by "did the operation reach a terminal state," not "did a webhook arrive." There are two ways to confirm a terminal state: the webhook (fast), and an operations.get-style query at reconciliation time (slow but reliable). The state transition is made idempotent so the result is identical no matter which one confirms first.
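One way to sketch that idempotent transition is a pure function where the first terminal confirmation wins and every later one is a no-op. The names `closeEntry`, `CloseResult`, and `firstClose` are hypothetical, and the status lookup itself (the operations.get-style query) is left to the caller.

```typescript
// Hypothetical types for illustration; not taken from the Gemini API.
type TerminalState = "succeeded" | "failed";
type ConfirmSource = "webhook" | "reconciler";

interface LedgerEntry {
  operationName: string;
  state: "pending" | TerminalState;
  confirmedBy?: ConfirmSource;
  closedAt?: number; // epoch milliseconds
}

interface CloseResult {
  entry: LedgerEntry;
  firstClose: boolean; // true only for the confirmation that actually closed the entry
}

// Pure transition: the first terminal confirmation wins, later ones change nothing.
function closeEntry(
  entry: LedgerEntry,
  terminal: TerminalState,
  source: ConfirmSource,
  now: number,
): CloseResult {
  if (entry.state !== "pending") {
    return { entry, firstClose: false };
  }
  return {
    entry: { ...entry, state: terminal, confirmedBy: source, closedAt: now },
    firstClose: true,
  };
}

const pending: LedgerEntry = { operationName: "op-1", state: "pending" };

// Case A: the webhook lands first, the reconciler confirms later.
const a1 = closeEntry(pending, "succeeded", "webhook", 1_000);
const a2 = closeEntry(a1.entry, "succeeded", "reconciler", 2_000);

// Case B: the webhook was dropped, the reconciler closes, a late redelivery is ignored.
const b1 = closeEntry(pending, "succeeded", "reconciler", 2_000);
const b2 = closeEntry(b1.entry, "succeeded", "webhook", 3_000);

console.log(a2.entry.state === b2.entry.state); // true
console.log(a1.firstClose, a2.firstClose, b1.firstClose, b2.firstClose);
```

Both orders end in the same terminal state, and `firstClose` is true exactly once per operation, so side effects such as writing results back can be gated on it. In a real store the read-check-write would also need to be atomic, or the side effect itself idempotent, which this sketch does not cover.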

WHAT YOU'LL LEARN

  • A complete TypeScript implementation that pairs a KV operation ledger with a Webhooks fast path and a reconciliation slow path so terminal events are never lost
  • The exact logic for recovering a webhook that never arrived during a deploy, plus where to put the idempotency key so side effects never fire twice
  • The detection threshold for operations that get stuck, and the measured trade-offs behind how often I reconcile
