GEMINI LAB
API / SDK · 2026-06-28 · Advanced

A Finished Gemini Job Flipped Back to 'Running' — Stopping Out-of-Order Webhooks with Monotonic State Apply

When you receive Gemini long-running operations over webhooks, a stale 'running' event can arrive after completion and roll your state backward. Here is a monotonic-apply reducer that safely drops regressing updates.


One morning I opened my nightly-batch dashboard and a job that had finished publishing the night before was showing "running" again. There were even traces of a re-publish, so my first guess was that the job itself had been re-run. But the operation ledger told a different story: the job had received SUCCEEDED the previous night, and then, forty minutes later, an old RUNNING event arrived a second time. The sink naively did state = event.state, so completion rolled backward into running.

When you run nightly batches as an indie developer, the failures that survive a webhook migration almost always look like this. Not duplication (the same completion arriving twice), not loss (an event dropping). It is the third hazard: events arriving out of order.

A Rollback Is Neither Duplication Nor Loss

In writing about webhook migration, duplication and loss get all the attention. I covered duplication with an idempotent sink that makes completion events effectively-once, and loss by double-covering long-running operations with reconciliation. Those two defenses answer "don't process the same state twice" and "recover a state that never arrived," respectively.

Out-of-order delivery is neither. Each event arrives once, correctly signed, genuine. It is simply that the arrival order does not match the operation's progression. For a job that moved PENDING → RUNNING → SUCCEEDED, your sink can receive them as SUCCEEDED → RUNNING. An idempotency key won't drop the late one, because the two are different events. A reconciliation poller won't recover anything, because nothing was lost. Ordering needs its own defense.
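To make that concrete, here is a minimal sketch of a sink that deduplicates by an idempotency key and then overwrites state unconditionally. The `Event` fields, the `event_id` key, and the operation name are hypothetical and chosen for illustration; they are not taken from the Gemini webhook payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str   # hypothetical idempotency key, unique per event
    operation: str  # hypothetical operation name
    state: str      # "PENDING", "RUNNING", or "SUCCEEDED"


class NaiveIdempotentSink:
    """Drops exact duplicates, then does state = event.state."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.state: dict[str, str] = {}

    def handle(self, event: Event) -> bool:
        if event.event_id in self.seen:
            return False  # same event delivered twice: dropped
        self.seen.add(event.event_id)
        self.state[event.operation] = event.state  # unconditional overwrite
        return True


sink = NaiveIdempotentSink()
succeeded = Event("evt-2", "operations/job-1", "SUCCEEDED")
late_running = Event("evt-1", "operations/job-1", "RUNNING")

sink.handle(succeeded)
duplicate_applied = sink.handle(succeeded)   # duplicate is rejected
late_applied = sink.handle(late_running)     # distinct key, so it is accepted
final_state = sink.state["operations/job-1"]
print(final_state)  # RUNNING
```

The duplicate is caught, but the late RUNNING event carries its own key, so the dedup check waves it through and the stored state regresses.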

Why Webhooks Don't Guarantee Order

Gemini's webhooks deliver completion of Batch API jobs and long-running operations as events. The delivery model is at-least-once, and ordering is not part of the contract. There are several overlapping paths to reordering.

If your sink returns a transient 5xx, that update is redelivered later. If the next update gets through in the meantime, the redelivered older event ends up behind the newer one. With parallel delivery, ordinary network jitter is enough to swap two events. And when a reconciliation poller picks up what a deploy missed, the webhook path and the reconciliation path merge two events of different freshness with a time gap between them.
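The retry path can be reproduced with a small deterministic simulation. This is only a sketch: `simulate_delivery`, the one-failure-then-success behavior, and the 40-minute retry delay are assumptions for illustration, not the actual retry schedule of any webhook sender.

```python
import heapq


def simulate_delivery(events, fail_first_attempt, retry_delay):
    """Return event names in the order the sink accepts them.

    events: list of (send_minute, name) in true progression order.
    fail_first_attempt: names whose first delivery gets a transient 5xx.
    retry_delay: minutes before the sender retries (assumed fixed here).
    """
    queue = [(minute, seq, name, 1) for seq, (minute, name) in enumerate(events)]
    heapq.heapify(queue)
    accepted = []
    while queue:
        minute, seq, name, attempt = heapq.heappop(queue)
        if name in fail_first_attempt and attempt == 1:
            # Sink answered 5xx: the sender schedules a redelivery.
            heapq.heappush(queue, (minute + retry_delay, seq, name, attempt + 1))
            continue
        accepted.append(name)
    return accepted


arrival_order = simulate_delivery(
    events=[(0, "RUNNING"), (1, "SUCCEEDED")],
    fail_first_attempt={"RUNNING"},
    retry_delay=40,
)
print(arrival_order)  # ['SUCCEEDED', 'RUNNING']
```

A single transient failure on the older event is enough: the newer event lands first, and the retry delivers the stale one behind it.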

So reordering is not "the occasional bad luck." It is structural the moment you choose at-least-once delivery plus a two-path design. The sink has to take responsibility for recognizing and dropping updates that move backward.
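One way a sink might take on that responsibility is sketched below: rank the states, treat terminal states as sticky, and use an update timestamp to break ties within the same rank. This is a hedged illustration, not the article's own reducer. The `OpState` type, the `FAILED` state, the rank values, and the `update_time` field are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumed state set; only PENDING, RUNNING, and SUCCEEDED appear in the text.
RANK = {"PENDING": 0, "RUNNING": 1, "SUCCEEDED": 2, "FAILED": 2}
TERMINAL = {"SUCCEEDED", "FAILED"}


@dataclass(frozen=True)
class OpState:
    state: str
    update_time: datetime  # assumed to come from the operation's update timestamp


def apply_monotonic(current: Optional[OpState], incoming: OpState) -> OpState:
    """Return the state to store; never moves an operation backward."""
    if current is None:
        return incoming
    if current.state in TERMINAL:
        return current  # terminal states are sticky
    cur_rank, new_rank = RANK[current.state], RANK[incoming.state]
    if new_rank < cur_rank:
        return current  # regression: drop
    if new_rank == cur_rank and incoming.update_time <= current.update_time:
        return current  # same rank but not newer: drop
    return incoming


def at(minute: int) -> datetime:
    return datetime(2026, 6, 27, 23, minute, tzinfo=timezone.utc)


def replay(events):
    state = None
    for event in events:
        state = apply_monotonic(state, event)
    return state


pending = OpState("PENDING", at(0))
running = OpState("RUNNING", at(1))
succeeded = OpState("SUCCEEDED", at(5))

in_order = replay([pending, running, succeeded])
out_of_order = replay([pending, succeeded, running])
print(in_order.state, out_of_order.state)  # SUCCEEDED SUCCEEDED
```

Because the function only compares the stored state with the incoming one, it can sit in front of any writer, whichever path the update arrived on.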


WHAT YOU'LL LEARN
You can now separate the 'a finished job went back to running' failure from duplication and loss, and treat it as the distinct hazard it is
You'll get a copy-paste monotonic-apply reducer that uses a state-rank lattice and updateTime fencing to drop only the updates that move backward
You can route both your webhook handler and your reconciliation poller through one reducer so a terminal state can never be clobbered again
