GEMINI LAB
API / SDK · 2026-06-13 · Advanced

The Morning Gemini Generated Fine but the Publish Crashed — A 'Generation Outbox' So Expensive Output Is Never Lost

Generation succeeds, then the process dies right before publishing. The expensive output is gone, and you pay for the same generation again. Here is a 'generation outbox' that persists the output first and turns publishing into an idempotent follow-up, plus what it did for me during the June outage.



It was the morning of that big June 2026 outage (error 1076 / 1099). As an indie developer I run automated publishing pipelines across six sites, and when I read the logs I froze. Gemini's generation itself was going through fine, but the publish step right after it (a git push) was dying along with the network.

The job exited with an error, and the scheduler dutifully retried. A retry starts over from generation. So it threw away the output I had already paid for and was holding in memory, and called Gemini again with the same prompt. During the outage, that happened three times.

I wrote about preventing duplicate generations earlier, in Idempotency Key Design for the Gemini API. Today is about the failure hiding behind it: generation succeeds, but you lose the result before you can publish it. Even in a small indie pipeline, this quietly and reliably burns money.

Why "generated but not published" is the costliest failure

Think of the pipeline as two stages: generate, then publish. A run can end in one of three ways.

It crashes before generation. Nothing is lost; a retry simply starts over cleanly.

Both generation and publish succeed. That is the happy path.

The problem is the third case: generation succeeds and the crash lands before publishing. You have already been charged by Gemini. The most valuable intermediate artifact — the output — is sitting in memory, and if you never persisted it, it vanishes when the process exits. The retry regenerates, so you buy the same tokens again.
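The third case can be reproduced in a few lines. This is a minimal simulation, not the pipeline's real code: `naive_job`, `run_with_scheduler_retries`, `simulate`, and `PublishError` are hypothetical names, and the generator and publisher are fakes standing in for the Gemini call and the git push. It only counts how many times generation runs when the publish step fails and the scheduler restarts the job from the top.

```python
class PublishError(Exception):
    """Stand-in for a failed git push or network error (hypothetical)."""


def naive_job(generate, publish, prompt):
    """Generate, then publish, holding the output only in memory."""
    output = generate(prompt)  # the billed call
    publish(output)            # if this raises, `output` is lost when the job exits


def run_with_scheduler_retries(job, max_attempts):
    """Rerun the whole job from the top on failure, as a simple scheduler would."""
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            return attempt
        except PublishError:
            continue
    raise PublishError("all attempts failed")


def simulate(publish_failures):
    """Return how many times generation ran before a publish finally succeeded."""
    generation_calls = 0
    remaining_failures = publish_failures

    def fake_generate(prompt):
        nonlocal generation_calls
        generation_calls += 1
        return f"article for: {prompt}"

    def flaky_publish(output):
        nonlocal remaining_failures
        if remaining_failures > 0:
            remaining_failures -= 1
            raise PublishError("push failed")

    run_with_scheduler_retries(
        lambda: naive_job(fake_generate, flaky_publish, "daily post"),
        max_attempts=publish_failures + 1,
    )
    return generation_calls


if __name__ == "__main__":
    print(simulate(publish_failures=2))
```

Every failed publish adds one more generation call, even though the first output was perfectly good.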

The longer the input prompt, the more this hurts. My article-generation job carries reference data and prior-article context, so input averages around 11,000 tokens and output around 3,500. If only the publish step keeps failing and you retry three times, generation cost roughly triples. That is small for one article, but across six sites every day, a few hours of outage adds up fast.
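To make the arithmetic concrete, here is a small sketch that works in tokens only, since the article quotes no prices. The averages come from the paragraph above; the function names are hypothetical, and it assumes every attempt reruns generation with the same prompt. It treats "roughly triples" as three generation calls in total; if the first run is counted separately from three retries, pass 4 instead.

```python
INPUT_TOKENS_PER_ARTICLE = 11_000   # average quoted above
OUTPUT_TOKENS_PER_ARTICLE = 3_500   # average quoted above


def tokens_billed(generation_calls):
    """Tokens billed when one article is generated `generation_calls` times."""
    return {
        "input": INPUT_TOKENS_PER_ARTICLE * generation_calls,
        "output": OUTPUT_TOKENS_PER_ARTICLE * generation_calls,
    }


def tokens_wasted(generation_calls):
    """Tokens spent on every call after the first, which was already usable."""
    return tokens_billed(max(generation_calls - 1, 0))


if __name__ == "__main__":
    print(tokens_billed(3))   # {'input': 33000, 'output': 10500}
    print(tokens_wasted(3))   # {'input': 22000, 'output': 7000}
```

Under these assumptions, one article that should have cost 11,000 input and 3,500 output tokens ends up billing 33,000 and 10,500, and two thirds of that bought nothing new.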

The shift — put the output in an "outbox" before sending

The fix borrows the outbox pattern that backend systems have used for a long time.

Instead of writing an email and sending it immediately, you save it to a drafts folder first, then hand it to the sending process. If sending fails, the draft is still there — you never rewrite the body.

Apply that to the Gemini pipeline. The moment generation succeeds, write the output itself to a durable store before publishing. Publishing becomes a separate, independent step that only pulls items out of the box and ships them.

Now the roles are cleanly split. The generation phase's job ends at "safely place the expensive output into the box." The publish phase's job is "ship each item exactly once." Wherever a crash lands, output that made it into the box is never bought twice.
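The split described above might look roughly like this with Python's built-in `sqlite3` module. Treat it as a sketch under stated assumptions, not the article's actual implementation: the table layout, the function names, and the choice to fingerprint the prompt alone with SHA-256 are all illustrative. A real job would likely fold the date or a slug into the fingerprint, and `generate` and `publish` are callables you supply.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per generated output, keyed by fingerprint.
SCHEMA = """
CREATE TABLE IF NOT EXISTS outbox (
    fingerprint TEXT PRIMARY KEY,
    output      TEXT NOT NULL,
    published   INTEGER NOT NULL DEFAULT 0
)
"""


def open_outbox(path=":memory:"):
    """Open (or create) the outbox; use a file path for real durability."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def fingerprint(prompt):
    """Assumption: the prompt alone identifies a unit of work."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def generate_into_outbox(conn, prompt, generate):
    """Generation phase: call the model only if no stored output exists."""
    key = fingerprint(prompt)
    row = conn.execute(
        "SELECT 1 FROM outbox WHERE fingerprint = ?", (key,)
    ).fetchone()
    if row is None:
        output = generate(prompt)  # the billed call
        with conn:  # commit before anything tries to publish
            conn.execute(
                "INSERT OR IGNORE INTO outbox (fingerprint, output) VALUES (?, ?)",
                (key, output),
            )
    return key


def publish_pending(conn, publish):
    """Publish phase: ship every unpublished row, then mark it published."""
    rows = conn.execute(
        "SELECT fingerprint, output FROM outbox WHERE published = 0"
    ).fetchall()
    shipped = 0
    for key, output in rows:
        # The fingerprint doubles as an idempotency key for the publisher.
        publish(key, output)  # if this raises, the row stays pending
        with conn:
            conn.execute(
                "UPDATE outbox SET published = 1 WHERE fingerprint = ?", (key,)
            )
        shipped += 1
    return shipped
```

A crash between `publish` returning and the `UPDATE` committing would leave the row pending, so the next run publishes it again. That is why the fingerprint is handed to the publisher: it lets the publisher recognize and drop the repeat, which is what makes "exactly once" achievable in practice.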

WHAT YOU'LL LEARN
  • Prevent the most wasteful failure mode, 'generation succeeded, publish failed', with a single table that durably stores the output first, shown in complete SQLite and Python code
  • Pass the fingerprint to the publisher as an idempotency key so retries never double-post, with a walk-through of exactly what happens at each crash point
  • See, from real measurements of about 11,000 input and 3,500 output tokens per article, how three retries triple your generation cost, and how much the outbox claws back

Related Articles

API / SDK · 2026-05-23
Idempotency Key Design for the Gemini API: Patterns I Use to Prevent Duplicate Generation Across Six Sites
After five months of running six AI-driven sites in parallel, I built an idempotency layer in front of the Gemini API to neutralize retry storms. This deep dive shares the SHA-256 + Cloudflare Workers KV design, the operational numbers behind it, and the four gotchas that only surface in production.
API / SDK · 2026-03-25
Building a Prompt Evaluation & Optimization Pipeline with Gemini API — Automated Quality Scoring with LLM-as-Judge
Learn how to build a prompt evaluation pipeline using Gemini API. Covers the LLM-as-Judge pattern, A/B testing prompts, automated quality scoring, and cost-quality optimization for production systems.
API / SDK · 2026-06-15
Defending Against Prompt Injection When You Pass External Text to the Gemini API
User reviews, scraped articles, and other untrusted text are the entry point for indirect prompt injection when you feed them to the Gemini API. Here is a prioritized, code-backed defense you can drop into a production pipeline: trust-boundary isolation, schema constraints, a two-stage screening pass, and output sanitization.