GEMINI LAB
Advanced · 2026-06-18

Restarting a Long Agent Run From Where It Broke — A Step-Ledger Design for Gemini 3.5 Flash Long-Horizon Tasks

Gemini 3.5 Flash is good at long-horizon tasks, but when a 40-step run dies on step 29, you usually start over. An append-only step ledger gives you resume, idempotency, and audit in one place. Here is the design with working Python and measured results.

It died on step 29.

It was a late night, with Gemini 3.5 Flash handling a long pipeline for me: collecting article candidates, summarizing, dedup checks, drafting, and tidying image metadata — about 40 small steps packed into a single run. It sailed through 28 steps, and on step 29 an external API returned a 504 and took the whole process down with it.

The painful part is that a restart throws away 28 steps of reasoning, along with everything I had already been billed for them. As an indie developer, I went through this "start over from scratch" routine a few times on my own projects before I finally sat down and built a foundation that could resume.

Today, June 18, Gemini 3.5 Flash became available, described as able to stay useful across long-horizon, multi-step tasks. And yes — the amount of work I can pack into a single run has clearly grown. But the longer a run gets, the larger the loss when it dies midway. To make the most of a model that runs long, you need running gear built for long runs.

The piece of running gear I settled on is an unglamorous thing: a step ledger.

The Three Moments a Long Run Breaks

First, where does it break? Once a run stretches past 30 steps, failures kept arriving in exactly three shapes.

The first is interruption from the outside: API timeouts, rate limits, a deploy restarting the process. You cannot avoid these.

The second is duplicate execution on resume. If you restart sloppily from the failure point, you re-post a draft you already posted, or re-run a generation you already paid for. This is nastier than the interruption itself, because it needs cleanup.

The third is being unable to trace the cause. When you later want to know why the model made a certain decision on step 19, there is nothing to go on if it scrolled past on stdout and vanished.

A step ledger absorbs all three with one mechanism. It records each step's input, output, and decision into an append-only log. Because the record exists, you can resume; because the record carries an idempotency key, you avoid double execution; because the record stays, you can audit.
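To make the resume behavior concrete, here is a minimal sketch, assuming the ledger is a JSON Lines file and each step is a plain Python callable that returns a reference to its output. The function names (append, done_steps, run_pipeline) and the file format are illustrative assumptions, not the implementation from the rest of this article.

```python
import json
import os


def append(ledger_path, record):
    """Append one JSON record; existing lines are never rewritten."""
    with open(ledger_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def done_steps(ledger_path, run_id):
    """Return the step_ids already marked done for this run."""
    if not os.path.exists(ledger_path):
        return set()
    done = set()
    with open(ledger_path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec["run_id"] == run_id and rec["status"] == "done":
                done.add(rec["step_id"])
    return done


def run_pipeline(ledger_path, run_id, steps):
    """Run (step_id, callable) pairs in order, skipping steps already done.

    Returns the list of step_ids that actually executed in this attempt.
    """
    finished = done_steps(ledger_path, run_id)
    executed = []
    for step_id, fn in steps:
        if step_id in finished:
            continue  # resume: this step succeeded in an earlier attempt
        append(ledger_path, {"run_id": run_id, "step_id": step_id,
                             "status": "started"})
        try:
            output_ref = fn()  # the step returns a pointer, not the body
        except Exception:
            append(ledger_path, {"run_id": run_id, "step_id": step_id,
                                 "status": "failed"})
            raise
        append(ledger_path, {"run_id": run_id, "step_id": step_id,
                             "status": "done", "output_ref": output_ref})
        executed.append(step_id)
    return executed
```

Calling run_pipeline a second time with the same run_id after a crash skips every step that already has a done record and picks up at the one that failed.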

The Minimal Ledger Schema

One row of the ledger maps to one step. These are the only columns I settled on in production.

Column | Role
run_id | Identifies the whole run. On resume, pass the same run_id
step_id | A stable name within the pipeline, derived deterministically like "summarize:article-42"
idem_key | Idempotency key for the side effect, derived from the input hash
status | One of started / done / failed
output_ref | Where the artifact lives (a path or KV key). The body never goes in the ledger
created_at | Append time, used for ordering and audit
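One way to derive step_id and idem_key deterministically is sketched below. The helper names (make_step_id, make_idem_key) are hypothetical, and so are the choices of SHA-256, canonical JSON with sorted keys, and mixing the step name into the hash; the ledger only requires that the same input always yields the same key.

```python
import hashlib
import json


def make_step_id(action, target):
    """Build a stable, readable step name such as 'summarize:article-42'."""
    return f"{action}:{target}"


def make_idem_key(step_id, step_input):
    """Hash the step name plus its canonicalized input into a hex key.

    Sorting keys and fixing separators means two dicts with the same
    content always serialize to the same bytes, whatever their key order.
    """
    canonical = json.dumps(step_input, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    payload = f"{step_id}\n{canonical}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


step_id = make_step_id("summarize", "article-42")
key_a = make_idem_key(step_id, {"url": "https://example.com/a", "max_words": 120})
key_b = make_idem_key(step_id, {"max_words": 120, "url": "https://example.com/a"})
key_c = make_idem_key(step_id, {"url": "https://example.com/a", "max_words": 200})
# key_a and key_b match (same input, different key order); key_c differs.
```

This sketch leaves run_id out of the hash on purpose, so an identical side effect is recognized even across separate runs. If each run should be allowed to repeat the side effect, include run_id in the payload instead.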

What matters is keeping the artifact body out of the ledger. The ledger is only an index of "what happened"; heavy bodies live elsewhere and the ledger just points at them with output_ref. Kept this way, the ledger stays light as it grows, and reads stay instant even at thousands of rows.
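The schema and the "pointer, not body" rule can be sketched with SQLite from the Python standard library. The table name step_ledger, the helper names, and the choice of local files for artifacts are assumptions for illustration; a KV store would serve equally well as the target of output_ref.

```python
import sqlite3
import time
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS step_ledger (
    run_id     TEXT NOT NULL,
    step_id    TEXT NOT NULL,
    idem_key   TEXT NOT NULL,
    status     TEXT NOT NULL CHECK (status IN ('started', 'done', 'failed')),
    output_ref TEXT,
    created_at REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_ledger_idem ON step_ledger (idem_key, status);
"""


def open_ledger(db_path):
    """Open (or create) the ledger database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, run_id, step_id, idem_key, status, output_ref=None):
    """Append one row; this sketch never updates or deletes rows."""
    conn.execute(
        "INSERT INTO step_ledger VALUES (?, ?, ?, ?, ?, ?)",
        (run_id, step_id, idem_key, status, output_ref, time.time()),
    )
    conn.commit()


def save_artifact(artifact_dir, idem_key, body):
    """Write the heavy body to disk and return the path for output_ref."""
    path = Path(artifact_dir) / f"{idem_key}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return str(path)


def find_done(conn, idem_key):
    """Return the output_ref of a completed step with this key, or None."""
    row = conn.execute(
        "SELECT output_ref FROM step_ledger "
        "WHERE idem_key = ? AND status = 'done' "
        "ORDER BY created_at DESC LIMIT 1",
        (idem_key,),
    ).fetchone()
    return row[0] if row else None
```

Before firing a side effect, a step can call find_done with its idem_key and reuse the stored output_ref when one comes back, which is what prevents a paid generation from running twice. For audit, selecting the rows of one run_id ordered by created_at gives the step-by-step history.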

WHAT YOU'LL LEARN
Take home a copy-paste step ledger that lets a 30-60 step agent run resume from the last successful step instead of restarting from zero
Learn how to build idempotency keys that stop the same side effect (API billing, file generation, external posting) from firing twice on resume
See how the same ledger doubles as an audit log of what the model decided at each step, which cut my incident triage time by roughly 70% in practice