API / SDK | 2026-06-28 | Advanced

The Morning a Managed Agent Stalled and Left No Trace — Building a Run-Observability Layer Outside the Sandbox

With Gemini Managed Agents, the sandbox lives on Google's side, so when a run stalls there is nothing left in your own logging stack. This is a working TypeScript design for an outside observability layer that taps stream events into a ledger, detects silent stalls, and folds runs into readable postmortems.

Tags: Gemini API, Managed Agents, Observability, Reliability, Cloudflare Workers, TypeScript


One morning, a process I run autonomously overnight had produced no output. The schedule log showed a "started" entry but no "completed." I only noticed more than twenty minutes later, when the next scheduled check ran — and by then, nothing remained anywhere to explain what had happened.

When I tried to track down the cause, I hit a wall. The work itself runs inside Gemini's Managed Agents, that is, inside Google's isolated sandbox. On my side I had only the one line that launched the agent and the receiver that was supposed to take the result. My usual logging stack records only what happens inside my own process, so it had nothing to say about the moment the run went quiet inside the sandbox.

This article presents the design I landed on for that "stalls quietly" failure: an observability layer built outside the sandbox so a run can be traced after the fact. I describe only the parts that actually held up, from the perspective of an indie developer running nightly batches unattended.

What "stalling quietly" actually looks like

A failure that throws is the easier kind. An exception is thrown, a stack trace is left behind, an alert fires. The nasty one is the failure where no exception appears and progress simply stops.

Managed Agents run planning, tool calls, and code execution inside the sandbox. Somewhere in there, an external API times out silently, the agent falls into a loop redoing the same move, or it waits forever on a tool result that never returns. When that happens, the operation transitions to neither "failed" nor "succeeded" — progress just stops arriving. From the receiver's side, all you can see is "not done yet."

In my setup, this silent stall going unnoticed until morning was the real pain. If it crashes, I can retry; but something that stops and never returns sits there until a human thinks "that's odd." The first requirement I put on the observability layer was not flashy visualization — it was simply this: don't let a silent stall stay silent.
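That requirement can be sketched as a small pure function. This is a minimal illustration, not the article's implementation: the `RunProgress` shape, the `checkStall` name, and the 60-second threshold are all assumptions made for the example, and none of them come from a Gemini SDK.

```typescript
// Hypothetical record of what the outside observer knows about one run.
interface RunProgress {
  runId: string;
  lastProgressAt: number; // epoch ms of the most recent externally visible event
  terminal: boolean;      // true once a final "succeeded" or "failed" state was seen
}

type StallVerdict = "running" | "stalled" | "finished";

// Classify a run by how long it has been idle since its last visible progress.
function checkStall(
  run: RunProgress,
  now: number,
  idleThresholdMs: number,
): StallVerdict {
  if (run.terminal) return "finished";
  const idleMs = now - run.lastProgressAt;
  return idleMs > idleThresholdMs ? "stalled" : "running";
}

// Usage: a run whose last event was 90 seconds ago, checked against an
// illustrative 60-second threshold (the value is an assumption).
const nightly: RunProgress = { runId: "nightly-1", lastProgressAt: 0, terminal: false };
const verdict = checkStall(nightly, 90_000, 60_000); // "stalled"
```

The point of keeping it pure is that the same check can run from a cron trigger, a test, or a replay over old ledger data, with the clock passed in rather than read inside the function.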

Why your usual logging stack can't follow it

LLM observability tools — distributed tracing, error collection, and the like — are all built on the premise that you measure inside your own process. You hook in right before a request goes out and right after it comes back, and record that span. That works perfectly for a self-hosted agent loop.

But with Managed Agents, the body of the loop lives inside the sandbox. My code only launches it, receives the stream, and takes the final metadata. In other words, I have exactly two points where measurement is possible: the flow of events visible from outside, and the metadata available after it ends. I cannot insert my own tracer into each step inside the sandbox.
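The first of those two measurement points, the event stream, can be tapped without changing how the receiver consumes it. The sketch below assumes only that the stream is exposed as an `AsyncIterable`; the `tapEvents` and `Recorder` names are hypothetical, and the event type is left generic because the actual event shape is not specified here.

```typescript
// Hypothetical callback that persists one observed event with its arrival time.
type Recorder<T> = (event: T, observedAt: number) => void | Promise<void>;

// Wrap any async event stream so that every event is recorded before it is
// handed on to the original consumer. The clock is injectable for testing.
async function* tapEvents<T>(
  source: AsyncIterable<T>,
  record: Recorder<T>,
  now: () => number = Date.now,
): AsyncGenerator<T, void, undefined> {
  for await (const event of source) {
    // Record first, so an event is in the ledger even if the consumer
    // fails while handling it.
    await record(event, now());
    yield event;
  }
}
```

A consumer would iterate `tapEvents(stream, record)` exactly as it iterated `stream` before. In this sketch a failing `record` call propagates to the consumer; whether to catch it instead is a design choice left open.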

Accept that constraint and the design direction settles itself. Give up on peering inside; instead, record every externally visible event without dropping any, and reconstruct the run from that accumulation. You move the subject of observation from "the process" to "the ledger of events."
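To make "the ledger of events" concrete, here is a minimal in-memory sketch of an append-only ledger and a summary rebuilt purely from its entries. The `LedgerEntry` fields, the `RunLedger` class, and the event kinds are illustrative assumptions; a `Map` stands in for whatever key-value store actually backs the ledger.

```typescript
// Hypothetical ledger row: one externally visible event of one run.
interface LedgerEntry {
  runId: string;
  seq: number;    // position within the run, assigned at append time
  at: number;     // epoch ms when the event was observed from outside
  kind: string;   // illustrative label, e.g. "started" or "tool_call"
  detail: string; // free-form note; empty when there is nothing to add
}

// Append-only ledger. A Map is used here in place of a real KV store.
class RunLedger {
  private readonly entries = new Map<string, LedgerEntry[]>();

  append(runId: string, kind: string, at: number, detail = ""): LedgerEntry {
    const list = this.entries.get(runId) ?? [];
    const entry: LedgerEntry = { runId, seq: list.length, at, kind, detail };
    list.push(entry);
    this.entries.set(runId, list);
    return entry;
  }

  read(runId: string): readonly LedgerEntry[] {
    return this.entries.get(runId) ?? [];
  }
}

interface RunSummary {
  eventCount: number;
  lastKind: string | null;
  lastAt: number | null;
  longestGapMs: number; // longest silence between two consecutive events
}

// Reconstruct a coarse picture of the run from ledger entries alone.
function summarize(entries: readonly LedgerEntry[]): RunSummary {
  let longestGapMs = 0;
  for (let i = 1; i < entries.length; i++) {
    longestGapMs = Math.max(longestGapMs, entries[i].at - entries[i - 1].at);
  }
  const last = entries.length > 0 ? entries[entries.length - 1] : null;
  return {
    eventCount: entries.length,
    lastKind: last?.kind ?? null,
    lastAt: last?.at ?? null,
    longestGapMs,
  };
}
```

For a stalled run, `lastKind` and `lastAt` say where and when the run went quiet, which is exactly the information the in-process logging stack could not provide.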


WHAT YOU'LL LEARN

  • A run-ledger KV schema and append-per-step implementation that reconstructs a run purely from stream events and final metadata, assuming you can never enter the sandbox
  • How to set the silent-stall threshold from idle time since last progress, plus the measured drop in time to detection in my nightly runs, from ~21 minutes to ~80 seconds
  • Code that normalizes failures into seven classes, and a table that separates expected failures from the ones you actually need to chase