GEMINI LAB
API / SDK | 2026-06-13 | Intermediate

Should You Move Your Agent Loop to Gemini's Managed Agents? Three Questions That Decide What Migrates

With Gemini API's Managed Agents in public preview, deciding between a self-hosted agent loop and a Google-hosted sandbox is now a real question. Three questions — execution environment, state ownership, and failure recovery — decide what migrates and what stays.

Tags: gemini-api, managed-agents, ai-agents, automation, architecture, google-io-2026

Managed Agents, announced at Google I/O 2026, are now available in public preview on the Gemini API. The pitch is appealing: a single API call spins up an agent inside a Google-hosted, isolated Linux sandbox, where it reasons, calls tools, executes code, and hands you back the result.

I run a fair amount of automation as an indie developer — scheduled blog maintenance, image-asset housekeeping, and similar background jobs — all driven by agent loops I wrote and operate myself. My honest first reaction to the announcement was an even split of hope and suspicion. Hope, because if something else can carry the tedious parts of running a loop, I will gladly let it. Suspicion, because moving automation that already works is one of the more reliable ways to hurt yourself.

So I went through the jobs running on my machines, one by one, asking a single question: could this move to Managed Agents, and should it? The short answer is that not everything made the cut — but the reasoning collapsed neatly into three questions. What follows is that working-through, written down.

Your agent loop is mostly not the loop

The agent loop itself is surprisingly little code. Call the model; if it returns a function call, run the matching function; feed the result back; repeat. The skeleton fits in a few dozen lines.

Here is a minimal loop with exactly one tool, a release-notes checker. Gemini calls the tool when it needs to, and the loop ends once a final text report comes back.

from google import genai
from google.genai import types
 
client = genai.Client()  # reads GEMINI_API_KEY from the environment
 
def check_release_notes(product: str) -> dict:
    """Returns the latest release-notes entry (a real version would hit an RSS feed or DB)."""
    return {"product": product, "latest": "1.4.2", "breaking_changes": False}
 
tool = types.Tool(function_declarations=[
    types.FunctionDeclaration(
        name="check_release_notes",
        description="Fetches the latest release-notes entry for a product name",
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={"product": types.Schema(type=types.Type.STRING)},
            required=["product"],
        ),
    )
])
 
contents = [types.Content(
    role="user",
    parts=[types.Part(text="Check whether the latest release of dependency foo contains breaking changes, and report in one paragraph")],
)]
 
for _ in range(5):  # hard cap to prevent runaway loops
    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=contents,
        config=types.GenerateContentConfig(tools=[tool]),
    )
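    if not response.function_calls:
        print(response.text)  # no tool call requested: this is the final report
        break
    # Keep the model's function-call turn in the history before answering it.
    contents.append(response.candidates[0].content)
    # Answer every call from this turn in one Content, one part per call,
    # so the reply turn mirrors the call turn even when calls arrive in parallel.
    parts = [
        types.Part.from_function_response(
            name=call.name, response=check_release_notes(**call.args)
        )
        for call in response.function_calls
    ]
    contents.append(types.Content(role="user", parts=parts))
else:
    # Runs only if the cap was reached without a final text answer.
    print("Stopped after 5 iterations without a final report")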

I kept the skeleton deliberately bare to make a point: in production, everything that matters accretes around it. Retries with exponential backoff. Logging and persistence of intermediate progress. Keeping the execution environment alive — cron, containers, whatever you use. Credential management. Timeouts and protection against overlapping runs. In my own codebase, the operational layer around the loop is considerably larger than anything related to the agent's actual thinking.
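To make two items from that list concrete, here is a minimal sketch of retries with exponential backoff and a lock file that guards against overlapping runs. It is illustrative rather than production code: the names `retry_with_backoff` and `single_run_lock`, the delay values, and the lock-file approach are all assumptions chosen for the example, and it uses only the standard library.

```python
import os
import random
import time
from contextlib import contextmanager


def retry_with_backoff(fn, attempts=4, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n (plus up to 10% jitter) and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


@contextmanager
def single_run_lock(path):
    """Refuse to start while another run holds the lock file at `path`.

    Assumption: a crashed run can leave a stale lock behind, so a real
    version would also need a staleness check (for example on file age).
    """
    try:
        # O_EXCL makes creation atomic: it fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another run holds {path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        yield
    finally:
        os.remove(path)
```

In use, the whole job body would sit inside `with single_run_lock(...)`, and each model call inside `retry_with_backoff(lambda: ...)`. Even this toy version is already comparable in size to the loop it protects.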

I covered that production scaffolding in detail in Custom Gemini API Agent Loop Without ADK — A Complete Production Guide to Tool Calling, Memory, and Parallel Execution, but the one-line summary is: the loop is easy, the operations are the product. That is the baseline for everything below.

Which part of that operational layer do Managed Agents actually absorb?

Reading through what Google has published, the scope of Managed Agents looks like this. They provision and tear down the execution environment — the isolated Linux sandbox. They drive the loop itself: reason, execute tools and code, continue, repeat. And they hold the agent's state while it runs. In other words, out of the operational layer I just called "the product," the environment upkeep and the loop orchestration move wholesale to the other side of the API.

What stays on your side is just as clear. Defining the task — what you actually want done. Receiving the result and judging whether it is correct. Handling failure. Watching cost. You can delegate how the agent runs; you cannot delegate why it runs or what happens with what it produces.

Looking at that boundary, what struck me was not the freedom from cron management. It was the fact that verification and failure handling stay with me — because in my experience, the time sink in operating automation has never been keeping environments alive. It is investigating things when they break. That realization is why I stopped asking "is this convenient?" and started asking the three questions below.

Question 1 — Does the execution environment itself carry meaning?

The first question is where the job actually needs to run.

A good number of my jobs read and write files in a local workspace: walking a wallpaper app's image-asset folder to classify new pieces, cloning a repository to push articles, appending run logs to a known location. For these, the fact that they run on infrastructure I control is the point. Moving them into a sandbox means redesigning credential handling and file transfer, and by my estimate the migration cost exceeds the payoff.

The opposite kind of job — pass input in, investigate or transform, return output — runs the same anywhere. Monitoring public release notes. Researching and summarizing information from the web. Transforming data into reports. These fit the Managed Agents sandbox naturally, and for transformations that involve code execution, the isolation is an upgrade in its own right. Running model-generated code directly on your own machine is, frankly, something I have never been fully relaxed about.

Question 2 — Who owns the canonical copy of state?

The second question is about ownership of the agent's state.

Managed Agents support stateful agents, and having the sandbox carry conversational and working context across a long multi-step task is a real advantage. But I am not ready to hand the canonical copy of any state to a service in public preview. Preview APIs change. We just lived through an example: the Interactions API removed its legacy schema, forcing a migration from outputs to steps on short notice.

So my rule ended up simple. If state is ephemeral — meaningful only while the task runs — let the sandbox hold it. If state is an asset — referenced by future runs or by other jobs — the canonical copy lives in my own database or files, and the agent receives only what it needs, per run. Hold that line, and even if the Managed Agents surface shifts underneath you, the blast radius is one in-flight task.
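As a sketch of that rule, assuming a local SQLite file as the canonical store: the table name `seen_releases`, its columns, and the helper names below are hypothetical, and nothing here touches the Managed Agents API. The point is only the direction of data flow: a small per-run payload goes out, and the durable result is written back on my side.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the canonical store, a database the operator owns."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS seen_releases (product TEXT PRIMARY KEY, version TEXT)"
    )
    return db


def build_run_input(db, product):
    """Extract only what this run needs; the agent never sees the whole store."""
    row = db.execute(
        "SELECT version FROM seen_releases WHERE product = ?", (product,)
    ).fetchone()
    return json.dumps({"product": product, "last_seen_version": row[0] if row else None})


def record_result(db, product, version):
    """Write the durable outcome back to the canonical copy after the run."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO seen_releases (product, version) VALUES (?, ?)",
            (product, version),
        )
```

If the agent side loses its state mid-task, the only thing missing is one `record_result` call, and the next run starts from the same canonical row.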

Question 3 — When it fails, who can pick up the pieces, and at what granularity?

The third question was the decisive one for me: recoverability on failure.

A self-hosted loop fails at a granularity you can see. Which tool call, with which arguments, returned what before things stopped — you can trace as deeply as you bothered to log, and you are free to build resume-from-midpoint logic. During Gemini's major outage earlier this month, my pipelines survived on a staged retreat: retry, fall back, and if all else failed, write a log entry and exit quietly. That was only possible because I owned a hook at every stage of the loop.
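The staged retreat can be sketched roughly as follows. This is a simplified illustration, not the code from my pipelines: `staged_retreat`, its parameters, and the logger name are assumptions, and `primary` and `fallback` stand in for whatever two ways of doing the work you have.

```python
import logging

log = logging.getLogger("pipeline")


def staged_retreat(primary, fallback, retries=2):
    """Try `primary` up to `retries` times, then `fallback` once, then give up quietly.

    Returns the first successful result, or None if every stage failed.
    """
    for attempt in range(1, retries + 1):
        try:
            return primary()
        except Exception as exc:
            log.warning("primary attempt %d/%d failed: %s", attempt, retries, exc)
    try:
        return fallback()
    except Exception as exc:
        # Last stage: leave a log entry and let the job exit without raising.
        log.error("fallback failed, skipping this run: %s", exc)
        return None
```

Each `except` here is one of the hooks in question: a place where I decide what a failure means. A caller that gets `None` back simply skips its remaining steps for this run.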

Move a job into Managed Agents and your granularity is whatever the API exposes. How much of the agent's internal steps will become observable over time is something I genuinely hope improves, but as a design assumption, "you cannot intervene in intermediate state" is the safe stance today. Which narrows the candidates to jobs that can simply be rerun from scratch — idempotent work. Anything that emits side effects as it goes, where a half-finished run leaves value or risk behind, stays in the self-hosted loop where I hold the recovery tools.

One related note: the more "submit and wait" workloads you adopt — and Managed Agents are exactly that shape — the more your completion-notification design matters. I wrote about replacing polling with event-driven completion handling in Retiring the Midnight Polling Loop — Rebuilding My Gemini Batch Monitoring Around Webhooks, which pairs well with this discussion.

A small habit for keeping preview services next to production

Even for jobs that pass all three questions, I keep one thin layer of protection in place while Managed Agents remain in preview. Nothing elaborate: the Managed Agents call lives inside a small wrapper function with exactly two responsibilities. First, a fallback — if the call fails, the same task runs through the existing self-hosted loop. Second, schema validation on whatever comes back.

This buys two things. When preview-stage breaking changes arrive, the affected surface is one wrapper file. And because the wrapper can route the same task to either implementation, comparing output quality between Managed Agents and my own loop happens in one place. On cost, I also keep preview-period spending tagged separately from everything else, so any surprising billing pattern shows up early.
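A minimal sketch of such a wrapper might look like this. Because the Managed Agents surface is in preview, the sketch does not call it: `managed` and `self_hosted` are injected callables, and `REQUIRED_FIELDS` is a hypothetical result shape invented for the example. One design choice here is my own assumption: a response that fails validation is treated like a failed call and triggers the fallback.

```python
from typing import Callable

# Hypothetical result shape for a release-notes summary task.
REQUIRED_FIELDS = {"summary": str, "breaking_changes": bool}


def validate(result: object) -> dict:
    """Reject anything that is not a dict carrying the expected typed fields."""
    if not isinstance(result, dict):
        raise ValueError(f"expected dict, got {type(result).__name__}")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(result.get(field), expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    return result


def run_task(
    task: str,
    managed: Callable[[str], object],
    self_hosted: Callable[[str], object],
) -> dict:
    """Try the managed backend first; fall back to the self-hosted loop on failure."""
    try:
        result = validate(managed(task))
        return {**result, "backend": "managed"}
    except Exception:
        # Covers both call failures and schema drift in the preview response.
        result = validate(self_hosted(task))
        return {**result, "backend": "self_hosted"}
```

The `backend` key records which implementation produced each result, which is what makes side-by-side quality comparison and separate cost tagging straightforward.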

Start by picking the one sandbox-shaped job

When the dust settled, my first migration candidate was "monitor public release notes and summarize." It does not depend on my environment (question 1), its state is disposable every run (question 2), and a failed run can simply be repeated (question 3). It passes all three questions cleanly — almost a textbook case.

If you operate your own agent loops or scheduled jobs, the next step I would suggest is unglamorous: write the three answers next to each job on your list. This is not an all-or-nothing migration. Find the one job that clearly belongs in a sandbox, run it in parallel with your existing loop, and let the comparison teach you the rest. Evaluating new infrastructure without breaking working automation is, in the end, mostly about restraint.
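If a spreadsheet feels too loose, the same exercise fits in a few lines of code. This is a sketch under my own naming: `Job`, its fields, and `reasons_to_stay` are hypothetical, the first job's answers come from the release-notes case above, and the second job's answers are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    needs_local_environment: bool      # Question 1
    state_is_asset: bool               # Question 2
    safe_to_rerun_from_scratch: bool   # Question 3


def reasons_to_stay(job: Job) -> list[str]:
    """Return why a job should stay self-hosted; an empty list means it is a candidate."""
    reasons = []
    if job.needs_local_environment:
        reasons.append("Q1: the execution environment carries meaning")
    if job.state_is_asset:
        reasons.append("Q2: state is an asset, keep the canonical copy local")
    if not job.safe_to_rerun_from_scratch:
        reasons.append("Q3: a failed run cannot simply be repeated")
    return reasons


release_monitor = Job("release-notes monitor", False, False, True)
local_job = Job("example local-workspace job", True, True, False)  # illustrative values

for job in (release_monitor, local_job):
    verdict = reasons_to_stay(job) or ["candidate for the sandbox"]
    print(f"{job.name}: {'; '.join(verdict)}")
```

A Question 2 hit is not always a hard stop, since keeping the canonical copy on your side can resolve it, but listing it forces that decision to be made explicitly.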

I hope this helps if you are weighing the same decision about where your automation should live.
