Managed Agents, announced at Google I/O 2026, are now available in public preview on the Gemini API. The pitch is appealing: a single API call spins up an agent inside a Google-hosted, isolated Linux sandbox, where it reasons, calls tools, executes code, and hands you back the result.
I run a fair amount of automation as an indie developer — scheduled blog maintenance, image-asset housekeeping, and similar background jobs — all driven by agent loops I wrote and operate myself. My honest first reaction to the announcement was an even split of hope and suspicion. Hope, because if something else can carry the tedious parts of running a loop, I will gladly let it. Suspicion, because moving automation that already works is one of the more reliable ways to hurt yourself.
So I went through the jobs running on my machines, one by one, asking a single question: could this move to Managed Agents, and should it? The short answer is that not everything made the cut — but the reasoning collapsed neatly into three questions. What follows is that working-through, written down.
Your agent loop is mostly not the loop
The agent loop itself is surprisingly little code. Call the model; if it returns a function call, run the matching function; feed the result back; repeat. The skeleton fits in about thirty lines.
Here is a minimal loop with exactly one tool, a release-notes checker. Gemini calls the tool when it needs to, and the loop ends once a final text report comes back.
```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment


def check_release_notes(product: str) -> dict:
    """Returns the latest release-notes entry (a real version would hit an RSS feed or DB)."""
    return {"product": product, "latest": "1.4.2", "breaking_changes": False}


# Declare the tool so the model knows its name, purpose, and argument schema.
tool = types.Tool(function_declarations=[
    types.FunctionDeclaration(
        name="check_release_notes",
        description="Fetches the latest release-notes entry for a product name",
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={"product": types.Schema(type=types.Type.STRING)},
            required=["product"],
        ),
    )
])

contents = [types.Content(
    role="user",
    parts=[types.Part(text="Check whether the latest release of dependency foo contains breaking changes, and report in one paragraph")],
)]

for _ in range(5):  # hard cap to prevent runaway loops
    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=contents,
        config=types.GenerateContentConfig(tools=[tool]),
    )
    if not response.function_calls:
        print(response.text)  # no tool request left: this is the final report
        break
    # Keep the model's tool request in the history, then answer each call.
    contents.append(response.candidates[0].content)
    for call in response.function_calls:
        result = check_release_notes(**call.args)
        contents.append(types.Content(
            role="user",
            parts=[types.Part.from_function_response(name=call.name, response=result)],
        ))
```

I kept the skeleton deliberately bare to make a point: in production, everything that matters accretes around it. Retries with exponential backoff. Logging and persistence of intermediate progress. Keeping the execution environment alive (cron, containers, whatever you use). Credential management. Timeouts and protection against overlapping runs. In my own codebase, the operational layer around the loop is considerably larger than anything related to the agent's actual thinking.
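To give a sense of what one piece of that operational layer looks like, here is a minimal sketch of retries with exponential backoff. It is an illustration under my own assumptions, not code from my pipelines: the helper name `retry_with_backoff` and its parameters are hypothetical, and it retries on any exception, where a real version would probably retry only on errors known to be transient.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=4, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn() until it succeeds or max_attempts is reached.

    The delay doubles after each failure (capped at max_delay), with random
    jitter so that parallel jobs do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what happens next
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

In the loop above, the natural place for it would be around the `generate_content` call, for example by passing a small lambda that makes that call. The `sleep` parameter exists only so the waiting can be swapped out in tests.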
I covered that production scaffolding in detail in Custom Gemini API Agent Loop Without ADK — A Complete Production Guide to Tool Calling, Memory, and Parallel Execution, but the one-line summary is: the loop is easy, the operations are the product. That is the baseline for everything below.
Which part of that operational layer do Managed Agents actually absorb?
Reading through what Google has published, the scope of Managed Agents looks like this. They provision and tear down the execution environment — the isolated Linux sandbox. They drive the loop itself: reason, execute tools and code, continue, repeat. And they hold the agent's state while it runs. In other words, out of the operational layer I just called "the product," the environment upkeep and the loop orchestration move wholesale to the other side of the API.
What stays on your side is just as clear. Defining the task — what you actually want done. Receiving the result and judging whether it is correct. Handling failure. Watching cost. You can delegate how the agent runs; you cannot delegate why it runs or what happens with what it produces.
Looking at that boundary, what struck me was not the freedom from cron management. It was the fact that verification and failure handling stay with me — because in my experience, the time sink in operating automation has never been keeping environments alive. It is investigating things when they break. That realization is why I stopped asking "is this convenient?" and started asking the three questions below.
Question 1 — Does the execution environment itself carry meaning?
The first question is where the job actually needs to run.
A good number of my jobs read and write files in a local workspace: walking a wallpaper app's image-asset folder to classify new pieces, cloning a repository to push articles, appending run logs to a known location. For these, the fact that they run on infrastructure I control is the point. Moving them into a sandbox means redesigning credential handling and file transfer, and by my estimate the migration cost exceeds the payoff.
The opposite kind of job — pass input in, investigate or transform, return output — runs the same anywhere. Monitoring public release notes. Researching and summarizing information from the web. Transforming data into reports. These fit the Managed Agents sandbox naturally, and for transformations that involve code execution, the isolation is an upgrade in its own right. Running model-generated code directly on your own machine is, frankly, something I have never been fully relaxed about.
Question 2 — Who owns the canonical copy of state?
The second question is about ownership of the agent's state.
Managed Agents support stateful agents, and having the sandbox carry conversational and working context across a long multi-step task is a real advantage. But I am not ready to hand the canonical copy of any state to a service in public preview. Preview APIs change. We just lived through an example: the Interactions API removed its legacy schema, forcing a migration from outputs to steps on short notice.
So my rule ended up simple. If state is ephemeral — meaningful only while the task runs — let the sandbox hold it. If state is an asset — referenced by future runs or by other jobs — the canonical copy lives in my own database or files, and the agent receives only what it needs, per run. Hold that line, and even if the Managed Agents surface shifts underneath you, the blast radius is one in-flight task.
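The rule can be sketched in a few lines. This is a minimal illustration, assuming SQLite as the store I control and a release-notes job that remembers the last version it saw. The table layout, key format, and function names are all hypothetical, and nothing here touches the Managed Agents API itself: the point is only which side owns the durable copy.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the canonical state store (SQLite here; any store you control works)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)")
    return conn


def load_state(conn, key, default=None):
    """Read one JSON-encoded value from the store."""
    row = conn.execute("SELECT value FROM state WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else default


def save_state(conn, key, value):
    """Insert or overwrite one JSON-encoded value."""
    conn.execute("INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)",
                 (key, json.dumps(value)))
    conn.commit()


def build_run_input(conn, product):
    """Copy only the slice of canonical state this run needs into the task input."""
    return {"product": product,
            "last_seen_version": load_state(conn, f"last_seen:{product}")}


def record_result(conn, product, result):
    """Write the durable part of a finished run back to the canonical store."""
    save_state(conn, f"last_seen:{product}", result["latest"])
```

Whatever the sandbox accumulates while working is thrown away with the run. Only `record_result` changes anything a future run can see.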
Question 3 — When it fails, who can pick up the pieces, and at what granularity?
The third question was the decisive one for me: recoverability on failure.
A self-hosted loop fails at a granularity you can see. Which tool call, with which arguments, returned what before things stopped — you can trace as deeply as you bothered to log, and you are free to build resume-from-midpoint logic. During Gemini's major outage earlier this month, my pipelines survived on a staged retreat: retry, fall back, and if all else failed, write a log entry and exit quietly. That was only possible because I owned a hook at every stage of the loop.
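As a concrete picture of what "a hook at every stage" can mean, here is a small sketch of a step journal with resume-from-midpoint. It is a simplified illustration rather than my actual pipeline code: `run_steps`, the JSONL layout, and the step signature are assumptions made for the example, and it presumes each step's result is JSON-serializable.

```python
import json
from pathlib import Path


def run_steps(steps, journal_path):
    """Run named steps in order, journaling each result to a JSONL file.

    On a rerun, steps that already have a journal entry are skipped and
    their recorded result is reused, which gives resume-from-midpoint.
    """
    journal = Path(journal_path)
    done = {}
    if journal.exists():
        for line in journal.read_text().splitlines():
            entry = json.loads(line)
            done[entry["step"]] = entry["result"]

    results = {}
    with journal.open("a") as log:
        for name, fn in steps:
            if name in done:
                results[name] = done[name]  # finished in an earlier run
                continue
            result = fn(results)  # each step sees the results so far
            log.write(json.dumps({"step": name, "result": result}) + "\n")
            log.flush()  # push the entry to the file before the next step starts
            results[name] = result
    return results
```

If the second of three steps raises, the journal still holds the first step's result, so the next invocation starts at the step that failed. The same file doubles as the trace of what ran and what it returned.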
Move a job into Managed Agents and your granularity is whatever the API exposes. I genuinely hope more of the agent's internal steps become observable over time, but as a design assumption, "you cannot intervene in intermediate state" is the safe stance today. That narrows the candidates to idempotent work: jobs that can simply be rerun from scratch. Anything that emits side effects as it goes, where a half-finished run leaves value or risk behind, stays in the self-hosted loop where I hold the recovery tools.
One related note: the more "submit and wait" workloads you adopt — and Managed Agents are exactly that shape — the more your completion-notification design matters. I wrote about replacing polling with event-driven completion handling in Retiring the Midnight Polling Loop — Rebuilding My Gemini Batch Monitoring Around Webhooks, which pairs well with this discussion.
A small habit for keeping preview services next to production
Even for jobs that pass all three questions, I keep one thin layer of protection in place while Managed Agents remain in preview. Nothing elaborate: the Managed Agents call lives inside a small wrapper function with exactly two responsibilities. First, a fallback — if the call fails, the same task runs through the existing self-hosted loop. Second, schema validation on whatever comes back.
This buys two things. When preview-stage breaking changes arrive, the affected surface is one wrapper file. And because the wrapper can route the same task to either implementation, comparing output quality between Managed Agents and my own loop happens in one place. On cost, I also keep preview-period spending tagged separately from everything else, so any surprising billing pattern shows up early.
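The wrapper described above might look roughly like this. Treat it as a sketch under stated assumptions: `run_managed` and `run_self_hosted` are placeholders for whatever functions call the preview service and the existing loop (no real Managed Agents call is shown), the expected fields mirror the release-notes example earlier in this article, and sending a schema failure down the same fallback path as a call failure is one possible design choice.

```python
REQUIRED_FIELDS = {"product": str, "latest": str, "breaking_changes": bool}


def validate_report(report):
    """Raise ValueError unless the report has the expected fields and types."""
    if not isinstance(report, dict):
        raise ValueError("report is not a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(report.get(field), expected_type):
            raise ValueError(f"field {field!r} is missing or not {expected_type.__name__}")
    return report


def run_task(task, run_managed, run_self_hosted):
    """Try the preview service first; fall back to the self-hosted loop.

    run_managed and run_self_hosted are placeholders for your own functions.
    Returns (report, source) so the two implementations can be compared.
    """
    try:
        return validate_report(run_managed(task)), "managed"
    except Exception as exc:  # call failure and schema drift take the same path
        print(f"managed run failed ({exc}); falling back")
        return validate_report(run_self_hosted(task)), "self_hosted"
```

Returning the source alongside the report is what makes the side-by-side comparison cheap: the caller can log which implementation answered without knowing anything about either one.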
Start by picking the one sandbox-shaped job
When the dust settled, my first migration candidate was "monitor public release notes and summarize." It does not depend on my environment (question 1), its state is disposable every run (question 2), and a failed run can simply be repeated (question 3). It passes all three questions cleanly — almost a textbook case.
If you operate your own agent loops or scheduled jobs, the next step I would suggest is unglamorous: write the three answers next to each job on your list. This is not an all-or-nothing migration. Find the one job that clearly belongs in a sandbox, run it in parallel with your existing loop, and let the comparison teach you the rest. Evaluating new infrastructure without breaking working automation is, in the end, mostly about restraint.
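If a spreadsheet feels like too much ceremony, the three answers fit in a few lines of code. The sketch below is only an illustration: the `Job` fields are my shorthand for the three questions, and the answers given for the two non-release-notes jobs are illustrative guesses, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    needs_own_environment: bool    # question 1: depends on local files or credentials?
    sandbox_would_own_state: bool  # question 2: would the canonical state live in the sandbox?
    rerunnable_from_scratch: bool  # question 3: is a failed run safe to simply repeat?


def sandbox_candidate(job):
    """A job qualifies only if it passes all three questions."""
    return (not job.needs_own_environment
            and not job.sandbox_would_own_state
            and job.rerunnable_from_scratch)


# Illustrative answers for three jobs of the kind discussed in this article.
jobs = [
    Job("release-notes monitor", False, False, True),
    Job("image-asset classifier", True, False, True),
    Job("article publisher", True, False, False),
]

candidates = [job.name for job in jobs if sandbox_candidate(job)]
print(candidates)
```

With these answers, only the release-notes monitor comes out as a candidate, which matches where I ended up by hand.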
I hope this helps if you are weighing the same decision about where your automation should live.