Wiring Gemini Managed Agents Into Your Automation: Keeping Conversation State and Environment State Apart
Managed Agents spin up a Linux sandbox, run an agent loop, and return a result in a single API call. The first thing that trips you up when moving off a hand-rolled loop is that conversation state and file state are two separate things. Here's that design, worked through live.
The first time I wired the public-preview Managed Agents into my own automation, the thing that confused me most in the opening half hour wasn't an error or a cost estimate. It was figuring out where state actually lives. A single call to client.interactions.create(...) stands up a Linux sandbox on Google's side, has Gemini 3.5 Flash write and run code inside it, and hands back a block of text. That part is almost anticlimactically easy. The trouble started on the second call. Trying to ask the agent to "continue where it left off," I assumed the conversation history and the sandbox files were the same thing, carried only one of them forward, and lost the better part of an hour.
This memo takes that misunderstanding head-on, because anyone who has hand-written an agent loop is likely to stub their toe on exactly this. The running example is a small formatting helper I use in my own work as an indie developer: it takes text, cleans it up, generates one chart, and writes the result out.
What actually happens in a single call
Start with the smallest possible round trip. For grounding alone, it's worth firing one off yourself and looking at the shape of the response.
# pip install google-genai
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",  # the default general-purpose Managed Agent
    input="Generate the first 20 Fibonacci numbers and save them to fibonacci.txt. "
          "Then read the file back and print its contents.",
    environment="remote",  # provision a fresh sandbox every time
)

print("interaction.id =", interaction.id)
print("environment_id =", interaction.environment_id)
print("output_text =", interaction.output_text)
print("steps =", len(interaction.steps))  # reasoning, tool calls, code execution
That one create call provisions the sandbox, runs the agent loop, and returns the result, all at once. On the returned Interaction object, three fields are the ones I keep my eye on in practice. output_text is the final answer; steps is the array of each step the agent took (reasoning, tool call, code execution); and environment_id identifies the sandbox you just used. That last one is the key to doing anything "next."
On my formatting task, steps typically ran between six and eleven entries. Even a short instruction goes through plan → generate code → run → inspect file, so rather than stopping at output_text, it pays to glance at steps to see how the agent actually solved it.
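As a quick way to glance at steps without reading every entry, a tally by step kind is often enough. The following is a minimal sketch, not the SDK's own tooling: summarize_steps is a hypothetical helper, and it assumes each step object exposes a `type` attribute, which the article does not confirm. If that attribute is missing, it falls back to the Python class name.

```python
from collections import Counter
from typing import Any, Iterable


def summarize_steps(steps: Iterable[Any]) -> Counter:
    """Count steps by kind.

    ASSUMPTION: each step has a `type` attribute (for example a string naming
    reasoning, a tool call, or code execution). When it is absent, the step's
    class name is used instead so the tally still works.
    """
    return Counter(getattr(step, "type", type(step).__name__) for step in steps)


def format_summary(counts: Counter) -> str:
    """Render the tally as one line, most frequent kind first."""
    return ", ".join(f"{kind} x{n}" for kind, n in counts.most_common())
```

With a real response you would call format_summary(summarize_steps(interaction.steps)) and log the result next to output_text.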
The real snag: there are two axes of state
Managed Agents track state across two independent dimensions. Treating them as one is the classic trap for anyone coming from a hand-rolled loop.
The first is conversation context: chat history, the reasoning trace, the flow of tool use. You carry it forward by passing the previous interaction.id as previous_interaction_id.
The second is environment state: the files in the sandbox, the installed packages, the contents of the working directory. You carry it forward by passing the previous environment_id as environment.
You pass them separately, not mixed together.
interaction_2 = client.interactions.create(
    agent="antigravity-preview-05-2026",
    previous_interaction_id=interaction.id,    # continue the conversation
    environment=interaction.environment_id,    # keep the files too
    input="Now plot that sequence as a line chart and save it as chart.png.",
)
print(interaction_2.output_text)
Here fibonacci.txt still exists on the second turn, and the agent remembers what "that sequence" refers to. My first failure was passing only previous_interaction_id while setting environment="remote" — a brand-new box. The conversation continues, but the files are gone, so the agent honestly reports that it can't find the earlier file. Once "state has two axes" clicks, that's exactly the behavior you'd expect.
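In automation, the two IDs usually have to survive between process runs, and storing them as two separate fields makes it harder to carry only one by accident. Here is a minimal sketch of that bookkeeping. AgentState, save_state, and load_state are hypothetical names of my own, and the JSON file is just one possible store.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AgentState:
    """The two independent axes, kept as two separate fields."""
    interaction_id: Optional[str] = None   # conversation axis
    environment_id: Optional[str] = None   # file/sandbox axis


def save_state(path: Path, state: AgentState) -> None:
    """Persist both IDs after a call so the next run can resume either axis."""
    path.write_text(json.dumps(asdict(state)))


def load_state(path: Path) -> AgentState:
    """Load saved IDs, or return an empty state on the first run."""
    if not path.exists():
        return AgentState()
    return AgentState(**json.loads(path.read_text()))
```

After each call you would save AgentState(interaction.id, interaction.environment_id), then decide per task which of the two to pass back in.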
Two axes means there are 2×2 = four ways to resume. I stopped memorizing this as a table and started pairing each option with the situation it's for.
Keep conversation, keep files (pass both): the main multi-turn path, when you build on the previous artifact.
Drop conversation, keep files (omit previous_interaction_id, pass only environment): start unrelated work in the same workspace without dragging context along. Use this to avoid "context rot" from issuing an off-topic instruction while carrying a long history.
Keep conversation, fresh files (pass previous_interaction_id, set environment="remote"): when you want the prior decisions remembered but don't want to dirty the workspace. Good for processing several targets in turn under the same policy.
Drop conversation, fresh files (pass neither): a fully independent one-shot, for batch runs that should start clean every time.
The knack to running Managed Agents cleanly, I've found, is not defaulting every call to "keep both." The third option in particular (continue conversation, fresh environment) was fiddly to implement in a hand-rolled loop, and being able to select it declaratively is a genuine relief.
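The four combinations can be made explicit at the call site with a small helper that builds the keyword arguments. This is a sketch under the article's own conventions (previous_interaction_id for conversation, environment for files, "remote" for a fresh sandbox); resume_kwargs is a hypothetical name, not part of the SDK.

```python
from typing import Dict, Optional

FRESH_ENVIRONMENT = "remote"  # the value the article uses to request a new sandbox


def resume_kwargs(
    interaction_id: Optional[str],
    environment_id: Optional[str],
    *,
    keep_conversation: bool,
    keep_files: bool,
) -> Dict[str, str]:
    """Build the state-related keyword arguments for one of the four combinations."""
    if keep_files and not environment_id:
        raise ValueError("keep_files=True needs the previous environment_id")
    if keep_conversation and not interaction_id:
        raise ValueError("keep_conversation=True needs the previous interaction id")

    kwargs = {"environment": environment_id if keep_files else FRESH_ENVIRONMENT}
    if keep_conversation:
        kwargs["previous_interaction_id"] = interaction_id
    return kwargs
```

A call then reads client.interactions.create(agent=..., input=..., **resume_kwargs(prev_id, env_id, keep_conversation=True, keep_files=False)), which forces every call site to state its choice on both axes instead of inheriting "keep both" by habit.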
What disappears when you move off a hand-rolled loop
This is where the difference hits hardest for anyone who has written their own agent. A naive hand-rolled loop usually has roughly this skeleton:
# Before: a hand-rolled agent loop (the "operations layer" is the real body)
# model, sandbox, tool_result, looks_done, and compact_if_too_long stand in for your own code.
history = []
while True:
    resp = model.generate(history)                    # (1) call the model
    if resp.tool_call:                                # (2) detect a tool call
        if resp.tool_call.name == "run_python":
            out = sandbox.exec(resp.tool_call.code)   # (3) run in your own sandbox
            history.append(tool_result(out))          # (4) feed the result back in
            continue
    if looks_done(resp):                              # (5) decide it's finished (deceptively hard)
        break
    history = compact_if_too_long(history)            # (6) handle context growth
That while loop is not really the body. The body is everything around it: tool detection in (2), the isolated execution environment in (3), history shaping in (4), the done-detection heuristic in (5), and context compaction in (6). Move onto Managed Agents and all of that surrounding code goes away.
# After: Managed Agents (you hand the operations layer to Google)
interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input=task_text,
    environment="remote",
)
result = interaction.output_text
In my case, five specific things disappeared: sandbox provisioning and teardown, the Python/Node/Bash execution handlers, tool-result history shaping, the done-detection heuristic, and context compaction. On that last one, Managed Agents insert an automatic compaction step at around 135k tokens, so long multi-turn runs are far less likely to hit a token-limit error or rot the context. The "summarize and fold old tool results once the history grows" code I used to maintain became entirely unnecessary.
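For a sense of what that hand-maintained compaction code tends to look like, here is an illustrative sketch of a compact_if_too_long in the "fold old tool results" style. It is not how Managed Agents compact internally, which the article does not describe. The message shape (dicts with role and content), the four-characters-per-token estimate, and the default parameter values are all assumptions made for the example.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (ASSUMPTION: about 4 characters per token)."""
    return max(1, len(text) // 4)


def compact_if_too_long(
    history: List[Message],
    budget: int = 135_000,
    keep_recent: int = 4,
    stub_chars: int = 80,
) -> List[Message]:
    """Fold old tool results into short stubs once the history exceeds the budget."""
    total = sum(estimate_tokens(m["content"]) for m in history)
    if total <= budget:
        return history  # nothing to do

    cutoff = max(0, len(history) - keep_recent)  # messages before this index are "old"
    compacted = []
    for index, message in enumerate(history):
        if index < cutoff and message["role"] == "tool":
            stub = message["content"][:stub_chars] + " [folded]"
            compacted.append({"role": "tool", "content": stub})
        else:
            compacted.append(message)
    return compacted
```

Even this toy version has tuning knobs (budget, how many recent messages to protect, how much of each result to keep), which is the maintenance burden that goes away.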
The flip side is that what you give up is control over the execution environment. If you need a custom image with specific system libraries baked in, or execution confined to an internal network, confirm that before handing your own sandbox away. My formatting task was the kind where the sandbox's identity is irrelevant, so delegating cost me nothing.
Pulling generated files back out of the sandbox
A file the agent creates inside the sandbox, like chart.png, is useless until you bring it down locally. The SDK doesn't have a dedicated method for this yet, so you hit the Files API over plain HTTP and pull a snapshot (a tar).
import os
import tarfile

import requests

env_id = interaction.environment_id
api_key = os.environ["GEMINI_API_KEY"]

resp = requests.get(
    f"https://generativelanguage.googleapis.com/v1beta/files/environment-{env_id}:download",
    params={"alt": "media"},
    headers={"x-goog-api-key": api_key},
    allow_redirects=True,
)
resp.raise_for_status()

with open("snapshot.tar", "wb") as f:
    f.write(resp.content)

with tarfile.open("snapshot.tar") as tar:
    tar.extractall(path="extracted_snapshot")  # chart.png lives in here
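Note that what comes back is an environment snapshot tar, not an individual file. I wasted time at first hunting for an API to download chart.png directly. In practice the current idiom is "download the whole environment, extract it, and pick out the file you want." The download answers with a redirect, and a client that doesn't follow it gets a 302 with no body. requests follows redirects on GET by default, so allow_redirects=True above only makes that explicit; with other clients (curl needs -L, for example), don't forget to turn redirect-following on.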
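If you only want one artifact, you can pull a single member out of the snapshot instead of unpacking everything. The sketch below uses only the standard library's tarfile module. extract_by_name is a hypothetical helper, and because the article does not say how the snapshot is laid out internally, it matches on the file's base name wherever it sits in the archive.

```python
import tarfile
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def extract_by_name(tar_path: PathLike, filename: str, dest_dir: PathLike) -> Path:
    """Copy the first regular file whose base name equals `filename` into dest_dir.

    Only the base name is used for the output path, so directory components
    stored inside the archive cannot place the file outside dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if member.isfile() and Path(member.name).name == filename:
                source = tar.extractfile(member)  # file-like object for regular files
                target = dest / filename
                target.write_bytes(source.read())
                return target

    raise FileNotFoundError(f"{filename} not found in {tar_path}")
```

For example, extract_by_name("snapshot.tar", "chart.png", "artifacts") would leave artifacts/chart.png on disk. If two files share a base name, the first one encountered wins, so list the members yourself when that matters.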
For long runs, stream and watch
A heavier task — read Hacker News, summarize the top five, and save a PDF — is operationally safer to stream than to block on. Pass stream=True and you get step deltas (text, reasoning tokens, tool-call updates) back as an iterable.
stream = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Summarize the top 5 Hacker News stories and save them as a PDF.",
    environment="remote",
    stream=True,
)
for event in stream:
    print(event)  # emit logs, advance a progress bar, watch for stalls here
Inside that loop I keep a watchdog that aborts if no step advances for a set interval. The environment itself stays resumable for seven days since last activity, with the TTL resetting on each use, but that does not mean "you may wait forever." It's safer to treat your client-side wait budget and the sandbox's keep-alive window as two separate things. Layering an SDK-side timeout (in JavaScript you can pass { timeout: 300_000 }) on top of your own watchdog keeps a stalled run from going unnoticed.
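One subtlety with a watchdog: a check placed in the body of the for loop only runs when an event arrives, so it cannot notice a stream that goes completely silent. One way around that, sketched below, is to read the stream in a background thread and wait on a queue with a timeout. watch and StreamStalled are hypothetical names, and the wrapper works on any iterable, so nothing here depends on the shape of the SDK's stream events.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class StreamStalled(TimeoutError):
    """Raised when no event arrives within the allowed interval."""


def watch(stream: Iterable[T], stall_seconds: float) -> Iterator[T]:
    """Yield events from `stream`; raise StreamStalled if it goes quiet too long."""
    events: queue.Queue = queue.Queue()

    def pump() -> None:
        # Runs in a background thread so the consumer can time out on silence.
        try:
            for event in stream:
                events.put(("event", event))
            events.put(("done", None))
        except Exception as exc:  # forward stream errors to the consumer
            events.put(("error", exc))

    # Daemon thread: a stalled reader will not keep the process alive.
    threading.Thread(target=pump, daemon=True).start()

    while True:
        try:
            kind, payload = events.get(timeout=stall_seconds)
        except queue.Empty:
            raise StreamStalled(f"no event for {stall_seconds} seconds") from None
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload
```

Usage would be for event in watch(stream, stall_seconds=60): print(event), with StreamStalled caught wherever you handle retries or alerts. The wrapper only stops waiting on the client side; it does not cancel the remote run, which fits the point above about treating the two budgets separately.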
Pin down repeated work as a saved agent
Up to here we passed configuration inline, but if the same role repeats daily, saving the configuration and invoking it by ID makes it much easier to handle. agents.create lets you define a system instruction and an initial environment (baked from a GitHub repository or inline files).
agent = client.agents.create(
    id="report-formatter",
    base_agent="antigravity-preview-05-2026",
    system_instruction="A formatting agent: clean up the input text, add a summary table and one chart, and write it to PDF.",
    base_environment={
        "type": "remote",
        "sources": [
            {
                "type": "inline",
                "target": ".agents/AGENTS.md",
                "content": "Always include a summary table and one chart in the output.",
            },
        ],
    },
)

# From now on, invoke by ID alone. Each call forks the base environment, so every run starts clean.
result = client.interactions.create(
    agent="report-formatter",
    input=today_text,
    environment="remote",
)
print(result.output_text)
A saved agent forks its base environment on every invocation, so it never carries last run's mess forward. That design suits daily-batch work that should always start from the same initial state. Conversely, if you want to build on the previous run, you don't lean on the saved agent's fork; you explicitly use the environment_id hand-off described above. Once again, the one thing that decides the right move is the very first point: conversation state and environment state are different things.
A realistic first step if you're considering the move
Managed Agents take over "everything but the loop" from a hand-rolled agent; in exchange, you let go of the execution environment's internals. For that reason, rather than re-platforming a core pipeline outright, the realistic move is to start with one task where the sandbox's identity doesn't matter. For me, that was the formatting job.
Worth confirming in that first trial: that a follow-up call using environment_id truly preserves files; that the drop-conversation-keep-environment combination behaves as intended; that the artifact tar extracts the way you expect; and that on a long task your client-side timeout and watchdog both fire. Get those four through on a single task, and the second and third tasks get much easier. It's a preview feature, so my current stance is to place it small, right next to production but where a failure won't ripple into the main path, and let it settle in from there.
If this spared you one of the snags from my own first half hour, I'll consider it well spent.