GEMINI LAB
CLI: Gemini CLI and the Gemini Code Assist IDE extensions stopped serving requests on Jun 18; migrate to Antigravity or the new Go-based Antigravity CLI
FLASH: Gemini 3.5 Flash is now generally available, billed as the smartest model for sustained frontier performance on agentic and coding tasks
IMAGE: gemini-3.1-flash-image-preview and gemini-3-pro-image-preview are deprecated and shut down on Jun 25; move to the successor models
AGENTS: Managed Agents is in public preview, running stateful autonomous agents in secure, isolated Google-hosted Linux sandboxes
SEARCH: File Search now supports multimodal image search natively via the gemini-embedding-2 model
MIGRATE: With deadline-bound deprecations piling up, any automation built on the CLI or old models needs a tracked migration
API / SDK · 2026-06-19 · Advanced

Your Managed Agents Bill Has a Second Axis: Drawing a Budget Boundary Around Sandbox Runtime

Managed Agents in public preview bills for tokens and for how long its Google-hosted sandbox stays alive. A single hung run quietly drains your budget on that second axis. Here is a working Python design for wall-clock caps, idle teardown, and a concurrency ceiling.

Tags: gemini-api, managed-agents, cost-management, production, automation, agent-design


One morning I was reading through a Managed Agents bill, and I stopped.

Token consumption sat almost exactly where I had estimated it. And yet the total refused to line up with the figure in my head. When I traced the gap, the culprit turned out not to be tokens at all: it was the time the Google-hosted sandbox had spent alive.

As an indie developer, I run content updates for several sites on a schedule alongside my own product work. I dropped the Managed Agents public preview into that pipeline to shave a layer off my own agent loop. Then one run quietly hung while waiting on a tool response, and the sandbox sat there, alive, for about 18 minutes. Not a single extra token was spent — only seconds of runtime, piling up.

What I want to share here is a budgeting approach that starts from one premise: a Managed Agents bill moves along two axes — tokens and runtime. Watch only one, and the other erodes your budget in silence.

Managed Agents bills along two axes

Start by separating what you are actually paying for. A single Managed Agents call spins up an isolated, Google-hosted Linux sandbox and runs reasoning, tool calls, code execution, and file operations inside it. The convenience comes with a second thing to meter.

| Axis | Driven by | Where it balloons | Caught by usual monitoring? |
| --- | --- | --- | --- |
| Token cost | Input / output / thinking tokens | Long context, verbose retries | Yes, your existing cost tooling sees it |
| Sandbox runtime | Wall-clock seconds the sandbox was alive | Hangs on tool waits, idling, over-parallelism | Easy to miss |

Your instincts from past API work carry over fine to the token side. The runtime side is the problem. An agent that is "thinking and stuck," "waiting on an external API," or "done but with a sandbox that never got torn down" accrues seconds while spending zero tokens.

In my case, one hung run kept a sandbox alive for 18 minutes, while the healthy runs that same morning averaged around 40 seconds. A single missed teardown turned into the runtime of 27 normal runs. The scarier part was not the money — it was realizing I had never been watching this axis at all.
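To make the two axes concrete, here is a minimal sketch that splits one run's cost into a token part and a runtime part, then reproduces the ratio from that morning. The two rates and the token count are placeholders invented for illustration, not published Managed Agents prices, and `RunUsage` and `cost_breakdown` are hypothetical names.

```python
from dataclasses import dataclass

# Placeholder rates for illustration only; substitute the figures from your own invoice.
TOKEN_RATE_PER_1K = 0.002      # hypothetical currency units per 1,000 tokens
RUNTIME_RATE_PER_SEC = 0.0005  # hypothetical currency units per sandbox-second


@dataclass
class RunUsage:
    tokens: int
    wall_seconds: float


def cost_breakdown(usage: RunUsage) -> dict:
    """Split one run's cost into the token axis and the runtime axis."""
    token_cost = usage.tokens / 1000 * TOKEN_RATE_PER_1K
    runtime_cost = usage.wall_seconds * RUNTIME_RATE_PER_SEC
    return {
        "token": token_cost,
        "runtime": runtime_cost,
        "total": token_cost + runtime_cost,
    }


# Token counts are made up; the durations are the ones described above.
healthy = RunUsage(tokens=12_000, wall_seconds=40)     # a typical run that morning
hung = RunUsage(tokens=12_000, wall_seconds=18 * 60)   # same tokens, 18 minutes alive

runtime_ratio = hung.wall_seconds / healthy.wall_seconds
print(f"hung run = {runtime_ratio:.0f} healthy runs of sandbox time")
print("healthy:", cost_breakdown(healthy))
print("hung:   ", cost_breakdown(hung))
```

With identical token counts, the token axis is the same for both runs and only the runtime axis moves, which is why token-only monitoring stays flat while the bill drifts.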

The budget boundary starts at a per-run wall-clock cap

The first boundary to draw is the maximum wall-clock seconds a single run is allowed to stay alive. Not a token cap — a time cap.

The idea is simple. Launch the run asynchronously, put a deadline on the whole thing with asyncio.wait_for, and explicitly tear the sandbox down if it overruns. Even when it hangs, it is guaranteed to fold at the cap.

import asyncio
import time
from dataclasses import dataclass

@dataclass
class RuntimeBudget:
    max_wall_seconds: float = 180.0   # ceiling on how long one run may live
    idle_seconds: float = 45.0        # grace period once progress stalls
    max_concurrent: int = 2           # ceiling on simultaneous live sandboxes

class RuntimeBudgetError(Exception):
    """Raised when a wall-clock or idle ceiling is exceeded."""

async def run_with_wall_clock(client, *, model, instruction, budget):
    """Put a wall-clock deadline on a single Managed Agents run.

    client.agents reflects the public-preview surface; map
    create_run / poll_run / cancel_run to your SDK's names.
    """
    started = time.monotonic()
    run = await client.agents.create_run(model=model, instruction=instruction)
    try:
        # _poll_until_done is not defined in this excerpt; it is expected to
        # poll the run until it completes and return the result.
        result = await asyncio.wait_for(
            _poll_until_done(client, run.id, budget),
            timeout=budget.max_wall_seconds,
        )
        return result, time.monotonic() - started
    except asyncio.TimeoutError:
        # Skip this and the sandbox keeps billing seconds in silence
        await client.agents.cancel_run(run.id)
        # "from None" keeps the traceback focused on the budget violation
        raise RuntimeBudgetError(
            f"run {run.id} exceeded wall-clock cap {budget.max_wall_seconds}s"
        ) from None

The single most important line here is cancel_run in the except block. When the deadline trips, wait_for cancels only the local polling coroutine; the Google-hosted sandbox itself keeps living quietly. Only an explicit teardown stops the clock. That one line is exactly what I missed the first time.
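To see that behavior without touching a real sandbox, the pattern can be exercised against a stand-in client. This is only a sketch under stated assumptions: `FakeAgents`, `poll_until_done`, and `capped_run` are hypothetical names, the fake's `create_run` / `poll_run` / `cancel_run` methods mirror the placeholder names used above rather than a confirmed SDK surface, and the timings are shrunk to fractions of a second so it runs quickly.

```python
import asyncio


class FakeAgents:
    """Stand-in for the agents surface; every method name here is hypothetical."""

    def __init__(self, finish_after: float):
        self.finish_after = finish_after  # seconds until the fake run completes
        self.cancelled = []               # run ids that were explicitly torn down
        self._started = None

    async def create_run(self, **_kwargs):
        self._started = asyncio.get_running_loop().time()
        return "run-1"

    async def poll_run(self, run_id):
        elapsed = asyncio.get_running_loop().time() - self._started
        return "done" if elapsed >= self.finish_after else "running"

    async def cancel_run(self, run_id):
        self.cancelled.append(run_id)


async def poll_until_done(agents, run_id, interval=0.01):
    """Poll until the run reports done; a hung run never leaves this loop."""
    while await agents.poll_run(run_id) != "done":
        await asyncio.sleep(interval)
    return "ok"


async def capped_run(agents, max_wall_seconds):
    """Apply the wall-clock cap and tear the run down if it overruns."""
    run_id = await agents.create_run()
    try:
        return await asyncio.wait_for(
            poll_until_done(agents, run_id), timeout=max_wall_seconds
        )
    except asyncio.TimeoutError:
        # wait_for only cancelled the local polling task; stop the remote side too.
        await agents.cancel_run(run_id)
        return "timed_out"


healthy = FakeAgents(finish_after=0.02)  # completes well inside its cap
hung = FakeAgents(finish_after=60.0)     # never completes within its cap

healthy_result = asyncio.run(capped_run(healthy, max_wall_seconds=0.5))
hung_result = asyncio.run(capped_run(hung, max_wall_seconds=0.05))

print(healthy_result, healthy.cancelled)
print(hung_result, hung.cancelled)
```

The healthy run returns normally and is never cancelled, while the hung run hits the cap and is the only one torn down. Deleting the `cancel_run` call leaves `hung.cancelled` empty, which is the silent-billing case described above.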
