Advanced · 2026-07-01

Getting Artifacts Out of a Managed Agents Sandbox Safely — Scoped Credentials and Egress Design

Gemini API Managed Agents run in a Google-hosted isolated sandbox. Here is the short-lived, least-privilege credential and egress-boundary design I use to return generated artifacts to my own repository safely.

Tags: Gemini API · Managed Agents · Security · Agents · Indie Development


The first thing I tried when Managed Agents hit public preview was whether I could move part of my usual article-generation pipeline into that sandbox. Having an agent plan, run code, and touch files entirely inside a Google-hosted isolated Linux environment is genuinely appealing when you are otherwise running your own containers.

But as soon as I ran it, I hit the least glamorous and most dangerous problem: how do you get the artifacts produced inside the sandbox back into something you control, such as a GitHub repository or object storage? Build this carelessly and you end up placing a key that can touch your entire infrastructure inside a sandbox that Google hosts, which turns the whole point of isolation on its head.

This article focuses on that single concern — egress — and shares the design I actually adopted for the Dolice Labs automation. It also ties in neatly with the 6/19 change that started rejecting requests from unrestricted API keys.

Why design "getting artifacts out" as its own problem

A Managed Agents sandbox is best treated as a stateless environment that spins up per run and disappears when done. Planning, reasoning, and code execution all complete inside it. That is convenient, but it also means your artifacts — generated MDX, built JSON, images — vanish along with the sandbox. So you have to push them somewhere.

The naive path looks like this: you ask the agent to "push to GitHub" and hand it a full GITHUB_TOKEN as an environment variable. It works. But at that instant, the code running inside the sandbox, including code the model wrote and you do not fully control, can reach every repository that token can touch. In my case, I run four sites under one account as an indie developer, so a single over-broad token leaking puts all four sites in range.

The principle is simple: give the sandbox nothing more than the ability to write this run's artifact to one predetermined place. No read access, no reach into other repositories, no delete. Narrow egress to this granularity, and even if the sandbox code goes rogue, the damage is limited to a dirty output location for this one run.
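To make the principle concrete, here is a toy model of that grant in Python. `EgressGrant` and `grant_for_run` are hypothetical names for illustration, not part of any Google API. A request is permitted only when all three conditions hold at once: the method is PUT, the path is this run's single object, and the grant has not expired.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class EgressGrant:
    """Hypothetical model of the only authority the sandbox should hold."""
    object_path: str      # exactly one object, e.g. incoming/<site>/<run_id>/output.tar.gz
    expires_at: datetime  # short lifetime

    def permits(self, method: str, path: str, now: datetime) -> bool:
        # Write-only, single object, and only before expiry: no GET, no DELETE
        return method == "PUT" and path == self.object_path and now < self.expires_at


def grant_for_run(site: str, run_id: str, now: datetime,
                  ttl: timedelta = timedelta(minutes=15)) -> EgressGrant:
    # The per-run prefix keeps one run from touching another run's space
    return EgressGrant(f"incoming/{site}/{run_id}/output.tar.gz", now + ttl)
```

A V4 signed URL enforces the same three conditions on the storage side, because the method, the object path, and the expiry are all covered by the signature.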

The bad way and the good way to hand over credentials

First, the shape to avoid.

# ❌ Bad: pass a long-lived, broad token straight into the sandbox
agent = client.agents.create(
    model="gemini-flash-latest",
    environment={
        "env_vars": {
            # This PAT can read/write every repo, and effectively never expires
            "GITHUB_TOKEN": "ghp_LONG_LIVED_BROAD_SCOPE_TOKEN",
            # And it even carries the deploy key along
            "CF_API_TOKEN": "cloudflare_account_wide_token",
        }
    },
)

The problem is that scope, lifetime, and reach are all too wide. The GitHub token is long-lived and can write to every repository, and the Cloudflare token adds deploy access on top. Since you cannot verify the code running inside the sandbox line by line, trim the credentials down to exactly what that code needs to do.

The good shape is to not hand over a general-purpose token at all. On the caller side (your own controlled server), you issue a short-lived signed upload URL valid for exactly one object in this run, and hand only that to the sandbox.

# ✅ Good: caller issues a write-only URL scoped to this run's single artifact
import datetime
from google.cloud import storage
 
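def issue_upload_url(site: str, run_id: str) -> str:
    # V4 signing needs credentials that can sign (e.g. a service account key);
    # otherwise pass service_account_email and access_token to generate_signed_url
    bucket = storage.Client().bucket("dolice-agent-artifacts")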
    # Pin the target to a per-run prefix (cannot touch other runs' space)
    blob = bucket.blob(f"incoming/{site}/{run_id}/output.tar.gz")
    return blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=15),  # expires in 15 minutes
        method="PUT",                               # PUT only. no GET, no DELETE
        content_type="application/gzip",
    )
 
signed_url = issue_upload_url("gemilab", run_id="20260701-01")
 
# `client` is your Gemini API client, created elsewhere
agent = client.agents.create(
    model="gemini-flash-latest",
    environment={
        # All the sandbox receives is "write this one object, for 15 minutes"
        "env_vars": {"ARTIFACT_UPLOAD_URL": signed_url},
    },
)

The difference is stark. In the bad example the key's reach was "the whole account." In the good example it is "one object at incoming/gemilab/20260701-01/output.tar.gz, via PUT, for 15 minutes." Even if this signed URL ends up in a log, all a thief can do is overwrite this run's output, and only until the URL expires 15 minutes after it was issued.
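Before handing the URL over, the caller can also sanity-check it. The sketch below is a hypothetical helper, `check_signed_url_scope`, assuming the default path-style V4 URL that `generate_signed_url` produces (`https://storage.googleapis.com/<bucket>/<object>?...&X-Goog-Expires=<seconds>&...`). It checks the object path and the `X-Goog-Expires` lifetime. The HTTP method is bound inside the signature rather than exposed as a query parameter, so it cannot be checked this way.

```python
from urllib.parse import parse_qs, unquote, urlparse


def check_signed_url_scope(url: str, bucket: str, object_name: str,
                           max_ttl_seconds: int = 15 * 60) -> None:
    """Hypothetical pre-flight check: raise ValueError if the URL is broader than intended."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("signed URL must use https")
    # Path-style URL: /<bucket>/<object>
    if unquote(parsed.path) != f"/{bucket}/{object_name}":
        raise ValueError(f"unexpected object path: {parsed.path}")
    expires = parse_qs(parsed.query).get("X-Goog-Expires")
    if not expires:
        raise ValueError("not a V4 signed URL (no X-Goog-Expires)")
    ttl = int(expires[0])  # lifetime in seconds, counted from X-Goog-Date
    if ttl > max_ttl_seconds:
        raise ValueError(f"lifetime too long: {ttl}s > {max_ttl_seconds}s")
```

Calling `check_signed_url_scope(signed_url, "dolice-agent-artifacts", "incoming/gemilab/20260701-01/output.tar.gz")` right before creating the agent turns an accidental change of prefix or lifetime into a hard failure on the trusted side.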

Keep the sandbox side to "PUT one object"

The code inside the sandbox should bundle the artifact and PUT it to the URL it was given, nothing more. The key is not to smuggle any extra authority or routing decisions into this step.

# egress code running inside the sandbox (holds no more authority than this)
import os, tarfile, io, urllib.request
 
def egress_artifacts(src_dir: str = "/workspace/out"):
    url = os.environ["ARTIFACT_UPLOAD_URL"]  # PUT-only, 15-minute TTL
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(src_dir, arcname="out")
    buf.seek(0)
    req = urllib.request.Request(
        url, data=buf.read(), method="PUT",
        # must match the content_type the URL was signed with
        headers={"Content-Type": "application/gzip"},
    )
    # urlopen raises HTTPError on 4xx/5xx; the check below also rejects
    # unexpected 2xx codes, so a failed egress always fails hard (no partial success)
    with urllib.request.urlopen(req, timeout=30) as resp:
        if resp.status not in (200, 201):
            raise RuntimeError(f"egress failed: {resp.status}")

This code knows nothing about other buckets, other runs' prefixes, or GitHub, and what it does not know, it cannot leak. Committing to GitHub happens afterward, on your own controlled server. The sandbox writes to storage and the repository update happens in your trusted environment; this two-stage split is what keeps egress at least privilege.

Quarantine on ingest: don't over-trust the signed URL

Once the artifact lands in the bucket, your server fetches it, validates it, and only then commits it to the repository. Skip this and a broken output from the sandbox (an empty file, a truncated tar, an unexpected path) flows straight to production. I skipped this quarantine on my first attempt, and an empty MDX file sat in the bucket one step away from being published. That was a cold-sweat moment.

# your controlled server (only this side holds GitHub write authority)
from google.cloud import storage

def ingest(site: str, run_id: str):
    blob = storage.Client().bucket("dolice-agent-artifacts") \
        .blob(f"incoming/{site}/{run_id}/output.tar.gz")
    if not blob.exists():
        raise FileNotFoundError("artifact not delivered")
    data = blob.download_as_bytes()
 
    # Quarantine 1: absurdly small size == treat as a failed output
    if len(data) < 512:
        raise ValueError(f"artifact too small: {len(data)} bytes")
 
    members = _safe_extract(data)  # Quarantine 2: reject path traversal (../)
    # Quarantine 3: is the expected shape present (both JA and EN MDX)?
    ja = [m for m in members if m.startswith("out/ja/") and m.endswith(".mdx")]
    en = [m for m in members if m.startswith("out/en/") and m.endswith(".mdx")]
    if len(ja) != len(en) or not ja:
        raise ValueError(f"JA/EN mismatch: ja={len(ja)} en={len(en)}")
    return ja, en

The point is that only your controlled server holds the authority to write to GitHub. Since the sandbox can reach no further than placing an object in storage, an artifact that fails quarantine is discarded without ever touching the repository. Running the JA/EN count check at this stage also enforces, once more along the egress path, an invariant I never break at Dolice Labs: every Japanese article ships with an English one.

Turn the 6/19 "unrestricted-key rejection" into key discipline

Since 6/19, requests from API keys with no restrictions are rejected. The change is meant to curb abuse and runaway billing, but it fits naturally with Managed Agents egress design, because the Gemini API key you hand the sandbox deserves the same tightening as the egress path.

I scope the Gemini API key for Managed Agents along three lines.

Restrict by purpose. Allow only the Gemini API the agent execution needs, and disable other Google APIs. The narrower a single key's reach, the smaller the damage when it leaks.
Separate the project. I keep the automation GCP project separate from production and set a monthly ceiling with Project Spend Caps. Even if code inside the sandbox loops on calls, the month's billing halts at the cap.
Lean toward short-lived keys. Rather than keeping a long-lived key resident, I lean toward issuing one right before a run and revoking it after. The idea is identical to the signed URL for egress: just enough, just for the duration of this run.
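The "issue right before, revoke right after" discipline maps neatly onto a context manager. In this minimal sketch, `run_scoped_key` is a hypothetical name, and `issue` and `revoke` are hypothetical hooks: in practice they might call the Google Cloud API Keys API to create a key restricted to the Gemini API (`generativelanguage.googleapis.com`) and delete it afterward, but that wiring is an assumption and not shown. The shape guarantees revocation even when the run fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Tuple


@contextmanager
def run_scoped_key(issue: Callable[[], Tuple[str, str]],
                   revoke: Callable[[str], None]) -> Iterator[str]:
    """Issue a key for one run and always revoke it afterward.

    `issue` returns (key_id, key_string); `revoke` takes the key_id.
    """
    key_id, key_string = issue()
    try:
        yield key_string  # only the key string is exposed to the run
    finally:
        revoke(key_id)  # runs on success and on failure alike
```

Usage would look like `with run_scoped_key(issue_key, revoke_key) as api_key: ...`, with the agent run inside the block.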

I take the unrestricted-key rejection as a kind of health check that surfaces over-broad keys left lying around. If you are pulling Managed Agents into personal operations, revisiting how you scope keys at the same time pays off later.

Make egress idempotent, since the sandbox will disappear

Managed Agents tears down the whole sandbox once a run ends. If egress fails partway through — a network blip, a timeout — the sandbox will not retry for you. This is where it matters to pin the output target to run_id and make it idempotent.
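Because the PUT always targets the same single object, repeating it is safe, so one option is to retry transient failures inside the sandbox while the signed URL is still valid. This is a sketch, not part of the article's setup: `put_with_retry` is a hypothetical helper, and the `opener` parameter exists only so the network call can be swapped out in tests. A 4xx (for example an expired or invalid signature) is raised immediately, since retrying will not fix it.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def put_with_retry(url: str, body: bytes, content_type: str = "application/gzip",
                   attempts: int = 4, base_delay: float = 1.0,
                   opener: Callable = urllib.request.urlopen) -> int:
    """Hypothetical helper: PUT one object, retrying only transient failures."""
    last_error = None
    for attempt in range(attempts):
        req = urllib.request.Request(url, data=body, method="PUT",
                                     headers={"Content-Type": content_type})
        try:
            with opener(req, timeout=30) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            # 4xx other than 429 will not fix itself; fail hard
            if err.code < 500 and err.code != 429:
                raise
            last_error = err
        except (urllib.error.URLError, TimeoutError) as err:
            last_error = err  # network blip or timeout: worth another try
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
    raise RuntimeError(f"egress failed after {attempts} attempts") from last_error
```

With the defaults, the total wait between attempts is 7 seconds plus up to four 30-second timeouts, which stays well inside the 15-minute lifetime of the URL.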

In the earlier implementation, the target was pinned to incoming/{site}/{run_id}/output.tar.gz, so re-running the same run writes to the same single object, and nothing gets published twice. The ingest side follows the rule "if the object for that run_id passes quarantine and has not been published yet, publish it." In my setup, I reuse the run_id as the key of the publish log too, so a single line tells me which run produced which slug, and when.
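The "publish only if not yet published" rule can be sketched with a publish log keyed by run_id. This version uses an SQLite table as a stand-in; the `publish_log` table, `publish_once`, and the `validate`/`publish` callbacks are all hypothetical, with `validate` standing in for the quarantine checks and `publish` for the repository commit. It assumes a single ingest worker; concurrent workers would need a real lock or a claim step.

```python
import sqlite3
from typing import Callable


def init_publish_log(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publish_log ("
        " run_id TEXT PRIMARY KEY, site TEXT NOT NULL, slugs TEXT NOT NULL,"
        " published_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )


def publish_once(conn: sqlite3.Connection, site: str, run_id: str,
                 validate: Callable[[str, str], list[str]],
                 publish: Callable[[str, str, list[str]], None]) -> bool:
    """Publish a run's artifact at most once; True if this call published it."""
    seen = conn.execute("SELECT 1 FROM publish_log WHERE run_id = ?", (run_id,)).fetchone()
    if seen:
        return False  # already published: a retry is a no-op
    slugs = validate(site, run_id)  # quarantine; raises on a bad artifact
    publish(site, run_id, slugs)    # commit to the repository
    with conn:  # record only after a successful publish
        conn.execute("INSERT INTO publish_log (run_id, site, slugs) VALUES (?, ?, ?)",
                     (run_id, site, ",".join(slugs)))
    return True
```

The log row is written only after `publish` succeeds, so a crash in between can cause one repeat publish on retry. Making `publish` write the same files to the same paths, so that a repeat yields no diff, keeps that window harmless.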

Without an idempotent design, a manual retry after an egress failure can publish the same article twice. Since the sandbox will not self-heal, retry safety has to be guaranteed on the caller and ingest sides.

Why I settled on this shape

Having touched both a self-hosted agent-loop setup and a Managed Agents one, the egress design philosophy converged on the same place. Hand an untrusted execution environment only least privilege, short lifetime, and a single destination. The moment you place a broad key inside an isolated environment, the benefit of isolation nearly evaporates.

Managed Agents is genuinely lightweight in that you can run an agent without building an environment, which lowers the cost of trying things out. But that lightness is no reason to build egress carelessly. If anything, because you are placing a key that can reach your assets inside a Google-hosted environment, my honest read is that egress should be tightened even more carefully than when self-hosting.

I hope this gives a starting point to anyone moving multi-project automation onto agents. Begin with one thing: prepare a single URL that can write "this run's one object, to a predetermined place, for a short time."

Thank You for Reading

Gemini Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.