GEMINI LAB
API / SDK · 2026-06-30 · Advanced

Folding Scattered Call Sites Into One Front Door: Migrating to the Interactions API for Automation

With the Interactions API now generally available, Gemini's calls can settle behind a single entry point. Here is a migration design for folding scattered call sites — generateContent, Batch, and homegrown agent loops — into one front door without breaking anything, complete with a working adapter layer.

By the time I had twenty-odd scheduled jobs, I noticed that the "entry point" of my own code had quietly split into four. Article prep called generateContent directly; the nightly bulk work went through Batch; the App Store review summaries ran on a homegrown agent loop; the image work lived in yet another helper. Each was the shortest path at the time. But when I returned to a job after six months, the first thing I had to do was remember which entry point it even used.

On June 30 the Interactions API became generally available and became the primary entry point for Gemini's models and agents. Managed Agents, background execution, and Gemini Omni all sit behind the same door. This reads less like a flashy feature and more like the quiet kind of update that pays off slowly over a long time. A single call surface means the version of me six months from now no longer has to remember where a call came from.

This article is not about writing one fresh script. It is a migration design for folding already-running, scattered call sites into one front door — without stopping them, without breaking them. It covers a working adapter layer, the order of migration, and the accidents that happen precisely because you are mid-migration.

What scattered entry points actually cost

Having more call sites is not painful at first. It hurts six months later, and it surfaces in three shapes.

The first is duplicated instrumentation. Token accounting, retry-on-failure, timeout handling — you end up writing each one slightly differently per entry point. In my case, the retry ceiling was three on the generateContent path and, on the Batch path, unset and therefore unbounded — a mismatch I found only later. It is the classic way a cost anomaly goes unnoticed for too long.

The second is that a model swap never ends in one place. When gemini-flash-latest came to resolve to Gemini 3.5 Flash, I had to change four entry points separately. Every time I fixed one, I doubted whether I had really fixed them all. The problem is not the number of entry points; it is that the blast radius cannot be seen.

The third is that you cannot easily adopt a new operating mode. Even when you want background execution — submit, then receive when done — scattered entry points make it hard to tell which path to rewrite first.
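As a rough illustration of the submit-then-receive shape, the sketch below polls a job until it reaches a terminal state, backing off between checks. Nothing in it is the Gemini SDK: `wait_for_result`, `fetch_status`, and the status strings are hypothetical stand-ins, and the real API may expose completion differently.

```python
import time
from typing import Callable

# Hypothetical status names; the real API may use different ones.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_result(
    fetch_status: Callable[[], dict],
    *,
    timeout_s: float = 600.0,
    first_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the job is terminal or the timeout passes.

    fetch_status is any callable returning a dict with a "status" key;
    in a real adapter it would wrap whatever call retrieves the job.
    """
    deadline = time.monotonic() + timeout_s
    delay = first_delay_s
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATUSES:
            return job
        # Give up if the next sleep would carry us past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff with a cap
```

With four scattered entry points, a loop like this would have to be written or adapted four times. Behind a single entry point it exists once.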

The essential benefit of consolidation is that all three become "fix it in one place." The Interactions API can be that receiver, but you do not need to move everything at once. Slipping a thin layer in between was the safest approach I found.

Put the front door in your adapter layer, not the API

Here is the judgment I most want to convey. Do not place the consolidated front door on the Interactions API itself. Place it on a thin adapter layer that you own.

The reason is simple: the API's details will keep changing. The legacy outputs schema was removed on June 6, and other schemas and arguments will move on their own deprecation deadlines. If your application code depends on those details directly, every such change means touching every job. With one adapter in between, only the inside of the adapter changes.

What the adapter offers is exactly one entry point. You hand it "what you want done," and a "result" comes back. Inside, it calls the Interactions API.

# llm_gateway.py — the single front door you own
import os
import time
import uuid
import logging
from dataclasses import dataclass, field
from typing import Any
 
from google import genai  # the real call is confined to the inside of this module only
 
log = logging.getLogger("llm_gateway")
_client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
 
# Centralize model selection here. A swap is fixed in this one place.
MODEL_BY_TIER = {
    "fast": "gemini-flash-latest",      # prep, classification, light passes
    "deep": "gemini-3-pro",             # the real work that needs reasoning
}
 
@dataclass
class Request:
    task: str                            # what you want done (the prompt body)
    tier: str = "fast"                   # fast / deep
    idempotency_key: str = field(default_factory=lambda: uuid.uuid4().hex)
    background: bool = False             # submit long work and receive later
    metadata: dict[str, Any] = field(default_factory=dict)
 
@dataclass
class Result:
    text: str
    model: str
    usage: dict[str, int]
    idempotency_key: str
 
def run(req: Request, *, max_retries: int = 3) -> Result:
    """The single entry point every job passes through.
    Instrumentation, retries, and model selection all converge here."""
    model = MODEL_BY_TIER[req.tier]
    started = time.monotonic()
    last_err: Exception | None = None
 
    for attempt in range(1, max_retries + 1):
        try:
            # The one and only API-dependent point. Even if details shift,
            # nothing leaks outside this function.
            resp = _client.interactions.create(
                model=model,
                input=req.task,
                # A re-send with the same key prevents duplicate billing and execution
                idempotency_key=req.idempotency_key,
                background=req.background,
            )
            usage = {
                "input": resp.usage.input_tokens,
                "output": resp.usage.output_tokens,
            }
            _record(req, model, usage, time.monotonic() - started, attempt)
            return Result(
                text=resp.output_text,
                model=model,
                usage=usage,
                idempotency_key=req.idempotency_key,
            )
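        except Exception as err:  # narrow the type in real use
            last_err = err
            if attempt == max_retries:
                log.warning("run failed (attempt %d/%d): %s; giving up",
                            attempt, max_retries, err)
                break  # do not sleep after the final attempt
            wait = min(2 ** attempt, 30)
            log.warning("run failed (attempt %d/%d): %s; retry in %ss",
                        attempt, max_retries, err, wait)
            time.sleep(wait)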
 
    _record(req, model, {}, time.monotonic() - started, max_retries, failed=True)
    raise RuntimeError("llm_gateway.run exhausted retries") from last_err
 
def _record(req, model, usage, elapsed, attempts, *, failed=False):
    # Instrumentation in one place too. Cost rollups and latency monitoring
    # are just a matter of reading this log line.
    log.info("llm_call key=%s model=%s tier=%s in=%s out=%s elapsed=%.2f attempts=%d failed=%s job=%s",
             req.idempotency_key, model, req.tier,
             usage.get("input"), usage.get("output"),
             elapsed, attempts, failed, req.metadata.get("job", "-"))

Note that the argument names passed to _client.interactions.create(...) are the part to confirm against the docs at the time you adopt this. GA pushes arguments toward stability, but keeping this call in one place rather than scattered across your app is the insurance against whatever uncertainty remains. Code outside the adapter knows only run(Request(...)).
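One way to make that confinement checkable is to pass the client in as a parameter, so the function that knows the SDK's names can be exercised against a fake. The sketch below is illustrative only: `call_model`, `FakeClient`, and `FakeInteractions` are hypothetical, and the argument and field names (`input`, `idempotency_key`, `background`, `output_text`, `usage.input_tokens`, `usage.output_tokens`) simply mirror the adapter above and remain assumptions to confirm against the current docs.

```python
from types import SimpleNamespace


def call_model(client, *, model: str, task: str,
               idempotency_key: str, background: bool = False) -> dict:
    """The only function that knows the SDK's argument and field names.

    The names used here are assumptions copied from the adapter above;
    if the real API differs, this is the single place to change.
    """
    resp = client.interactions.create(
        model=model,
        input=task,
        idempotency_key=idempotency_key,
        background=background,
    )
    # Translate the SDK response into a plain dict the rest of the app owns.
    return {
        "text": resp.output_text,
        "input_tokens": resp.usage.input_tokens,
        "output_tokens": resp.usage.output_tokens,
    }


class FakeInteractions:
    """Test double that records each call instead of hitting the network."""

    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return SimpleNamespace(
            output_text="ok",
            usage=SimpleNamespace(input_tokens=3, output_tokens=1),
        )


class FakeClient:
    """Stand-in exposing the same client.interactions.create(...) shape."""

    def __init__(self):
        self.interactions = FakeInteractions()
```

Calling `call_model(FakeClient(), model="gemini-flash-latest", task="ping", idempotency_key="k1")` exercises the mapping without an API key. When an argument is renamed upstream, only `call_model` and the fake need to change.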

Once this adapter is in place, each of the three pains above shrinks to a single place. Instrumentation is the one _record. Model swaps are the one MODEL_BY_TIER. The retry ceiling is the one argument on run, and a new operating mode such as background execution is the one background field on Request. None of them requires hunting around anymore.
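Because every call emits the same llm_call line, a per-job cost rollup can be a short script over the log. The following is a minimal sketch, assuming the message format used by _record above and a log formatter that puts the message last on each line; `rollup` and `LINE_RE` are hypothetical names.

```python
import re
from collections import defaultdict

# Matches the message format written by _record in the adapter above.
LINE_RE = re.compile(
    r"llm_call key=(?P<key>\S+) model=(?P<model>\S+) tier=(?P<tier>\S+) "
    r"in=(?P<inp>\S+) out=(?P<out>\S+) elapsed=(?P<elapsed>[\d.]+) "
    r"attempts=(?P<attempts>\d+) failed=(?P<failed>True|False) job=(?P<job>.+)$"
)


def _tokens(raw: str) -> int:
    # Failed calls log in=None out=None; count those as zero tokens.
    return int(raw) if raw.isdigit() else 0


def rollup(lines):
    """Aggregate calls, failures, tokens, and elapsed seconds per job."""
    totals = defaultdict(
        lambda: {"calls": 0, "failed": 0, "input": 0, "output": 0, "elapsed": 0.0}
    )
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an llm_call line
        t = totals[m["job"].strip()]
        t["calls"] += 1
        t["failed"] += m["failed"] == "True"
        t["input"] += _tokens(m["inp"])
        t["output"] += _tokens(m["out"])
        t["elapsed"] += float(m["elapsed"])
    return dict(totals)
```

Feeding it the lines of a log file, for example `rollup(open("gateway.log"))` with a file name of your choosing, returns one dict per job value passed in Request.metadata. Calls without a job are grouped under "-".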

WHAT YOU'LL LEARN
  • An order and decision rule for gathering scattered call sites behind one adapter layer through a non-breaking, staged migration
  • An implementation pattern that centralizes idempotency keys, instrumentation, and model swaps at the front door, with a working Python adapter
  • A design for folding away result-waiting loops on top of background execution, plus how to avoid the double-counting that migrations invite