After the One-Click Deploy — Hardening an AI Studio Gemini App on Cloud Run for Real Production Use
AI Studio's one-click deploy to Cloud Run gives you a working URL in minutes — but not a production service. A practical checklist for API key storage, authentication, cost ceilings, and observability, with copy-paste gcloud commands.
When the one-click deploy to Cloud Run appeared in AI Studio's build tab recently, I tried it on a small Gemini app I had built as little more than an internal tool. A few minutes later I had a public URL and a working app in the browser. As an onboarding experience, it is honestly impressive. But when I went to share that URL with someone, I stopped. Having run a number of Cloud Run services as an indie developer, I know there is a real gap between "there is a URL that works" and "this is fit to be public."
Running gcloud run services describe against the freshly deployed service confirmed the suspicion: several things needed attention. Where does the API key actually live? Can anyone on the internet call this URL? How far will it scale, and is there any ceiling on what it can cost me? And how am I supposed to ship the next change to this thing?
None of those questions matter for a prototype. All of them matter the moment the URL is public. So here is the exact sequence of checks I ran, written up so it can be repeated — the goal being to keep the generated code alive and lift it to an operational standard, rather than throwing it away and starting over.
What the One-Click Deploy Does — and What It Leaves to You
Let's draw the boundary first. The one-click deploy takes care of roughly this much:
Building the container and creating the Cloud Run service
Issuing a public URL with HTTPS termination
Wiring up credentials so the app can call the Gemini API at all
What it cannot decide for you:
Who is allowed to call that URL (authentication and authorization)
Where your API key and secrets should be managed long-term
How much you are willing to pay if traffic exceeds expectations
How the generated code will keep being updated
The difference between a prototype and a production service is not features. It is having answers to those four questions. The rest of this piece fills them in, one by one.
Check 1: Find Out Where the API Key Lives
Start here. Right after deployment, your Gemini API key may be sitting directly in an environment variable on the service. Environment variables are visible to anyone who can view the service in the console or run gcloud run services describe, and they get copied forward into every revision.
Since June 19, 2026, the Gemini API rejects requests from unrestricted API keys, so restricting the key is table stakes. I go one step further and treat "the key body lives in Secret Manager" as the minimum bar before anything is shared publicly.
# 1. Create a secret holding the key
printf '%s' "YOUR_GEMINI_API_KEY" | \
  gcloud secrets create gemini-api-key --data-file=-

# 2. Grant the Cloud Run service account read access
#    (my-service-sa must already exist; the service-account commands below create it)
gcloud secrets add-iam-policy-binding gemini-api-key \
  --member="serviceAccount:my-service-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

# 3. Replace the inline env var with a secret reference
gcloud run services update my-genai-app \
  --region=asia-northeast1 \
  --remove-env-vars=GEMINI_API_KEY \
  --set-secrets=GEMINI_API_KEY=gemini-api-key:latest
The nice property of step 3: application code still sees a plain GEMINI_API_KEY environment variable, so you change where the key is stored without touching a single line of generated code. You raise the safety floor before you ever have to read, let alone edit, the prototype.
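On the application side, it helps to read that variable once and fail loudly if it is missing. A minimal sketch, assuming the app is Python and reads GEMINI_API_KEY itself (your generated code may use another language, or an SDK that picks the variable up implicitly); ConfigError and load_gemini_api_key are hypothetical names:

```python
from __future__ import annotations

import os
from collections.abc import Mapping


class ConfigError(RuntimeError):
    """Raised when required configuration is missing at startup."""


def load_gemini_api_key(env: Mapping[str, str] | None = None) -> str:
    """Return the Gemini API key from the environment, or fail fast.

    With --set-secrets, Cloud Run exposes the secret as an ordinary
    environment variable, so this lookup is the same before and after
    the key moves into Secret Manager.
    """
    source = os.environ if env is None else env
    key = source.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise ConfigError(
            "GEMINI_API_KEY is not set; check the --set-secrets mapping and "
            "that the runtime service account has roles/secretmanager.secretAccessor."
        )
    return key
```

Calling it once at startup means a broken secret reference typically shows up as a revision that fails to start, rather than as an error on the first user request.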
While you are in there, check which service account the service runs as. If it is the default Compute Engine service account, create a dedicated one and switch — otherwise a public-facing service is running with broad access to everything else in your project.
gcloud iam service-accounts create my-service-sa \
  --display-name="genai app runtime"

gcloud run services update my-genai-app \
  --region=asia-northeast1 \
  --service-account=my-service-sa@my-project.iam.gserviceaccount.com
Check 2: Close Unauthenticated Access
The whole point of a one-click deploy is a URL you can try immediately, so the service is almost certainly deployed with unauthenticated access allowed. That's fine for showing a prototype around. But every request to this app triggers a Gemini API call, and every call costs money. An unauthenticated public URL is a counter where strangers can spend your tokens on your card.
If the audience is you and a handful of people, switching to Cloud Run's IAM authentication is the shortest path:
# Close unauthenticated access
gcloud run services update my-genai-app \
  --region=asia-northeast1 \
  --no-allow-unauthenticated

# Grant invoke rights only to people who should call it
gcloud run services add-iam-policy-binding my-genai-app \
  --region=asia-northeast1 \
  --member="user:teammate@example.com" \
  --role="roles/run.invoker"
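Callers then pass an identity token in an Authorization: Bearer header. For a quick smoke test from your own machine, gcloud auth print-identity-token prints a token for the account you are logged in with.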
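A minimal smoke-test sketch in Python, assuming gcloud is installed and logged in as an account that holds roles/run.invoker on the service. The function names are hypothetical, and the service URL is whatever gcloud run services describe reports for your service:

```python
import subprocess
import urllib.request


def fetch_identity_token() -> str:
    """Ask gcloud for an identity token for the currently active account."""
    result = subprocess.run(
        ["gcloud", "auth", "print-identity-token"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_authed_request(url: str, token: str) -> urllib.request.Request:
    """Attach the token as a Bearer credential for Cloud Run IAM auth."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def smoke_test(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status of an authenticated GET against the service.

    urllib raises HTTPError for 4xx/5xx, so a 403 (caller lacks run.invoker)
    surfaces as an exception rather than a return value.
    """
    request = build_authed_request(url, fetch_identity_token())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Running smoke_test against the service URL should return 200 for an invited account, while the same URL opened without a token should now be rejected.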
If the app is meant for the general public, IAM won't cut it — you'll need application-level auth (Firebase Authentication or similar) plus rate limiting before opening the door again. The exact answer depends on the product, but the failure mode to avoid is the quiet one: "it stayed publicly open and I only noticed weeks later." Close first, then decide how to open.
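When you do reopen the service, rate limiting can start as a per-client token bucket. The sketch below is an assumption-laden starting point, not a finished limiter: buckets live in one process's memory, so with several Cloud Run instances each keeps its own counts, and a global limit would need a shared store. The client_id is hypothetical; in practice it would be the user ID your auth layer verified.

```python
import time
from typing import Callable


class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request spends one."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """One bucket per client ID, held in this instance's memory only."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

A request handler would check allow(client_id) before calling Gemini and answer HTTP 429 when it returns False. The dictionary grows with every new client, so a long-lived version would also need eviction.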
Check 3: Decide the Cost Ceiling Before Traffic Decides It for You
A Gemini app's bill has two components: Cloud Run execution costs and Gemini API token costs. The one that grows while you're not looking is the second, so cap both.
On the Cloud Run side, make the scaling limits explicit with the --max-instances and --concurrency flags of gcloud run services update. For prototype-level traffic I start deliberately conservative and raise the limits only when real traffic demands it.
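Why these two flags work as a cost control is simple arithmetic: max-instances × concurrency bounds how many requests, and therefore Gemini calls, can be in flight at once. A back-of-the-envelope sketch, assuming one Gemini call per request; the per-token prices, token counts, and latency are placeholders to replace with current Gemini API pricing and your own measurements:

```python
def worst_case_hourly_spend(
    max_instances: int,
    concurrency: int,
    avg_latency_s: float,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Upper bound on Gemini token spend per hour if every request slot stays busy.

    Assumes each slot starts a new call as soon as the previous one finishes.
    """
    in_flight = max_instances * concurrency
    calls_per_hour = in_flight * 3600 / avg_latency_s
    usd_per_call = (
        input_tokens_per_call * usd_per_m_input
        + output_tokens_per_call * usd_per_m_output
    ) / 1_000_000
    return calls_per_hour * usd_per_call
```

With max_instances=2, concurrency=10, 5 seconds per call, 1,000 input and 500 output tokens, and placeholder prices of $1 and $4 per million tokens, the bound is 14,400 calls and about $43 per hour. Halving either flag halves the ceiling, which is why I tighten them before anything else.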
On the Gemini API side, Project Spend Caps let you set a monthly dollar ceiling per project. If a key leaks or a loop runs away, the blast radius becomes a fixed number — the fact that things stop at the cap is exactly the insurance you want. I wrote about pairing the hard cap with an app-side soft limit in Stopping Runaway Costs Twice: Project Spend Caps Plus an App-Side Soft Limit.
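The app-side soft limit can be sketched as a running monthly estimate that starts refusing new generations before the hard cap trips. This is a minimal in-memory illustration, not the implementation from that article: the budget, the 80% threshold, and the prices are placeholders, the month key is assumed to be computed in UTC, and because each Cloud Run instance has its own memory, a real version would keep the total in a shared store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def current_month() -> str:
    """Month key such as '2026-07' (UTC is an assumption; match your billing view)."""
    return datetime.now(timezone.utc).strftime("%Y-%m")


@dataclass
class SoftSpendLimit:
    """Track estimated spend for the current month and refuse work past a soft limit."""
    monthly_budget_usd: float
    soft_fraction: float = 0.8       # stop new work at 80% of the budget (placeholder)
    usd_per_m_input: float = 0.0     # fill in from current Gemini API pricing
    usd_per_m_output: float = 0.0
    month: str = ""
    spent_usd: float = 0.0

    def _roll(self, month: str) -> None:
        # Reset the running total when the month key changes.
        if month != self.month:
            self.month, self.spent_usd = month, 0.0

    def allow(self, month: str) -> bool:
        self._roll(month)
        return self.spent_usd < self.monthly_budget_usd * self.soft_fraction

    def record(self, month: str, input_tokens: int, output_tokens: int) -> None:
        self._roll(month)
        self.spent_usd += (
            input_tokens * self.usd_per_m_input
            + output_tokens * self.usd_per_m_output
        ) / 1_000_000
```

The intended flow is allow(current_month()) before each generation and record(...) afterwards with the token counts from the response's usage metadata, so the app degrades gracefully well before the project-level cap stops everything.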
One more trade-off worth knowing as a number: cold starts. My Node-based containers typically answer their first request from idle in about 2–4 seconds. If that bothers you, --min-instances=1 keeps an instance warm — and bills you for the idle time. For a freshly promoted prototype I usually skip it and accept that the first request is slow.
Check 4: Take Ownership of the Generated Code
Right after a one-click deploy, the source of truth is the artifact inside AI Studio — not your repository, not your CI. Left alone, the service keeps living while nobody can quite say where the next change happens or who deployed what, when.
The fix is mundane: export the code, commit it to your own repository, and make source-based deploys the only path forward.
# Redeploy the same service from your own source tree
gcloud run deploy my-genai-app \
  --region=asia-northeast1 \
  --source=. \
  --no-allow-unauthenticated
Deploying to the same service name creates a new revision, so the URL and settings carry over. From then on, a GitHub Actions job running that command from main turns the one-click deploy into what it should be: the scaffolding for the first day, nothing more.
This is also the moment to spend thirty minutes actually reading the generated code. You don't need to understand all of it, but two things are worth verifying: which model name it calls Gemini with, and whether there is any retry or timeout handling at all. A hard-coded preview model name means a guaranteed future outage when that model is retired. I covered how to defend automation against this class of quiet platform breakage in Preventing Automation That Dies Silently — Preflight Gates for Gemini Platform Changes.
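If the generated code turns out to have neither, a small wrapper is a cheap first step. The sketch below is not tied to any particular SDK: it reads the model name from a hypothetical GEMINI_MODEL environment variable, so retiring a model becomes a config change, and it retries a zero-argument call with exponential backoff and full jitter under an overall deadline. Which exceptions are worth retrying depends on the SDK in use, so the caller passes them in rather than the sketch guessing; per-request timeouts belong in that SDK's own client or HTTP options.

```python
from __future__ import annotations

import os
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def gemini_model_name(default: str) -> str:
    """Read the model name from config (hypothetical GEMINI_MODEL) instead of hard-coding it."""
    return os.environ.get("GEMINI_MODEL", default)


def call_with_retry(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 4,
    base_delay_s: float = 1.0,
    deadline_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call fn, retrying only `retry_on` errors with exponential backoff and full jitter.

    Gives up after `attempts` tries, or earlier if the next wait would push
    total elapsed time past `deadline_s`. Other exceptions propagate at once.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    start = clock()
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = random.uniform(0, base_delay_s * 2 ** attempt)
            if clock() - start + delay > deadline_s:
                raise
            sleep(delay)
    raise AssertionError("unreachable")
```

Wrapping the generated code's existing Gemini call in a lambda keeps the change to one line at the call site, and the default passed to gemini_model_name can be whatever model ID the code uses today.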
Check 5: Observability — Being Able to Answer "Is It Working?" with Numbers
Last, observation. Cloud Run picks up JSON written to stdout as structured logs, so even a single log line per Gemini call changes how much you can see:
import json
import time


def log_generation(model: str, resp, started: float) -> None:
    """Emit one structured log line per Gemini call.

    Searchable and aggregatable as jsonPayload in Cloud Logging.
    `started` is the time.time() value captured just before the call.
    """
    usage = resp.usage_metadata
    print(json.dumps({
        "severity": "INFO",
        "message": "gemini_generation",
        "model": model,
        "input_tokens": usage.prompt_token_count,
        "output_tokens": usage.candidates_token_count,
        "latency_ms": int((time.time() - started) * 1000),
    }), flush=True)  # flush so lines are not held in stdout's buffer
With that in place, "how did token consumption move month over month" and "when exactly did latency degrade" become questions with numeric answers. Working solo, with no dedicated on-call rotation, I've found that log-based metrics plus two alert policies — error rate and latency, nothing fancier — are the cheapest reliability I can buy.
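Log-based metrics and alert policies in Cloud Logging and Cloud Monitoring are what you would actually run, but the same questions can be answered offline from exported log lines, which also makes the data's shape concrete. A sketch, assuming lines in the format log_generation emits, plus failures logged with the same message and "severity": "ERROR" (the logger above only emits INFO, so that part is an assumption about how you record errors):

```python
import json
import math
from collections.abc import Iterable


def summarize(lines: Iterable[str]) -> dict[str, float]:
    """Aggregate structured Gemini log lines into call count, error rate, tokens, and p95 latency."""
    latencies: list[int] = []
    calls = errors = input_tokens = output_tokens = 0
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as framework output
        if not isinstance(entry, dict) or entry.get("message") != "gemini_generation":
            continue
        calls += 1
        if entry.get("severity") == "ERROR":
            errors += 1
            continue
        input_tokens += entry.get("input_tokens") or 0
        output_tokens += entry.get("output_tokens") or 0
        if entry.get("latency_ms") is not None:
            latencies.append(entry["latency_ms"])
    latencies.sort()
    # Nearest-rank p95: smallest sample with at least 95% of samples at or below it.
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1] if latencies else 0.0
    return {
        "calls": calls,
        "error_rate": errors / calls if calls else 0.0,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "p95_latency_ms": p95,
    }
```

The two alert policies mentioned above map directly onto error_rate and p95_latency_ms; the token totals are what you compare month over month.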
The Checklist, Side by Side — and When to Rewrite Instead
Here is the whole pass, initial state versus hardened state:
| Check | State right after deploy | State after hardening |
|---|---|---|
| API key | Possibly inline in an env var | Secret Manager reference + restricted key |
| Service account | Possibly the default SA | Dedicated SA, least privilege |
| Access control | Unauthenticated, fully public | IAM auth, or app-level auth + rate limiting |
| Cost | No scaling or spend ceiling | max-instances / concurrency / Spend Caps |
| Deploy path | Manual updates from AI Studio | Repo-driven gcloud run deploy / CI |
| Observability | Default request logs only | Structured token/latency logs + alerts |
Sometimes, partway through this hardening pass, the honest conclusion is that rewriting beats adopting. My two criteria: I rewrite when the request boundary — input validation and output shaping — cannot be read out of the code, and when the app depends on a preview-only feature with no visible replacement. If the code is merely verbose but its boundaries are legible, lifting it into operation first and refactoring gradually has been faster and more stable for me every time. If you'd rather design a Gemini API server on Cloud Run from scratch, Gemini 3.1 Pro × Cloud Run: Building a Serverless AI API for Production walks through that architecture.
If you do one thing today, run gcloud run services describe against the service you have right now and read the environment variables and access settings with your own eyes. Every check in this list starts from there. I hope it gives you a solid starting point the next time a prototype suddenly asks to become a product.
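To make that first look repeatable, here is a sketch that takes the JSON from gcloud run services describe my-genai-app --region=asia-northeast1 --format=json and from gcloud run services get-iam-policy with the same flags, and flags the defaults this checklist targets. The field paths assume the Knative-style Service object the describe command returns (spec.template.spec.containers[].env, serviceAccountName, and the autoscaling.knative.dev/maxScale annotation), so verify them against your own output. Treating any inline env var whose name contains KEY, SECRET, or TOKEN as a leaked secret is a heuristic, and the max-instances threshold of 10 is an arbitrary placeholder.

```python
SECRET_HINTS = ("KEY", "SECRET", "TOKEN")


def audit(service: dict, iam_policy: dict, max_instances_limit: int = 10) -> list[str]:
    """Return human-readable findings for risky Cloud Run defaults."""
    findings: list[str] = []
    template = service.get("spec", {}).get("template", {})
    spec = template.get("spec", {})

    for container in spec.get("containers", []):
        for var in container.get("env", []):
            name = var.get("name", "")
            # A literal "value" (rather than valueFrom.secretKeyRef) is stored on the revision.
            if "value" in var and any(hint in name.upper() for hint in SECRET_HINTS):
                findings.append(f"env var {name} holds an inline value; move it to Secret Manager")

    account = spec.get("serviceAccountName", "")
    if not account or account.endswith("-compute@developer.gserviceaccount.com"):
        findings.append(f"runs as the default compute service account ({account or 'unset'})")

    annotations = template.get("metadata", {}).get("annotations", {})
    max_scale = annotations.get("autoscaling.knative.dev/maxScale")
    if max_scale is None or int(max_scale) > max_instances_limit:
        findings.append(f"max-instances is {max_scale or 'not set'}; consider a lower explicit limit")

    for binding in iam_policy.get("bindings", []):
        if binding.get("role") == "roles/run.invoker" and "allUsers" in binding.get("members", []):
            findings.append("roles/run.invoker is granted to allUsers (unauthenticated access)")
    return findings
```

Saving both outputs to files and calling audit(json.load(...), json.load(...)) on them gives an empty list once the checks above are done, which also makes it a simple gate to add to CI.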