Dev Tools / 2026-07-02 / Intermediate

After the One-Click Deploy — Hardening an AI Studio Gemini App on Cloud Run for Real Production Use

AI Studio's one-click deploy to Cloud Run gives you a working URL in minutes — but not a production service. A practical checklist for API key storage, authentication, cost ceilings, and observability, with copy-paste gcloud commands.

When the one-click deploy to Cloud Run appeared in AI Studio's build tab recently, I tried it on a small Gemini app I had built as little more than an internal tool. A few minutes later I had a public URL and a working app in the browser. As an onboarding experience, it is honestly impressive. But when I went to share that URL with someone, I stopped. Having run a number of Cloud Run services as an indie developer, I know there is a real gap between "there is a URL that works" and "this is fit to be public."

Running gcloud run services describe against the freshly deployed service confirmed the suspicion: several things needed attention. Where does the API key actually live? Can anyone on the internet call this URL? How far will it scale, and is there any ceiling on what it can cost me? And how am I supposed to ship the next change to this thing?

None of those questions matter for a prototype. All of them matter the moment the URL is public. So here is the exact sequence of checks I ran, written up so it can be repeated — the goal being to keep the generated code alive and lift it to an operational standard, rather than throwing it away and starting over.

What the One-Click Deploy Does — and What It Leaves to You

Let's draw the boundary first. The one-click deploy takes care of roughly this much:

Building the container and creating the Cloud Run service
Issuing a public URL with HTTPS termination
Wiring up credentials so the app can call the Gemini API at all

What it cannot decide for you:

Who is allowed to call that URL (authentication and authorization)
Where your API key and secrets should be managed long-term
How much you are willing to pay if traffic exceeds expectations
How the generated code will keep being updated

The difference between a prototype and a production service is not features. It is having answers to those four questions. The rest of this piece fills them in, one by one.

Check 1: Find Out Where the API Key Lives

Start here. Right after deployment, your Gemini API key may be sitting directly in an environment variable on the service. Environment variables are visible to anyone who can view the service in the console or run gcloud run services describe, and they get copied forward into every revision.
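
A minimal sketch of that check in Python, assuming the output of gcloud run services describe --format=json has been loaded into a dict and follows the Knative-style layout (spec.template.spec.containers[].env). The function name, the keyword list, and the sample data are illustrative, not part of any Google tooling.

```python
# Name fragments that suggest a credential. This list is an assumption;
# extend it to match how your app names its settings.
SUSPICIOUS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def find_inline_secrets(service: dict) -> list:
    """Return names of env vars that hold a literal value and look like credentials."""
    containers = (
        service.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    flagged = []
    for container in containers:
        for env in container.get("env", []):
            # A literal carries "value"; a Secret Manager reference carries "valueFrom".
            is_literal = "value" in env and "valueFrom" not in env
            if is_literal and any(part in env["name"].upper() for part in SUSPICIOUS):
                flagged.append(env["name"])
    return flagged


# Hand-written sample shaped like the describe output (not real data).
sample = {
    "spec": {"template": {"spec": {"containers": [{"env": [
        {"name": "GEMINI_API_KEY", "value": "AIza-placeholder"},
        {"name": "LOG_LEVEL", "value": "info"},
        {"name": "DB_PASSWORD",
         "valueFrom": {"secretKeyRef": {"name": "db-password", "key": "latest"}}},
    ]}]}}}
}

print(find_inline_secrets(sample))  # prints ['GEMINI_API_KEY']
```

Anything this flags is a candidate for the Secret Manager move described next.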

Since June 19, 2026, the Gemini API rejects requests from unrestricted API keys, so restricting the key is table stakes. I go one step further and treat "the key body lives in Secret Manager" as the minimum bar before anything is shared publicly.

# 1. Create a secret holding the key
printf '%s' "YOUR_GEMINI_API_KEY" | \
  gcloud secrets create gemini-api-key --data-file=-
 
# 2. Grant the Cloud Run service account read access
gcloud secrets add-iam-policy-binding gemini-api-key \
  --member="serviceAccount:my-service-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
 
# 3. Replace the inline env var with a secret reference
gcloud run services update my-genai-app \
  --region=asia-northeast1 \
  --remove-env-vars=GEMINI_API_KEY \
  --set-secrets=GEMINI_API_KEY=gemini-api-key:latest

The nice property of step 3: application code still sees a plain GEMINI_API_KEY environment variable, so you change where the key is stored without touching a single line of generated code. You raise the safety floor before you ever have to read, let alone edit, the prototype.
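
To illustrate why the swap is invisible to the app, here is a sketch of how application code might read the key. The load_api_key helper is hypothetical and not taken from the generated code; the only thing it relies on is the GEMINI_API_KEY variable name used above.

```python
import os
from collections.abc import Mapping


def load_api_key(env: Mapping = os.environ) -> str:
    """Read the Gemini key from the environment and fail fast if it is absent."""
    # The value looks the same whether it came from an inline env var
    # or from a Secret Manager reference mounted as an env var.
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; check the secret reference on the service"
        )
    return key
```

Failing at startup rather than on the first request makes a broken secret binding show up as a failed revision instead of a runtime error in front of a user.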

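While you are in there, check which service account the service runs as. If it is the default Compute Engine service account, create a dedicated one and switch; otherwise a public-facing service is running with broad access to everything else in your project. If my-service-sa does not exist yet, run the two commands below before steps 2 and 3 above: the IAM binding in step 2 needs the account to exist, and the secret reference in step 3 is resolved by whichever account the service runs as.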

gcloud iam service-accounts create my-service-sa \
  --display-name="genai app runtime"
 
gcloud run services update my-genai-app \
  --region=asia-northeast1 \
  --service-account=my-service-sa@my-project.iam.gserviceaccount.com
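
If you would rather script the "is it the default account?" check, a small sketch follows. It relies on the default Compute Engine service account having the form PROJECT_NUMBER-compute@developer.gserviceaccount.com; the function name and the sample addresses are illustrative.

```python
import re
from typing import Optional

# Default Compute Engine service accounts look like
# 123456789012-compute@developer.gserviceaccount.com
DEFAULT_COMPUTE_SA = re.compile(r"^\d+-compute@developer\.gserviceaccount\.com$")


def uses_default_service_account(email: Optional[str]) -> bool:
    """True when the service runs as the default Compute Engine service account."""
    # An empty value is treated as "default" too, since Cloud Run falls back
    # to the default account when none is specified.
    return not email or bool(DEFAULT_COMPUTE_SA.match(email))


print(uses_default_service_account("123456789012-compute@developer.gserviceaccount.com"))  # True
print(uses_default_service_account("my-service-sa@my-project.iam.gserviceaccount.com"))    # False
```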

Check 2: Is That URL Open to Literally Everyone?

The whole point of a one-click deploy is a URL you can try immediately, so the service is almost certainly deployed with unauthenticated access allowed. That's fine for showing a prototype around. But every request to this app triggers a Gemini API call, which costs money. An unauthenticated public URL is a counter where strangers can spend your tokens on your card.

If the audience is you and a handful of people, switching to Cloud Run's IAM authentication is the shortest path:

# Close unauthenticated access
gcloud run services update my-genai-app \
  --region=asia-northeast1 \
  --no-allow-unauthenticated
 
# Grant invoke rights only to people who should call it
gcloud run services add-iam-policy-binding my-genai-app \
  --region=asia-northeast1 \
  --member="user:teammate@example.com" \
  --role="roles/run.invoker"

Callers then pass an identity token; a quick smoke test looks like this:

curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  https://my-genai-app-xxxxxxxx-an.a.run.app/api/generate
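
The same smoke test can be sketched in Python with only the standard library. The build_authed_request helper is hypothetical, and the token is assumed to come from gcloud auth print-identity-token (or another identity-token source you already have); nothing is sent until you call urlopen yourself.

```python
import urllib.request


def build_authed_request(url: str, identity_token: str) -> urllib.request.Request:
    """Build a GET request that carries the identity token as a bearer token."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {identity_token}"}
    )


# Usage, not executed here because it needs a real service and token:
#   import subprocess
#   token = subprocess.run(
#       ["gcloud", "auth", "print-identity-token"],
#       capture_output=True, text=True, check=True,
#   ).stdout.strip()
#   req = build_authed_request("https://<your-service-url>/api/generate", token)
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(resp.status)
```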

If the app is meant for the general public, IAM won't cut it — you'll need application-level auth (Firebase Authentication or similar) plus rate limiting before opening the door again. The exact answer depends on the product, but the failure mode to avoid is the quiet one: "it stayed publicly open and I only noticed weeks later." Close first, then decide how to open.
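
As one possible shape for the rate-limiting half, here is a minimal token-bucket sketch. It is an illustration under stated assumptions, not a recommendation of a specific library: the class and parameter names are made up, and the state lives in process memory, so each Cloud Run instance enforces its own limit.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `refill_per_sec`."""

    def __init__(self, capacity: int, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should be rejected."""
        now = self.clock()
        # Refill for the time elapsed since the last call, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice you would keep one bucket per authenticated user (for example in a dict keyed by user ID) and answer HTTP 429 when allow() returns False. A limit that must hold across instances needs shared storage instead of process memory.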

Check 3: Decide the Cost Ceiling Before Traffic Decides It for You

A Gemini app's bill has two parts: Cloud Run execution costs and Gemini API token costs. The one that grows while you're not looking is the second, so cap both.

On the Cloud Run side, make the scaling limits explicit. For prototype-level traffic I start deliberately conservative and raise limits only when something real demands it:

gcloud run services update my-genai-app \
  --region=asia-northeast1 \
  --max-instances=2 \
  --concurrency=20
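
To see what those two flags bound, a back-of-the-envelope calculation helps. The sketch below assumes every slot is busy all the time; the 2-second latency and 800 tokens per request are illustrative round numbers, not measurements.

```python
def worst_case_load(max_instances: int, concurrency: int,
                    avg_latency_s: float, tokens_per_request: int) -> dict:
    """Upper bound on traffic the service can push to the Gemini API."""
    # Each instance handles at most `concurrency` requests at once.
    in_flight = max_instances * concurrency
    # With every slot busy, a slot finishes one request per avg_latency_s.
    requests_per_sec = in_flight / avg_latency_s
    tokens_per_hour = int(round(requests_per_sec * 3600 * tokens_per_request))
    return {
        "in_flight": in_flight,
        "requests_per_sec": requests_per_sec,
        "tokens_per_hour": tokens_per_hour,
    }


print(worst_case_load(max_instances=2, concurrency=20,
                      avg_latency_s=2.0, tokens_per_request=800))
# prints {'in_flight': 40, 'requests_per_sec': 20.0, 'tokens_per_hour': 57600000}
```

Even with conservative limits the theoretical ceiling is large, which is why the scaling cap alone is not a spend cap.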

On the Gemini API side, Project Spend Caps let you set a monthly dollar ceiling per project. If a key leaks or a loop runs away, the blast radius becomes a fixed number — the fact that things stop at the cap is exactly the insurance you want. I wrote about pairing the hard cap with an app-side soft limit in Stopping Runaway Costs Twice: Project Spend Caps Plus an App-Side Soft Limit.
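
For a rough idea of what an app-side soft limit can look like, here is a deliberately small sketch. It is not the design from the linked article; the class name and the limit are hypothetical, and the counter lives in memory, so it resets on restart and is tracked per instance.

```python
class SoftTokenBudget:
    """Refuse new Gemini calls once a self-imposed token budget is used up."""

    def __init__(self, monthly_limit: int) -> None:
        self.monthly_limit = monthly_limit
        self.used = 0

    def can_spend(self) -> bool:
        """Check this before each Gemini call."""
        return self.used < self.monthly_limit

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Call this after each Gemini response with its reported usage."""
        self.used += input_tokens + output_tokens


budget = SoftTokenBudget(monthly_limit=1000)
budget.record(input_tokens=412, output_tokens=380)
print(budget.can_spend())  # True, 792 of 1000 used
budget.record(input_tokens=200, output_tokens=100)
print(budget.can_spend())  # False, 1092 of 1000 used
```

A production version would persist the counter somewhere shared and reset it per billing period; the point here is only that the app stops on its own before the hard cap has to.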

One more trade-off worth knowing as a number: cold starts. My Node-based containers typically answer their first request from idle in about 2–4 seconds. If that bothers you, --min-instances=1 keeps an instance warm — and bills you for the idle time. For a freshly promoted prototype I usually skip it and accept that the first request is slow.

Check 4: Take Ownership of the Generated Code

Right after a one-click deploy, the source of truth is the artifact inside AI Studio — not your repository, not your CI. Left alone, the service keeps living while nobody can quite say where the next change happens or who deployed what, when.

The fix is mundane: export the code, commit it to your own repository, and make source-based deploys the only path forward.

# Redeploy the same service from your own source tree
gcloud run deploy my-genai-app \
  --region=asia-northeast1 \
  --source=. \
  --no-allow-unauthenticated

Deploying to the same service name creates a new revision, so the URL and settings carry over. From then on, a GitHub Actions job running that command from main turns the one-click deploy into what it should be: the scaffolding for the first day, nothing more.

This is also the moment to spend thirty minutes actually reading the generated code. You don't need to understand all of it, but two things are worth verifying: which model name it calls Gemini with, and whether there is any retry or timeout handling at all. A hard-coded preview model name means a guaranteed future outage when that model is retired. I covered how to defend automation against this class of quiet platform breakage in Preventing Automation That Dies Silently — Preflight Gates for Gemini Platform Changes.
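
If the generated code turns out to have neither, a sketch like the following shows the general idea. It is an assumption-laden illustration: the GEMINI_MODEL variable name and the call_with_retry helper are made up, the fallback model name is just the one that appears in this article's sample log, and retrying on every exception is cruder than you would want in practice.

```python
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Read the model name from configuration so a retirement is a config change,
# not a code change. The variable name and fallback are assumptions.
MODEL_NAME = os.environ.get("GEMINI_MODEL", "gemini-flash-latest")


def call_with_retry(fn: Callable[[], T], attempts: int = 3,
                    base_delay_s: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run fn, retrying failed attempts with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits 0.5 s, 1 s, 2 s, ... between attempts.
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")
```

Wrap the Gemini call in a zero-argument function and pass it as fn. A real implementation would retry only on transient failures such as HTTP 429 or 5xx responses, and would set a request timeout on the client call itself.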

Check 5: Observability — Being Able to Answer "Is It Working?" with Numbers

Last, observation. Cloud Run picks up JSON written to stdout as structured logs, so even a single log line per Gemini call changes how much you can see:

import json
import time

def log_generation(model: str, resp, started: float) -> None:
    """Emit one structured log line per Gemini call.

    `started` is the time.time() value captured just before the request.
    Searchable and aggregatable as jsonPayload in Cloud Logging."""
    usage = resp.usage_metadata
    print(json.dumps({
        "severity": "INFO",
        "message": "gemini_generation",
        "model": model,
        "input_tokens": usage.prompt_token_count,
        "output_tokens": usage.candidates_token_count,
        "latency_ms": int((time.time() - started) * 1000),
    }), flush=True)  # flush so the line is not held in Python's stdout buffer

The expected output is one line like this:

{"severity": "INFO", "message": "gemini_generation", "model": "gemini-flash-latest", "input_tokens": 412, "output_tokens": 380, "latency_ms": 2150}

With that in place, "how did token consumption move month over month" and "when exactly did latency degrade" become questions with numeric answers. Working solo, with no dedicated on-call rotation, I've found that log-based metrics plus two alert policies — error rate and latency, nothing fancier — are the cheapest reliability I can buy.
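
To make "numeric answers" concrete, here is a small sketch that aggregates exported log lines offline. It assumes each line is the JSON object shown above, one per line; the summarize function and the second sample line are illustrative, and in Cloud Logging itself you would more likely build log-based metrics than run a script.

```python
import json
from collections import defaultdict


def summarize(log_lines: list) -> dict:
    """Aggregate gemini_generation log lines into per-model totals."""
    totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "latency_ms": 0})
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("message") != "gemini_generation":
            continue
        t = totals[entry["model"]]
        t["calls"] += 1
        t["tokens"] += entry["input_tokens"] + entry["output_tokens"]
        t["latency_ms"] += entry["latency_ms"]
    return {
        model: {
            "calls": t["calls"],
            "tokens": t["tokens"],
            "avg_latency_ms": t["latency_ms"] / t["calls"],
        }
        for model, t in totals.items()
    }


lines = [
    '{"severity": "INFO", "message": "gemini_generation", "model": "gemini-flash-latest", '
    '"input_tokens": 412, "output_tokens": 380, "latency_ms": 2150}',
    "plain text line that is not JSON",
    '{"severity": "INFO", "message": "gemini_generation", "model": "gemini-flash-latest", '
    '"input_tokens": 100, "output_tokens": 50, "latency_ms": 1850}',
]
print(summarize(lines))
# prints {'gemini-flash-latest': {'calls': 2, 'tokens': 942, 'avg_latency_ms': 2000.0}}
```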

The Checklist, Side by Side — and When to Rewrite Instead

Here is the whole pass, initial state versus hardened state:

Check	State right after deploy	State after hardening
API key	Possibly inline in an env var	Secret Manager reference + restricted key
Service account	Possibly the default SA	Dedicated SA, least privilege
Access control	Unauthenticated, fully public	IAM auth, or app-level auth + rate limiting
Cost	No scaling or spend ceiling	max-instances / concurrency / Spend Caps
Deploy path	Manual updates from AI Studio	Repo-driven gcloud run deploy / CI
Observability	Default request logs only	Structured token/latency logs + alerts

Sometimes, partway through this hardening pass, the honest conclusion is that rewriting beats adopting. I rewrite when either of two things is true: the request boundary (input validation and output shaping) cannot be read out of the code, or the app depends on a preview-only feature with no visible replacement. If the code is merely verbose but its boundaries are legible, lifting it into operation first and refactoring gradually has been faster and more stable for me every time. If you'd rather design a Gemini API server on Cloud Run from scratch, Gemini 3.1 Pro × Cloud Run: Building a Serverless AI API for Production walks through that architecture.

If you do one thing today, run gcloud run services describe against the service you have right now and read the environment variables and access settings with your own eyes. Every check in this list starts from there. I hope it gives you a solid starting point the next time a prototype suddenly asks to become a product.
