Terraform × Gemini API: Complete Production Infrastructure Automation Guide — IaC Design Patterns for AI Applications on Google Cloud
Automate your entire Gemini API production infrastructure with Terraform. Covers IAM, Cloud Run, Vertex AI, Secret Manager, and CI/CD in one comprehensive IaC design guide.
As AI applications grow, a familiar set of pain points emerges: making infrastructure changes feels risky when done manually, subtle config drift between environments causes mysterious bugs, and onboarding a new team member means days of tribal-knowledge transfer. Infrastructure as Code (IaC) addresses all of these at the root level.
In this guide, we'll walk through automating the complete Google Cloud infrastructure for a Gemini API–powered application using Terraform (or OpenTofu). From API key management in Secret Manager to deploying Cloud Run services with AI-appropriate resource settings, to a fully automated CI/CD pipeline — every piece is covered with production-ready code.
Why Gemini API Applications Need IaC
AI applications have infrastructure challenges that don't exist in traditional web apps.
API key and credential management is inherently complex. Gemini API keys and service account credentials must be isolated per environment (dev/staging/prod), yet manual management tends to end with the same key being shared across environments. Provisioning a separate Secret Manager secret per environment through Terraform removes most of that risk by construction.
Cost control and quota management is another area where AI apps are different. Gemini API usage has quotas, and without per-environment limits, a runaway dev script can consume the budget meant for production. Managing budget alerts and quota settings in Terraform gives you guardrails that can't be forgotten.
Finally, AI-specific resource tuning matters. Cloud Run's default timeout (5 minutes) and memory settings are often inadequate for LLM inference workloads. Encoding these settings in Terraform ensures consistency across environments and eliminates the "it worked in dev" problem.
Project Structure and Prerequisites
Here's the Terraform project structure we'll build:
gemini-ai-infra/
├── main.tf          # Root resource definitions
├── variables.tf     # Variable declarations
├── outputs.tf       # Output values
├── provider.tf      # Provider configuration
├── backend.tf       # Remote state configuration
├── modules/
│   ├── iam/         # IAM and service accounts
│   ├── secrets/     # Secret Manager resources
│   ├── cloud_run/   # Cloud Run service
│   └── monitoring/  # Cloud Monitoring and budgets
└── environments/
    ├── dev/         # Dev environment tfvars
    ├── staging/     # Staging environment tfvars
    └── prod/        # Production environment tfvars
Prerequisites:
Terraform 1.7+ (or OpenTofu 1.6+)
Google Cloud CLI installed and authenticated (gcloud auth application-default login)
Terraform Cloud account (optional, for CI/CD integration)
WHAT YOU'LL LEARN
✦ You can immediately apply Terraform patterns that fully automate IAM, API keys, and Cloud Run for Gemini API apps
✦ You'll learn how to design safe release management with dev/staging/prod separation using Terraform Workspaces
✦ You'll be able to build a zero-touch AI infrastructure CI/CD pipeline with GitHub Actions and Terraform Cloud
Step 1: Provider and Backend Configuration
The foundation of any Terraform project is the provider configuration and remote state backend. For Gemini API projects, using Google Cloud Storage as the backend keeps the state file accessible to your whole team.
# provider.tf
terraform {
  # OpenTofu 1.6 reports its own version here, so relax this to ">= 1.6" if you use it
  required_version = ">= 1.7"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.20"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.20"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

provider "google-beta" {
  project = var.project_id
  region  = var.region
}
# variables.tf
variable "project_id" {
  description = "Google Cloud project ID"
  type        = string
}

variable "region" {
  description = "Deployment region"
  type        = string
  default     = "us-central1"
}

variable "environment" {
  description = "Environment name (dev / staging / prod)"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}

variable "gemini_api_key" {
  description = "Gemini API key (managed via Secret Manager)"
  type        = string
  sensitive   = true # Redacted in plan and apply output (the value is still stored in state)
}
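The backend.tf file listed in the project structure is not shown above. A minimal sketch of what it might contain, assuming a Cloud Storage bucket that already exists with object versioning enabled. The bucket name and prefix are hypothetical placeholders, not values from this guide.

```hcl
# backend.tf (sketch)
# Assumption: the bucket below was created ahead of time, outside this
# configuration, with object versioning turned on.
# Terraform backend blocks cannot reference input variables, so these
# values are written as literals.
terraform {
  backend "gcs" {
    bucket = "my-org-gemini-tfstate" # hypothetical bucket name
    prefix = "gemini-ai-infra"       # state objects are stored under this path
  }
}
```

With the gcs backend, each workspace gets its own state object under the prefix, so the dev, staging, and prod workspaces used later do not share a state file.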
Step 2: IAM and Service Account Automation
Sound IAM design for Gemini API applications means following least-privilege principles: separate the runtime service account (used by Cloud Run) from the deployment service account (used by your CI/CD pipeline).
# modules/iam/main.tf

# Runtime service account for the application
resource "google_service_account" "gemini_app_runner" {
  account_id   = "gemini-app-runner-${var.environment}"
  display_name = "Gemini App Runtime SA (${var.environment})"
  description  = "Service account used by Cloud Run to call Gemini API"
  project      = var.project_id
}

# Allow reading secrets from Secret Manager
# NOTE: this is a project-wide grant covering every secret in the project.
# The per-secret binding in modules/secrets is narrower; drop this resource
# if that binding is sufficient.
resource "google_project_iam_member" "secret_accessor" {
  project = var.project_id
  role    = "roles/secretmanager.secretAccessor"
  member  = "serviceAccount:${google_service_account.gemini_app_runner.email}"
}

# Allow the SA to invoke other Cloud Run services (for service mesh)
resource "google_project_iam_member" "run_invoker" {
  project = var.project_id
  role    = "roles/run.invoker"
  member  = "serviceAccount:${google_service_account.gemini_app_runner.email}"
}

# Vertex AI access (if using Gemini via Vertex AI endpoint)
resource "google_project_iam_member" "vertex_user" {
  project = var.project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:${google_service_account.gemini_app_runner.email}"
}

# Cloud Logging write access
resource "google_project_iam_member" "log_writer" {
  project = var.project_id
  role    = "roles/logging.logWriter"
  member  = "serviceAccount:${google_service_account.gemini_app_runner.email}"
}

output "runner_service_account_email" {
  value = google_service_account.gemini_app_runner.email
}
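The module above defines only the runtime service account. A sketch of the separate deployment service account might look like the following. The file name, resource names, and the choice of roles/run.developer are assumptions made for illustration; a pipeline that runs terraform apply over every resource in this guide would need broader roles than shown here.

```hcl
# modules/iam/deployer.tf (hypothetical file name)

# Deployment service account for the CI/CD pipeline, kept separate
# from the runtime service account.
resource "google_service_account" "gemini_app_deployer" {
  account_id   = "gemini-app-deployer-${var.environment}"
  display_name = "Gemini App Deployer SA (${var.environment})"
  project      = var.project_id
}

# Assumption: the pipeline only needs to deploy new Cloud Run revisions.
resource "google_project_iam_member" "deployer_run_developer" {
  project = var.project_id
  role    = "roles/run.developer"
  member  = "serviceAccount:${google_service_account.gemini_app_deployer.email}"
}

# Let the deployer attach the runtime SA to Cloud Run revisions.
# The grant is scoped to that one service account, not the whole project.
resource "google_service_account_iam_member" "deployer_act_as_runner" {
  service_account_id = google_service_account.gemini_app_runner.name
  role               = "roles/iam.serviceAccountUser"
  member             = "serviceAccount:${google_service_account.gemini_app_deployer.email}"
}
```

Binding roles/iam.serviceAccountUser on the single runtime account, instead of at project level, keeps the deployer from acting as any other service account in the project.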
Step 3: Secure API Key Management with Secret Manager
Storing your Gemini API key in Secret Manager — and pulling it into Cloud Run at runtime — means the key never appears in your container image, environment file, or git history.
# modules/secrets/main.tf
resource "google_secret_manager_secret" "gemini_api_key" {
  secret_id = "gemini-api-key-${var.environment}"
  project   = var.project_id

  replication {
    auto {} # Automatic multi-region replication (recommended)
  }

  labels = {
    environment = var.environment
    managed-by  = "terraform"
    service     = "gemini-api"
  }
}

# Store the actual key value as a secret version.
# Note: secret_data is also written to Terraform state, so restrict access to the state bucket.
resource "google_secret_manager_secret_version" "gemini_api_key_v1" {
  secret      = google_secret_manager_secret.gemini_api_key.id
  secret_data = var.gemini_api_key

  # Ignore later changes to secret_data so Terraform does not replace this
  # version when the key is rotated outside Terraform
  lifecycle {
    ignore_changes = [secret_data]
  }
}

# Grant the runtime SA read access to this specific secret
resource "google_secret_manager_secret_iam_member" "gemini_api_key_access" {
  secret_id = google_secret_manager_secret.gemini_api_key.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${var.runner_service_account_email}"
}

output "gemini_api_key_secret_name" {
  value = google_secret_manager_secret.gemini_api_key.name
}
Step 4: Automating Cloud Run Deployment for Gemini Workloads
Deploying a Gemini API backend on Cloud Run requires careful tuning. Requests that wait on model responses take longer than typical HTTP requests, and buffering long prompts, streamed output, or multimodal payloads can need more memory than the defaults provide. Here's a production-ready Cloud Run configuration:
# modules/cloud_run/main.tf
resource "google_cloud_run_v2_service" "gemini_app" {
  name     = "gemini-app-${var.environment}"
  location = var.region
  project  = var.project_id

  template {
    service_account = var.runner_service_account_email

    scaling {
      # Dev saves cost by scaling to zero; prod keeps at least 1 warm instance
      min_instance_count = var.environment == "prod" ? 1 : 0
      max_instance_count = var.environment == "prod" ? 20 : 5
    }

    containers {
      image = var.container_image

      resources {
        limits = {
          cpu    = var.environment == "prod" ? "2" : "1"
          memory = var.environment == "prod" ? "4Gi" : "2Gi"
        }
        # Keep CPU allocated outside request handling too (instance-based billing)
        cpu_idle          = false
        startup_cpu_boost = true
      }

      # Standard environment variables
      env {
        name  = "APP_ENV"
        value = var.environment
      }
      env {
        name  = "GOOGLE_CLOUD_PROJECT"
        value = var.project_id
      }

      # Pull Gemini API key from Secret Manager at runtime
      env {
        name = "GEMINI_API_KEY"
        value_source {
          secret_key_ref {
            secret  = var.gemini_api_key_secret_name
            version = "latest"
          }
        }
      }

      # Health checks
      startup_probe {
        http_get {
          path = "/health"
          port = 8080
        }
        initial_delay_seconds = 10
        period_seconds        = 5
        failure_threshold     = 10
      }
      liveness_probe {
        http_get {
          path = "/health"
          port = 8080
        }
        period_seconds    = 30
        failure_threshold = 3
      }
    }

    # Longer than Cloud Run's 300s default to leave room for slow Gemini calls
    # (the maximum is 3600s); tune this value to your workload
    timeout = "900s"
  }

  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
}

# Public access (disable in prod and use Identity-Aware Proxy instead)
resource "google_cloud_run_v2_service_iam_member" "public_access" {
  count    = var.allow_unauthenticated ? 1 : 0
  location = google_cloud_run_v2_service.gemini_app.location
  project  = var.project_id
  name     = google_cloud_run_v2_service.gemini_app.name
  role     = "roles/run.invoker"
  member   = "allUsers"
}

output "service_url" {
  value = google_cloud_run_v2_service.gemini_app.uri
}
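The root main.tf that connects these modules is not shown in this guide. A sketch of how the wiring might look, using the input and output names that appear in the module code above. The root variable container_image and the rule for allow_unauthenticated are assumptions, and each module would also need its own variables.tf declaring these inputs.

```hcl
# main.tf (root module) - a sketch, not the author's file
module "iam" {
  source      = "./modules/iam"
  project_id  = var.project_id
  environment = var.environment
}

module "secrets" {
  source                       = "./modules/secrets"
  project_id                   = var.project_id
  environment                  = var.environment
  gemini_api_key               = var.gemini_api_key
  runner_service_account_email = module.iam.runner_service_account_email
}

module "cloud_run" {
  source                       = "./modules/cloud_run"
  project_id                   = var.project_id
  region                       = var.region
  environment                  = var.environment
  container_image              = var.container_image      # hypothetical root variable
  allow_unauthenticated        = var.environment != "prod" # assumption: public only outside prod
  runner_service_account_email = module.iam.runner_service_account_email
  gemini_api_key_secret_name   = module.secrets.gemini_api_key_secret_name

  # Wait for the secret version and its IAM binding, not just the secret
  # itself, before the service is created.
  depends_on = [module.secrets]
}
```

The explicit depends_on matters because the secret name output only depends on the secret resource. Without it, Terraform could try to create the Cloud Run service before the secret version or the accessor binding exists.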
Step 5: Environment Isolation with Terraform Workspaces
Managing three separate environments safely requires isolating their Terraform state files. Terraform Workspaces handle this cleanly:
# Create (if needed) and switch to the dev workspace
terraform workspace select -or-create dev

# Apply dev configuration
terraform apply -var-file="environments/dev/terraform.tfvars"

# Switch to staging, creating the workspace on first use
terraform workspace select -or-create staging
terraform apply -var-file="environments/staging/terraform.tfvars"
Critical recommendation: Use separate GCP projects for dev and prod. This eliminates an entire class of IAM misconfiguration risks — a broken terraform apply in dev literally cannot affect production resources in a different project.
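One failure mode with workspaces is applying the wrong variable file, for example running in the prod workspace with the dev tfvars. A small guard can catch that at plan time. This is a sketch that assumes the workspace names match the values of var.environment exactly (dev, staging, prod) and that the resource name is free to choose.

```hcl
# Fails plan/apply when the selected workspace and the tfvars file disagree,
# e.g. workspace "prod" combined with environments/dev/terraform.tfvars.
# Relies on var.environment from variables.tf.
resource "terraform_data" "workspace_guard" {
  lifecycle {
    precondition {
      condition     = terraform.workspace == var.environment
      error_message = "Workspace '${terraform.workspace}' does not match environment '${var.environment}'."
    }
  }
}
```

The terraform_data resource creates nothing in Google Cloud. It exists only to carry the precondition.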
Step 6: Budget Alerts and Monitoring Automation
AI API costs can surprise you. Terraform lets you bake cost guardrails directly into your infrastructure:
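The budget definition itself is not included here. A minimal sketch of a per-environment budget, assuming it lives in the monitoring module. The variables billing_account_id and monthly_budget_usd, the default amount, and the threshold percentages are all hypothetical, and the currency must match the billing account's currency.

```hcl
# modules/monitoring/main.tf (sketch)
# Also relies on var.project_id and var.environment being declared in this module.
variable "billing_account_id" {
  description = "Billing account ID (hypothetical input)"
  type        = string
}

variable "monthly_budget_usd" {
  description = "Monthly budget in whole US dollars (hypothetical input)"
  type        = number
  default     = 200
}

data "google_project" "current" {
  project_id = var.project_id
}

resource "google_billing_budget" "gemini_env" {
  billing_account = var.billing_account_id
  display_name    = "gemini-app-${var.environment}-monthly"

  budget_filter {
    # Budgets filter on project numbers, not project IDs
    projects = ["projects/${data.google_project.current.number}"]
  }

  amount {
    specified_amount {
      currency_code = "USD"
      units         = tostring(var.monthly_budget_usd)
    }
  }

  # Alert at 50% and 90% of actual spend, and when the forecast reaches 100%
  threshold_rules {
    threshold_percent = 0.5
  }
  threshold_rules {
    threshold_percent = 0.9
  }
  threshold_rules {
    threshold_percent = 1.0
    spend_basis       = "FORECASTED_SPEND"
  }
}
```

A budget only sends alerts; it does not cap spending. Depending on how the provider is authenticated, the Billing Budget API may also need to be enabled and a quota project configured.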
Workload Identity Federation (WIF) eliminates the need to store service account key files in GitHub Secrets. Instead, GitHub Actions' OIDC token is exchanged for a short-lived Google Cloud access token, dramatically reducing the blast radius of any secret compromise.
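The token exchange described above can be set up in Terraform roughly as follows. This is a sketch under assumptions: the pool and provider IDs, the github_repository variable, and the deployer_service_account_name variable are hypothetical names introduced for illustration.

```hcl
# Relies on var.project_id being declared alongside these inputs.
variable "github_repository" {
  description = "GitHub repository allowed to deploy, as owner/repo (hypothetical input)"
  type        = string
}

variable "deployer_service_account_name" {
  description = "Fully qualified name of the deployment SA: projects/PROJECT/serviceAccounts/EMAIL (hypothetical input)"
  type        = string
}

resource "google_iam_workload_identity_pool" "github" {
  project                   = var.project_id
  workload_identity_pool_id = "github-actions"
  display_name              = "GitHub Actions"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  project                            = var.project_id
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-oidc"

  # Map claims from the GitHub OIDC token onto Google attributes
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }

  # Reject tokens issued to any repository other than the expected one
  attribute_condition = "assertion.repository == \"${var.github_repository}\""

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

# Allow workflows from that repository to impersonate the deployment SA
resource "google_service_account_iam_member" "github_impersonates_deployer" {
  service_account_id = var.deployer_service_account_name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/${var.github_repository}"
}
```

The attribute condition and the principalSet member both pin access to one repository, so a token from another repository in the same organization is rejected twice over.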
Step 8: Team Collaboration Best Practices
A few guidelines that prevent the most common Terraform team problems:
First, never run terraform apply locally against production. The CI/CD pipeline should be the only path to apply production changes. Set environment: production on the deploy job in GitHub Actions and configure required reviewers on that environment, so every production apply waits for manual approval from a designated reviewer.
Second, lock down terraform state commands. Direct state manipulation is a footgun. Restrict it to break-glass procedures with a documented runbook, and always take a state backup before any manual state operations.
Third, never commit terraform.tfvars files. Add them to .gitignore. Pass sensitive values (like gemini_api_key) through CI/CD environment variables or Terraform Cloud Variable Sets, never through committed files.
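To make the split between secret and non-secret values concrete, here is a sketch of what a per-environment variable file might hold. The project ID is a hypothetical placeholder; the variable names come from variables.tf.

```hcl
# environments/dev/terraform.tfvars (sketch)
# Only non-secret values belong here.
project_id  = "my-gemini-dev" # hypothetical project ID
region      = "us-central1"
environment = "dev"

# gemini_api_key is intentionally absent. Terraform reads it from the
# TF_VAR_gemini_api_key environment variable set by the CI/CD system.
```

Terraform maps any environment variable named TF_VAR_<name> onto the input variable <name>, so the key can be injected by the pipeline without ever being written to a file.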
If Cloud Run cannot read the Gemini API key secret when a revision is deployed or started, either the secret doesn't exist yet, or the runtime service account lacks roles/secretmanager.secretAccessor on that specific secret. Verify the IAM binding in google_secret_manager_secret_iam_member and confirm the secret version is in ENABLED state via the GCP Console.
Looking back
Bringing Terraform into your Gemini API development workflow transforms infrastructure management from a manual, error-prone process into a reviewable, auditable, and reproducible one. The patterns in this guide — isolated service accounts, Secret Manager integration, per-environment Cloud Run tuning, budget alerts, and GitHub Actions CI/CD — give you a solid foundation you can adapt as your AI application grows.
Gemini Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.