Dev Tools / 2026-04-08 / Advanced

Terraform × Gemini API: Complete Production Infrastructure Automation Guide — IaC Design Patterns for AI Applications on Google Cloud

Automate your entire Gemini API production infrastructure with Terraform. Covers IAM, Cloud Run, Vertex AI, Secret Manager, and CI/CD in one comprehensive IaC design guide.

As AI applications grow, a familiar set of pain points emerges: making infrastructure changes feels risky when done manually, subtle config drift between environments causes mysterious bugs, and onboarding a new team member means days of tribal-knowledge transfer. Infrastructure as Code (IaC) addresses all of these at the root level.

Below, we automate the complete Google Cloud infrastructure for a Gemini API–powered application using Terraform (or OpenTofu). From API key management in Secret Manager to deploying Cloud Run services with AI-appropriate resource settings, to a fully automated CI/CD pipeline — every piece is covered with production-ready code.

Why Gemini API Applications Need IaC

AI applications have infrastructure challenges that don't exist in traditional web apps.

API key and credential management is inherently complex. Gemini API keys and service account credentials must be isolated per environment (dev/staging/prod), yet manual management almost always leads to accidental key sharing. Automating integration with Secret Manager eliminates this risk structurally.

Cost control and quota management are another area where AI apps differ. Gemini API usage is subject to quotas and billed by usage, and without per-environment limits a runaway dev script can consume the budget meant for production. Managing budget alerts in Terraform gives you guardrails that can't be forgotten; keep in mind that a budget alert notifies you but does not cap spend on its own.

Finally, AI-specific resource tuning matters. Cloud Run's default request timeout (300 seconds, i.e. 5 minutes) and default memory limit (512 MiB) are often inadequate for LLM workloads. Encoding these settings in Terraform keeps them consistent across environments and heads off the "it worked in dev" problem.

Project Structure and Prerequisites

Here's the Terraform project structure we'll build:

gemini-ai-infra/
├── main.tf              # Root resource definitions
├── variables.tf         # Variable declarations
├── outputs.tf           # Output values
├── provider.tf          # Provider configuration
├── backend.tf           # Remote state configuration
├── modules/
│   ├── iam/             # IAM and service accounts
│   ├── secrets/         # Secret Manager resources
│   ├── cloud_run/       # Cloud Run service
│   └── monitoring/      # Cloud Monitoring and budgets
└── environments/
    ├── dev/             # Dev environment tfvars
    ├── staging/         # Staging environment tfvars
    └── prod/            # Production environment tfvars

Prerequisites:

Terraform 1.7+ (or OpenTofu 1.6+)
Google Cloud CLI installed and authenticated (gcloud auth application-default login)
Terraform Cloud account (optional, for CI/CD integration)

# Verify versions
terraform version
# Terraform v1.7.5
 
gcloud --version
# Google Cloud SDK 478.0.0

Step 1: Provider and Backend Configuration

The foundation of any Terraform project is the provider configuration and remote state backend. For Gemini API projects, using Google Cloud Storage as the backend keeps the state file accessible to your whole team.

# provider.tf
terraform {
  required_version = ">= 1.7"
 
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.20"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.20"
    }
  }
}
 
provider "google" {
  project = var.project_id
  region  = var.region
}
 
provider "google-beta" {
  project = var.project_id
  region  = var.region
}

# backend.tf — create the GCS bucket manually before running terraform init
terraform {
  backend "gcs" {
    bucket  = "your-project-terraform-state"
    prefix  = "gemini-ai-infra"
  }
}
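
Because the state bucket has to exist before terraform init, it is often created once from a small bootstrap configuration that keeps its own state locally. The following is a minimal sketch under that assumption: the bootstrap/ directory, the state_bucket_location variable, and the resource name are hypothetical, and the bucket name simply repeats the placeholder from backend.tf.

```hcl
# bootstrap/main.tf (hypothetical; applied once with local state)
variable "project_id" {
  description = "Project that will own the state bucket"
  type        = string
}

variable "state_bucket_location" {
  description = "Bucket location (assumed default; choose what suits your team)"
  type        = string
  default     = "US"
}

resource "google_storage_bucket" "terraform_state" {
  name     = "your-project-terraform-state" # must match the bucket in backend.tf
  project  = var.project_id
  location = var.state_bucket_location

  # Bucket-level IAM only, and no public access
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  # Keep earlier state versions so a bad apply can be rolled back
  versioning {
    enabled = true
  }

  # Guard against accidental deletion of the state bucket
  lifecycle {
    prevent_destroy = true
  }
}
```

Object versioning is the main design choice here: it gives you a recovery path if a state file is ever corrupted or overwritten.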

# variables.tf
variable "project_id" {
  description = "Google Cloud project ID"
  type        = string
}
 
variable "region" {
  description = "Deployment region"
  type        = string
  default     = "us-central1"
}
 
variable "environment" {
  description = "Environment name (dev / staging / prod)"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod"
  }
}
 
variable "gemini_api_key" {
  description = "Gemini API key (managed via Secret Manager)"
  type        = string
  sensitive   = true  # Redacted in plan/apply output, but still written to the state file
}
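
The resources in the following steps only work once the corresponding Google Cloud APIs are enabled in the project. One way to keep that in code is sketched below; the file name and the exact list of services are assumptions based on the resources this guide uses, so trim or extend the list to match your setup.

```hcl
# apis.tf (hypothetical file name)
locals {
  # Services assumed to be needed by the modules in this guide
  required_services = [
    "run.googleapis.com",
    "secretmanager.googleapis.com",
    "aiplatform.googleapis.com",
    "iam.googleapis.com",
    "monitoring.googleapis.com",
    "pubsub.googleapis.com",
    "billingbudgets.googleapis.com",
  ]
}

resource "google_project_service" "required" {
  for_each = toset(local.required_services)

  project = var.project_id
  service = each.value

  # Leave the API enabled if this resource is later removed from the config
  disable_on_destroy = false
}
```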

Step 2: IAM and Service Account Automation

Sound IAM design for Gemini API applications means following least-privilege principles: separate the runtime service account (used by Cloud Run) from the deployment service account (used by your CI/CD pipeline).

# modules/iam/main.tf
 
# Runtime service account for the application
resource "google_service_account" "gemini_app_runner" {
  account_id   = "gemini-app-runner-${var.environment}"
  display_name = "Gemini App Runtime SA (${var.environment})"
  description  = "Service account used by Cloud Run to call Gemini API"
  project      = var.project_id
}
 
# Secret Manager access is granted per secret in modules/secrets
# (google_secret_manager_secret_iam_member), not project-wide. A project-level
# roles/secretmanager.secretAccessor binding would let this SA read every
# secret in the project, which defeats least privilege.
 
# Allow the SA to invoke other Cloud Run services in this project (service-to-service calls)
resource "google_project_iam_member" "run_invoker" {
  project = var.project_id
  role    = "roles/run.invoker"
  member  = "serviceAccount:${google_service_account.gemini_app_runner.email}"
}
 
# Vertex AI access (if using Gemini via Vertex AI endpoint)
resource "google_project_iam_member" "vertex_user" {
  project = var.project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:${google_service_account.gemini_app_runner.email}"
}
 
# Cloud Logging write access
resource "google_project_iam_member" "log_writer" {
  project = var.project_id
  role    = "roles/logging.logWriter"
  member  = "serviceAccount:${google_service_account.gemini_app_runner.email}"
}
 
output "runner_service_account_email" {
  value = google_service_account.gemini_app_runner.email
}
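
The module above creates only the runtime account. A deployment account for the CI/CD pipeline could look like the sketch below. The resource names and the deployer.tf file name are hypothetical, and the two roles are the ones the troubleshooting section names for Cloud Run deployment.

```hcl
# modules/iam/deployer.tf (hypothetical file name)

# Deployment service account used only by the CI/CD pipeline
resource "google_service_account" "gemini_app_deployer" {
  account_id   = "gemini-deployer-${var.environment}"
  display_name = "Gemini App Deployer SA (${var.environment})"
  project      = var.project_id
}

# Permission to create and update Cloud Run services
resource "google_project_iam_member" "deployer_run_admin" {
  project = var.project_id
  role    = "roles/run.admin"
  member  = "serviceAccount:${google_service_account.gemini_app_deployer.email}"
}

# Permission to attach the runtime SA to a service, scoped to that one SA
resource "google_service_account_iam_member" "deployer_act_as_runner" {
  service_account_id = google_service_account.gemini_app_runner.name
  role               = "roles/iam.serviceAccountUser"
  member             = "serviceAccount:${google_service_account.gemini_app_deployer.email}"
}
```

Granting roles/iam.serviceAccountUser on the runtime account itself, rather than at project level, keeps the deployer from acting as any other service account. A pipeline that applies the whole configuration in this guide would need further roles (for example, to manage IAM bindings and secrets); those are left out here.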

Step 3: Secure API Key Management with Secret Manager

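Storing your Gemini API key in Secret Manager and pulling it into Cloud Run at runtime means the key never appears in your container image, environment file, or git history. Terraform does record the value of secret_data in its state, so treat the state bucket as sensitive and restrict who can read it.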

# modules/secrets/main.tf
 
resource "google_secret_manager_secret" "gemini_api_key" {
  secret_id = "gemini-api-key-${var.environment}"
  project   = var.project_id
 
  replication {
    auto {}  # Automatic multi-region replication (recommended)
  }
 
  labels = {
    environment = var.environment
    managed-by  = "terraform"
    service     = "gemini-api"
  }
}
 
# Store the actual key value as a secret version
resource "google_secret_manager_secret_version" "gemini_api_key_v1" {
  secret      = google_secret_manager_secret.gemini_api_key.id
  secret_data = var.gemini_api_key
 
  # Ignore changes to secret_data after creation, so rotating the key
  # outside Terraform does not trigger a replacement of this version
  lifecycle {
    ignore_changes = [secret_data]
  }
}
 
# Grant the runtime SA read access to this specific secret
resource "google_secret_manager_secret_iam_member" "gemini_api_key_access" {
  secret_id = google_secret_manager_secret.gemini_api_key.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${var.runner_service_account_email}"
}
 
output "gemini_api_key_secret_name" {
  value = google_secret_manager_secret.gemini_api_key.name
}

Step 4: Automating Cloud Run Deployment for Gemini Workloads

Deploying a Gemini API backend on Cloud Run requires careful tuning. Requests that wait on LLM generation run far longer than typical HTTP requests, and holding long prompts and responses in memory across many concurrent requests adds up. Here's a production-ready Cloud Run configuration:

# modules/cloud_run/main.tf
 
resource "google_cloud_run_v2_service" "gemini_app" {
  name     = "gemini-app-${var.environment}"
  location = var.region
  project  = var.project_id
 
  template {
    service_account = var.runner_service_account_email
 
    scaling {
      # Dev saves cost by scaling to zero; prod keeps at least 1 warm instance
      min_instance_count = var.environment == "prod" ? 1 : 0
      max_instance_count = var.environment == "prod" ? 20 : 5
    }
 
    containers {
      image = var.container_image
 
      resources {
        limits = {
          cpu    = var.environment == "prod" ? "2" : "1"
          memory = var.environment == "prod" ? "4Gi" : "2Gi"
        }
        # cpu_idle = false keeps CPU allocated between requests as well
        # (instance-based billing): no throttling of background work, but higher cost
        cpu_idle          = false
        startup_cpu_boost = true
      }
 
      # Standard environment variables
      env {
        name  = "APP_ENV"
        value = var.environment
      }
      env {
        name  = "GOOGLE_CLOUD_PROJECT"
        value = var.project_id
      }
 
      # Pull Gemini API key from Secret Manager at runtime
      env {
        name = "GEMINI_API_KEY"
        value_source {
          secret_key_ref {
            secret  = var.gemini_api_key_secret_name
            version = "latest"
          }
        }
      }
 
      # Health checks
      startup_probe {
        http_get {
          path = "/health"
          port = 8080
        }
        initial_delay_seconds = 10
        period_seconds        = 5
        failure_threshold     = 10
      }
 
      liveness_probe {
        http_get {
          path = "/health"
          port = 8080
        }
        period_seconds    = 30
        failure_threshold = 3
      }
    }
 
    # Generous timeout for LLM inference (the 300s default is often too short; the maximum is 3600s)
    timeout = "900s"
  }
 
  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
}
 
# Public access (disable in prod and use Identity-Aware Proxy instead)
# Uses the v2 IAM resource to match google_cloud_run_v2_service
resource "google_cloud_run_v2_service_iam_member" "public_access" {
  count    = var.allow_unauthenticated ? 1 : 0
  location = google_cloud_run_v2_service.gemini_app.location
  project  = var.project_id
  name     = google_cloud_run_v2_service.gemini_app.name
  role     = "roles/run.invoker"
  member   = "allUsers"
}
 
output "service_url" {
  value = google_cloud_run_v2_service.gemini_app.uri
}
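
The three modules so far still need to be connected in the root module. The sketch below shows one way to do that. The input names are inferred from the var.* references inside each module, and it assumes the root variables.tf also declares container_image and allow_unauthenticated, which the tfvars files set later.

```hcl
# main.tf (root module; a sketch, not the only possible wiring)
module "iam" {
  source = "./modules/iam"

  project_id  = var.project_id
  environment = var.environment
}

module "secrets" {
  source = "./modules/secrets"

  project_id                   = var.project_id
  environment                  = var.environment
  gemini_api_key               = var.gemini_api_key
  runner_service_account_email = module.iam.runner_service_account_email
}

module "cloud_run" {
  source = "./modules/cloud_run"

  project_id                   = var.project_id
  environment                  = var.environment
  region                       = var.region
  container_image              = var.container_image
  allow_unauthenticated        = var.allow_unauthenticated
  runner_service_account_email = module.iam.runner_service_account_email
  gemini_api_key_secret_name   = module.secrets.gemini_api_key_secret_name

  # Wait for the secret version and its IAM binding, not just the secret itself
  depends_on = [module.secrets]
}

output "service_url" {
  value = module.cloud_run.service_url
}
```

The explicit depends_on matters because the secret name output alone does not make Terraform wait for the IAM binding, and Cloud Run checks secret access when the revision is deployed.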

Step 5: Environment Isolation with Terraform Workspaces

Managing three separate environments safely requires isolating their Terraform state files. Terraform Workspaces handle this cleanly:

# Create the dev workspace (this also switches to it)
terraform workspace new dev
 
# Apply dev configuration
terraform apply -var-file="environments/dev/terraform.tfvars"
 
# Switch to staging, creating the workspace if it does not exist yet (Terraform 1.4+)
terraform workspace select -or-create staging
terraform apply -var-file="environments/staging/terraform.tfvars"

# environments/dev/terraform.tfvars
project_id            = "my-project-dev"
environment           = "dev"
region                = "us-central1"
container_image       = "us-central1-docker.pkg.dev/my-project-dev/app/gemini-app:latest"
allow_unauthenticated = true   # Share URL with teammates in dev
 
# environments/prod/terraform.tfvars
project_id            = "my-project-prod"
environment           = "prod"
region                = "us-central1"
container_image       = "us-central1-docker.pkg.dev/my-project-prod/app/gemini-app:v1.2.0"
allow_unauthenticated = false  # Enforce authentication in prod

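Critical recommendation: Use separate GCP projects for dev and prod. This eliminates an entire class of IAM misconfiguration risks: resources and project-scoped IAM bindings in dev have no reach into production, so a broken terraform apply in dev cannot touch production resources unless the deploying identity has also been granted roles in the prod project.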
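
Nothing in the workspace commands above stops someone from selecting the prod workspace and passing the dev tfvars file. A small guard can catch that mismatch at plan time. This is a sketch assuming workspace names are identical to the values of var.environment; the guard.tf file name and resource name are hypothetical.

```hcl
# guard.tf (hypothetical file name)
resource "terraform_data" "workspace_guard" {
  lifecycle {
    # Fail the plan when the selected workspace and the tfvars file disagree
    precondition {
      condition     = terraform.workspace == var.environment
      error_message = "Selected workspace does not match var.environment; check the workspace and the -var-file argument."
    }
  }
}
```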

Step 6: Budget Alerts and Monitoring Automation

AI API costs can surprise you. Terraform lets you bake cost guardrails directly into your infrastructure:

# modules/monitoring/main.tf
 
resource "google_billing_budget" "gemini_api_budget" {
  billing_account = var.billing_account_id
  display_name    = "Gemini API Budget (${var.environment})"
 
  budget_filter {
    projects        = ["projects/${var.project_number}"]
    services        = ["services/6F81-5844-456A"]  # NOTE: this ID is Compute Engine's; substitute the Vertex AI / Gemini API service ID from the Cloud Billing catalog
    calendar_period = "MONTH"
  }
 
  amount {
    specified_amount {
      currency_code = "USD"
      units         = var.environment == "prod" ? "500" : "50"
    }
  }
 
  # Alert at 70% and 90% of actual spend, and when forecasted spend reaches 100%
  threshold_rules {
    threshold_percent = 0.7
    spend_basis       = "CURRENT_SPEND"
  }
  threshold_rules {
    threshold_percent = 0.9
    spend_basis       = "CURRENT_SPEND"
  }
  threshold_rules {
    threshold_percent = 1.0
    spend_basis       = "FORECASTED_SPEND"
  }
 
  all_updates_rule {
    pubsub_topic                     = google_pubsub_topic.budget_alerts.id
    monitoring_notification_channels = [google_monitoring_notification_channel.email.name]
  }
}
 
# Alert when p99 latency exceeds 10 seconds
resource "google_monitoring_alert_policy" "high_latency" {
  display_name = "Gemini App High Latency (${var.environment})"
  combiner     = "OR"
 
  conditions {
    display_name = "Request latency p99 > 10s"
    condition_threshold {
      filter          = "resource.type=\"cloud_run_revision\" AND metric.type=\"run.googleapis.com/request_latencies\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 10000  # 10 seconds in milliseconds
 
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_PERCENTILE_99"
      }
    }
  }
 
  notification_channels = [google_monitoring_notification_channel.email.name]
}
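
The budget and alert policy above reference a Pub/Sub topic and an email notification channel that the listing does not define. A minimal sketch of both follows. The alert_email variable, the topic name, and the display name are assumptions for illustration.

```hcl
# modules/monitoring/notifications.tf (hypothetical file name)
variable "alert_email" {
  description = "Address that receives budget and latency alerts (assumed input)"
  type        = string
}

# Topic that receives budget notifications for programmatic handling
resource "google_pubsub_topic" "budget_alerts" {
  name    = "gemini-budget-alerts-${var.environment}"
  project = var.project_id
}

# Email channel shared by the budget and the latency alert policy
resource "google_monitoring_notification_channel" "email" {
  project      = var.project_id
  display_name = "Gemini App Alerts (${var.environment})"
  type         = "email"

  labels = {
    email_address = var.alert_email
  }
}
```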

Step 7: GitHub Actions CI/CD for Terraform

The goal is to move infrastructure changes through the same review process as application code — no more manual terraform apply in production:

# .github/workflows/terraform.yml
name: Terraform CI/CD
 
on:
  push:
    branches: [main]
    paths: ['gemini-ai-infra/**']
  pull_request:
    paths: ['gemini-ai-infra/**']
 
env:
  TF_VERSION: "1.7.5"
  WORKING_DIR: ./gemini-ai-infra
 
jobs:
  terraform-check:
    name: Validate & Plan
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      id-token: write  # Required for Workload Identity Federation
 
    steps:
      - uses: actions/checkout@v4
 
      # Keyless auth via Workload Identity Federation (no SA key files)
      - id: auth
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
          service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}
 
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
 
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
        working-directory: ${{ env.WORKING_DIR }}
 
      - name: Terraform Init
        run: terraform init
        working-directory: ${{ env.WORKING_DIR }}
 
      - name: Terraform Validate
        run: terraform validate
        working-directory: ${{ env.WORKING_DIR }}
 
      - name: Terraform Plan (Staging)
        id: plan
        env:
          # Passed as an environment variable so the key never appears on the command line
          TF_VAR_gemini_api_key: ${{ secrets.GEMINI_API_KEY_STAGING }}
        run: |
          terraform workspace select -or-create staging
          terraform plan \
            -var-file="environments/staging/terraform.tfvars" \
            -no-color -out=tfplan
        working-directory: ${{ env.WORKING_DIR }}
 
      - name: Post Plan to PR
        uses: actions/github-script@v7
        if: github.event_name == 'pull_request'
        env:
          # Read the plan from the environment; interpolating it into the script breaks on backticks
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            const output = `#### Terraform Plan 📋\n\`\`\`\n${process.env.PLAN}\n\`\`\`\n*Triggered by @${{ github.actor }}*`;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });
 
  terraform-apply:
    name: Apply to Production
    runs-on: ubuntu-latest
    needs: terraform-check
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: production  # Requires manual approval in GitHub Environments
    permissions:
      contents: read
      id-token: write  # Required for Workload Identity Federation
 
    steps:
      - uses: actions/checkout@v4
 
      - id: auth
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
          service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}
 
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
 
      - name: Apply to Production
        env:
          # Passed as an environment variable so the key never appears on the command line
          TF_VAR_gemini_api_key: ${{ secrets.GEMINI_API_KEY_PROD }}
        run: |
          terraform init
          terraform workspace select prod
          terraform apply \
            -var-file="environments/prod/terraform.tfvars" \
            -auto-approve
        working-directory: ${{ env.WORKING_DIR }}

Workload Identity Federation (WIF) eliminates the need to store service account key files in GitHub Secrets. Instead, GitHub Actions' OIDC token is exchanged for a short-lived Google Cloud access token, dramatically reducing the blast radius of any secret compromise.
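
The federation setup itself can also live in Terraform. The sketch below assumes a single GitHub repository is allowed to deploy and that a deployment service account already exists. The variable names, the pool and provider IDs, and the wif.tf file name are hypothetical.

```hcl
# wif.tf (hypothetical file name)
variable "github_repository" {
  description = "GitHub repository allowed to deploy, as owner/name (assumed input)"
  type        = string
}

variable "deployer_service_account_id" {
  description = "Deployment SA resource name: projects/PROJECT/serviceAccounts/EMAIL (assumed input)"
  type        = string
}

resource "google_iam_workload_identity_pool" "github" {
  project                   = var.project_id
  workload_identity_pool_id = "github-actions"
  display_name              = "GitHub Actions"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  project                            = var.project_id
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-oidc"

  # Map claims from the GitHub OIDC token to Google attributes
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }

  # Reject tokens issued for any other repository
  attribute_condition = "assertion.repository == \"${var.github_repository}\""

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

# Let workflows from that repository impersonate the deployment SA
resource "google_service_account_iam_member" "github_impersonation" {
  service_account_id = var.deployer_service_account_id
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/${var.github_repository}"
}

# Value to store as the WIF_PROVIDER secret in GitHub
output "wif_provider" {
  value = google_iam_workload_identity_pool_provider.github.name
}
```

The attribute_condition is the important part: without it, a token from any GitHub repository could be exchanged through this provider.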

Step 8: Team Collaboration Best Practices

A few guidelines that prevent the most common Terraform team problems:

First, never run terraform apply locally against production. The CI/CD pipeline should be the only path to apply production changes. Use environment: production in GitHub Actions to require manual approval from a designated reviewer.

Second, lock down terraform state commands. Direct state manipulation is a footgun. Restrict it to break-glass procedures with a documented runbook, and always take a state backup before any manual state operations.

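Third, never commit secrets in tfvars files. The environments/*/terraform.tfvars files in this guide hold only non-sensitive values and are committed so the CI/CD pipeline can read them; add any local or secret-bearing variable files to .gitignore. Pass sensitive values (like gemini_api_key) through CI/CD environment variables or Terraform Cloud Variable Sets, never through committed files.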
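
For the common case of adopting a resource that already exists, a declarative import block (Terraform 1.5+) goes through the normal plan and review flow and so avoids ad hoc state commands. This is a sketch assuming the root module calls the secrets module as module.secrets and that a secret with this ID already exists in the example prod project; the imports.tf file name is hypothetical.

```hcl
# imports.tf (root module; remove once the import has been applied)
import {
  to = module.secrets.google_secret_manager_secret.gemini_api_key
  id = "projects/my-project-prod/secrets/gemini-api-key-prod"
}
```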

Common Errors and How to Fix Them

Error: Error creating Service: googleapi: Error 403

The deploying service account is missing required permissions. Check that roles/run.admin and roles/iam.serviceAccountUser are granted:

gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:YOUR_SA_EMAIL" \
  --format="table(bindings.role)"

Error: context deadline exceeded during Cloud Run deploy

The provider's operation timeout for this resource (the google_cloud_run_v2_service documentation lists 20 minutes for create and update) can be too short for large container images or slow-starting revisions. Extend it:

resource "google_cloud_run_v2_service" "gemini_app" {
  # ...
  timeouts {
    create = "30m"
    update = "30m"
  }
}

Error: Secret "gemini-api-key-prod" not found

Either the secret doesn't exist yet, or the runtime service account lacks roles/secretmanager.secretAccessor on that specific secret. Verify the IAM binding in google_secret_manager_secret_iam_member and confirm the secret version is in ENABLED state via the GCP Console.

Looking back

Bringing Terraform into your Gemini API development workflow transforms infrastructure management from a manual, error-prone process into a reviewable, auditable, and reproducible one. The patterns in this guide — isolated service accounts, Secret Manager integration, per-environment Cloud Run tuning, budget alerts, and GitHub Actions CI/CD — give you a solid foundation you can adapt as your AI application grows.

For a deep dive into Cloud Run deployment patterns for Gemini API backends, see Gemini 3.1 Pro × Cloud Run: Production Serverless AI API Guide. For securing your production environment beyond infrastructure, check out Gemini API Production Security Complete Guide.
