API / SDK · 2026-06-26 · Advanced

When your Gemini API spend cap trips, paying users go down too — isolating the blast radius with per-tier projects

A Project Spend Cap stops the entire project at once. To keep a runaway free tier from taking paying users down with it, this is a design note on isolating the cap's blast radius across per-tier projects and closing the ~10-minute delay with an application-side soft budget gate.



It happened on a weekend when free users surged all at once. In the backend of an app I run on my own, the Gemini API cost was quietly drifting off my projected curve. I had set a Project Spend Cap, so if it hit the ceiling it would stop there — and for a moment, that thought reassured me.

But when I looked at it again with a cooler head, I saw that this "stop" does not stop only free users. The paid users' calls, running on the same project's key, stop along with them. The very people I wanted to protect would be caught in the blast. A design meant to set a ceiling was one step away from cutting the path that mattered most.

A small cold feeling settled in my chest. A safety net, strung up with good intentions, becomes a blade when it catches the wrong way. This is a design note on placing Project Spend Caps correctly as a last-resort safety net, splitting their blast radius between paid and free, and degrading in stages before the hard block.

A Project Spend Cap stops "all of the project"

First, let's get the feature's outline precise. Project Spend Caps launched in AI Studio on March 16, 2026, as a per-project monthly dollar limit. Configuration is entirely console-side: in AI Studio, select the target project, open "Spend" in the sidebar, and under "Monthly spend cap" click "Edit spend cap" to enter the amount.

The official starting points, organized by use case, look like this.

Use case	Recommended starting monthly cap
Personal experimentation	$10
Prototype	$50
Small production	$200
Growing app	$500

Separately, billing-account-level tier caps took effect on April 1, 2026: $250 for Tier 1, $2,000 for Tier 2, and $20,000 or more for Tier 3. It's easiest to think of Project Spend Caps as a way to cut a finer ceiling per project, inside that account-level cap.
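Because project caps sit inside the account-level cap, it can be worth checking at deploy time that your per-project caps don't add up to more than your tier allows. If they do, the account-level cap may trip first and stop every project together. A minimal sketch, assuming the tier limits quoted above; checkCapsFitTier is a hypothetical helper that only does arithmetic and calls no Google API.

```typescript
// Account-level tier caps quoted in the text (USD/month).
// Tier 3 is "$20,000 or more"; 20_000 is used as a conservative lower bound.
type BillingTier = "tier1" | "tier2" | "tier3";

const TIER_CAP_USD: Record<BillingTier, number> = {
  tier1: 250,
  tier2: 2_000,
  tier3: 20_000,
};

interface CapCheck {
  totalProjectCapsUsd: number;
  tierCapUsd: number;
  headroomUsd: number; // negative: the account cap can trip before the project caps do
  ok: boolean;
}

// Hypothetical helper: sums the per-project caps and compares them with the account tier cap.
export function checkCapsFitTier(
  tier: BillingTier,
  projectCaps: Record<string, number>,
): CapCheck {
  const totalProjectCapsUsd = Object.values(projectCaps).reduce((a, b) => a + b, 0);
  const tierCapUsd = TIER_CAP_USD[tier];
  const headroomUsd = tierCapUsd - totalProjectCapsUsd;
  return { totalProjectCapsUsd, tierCapUsd, headroomUsd, ok: headroomUsd >= 0 };
}
```

With the $30 free / $300 paid split used later in this article, a Tier 1 account fails this check ($330 against a $250 ceiling), so keeping the two projects fully independent would need Tier 2 or smaller per-project caps.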

The behavior is the crux. When a project reaches its cap, API requests from that project are blocked until the next billing cycle begins or you raise the cap. There is roughly a 10-minute delay before it takes effect, and any overage incurred during that window is on you.

That's the extent of what's documented. The primary sources are the official announcement on controlling Gemini API costs and the Billing documentation. The problem is what isn't written there: what gets dragged down with you after the cap trips.

One project, one cap is dangerous because the blast radius is too wide

Most solo projects place a single API key in a single GCP project and serve free and paid users from the same key. It's simple, and at first there's no problem at all.

Apply one Project Spend Cap to that setup, and the cap acts on the project's total spend. So even if the cause of hitting the cap is a flood of free-user traffic, what stops is every request in the project. The calls of paid users — the ones supporting you with a few hundred yen of tips or subscriptions a month — start returning 4xx/5xx at the same instant.

In availability terms, this is a blast radius that's too wide. The runaway cost source is the free tier, yet the damage reaches the paid tier. The revenue path you want to protect and the risk source bleeding cost share a fate under the same ceiling.
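To make the shared fate concrete, here is a toy model, not a billing simulator: each request either adds its cost to its project or is rejected once that project's spend has reached its cap. The caps, costs, and traffic mix are made-up numbers, and the ~10-minute enforcement delay is ignored.

```typescript
type Tier = "free" | "paid";

interface Req { tier: Tier; costUsd: number; }

// Toy model: a project rejects requests once its accumulated spend reaches its cap.
// Ignores the enforcement delay; costs and caps are illustrative only.
function simulate(
  requests: Req[],
  projectOf: (tier: Tier) => string,
  caps: Record<string, number>,
): Record<Tier, { served: number; rejected: number }> {
  const spent: Record<string, number> = {};
  const out: Record<Tier, { served: number; rejected: number }> = {
    free: { served: 0, rejected: 0 },
    paid: { served: 0, rejected: 0 },
  };
  for (const r of requests) {
    const p = projectOf(r.tier);
    const s = spent[p] ?? 0;
    if (s >= caps[p]) {
      out[r.tier].rejected++;
      continue;
    }
    spent[p] = s + r.costUsd;
    out[r.tier].served++;
  }
  return out;
}

// A free-tier surge followed by some paid traffic.
const traffic: Req[] = [
  ...Array.from({ length: 60 }, (): Req => ({ tier: "free", costUsd: 1 })),
  ...Array.from({ length: 10 }, (): Req => ({ tier: "paid", costUsd: 1 })),
];

// One project, one $50 cap: the free surge exhausts it, so paid requests are rejected too.
const shared = simulate(traffic, () => "myapp", { myapp: 50 });
// Split projects: the surge only exhausts the free project's cap.
const split = simulate(traffic, (t) => `myapp-${t}`, { "myapp-free": 30, "myapp-paid": 300 });
```

In the shared run every paid request is rejected; in the split run the free project absorbs the surge and paid traffic is untouched.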

As an indie developer running several solo projects in parallel myself, I felt this one personally. Stopping the spending is correct in itself. But unless you design for whom you stop and whom you keep running, the safety net drops the very people who matter most.


Split projects per tier, split the ceiling

The fix is simple. Separate the runaway cost source from the revenue path you want to protect into different GCP projects.

Concretely, split a free-tier project and a paid-tier project, and give each an independent API key and an independent Project Spend Cap. Set a low cap on the free tier (say $30) and a roomier cap on the paid tier (say $300). Now, even if free-user abuse drives the free project into its ceiling, the paid project keeps running unscathed. The blast radius is contained inside the tier.

The application side only needs to choose which key to use based on the user's tier.

// project-router.ts
// Route by user tier to keys from independent GCP projects.
// Give each project its own Project Spend Cap.
 
// Exported because budget-gate.ts imports this type.
export type Tier = "free" | "paid";

export interface ProjectBinding {
  apiKey: string;        // API key issued per project
  projectId: string;     // label for observation / reconciliation
  softBudgetUsd: number; // soft budget (below); set lower than the platform cap
}
 
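// Fail fast at startup if a key is missing, instead of sending requests with an undefined key.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing environment variable ${name}`);
  return value;
}

const PROJECTS: Record<Tier, ProjectBinding> = {
  free: {
    apiKey: requireEnv("GEMINI_KEY_FREE"),
    projectId: "myapp-free",
    softBudgetUsd: 25, // throttle at $25, ahead of the $30 hard cap
  },
  paid: {
    apiKey: requireEnv("GEMINI_KEY_PAID"),
    projectId: "myapp-paid",
    softBudgetUsd: 270, // go on alert at $270, ahead of the $300 hard cap
  },
};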
 
export function resolveProject(tier: Tier): ProjectBinding {
  return PROJECTS[tier];
}

One operational note here. Splitting projects lets you read free and paid costs as independent invoices. As a byproduct, you get an accurate monthly picture of which tier is eating your margin. Beyond the cap itself, it works as an instrument for business decisions.
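Once each tier has its own invoice, the margin readout is simple arithmetic. A minimal sketch; tierMargin is a hypothetical helper, and the revenue and cost inputs are placeholders you would pull from your payment provider and from each project's billing.

```typescript
interface TierMonth {
  tier: string;
  revenueUsd: number;    // revenue attributed to this tier (placeholder source)
  geminiCostUsd: number; // this tier's project billing for the month
  activeUsers: number;
}

interface TierMarginReport {
  tier: string;
  marginUsd: number;
  costPerUserUsd: number;
}

// Hypothetical helper: per-tier margin and Gemini cost per active user.
export function tierMargin(rows: TierMonth[]): TierMarginReport[] {
  return rows.map((r) => ({
    tier: r.tier,
    marginUsd: r.revenueUsd - r.geminiCostUsd,
    costPerUserUsd: r.activeUsers > 0 ? r.geminiCostUsd / r.activeUsers : 0,
  }));
}
```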

A "soft budget gate" to close the ~10-minute delay

Even after splitting projects, there's still a hole. Project Spend Caps take roughly 10 minutes to take effect, and the overage during that window is on you. In other words, the hard cap is not a real-time control device but a safety net that engages late. When calls arrive many times per second, 10 minutes is enough to overshoot the cap badly.
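A back-of-the-envelope way to size that exposure: worst-case overage is roughly your spend rate multiplied by the enforcement delay. A minimal sketch assuming a constant request rate over the window; the traffic figures are made up, and $0.008 per request corresponds to about 10,000 input and 2,000 output tokens at the example gemini-3.5-flash prices in cost.ts below.

```typescript
// Worst-case spend that can land after the cap is reached, before enforcement kicks in.
// Assumes a constant request rate across the delay window (a simplification).
export function worstCaseOverageUsd(
  requestsPerSecond: number,
  avgCostPerRequestUsd: number,
  delayMinutes = 10, // the roughly 10-minute delay described above
): number {
  return requestsPerSecond * avgCostPerRequestUsd * delayMinutes * 60;
}

// 20 req/s at about $0.008 per request: 20 * 0.008 * 600 ≈ $96 past the cap.
const overage = worstCaseOverageUsd(20, 0.008);
```

Against a $30 free-tier cap, an overshoot of that size would be more than three times the cap itself.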

So place an application-side soft budget gate that fires before you reach the platform cap. The idea: right after each call, compute an approximate cost from usageMetadata and add it to a per-project month-to-date counter. As the total approaches the soft budget, degrade in stages instead of hard-blocking: fall back from a higher-end model to a cheaper one, serve a cached response, or politely ask the user to wait.

First, the part that derives an approximate cost from usageMetadata. Since pricing varies by model and region, keep the price table externalized as config.

// cost.ts
// Unit price per 1M tokens (USD). Update to match current pricing.
interface ModelPrice { inPerM: number; outPerM: number; }
 
const PRICES: Record<string, ModelPrice> = {
  "gemini-3.5-flash": { inPerM: 0.30, outPerM: 2.50 },
  "gemini-3-flash":   { inPerM: 0.15, outPerM: 0.60 },
  // Always confirm and update against the official Billing docs
};
 
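interface UsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  thoughtsTokenCount?: number; // thinking tokens; billed at the output rate on thinking models
}
 
export function estimateCostUsd(model: string, usage: UsageMetadata): number {
  const p = PRICES[model];
  // Unknown model: returning 0 silently undercounts spend.
  // In production, prefer raising a separate alert here (or throwing).
  if (!p) return 0;
  const inTok = usage.promptTokenCount ?? 0;
  // Output cost covers the visible response plus any thinking tokens.
  const outTok = (usage.candidatesTokenCount ?? 0) + (usage.thoughtsTokenCount ?? 0);
  return (inTok / 1_000_000) * p.inPerM + (outTok / 1_000_000) * p.outPerM;
}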
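The advice is to keep prices as external config, while the listing above hardcodes them for brevity. One way to externalize them is to parse a JSON document at startup and reject malformed entries, so a typo can't silently turn a model into a free one. A sketch; the JSON shape mirrors ModelPrice, parsePriceTable is a hypothetical helper, and where the JSON comes from (file, KV, environment variable) is up to you.

```typescript
interface ModelPrice { inPerM: number; outPerM: number; }

// Parses a table like {"gemini-3.5-flash": {"inPerM": 0.30, "outPerM": 2.50}}.
// Throws on malformed input rather than letting a bad entry fall through as "free".
export function parsePriceTable(json: string): Record<string, ModelPrice> {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("price table must be a JSON object");
  }
  const table: Record<string, ModelPrice> = {};
  for (const [model, entry] of Object.entries(raw as Record<string, unknown>)) {
    const { inPerM, outPerM } = (entry ?? {}) as { inPerM?: unknown; outPerM?: unknown };
    if (
      typeof inPerM !== "number" || typeof outPerM !== "number" ||
      !Number.isFinite(inPerM) || !Number.isFinite(outPerM) ||
      inPerM < 0 || outPerM < 0
    ) {
      throw new Error(`invalid price entry for ${model}`);
    }
    table[model] = { inPerM, outPerM };
  }
  return table;
}
```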

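Next, the gate that accumulates the month-to-date total and compares it against the soft budget. It is written against an external counter such as Cloudflare KV, Redis, or a database; the requirement is that the month-to-date estimate be shared by every instance serving the project. One caveat: Cloudflare KV is eventually consistent and has no atomic increment, so an add() implemented as read-then-write can lose updates under concurrent traffic. An atomic increment (Redis INCRBYFLOAT, a Durable Object, or a transactional DB update) keeps the running total honest.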

// budget-gate.ts
import { resolveProject, type Tier } from "./project-router";
import { estimateCostUsd } from "./cost";
 
interface Counter {
  get(key: string): Promise<number>;
  add(key: string, delta: number): Promise<void>;
}
 
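// Month bucket in UTC. Assumption: align this with the timezone of your billing cycle,
// otherwise the counter rolls over a few hours off from the real billing month.
function monthKey(projectId: string): string {
  const now = new Date();
  const ym = `${now.getUTCFullYear()}-${String(now.getUTCMonth() + 1).padStart(2, "0")}`;
  return `spend:${projectId}:${ym}`;
}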
 
export type Decision =
  | { action: "proceed"; model: string }
  | { action: "degrade"; model: string } // downgrade to a lower model
  | { action: "serve_cache" }            // switch to a cached response
  | { action: "soft_block" };            // politely ask the user to wait
 
export async function decideBeforeCall(
  tier: Tier,
  requestedModel: string,
  counter: Counter,
): Promise<Decision> {
  const proj = resolveProject(tier);
  const spent = await counter.get(monthKey(proj.projectId));
  const ratio = spent / proj.softBudgetUsd;
 
  if (ratio < 0.8) return { action: "proceed", model: requestedModel };
  if (ratio < 0.95) return { action: "degrade", model: "gemini-3-flash" };
  if (ratio < 1.0) return { action: "serve_cache" };
  return { action: "soft_block" };
}
 
// Accumulate the approximate cost after the call
export async function recordCost(
  tier: Tier,
  model: string,
  usage: { promptTokenCount?: number; candidatesTokenCount?: number },
  counter: Counter,
): Promise<void> {
  const proj = resolveProject(tier);
  const cost = estimateCostUsd(model, usage);
  await counter.add(monthKey(proj.projectId), cost);
}
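For local tests, or a single-process deployment, the Counter interface can be satisfied by a Map. This hypothetical InMemoryCounter is not a production store: it is not shared across instances and loses its state on restart.

```typescript
interface Counter {
  get(key: string): Promise<number>;
  add(key: string, delta: number): Promise<void>;
}

// Hypothetical in-process Counter for tests. Within one JS process, add() does its
// read and write without awaiting in between, so concurrent increments don't interleave.
export class InMemoryCounter implements Counter {
  private readonly values = new Map<string, number>();

  async get(key: string): Promise<number> {
    return this.values.get(key) ?? 0;
  }

  async add(key: string, delta: number): Promise<void> {
    this.values.set(key, (this.values.get(key) ?? 0) + delta);
  }
}
```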

The advantage of this gate is that it lets you escape toward "lowering experience quality a little" rather than a hard 4xx. You answer free users with a lower model or cache while bending the slope of spend down yourself, ahead of the cap. Rather than fearing the ~10-minute delay, you actively throttle the month's consumption down to a pace where that delay no longer matters.
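Putting the halves together, each model call is bracketed by a decision before (decideBeforeCall) and a cost record after (recordCost). A sketch with those pieces injected as plain functions so the shape is visible without any SDK; guardedCall is hypothetical, callModel stands in for whatever client you use and is assumed to return the text plus the response's usage metadata, and the cache lookup is a placeholder.

```typescript
type Decision =
  | { action: "proceed"; model: string }
  | { action: "degrade"; model: string }
  | { action: "serve_cache" }
  | { action: "soft_block" };

interface Usage { promptTokenCount?: number; candidatesTokenCount?: number; }

interface Deps {
  decide: () => Promise<Decision>;
  callModel: (model: string) => Promise<{ text: string; usage: Usage }>; // placeholder for your SDK call
  record: (model: string, usage: Usage) => Promise<void>;
  cached: () => Promise<string | undefined>; // placeholder cache lookup
}

type Outcome =
  | { kind: "answer"; text: string; model: string }
  | { kind: "cached"; text: string }
  | { kind: "wait"; message: string };

const WAIT_MESSAGE = "We're busy right now. Please try again a little later.";

export async function guardedCall(deps: Deps): Promise<Outcome> {
  const d = await deps.decide();
  if (d.action === "soft_block") return { kind: "wait", message: WAIT_MESSAGE };
  if (d.action === "serve_cache") {
    const hit = await deps.cached();
    // On a cache miss, ask the user to wait instead of spending on a model call.
    return hit !== undefined ? { kind: "cached", text: hit } : { kind: "wait", message: WAIT_MESSAGE };
  }
  // "proceed" and "degrade" both call the model; they differ only in which model is used.
  const { text, usage } = await deps.callModel(d.model);
  await deps.record(d.model, usage); // price the model actually used, not the one requested
  return { kind: "answer", text, model: d.model };
}
```

Recording against d.model matters: if a degraded call were priced at the requested model's rate, the counter would overstate spend and tighten the gate further than necessary.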

Reconcile against the authoritative billed value, assuming the estimate drifts

What the soft budget gate accumulates is, after all, an estimate from usageMetadata. It will always diverge from the actual bill. Discounts on cached tokens, failed requests, a stale price table — there are several reasons it drifts.
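Cached tokens are one source of drift you can partly model. In the Gemini API's usageMetadata, cachedContentTokenCount reports the cached portion of the prompt and promptTokenCount includes those cached tokens, so a cache-aware estimate prices the two parts separately. A sketch; estimateCostWithCacheUsd is a hypothetical helper, the cached-input rate is a placeholder you would take from the current pricing page, and cache storage charges are ignored.

```typescript
interface CacheAwarePrice {
  inPerM: number;       // uncached input, USD per 1M tokens
  cachedInPerM: number; // cached input, USD per 1M tokens (placeholder: take from current pricing)
  outPerM: number;      // output, USD per 1M tokens
}

interface UsageWithCache {
  promptTokenCount?: number;        // total prompt size, including cached tokens
  cachedContentTokenCount?: number; // the cached portion of the prompt
  candidatesTokenCount?: number;
}

export function estimateCostWithCacheUsd(p: CacheAwarePrice, u: UsageWithCache): number {
  const prompt = u.promptTokenCount ?? 0;
  // Clamp so an unexpected response can't produce a negative uncached count.
  const cached = Math.min(u.cachedContentTokenCount ?? 0, prompt);
  const uncached = prompt - cached;
  const out = u.candidatesTokenCount ?? 0;
  return (uncached * p.inPerM + cached * p.cachedInPerM + out * p.outPerM) / 1_000_000;
}
```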

So hold two layers: the estimated counter as "fast and real-time but approximate," and the billing and Project Spend Cap display as "lagging but authoritative," and reconcile them periodically. When the divergence exceeds a set band, treat it as a sign of a bug in the price table or the instrumentation.

// reconcile.ts
// At a cadence like once a day, reconcile the estimated total against the actual billed value.
// Pull the billed value from Cloud Billing aggregation or the AI Studio Spend display.
 
interface ReconcileInput {
  projectId: string;
  estimatedUsd: number; // your own counter's month-to-date total
  billedUsd: number;    // actual month-to-date value from billing
  hardCapUsd: number;   // the Project Spend Cap you've set
}
 
export function reconcile(input: ReconcileInput): {
  divergencePct: number;
  alerts: string[];
} {
  const { projectId, estimatedUsd, billedUsd, hardCapUsd } = input;
  const base = Math.max(billedUsd, 1e-6);
  const gapUsd = Math.abs(estimatedUsd - billedUsd);
  const divergencePct = gapUsd / base * 100;
 
  const alerts: string[] = [];
  // Early in the month both values are tiny and percentages are noisy,
  // so also require a minimum absolute gap (here $1; adjust to taste).
  if (divergencePct > 15 && gapUsd > 1) {
    alerts.push(`[${projectId}] Estimate vs. billed diverges by ${divergencePct.toFixed(1)}%: inspect price table or instrumentation`);
  }
  if (billedUsd / hardCapUsd > 0.85) {
    alerts.push(`[${projectId}] Billed value is ${(billedUsd / hardCapUsd * 100).toFixed(0)}% of the hard cap: consider adjusting the cap`);
  }
  return { divergencePct, alerts };
}

With this reconciliation in place, you notice before the soft budget gate becomes unreliable. If the estimate has quietly drifted far from the actual value, the gate fires too early or too late, needlessly throttling free users or overshooting the cap. The idea is to periodically calibrate the instrument itself.
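Because the billed value lags, comparing it with the live counter shows a gap even when the instrumentation is perfect. One way to cut those false alarms, offered as an assumption rather than a documented practice, is to snapshot the estimate periodically and compare the billed value with the snapshot taken roughly one reporting lag earlier. The lag length is something you would measure on your own account; estimateAsOf is a hypothetical helper.

```typescript
interface Snapshot { atMs: number; estimatedUsd: number; }

// Returns the latest snapshot taken at or before (nowMs - lagMs),
// or undefined if no snapshot is old enough yet.
export function estimateAsOf(
  snapshots: Snapshot[],
  nowMs: number,
  lagMs: number,
): number | undefined {
  const cutoff = nowMs - lagMs;
  let best: Snapshot | undefined;
  for (const s of snapshots) {
    if (s.atMs <= cutoff && (best === undefined || s.atMs > best.atMs)) best = s;
  }
  return best?.estimatedUsd;
}
```

Passing the lag-aligned value as estimatedUsd to reconcile compares like with like; when it returns undefined (early in the month), skipping the divergence check is the conservative choice.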

What to do — and not do — as you approach the cap

Finally, a short operational summary.

When the billed value crosses 85% of the hard cap, first isolate which tier is the cause. If you've split projects, the bill tells you immediately which ceiling you're nearing. If it's the free tier, apply degradation first: lower the soft budget gate's threshold, downgrade to an even lighter model, widen the cache range.

If the paid tier is the one nearing the cap, that's actually a good sign that the business is growing. Before rushing to block, decide to raise the Project Spend Cap. The raise takes effect immediately, but don't forget to raise the soft budget and alert thresholds to match. Move only one and the instrument's markings go out of alignment.
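One way to keep the markings aligned is to store the soft budget and alert threshold as ratios of the hard cap, and derive all three from one number whenever you change the cap in AI Studio. A sketch; planFromHardCap is hypothetical, and the default ratios match this article's examples (a $270 soft budget under a $300 paid cap, and the 85%-of-hard-cap alert).

```typescript
interface CapPlan {
  hardCapUsd: number;    // the value you set in AI Studio
  softBudgetUsd: number; // feeds the application-side gate
  alertAtUsd: number;    // billed value that triggers the cap alert
}

// Derive every threshold from the hard cap so they move together.
export function planFromHardCap(
  hardCapUsd: number,
  softRatio = 0.9,   // e.g. $270 soft budget under a $300 cap
  alertRatio = 0.85, // alert when billing crosses 85% of the hard cap
): CapPlan {
  if (!(hardCapUsd > 0)) throw new Error("hard cap must be positive");
  // Round to cents so float artifacts don't leak into config values.
  const cents = (usd: number) => Math.round(usd * 100) / 100;
  return {
    hardCapUsd,
    softBudgetUsd: cents(hardCapUsd * softRatio),
    alertAtUsd: cents(hardCapUsd * alertRatio),
  };
}
```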

What you must not do is use the hard cap as a day-to-day control device. Given the ~10-minute delay and the per-billing-cycle release, the hard cap is a final brake that stops an accident at a fixed amount — not a tool for adjusting the throttle. Leave daily adjustment to the soft budget gate, and keep the hard cap quietly in place as the ceiling for when things truly get out of hand.

As a next step, start with separating the free and paid projects. Splitting into two keys and giving each its own cap already contains the blast radius inside the tier. The soft budget gate and reconciliation can be layered on top, a little at a time. I'd be glad if this gives a handhold to any other indie developer running several solo projects in parallel.

Thank You for Reading

Gemini Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.