I run Gemini behind a wallpaper app I maintain on my own, and 429 RESOURCE_EXHAUSTED is not a rare error there. The problem was that for a long time I didn't notice there are two kinds. One is a transient rate limit: you sent too much in the same second, and waiting a few hundred milliseconds clears it. The other is exhaustion: the project has spent its budget for the month, and no amount of waiting or retrying will get you through until the calendar flips.
Handling both with the same exponential backoff means the retry layer quietly thrashes on the second case. With up to seven retries per request, the app sends an exhausted project as many as eight attempts for each request, every one of them doomed, and to the user it just looks like an app that's mysteriously slow to load. For an ad-supported free app, that latency turns straight into churn.
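To put a rough number on that latency, here is a minimal sketch of the total sleep time when every retry fails. The base delay, multiplier, and cap are assumed values chosen for illustration; they are not settings from my app or defaults of any SDK.

```python
def worst_case_wait_s(retries: int, base_s: float = 0.5,
                      factor: float = 2.0, cap_s: float = 30.0) -> float:
    """Total sleep time (no jitter) if every one of `retries` retries fails."""
    # Delay before retry i is base_s * factor**i, clipped to cap_s.
    return sum(min(cap_s, base_s * factor ** i) for i in range(retries))


def attempts(retries: int) -> int:
    """One initial call plus every retry."""
    return 1 + retries


print(attempts(7))           # 8
print(worst_case_wait_s(7))  # 61.5  (0.5 + 1 + 2 + 4 + 8 + 16 + 30)
```

Under those assumed numbers, a single request against an exhausted project keeps the user waiting about a minute before it gives up.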
On June 26, 2026, Project Spend Caps became generally available, letting you set a per-project monthly dollar ceiling. It's a welcome way to cap costs structurally — but it also reliably raises the odds of hitting a "you're over the cap" 429 in production. Which means a design that retries every 429 uniformly is exactly the thing worth revisiting right now. Separating projects at the structural level is covered in splitting Spend Cap blast radius by tier; this article focuses on degrading at request time.
Split 429 into "wait and it clears" and "waiting won't help"
The first move is to classify the 429 before it ever reaches the retry layer. There are three signals to lean on.
The first is google.rpc.RetryInfo in the error response. When the server explicitly says "you may retry after this delay," it includes a retryDelay field. A 429 carrying that is, by design, a rate limit you're allowed to retry.
The second is the QuotaFailure detail, which tells you which quota dimension you tripped (requests-per-minute, tokens-per-minute, and so on). A per-second or per-minute quota recovers if you wait; a daily or monthly ceiling operates on a completely different time scale.
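As a rough picture of where those first two signals sit, here is a hand-written payload in the shape the paragraphs above describe, plus a small lookup helper. The message text and the quota ID are made up for illustration, and `details_by_type` is a hypothetical helper name; a real response may carry more entries or different field values.

```python
# Illustrative error body only: values are invented, not captured from the API.
sample_429 = {
    "error": {
        "code": 429,
        "status": "RESOURCE_EXHAUSTED",
        "message": "Quota exceeded (illustrative text).",
        "details": [
            {
                "@type": "type.googleapis.com/google.rpc.QuotaFailure",
                "violations": [{"quotaId": "ExampleRequestsPerMinute"}],
            },
            {
                "@type": "type.googleapis.com/google.rpc.RetryInfo",
                "retryDelay": "5s",
            },
        ],
    }
}


def details_by_type(payload: dict) -> dict[str, dict]:
    """Index error.details entries by the short name at the end of @type."""
    found = {}
    for entry in payload.get("error", {}).get("details", []):
        short_name = entry.get("@type", "").rsplit(".", 1)[-1]
        found[short_name] = entry
    return found


found = details_by_type(sample_429)
print(found["RetryInfo"]["retryDelay"])                   # 5s
print(found["QuotaFailure"]["violations"][0]["quotaId"])  # ExampleRequestsPerMinute
```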
The third — and the most important — is information only you hold: your own monthly spend gate. Trying to determine "did I hit the Spend Cap?" purely from the API's error body produces a brittle implementation that depends on the fine shape of the error. Instead, keep a rough running total of "how much have I spent this month?" on your side and make that the primary axis of classification. Treat the API details as a supporting signal only.
| Signal | Meaning | Retry decision |
|---|---|---|
| RetryInfo.retryDelay present | Server expects recovery after a stated wait | Retryable (wait the stated seconds) |
| QuotaFailure is a per-minute quota | RPM/TPM exceeded; recovers soon | Retryable (backoff) |
| Your monthly spend gate is over the line | Likely out of budget for the month | Not retryable (degrade) |
| No RetryInfo, repeated unexplained exhaustion | Undeterminable but not recovering | Trip the breaker conservatively |
The design rule here is: when in doubt, don't call. Every retry costs time and a slice of the latency budget, and against an exhausted project it buys nothing in return.
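For the last row of the table, "trip the breaker conservatively" could be sketched as a small counter-based circuit breaker. The class name `QuotaBreaker`, the threshold of three, and the five-minute cooldown are all assumptions for illustration, not values I have tuned.

```python
import time


class QuotaBreaker:
    """Stops calling after repeated unexplained 429s, then probes after a cooldown."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 300.0,
                 clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def allow(self) -> bool:
        """Return True if a call may go out right now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_s:
            # Half-open: let one probe through; a single failure re-opens.
            self._opened_at = None
            self._failures = self.threshold - 1
            return True
        return False

    def record_unknown_429(self) -> None:
        """Count an unexplained exhaustion; open once the threshold is hit."""
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()

    def record_success(self) -> None:
        """Any successful call closes the breaker and clears the count."""
        self._failures = 0
        self._opened_at = None
```

While the breaker is open, the caller skips the API entirely and serves the degraded path, which is the "don't call" half of the rule.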
Implementing the classifier
Using Gemini's Python SDK (google-genai), here's a classifier that reads those signals off the exception. Exception attribute names drift between SDK versions, so the trick is to extract things defensively rather than depend on one specific attribute.
```python
# pip install google-genai
from dataclasses import dataclass
from enum import Enum
import json
import re


class Verdict(Enum):
    RETRYABLE = "retryable"  # clears with a wait (backoff OK)
    TERMINAL = "terminal"    # pointless this month (degrade)
    UNKNOWN = "unknown"      # undeterminable (trip conservatively)


@dataclass
class Classification:
    verdict: Verdict
    retry_after_s: float | None  # server-stated wait, if any
    reason: str


def _extract_details(err) -> dict:
    """Pull structured details off the exception, absorbing SDK differences."""
    # google-genai's APIError often carries .code / .status / .details,
    # but versions vary, so probe with getattr and fall back to the string body.
    payload = {}
    for attr in ("details", "response_json", "args"):
        val = getattr(err, attr, None)
        if isinstance(val, dict):
            payload = val
            break
        if isinstance(val, (list, tuple)) and val and isinstance(val[0], dict):
            payload = val[0]
            break
    if not payload:
        # last resort: scrape a JSON fragment out of the stringified body
        text = str(getattr(err, "message", "") or err)
        m = re.search(r"\{.*\}", text, re.DOTALL)
        if m:
            try:
                payload = json.loads(m.group(0))
            except json.JSONDecodeError:
                payload = {}
    return payload


def _retry_delay_seconds(details: dict) -> float | None:
    """Convert google.rpc.RetryInfo retryDelay (e.g. "5s") into seconds."""
    error = details.get("error", details)
    for d in error.get("details", []):
        t = d.get("@type", "")
        if "RetryInfo" in t:
            raw = d.get("retryDelay", "")
            m = re.match(r"(\d+(?:\.\d+)?)s", str(raw))
            if m:
                return float(m.group(1))
    return None


def _quota_dimension(details: dict) -> str | None:
    """Read the quota ID from QuotaFailure (a hint for per-minute vs not)."""
    error = details.get("error", details)
    for d in error.get("details", []):
        if "QuotaFailure" in d.get("@type", ""):
            for v in d.get("violations", []):
                qid = v.get("quotaId") or v.get("subject") or ""
                if qid:
                    return qid
    return None


def classify_429(err, monthly_budget_exhausted: bool) -> Classification:
    """Classify a 429 into three buckets. monthly_budget_exhausted comes
    from your own spend gate."""
    details = _extract_details(err)
    delay = _retry_delay_seconds(details)
    qid = _quota_dimension(details) or ""

    # If your own gate says "done for the month," trust that first.
    if monthly_budget_exhausted:
        return Classification(Verdict.TERMINAL, None, "monthly spend gate exhausted")

    # Server stated a wait -> rate limit. Just wait.
    if delay is not None:
        return Classification(Verdict.RETRYABLE, delay, f"server RetryInfo={delay}s")

    # Hit a per-minute quota (PerMinute, etc.) -> recovers with a wait
    if re.search(r"(per[-_ ]?minute|PerMinute|RPM|TPM)", qid, re.IGNORECASE):
        return Classification(Verdict.RETRYABLE, None, f"per-minute quota: {qid}")

    # Daily/monthly/project exhaustion -> waiting generally won't fix it
    if re.search(r"(per[-_ ]?day|PerDay|monthly|project)", qid, re.IGNORECASE):
        return Classification(Verdict.TERMINAL, None, f"long-window quota: {qid}")

    # Exhaustion with no RetryInfo and no readable dimension -> undeterminable
    return Classification(Verdict.UNKNOWN, None, "no RetryInfo, unknown quota")
```

The key is that monthly_budget_exhausted, your own boolean, is trusted above everything else. That value isn't a guess; it's a fact grounded in your own records. The API's error shape may change in the future, but the verdict "my estimated spend this month hit the ceiling" is owned by your code. Robustness in the Spend Cap era starts with not delegating that judgment to the server.
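One way to act on the verdict at the call site is sketched below. It is an illustration under assumptions, not the app's actual retry layer: `call_with_degrade`, `fallback`, and the default retry count and base delay are hypothetical, and the classifier is passed in as a plain callable so the block stands alone. It only relies on the result having `.verdict.value` and `.retry_after_s`, as Classification does.

```python
import random
import time
from typing import Any, Callable


def call_with_degrade(
    call: Callable[[], Any],
    classify: Callable[[Exception], Any],
    fallback: Callable[[], Any],
    max_retries: int = 3,
    base_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Retry only RETRYABLE verdicts; degrade on TERMINAL or UNKNOWN."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as err:  # a real caller would catch only the SDK's 429 error
            verdict = classify(err)
            out_of_retries = attempt == max_retries
            if verdict.verdict.value != "retryable" or out_of_retries:
                # "When in doubt, don't call": serve the degraded result instead.
                return fallback()
            # Prefer the server-stated wait; otherwise use exponential backoff.
            delay = verdict.retry_after_s
            if delay is None:
                delay = base_s * 2 ** attempt
            sleep(delay + random.uniform(0, 0.1 * delay))  # small jitter
    return fallback()  # only reached if max_retries is negative
```

In use, `classify` would be something like `lambda err: classify_429(err, monthly_budget_exhausted=flag)`, with `flag` read from your own spend gate, and `fallback` would return whatever the app can show without the model, such as a cached result.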