Related:
◈ Gemini API / 2026-06-23 / Advanced
Your Gemini API Average Latency Looks Great — But Some Users Still Get Stuck. Defending p95/p99
Your average TTFT is fast, yet a fraction of users keep hitting frozen responses. That is a tail-latency problem (p95/p99). From measurement to model routing, streaming budgets, cache accounting, and retry design — here are the defenses that actually held up in production, with code.
◈ Gemini API / 2026-05-28 / Advanced
Running an SLO and Error Budget for the Gemini API as an Indie Developer — Guarding Four Sites with Burn-Rate Monitoring
Notes from running the Gemini API inside four production sites as an indie developer. A practical SLO and Error Budget design that fits a single-person operation: Cloudflare Workers and KV for burn-rate calculation, simplified multi-window alerts, and decision rules for what to freeze when the budget runs out.