GEMINI LAB JP
MODEL: Gemini 3.5 Flash is now generally available, beating 3.1 Pro on nearly all benchmarks while running 4x faster
AGENTS: Managed Agents arrive in the Gemini API in public preview, running autonomous agents in isolated Google-hosted Linux sandboxes
SEARCH: File Search adds multimodal search, natively embedding and searching images via gemini-embedding-2
API: Event-driven webhooks now replace polling for the Batch API and long-running operations
STUDIO: Google AI Studio builds Android apps from plain language and generates images on the fly with Nano Banana
MIGRATION: Gemini CLI reaches end-of-life on June 18; migrate to the Agentic 2.0 CLI (two image-preview models retire June 25)
TAG

p95

1 article
Related: Gemini API (1), Tail Latency (1), SLO (1), Streaming (1)
Gemini API / 2026-06-23 / Advanced

Your Gemini API Average Latency Looks Great — But Some Users Still Get Stuck. Defending p95/p99

Your average TTFT is fast, yet a fraction of users keep hitting frozen responses. That is a tail-latency problem (p95/p99). From measurement to model routing, streaming budgets, cache accounting, and retry design — here are the defenses that actually held up in production, with code.
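
To see how a healthy average can hide a bad tail, here is a minimal sketch. The TTFT samples are invented for illustration (they are not measurements from the article), and the nearest-rank percentile used here is only one of several common definitions.

```python
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]

# Hypothetical TTFT samples in milliseconds:
# 94 fast requests and 6 that stalled for several seconds.
ttft_ms = [300] * 94 + [5000] * 6

mean_ms = statistics.mean(ttft_ms)
p50_ms = percentile(ttft_ms, 50)
p95_ms = percentile(ttft_ms, 95)
p99_ms = percentile(ttft_ms, 99)

print(f"mean={mean_ms} ms  p50={p50_ms} ms  p95={p95_ms} ms  p99={p99_ms} ms")
```

With these made-up numbers the mean stays under 600 ms while p95 and p99 both sit at 5 seconds, which is the gap between "the dashboard looks fine" and "some users are stuck".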