●MODEL: Gemini 3.5 Flash is now generally available, beating 3.1 Pro on nearly all benchmarks while running 4x faster
●AGENTS: Managed Agents arrive in the Gemini API in public preview, running autonomous agents in isolated Google-hosted Linux sandboxes
●SEARCH: File Search adds multimodal search, natively embedding and searching images via gemini-embedding-2
●API: Event-driven webhooks now replace polling for the Batch API and long-running operations
●STUDIO: Google AI Studio builds Android apps from plain language and generates images on the fly with Nano Banana
●MIGRATION: Gemini CLI reaches end-of-life on June 18; migrate to the Agentic 2.0 CLI (two image-preview models retire June 25)

TAG

p95

1 article


Gemini API (1) · Tail Latency (1) · SLO (1) · Streaming (1)

◈ Gemini API · 2026-06-23 · Advanced

Your Gemini API Average Latency Looks Great, but Some Users Still Get Stuck: Defending p95/p99

Your average time to first token (TTFT) is fast, yet a fraction of users still hit responses that appear frozen. That is a tail-latency problem, and it only shows up at p95/p99. This article walks through the defenses that actually held up in production, with code: measurement, model routing, streaming budgets, cache accounting, and retry design.
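To see why a healthy average can hide stuck users, here is a minimal sketch that compares the mean TTFT with its p50, p95, and p99. The samples are hypothetical (94 fast requests and 6 stalled ones), and the nearest-rank percentile used here is only one of several common percentile definitions; it is not necessarily the method the article itself uses.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    # Integer-friendly arithmetic keeps the rank exact for whole-number p.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical TTFT samples in milliseconds: 94 fast requests, 6 stalled ones.
ttft_ms = [300] * 94 + [8000] * 6

mean = statistics.mean(ttft_ms)
p50 = percentile(ttft_ms, 50)
p95 = percentile(ttft_ms, 95)
p99 = percentile(ttft_ms, 99)
print(f"mean={mean:.0f} p50={p50} p95={p95} p99={p99}")
# prints "mean=762 p50=300 p95=8000 p99=8000"
```

A mean of 762 ms and a median of 300 ms both look acceptable, while p95 and p99 expose the 8-second stalls that 6% of these hypothetical users experience. That gap is why the tail, not the average, is the number to put in an SLO.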