API / SDK · 2026-04-25 · Advanced

Tracing Gemini API in Production with OpenTelemetry: See Every Step of a Single Request

After three months of running Gemini API in production, plain logs stop telling you why latency, cost, or failures spike. This guide walks through wrapping Gemini in OpenTelemetry — Python and Node.js code, GenAI semantic conventions, sampling, and Grafana/Datadog wiring — so you can see the full anatomy of every request.

Three months into running Gemini API in production, the access log stopped being enough. Average latency looked fine, error rates looked fine, but real users kept hitting "the spinner just sat there for nine seconds" or "this user's bill jumped 8x overnight." Once I caught myself spending more time crafting Cloud Logging filters than actually finding the bottleneck, I finally introduced OpenTelemetry distributed tracing — and within a week I could explain every awkward request my service made.

This guide is the version of that journey I wish I had read before starting. We will wrap Gemini API calls in OpenTelemetry spans, follow the official GenAI Semantic Conventions, instrument streaming and Function Calling loops, and set up the whole thing so you can swap Grafana Tempo, Jaeger, and Datadog APM behind a single config flag — no vendor lock-in.

Why distributed tracing matters more for Gemini than for ordinary APIs

A backend that calls Gemini behaves quite differently from a CRUD API. A single user request usually fans out into retrieval, vector search, model calls, possible Function Calling loops, and post-processing. Each leg has its own failure mode and latency profile, and aggregate dashboards systematically hide the worst experiences.

Three real failure cases I have personally hit:

  • Average latency stayed at 1.8s, but gemini-2.5-pro calls with long prompts pinned at 12s, leaving the UI spinner running long enough that users thought the page had crashed.
  • Function Calling that was supposed to terminate in one hop occasionally looped to five hops on certain inputs, burning roughly 8x the tokens of a normal request.
  • Vector retrieval flipped between 200ms and 4 seconds depending on the day, and I wasted a week blaming Gemini before realizing the retriever was the problem.

You cannot chase any of these from a histogram. Distributed tracing reframes the problem: each user request becomes a single trace, and every internal step (model call, retrieval, JSON repair, guardrail) becomes a span you can lay out on a timeline. The moment you can replay a slow request frame by frame, the diagnostics conversation goes from speculation to evidence.
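
To make the timeline idea concrete, here is a minimal, stdlib-only sketch. It does not use OpenTelemetry itself: SpanRecord, slowest_step, and render_timeline are hypothetical names, and the durations are invented for illustration (loosely echoing the slow-retriever case above). A real tracing backend draws this view for you from exported spans.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpanRecord:
    """One measured step inside a single trace (hypothetical structure)."""
    name: str
    start_ms: int  # offset from the start of the trace
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def slowest_step(spans: List[SpanRecord]) -> SpanRecord:
    """Return the span that took the longest within this one request."""
    return max(spans, key=lambda s: s.duration_ms)


def render_timeline(spans: List[SpanRecord], width: int = 40) -> str:
    """Draw each span as a bar positioned by its start offset."""
    total = max(1, max(s.end_ms for s in spans))
    lines = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        left = s.start_ms * width // total
        bar = max(1, s.duration_ms * width // total)
        lines.append(f"{s.name:<18}{' ' * left}{'#' * bar} {s.duration_ms}ms")
    return "\n".join(lines)


# Invented numbers for illustration only: one slow request, step by step.
trace = [
    SpanRecord("vector_search", 0, 4000),
    SpanRecord("gemini.generate", 4000, 5800),
    SpanRecord("json_repair", 5800, 5850),
]

print(render_timeline(trace))
print("slowest:", slowest_step(trace).name)
```

A latency histogram would report this request only as one slow data point; the per-span view shows that the retriever, not the model call, consumed most of the time.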

OpenTelemetry concepts, framed for a Gemini backend

You only need a handful of concepts before writing code:

  • Trace: the full record of one user request, identified by a single trace_id.
  • Span: the smallest unit inside a trace. Make one per measurable step — a Gemini call, a Pinecone lookup, a JSON validation pass.
  • Context propagation: the wiring that keeps spans nested across services. Over HTTP, the W3C Trace Context traceparent header carries it.
  • Span attributes: key-value tags on each span (token counts, model name, end-user ID).
  • Span events: discrete in-span markers (streaming chunk arrivals, guardrail trips).
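
As a rough illustration of what context propagation actually carries, the sketch below builds and parses a traceparent value in the W3C Trace Context format (version 00). In a real service the OpenTelemetry propagators handle this for you; the names SpanContext, new_root_context, to_traceparent, from_traceparent, and child_context are hypothetical and exist only to show the wire format.

```python
import re
import secrets
from typing import NamedTuple, Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str  # 32 hex chars, shared by every span in the trace
    span_id: str   # 16 hex chars, identifies the calling (parent) span
    sampled: bool  # lowest bit of the flags byte


def new_root_context(sampled: bool = True) -> SpanContext:
    """Start a new trace, as the first service to see a request would."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), sampled)


def to_traceparent(ctx: SpanContext) -> str:
    """Serialize the context into the header value sent downstream."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def from_traceparent(header: str) -> Optional[SpanContext]:
    """Parse an incoming header; return None if it is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid per the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_context(parent: SpanContext) -> SpanContext:
    """A downstream span keeps the trace_id and gets a fresh span_id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

The point to notice is that the trace_id never changes as the request crosses services, while each hop contributes its own span_id. That is what lets a backend stitch spans from different processes into one trace.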

OpenTelemetry now publishes GenAI Semantic Conventions that standardize attribute names. Sticking to them is the cheapest insurance you can buy against future migration pain. The ones you will use constantly:

  • gen_ai.system: vendor identifier. The registered well-known value for the Gemini API is gcp.gemini; newer revisions of the conventions rename this attribute to gen_ai.provider.name, so check the version you pin.
  • gen_ai.request.model: the model you asked for (e.g., gemini-2.5-pro)
  • gen_ai.response.model: the model that actually answered, which can differ from the requested name (for example, when an alias resolves to a specific version)
  • gen_ai.usage.input_tokens / gen_ai.usage.output_tokens: token usage
  • gen_ai.request.temperature, gen_ai.request.top_p, etc.: sampling parameters
  • gen_ai.response.finish_reasons: array of completion reasons

Once every span carries these consistently, your dashboards do not care which backend you ship traces to. You will thank yourself for the discipline six months from now when the on-call rotation changes hands.
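
One way to keep those attribute names consistent is to build them in a single place. The following is a minimal sketch under stated assumptions, not the article's own implementation: genai_span_attributes and apply_attributes are hypothetical helpers, the vendor identifier is passed in by the caller rather than hard-coded, and the span argument is assumed to be any object exposing set_attribute(key, value), as OpenTelemetry spans do.

```python
from typing import Any, Dict, Optional, Sequence


def genai_span_attributes(
    system: str,
    request_model: str,
    response_model: Optional[str] = None,
    input_tokens: Optional[int] = None,
    output_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    finish_reasons: Sequence[str] = (),
) -> Dict[str, Any]:
    """Map call details onto GenAI semantic-convention attribute names."""
    candidates = {
        "gen_ai.system": system,
        "gen_ai.request.model": request_model,
        "gen_ai.response.model": response_model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.request.temperature": temperature,
        "gen_ai.request.top_p": top_p,
        "gen_ai.response.finish_reasons": list(finish_reasons) or None,
    }
    # Drop unset values only; legitimate zeros (0 tokens, temperature 0.0) stay.
    return {key: value for key, value in candidates.items() if value is not None}


def apply_attributes(span: Any, attributes: Dict[str, Any]) -> None:
    """Copy the attributes onto a span-like object with set_attribute()."""
    for key, value in attributes.items():
        span.set_attribute(key, value)
```

With the OpenTelemetry SDK configured, you would call apply_attributes on the span yielded by tracer.start_as_current_span(...) once the Gemini response arrives. Filtering on "is not None" rather than truthiness matters here, because a zero output-token count is exactly the kind of signal you want to keep when hunting empty completions.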
