GEMINI LAB
API / SDK · 2026-06-12 · Intermediate

Is Anyone Actually Using Your Gemini Feature? Measuring Acceptance, Regeneration, and Edit Distance

Token charts will not tell you whether users embrace a Gemini-powered feature. A practical design for measuring acceptance rate, regeneration rate, and edit distance with Swift and BigQuery, backed by two weeks of real numbers.

gemini-api · product analytics · instrumentation · Firebase Analytics · BigQuery · indie development

When the major Gemini API outage hit on June 11, I sat watching error-rate graphs, waiting for recovery. The next morning, with every chart back to normal, a different question crept in. Error rate: fine. Latency: fine. Token consumption: exactly as projected. But was the feature actually being used? I did not have a single number that could answer that.

One of the wellness apps I run as an indie developer has a small Gemini-powered feature that generates a short encouraging message when the app opens in the morning. For the two weeks after launch, everything I monitored lived on the API side. Whether users saved the output, immediately regenerated it, or quietly closed the screen — I was collecting none of it. This article is the design record of the product-side instrumentation I built after that realization.

Token Consumption Does Not Measure Value

An API dashboard reports request counts, error rates, latency, and token consumption. All of these are essential for operational health, but every one of them is a supply-side number. None of them describe demand.

In my app, generations held steady at roughly 1,900 per day, and from the supply side things looked healthy. Once the instrumentation described below went in, a different picture emerged: only 41% of generated messages were saved, and 28% were discarded through the regenerate button. Tokens were being consumed on schedule while nearly half the output was a miss for the person reading it. The supply-side charts stay clean while the demand side fails silently — that, I learned firsthand, is the uncomfortable property of AI features.

LLM-ops practice has plenty to say about automated quality evaluation and cost monitoring, but user acceptance is a different layer. A response that scores perfectly on a quality rubric still gets discarded if it does not match the reader's mood. A product feature needs product measurement.

Break the Generation Lifecycle into Five Events

I started by writing down, in chronological order, everything that happens between the user and a generated message, and collapsed it into five events.

  • ai_shown — the feature's entry point became visible (the generate button is on screen)
  • ai_generated — the first generation completed and the output was presented
  • ai_regenerated — the user tapped "try again" and replaced the output
  • ai_accepted — the user did something that counts as acceptance, such as saving or sharing
  • ai_generation_failed — the generation ended in an error

There is deliberately no abandoned event. A silent exit means the user did nothing, and "nothing" is unreliable to emit from a client. Instead, abandonment is derived at query time: a session with ai_generated but neither ai_accepted nor ai_regenerated. Deriving it turned out to be the only way to count it without gaps.
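The query-time derivation above can be mirrored in a few lines of Swift, which is handy for unit-testing the definitions before encoding them in a BigQuery rollup. This is a minimal sketch, assuming each session arrives as the list of event names it logged. The session-level denominators for acceptance and regeneration are my assumption, not necessarily the article's exact working definitions, and `SessionRates` and `computeRates` are hypothetical names.

```swift
// Mirrors the query-time derivation so the metric definitions can be unit-tested.
// Assumption: each session is represented as the list of event names it logged.

struct SessionRates {
    let generatedSessions: Int    // denominator: sessions containing ai_generated
    let acceptanceRate: Double    // share of those sessions with ai_accepted
    let regenerationRate: Double  // share with ai_regenerated
    let abandonmentRate: Double   // share with neither (derived, never logged)
}

func computeRates(sessions: [[String]]) -> SessionRates {
    // Only sessions where a first generation was actually presented count.
    let generated = sessions
        .map { Set($0) }
        .filter { $0.contains("ai_generated") }
    guard !generated.isEmpty else {
        return SessionRates(generatedSessions: 0, acceptanceRate: 0,
                            regenerationRate: 0, abandonmentRate: 0)
    }
    let total = Double(generated.count)
    let accepted = generated.filter { $0.contains("ai_accepted") }.count
    let regenerated = generated.filter { $0.contains("ai_regenerated") }.count
    // Abandonment: generated, but neither accepted nor regenerated.
    let abandoned = generated.filter {
        !$0.contains("ai_accepted") && !$0.contains("ai_regenerated")
    }.count
    return SessionRates(
        generatedSessions: generated.count,
        acceptanceRate: Double(accepted) / total,
        regenerationRate: Double(regenerated) / total,
        abandonmentRate: Double(abandoned) / total
    )
}

let sessions: [[String]] = [
    ["ai_shown", "ai_generated", "ai_accepted"],
    ["ai_shown", "ai_generated", "ai_regenerated", "ai_accepted"],
    ["ai_shown", "ai_generated"],  // silent exit: counted as abandoned
    ["ai_shown"],                  // never generated: excluded from every rate
]
let rates = computeRates(sessions: sessions)
print(rates)
```

Note that acceptance and regeneration overlap (the second session does both), so the three rates are not expected to sum to 1. Only abandonment is mutually exclusive with the other two.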

Every event carries the same four parameters.

  • feature — an identifier such as daily_message, so multiple AI features in one app can be compared on the same axis
  • prompt_version — a version string for the prompt; every improvement is evaluated by splitting on this
  • model_id — the model that served the call, so model migrations can be isolated from prompt changes
  • latency_ms — perceived wait time, measured from tap to render, not the API-side latency
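To make the shared parameters hard to forget, every event can go through a single logging method. The following is one possible shape for such a wrapper, a sketch rather than the author's implementation. `AnalyticsSink`, `AIEventContext`, `AIFeatureTracker`, and the fake clock are hypothetical names. The only real API involved is Firebase's `Analytics.logEvent(_:parameters:)`, which a production sink could forward to. Latency is measured tap-to-render with an injectable millisecond clock so it can be tested deterministically.

```swift
import Foundation

// The five lifecycle events; there is intentionally no "abandoned" case.
enum AIEvent: String, CaseIterable {
    case shown = "ai_shown"
    case generated = "ai_generated"
    case regenerated = "ai_regenerated"
    case accepted = "ai_accepted"
    case generationFailed = "ai_generation_failed"
}

// Hypothetical sink so the tracker can be tested without Firebase. A production
// sink could forward to FirebaseAnalytics' Analytics.logEvent(_:parameters:).
protocol AnalyticsSink {
    func log(_ name: String, parameters: [String: Any])
}

// Three of the four shared parameters; latency_ms is measured per event.
struct AIEventContext {
    let feature: String        // e.g. "daily_message"
    let promptVersion: String  // bump on every prompt change
    let modelID: String        // the model that actually served the call
}

final class AIFeatureTracker {
    private let sink: AnalyticsSink
    private let context: AIEventContext
    private let nowMs: () -> Double  // monotonic milliseconds; injectable for tests
    private var tapStartMs: Double?

    init(sink: AnalyticsSink, context: AIEventContext,
         nowMs: @escaping () -> Double = { ProcessInfo.processInfo.systemUptime * 1000 }) {
        self.sink = sink
        self.context = context
        self.nowMs = nowMs
    }

    // Call when the user taps generate or "try again".
    func markTap() { tapStartMs = nowMs() }

    // Call once the output (or error) is on screen, so latency is tap-to-render.
    func log(_ event: AIEvent) {
        var latencyMs = 0
        switch event {
        case .generated, .regenerated, .generationFailed:
            if let start = tapStartMs {
                latencyMs = Int((nowMs() - start).rounded())
                tapStartMs = nil  // one tap, one measurement
            }
        case .shown, .accepted:
            break  // no wait involved; reported as 0 (an assumption)
        }
        sink.log(event.rawValue, parameters: [
            "feature": context.feature,
            "prompt_version": context.promptVersion,
            "model_id": context.modelID,
            "latency_ms": latencyMs,
        ])
    }
}

// Usage with a recording sink and a fake clock, so the output is deterministic.
final class RecordingSink: AnalyticsSink {
    private(set) var events: [(name: String, parameters: [String: Any])] = []
    func log(_ name: String, parameters: [String: Any]) {
        events.append((name: name, parameters: parameters))
    }
}
final class FakeClock { var ms = 1_000.0 }

let clock = FakeClock()
let sink = RecordingSink()
let tracker = AIFeatureTracker(
    sink: sink,
    context: AIEventContext(feature: "daily_message", promptVersion: "v3",
                            modelID: "example-model"),  // hypothetical values
    nowMs: { clock.ms })
tracker.log(.shown)
tracker.markTap()
clock.ms += 850  // pretend generation plus rendering took 850 ms
tracker.log(.generated)
tracker.log(.accepted)
```

Reporting `latency_ms` as 0 on `ai_shown` and `ai_accepted` is a simplification that keeps all four parameters on every event. Rollup queries should then read latency only from the generation events.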

The discipline that matters most here is keeping the event count small. My first draft had twelve events; the rollup queries only ever touched five. The customer of your instrumentation is the future you writing SQL. An event that never appears in a query is nothing but transmission cost.

What You'll Learn

  • A five-event taxonomy for the generation lifecycle, with a complete Swift instrumentation wrapper
  • Working definitions for acceptance rate, regeneration rate, and normalized edit distance, plus weekly rollup SQL for the GA4 BigQuery export
  • Two weeks of real numbers, during which acceptance rose from 41% to 63%, and a triage checklist for when the metrics stay low
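Normalized edit distance, one of the metrics listed above, compares the generated message with the text the user finally saved. The sketch below uses one common normalization: Levenshtein distance divided by the longer string's length, so 0.0 means the message was kept verbatim and 1.0 means it was fully rewritten. This is an assumption, not the article's own definition, and `levenshtein` and `normalizedEditDistance` are hypothetical names. Comparing Swift `Character` values (grapheme clusters) means a Japanese character or an emoji counts as one unit.

```swift
// Levenshtein distance over Characters (grapheme clusters), using a
// two-row dynamic-programming table: O(n*m) time, O(m) memory.
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    var previous = Array(0...t.count)  // distances from "" to each prefix of t
    var current = [Int](repeating: 0, count: t.count + 1)
    for i in 1...s.count {
        current[0] = i
        for j in 1...t.count {
            let cost = s[i - 1] == t[j - 1] ? 0 : 1
            current[j] = min(previous[j] + 1,         // delete from a
                             current[j - 1] + 1,      // insert into a
                             previous[j - 1] + cost)  // substitute (or keep)
        }
        swap(&previous, &current)  // the finished row becomes "previous"
    }
    return previous[t.count]
}

// 0.0 = saved verbatim, 1.0 = completely rewritten.
func normalizedEditDistance(generated: String, saved: String) -> Double {
    let longest = max(generated.count, saved.count)
    guard longest > 0 else { return 0 }
    return Double(levenshtein(generated, saved)) / Double(longest)
}

let d = normalizedEditDistance(generated: "おはよう", saved: "おはよ")
print(d)  // 0.25
```

Where the value gets logged is a separate design decision. For example, it could be computed on-device at save time and attached to `ai_accepted` as an extra parameter, but that placement is hypothetical and not part of the four shared parameters described earlier.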