GEMINI LAB
Dev Tools / 2026-06-15 / Advanced

When Your Firestore × Gemini Embeddings RAG Quietly Degrades — Designing for Re-Embedding

A RAG built on Firestore native vector search and Gemini Embeddings drifts when the embedding model changes generations, and retrieval quality drops with no errors. Here is how to detect the drift, re-embed without downtime, and keep retrieval cost in check.

Tags: gemini-api, firestore, vector-search, rag, embeddings, reembedding, production


When results get "somehow worse," start here

A RAG built on Firestore native vector search plus Gemini embeddings is shockingly easy to stand up. You skip the separate vector database entirely, store a vector next to each document, fire a KNN query, and reasonable-looking search just works.

The trouble shows up later. A few months into production, a vague report arrives: search results have gotten "somehow worse." No errors. Latency is fine. Yet documents that used to land at the top for a given query no longer surface.

As an indie developer running help search for several of my own sites on this exact setup, I have hit this more than once. The cause is almost always an embedding model generation change. Through 2026, Gemini's embeddings moved to gemini-embedding-001 as the GA model, and a multimodal line grew up alongside it for File Search. When the model changes, the same sentence embeds to a different vector. If your stored document vectors and your freshly generated query vectors live in different spaces, the distance between them stops meaning anything.

This kind of silent degradation is easy to miss precisely because nothing throws. Below, I lay out how to detect the drift, how to rebuild without downtime, and how to keep retrieval cost under control — all in code.

Why the vector space drifts — the trap of version-less design

Vector search assumes the document side and the query side are expressed with the same embedding model, the same dimensionality, and the same normalization. In practice, that assumption breaks along three paths.

The first is swapping the embedding model itself. Vectors made with the text-embedding-004 generation and vectors made with gemini-embedding-001 differ in both dimensionality and internal representation. Change the model name on one line of your code, and from then on only new documents land in the new space while everything written earlier stays in the old one.
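When no model metadata was stored, one cheap first check for mixed generations is to tally stored vectors by length. The following is a minimal sketch, not a detection method taken from this article: it assumes the documents have already been read into plain dicts with vectors as lists, and the field name `embedding` and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def dimension_histogram(
    docs: Iterable[Mapping[str, object]],
    vector_field: str = "embedding",  # hypothetical field name
) -> Counter:
    """Count documents by the length of their stored embedding vector."""
    counts: Counter = Counter()
    for doc in docs:
        vector = doc.get(vector_field)
        # Documents without a vector are tallied under the key None.
        key = len(vector) if isinstance(vector, Sequence) else None
        counts[key] += 1
    return counts


def has_mixed_dimensions(counts: Counter) -> bool:
    """True when embedded documents use more than one vector length."""
    return len([dim for dim in counts if dim is not None]) > 1
```

A result such as `Counter({3072: 9000, 768: 1200})` would point to two generations living in one collection. The check has a blind spot: if both models were configured to emit the same number of dimensions, the lengths match and nothing shows up here.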

The second is changing the output dimensionality. gemini-embedding-001 defaults to 3072 dimensions, but truncating to 768 or 1536 to save cost and storage is common. With Firestore some reduction is unavoidable anyway, because its vector fields support at most 2048 dimensions. Firestore's vector index also requires a fixed dimension, so the moment old and new dimensions mix, queries break outright.
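The dimension and normalization assumptions can be enforced in code before anything is written. The sketch below truncates on the client and re-normalizes to unit length, then guards the write path with a dimension check. It assumes the model's leading dimensions remain meaningful after truncation and that reduced-size vectors are not already unit length; check both against the model documentation. `INDEX_DIMENSION` and the function names are hypothetical. The embeddings API can also reduce the size on the server through its output dimensionality setting, in which case only the normalization and the guard would be needed.

```python
import math
from typing import Sequence

INDEX_DIMENSION = 768  # assumed dimension of the Firestore vector index


def truncate_and_normalize(vector: Sequence[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    if len(vector) < dim:
        raise ValueError(f"vector has {len(vector)} dims, need at least {dim}")
    head = [float(x) for x in vector[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def check_dimension(
    vector: Sequence[float], expected: int = INDEX_DIMENSION
) -> Sequence[float]:
    """Fail loudly instead of writing a vector the index cannot hold."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dims, got {len(vector)}")
    return vector
```

Running every write and every query vector through `check_dimension` turns a mixed-dimension deployment into an immediate exception at the call site.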

The third is mismatching the task type. Gemini embeddings distinguish RETRIEVAL_DOCUMENT from RETRIEVAL_QUERY. Only when you embed with RETRIEVAL_DOCUMENT on write and RETRIEVAL_QUERY on search do you get the space tuned for asymmetric retrieval. Forget to align these and accuracy drops with no error at all.
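One way to make the pairing hard to get wrong is to hide the task type behind two methods, so call sites never pass it by hand. This is a minimal sketch under assumptions: `EmbedFn` is a hypothetical signature for whatever function actually calls the embeddings API in your code, and `RetrievalEmbedder` is an illustrative name, not part of any SDK.

```python
from typing import Callable, Sequence

# Hypothetical signature of the function that calls the embeddings API:
# (text, task_type) -> vector
EmbedFn = Callable[[str, str], Sequence[float]]

DOCUMENT_TASK = "RETRIEVAL_DOCUMENT"
QUERY_TASK = "RETRIEVAL_QUERY"


class RetrievalEmbedder:
    """Pins the task type to the code path so the two cannot be swapped."""

    def __init__(self, embed_fn: EmbedFn) -> None:
        self._embed_fn = embed_fn

    def embed_document(self, text: str) -> list[float]:
        # Write path: always the document-side task type.
        return list(self._embed_fn(text, DOCUMENT_TASK))

    def embed_query(self, text: str) -> list[float]:
        # Search path: always the query-side task type.
        return list(self._embed_fn(text, QUERY_TASK))
```

Because the API call is injected, a fake `embed_fn` that records its arguments is enough to unit test that the indexing job and the search handler each use the intended task type.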

The shared root cause is that the vector carries no metadata about which model, which dimension, and which task produced it. Without that version information, an old vector becomes indistinguishable from a new-generation one the instant the new generation arrives.
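A versioned record could look like the following. This is a sketch of the general idea, not the schema used later in this article: the field names `embedding` and `embedding_spec`, the `EmbeddingSpec` class, and the helper functions are all hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Iterable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class EmbeddingSpec:
    """Everything needed to tell which vector space a vector belongs to."""
    model: str
    dimension: int
    task_type: str


def build_record(text: str, vector: Sequence[float], spec: EmbeddingSpec) -> dict:
    """Bundle the vector with the spec that produced it."""
    if len(vector) != spec.dimension:
        raise ValueError(f"spec says {spec.dimension} dims, got {len(vector)}")
    return {
        "text": text,
        "embedding": list(vector),
        "embedding_spec": asdict(spec),
    }


def spec_of(record: Mapping[str, object]) -> Optional[EmbeddingSpec]:
    """Return the stored spec, or None for a version-less record."""
    raw = record.get("embedding_spec")
    return EmbeddingSpec(**raw) if isinstance(raw, dict) else None


def stale_records(
    records: Iterable[Mapping[str, object]], current: EmbeddingSpec
) -> list:
    """Records whose spec is missing or differs from the current one."""
    return [r for r in records if spec_of(r) != current]
```

With the spec stored next to each vector, a count of stale records per spec becomes an ordinary query, and a re-embedding job can select exactly the documents that still sit in an older space.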

WHAT YOU'LL LEARN
How an embedding model upgrade silently misaligns your vector space, and the detection query that surfaces it in production
A blue-green re-embedding migration that rebuilds every document vector without taking the service down
Using RETRIEVAL_DOCUMENT vs RETRIEVAL_QUERY correctly, with a distance threshold and rerank to hold retrieval cost down
