API / SDK · 2026-06-19 · Advanced

When Your pgvector Search Quietly Gets Worse — Field Notes on Protecting Recall with Gemini Embeddings

A semantic search built on Gemini Embeddings and PostgreSQL pgvector tends to lose precision over months without throwing a single error. These are field notes on the real causes — model pinning, operator/index mismatch, HNSW reindexing, and recall collapse under filters — with working code.

Tags: gemini-api, pgvector, semantic-search, embeddings, postgresql, hnsw, production


Right after launch, the search felt sharp. Six months in, it feels "slightly dull." No errors. Latency unchanged. But the article that used to land at the top now sits third, and the trickle of "I searched but couldn't find it" messages slowly grows. As an indie developer running cross-site search on pgvector across several of my own projects, I have been caught by this quiet decay more than once.

The tricky part is that it never surfaces as a bug. The SELECT succeeds and rows come back. What breaks is the ranking, not availability. These are field notes on how a semantic search built on Gemini Embeddings and PostgreSQL pgvector loses recall in production, and the fixes I actually applied, in order.

First, turn "dullness" into a number — you can't fix recall you don't measure

Before debating decay, you need a way to measure Recall@k, or the whole discussion collapses into anecdotes. Start by fixing a small evaluation set with known answers.
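Here is a minimal sketch of what such a fixed set could look like, assuming you hand-label one expected document id per query. The query strings, ids, and the `GOLD`, `hit_rate_at_k`, and `mrr_at_k` names are all hypothetical. The set scores end-to-end ranking quality, which complements the index-versus-exact comparison that follows.

```python
# eval_set.py: hypothetical known-answer evaluation set
# Each query maps to the one document id a human judged to be the right answer.
GOLD: dict[str, int] = {
    "how do I rotate API keys": 412,
    "pgvector hnsw index size": 87,
    "cancel a subscription": 1203,
}


def hit_rate_at_k(ranked: dict[str, list[int]], gold: dict[str, int], k: int = 10) -> float:
    """Fraction of labeled queries whose expected id appears in the top k results."""
    if not gold:
        return 0.0
    hits = sum(1 for q, doc_id in gold.items() if doc_id in ranked.get(q, [])[:k])
    return hits / len(gold)


def mrr_at_k(ranked: dict[str, list[int]], gold: dict[str, int], k: int = 10) -> float:
    """Mean reciprocal rank: penalizes an answer that slips from rank 1 to rank 3."""
    if not gold:
        return 0.0
    total = 0.0
    for q, doc_id in gold.items():
        top = ranked.get(q, [])[:k]
        if doc_id in top:
            total += 1.0 / (top.index(doc_id) + 1)
    return total / len(gold)
```

Tracking both metrics is deliberate. If the right article slips from rank 1 to rank 3, hit rate@10 stays flat, but that query's reciprocal rank falls from 1.0 to about 0.33. That is the "slightly dull" pattern described at the top.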

Treat exact (non-indexed) search as ground truth, then measure how much the approximate index path misses. In pgvector you can force an exact scan by disabling index scans for the current transaction.

# recall_probe.py — measure HNSW Recall@k against exhaustive search
import psycopg2
 
DB = {"host": "localhost", "database": "semantic_search",
      "user": "postgres", "password": "your_password"}
 
def topk_ids(cur, qvec, k, exact: bool):
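    # SET LOCAL lasts until the current transaction ends; psycopg2 opens one
    # implicitly (autocommit is off by default), so each call runs inside it.
    # Disable index scans only for the exact (ground-truth) pass.
    cur.execute("SET LOCAL enable_indexscan = %s", ("off" if exact else "on",))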
    cur.execute(
        """
        SELECT id
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (str(qvec), k),
    )
    return [r[0] for r in cur.fetchall()]
 
def measure_recall(query_vectors, k=10):
    conn = psycopg2.connect(**DB)
    hits, total = 0, 0
    try:
        for qv in query_vectors:
            with conn.cursor() as cur:
                truth = set(topk_ids(cur, qv, k, exact=True))
                approx = set(topk_ids(cur, qv, k, exact=False))
            hits += len(truth & approx)
            total += len(truth)  # equals k unless the table has fewer rows
    finally:
        conn.close()  # closing without commit rolls back the SET LOCAL settings
    return hits / total if total else 0.0  # Recall@k
 
# e.g. track Recall@10 over 200 representative queries
# print(round(measure_recall(sample_query_vecs, k=10), 4))  # 0.991
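Check one thing before trusting the probe. pgvector can only use an HNSW index when the query's operator matches the index's operator class. A `<=>` query against an index built with `vector_l2_ops` therefore falls back to a sequential scan without any error. The "approximate" pass is then exact too, and Recall@10 reads a perfect 1.0. The sketch below inspects the plan first. The function names and the index name `documents_embedding_idx` are hypothetical, and `cur` is assumed to be a psycopg2-style cursor.

```python
# plan_check.py: hypothetical check that the approximate pass really uses the HNSW index
import re


def plan_uses_index(plan_lines: list[str], index_name: str) -> bool:
    """True if an EXPLAIN plan contains an Index Scan on exactly index_name."""
    pattern = re.compile(r"Index (Only )?Scan using " + re.escape(index_name) + r"\b")
    return any(pattern.search(line) for line in plan_lines)


def approx_path_uses_index(cur, qvec, k: int, index_name: str) -> bool:
    """Run EXPLAIN on the same query the probe uses, with index scans enabled."""
    cur.execute("SET LOCAL enable_indexscan = on")
    cur.execute(
        "EXPLAIN (COSTS OFF) SELECT id FROM documents "
        "ORDER BY embedding <=> %s::vector LIMIT %s",
        (str(qvec), k),
    )
    # EXPLAIN returns one text row per plan line
    return plan_uses_index([row[0] for row in cur.fetchall()], index_name)
```

If this returns False, the probe's number is meaningless until you fix the operator class, or the query operator, to match. On very small tables the planner may also prefer a sequential scan even when the index matches, so run the check against production-sized data.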

Logging this weekly lets you isolate which of the causes below is biting. I settled on alerting when Recall@10 drops under 0.97 — catching the drift as it begins, not after rankings have already shifted.
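Below is a minimal sketch of that weekly check. It assumes each run stores an ISO week label and its Recall@10 value. The record format, the function names, and the extra week-over-week rule are hypothetical; only the 0.97 floor comes from the text above. A sharp one-week fall may point to a discrete event, such as a config change, rather than slow drift, so the sketch flags that case too.

```python
# recall_alert.py: hypothetical weekly Recall@10 check
import json
from typing import Optional

RECALL_FLOOR = 0.97      # alert threshold from the text
MAX_WEEKLY_DROP = 0.01   # hypothetical: also flag a one-week fall larger than this


def to_log_line(week: str, recall: float) -> str:
    """One JSON line per weekly run, e.g. appended to a recall_log.jsonl file."""
    return json.dumps({"week": week, "recall_at_10": round(recall, 4)})


def check(history: list[tuple[str, float]]) -> Optional[str]:
    """Return an alert message for the latest run, or None if it looks healthy."""
    if not history:
        return None
    runs = sorted(history)  # zero-padded ISO week labels like "2026-W24" sort chronologically
    week, latest = runs[-1]
    if latest < RECALL_FLOOR:
        return f"{week}: Recall@10 {latest:.3f} is below {RECALL_FLOOR}"
    if len(runs) >= 2 and runs[-2][1] - latest > MAX_WEEKLY_DROP:
        return f"{week}: Recall@10 fell {runs[-2][1] - latest:.3f} in one week"
    return None
```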

Cause 1: storage and query build their vectors differently

This is the most common cause, and the most overlooked. Embeddings are only comparable when they were made with the same model, the same dimension, the same normalization, and the same task intent. In a long-running system, that consistency erodes.
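One way to make those four properties explicit is to treat them as a single value and compare it, rather than trusting every call site. The following is a minimal sketch. The `EmbedSpec` name and its fields are hypothetical, and the defaults mirror the task types used in the config later in this article.

```python
# embed_spec.py: hypothetical value object describing one embedding "space"
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EmbedSpec:
    model: str                            # explicit model ID, never an alias
    dim: int                              # output_dimensionality
    normalized: bool                      # L2-normalized before storage
    doc_task: str = "RETRIEVAL_DOCUMENT"  # task intent on the storage side
    query_task: str = "RETRIEVAL_QUERY"   # task intent on the query side

    def fingerprint(self) -> str:
        """Stable string you could store next to each row."""
        return "|".join(str(getattr(self, f.name)) for f in fields(self))

    def drift_from(self, other: "EmbedSpec") -> list[str]:
        """Names of fields that differ; an empty list means the spaces are comparable."""
        return [f.name for f in fields(self) if getattr(self, f.name) != getattr(other, f.name)]
```

Because the dataclass is frozen, two specs compare with `==` and can serve as dictionary keys. `drift_from` goes one step further and tells you *which* property moved.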

Three typical drifts:

Drift | How it happens | Result
--- | --- | ---
Silent model update | Code points at a "latest" alias and the model swaps underneath | New rows land in a different vector space and mix with old ones
task_type mismatch | Storage uses RETRIEVAL_DOCUMENT and the query reuses the same task_type | The query-side optimization is lost; recall quietly drops
Dimension mix-up | output_dimensionality changed later, or normalization skipped | Distance scale shifts and thresholds become meaningless
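The third row is the easiest drift to catch at write time. Here is a small guard, assuming you normalize before storage. The function name and the 1e-3 tolerance are hypothetical choices.

```python
# vector_guard.py: hypothetical pre-insert check for dimension and normalization drift
import math


def vector_problems(v: list[float], expected_dim: int, tol: float = 1e-3) -> list[str]:
    """Return human-readable problems; an empty list means the vector looks consistent."""
    problems = []
    if len(v) != expected_dim:
        problems.append(f"dimension {len(v)} != expected {expected_dim}")
    norm = math.sqrt(sum(x * x for x in v))
    if abs(norm - 1.0) > tol:
        problems.append(f"L2 norm {norm:.4f} is not 1 (normalization skipped?)")
    return problems
```

A column declared as `vector(768)` already rejects wrong-length inserts, but it cannot tell a normalized vector from an unnormalized one. Calling a check like this right before INSERT, and failing loudly, turns a silent ranking problem into an error you actually see.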

The fix is simple: pin the vector-generation config in one place and reference an explicit model ID. Do not use a latest-style alias on the production storage or query path.
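A cheap way to enforce that rule is a startup check that refuses alias-looking IDs. This is a heuristic sketch. The marker list is an assumption based on common naming (`latest`, `exp`, `preview`), not an official list of Gemini aliases, and `assert_pinned` is a hypothetical name.

```python
# model_guard.py: hypothetical startup guard against floating model aliases
import re

# Assumed markers of non-pinned IDs; adjust to the naming you actually encounter.
ALIAS_MARKERS = re.compile(r"(^|[-_.])(latest|exp|experimental|preview)([-_.]|$)")


def assert_pinned(model_id: str) -> str:
    """Raise at startup if the configured model looks like a moving alias."""
    if not model_id or ALIAS_MARKERS.search(model_id.lower()):
        raise ValueError(f"refusing unpinned embedding model ID: {model_id!r}")
    return model_id
```

Run it once where the config is loaded, for example `EMBED_MODEL = assert_pinned("gemini-embedding-001")`, so a bad value stops deployment instead of quietly re-mapping new rows.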

# embedding_config.py — pin generation config in one place
from google import genai
 
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")
 
# Pin a fixed model, not an alias. Fix the dimension too, and make this
# module the only place allowed to call embed.
EMBED_MODEL = "gemini-embedding-001"   # never latest/exp in production
EMBED_DIM = 768
 
def embed(text: str, *, is_query: bool) -> list[float]:
    res = client.models.embed_content(
        model=EMBED_MODEL,
        contents=text,
        config={
            # always split task_type between storage and query
            "task_type": "RETRIEVAL_QUERY" if is_query else "RETRIEVAL_DOCUMENT",
            "output_dimensionality": EMBED_DIM,
        },
    )
    v = res.embeddings[0].values
    # Gemini documents only the full 3072-dim output as normalized, so
    # L2-normalize here. pgvector's <=> ignores vector length, but <#> and <->
    # do not, and mixing raw and normalized rows makes thresholds meaningless.
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm else v
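Normalization matters even though `<=>` itself is scale-invariant. pgvector's `<#>` (negative inner product) and `<->` (L2 distance) depend on vector length, so thresholds tuned on one scale stop meaning anything once some rows are unit-length and others are not. For unit vectors the three metrics agree on ranking, because ||a - b||^2 = 2 - 2·cos(a, b). The plain-Python sketch below checks this numerically. The helper names are hypothetical and only mimic the pgvector operators.

```python
# distance_check.py: plain-Python stand-ins for pgvector's <=>, <#>, and <->
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def unit(a: list[float]) -> list[float]:
    n = math.sqrt(dot(a, a))
    return [x / n for x in a] if n else a


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Like <=>: 1 - cos(a, b); ignores vector length."""
    return 1 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def neg_inner_product(a: list[float], b: list[float]) -> float:
    """Like <#>: -(a . b); longer vectors score 'closer'."""
    return -dot(a, b)


def l2_distance(a: list[float], b: list[float]) -> float:
    """Like <->: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With a query `[1, 0]`, a short aligned vector `[1, 0]`, and a long off-axis vector `[3, 3]`, `cosine_distance` ranks the aligned vector first, but `neg_inner_product` ranks the long one first. After passing all three through `unit`, the two metrics agree again.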

Stamp each row with the config it was built under so you can audit later. Keep embed_model and embed_dim next to embedding, and you can detect rows that no longer match the current config.

ALTER TABLE documents ADD COLUMN embed_model TEXT;
ALTER TABLE documents ADD COLUMN embed_dim INT;
 
-- check whether vectors built under different configs got mixed in
SELECT embed_model, embed_dim, count(*)
FROM documents
GROUP BY 1, 2
ORDER BY 3 DESC;
-- if this splits into 2+ groups, that mix is likely the "dullness" itself
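Here is a small sketch for acting on that query's result, assuming it returns `(embed_model, embed_dim, count)` tuples as written above. Rows inserted before the two ALTER TABLE statements carry NULL stamps (`None` in Python), so the sketch counts them as unstamped rather than current. The `audit` name is hypothetical, and so is the `content` column in `REEMBED_SQL`; substitute whatever column holds your source text.

```python
# stamp_audit.py: hypothetical summary of the GROUP BY embed_model, embed_dim result
from typing import Optional

Row = tuple[Optional[str], Optional[int], int]

# Hypothetical batch query for re-embedding; IS DISTINCT FROM also matches NULL stamps.
REEMBED_SQL = """
SELECT id, content FROM documents
WHERE embed_model IS DISTINCT FROM %s OR embed_dim IS DISTINCT FROM %s
ORDER BY id
LIMIT %s
"""


def audit(rows: list[Row], model: str, dim: int) -> dict[str, int]:
    """Bucket row counts into current / stale / unstamped relative to the pinned config."""
    summary = {"current": 0, "stale": 0, "unstamped": 0}
    for embed_model, embed_dim, count in rows:
        if embed_model is None or embed_dim is None:
            summary["unstamped"] += count
        elif (embed_model, embed_dim) == (model, dim):
            summary["current"] += count
        else:
            summary["stale"] += count
    return summary
```

Anything outside `current` is a candidate for re-embedding under the pinned config, in batches pulled with `REEMBED_SQL`.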

WHAT YOU'LL LEARN
Why recall drops when storage and query embeddings drift in model, dimension, or task_type — and how to pin them down in config
A measurement-driven routine for tuning HNSW ef_search, handling dead tuples, and timing reindexes
Why HNSW recall collapses under WHERE filters, and how partial indexes, iterative scan, and candidate widening fix it