GEMINI LAB
API / SDK · 2026-06-15 · Advanced

Permission-Aware RAG — Designing Gemini Search That Only Cites What the User Is Allowed to See

The day you add RAG to internal search, drafts and finance memos nobody should see start leaking into answers. This is a production design — metadata filtering, defense in depth, and audit logging — for letting Gemini search while respecting permissions, with working code.

Tags: gemini-api, rag, security, access-control, production


I still remember the first day I wired Gemini-powered RAG into an internal knowledge search. The demo was flawless. Ask "how did ad revenue trend last month?" and it pulled plausible numbers and answered cleanly. The problem surfaced the moment I asked the same question while signed in to a part-timer's account.

A finance memo I'd only ever shared with leadership quietly showed up as the basis for the answer.

The document body never rendered. But Gemini happily said "last month was up versus the prior month," summarizing numbers that account should never have touched. Vector search returns chunks that are semantically close to the query; it never asks whether the person asking is allowed to see them.

Permission-aware RAG is not a feature you bolt on later. You have to weave permissions into the retrieval design itself. Here I'll walk through the failures I hit personally as an indie developer building cross-service knowledge search across the Dolice Labs properties, and how I fixed them — with runnable code.

Why ordinary RAG leaks silently

A typical RAG pipeline has three stages. Split documents into chunks, embed them, store them in a vector store. When a question arrives, run nearest-neighbor search for the top k, and hand those to Gemini as context.

Nowhere in that design does the word "permission" appear. The index is "one giant bag where everyone's documents are mixed together," and nearest-neighbor search simply pulls semantically close items out of that bag. If an executives-only finance memo is in the bag, it'll match a part-timer's question too.

The first fix people reach for is "post-filtering": let Gemini answer, then check the cited documents and drop the ones the user can't access. This is wrong twice over. First, by the time you drop them, Gemini has already read and summarized the secret. Second, if the top k fill up with documents the user can't see, the documents they should see get pushed out of range, and answer quality drops too.

The correct order is reversed: filter by permission, then search. This is called pre-filtering, or security trimming.
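The crowd-out half of that failure is easy to reproduce on a toy corpus. The sketch below is illustrative only: the similarity scores, document names, and the helpers `post_filter_search` and `pre_filter_search` are invented here, and a real system would get its scores from the vector store.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    doc_id: str
    score: float             # pretend similarity to the question (higher = closer)
    allowed: frozenset[str]  # principals allowed to see this chunk


# Hypothetical corpus: the two closest chunks are executives-only.
CORPUS = [
    Chunk("finance-memo-1", 0.95, frozenset({"group:executives"})),
    Chunk("finance-memo-2", 0.93, frozenset({"group:executives"})),
    Chunk("public-report", 0.80, frozenset({"group:everyone"})),
    Chunk("press-release", 0.70, frozenset({"group:everyone"})),
]


def top_k(chunks, k):
    """Return the k highest-scoring chunks."""
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:k]


def post_filter_search(principals: set[str], k: int) -> list[str]:
    """Search first, drop forbidden hits afterwards (the broken order)."""
    hits = top_k(CORPUS, k)
    return [c.doc_id for c in hits if c.allowed & principals]


def pre_filter_search(principals: set[str], k: int) -> list[str]:
    """Filter by permission first, then rank only what is left."""
    visible = [c for c in CORPUS if c.allowed & principals]
    return [c.doc_id for c in top_k(visible, k)]


part_timer = {"user:part-timer", "group:everyone"}
print(post_filter_search(part_timer, k=2))  # [] (both slots went to forbidden memos)
print(pre_filter_search(part_timer, k=2))   # ['public-report', 'press-release']
```

With k=2, post-filtering leaves the part-timer with nothing, while pre-filtering still returns the two documents they are entitled to.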

The core design — making permission a first-class part of retrieval

The idea is simple. Give every single chunk an ACL (access control list): "the set of principals allowed to see this chunk." A principal is a user ID or a group ID.

At query time, expand the asking user into their set of principals (their own user ID plus every group they belong to), and consider only chunks whose ACL shares at least one principal with that set. The vector-similarity computation happens only within that narrowed set.
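As a rough sketch of that expansion and the intersection test: the `memberships` mapping below is a hypothetical stand-in for whatever directory or identity provider holds the real group data, and `expand_principals` and `can_see` are names made up for this example. The `user:` and `group:` prefixes follow the convention used in the article's ACL examples. Nested groups are not handled here.

```python
def expand_principals(user_id: str, memberships: dict[str, set[str]]) -> set[str]:
    """Return the user's own principal plus one principal per group they belong to.

    `memberships` maps group name -> member user IDs (hypothetical data source).
    """
    principals = {f"user:{user_id}"}
    for group, members in memberships.items():
        if user_id in members:
            principals.add(f"group:{group}")
    return principals


def can_see(chunk_acl: list[str], principals: set[str]) -> bool:
    """A chunk is visible only if its ACL shares at least one principal with the user."""
    return not principals.isdisjoint(chunk_acl)


MEMBERSHIPS = {
    "executives": {"masaki"},
    "staff": {"masaki", "part-timer"},
}

exec_principals = expand_principals("masaki", MEMBERSHIPS)
temp_principals = expand_principals("part-timer", MEMBERSHIPS)

memo_acl = ["user:masaki", "group:executives"]
print(can_see(memo_acl, exec_principals))  # True
print(can_see(memo_acl, temp_principals))  # False
print(can_see([], exec_principals))        # False: an empty ACL matches nobody
```

Because the check is a set intersection, an empty ACL can never match anyone, which keeps the query side consistent with deny by default.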

The key is to enforce the ACL as a metadata filter in the vector store. Most vector stores (Pinecone, Qdrant, pgvector, sqlite-vec, and others) can apply a nearest-neighbor search and a metadata condition simultaneously. That gets you the top k that are "semantically close and allowed" in a single query.
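For pgvector, one way such a combined query might look is sketched below. The table name `chunks` and its columns are assumptions made for this example, as is the helper `build_filtered_query`; the operators themselves are real (`&&` is PostgreSQL's array-overlap operator and `<=>` is pgvector's cosine-distance operator). The function only builds the SQL and parameters, so it runs without a database.

```python
def build_filtered_query(query_vector: list[float], principals: set[str], top_k: int = 5):
    """Build a parameterized pgvector query that filters by ACL and ranks by distance.

    Assumes a hypothetical table:
        chunks(id text, body text, embedding vector, allowed_principals text[])
    """
    if not principals:
        # No principals means no access; never fall back to an unfiltered search.
        raise ValueError("refusing to search with an empty principal set")
    sql = (
        "SELECT id, body "
        "FROM chunks "
        "WHERE allowed_principals && %s::text[] "  # && = array overlap
        "ORDER BY embedding <=> %s::vector "       # <=> = cosine distance
        "LIMIT %s"
    )
    # pgvector accepts a vector as a '[x,y,z]' text literal cast to ::vector.
    vector_literal = "[" + ",".join(str(x) for x in query_vector) + "]"
    params = (sorted(principals), vector_literal, top_k)
    return sql, params


sql, params = build_filtered_query(
    [0.1, 0.2, 0.3], {"user:part-timer", "group:staff"}, top_k=3
)
print(sql)
print(params)
```

With a driver such as psycopg, the pair would typically be passed as `cursor.execute(sql, params)`. It is worth checking how your store orders filtering and index scanning: pgvector's documentation notes that with approximate indexes the filter is applied after the index scan, so a query can return fewer rows than the limit.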

First, the indexing side. When you store a chunk, always attach allowed_principals alongside the embedding.

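import hashlib
import os

from google import genai

# Read the key from the environment rather than hard-coding it in source.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])


def embed(text: str) -> list[float]:
    """Return the embedding vector for one piece of text."""
    res = client.models.embed_content(
        model="gemini-embedding-001",
        contents=text,
    )
    return res.embeddings[0].values


def index_chunk(store, doc_id: str, chunk_text: str, allowed_principals: list[str]):
    """Store one chunk with its ACL.

    allowed_principals lists the user/group IDs allowed to see this chunk,
    e.g. ["user:masaki", "group:executives"].
    `store` is any vector-store wrapper exposing upsert(id, vector, metadata).
    """
    if not allowed_principals:
        # An empty ACL means "nobody can see it," NOT "everyone can."
        # Get this backwards and any unlabeled document leaks to everyone.
        raise ValueError(f"doc={doc_id} has no allowed_principals (rejecting to prevent accidental public exposure)")
    # One document yields many chunks, so the ID must be unique per chunk;
    # using doc_id alone would make each chunk overwrite the previous one.
    chunk_hash = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()[:16]
    store.upsert(
        id=f"{doc_id}:{chunk_hash}",
        vector=embed(chunk_text),
        metadata={
            "text": chunk_text,
            "doc_id": doc_id,
            "allowed_principals": allowed_principals,
        },
    )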

Throwing on an empty allowed_principals looks minor but is the single most important line. The implementations that fail are the ones that silently treat "empty ACL = unrestricted = public." The safe default is not "visible to all" but "visible to none" (deny by default). A document someone forgot to label should disappear from search, not leak.
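The same deny-by-default stance can be extended to malformed labels. The validator below is a sketch under assumptions: the name `validate_acl` and the rule that every principal must start with `user:` or `group:` are inferred from the article's examples, not taken from any library.

```python
ALLOWED_PREFIXES = ("user:", "group:")


def validate_acl(doc_id: str, allowed_principals) -> list[str]:
    """Return a cleaned, de-duplicated ACL, or raise if it could be read as 'public'."""
    if not allowed_principals:  # None, [] and "" all count as "unlabeled"
        raise ValueError(f"doc={doc_id}: empty ACL, refusing to index")
    if isinstance(allowed_principals, str):
        # A bare string would be iterated character by character; reject it.
        raise TypeError(f"doc={doc_id}: ACL must be a list of principals, not a string")
    cleaned = []
    for p in allowed_principals:
        # Reject wildcards, unknown prefixes, and prefixes with no ID after them.
        if not isinstance(p, str) or not p.startswith(ALLOWED_PREFIXES) or p.endswith(":"):
            raise ValueError(f"doc={doc_id}: malformed principal {p!r}")
        cleaned.append(p)
    return sorted(set(cleaned))


print(validate_acl("memo-1", ["group:executives", "user:masaki", "user:masaki"]))
# ['group:executives', 'user:masaki']

for bad in ([], None, ["*"], ["user:"]):
    try:
        validate_acl("memo-2", bad)
    except ValueError as exc:
        print("rejected:", exc)
```

Rejecting a wildcard such as `"*"` at index time means a public document has to be labeled with an explicit group rather than becoming public by accident.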

WHAT YOU'LL LEARN
If you froze up at the thought of internal search leaking other people's drafts or financials into answers, you'll leave with a production architecture that treats permissions as a first-class part of retrieval.
You'll get copy/paste-ready defense in depth: ACL labels on every chunk, a hard pre-filter by the user's principal set at query time, and a re-check against the source of truth right before answering.
You'll close the 'stale ACL' gap and the subtler leaks through citations and chat history, and design an audit log that records who answered what, and on what basis.