GEMINI LAB
API / SDK · 2026-06-24 · Advanced

Citing the exact page and figure in File Search answers with visual-citation metadata

File Search grounding metadata now carries media_id and page_numbers, so you can trace each sentence of an answer back to a specific page and figure. Here's how I built a sentence-level, verifiable citation layer over a mix of PDFs and images.

Tags: gemini · file-search · grounding · gemini-api · rag


Feed a PDF into File Search and ask Gemini about it, and you could always get back "Source: design-spec.pdf." But the moment a reader or teammate asked "does it really say that?", there was no way to tell them which part of a 47-page PDF to read. As an indie developer maintaining help and reference content for my own apps, I hit this wall over and over and ended up pasting in screenshots by hand.

On June 24, 2026, File Search grounding metadata gained media_id (visual citations) and page_numbers, and that manual work is gone. You can now trace which sentence of an answer rests on which page and which figure, straight from the API response. This article walks through building a citation layer that attaches "page number + figure thumbnail" to each sentence, over reference data that mixes PDFs and images.

What actually changed — two new fields in grounding metadata

Until now, grounding metadata was, roughly, chunk-level: "this answer is based on these chunks." The two new fields push that granularity one level finer.

| Field | Where it lives | Meaning |
|---|---|---|
| page_numbers | retrieved_context of each grounding chunk | Which PDF page(s) the chunk came from (an array when the chunk spans pages) |
| media_id | retrieved_context of each grounding chunk | Visual-citation identifier: for image-derived chunks (figures, screenshots), it points to the source image |
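With the two fields in hand, a citation label can degrade gracefully: page numbers when the chunk came from a PDF, the media_id when it came from an image, and just the filename when neither is present. A minimal sketch over plain dicts shaped like the retrieved_context entries in this article (the helper name and label format are illustrative choices, not part of the API):

```python
def format_citation(ctx: dict) -> str:
    """Turn one chunk's retrieved_context dict into a readable citation label."""
    # Fall back to a placeholder if even the title is missing.
    label = ctx.get("title") or "unknown source"
    # page_numbers may be None, empty, or hold several pages for a spanning chunk.
    pages = sorted(set(ctx.get("page_numbers") or []))
    if pages:
        prefix = "p." if len(pages) == 1 else "pp."
        label += f", {prefix} " + ", ".join(str(p) for p in pages)
    # media_id is set for image-derived chunks (figures, screenshots).
    media_id = ctx.get("media_id")
    if media_id:
        label += f" [figure {media_id}]"
    return label


print(format_citation({"title": "design-spec.pdf", "page_numbers": [12], "media_id": None}))
print(format_citation({"title": "onboarding-flow.png", "page_numbers": None, "media_id": "media/abc123"}))
```

If you work with the SDK's response objects rather than plain dicts, read the same fields as attributes instead of dict keys.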

The key is how these combine with grounding_supports, which says "this span of the answer is supported by these chunks." Each support entry carries the start and end offsets of an answer span (the Gemini API reference defines them in bytes, not characters) plus the indices of the chunks behind it. Look up page_numbers and media_id by chunk index, and every sentence traces back to something like "page 12 of design-spec.pdf, figure 3."
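One detail matters if you slice the answer yourself: the Gemini API reference describes a segment's start_index and end_index as offsets in bytes, so for non-ASCII answers (Japanese, accented text) they don't line up with Python string indices. A minimal sketch, with a hypothetical helper name, assuming the offsets are UTF-8 byte offsets into the answer text (start inclusive, end exclusive):

```python
from typing import Optional


def segment_text(answer: str, start_index: Optional[int], end_index: int) -> str:
    """Slice an answer by UTF-8 byte offsets rather than character indices."""
    raw = answer.encode("utf-8")
    # Zero-valued fields may be omitted from a response, so treat a missing start as 0.
    start = start_index or 0
    # errors="ignore" drops a partial character if an offset lands mid-codepoint.
    return raw[start:end_index].decode("utf-8", errors="ignore")


answer = "Résumé: tokens expire."
print(segment_text(answer, 10, 24))  # byte offsets: "é" takes 2 bytes each
print(answer[10:24])                 # naive character slicing lands in the wrong place
```

For a pure-ASCII answer like the sample below, bytes and characters coincide, which is why the bug tends to surface only once non-English content arrives.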

Grasp the response shape first

Before the implementation, let's see what we're handling. A generate_content response with File Search enabled hangs grounding_metadata off candidates[0]. Cleaned up, it looks like this.

# Conceptual structure of grounding_metadata (a tidied real response)
{
  "grounding_chunks": [
    {
      "retrieved_context": {
        "title": "design-spec.pdf",
        "text": "Auth tokens expire after 3600 seconds by default…",
        "page_numbers": [12],          # <- new field
        "media_id": None               # text chunk, so None
      }
    },
    {
      "retrieved_context": {
        "title": "onboarding-flow.png",
        "text": "Login screen transition diagram",
        "page_numbers": None,
        "media_id": "media/abc123"     # <- new field (image-derived)
      }
    }
  ],
  "grounding_supports": [
    {
      "segment": {"start_index": 0, "end_index": 33, "text": "Tokens expire after 3600 seconds."},
      "grounding_chunk_indices": [0],
      "confidence_scores": [0.94]
    }
  ]
}

grounding_supports[i].grounding_chunk_indices points into grounding_chunks. Once you hold that mapping, the rest is just connecting sentences to sources.
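That join fits in one loop. A minimal sketch over the dict shape shown above (the function name and output shape are hypothetical; it reads each segment's own text field rather than slicing the answer by offsets, and it skips chunk indices that don't resolve instead of raising):

```python
def cite_sentences(metadata: dict) -> list[dict]:
    """Join grounding_supports to grounding_chunks by index.

    Returns one entry per supported span, each listing the sources behind it.
    """
    chunks = metadata.get("grounding_chunks") or []
    cited = []
    for support in metadata.get("grounding_supports") or []:
        segment = support.get("segment") or {}
        sources = []
        for idx in support.get("grounding_chunk_indices") or []:
            # Guard against an index that doesn't resolve rather than raising IndexError.
            if not 0 <= idx < len(chunks):
                continue
            ctx = chunks[idx].get("retrieved_context") or {}
            sources.append({
                "title": ctx.get("title"),
                "page_numbers": ctx.get("page_numbers") or [],
                "media_id": ctx.get("media_id"),
            })
        cited.append({"text": segment.get("text", ""), "sources": sources})
    return cited
```

Run against the sample structure, this yields a single entry pairing "Tokens expire after 3600 seconds." with design-spec.pdf, page 12.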


WHAT YOU'LL LEARN
Turn a setup that returned a source filename but never the exact page or figure into sentence-level, verifiable citations using page_numbers and media_id
Drop in rendering logic that joins grounding_supports with grounding_chunks to attach precise page numbers and figure thumbnails to every sentence
Take away the production fixes you'll actually hit: fallbacks for missing metadata and how to collapse duplicate citations from the same page
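One way to collapse duplicate citations (a sketch, not necessarily the approach the full article takes) is to give each distinct (title, pages, media_id) source a single footnote number and reuse it whenever another sentence cites the same page. The input shape, a list of spans each carrying a list of source dicts, and the function name are assumptions for illustration:

```python
def number_footnotes(cited: list[dict]) -> tuple[list[str], list[dict]]:
    """Append footnote markers to each span, reusing one number per distinct source."""
    numbers: dict[tuple, int] = {}
    footnotes: list[dict] = []
    rendered: list[str] = []
    for span in cited:
        marks: list[int] = []
        for src in span.get("sources", []):
            # Missing page_numbers become an empty tuple, so title-only sources
            # still collapse by filename.
            key = (src.get("title"),
                   tuple(sorted(src.get("page_numbers") or [])),
                   src.get("media_id"))
            if key not in numbers:
                footnotes.append(src)
                numbers[key] = len(footnotes)
            n = numbers[key]
            if n not in marks:  # the same source cited twice on one span gets one marker
                marks.append(n)
        rendered.append(span.get("text", "") + "".join(f"[{n}]" for n in marks))
    return rendered, footnotes
```

Spans with no sources come back without a marker, which makes unsupported sentences easy to spot or flag in the UI.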
