Citing the exact page and figure in File Search answers with visual-citation metadata
File Search grounding metadata now carries media_id and page_numbers, so you can trace each sentence of an answer back to a specific page and figure. Here's how I built a sentence-level, verifiable citation layer over a mix of PDFs and images.
Feed a PDF into File Search and ask Gemini about it, and you could always get back "Source: design-spec.pdf." But the moment a reader or teammate asks "does it really say that?", you couldn't point them to which part of a 47-page PDF to read. As an indie developer running help-reference data for my own apps, I hit this wall over and over and ended up pasting screenshots by hand.
On June 24, 2026, File Search grounding metadata gained media_id (visual citations) and page_numbers, and that manual work is gone. You can now trace which sentence of an answer rests on which page and which figure, straight from the API response. This article walks through building a citation layer that attaches "page number + figure thumbnail" to each sentence, over reference data that mixes PDFs and images.
What actually changed — two new fields in grounding metadata
Until now, grounding metadata was, roughly, chunk-level: "this answer is based on these chunks." The two new fields push that granularity one level finer.
| Field | Where it lives | Meaning |
| --- | --- | --- |
| page_numbers | retrieved_context of each grounding chunk | Which PDF page(s) the chunk came from (an array when it spans pages) |
| media_id | retrieved_context of each grounding chunk | The visual-citation identifier. For image-derived chunks (figures, screenshots), it points to the source image |
The key is how these combine with grounding_supports, which says "this span of the answer is supported by this chunk." Each support entry carries the start and end index of an answer span (the API documents these as byte offsets into the answer text, not character positions) plus the indices of the chunks behind it. Look up page_numbers and media_id by chunk index, and every sentence traces back to "page 12 of design-spec.pdf, figure 3."
Grasp the response shape first
Before the implementation, let's see what we're handling. A generate_content response with File Search enabled hangs grounding_metadata off candidates[0]. Cleaned up, it looks like this.
```python
# Conceptual structure of grounding_metadata (a tidied real response)
{
    "grounding_chunks": [
        {
            "retrieved_context": {
                "title": "design-spec.pdf",
                "text": "Auth tokens expire after 3600 seconds by default…",
                "page_numbers": [12],  # <- new field
                "media_id": None       # text chunk, so None
            }
        },
        {
            "retrieved_context": {
                "title": "onboarding-flow.png",
                "text": "Login screen transition diagram",
                "page_numbers": None,
                "media_id": "media/abc123"  # <- new field (image-derived)
            }
        }
    ],
    "grounding_supports": [
        {
            # end_index is exclusive; this span is 33 bytes long
            "segment": {"start_index": 0, "end_index": 33, "text": "Tokens expire after 3600 seconds."},
            "grounding_chunk_indices": [0],
            "confidence_scores": [0.94]
        }
    ]
}
```
grounding_supports[i].grounding_chunk_indices points into grounding_chunks. Once you hold that mapping, the rest is just connecting sentences to sources.
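To make that mapping concrete, here is a minimal sketch that walks a plain-dict version of the structure above. The sample data and the helper name `sources_for_supports` are illustrative assumptions, not part of the SDK; a real response exposes the same fields as object attributes rather than dict keys.

```python
# Hypothetical sample shaped like the grounding_metadata shown earlier.
sample_meta = {
    "grounding_chunks": [
        {"retrieved_context": {"title": "design-spec.pdf", "page_numbers": [12], "media_id": None}},
        {"retrieved_context": {"title": "onboarding-flow.png", "page_numbers": None, "media_id": "media/abc123"}},
    ],
    "grounding_supports": [
        {"segment": {"text": "Tokens expire after 3600 seconds."}, "grounding_chunk_indices": [0]},
        {"segment": {"text": "The login flow is shown in the figure."}, "grounding_chunk_indices": [1, 0]},
    ],
}


def sources_for_supports(meta):
    """Pair each supported segment text with the titles of its source chunks."""
    chunks = meta.get("grounding_chunks", [])
    pairs = []
    for support in meta.get("grounding_supports", []):
        titles = [
            chunks[i]["retrieved_context"]["title"]
            for i in support.get("grounding_chunk_indices", [])
            if 0 <= i < len(chunks)  # skip indices that point past the chunk list
        ]
        pairs.append((support["segment"]["text"], titles))
    return pairs


pairs = sources_for_supports(sample_meta)
```

Each entry in `pairs` is a (sentence, source titles) tuple, which is the same join the full implementation performs with page and figure details added.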
WHAT YOU'LL LEARN
✦Turn a setup that returned a source filename but never the exact page or figure into sentence-level, verifiable citations using page_numbers and media_id
✦Drop in rendering logic that joins grounding_supports with grounding_chunks to attach precise page numbers and figure thumbnails to every sentence
✦Take away the production fixes you'll actually hit: fallbacks for missing metadata and how to collapse duplicate citations from the same page
Rendering logic that resolves page + figure per sentence
This is the heart of the article. We split the answer by the grounding_supports spans and attach a page number and image reference to each. The defenses against missing metadata are baked in from the start so you can use it as is.
```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")


def build_cited_answer(response):
    """Split the answer into supported spans and attach verifiable sources."""
    cand = response.candidates[0]
    meta = getattr(cand, "grounding_metadata", None)
    answer_text = cand.content.parts[0].text

    # No metadata = an ungrounded answer. Return it without sources.
    if meta is None or not getattr(meta, "grounding_supports", None):
        return [{"text": answer_text, "citations": []}]

    chunks = meta.grounding_chunks or []
    # Segment offsets are documented as byte offsets, so slice the UTF-8
    # bytes rather than the string (they differ for non-ASCII answers).
    answer_bytes = answer_text.encode("utf-8")
    segments = []
    for sup in meta.grounding_supports:
        seg = sup.segment
        citations = []
        for idx in sup.grounding_chunk_indices or []:
            if idx >= len(chunks):
                continue  # guard against index drift
            ctx = chunks[idx].retrieved_context
            if ctx is None:
                continue  # chunk did not come from retrieved context
            citations.append({
                "title": ctx.title,
                "pages": getattr(ctx, "page_numbers", None),  # e.g. [12]
                "media_id": getattr(ctx, "media_id", None),   # e.g. "media/abc123"
            })
        start = seg.start_index or 0  # may be unset for a span starting at 0
        segments.append({
            "text": answer_bytes[start:seg.end_index].decode("utf-8", errors="replace"),
            "citations": _dedupe_citations(citations),
        })
    return segments


def _dedupe_citations(citations):
    """Collapse citations that point to the same file, page, and media."""
    seen, out = set(), []
    for c in citations:
        key = (c["title"], tuple(c["pages"] or []), c["media_id"])
        if key in seen:
            continue
        seen.add(key)
        out.append(c)
    return out
```
Run an answer through this and it comes out structured like so.
```python
# Example output of build_cited_answer
[
    {
        "text": "Tokens expire after 3600 seconds.",
        "citations": [{"title": "design-spec.pdf", "pages": [12], "media_id": None}]
    },
    {
        "text": "The post-login flow is shown in the following figure.",
        "citations": [{"title": "onboarding-flow.png", "pages": None, "media_id": "media/abc123"}]
    }
]
```
Each sentence is now tied to "page 12 of design-spec.pdf" or "the figure in onboarding-flow.png." All that's left is rendering it.
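One simple way to render that structure is as plain text with a bracketed source label after each sentence. The sketch below assumes the list-of-dicts shape returned by build_cited_answer; the function name `render_plain_text` and the label format are my own choices for illustration, not something the API prescribes.

```python
def render_plain_text(segments):
    """Render cited segments as lines of text followed by bracketed source labels."""
    lines = []
    for seg in segments:
        labels = []
        for c in seg["citations"]:
            if c.get("pages"):
                pages = ", ".join(str(p) for p in c["pages"])
                labels.append(f"{c['title']} p.{pages}")
            elif c.get("media_id"):
                labels.append(f"{c['title']} (figure)")
            else:
                labels.append(c["title"])  # plain chunk-level citation
        suffix = f" [{'; '.join(labels)}]" if labels else ""
        lines.append(seg["text"] + suffix)
    return "\n".join(lines)


example_segments = [
    {"text": "Tokens expire after 3600 seconds.",
     "citations": [{"title": "design-spec.pdf", "pages": [12], "media_id": None}]},
    {"text": "The post-login flow is shown in the following figure.",
     "citations": [{"title": "onboarding-flow.png", "pages": None, "media_id": "media/abc123"}]},
    {"text": "General advice with no source.", "citations": []},
]
rendered = render_plain_text(example_segments)
print(rendered)
```

Segments with no citations are printed without a label, which keeps unsourced sentences visibly different from sourced ones.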
Pulling the actual figure from a media_id
A media_id is a string identifier, not the image itself. To show a thumbnail, you need one extra step to fetch that media from the File Search store. Whether you can actually show the figure makes or breaks how convincing the citation feels.
```python
def resolve_media_thumbnail(client, media_id):
    """Fetch displayable image bytes from a media_id. Returns None on failure."""
    if not media_id:
        return None
    try:
        # Retrieve the stored media (the retrieval API depends on store config)
        media = client.files.get(name=media_id)
        return media  # a file reference; convert to an <img> src in the UI
    except Exception as e:
        # Expired or deleted media is not a rare case
        print(f"media resolve failed for {media_id}: {e}")
        return None
```
Here is why the try/except matters in production. When you update reference data, old media_id values expire. Even a few seconds of lag between generation and rendering can produce a "media not found." If you don't catch that exception and fall back to a text citation here, the whole citation UI crashes. I underestimated this at first and shipped a bug where figures went blank for a moment right after a store rebuild.
Turn PDF page numbers into a reader path
page_numbers is useful just displayed, but wiring it to a PDF viewer's page anchor makes it genuinely practical. Most viewers open a page via the URL fragment #page=12.
```python
def page_anchor_url(base_pdf_url, page_numbers):
    """Turn page_numbers into a URL that opens the right PDF page."""
    if not page_numbers:
        return base_pdf_url
    # Jump to the first page; show a range separately if needed
    return f"{base_pdf_url}#page={page_numbers[0]}"


# Usage
url = page_anchor_url("https://example.com/docs/design-spec.pdf", [12])
# -> "https://example.com/docs/design-spec.pdf#page=12"
```
With that in place, a link like "design-spec.pdf p.12 ↗" sits beside each sentence, and the reader jumps to the supporting paragraph in one click. Moving from "showing" a source to "letting the reader verify" it is what these two fields really unlock.
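For the link text, a chunk that spans several pages reads better as a range than as only its first page. The helper below is a sketch under the assumption that page_numbers is a list of integers; `page_label` is a hypothetical name, and the "p." / "pp." convention is just one option.

```python
def page_label(title, page_numbers):
    """Build a label such as 'design-spec.pdf p.12' or 'design-spec.pdf pp.12-14'."""
    if not page_numbers:
        return title  # no page info: fall back to the bare file name
    pages = sorted(set(page_numbers))
    if len(pages) == 1:
        return f"{title} p.{pages[0]}"
    # Consecutive pages collapse into a range; anything else is listed out.
    contiguous = pages[-1] - pages[0] == len(pages) - 1
    if contiguous:
        return f"{title} pp.{pages[0]}-{pages[-1]}"
    return f"{title} pp.{', '.join(str(p) for p in pages)}"


label = page_label("design-spec.pdf", [12, 13, 14])
```

Pairing this label with the URL from page_anchor_url gives a link that names the full range while still opening on the first cited page.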
Pitfalls you will hit in production
The implementation is less of a time sink than the operations around it. Here are the holes I fell into running File Search help references, with fixes.
| Symptom | Cause | Fix |
| --- | --- | --- |
| Some sentences carry no source | That span isn't in grounding_supports (the model filled in from general knowledge) | Visually distinguish unsourced spans. Label them as "beyond the reference data," not fabrication |
| page_numbers stays None | A PDF where page boundaries can't be extracted (e.g. scanned images) | Run OCR + page tagging at ingest. Fall back to the page image via media_id |
| The same page is cited repeatedly | Multiple chunks retrieved from one page | Dedupe per page (the _dedupe_citations above) |
| media_id won't resolve | Old media expired after a data update | Fall back to a text citation in try/except and prompt a regenerate |
That first row is the core of trustworthiness. Honestly showing in the UI that not every sentence has a source actually earns the reader's trust. Force a citation onto every line and you end up stamping "Source: foo.pdf" onto general claims that live outside the reference data — which is exactly where verification falls apart.
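To show unsourced spans honestly, the UI first needs to know where they are. One approach, sketched here under the assumption that you have collected the (start, end) offsets from grounding_supports, is to compute the gaps between supported ranges. The function name `split_by_support` is hypothetical, and the offsets must be in the same unit as the total length you pass in.

```python
def split_by_support(total_length, supported_ranges):
    """Return (start, end, is_supported) tuples covering 0..total_length.

    `supported_ranges` is a list of (start, end) offsets with end exclusive.
    """
    spans = []
    cursor = 0
    for start, end in sorted(supported_ranges):
        start = max(start, cursor)  # clip overlap with the previous range
        if start >= end:
            continue  # range is fully covered already
        if start > cursor:
            spans.append((cursor, start, False))  # gap: no grounding support
        spans.append((start, end, True))
        cursor = end
    if cursor < total_length:
        spans.append((cursor, total_length, False))  # unsupported tail
    return spans


spans = split_by_support(60, [(0, 33), (40, 50)])
```

Every tuple with `False` in the last position is a stretch of the answer to style differently, for example with a muted color and a "beyond the reference data" tooltip.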
How well it fits your own reference data
This shines for documents with page structure (PDFs, slides) and for data where you want a figure to be the evidence. Conversely, a store of nothing but short text snippets won't carry page_numbers or media_id, and you're back to plain chunk citations. Take a quick inventory of which media your store is built from before adopting this, and you'll save the wasted effort.
Start by dropping one PDF into File Search and running it through build_cited_answer to see whether page_numbers comes back. If the pages it returns line up with the text, your reference data is already ready to produce verifiable citations.
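For that first check, a quick tally makes the result easier to read than eyeballing the output. This sketch assumes the list-of-dicts shape that build_cited_answer returns; `citation_coverage` and its category names are hypothetical.

```python
from collections import Counter


def citation_coverage(segments):
    """Count citations that carry pages, a media_id, or neither."""
    counts = Counter()
    for seg in segments:
        if not seg["citations"]:
            counts["unsourced_segments"] += 1
        for c in seg["citations"]:
            if c.get("pages"):
                counts["with_pages"] += 1  # counted here even if media_id is also set
            elif c.get("media_id"):
                counts["with_media"] += 1
            else:
                counts["chunk_only"] += 1  # title only: plain chunk citation
    return dict(counts)


coverage = citation_coverage([
    {"text": "a", "citations": [{"title": "design-spec.pdf", "pages": [12], "media_id": None}]},
    {"text": "b", "citations": [{"title": "onboarding-flow.png", "pages": None, "media_id": "media/abc123"}]},
    {"text": "c", "citations": [{"title": "notes.txt", "pages": None, "media_id": None}]},
    {"text": "d", "citations": []},
])
```

If `chunk_only` dominates across a handful of test questions, the store is probably built mostly from short text snippets and will not gain much from page or figure citations.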