GEMINI LAB
API / SDK · 2026-06-12 · Intermediate

Letting File Search's Multimodal Mode Find Wallpapers I Couldn't: A Field Report

I tested whether File Search's new multimodal retrieval (gemini-embedding-2) could replace category tags for finding one wallpaper among thousands. A 300-image trial, the walls I hit, and where semantic search actually fits — with working code.

Tags: Gemini API, File Search, multimodal, image search, indie development

"That sunset-over-the-ocean wallpaper with the purple grading — which folder was it in?" Last week, while assembling a featured collection for one of my wallpaper apps, I spent over ten minutes hunting for that single image. The category looked right. I walked the tags. Still nothing. The reason turned out to be mundane: the image had been filed under "sky," not "ocean."

As an indie developer, I maintain several thousand wallpaper assets across my apps. For organizing them, I rely on automatic classification into 30 categories with Gemini Vision — I wrote about that setup in my earlier field report on auto-classifying wallpapers with Gemini Vision, and the accuracy has held up well. But classification means putting each image into exactly one box. A nuance that spans boxes — "sunset" and "ocean" and "purple" — slips right through the structure.

Then came the news that File Search now supports native image embedding and retrieval via gemini-embedding-2. If a natural-language query can search the images themselves, maybe this cross-box treasure hunt finally goes away. Here is what I learned from a 300-image trial.

Classification wasn't broken — so why did I want search?

To be honest, category-based browsing works fine for end users. The pain was entirely on the operations side.

Curating featured pages, picking source material for store screenshots, swapping seasonal campaigns — for these tasks I search by impression, not by category. "A quiet, cool-toned morning." "A city night view with an open line of sight." A vocabulary like that doesn't compress into 30 categories, so I would end up scrolling thumbnail grids by eye. Five to fifteen minutes per hunt, several times a month.

Adding more tags was never a real option. The finer a tag taxonomy gets, the less consistently it gets applied, and the maintenance cost quietly devours whatever the search gains. That's not theory — it's what managing thousands of images has taught me. What I wanted was search without designing a taxonomy at all.

Getting 300 images into a store and searchable

I used the Python SDK (google-genai). There are only three moves: create a store, import images, query.

First, creating the store and uploading. One rule I now treat as non-negotiable: always attach an asset ID in custom_metadata so results can be joined back to your own asset database.

import pathlib
from google import genai
 
client = genai.Client()  # reads GEMINI_API_KEY from the environment
 
# Create a trial File Search store
store = client.file_search_stores.create(
    config={"display_name": "wallpaper-assets-trial"}
)
 
# Import the 300 images from the June 2026 intake batch
for path in sorted(pathlib.Path("./assets/2026-06").glob("*.jpg")):
    client.file_search_stores.upload_to_file_search_store(
        file_search_store_name=store.name,
        file=str(path),
        config={
            "display_name": path.stem,
            "custom_metadata": [
                {"key": "asset_id", "string_value": path.stem},
            ],
        },
    )

Why this matters: search results come back as retrieval chunks, and without an ID in the metadata you have no reliable way to map a hit back to the file in your own records. I imported my first few dozen images without IDs and had to redo them.

Querying is just a regular generate_content call with File Search passed as a tool.

from google.genai import types
 
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Find wallpapers of a sunset over the ocean with a purple-leaning tone",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[store.name]
                )
            )
        ]
    ),
)
 
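# For an internal tool, the grounding chunks matter more than the prose
meta = response.candidates[0].grounding_metadata
# grounding_metadata and grounding_chunks can both be None if nothing was retrieved
chunks = (meta.grounding_chunks or []) if meta else []
for chunk in chunks:
    print(chunk.retrieved_context.title)  # join against custom_metadata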

When you're building an operations tool, what you actually want is the list of matching images, not the model's narrative answer. Treating grounding_metadata as the primary output and the response text as garnish made the whole implementation much more straightforward.
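
As a rough sketch of treating the grounding chunks as the primary output, the helper below pulls the titles out of a response-shaped object and joins them against a local asset table. The names `titles_from_response`, `join_hits`, and `ASSETS` are hypothetical, and the sketch assumes each chunk's `retrieved_context.title` carries the `display_name` set at upload (which the upload code above sets to the asset ID). It runs offline against a stand-in object rather than a real API response.

```python
from types import SimpleNamespace

# Hypothetical local asset table keyed by asset_id (stand-in for a real database).
ASSETS = {
    "sunset_0042": {"category": "sky", "path": "assets/2026-06/sunset_0042.jpg"},
    "harbor_0007": {"category": "city", "path": "assets/2026-06/harbor_0007.jpg"},
}


def titles_from_response(response):
    """Return chunk titles in the order returned, deduplicated, tolerating missing metadata."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return []
    meta = getattr(candidates[0], "grounding_metadata", None)
    chunks = getattr(meta, "grounding_chunks", None) or []
    seen, titles = set(), []
    for chunk in chunks:
        ctx = getattr(chunk, "retrieved_context", None)
        title = getattr(ctx, "title", None)
        if title and title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


def join_hits(titles, assets):
    """Split titles into known asset records and titles with no matching record."""
    matched = [(t, assets[t]) for t in titles if t in assets]
    unmatched = [t for t in titles if t not in assets]
    return matched, unmatched


def _chunk(title):
    # Offline stand-in shaped like one grounding chunk.
    return SimpleNamespace(retrieved_context=SimpleNamespace(title=title))


fake_response = SimpleNamespace(candidates=[SimpleNamespace(
    grounding_metadata=SimpleNamespace(
        grounding_chunks=[_chunk("sunset_0042"), _chunk("sunset_0042"), _chunk("ghost_9999")]
    )
)])

matched, unmatched = join_hits(titles_from_response(fake_response), ASSETS)
print(matched)
print(unmatched)
```

Keeping the unmatched titles visible is a deliberate choice in this sketch: a hit that cannot be joined usually points at an upload that went in without a usable ID.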

Three walls I hit

First, indexing lag. An image is not searchable the moment you upload it. In my 300-image trial, full availability took a few minutes — seven at the longest. That's fine for a batch-ingest-tonight, use-tomorrow workflow, but it breaks the manual habit of "upload, then immediately check." I ended up appending a wait step to my ingest script that fires a representative query and confirms the new batch is retrievable before exiting.
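
The wait step can be sketched roughly as a polling loop. This is a minimal illustration, not my actual ingest script: `wait_until_retrievable` is a hypothetical helper, and `search` stands for any callable that runs the representative query (for example, a wrapper around the `generate_content` call) and returns the asset IDs it found. The 600-second default timeout is an assumption chosen to leave margin over the seven-minute worst case.

```python
import time


def wait_until_retrievable(search, query, expected_ids, timeout_s=600.0,
                           interval_s=15.0, sleep=time.sleep, clock=time.monotonic):
    """Poll `search(query)` until every expected ID appears, or the timeout passes.

    `search` is any callable returning an iterable of asset IDs for a query.
    Returns True when all expected IDs were seen, False on timeout.
    """
    expected = set(expected_ids)
    deadline = clock() + timeout_s
    while True:
        found = set(search(query))
        if expected <= found:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Offline demonstration: a fake index that only returns the asset on the third poll.
calls = {"n": 0}


def fake_search(query):
    calls["n"] += 1
    return ["sunset_0042"] if calls["n"] >= 3 else []


ready = wait_until_retrievable(
    fake_search, "purple sunset over the ocean", ["sunset_0042"],
    timeout_s=60, interval_s=0, sleep=lambda s: None,
)
print(ready, calls["n"])
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. In an ingest script, a `False` return would be the signal to exit non-zero instead of silently finishing.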

Second, mixing semantic and exact conditions. Embedding search is great at "purple-ish sunset over the ocean," but hard constraints like "aspect ratio 19.5:9 or taller" or "ingested before 2024" are simply not its job. Rather than force it, I settled on a two-stage design: File Search proposes semantic candidates, and my own asset database filters by resolution and intake date. Metadata filters exist on the File Search side too, but for numeric range filtering I trust my own database more.
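
The two-stage design might look like the sketch below, using an in-memory SQLite table as a stand-in for the asset database. The table layout, column names, sample rows, and the `filter_candidates` helper are all hypothetical. Stage one (the File Search call) is replaced by a hard-coded candidate list so the example runs offline.

```python
import sqlite3

# Hypothetical asset table; column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets ("
    "asset_id TEXT PRIMARY KEY, width INTEGER, height INTEGER, ingested_on TEXT)"
)
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?)",
    [
        ("sunset_0042", 1290, 2796, "2023-11-02"),  # roughly 19.5:9, ingested in 2023
        ("harbor_0007", 1080, 1920, "2023-05-18"),  # 16:9, not tall enough
        ("dusk_0101", 1290, 2796, "2025-03-09"),    # tall enough, but ingested too late
    ],
)


def filter_candidates(conn, candidate_ids, min_ratio=19.5 / 9, before="2024-01-01"):
    """Keep semantic candidates that also pass the exact constraints, preserving input order."""
    if not candidate_ids:
        return []
    placeholders = ",".join("?" for _ in candidate_ids)
    rows = conn.execute(
        f"SELECT asset_id FROM assets "
        f"WHERE asset_id IN ({placeholders}) "
        f"AND CAST(height AS REAL) / width >= ? AND ingested_on < ?",
        [*candidate_ids, min_ratio, before],
    ).fetchall()
    passing = {row[0] for row in rows}
    return [asset_id for asset_id in candidate_ids if asset_id in passing]


# Stage 1 would come from File Search; here it is a hard-coded candidate list.
semantic_candidates = ["dusk_0101", "sunset_0042", "harbor_0007", "ghost_9999"]
print(filter_candidates(conn, semantic_candidates))
```

Filtering after retrieval keeps the semantic ordering intact while the numeric and date comparisons stay in a system built for them. Dates are stored as ISO strings here so that plain string comparison orders them correctly.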

Third, the cost model — which I actually came to like. File Search doesn't charge for query-time embeddings; you pay for embedding work at indexing time. In other words, you pay once at ingest, and searching afterward stays cheap. For 300 images the indexing cost was pocket change, but before committing thousands of assets I had to ask: do dormant images that nobody will ever search for belong in the store? My answer is no — active assets go in, retired ones get deleted from the store.
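
One way to keep the store limited to active assets is to compute a sync plan before each ingest. The sketch below only works out which IDs to upload and which to delete. `plan_store_sync` and the sample IDs are hypothetical, and the actual upload and delete calls are left out on purpose.

```python
def plan_store_sync(active_ids, store_ids):
    """Return (to_upload, to_delete) so the store ends up holding only active assets."""
    active, stored = set(active_ids), set(store_ids)
    to_upload = sorted(active - stored)   # active but not yet indexed
    to_delete = sorted(stored - active)   # indexed but retired
    return to_upload, to_delete


# Made-up IDs: what the asset database marks active vs. what the store holds.
active_ids = ["sunset_0042", "harbor_0007", "dusk_0101"]
store_ids = ["sunset_0042", "retired_0001"]

to_upload, to_delete = plan_store_sync(active_ids, store_ids)
print(to_upload)
print(to_delete)
```

Because indexing is where the cost lands, the `to_upload` list is also the number worth watching: it is the only part of the plan that spends money.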

I'm keeping classification — the two solve different problems

Before the trial, part of me suspected semantic search might retire the category pipeline altogether. After running 20 test queries — 17 of which surfaced the intended image near the top — my conclusion is to keep both. They occupy different roles.

  • Where category classification wins: end-user browsing. The stability of the "boxes" is itself the value; in a screen people open daily, predictability builds trust
  • Where multimodal search wins: operations-side hunting. One-off queries whose vocabulary changes every time. Reducing taxonomy maintenance to zero is the real payoff
  • Where they meet: recurring search vocabulary ("sunset," "cool tones") becomes evidence for the next category redesign, or for the vocabulary of an in-app search feature
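
For anyone repeating the 20-query check, the scoring can be sketched as a hit rate at k: the share of queries whose intended image appears within the first k results. The `hit_rate_at_k` helper, the cutoff `k=3` as a reading of "near the top," and the sample queries and IDs are all assumptions for illustration, not my trial data. By the same measure, 17 of 20 works out to 0.85.

```python
def hit_rate_at_k(results, expected, k=3):
    """Count queries whose intended asset appears within the first k results.

    `results` maps query -> ordered list of returned asset IDs.
    `expected` maps query -> the single asset ID the query was meant to find.
    Returns (hits, total, rate).
    """
    if not expected:
        return 0, 0, 0.0
    hits = sum(
        1 for query, want in expected.items()
        if want in results.get(query, [])[:k]
    )
    return hits, len(expected), hits / len(expected)


# Made-up miniature example; queries and IDs are placeholders.
expected = {
    "purple sunset over the ocean": "sunset_0042",
    "quiet cool-toned morning": "mist_0310",
    "city night view with an open line of sight": "harbor_0007",
    "snowy forest path": "pine_0088",
}
results = {
    "purple sunset over the ocean": ["sunset_0042", "dusk_0101"],
    "quiet cool-toned morning": ["lake_0020", "mist_0310", "fog_0005"],
    "city night view with an open line of sight": [
        "tower_0001", "alley_0002", "neon_0003", "harbor_0007",
    ],
    "snowy forest path": ["pine_0088"],
}

hits, total, rate = hit_rate_at_k(results, expected, k=3)
print(f"{hits}/{total} = {rate:.2f}")
```

Fixing k before running the queries matters more than the exact value, since choosing it afterward makes it easy to flatter the result.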

And that purple sunset from the opening? File Search returned it as the top hit on my very first query. Ten minutes of squinting at thumbnails collapsed into seconds, and I felt a quiet surge of warmth watching it happen.

Next steps

I'm not importing the full archive yet. The plan is to pipe only new intake batches into the store, log how many minutes of real work the search replaces each month, and decide from the numbers. Semantic search feels convenient in a way that's easy to overtrust — without measurements it risks becoming a standing cost with an unverified benefit.

If you maintain image assets that tags never quite capture, I'd suggest starting with a small store of a few hundred images. Once you account for indexing lag and metadata design, the whole trial fits in half a day. I hope this record is useful to anyone wrestling with the same problem.
