API / SDK · 2026-06-27 · Advanced

Your Gemini Structured Output Keys Keep Reordering — Pin Them With propertyOrdering

You constrained the shape with responseSchema, yet the JSON key order shifts between calls and your snapshot tests go red for no reason. Here is why field order is not guaranteed by default, how propertyOrdering fixes it, how Pydantic sets it for you, and how to align few-shot examples — all with working code.

Tags: Gemini API, Structured Output, propertyOrdering, Snapshot Testing, JSON Schema


When you automate content operations across several sites, you end up asking Gemini for structured metadata constantly: give me the title, category, tags, and summary as JSON, then push that straight into the next stage. I felt safe because responseSchema constrained the shape. Then one day my snapshot tests started going red with no explanation.

Opening the diff was almost anticlimactic. The values were identical. The only thing that had changed was the order of the keys. An output that was {"title": ..., "tags": ...} came back the next run as {"tags": ..., "title": ...}. Right types, right values, and yet a text comparison reports a mismatch.

This article walks through that behavior — structured output does not guarantee field order by default — and how propertyOrdering pins it, including the trap that bites you once you add few-shot examples. It follows the same order in which I actually hit each issue as an indie developer maintaining pipelines across my own Dolice Labs sites.

Why the order moves — JSON schema being honest

The root of the confusion is in JSON itself: object keys carry no notion of order. The properties field of the Schema object (which descends from OpenAPI) is conceptually an unordered map, so the order in which the model writes fields is decided at generation time. Even with temperature at 0 and a byte-identical prompt, the arrangement can still differ from one call to the next.

Google's structured output guidance says as much: if you depend on order, specify it explicitly. So there are two separate guarantees here — the type can be fixed, but the ordering has to be fixed separately. Miss that distinction and you lose time on the assumption that "I constrained the type, so the output must be fully determined," which is exactly the hole I fell into.

Here is the typical code that hands Gemini a raw dict schema.

# pip install google-genai
from google import genai
from google.genai import types
 
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")
 
# A dict schema with no ordering specified
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "category": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "summary": {"type": "string"},
    },
    "required": ["title", "category", "tags", "summary"],
}
 
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Extract the metadata for this article: ...",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=schema,
    ),
)
print(resp.text)
# Sometimes {"title":..., "category":..., "tags":[...], "summary":...}
# Other times {"summary":..., "title":..., "tags":[...], "category":...}

Even though response_schema is set, the raw key order of resp.text wobbles between calls. If you only json.loads it and read values, nothing is wrong. The trouble appears only when you save and compare the generated JSON text itself. For me that was snapshot tests, plus a workflow that committed generated results to git as diffs.
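To make the difference between the two kinds of comparison concrete, here is a minimal sketch. The two strings stand in for raw responses from two runs; they are made-up values, not real API output.

```python
import json

# Two hypothetical raw responses: same values, different key order.
run_1 = '{"title": "A", "tags": ["x", "y"]}'
run_2 = '{"tags": ["x", "y"], "title": "A"}'

# A text comparison (what a snapshot file or git diff does) sees a change.
text_equal = run_1 == run_2

# A parsed comparison does not, because dict equality ignores key order.
parsed_equal = json.loads(run_1) == json.loads(run_2)

print("text equal:  ", text_equal)    # False
print("parsed equal:", parsed_equal)  # True
```

If your pipeline only ever does the second kind of comparison, key order is harmless. The rest of this article is about the first kind.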

Pin the order with propertyOrdering

The fix is simple: add propertyOrdering (camelCase in REST/JSON) to the Schema and list the property names in the order you want them emitted.

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "category": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "summary": {"type": "string"},
    },
    # Add just this. The output stabilizes in this order.
    "propertyOrdering": ["title", "category", "tags", "summary"],
    "required": ["title", "category", "tags", "summary"],
}
 
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Extract the metadata for this article: ...",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=schema,
    ),
)
# Output is always title -> category -> tags -> summary

propertyOrdering is an ordering hint, not a type constraint. Because validation still passes without it, it is easy to forget when you hand-write dict schemas — that has been my lived experience. If you have nested objects, each child Schema needs its own propertyOrdering too. Write it only at the top and the nested fields keep shuffling, and the diffs come back.
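One way to avoid forgetting the nested levels is to fill propertyOrdering in mechanically. The sketch below assumes a plain dict schema and takes each level's order from the insertion order of its properties dict. The helper name with_property_ordering is hypothetical and not part of any SDK.

```python
import copy

def with_property_ordering(schema: dict) -> dict:
    """Return a copy in which every object schema carries propertyOrdering.

    Hypothetical helper: the order comes from the insertion order of each
    "properties" dict, and an existing propertyOrdering is left untouched.
    """
    out = copy.deepcopy(schema)
    _fill(out)
    return out

def _fill(node) -> None:
    if not isinstance(node, dict):
        return
    props = node.get("properties")
    if isinstance(props, dict):
        node.setdefault("propertyOrdering", list(props))
        for child in props.values():
            _fill(child)  # nested objects
    _fill(node.get("items"))  # arrays of objects

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "url": {"type": "string"},
            },
        },
    },
}

pinned = with_property_ordering(schema)
print(pinned["propertyOrdering"])
print(pinned["properties"]["author"]["propertyOrdering"])
```

The original schema is not modified, so you can keep the hand-written version readable and apply the helper just before sending.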

Pass a Pydantic model and declaration order becomes the order

Raw dicts are flexible, but hand-written propertyOrdering slips through the cracks. What I have settled on is passing a Pydantic model as response_schema to the google-genai SDK. In that case the SDK converts the model's field declaration order straight into propertyOrdering. You no longer write the order anywhere — "the order in your code" becomes "the order in the output."

from pydantic import BaseModel
from google import genai
from google.genai import types
 
class ArticleMeta(BaseModel):
    title: str
    category: str
    tags: list[str]
    summary: str
 
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")
 
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Extract the metadata for this article: ...",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=ArticleMeta,  # declaration order becomes propertyOrdering
    ),
)
 
meta = resp.parsed   # comes back as an ArticleMeta instance
print(meta.title, meta.tags)

It is safer to verify the order really is pinned rather than assume it. When I migrated, I ran a small check that sent the same input 30 times and collected every distinct raw key order.

import json
 
def key_order(raw: str) -> list[str]:
    # Top-level keys, in the order they appear in the raw text
    return list(json.loads(raw).keys())
 
orders = set()
for _ in range(30):
    r = client.models.generate_content(
        model="gemini-2.5-flash",
        contents="Extract the metadata for this article: ...",
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=ArticleMeta,
        ),
    )
    orders.add(tuple(key_order(r.text)))
 
assert len(orders) == 1, f"Order split into {len(orders)} variants: {orders}"
print("OK, key order pinned to one:", orders.pop())

Since Python 3.7, dicts are guaranteed to preserve insertion order, and json.loads fills the dict in the order the keys appear in the text, so keys() lets you observe the on-the-wire ordering. Thirty calls with a single element in orders means it is pinned. On my machine, an order that split into 4-6 variants across 30 calls without propertyOrdering collapsed to exactly one after I set it.
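If you already have raw responses saved from earlier runs, the same measurement can be done offline without spending API calls. This is a small sketch; the saved list holds invented strings and tally_key_orders is a hypothetical name.

```python
import json
from collections import Counter

def tally_key_orders(raw_responses: list[str]) -> Counter:
    """Count how often each top-level key order appears in raw JSON texts."""
    return Counter(tuple(json.loads(raw).keys()) for raw in raw_responses)

# Hypothetical saved outputs from three earlier runs.
saved = [
    '{"title": "A", "category": "c", "tags": [], "summary": "s"}',
    '{"summary": "s", "title": "A", "tags": [], "category": "c"}',
    '{"title": "B", "category": "c", "tags": [], "summary": "t"}',
]

counts = tally_key_orders(saved)
for order, n in counts.most_common():
    print(n, order)
```

A Counter rather than a set also tells you how dominant each variant is, which helps when deciding whether a rare ordering is worth chasing.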

Aligning order when you use few-shot

This was the trap I overlooked the longest. When you feed few-shot examples as JSON inside the prompt, matching the field order inside the examples to propertyOrdering measurably helps quality. If summary comes first in your examples while the schema pins title first, the model receives a contradictory signal about which to follow. That can degrade not just stability but the extraction accuracy itself.

The countermeasure is almost embarrassingly simple: do not hand-write the examples, generate them mechanically from the order you want to pin.

import json

ORDER = ["title", "category", "tags", "summary"]
 
def ordered_json(d: dict) -> str:
    # Rebuild in ORDER, then serialize
    ordered = {k: d[k] for k in ORDER if k in d}
    return json.dumps(ordered, ensure_ascii=False)
 
few_shot = ordered_json({
    "title": "Summarizing meeting notes with Gemini",
    "category": "gemini-api",
    "tags": ["summary", "notes"],
    "summary": "How to summarize transcribed meeting notes with Gemini.",
})
# -> always {"title":...,"category":...,"tags":[...],"summary":...} in ORDER

If your few-shot examples, your propertyOrdering, and your Pydantic declaration order all derive from a single source of truth — one ORDER list — a mismatch becomes structurally impossible. I moved this ORDER constant to the top of the module and now build both the schema and the examples from it.
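A rough sketch of what building both from one list can look like follows. FIELD_TYPES, build_schema, and build_example are hypothetical names chosen for illustration. The Pydantic side is left out here, and this version raises on a missing field rather than skipping it.

```python
import json

ORDER = ["title", "category", "tags", "summary"]

# Hypothetical per-field definitions. The key order here deliberately
# differs from ORDER to show that it does not matter.
FIELD_TYPES = {
    "summary": {"type": "string"},
    "tags": {"type": "array", "items": {"type": "string"}},
    "title": {"type": "string"},
    "category": {"type": "string"},
}

def build_schema() -> dict:
    """Build the dict schema with properties, ordering, and required from ORDER."""
    return {
        "type": "object",
        "properties": {name: FIELD_TYPES[name] for name in ORDER},
        "propertyOrdering": list(ORDER),
        "required": list(ORDER),
    }

def build_example(values: dict) -> str:
    """Serialize one few-shot example with its keys in ORDER."""
    missing = [k for k in ORDER if k not in values]
    if missing:
        raise KeyError(f"example is missing fields: {missing}")
    return json.dumps({k: values[k] for k in ORDER}, ensure_ascii=False)

schema = build_schema()
example = build_example({
    "summary": "How to summarize notes.",
    "tags": ["summary", "notes"],
    "category": "gemini-api",
    "title": "Summarizing meeting notes",
})
print(schema["propertyOrdering"])
print(example)
```

Adding or reordering a field then means editing ORDER and FIELD_TYPES only, and the schema and every example follow.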

Do not rely on order alone — normalize on compare

Even with the order pinned, as long as the comparison demands exact text equality, a future minor change can turn the tests red again. So I run a two-layer defense: pin the order on the generation side, and normalize the keys on the comparison side. When taking a snapshot, parse once, re-sort the keys, then save and compare. That way the ordering wobble never reaches the comparison at all.

def canonical(raw: str) -> str:
    # parse -> sort keys -> re-serialize. Absorbs ordering differences.
    obj = json.loads(raw)
    return json.dumps(obj, ensure_ascii=False, sort_keys=True, indent=2)
 
# snapshot comparison
assert canonical(resp.text) == canonical(saved_snapshot)

Pinning the order stabilizes the artifact a human reads; normalization stabilizes what a machine compares. The roles differ, so keeping both, rather than one, also makes it easier to isolate the cause later. In fact, had I added only this normalization before propertyOrdering, the red tests would have gone away. But the diff committed to git stayed noisy on every run, so in the end I needed both.
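It is worth knowing exactly what this normalization absorbs and what it does not. The sketch below repeats the canonical function on three invented inputs: sort_keys reorders keys at every nesting level, but it leaves array element order alone, since that is data rather than formatting.

```python
import json

def canonical(raw: str) -> str:
    # parse -> sort keys (at every nesting level) -> re-serialize
    return json.dumps(json.loads(raw), ensure_ascii=False, sort_keys=True, indent=2)

# Hypothetical raw outputs.
a = '{"title": "A", "author": {"name": "n", "url": "u"}, "tags": ["x", "y"]}'
b = '{"author": {"url": "u", "name": "n"}, "tags": ["x", "y"], "title": "A"}'
c = '{"title": "A", "author": {"name": "n", "url": "u"}, "tags": ["y", "x"]}'

# Key order differs at the top level and inside "author": absorbed.
print(canonical(a) == canonical(b))  # True

# The "tags" list is in a different order: still reported as a difference.
print(canonical(a) == canonical(c))  # False
```

If the order of tags is not meaningful in your pipeline, that would need its own normalization step; sort_keys will not do it for you.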

The table below organizes the three layers I kept conflating: type, order, and comparison.

Layer	What it guarantees	Tool
Type	Presence and type of fields	responseSchema / required
Order	Key order (visual stability)	propertyOrdering / Pydantic declaration order
Compare	Stable diff judgement	parse + sort_keys normalization

Small snags I hit during migration

A few things tripped me up when wiring this into an existing pipeline, kept short. If it saves a little time for someone on the same setup, that would make me happy.

Two things stand out. The first is missing the nesting, as above: writing propertyOrdering at the top level does not govern the order inside nested objects, so write it on each child Schema. The second was a casing mix-up in a raw dict. The REST/JSON field is propertyOrdering in camelCase, and although some SDKs accept a snake_case alias, I once wrote property_ordering by mistake and it was silently ignored. When something you set does not take effect, the fastest path is to log the actual schema you are sending and check that the field name is really there.
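Eyeballing a large schema gets tedious, so a small search helps. The sketch below assumes a raw dict schema meant to be sent with camelCase field names and reports where a given key name appears. find_keys is a hypothetical helper, and whether a snake_case key is accepted or ignored depends on the SDK and version you use.

```python
import json

def find_keys(schema, name: str, path: str = "$"):
    """Yield the path of every dict key equal to `name` in a nested schema."""
    if isinstance(schema, dict):
        for key, value in schema.items():
            here = f"{path}.{key}"
            if key == name:
                yield here
            yield from find_keys(value, name, here)
    elif isinstance(schema, list):
        for i, value in enumerate(schema):
            yield from find_keys(value, name, f"{path}[{i}]")

# Hypothetical schema with the casing mistake at the top level.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "meta": {
            "type": "object",
            "properties": {"lang": {"type": "string"}},
            "propertyOrdering": ["lang"],
        },
    },
    "property_ordering": ["title", "meta"],  # wrong casing for a raw dict
}

print(json.dumps(schema, indent=2))  # log exactly what is being sent
suspects = list(find_keys(schema, "property_ordering"))
print(suspects)
```

A property that is legitimately named property_ordering would also be reported, so treat the result as a prompt to look, not as a verdict.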

Finally, a key listed in propertyOrdering that is absent from properties — or the reverse, present in properties but missing from propertyOrdering — is its own source of accidents. On the test side I assert in one line that the two sets are equal, so any edit to the schema surfaces the drift immediately.

def assert_ordering_complete(schema: dict):
    props = set(schema["properties"].keys())
    ordered = set(schema.get("propertyOrdering", []))
    assert props == ordered, f"Ordering drift (missing/extra): {props ^ ordered}"
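The one-line assertion covers the top level only. Since nested objects need their own ordering, a recursive variant may be worth having. This is a sketch under the same dict-schema assumption, and ordering_problems is a hypothetical name.

```python
def ordering_problems(schema: dict, path: str = "$") -> list[str]:
    """Collect propertyOrdering drift at every object level of a dict schema."""
    problems = []
    props = schema.get("properties")
    if isinstance(props, dict):
        ordering = schema.get("propertyOrdering", [])
        missing = [k for k in props if k not in ordering]
        extra = [k for k in ordering if k not in props]
        if missing:
            problems.append(f"{path}: missing from propertyOrdering: {missing}")
        if extra:
            problems.append(f"{path}: not in properties: {extra}")
        if len(ordering) != len(set(ordering)):
            problems.append(f"{path}: duplicate names in propertyOrdering")
        for name, child in props.items():
            if isinstance(child, dict):
                problems += ordering_problems(child, f"{path}.{name}")
    items = schema.get("items")
    if isinstance(items, dict):
        problems += ordering_problems(items, f"{path}[]")
    return problems

# Hypothetical schema: the top level is complete, the nested object is not.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "url": {"type": "string"}},
            "propertyOrdering": ["name"],
        },
    },
    "propertyOrdering": ["title", "author"],
}

problems = ordering_problems(schema)
for p in problems:
    print(p)
```

In a test, asserting that the returned list is empty gives a failure message that names the exact level where the drift is.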

It is tempting to believe structured output is fully determined once the type is fixed, but in practice three separate concerns overlap: type, order, and comparison. Start by checking whether your own pipeline saves and compares generated JSON text. If it does, add propertyOrdering (or a Pydantic declaration order) in one place today. The quiet satisfaction of a red diff going still is modest, but it is real.

If you want to push structured output design one step deeper, the notes on switching types per input kind are a good next read: Switching Gemini structured output types per input with anyOf discriminated unions.
