Generate Japanese and English in One Structured Call to Stop Term Drift
Generating Japanese and English versions separately makes terminology drift article by article. Pair both languages in one Gemini 3.5 Flash structured-output call, pin a glossary, and detect drift mechanically — with measured results.
Back when I generated the Japanese and English versions of an article in two separate calls, proofreading was the part of my day that kept getting eaten. I run paired bilingual posts at Dolice as an indie developer, and the body would be correct, yet "streaming" would come out as streaming in one version and incremental delivery in the other. To a reader these are tiny snags, but when terminology wobbles from article to article, the quiet trust a site earns gets chipped away.
Most of that drift came from one thing: I was producing Japanese and English in two independent calls. The model picks "the most natural word right here" on each call, so when the calls are split, the wording splits too. In my own workflow I switched to putting both languages into a single structured output and fixing the glossary before generation. This article is the design, and a record of what actually changed after the switch.
Where the translation drifts when you generate separately
It helps to locate the drift first. When I generated the two languages independently, three spots wobbled most:
Product and feature terms (e.g. rendering "思考" as both thinking and reasoning)
UI labels and button text ("保存" as Save / Store / Keep)
Heading granularity (Japanese gets 8 sections, English silently compresses to 6)
The third one is the nastiest. When the meaning matches but the structure does not, switching languages makes the reader feel they cannot return to the same article. My site pairs URLs across languages, so a structural mismatch becomes a step the reader trips over.
The root cause is that the two calls do not know each other's results. So the clean fix is to put both into one output and let the model decide them together.
Decide two things before writing the schema
Before any schema, pin down two things. Skip them and drift survives even with structured output.
First, a glossary: the terms you refuse to let wobble, held as Japanese, English, and a short usage note. Second, an output contract: the fields that must match across languages (section count, number of code blocks, and so on).
Keep the glossary in an external file and load it at generation time. A minimal shape:
{
  "glossary": [
    { "ja": "思考", "en": "thinking", "note": "Gemini's thinking feature; do not split into reasoning" },
    { "ja": "構造化出力", "en": "structured output", "note": "umbrella term when using responseSchema" },
    { "ja": "用語集", "en": "glossary", "note": "the topic here; do not translate as vocabulary" }
  ],
  "contract": { "min_sections": 6, "lang_pair": ["ja", "en"] }
}
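Loading that file at generation time could look like the sketch below. The helper name `load_glossary` and the idea of failing fast on incomplete entries are assumptions for illustration, not part of the original pipeline; the point is that a broken glossary should stop the run before any tokens are spent.

```python
import json
from pathlib import Path


def load_glossary(path: str) -> tuple[list[dict], dict]:
    """Load the glossary file and return (glossary entries, contract)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    glossary = data.get("glossary", [])
    for i, entry in enumerate(glossary):
        # Every entry needs both sides of the pair; "note" is optional.
        for key in ("ja", "en"):
            if not entry.get(key):
                raise ValueError(f"glossary entry {i} is missing '{key}'")
    return glossary, data.get("contract", {})
```

The returned entries can then feed the system instruction, and the contract can feed whatever structural checks run after generation.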
Once "what should match" is written down, the rest is just schema and verification.
The schema is the heart of the switch. Before, I called twice, once for Japanese and once for English. After, one responseSchema holds both languages side by side and returns them in a single call.
Before (two independent calls):
from google import genai

client = genai.Client()

def gen_one(lang: str, topic: str) -> str:
    prompt = f"Write an article in {lang} on: {topic}"
    res = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=prompt,
    )
    return res.text

ja = gen_one("Japanese", "Gemini structured output")
en = gen_one("English", "Gemini structured output")
# ja and en never see each other's wording, so the translation drifts
After (the pair in one call):
from google import genai
from pydantic import BaseModel

class Section(BaseModel):
    heading_ja: str
    heading_en: str
    body_ja: str
    body_en: str

class Article(BaseModel):
    title_ja: str
    title_en: str
    sections: list[Section]

client = genai.Client()

def gen_pair(topic_ja: str, topic_en: str, glossary_text: str) -> Article:
    prompt = (
        "Write the Japanese and English versions of the same topic at once.\n"
        "Pair heading and body across languages in each section.\n"
        f"Japanese topic: {topic_ja}\n"
        f"English topic: {topic_en}\n"
    )
    res = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=prompt,
        config={
            "system_instruction": glossary_text,
            "response_mime_type": "application/json",
            "response_schema": Article,
        },
    )
    return Article.model_validate_json(res.text)
The key is that Section keeps both languages at the same level. Because the model writes each section's Japanese and English together, heading granularity and word choice get decided inside one train of thought. The structural mismatch from earlier essentially disappeared just from this shape.
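To confirm the shape really holds, the output contract can be checked right after parsing. This is a minimal sketch, assuming the parsed article has been dumped to a plain dict (for example with `Article.model_dump()`), that `min_sections` comes from the glossary file's contract, and that bodies are Markdown so code blocks can be counted by their fence markers. The name `check_contract` is hypothetical.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example fence-safe


def check_contract(article: dict, min_sections: int) -> list[str]:
    """Return contract violations for a paired article held as a plain dict."""
    issues = []
    sections = article.get("sections", [])
    if len(sections) < min_sections:
        issues.append(f"only {len(sections)} sections, contract requires {min_sections}")
    for i, sec in enumerate(sections):
        # Both languages must be present for every paired field.
        for field in ("heading_ja", "heading_en", "body_ja", "body_en"):
            if not sec.get(field, "").strip():
                issues.append(f"section {i}: empty {field}")
        # Each side should carry the same number of fenced code blocks.
        ja_blocks = sec.get("body_ja", "").count(FENCE) // 2
        en_blocks = sec.get("body_en", "").count(FENCE) // 2
        if ja_blocks != en_blocks:
            issues.append(f"section {i}: {ja_blocks} code blocks (ja) vs {en_blocks} (en)")
    return issues
```

An empty list means the pair satisfies the contract; anything else can be handled the same way as other verification issues.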
Pin the glossary in the system instruction
The schema alone does not lock the wording. Pass the glossary as a system instruction every time and state plainly: "do not deviate from these pairs." It goes on the system side rather than in the prompt body because I want to swap the body freely while the term agreement stays put.
def build_glossary_instruction(glossary: list[dict]) -> str:
    lines = [
        "You are a bilingual (Japanese/English) technical writer.",
        "Honor the following pairs strictly and never let them wobble.",
    ]
    for g in glossary:
        note = f" ({g['note']})" if g.get("note") else ""
        lines.append(f'- "{g["ja"]}" = "{g["en"]}"{note}')
    lines.append("For terms not in the list, show the original on first mention.")
    return "\n".join(lines)
Passing this instruction into gen_pair is enough to fix the terms you care about. In my experience this mattered most. The schema handles "structural agreement"; the glossary pin handles "lexical agreement." Only with both does the reader's snag go away.
Detect drift mechanically
The model honors most of it, but not 100%. Before shipping, add a thin verification that checks the output against the glossary. Two checks were enough.
The first is term presence: for any section whose Japanese body contains a glossary ja term, confirm the matching English body contains the en term. The second is length ratio: any section whose English-to-Japanese character-length ratio falls far outside a band is suspected of compressing one side.
def check_terms(article: "Article", glossary: list[dict]) -> list[str]:
    issues = []
    for i, sec in enumerate(article.sections):
        for g in glossary:
            ja_hit = g["ja"] in sec.body_ja
            en_hit = g["en"].lower() in sec.body_en.lower()
            if ja_hit and not en_hit:
                issues.append(f'section {i}: has "{g["ja"]}" / missing "{g["en"]}"')
        if sec.body_ja and sec.body_en:
            ratio = len(sec.body_en) / max(len(sec.body_ja), 1)
            if ratio < 0.6 or ratio > 3.0:
                issues.append(f"section {i}: length ratio {ratio:.2f} out of band")
    return issues
This verification spends no tokens; it runs instantly and locally. The pitfall is the threshold. English tends to run longer in character count, so with the ratio computed as English over Japanese, a too-strict upper bound flags healthy sections. I run a 0.6 lower and 3.0 upper bound, and false positives nearly vanished.
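One way to pick the band instead of guessing is to measure the ratio on sections that have already been proofread and accepted. The sketch below works under that assumption; `suggest_band` and its `margin` parameter are hypothetical helpers, and the sample pairs are placeholders, not measurements.

```python
def length_ratio(body_ja: str, body_en: str) -> float:
    """English-over-Japanese character ratio, the same direction the check uses."""
    return len(body_en) / max(len(body_ja), 1)


def suggest_band(pairs: list[tuple[str, str]], margin: float = 0.25) -> tuple[float, float]:
    """Widen the observed min/max ratio of known-good pairs by a safety margin."""
    ratios = [length_ratio(ja, en) for ja, en in pairs if ja and en]
    if not ratios:
        raise ValueError("need at least one non-empty (ja, en) pair")
    return min(ratios) * (1 - margin), max(ratios) * (1 + margin)


# Placeholder pairs standing in for proofread sections.
known_good = [
    ("構造化出力を使う", "Use structured output"),
    ("用語集を固定する", "Pin the glossary"),
]
low, high = suggest_band(known_good)
print(f"suggested band: {low:.2f} to {high:.2f}")
```

Very short strings give noisy ratios, so in practice the sample should be full section bodies rather than one-line snippets like these.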
Fallback: partial regeneration
When verification flags something, rebuilding the whole article is wasteful. Regenerate only the offending section, with the glossary and the surrounding context attached.
| Detected problem | Fallback action |
| --- | --- |
| One-sided term miss | Regenerate only that section with the glossary emphasized |
| Length-ratio anomaly | Rewrite only the shorter language, passing its counterpart body |
| JSON parse failure | Retry once on the same prompt; if it still breaks, fall back to two calls |
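The term-miss and length-ratio fallbacks can be sketched as a loop that rebuilds only the flagged sections. This assumes the issue strings start with `section N:` as the verification step produces them; `regenerate_section` is a hypothetical callable that would wrap a single-section Gemini call with the glossary and the neighboring sections attached.

```python
import re
from typing import Callable

SECTION_RE = re.compile(r"^section (\d+):")


def flagged_sections(issues: list[str]) -> list[int]:
    """Extract the distinct section indexes named in verification issues."""
    found = set()
    for issue in issues:
        m = SECTION_RE.match(issue)
        if m:
            found.add(int(m.group(1)))
    return sorted(found)


def repair(sections: list[dict], issues: list[str],
           regenerate_section: Callable[[int, list[dict], list[str]], dict]) -> list[dict]:
    """Return a copy of sections with only the flagged ones regenerated."""
    repaired = list(sections)
    for idx in flagged_sections(issues):
        # Pass the issues for this section so the call can emphasize what went wrong.
        own = [i for i in issues if i.startswith(f"section {idx}:")]
        repaired[idx] = regenerate_section(idx, sections, own)
    return repaired
```

Untouched sections are carried over as-is, so a second verification pass only needs to look at the indexes that were regenerated.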
JSON parse failures happen occasionally — that comes with structured output. In that case I retry once under the same conditions, and if it still collapses, I temporarily fall back to the old two-call method so the day's publishing does not stop. In production, "do not stop" comes first, and quality is held by the verification side.
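That retry-then-fall-back order can be sketched with the generation paths injected as plain callables. `gen_pair_call` and `gen_two_calls` are hypothetical stand-ins for the one-call and two-call paths, and the sketch assumes a parse failure surfaces as a `ValueError` (both `json.JSONDecodeError` and pydantic's `ValidationError` are subclasses of it).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def generate_with_fallback(gen_pair_call: Callable[[], T],
                           gen_two_calls: Callable[[], T],
                           retries: int = 1) -> tuple[T, str]:
    """Try the paired call, retry on parse failure, then fall back to two calls."""
    for attempt in range(1 + retries):
        try:
            return gen_pair_call(), f"paired (attempt {attempt + 1})"
        except ValueError:
            # Parse or validation failure: retry under the same conditions.
            continue
    # Still failing: keep publishing via the old two-call path.
    return gen_two_calls(), "two-call fallback"
```

Returning the path taken alongside the result makes it easy to log how often the fallback fires, which is a useful signal that the schema or prompt needs attention.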
Measured: tokens, latency, cost
Here is what actually changed, pulled from my own generation logs. The comparison uses Gemini 3.5 Flash on articles of roughly 6–8 sections per language. The numbers are averages from my workflow and move with the topic.
| Metric | Two-call method | One-call (paired) method |
| --- | --- | --- |
| API calls | 2 | 1 |
| Total input tokens | baseline | about 0.7x (no duplicated glossary) |
| Perceived latency | two serial waits | one wait (about 0.55x) |
| Drift issues detected | baseline | about 45% fewer |
| Proofreading time | baseline | less than half |
Input tokens drop because the two-call method sent the glossary and shared premises twice. One call sends the shared part once. Latency roughly halves because two serial waits become one. On cost, the input-token savings lower the per-generation price — and that compounds the more articles you publish.
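The input-token arithmetic can be made explicit. This sketch assumes each call's input splits into a shared part (glossary plus common premises) and a per-language part, and that the paired call still carries both per-language parts. The token counts are illustrative placeholders, not figures from the logs.

```python
def input_token_ratio(shared: int, per_language: int) -> float:
    """Input tokens of one paired call relative to two separate calls."""
    two_calls = 2 * (shared + per_language)   # shared part is sent twice
    one_call = shared + 2 * per_language      # shared part is sent once
    return one_call / two_calls


# Illustrative numbers only: a 600-token shared part and 400 tokens per language.
print(round(input_token_ratio(600, 400), 2))  # 0.7
```

Under this model the ratio approaches 0.5 as the shared part grows and stays at 1.0 when nothing is shared, so the saving depends on how heavy the glossary and common premises are.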
When to stop using one call
Finally, this method is not always optimal, so here is when I revert to two calls:
When an article is very long and pairing both languages hits the output-token ceiling
When I deliberately want different structures per language (swapping examples for an English audience)
When a workflow swaps only one language often, making it wasteful to regenerate the other every time
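The three conditions can be folded into a small pre-flight check. This is a sketch only: `choose_strategy`, its parameters, and the `headroom` factor are hypothetical, and the output-token ceiling has to come from the documented limits of whichever model is in use.

```python
def choose_strategy(est_tokens_ja: int, est_tokens_en: int, output_ceiling: int,
                    divergent_structure: bool = False,
                    single_language_edit: bool = False,
                    headroom: float = 0.8) -> str:
    """Return "paired" or "two-call" based on the three revert conditions."""
    # 1. Both languages must fit in one response, with some headroom for JSON overhead.
    if est_tokens_ja + est_tokens_en > output_ceiling * headroom:
        return "two-call"
    # 2. Deliberately different structures per language defeat the paired schema.
    if divergent_structure:
        return "two-call"
    # 3. Frequent one-language edits make regenerating the pair wasteful.
    if single_language_edit:
        return "two-call"
    return "paired"
```

For example, `choose_strategy(3000, 2500, 8000)` stays on the paired path, while raising the estimates to 4000 and 3000 tokens crosses the 80% headroom line and switches to two calls.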
Conversely, for short-to-mid articles where translation consistency is the core of quality, I recommend one call plus a pinned glossary. I made it the default for Dolice's paired bilingual posts. The deciding axis is whether you want the model to settle the translation consistency simultaneously, or to optimize each language independently. Answer that first, and the choice gets easier.
Translation drift is small one at a time, but it quietly erodes a reader's trust as it stacks up. Pair the structure in the schema, fix the vocabulary with a glossary, and close the escape hatches with verification. With those three in place, multilingual operation gets a lot lighter. I hope this helps anyone running Japanese and English side by side.