API / SDK · 2026-06-19 · Advanced

Generate Japanese and English in One Structured Call to Stop Term Drift

Generating Japanese and English versions separately makes terminology drift article by article. Pair both languages in one Gemini 3.5 Flash structured-output call, pin a glossary, and detect drift mechanically — with measured results.

Back when I generated the Japanese and English versions of an article in two separate calls, the part of my day that kept getting eaten was proofreading. Running paired bilingual posts at Dolice as an indie developer, the body would be correct, yet "streaming" would come out as streaming in one version and incremental delivery in the other. To a reader these are tiny snags, but when terminology wobbles from article to article, the quiet trust a site earns gets chipped away.

Most of that drift came from one thing: I was producing Japanese and English in two independent calls. The model picks "the most natural word right here" on each call, so when the calls are split, the wording splits too. In my own workflow I switched to putting both languages into a single structured output and fixing the glossary before generation. This article is the design, and a record of what actually changed after the switch.

Where the translation drifts when you generate separately

It helps to locate the drift first. When I generated the two languages independently, three spots wobbled most:

Product and feature terms (e.g. rendering "思考" as both thinking and reasoning)
UI labels and button text ("保存" as Save / Store / Keep)
Heading granularity (Japanese gets 8 sections, English silently compresses to 6)

The third one is the nastiest. When the meaning matches but the structure does not, a reader who switches languages can no longer find their place in the same article. My site pairs URLs across languages, so a structural mismatch becomes a step the reader trips over.

The root cause is that the two calls do not know each other's results. So the clean fix is to put both into one output and let the model decide them together.

Decide two things before writing the schema

Before any schema, pin down two things. Skip them and drift survives even with structured output.

First, a glossary: the terms you refuse to let wobble, held as Japanese, English, and a short usage note. Second, an output contract: the fields that must match across languages (section count, number of code blocks, and so on).

Keep the glossary in an external file and load it at generation time. A minimal shape:

{
  "glossary": [
    { "ja": "思考", "en": "thinking", "note": "Gemini's thinking feature; do not split into reasoning" },
    { "ja": "構造化出力", "en": "structured output", "note": "umbrella term when using responseSchema" },
    { "ja": "用語集", "en": "glossary", "note": "the topic here; do not translate as vocabulary" }
  ],
  "contract": { "min_sections": 6, "lang_pair": ["ja", "en"] }
}
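
Loading that file at generation time can be sketched as follows. This is a minimal sketch, assuming the file uses exactly the shape above; the helper name `load_glossary` and its validation rules are illustrative assumptions, not part of any library.

```python
import json
from pathlib import Path


def load_glossary(path: str) -> tuple[list[dict], dict]:
    """Load the glossary file and return (glossary entries, contract).

    Hypothetical helper; assumes the JSON shape shown above.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    glossary = data.get("glossary", [])
    contract = data.get("contract", {})

    seen_ja = set()
    for i, entry in enumerate(glossary):
        # Every entry needs both sides of the pair; "note" is optional.
        for key in ("ja", "en"):
            if not str(entry.get(key, "")).strip():
                raise ValueError(f"glossary[{i}] is missing '{key}'")
        # The same Japanese term mapped twice would itself be a source of drift.
        if entry["ja"] in seen_ja:
            raise ValueError(f"glossary[{i}] duplicates '{entry['ja']}'")
        seen_ja.add(entry["ja"])
    return glossary, contract
```

Failing loudly on an incomplete or duplicated pair keeps a broken glossary from reaching the model in the first place.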

Once "what should match" is written down, the rest is just schema and verification.

Pair the two languages in responseSchema

This is the heart of the switch. Before, I called twice — once for Japanese, once for English. After, one responseSchema holds both languages side by side and returns them in a single call.

Before (two independent calls):

from google import genai
 
client = genai.Client()
 
def gen_one(lang: str, topic: str) -> str:
    prompt = f"Write an article in {lang} on: {topic}"
    res = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=prompt,
    )
    return res.text
 
ja = gen_one("Japanese", "Gemini structured output")
en = gen_one("English", "Gemini structured output")
# ja and en never see each other's wording, so the translation drifts

After (the pair in one call):

from google import genai
from pydantic import BaseModel

class Section(BaseModel):
    # Both languages sit side by side so each section is written as a pair.
    heading_ja: str
    heading_en: str
    body_ja: str
    body_en: str

class Article(BaseModel):
    title_ja: str
    title_en: str
    sections: list[Section]

client = genai.Client()

def gen_pair(topic_ja: str, topic_en: str, glossary_text: str) -> Article:
    prompt = (
        "Write the Japanese and English versions of the same topic at once.\n"
        "Pair heading and body across languages in each section.\n"
        f"Japanese topic: {topic_ja}\n"
        f"English topic: {topic_en}\n"
    )
    res = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=prompt,
        config={
            # The glossary travels as the system instruction, separate from the prompt body.
            "system_instruction": glossary_text,
            # Ask for JSON constrained to the Article schema.
            "response_mime_type": "application/json",
            "response_schema": Article,
        },
    )
    # Validate the returned JSON; this raises if the response does not match the schema.
    return Article.model_validate_json(res.text)

The key is that Section keeps both languages at the same level. Because the model writes each section's Japanese and English together, heading granularity and word choice get decided inside one train of thought. The structural mismatch from earlier essentially disappeared just from this shape.
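
The shape makes structural mismatches unlikely but does not rule them out, so the output contract can also be checked against the parsed result. The following is a minimal sketch, assuming the article has been converted to plain dicts (for example with `Article.model_dump()`) and that bodies are Markdown; `check_contract` is a hypothetical helper, and counting code blocks by fence markers is an assumption about how that part of the contract might be compared.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable


def check_contract(article: dict, contract: dict) -> list[str]:
    """Return structural issues for a paired article held as plain dicts."""
    issues = []
    sections = article.get("sections", [])
    min_sections = contract.get("min_sections", 0)
    if len(sections) < min_sections:
        issues.append(f"only {len(sections)} sections, contract requires {min_sections}")

    for i, sec in enumerate(sections):
        # A pair with one side empty is how a silently dropped section shows up.
        for field in ("heading_ja", "heading_en", "body_ja", "body_en"):
            if not sec.get(field, "").strip():
                issues.append(f"section {i}: {field} is empty")
        # Assumption: two fence markers delimit one code block.
        blocks_ja = sec.get("body_ja", "").count(FENCE) // 2
        blocks_en = sec.get("body_en", "").count(FENCE) // 2
        if blocks_ja != blocks_en:
            issues.append(f"section {i}: {blocks_ja} code blocks in ja, {blocks_en} in en")
    return issues
```

An empty list means the pair satisfies the contract; anything else can be handled the same way as the term checks later in the article.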

Pin the glossary in the system instruction

The schema alone does not lock the wording. Pass the glossary as a system instruction every time and state plainly: "do not deviate from these pairs." It goes on the system side rather than in the prompt body because I want to swap the body freely while the term agreement stays put.

def build_glossary_instruction(glossary: list[dict]) -> str:
    lines = [
        "You are a bilingual (Japanese/English) technical writer.",
        "Honor the following pairs strictly and never let them wobble.",
    ]
    for g in glossary:
        note = f" ({g['note']})" if g.get("note") else ""
        lines.append(f'- "{g["ja"]}" = "{g["en"]}"{note}')
    lines.append("For terms not in the list, show the original on first mention.")
    return "\n".join(lines)

Passing this instruction into gen_pair pins most of the terms you care about. In my experience this mattered most. The schema handles "structural agreement"; the glossary pin handles "lexical agreement." Only with both does the reader's snag go away.

Detect drift mechanically

The model honors most of it, but not 100%. Before shipping, add a thin verification that checks the output against the glossary. Two checks were enough.

The first is term presence: for any section whose Japanese body contains a glossary ja term, confirm that the matching English body contains the en term. The second is length ratio: any section whose English-to-Japanese character-length ratio falls far outside a band is suspected of compressing one side.

def check_terms(article: "Article", glossary: list[dict]) -> list[str]:
    """Return human-readable drift issues; an empty list means the article passed."""
    issues = []
    for i, sec in enumerate(article.sections):
        # Term presence: a glossary term on the Japanese side needs its English pair.
        for g in glossary:
            ja_hit = g["ja"] in sec.body_ja
            en_hit = g["en"].lower() in sec.body_en.lower()
            if ja_hit and not en_hit:
                issues.append(f'section {i}: has "{g["ja"]}" / missing "{g["en"]}"')
        # Length ratio: English characters divided by Japanese characters.
        if sec.body_ja and sec.body_en:
            ratio = len(sec.body_en) / max(len(sec.body_ja), 1)
            if ratio < 0.6 or ratio > 3.0:
                issues.append(f"section {i}: length ratio {ratio:.2f} out of band")
    return issues

This verification spends no tokens — it runs instantly, locally. The pitfall is the threshold. English tends to run longer in character count, so a too-strict lower bound flags healthy sections. I run a 0.6 lower and 3.0 upper bound, and false positives nearly vanished.
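
If the 0.6 to 3.0 band does not suit your content, one way to pick a band is to measure the ratio on sections that have already been proofread. The following is a sketch under that assumption; `suggest_band` and its `margin` parameter are hypothetical, and the approach (widen the observed range by a margin) is one simple option rather than the method used for the numbers above.

```python
def suggest_band(pairs: list[tuple[str, str]], margin: float = 0.2) -> tuple[float, float]:
    """Suggest (lower, upper) bounds for the en/ja character-length ratio.

    `pairs` holds (body_ja, body_en) from sections already known to be healthy.
    """
    ratios = [len(en) / len(ja) for ja, en in pairs if ja and en]
    if not ratios:
        raise ValueError("need at least one non-empty pair")
    # Widen the observed range so ordinary variation is not flagged.
    lower = min(ratios) * (1 - margin)
    upper = max(ratios) * (1 + margin)
    return round(lower, 2), round(upper, 2)
```

The more proofread sections go in, the less the suggested band depends on a single unusually terse or verbose section.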

Fallback: partial regeneration

When verification flags something, rebuilding the whole article is wasteful. Regenerate only the offending section, with the glossary and the surrounding context attached.

Detected problem	Fallback action
One-sided term miss	Regenerate only that section with the glossary emphasized
Length-ratio anomaly	Rewrite only the shorter language, passing its counterpart body
JSON parse failure	Retry once on the same prompt; if it still breaks, fall back to two calls
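
The first two rows of the table can be sketched as a prompt builder for a single section. This is an illustration under stated assumptions: `build_section_fix_prompt` is a hypothetical helper, the article is held as plain dicts, and "surrounding context" is taken to mean the neighbouring English headings.

```python
def build_section_fix_prompt(sections: list[dict], index: int, issue: str) -> str:
    """Build a prompt that regenerates one section and leaves the rest alone."""
    target = sections[index]
    # Neighbouring headings give context without resending every body.
    prev_heading = sections[index - 1]["heading_en"] if index > 0 else "(none)"
    next_heading = sections[index + 1]["heading_en"] if index + 1 < len(sections) else "(none)"
    return "\n".join([
        "Rewrite only the section below, in both Japanese and English.",
        "Follow the glossary in the system instruction exactly.",
        f"Problem found by verification: {issue}",
        f"Previous section heading: {prev_heading}",
        f"Next section heading: {next_heading}",
        f"Japanese heading: {target['heading_ja']}",
        f"Japanese body: {target['body_ja']}",
        f"English heading: {target['heading_en']}",
        f"English body: {target['body_en']}",
    ])
```

One plausible way to use it is to send the prompt with the response schema set to `Section` instead of `Article`, so the reply can replace the flagged section directly.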

JSON parse failures happen occasionally — that comes with structured output. In that case I retry once under the same conditions, and if it still collapses, I temporarily fall back to the old two-call method so the day's publishing does not stop. In production, "do not stop" comes first, and quality is held by the verification side.
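
The retry-then-fall-back rule can be sketched with the two generation paths passed in as callables. This is a minimal sketch, assuming parse failures surface as `ValueError` (both `json.JSONDecodeError` and pydantic's `ValidationError` are subclasses of it); `generate_with_fallback` is a hypothetical helper name.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def generate_with_fallback(
    paired: Callable[[], T],
    two_call: Callable[[], T],
    retries: int = 1,
) -> tuple[T, str]:
    """Try the paired call, retry on parse errors, then fall back to two calls.

    Returns the result and the route taken ("paired" or "two_call").
    """
    for _ in range(1 + retries):
        try:
            return paired(), "paired"
        except ValueError:
            # Parse or schema failure: try again under the same conditions.
            continue
    # The paired route kept failing, so keep publishing with the old method.
    return two_call(), "two_call"
```

Returning the route alongside the result makes it easy to log how often the fallback fires, which is a useful signal that the schema or prompt needs attention.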

Measured: tokens, latency, cost

Here is what actually changed, pulled from my own generation logs. The comparison uses Gemini 3.5 Flash on articles of roughly 6–8 sections per language. The numbers are averages from my workflow and move with the topic.

Metric	Two-call method	One-call (paired) method
API calls	2	1
Total input tokens	baseline	about 0.7x (no duplicated glossary)
Perceived latency	two serial waits	one wait (about 0.55x)
Drift issues detected	baseline	about 45% fewer
Proofreading time	baseline	less than half

Input tokens drop because the two-call method sent the glossary and shared premises twice. One call sends the shared part once. Latency roughly halves because two serial waits become one. On cost, the input-token savings lower the per-generation price — and that compounds the more articles you publish.
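
The input-token effect can be written as a small formula. The sketch below assumes each prompt splits into a shared part (glossary and premises) and a per-language part; the token counts are made-up round numbers chosen for illustration, not measured values.

```python
def input_token_ratio(shared: int, per_language: int) -> float:
    """Input tokens of one paired call divided by those of two separate calls."""
    two_calls = 2 * (shared + per_language)  # shared part is sent twice
    one_call = shared + 2 * per_language     # shared part is sent once
    return one_call / two_calls


# Hypothetical sizes: 300 shared tokens, 200 per language.
print(input_token_ratio(shared=300, per_language=200))  # 0.7
```

The larger the shared part is relative to the per-language part, the closer the ratio gets to 0.5; with no shared part at all it stays at 1.0.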

When to stop using one call

Finally, this method is not always optimal, so here is when I revert to two calls:

When an article is very long and pairing both languages hits the output-token ceiling
When I deliberately want different structures per language (swapping examples for an English audience)
When a workflow swaps only one language often, making it wasteful to regenerate the other every time

Conversely, for short-to-mid articles where translation consistency is the core of quality, I recommend one call plus a pinned glossary. I made it the default for Dolice's paired bilingual posts. The deciding axis is whether you want the model to settle the translation consistency simultaneously, or to optimize each language independently. Answer that first, and the choice gets easier.

Translation drift is small one at a time, but it quietly erodes a reader's trust as it stacks up. Pair the structure in the schema, fix the vocabulary with a glossary, and close the escape hatches with verification. With those three in place, multilingual operation gets a lot lighter. I hope this helps anyone running Japanese and English side by side.
