Why Use Gemini API for Translation Automation
When scaling an app or web service globally, multilingual support is a non-negotiable requirement. Traditional machine translation services handle straightforward sentence-by-sentence conversion well, but they fall short when you need to maintain brand voice, enforce consistent terminology, or adapt content for specific audiences.
Gemini API leverages large language model capabilities to deliver context-aware, natural translations. By using System Instructions, you can define tone, glossaries, and project-specific translation rules that are consistently applied across every request. This makes it a powerful tool for building custom localization pipelines that go beyond simple word replacement.
In this guide, we'll walk through building a multilingual translation pipeline using Gemini API with the Python SDK. This article is designed for developers and product managers who are familiar with basic API calls and want to automate their translation workflows.
Setting Up Your Environment and API Key
Start by grabbing an API key from Google AI Studio (aistudio.google.com). Once you have it, store it as an environment variable — never hardcode API keys in your source code.
```bash
# Set the API key as an environment variable
export GEMINI_API_KEY="YOUR_API_KEY"

# Install the Python SDK
pip install google-genai
```

Import the SDK and verify your connection:

```python
import os

from google import genai

# Read the key from the environment instead of hardcoding it
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Connection test
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Hello, respond with 'Connection successful'",
)
print(response.text)
# Expected output: Connection successful
```

Implementing Basic Translation Requests
The standard pattern for translation with Gemini API is to pass your translation rules via System Instructions and the source text as the user message.
```python
import os
from typing import Optional

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])


def translate_text(text: str, target_lang: str, glossary: Optional[dict] = None) -> str:
    """Translate text using Gemini API"""
    # Append glossary rules if provided
    glossary_rules = ""
    if glossary:
        terms = "\n".join(f"- {k} → {v}" for k, v in glossary.items())
        glossary_rules = f"\n\n## Glossary (always use these translations)\n{terms}"

    system_instruction = f"""You are a professional translator.
Follow these rules when translating:

## Core Rules
- Preserve the nuance and tone of the original text
- Use standard terminology for the target language
- Keep markdown formatting and HTML tags intact
- Only translate comments inside code blocks, not the code itself
- Return only the translated text with no explanations or annotations

## Target Language
{target_lang}{glossary_rules}"""

    response = client.models.generate_content(
        model="gemini-2.5-flash",
        config=types.GenerateContentConfig(
            system_instruction=system_instruction,
            temperature=0.3,  # Lower temperature for translation consistency
        ),
        contents=text,
    )
    return response.text


# Usage example
source = "Gemini API を使えば、インテリジェントなアプリケーションを簡単に構築できます。"
result = translate_text(
    text=source,
    target_lang="English",
    glossary={"インテリジェントなアプリケーション": "intelligent applications"},
)
print(result)
# Expected output: With Gemini API, you can easily build intelligent applications.
```

Notice the temperature is set to 0.3. For translation tasks, consistency matters more than creativity, so a lower value helps reduce output variance between runs.
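Because the glossary and rules travel inside the system instruction, it can help to assemble that string in a separate function so you can inspect or unit-test it without making an API call. The following is a minimal sketch under that assumption; `build_system_instruction` is a hypothetical helper name, not part of the SDK, and it uses a shortened rule list for brevity.

```python
from typing import Optional


def build_system_instruction(target_lang: str, glossary: Optional[dict] = None) -> str:
    """Assemble a translator system instruction (hypothetical helper, shortened rules)."""
    lines = [
        "You are a professional translator.",
        "Return only the translated text with no explanations or annotations.",
        "",
        "## Target Language",
        target_lang,
    ]
    if glossary:
        # One bullet per term, in the same "source → target" form used in this guide
        lines += ["", "## Glossary (always use these translations)"]
        lines += [f"- {source} → {target}" for source, target in glossary.items()]
    return "\n".join(lines)


# Inspect the prompt locally before sending anything to the API
prompt = build_system_instruction("English", {"機械学習": "machine learning"})
print(prompt)
```

The returned string can be passed as `system_instruction` in `types.GenerateContentConfig`, exactly as in the translation function.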
Using Glossaries for Terminology Management
In large-scale localization projects, terminology consistency makes or breaks quality. By managing glossaries in JSON files and automatically injecting them into translation requests, you can keep terms aligned across your entire product.
```python
import json
from pathlib import Path


def load_glossary(glossary_path: str, target_lang: str) -> dict:
    """Load language-specific glossary from a JSON file"""
    path = Path(glossary_path)
    if not path.exists():
        return {}
    with open(path, "r", encoding="utf-8") as f:
        all_glossaries = json.load(f)
    return all_glossaries.get(target_lang, {})


# Example glossary.json structure
# {
#   "ja": {
#     "machine learning": "機械学習",
#     "fine-tuning": "ファインチューニング",
#     "context window": "コンテキストウィンドウ",
#     "grounding": "グラウンディング"
#   },
#   "fr": {
#     "machine learning": "apprentissage automatique",
#     "fine-tuning": "ajustement fin",
#     "context window": "fenêtre de contexte"
#   }
# }

glossary = load_glossary("glossary.json", "ja")
translated = translate_text(
    text="Fine-tuning improves model performance within a specific context window.",
    target_lang="Japanese",
    glossary=glossary,
)
print(translated)
# Expected output: ファインチューニングにより、特定のコンテキストウィンドウ内でのモデル性能が向上します。
```

Externalizing glossaries into separate files allows your translation team and engineering team to update terminology independently without touching application code.
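As a glossary grows, injecting every entry into every request makes the system instruction longer than it needs to be. One possible refinement, sketched here as an assumption rather than something the pipeline requires, is to send only the terms that actually occur in the source text. `filter_glossary` is a hypothetical helper; it uses plain case-insensitive substring matching, so inflected forms such as "fine-tuned" would not match.

```python
def filter_glossary(text: str, glossary: dict) -> dict:
    """Keep only glossary entries whose source term occurs in the text."""
    lowered = text.casefold()
    return {
        term: translation
        for term, translation in glossary.items()
        if term.casefold() in lowered  # case-insensitive substring match
    }


glossary = {
    "machine learning": "機械学習",
    "fine-tuning": "ファインチューニング",
    "context window": "コンテキストウィンドウ",
    "grounding": "グラウンディング",
}
text = "Fine-tuning improves model performance within a specific context window."

# Only "fine-tuning" and "context window" appear in the text
print(filter_glossary(text, glossary))
```

The filtered dictionary can be passed as the `glossary` argument in place of the full one.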
Batch Translation for Multiple Files
Real-world localization typically involves translating dozens or hundreds of files at once. Here's a script that batch-translates all Markdown files in a directory:
```python
import os
import time
from pathlib import Path
from typing import Optional

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])


def translate_file(
    source_path: Path,
    output_dir: Path,
    target_lang: str,
    glossary: dict,
) -> dict:
    """Translate a single file and save the result"""
    content = source_path.read_text(encoding="utf-8")

    glossary_rules = ""
    if glossary:
        terms = "\n".join(f"- {k} → {v}" for k, v in glossary.items())
        glossary_rules = f"\n\n## Glossary\n{terms}"

    system_instruction = f"""As a professional translator,
translate this Markdown file into {target_lang}.
- Preserve markdown structure (headings, lists, code blocks)
- Only translate the title and description in frontmatter (YAML)
- Only translate comments inside code blocks{glossary_rules}"""

    response = client.models.generate_content(
        model="gemini-2.5-flash",
        config=types.GenerateContentConfig(
            system_instruction=system_instruction,
            temperature=0.3,
        ),
        contents=content,
    )

    output_path = output_dir / source_path.name
    output_path.write_text(response.text, encoding="utf-8")
    return {
        "file": source_path.name,
        "status": "success",
        "chars": len(response.text),
    }


def batch_translate(
    source_dir: str,
    output_dir: str,
    target_lang: str,
    glossary_path: Optional[str] = None,
    glossary_lang: Optional[str] = None,  # key in glossary.json, e.g. "ja"
    concurrency: int = 3,  # not used yet: the loop below is sequential
):
    """Batch translate all Markdown files in a directory"""
    src = Path(source_dir)
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = list(src.glob("*.md")) + list(src.glob("*.mdx"))

    # load_glossary is the function defined in the glossary section
    glossary = (
        load_glossary(glossary_path, glossary_lang or target_lang)
        if glossary_path
        else {}
    )

    results = []
    for f in files:
        r = translate_file(f, out, target_lang, glossary)
        results.append(r)
        time.sleep(1.0)  # Rate limit protection

    print(f"Translation complete: {len(results)} files")
    for r in results:
        print(f"  {r['file']}: {r['chars']} characters")


# Usage
# batch_translate(
#     source_dir="content/en",
#     output_dir="content/ja",
#     target_lang="Japanese",
#     glossary_path="glossary.json",
#     glossary_lang="ja",
# )
```

The glossary_lang argument is the key to look up in glossary.json ("ja" here). It is separate from target_lang because the glossary is keyed by language code, while the model receives the language name. The concurrency parameter is meant to control parallel execution, but the loop as written translates one file at a time with a one-second pause between requests, so the parameter has no effect until you add a worker pool. Gemini API has [rate limits](/articles/gemini-api/gemini-api-rate-limiting-quota-management), so aim for concurrency=2 on the free plan and around concurrency=5 on paid plans.
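One way to make a concurrency setting take effect is a thread pool that caps the number of requests in flight. The sketch below is an assumption about how that could look, not the article's implementation: `translate_many` and `fake_translate` are hypothetical names, and the stand-in function makes no network call. Whether a single SDK client can safely be shared across threads is something to verify against the SDK documentation before using this with real requests.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def translate_many(
    files: list[Path],
    translate_one: Callable[[Path], dict],
    concurrency: int = 3,
) -> list[dict]:
    """Run translate_one over files with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map preserves input order, so results line up with files
        return list(pool.map(translate_one, files))


def fake_translate(path: Path) -> dict:
    """Stand-in for a real per-file translation; makes no network call."""
    return {"file": path.name, "status": "success", "chars": len(path.name)}


results = translate_many([Path("a.md"), Path("b.md")], fake_translate, concurrency=2)
print(results)
```

In a real pipeline, `translate_one` would wrap the per-file translation function with its fixed arguments, for example via `functools.partial`. An exception raised by any worker propagates when the results are collected, so a failure stops the batch unless the worker catches it.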
Automated Translation Quality Checks
Building automatic quality checks into your pipeline can drastically reduce review effort. An effective approach is using Gemini API itself to evaluate translation output.
```python
import json

# Reuses client and types from the earlier examples


def check_translation_quality(
    original: str,
    translated: str,
    target_lang: str,
) -> dict:
    """Automatically evaluate translation quality"""
    system_instruction = """You are a translation quality reviewer.
Compare the original text with the translation and evaluate it on these criteria.
Return your evaluation in this exact JSON format:
{
  "score": integer from 1-10,
  "issues": ["list of problems found"],
  "suggestions": ["list of improvement suggestions"]
}

## Evaluation Criteria
- accuracy: Does the translation convey the original meaning?
- fluency: Is the translation natural in the target language?
- terminology: Are technical terms translated appropriately?
- completeness: Is any information missing or incorrectly added?"""

    prompt = f"""## Original Text
{original}

## Translation ({target_lang})
{translated}

Please evaluate the translation above."""

    response = client.models.generate_content(
        model="gemini-2.5-flash",
        config=types.GenerateContentConfig(
            system_instruction=system_instruction,
            temperature=0.2,
            response_mime_type="application/json",
        ),
        contents=prompt,
    )
    return json.loads(response.text)


# Usage example
quality = check_translation_quality(
    original="The context caching feature reduces API costs by up to 90%.",
    translated="コンテキストキャッシュ機能により、APIコストを最大90%削減できます。",
    target_lang="Japanese",
)
print(f"Score: {quality['score']}/10")
print(f"Issues: {quality['issues']}")
# Expected output:
# Score: 9/10
# Issues: []
```

Setting response_mime_type="application/json" constrains the model to return JSON, which is why json.loads can parse response.text directly. The field names still come from the prompt; if you need them enforced, you can also supply a schema through the response_schema config field. For a deeper dive into structured output patterns, check out [the practical guide to Gemini structured output](/articles/gemini-advanced/gemini-structured-output-production-guide).
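The evaluation only reduces review effort if something acts on it. A simple option is a gate that routes low-scoring or malformed evaluations to a human reviewer. The sketch below assumes the score/issues structure requested in the reviewer prompt; `needs_human_review` and the default threshold of 8 are illustrative choices, not values from the Gemini API.

```python
def needs_human_review(quality: dict, min_score: int = 8) -> bool:
    """Return True when the evaluation is malformed, low-scoring, or lists issues."""
    score = quality.get("score")
    issues = quality.get("issues")
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 10:
        return True
    if not isinstance(issues, list):
        return True
    return score < min_score or len(issues) > 0


print(needs_human_review({"score": 9, "issues": [], "suggestions": []}))
print(needs_human_review({"score": 6, "issues": ["Mistranslated term"], "suggestions": []}))
```

Treating a malformed evaluation as "needs review" keeps the pipeline conservative: an unexpected response shape never silently passes a translation.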
Translating App Localization Files
One of the most common use cases is translating i18n files (JSON, YAML, .strings) for mobile and web apps. Here's a script that handles JSON-based i18n files:
```python
import json
from typing import Optional

# Reuses client and types from the earlier examples


def translate_i18n_json(
    source_json: dict,
    target_lang: str,
    glossary: Optional[dict] = None,
) -> dict:
    """Translate only the values in an i18n JSON file"""
    glossary_rules = ""
    if glossary:
        terms = "\n".join(f"- {k} → {v}" for k, v in glossary.items())
        glossary_rules = f"\n\n## Glossary\n{terms}"

    # Inside an f-string, {{ renders as a single {, so the braces are
    # doubled twice here to put a literal {{name}} into the prompt.
    system_instruction = f"""You are an app localization specialist.
Translate the values in the following JSON into {target_lang}.

## Rules
- Do not modify keys
- Preserve placeholders ({{{{name}}}}, %d, etc.) exactly as they are
- Adjust text length to be natural for UI elements (keep button labels short)
- Return only the translated JSON{glossary_rules}"""

    response = client.models.generate_content(
        model="gemini-2.5-flash",
        config=types.GenerateContentConfig(
            system_instruction=system_instruction,
            temperature=0.2,
            response_mime_type="application/json",
        ),
        contents=json.dumps(source_json, ensure_ascii=False, indent=2),
    )
    return json.loads(response.text)


# Usage example
en_strings = {
    "app.title": "My AI Assistant",
    "app.welcome": "Welcome back, {{name}}!",
    "button.submit": "Submit",
    "button.cancel": "Cancel",
    "error.network": "Network error. Please try again.",
    "settings.language": "Language",
    "settings.notifications": "Notifications",
}
ja_strings = translate_i18n_json(en_strings, "Japanese")
print(json.dumps(ja_strings, ensure_ascii=False, indent=2))
# Expected output:
# {
#   "app.title": "AI アシスタント",
#   "app.welcome": "おかえりなさい、{{name}}さん!",
#   "button.submit": "送信",
#   "button.cancel": "キャンセル",
#   "error.network": "ネットワークエラーが発生しました。もう一度お試しください。",
#   "settings.language": "言語",
#   "settings.notifications": "通知"
# }
```

The crucial detail here is explicitly instructing the model to preserve placeholders like {{name}} and %d. Broken placeholders are one of the most common localization bugs, and clear System Instructions make them much less likely, though the output is still worth verifying before it ships.
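Since a prompt rule cannot guarantee that placeholders survive, a cheap deterministic check after translation is a useful complement. The following is a minimal sketch, assuming flat key/value dictionaries with string values and only the two placeholder styles mentioned here ({{name}} and printf-style tokens such as %d or %s). The function name and regular expression are illustrative and would need extending for other formats.

```python
import re

# Matches {{name}}-style and printf-style (%d, %s, %1$s, %@) placeholders
PLACEHOLDER_RE = re.compile(r"\{\{\s*\w+\s*\}\}|%(?:\d+\$)?[sd@]")


def find_placeholder_mismatches(source: dict, translated: dict) -> dict:
    """Return keys whose set of placeholders differs between source and translation."""
    mismatches = {}
    for key, source_text in source.items():
        expected = sorted(PLACEHOLDER_RE.findall(source_text))
        # A missing key counts as a translation with no placeholders
        actual = sorted(PLACEHOLDER_RE.findall(translated.get(key, "")))
        if expected != actual:
            mismatches[key] = {"expected": expected, "actual": actual}
    return mismatches


source = {"app.welcome": "Welcome back, {{name}}!", "cart.count": "%d items"}
translated = {"app.welcome": "おかえりなさい、{{name}}さん!", "cart.count": "アイテム"}

# The second entry lost its %d placeholder, so only that key is reported
print(find_placeholder_mismatches(source, translated))
```

Running this on every translated file and failing the build on a non-empty result catches dropped or altered placeholders before they reach users.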
For more on error handling and retry strategies, see [Gemini API Error Handling and Retry Patterns](/articles/gemini-api/gemini-api-error-handling-retry-patterns).
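As a rough illustration of the retry idea, here is a generic exponential-backoff wrapper. It is a sketch under stated assumptions, not the pattern from the linked article: `with_retries` is a hypothetical helper, the delays are arbitrary defaults, and it retries on any exception, whereas a production pipeline would normally retry only on rate-limit and transient server errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on any exception with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Waits roughly 1s, 2s, 4s, ... plus up to 0.5s of random jitter
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5))
    raise RuntimeError("max_attempts must be at least 1")


# Usage (assumes a zero-argument callable that performs one API request):
# result = with_retries(lambda: translate_text(source, "English"))
```

The `sleep` parameter exists so the backoff can be replaced in tests without actually waiting.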
Looking back
We've covered the full spectrum of building a multilingual translation and localization pipeline with Gemini API — from basic translation requests to glossary management, batch processing, and automated quality checks.
The key takeaways are that System Instructions are your primary tool for maintaining translation consistency, temperature should be kept low (0.2–0.3) to reduce variance between runs, and response_mime_type="application/json" gives you JSON output that integrates cleanly into automation pipelines.
If you're looking to extend your pipeline to handle PDFs, images, and other document types, the [Advanced Multimodal Document Processing Guide](/articles/gemini-api/gemini-document-processing-advanced) is a great next step.
For a comprehensive reference on this topic,