GEMINI LAB
API / SDK · 2026-04-02 · Intermediate

Gemini TTS API: Generate Expressive Voice with Style Controls

A comprehensive guide to using the Gemini 2.5 Flash and Pro Text-to-Speech API. Learn how to specify voice styles, handle multi-speaker audio, and control tone and emotion through prompts — with practical code examples.

Tags: Gemini TTS, Text to Speech, Voice Generation, Gemini API, Speech Synthesis

What is Gemini TTS?

The Gemini TTS (Text-to-Speech) API is Google's generative speech capability in the Gemini API: it converts text into natural-sounding speech. It is available through dedicated TTS variants of Gemini 2.5 Flash and Pro, and it stands out by letting you specify style and emotion in plain language, producing expressive audio rather than robotic speech.

How Gemini TTS Differs from Traditional TTS

Traditional text-to-speech services (like Google Cloud TTS) offer predefined voices that you adjust through parameters such as speaking rate and pitch, or through SSML markup. Quality is high, but expressiveness is limited. Gemini TTS, powered by generative AI, lets you control style and emotion with natural-language prompts, resulting in human-like, flexible voice output.

Common Use Cases

  • Podcast Production: Create multiple character voices
  • Video Narration: Auto-narrate YouTube and TikTok content
  • Game Development: Dynamically generate character dialogue
  • Accessibility: Audio content for visually impaired users
  • Language Learning: Pronunciation guides and listening practice

Core Specifications of Gemini TTS

Supported Models

  • Gemini 2.5 Flash TTS (gemini-2.5-flash-preview-tts): Fast, cost-effective (recommended)
  • Gemini 2.5 Pro TTS (gemini-2.5-pro-preview-tts): Higher-quality output

Available Voice Types

Gemini TTS offers 30 prebuilt voices, each labeled with a short style descriptor. A few examples:

  • Zephyr: Bright
  • Puck: Upbeat
  • Charon: Informative
  • Kore: Firm
  • Fenrir: Excitable
  • Aoede: Breezy

Pricing

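  • Billing: Per token rather than per character; you pay for text input tokens and for generated audio output tokens
  • Gemini 2.5 Pro TTS: Priced higher per token than Flash TTS; see the Gemini API pricing page for current rates
  • Free Tier: Rate-limited by requests per minute and per day rather than a monthly request cap; check the rate-limits page for which TTS models are included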

API Setup and Basic Usage

Step 1: Obtain a Gemini API Key

Visit Google AI Studio and create an API key.

# Set environment variable
export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

Step 2: Install Python Library

pip install google-genai

Step 3: Minimal Code Example

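import wave
 
from google import genai
from google.genai import types
 
# The client reads GEMINI_API_KEY from the environment (Step 1)
client = genai.Client()
 
# Convert text to speech with a TTS model and one of the prebuilt voices
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Say cheerfully: Hello, world! This is a test of Gemini TTS.",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)
 
# The audio comes back as raw 16-bit PCM (24 kHz, mono); save it as WAV
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 2 bytes = 16-bit samples
    wf.setframerate(24000)
    wf.writeframes(pcm)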
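Because Gemini TTS returns headerless PCM rather than a finished audio file, it is handy to have the WAV wrapping as a reusable, in-memory step, for example when a web handler streams the audio instead of writing a file. A minimal sketch, assuming the 16-bit, mono, 24 kHz output format; pcm_to_wav_bytes and pcm_duration_seconds are hypothetical helper names, not SDK functions.

```python
import io
import wave

# Assumed Gemini TTS output format: raw 16-bit PCM, mono, 24 kHz
SAMPLE_RATE = 24000
SAMPLE_WIDTH = 2  # bytes per sample (16-bit)
CHANNELS = 1


def pcm_to_wav_bytes(pcm: bytes, rate: int = SAMPLE_RATE) -> bytes:
    """Wrap raw PCM bytes in a WAV (RIFF) header, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes, rate: int = SAMPLE_RATE) -> float:
    """Playback length of mono 16-bit PCM at the given sample rate."""
    return len(pcm) / (rate * SAMPLE_WIDTH * CHANNELS)


# Usage: one second of silence stands in for the response audio
silence = bytes(SAMPLE_RATE * SAMPLE_WIDTH)
wav = pcm_to_wav_bytes(silence)
print(wav[:4], pcm_duration_seconds(silence))  # b'RIFF' 1.0
```

The resulting bytes can be served with an audio/wav content type or written to disk as-is.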

Controlling Voice Style and Tone

Using Prompts to Set Emotion

The true power of Gemini TTS lies in prompt-based control. You can specify detailed emotional and tonal characteristics for your generated speech.

Example 1: Casual Friendly Conversation

style_prompt = """
Generate speech for a friendly conversation between friends.
Tone: Warm, approachable, slightly enthusiastic.
Speaking style: Casual, conversational, like chatting over coffee.
Text: "Hey! I just finished this amazing article about AI.
You've got to check it out when you have time!"
"""
 
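# Same client setup as the minimal example (google-genai SDK)
from google import genai
from google.genai import types
 
client = genai.Client()  # reads GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=style_prompt,
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck")
            )
        ),
    ),
)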

Example 2: Professional Presentation

style_prompt = """
Generate speech for a professional presentation.
Tone: Confident, authoritative, measured.
Speaking style: Formal yet engaging, with natural pauses.
Text: "Today, we'll explore the fundamental principles of machine learning,
and how they're transforming industries worldwide."
"""

Example 3: Dramatic Storytelling

style_prompt = """
Generate speech for dramatic storytelling.
Tone: Emotional, dramatic, engaging.
Speaking style: Expressive, with dynamic energy shifts.
Pacing: Variable - slower for emotional moments, faster for action.
Text: "The sun set over the horizon as she realized everything had changed.
In that moment, she understood what she had to do."
"""

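Examples 1 to 3 share one template (purpose, tone, speaking style, optional pacing, then the text), so when an app chooses the tone at runtime it can help to build these prompts from data. A minimal sketch; build_style_prompt is a hypothetical helper, and the template is simply the one used in these examples, not a format the API requires.

```python
from typing import Optional


def build_style_prompt(purpose: str, tone: str, style: str, text: str,
                       pacing: Optional[str] = None) -> str:
    """Assemble a director's-notes prompt in the format of Examples 1-3."""
    lines = [
        f"Generate speech for {purpose}.",
        f"Tone: {tone}.",
        f"Speaking style: {style}.",
    ]
    if pacing:
        lines.append(f"Pacing: {pacing}.")  # optional, as in Example 3
    lines.append(f'Text: "{text}"')
    return "\n".join(lines)


prompt = build_style_prompt(
    purpose="a professional presentation",
    tone="Confident, authoritative, measured",
    style="Formal yet engaging, with natural pauses",
    text="Today, we'll explore the fundamental principles of machine learning.",
)
print(prompt)
```

The resulting string is passed as contents, exactly like the hand-written style_prompt values.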
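Fine-tuning Pace, Pitch, and Volume

The Gemini API's speech_config does not accept numeric speed, pitch, or volume_gain_db settings; those belong to Cloud Text-to-Speech's AudioConfig. With Gemini TTS, you describe the delivery in the prompt instead:

  • Pace: "Read slowly, with long pauses" or "Say quickly, with urgency"
  • Pitch: "in a low, deep voice" or "in a bright, higher-pitched voice"
  • Volume: "Whisper:" or "Say loudly:"; for exact loudness, scale the returned audio afterwards

from google import genai
from google.genai import types
 
client = genai.Client()  # reads GEMINI_API_KEY from the environment
 
# Delivery instructions go in the prompt; speech_config here only picks the voice
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Read slowly, in a low, calm voice: Your narration text here",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Charon")
            )
        ),
    ),
)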
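Exact loudness is easiest to control after generation, by scaling the returned samples yourself. A minimal sketch, assuming little-endian 16-bit mono PCM as returned by Gemini TTS; apply_gain_db is a hypothetical helper, and it hard-clips rather than limiting, so large boosts will distort.

```python
import array
import struct
import sys


def apply_gain_db(pcm: bytes, gain_db: float) -> bytes:
    """Scale 16-bit little-endian PCM by gain_db decibels, clipping to int16."""
    factor = 10 ** (gain_db / 20)  # +6 dB roughly doubles the amplitude
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # interpret the input bytes as little-endian
    scaled = array.array(
        "h", (max(-32768, min(32767, round(s * factor))) for s in samples)
    )
    if sys.byteorder == "big":
        scaled.byteswap()  # back to little-endian for output
    return scaled.tobytes()


# Usage: three samples boosted by 6 dB; the loudest one clips at 32767
pcm = struct.pack("<3h", 1000, -1000, 20000)
print(struct.unpack("<3h", apply_gain_db(pcm, 6.0)))  # (1995, -1995, 32767)
```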

Multi-Speaker TTS

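For conversational content (podcasts, radio dramas), each speaker needs a distinct voice. Gemini TTS can voice up to two speakers in a single request: you label each turn in the script and map each label to a prebuilt voice.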

Setting Speaker Labels

multi_speaker_text = """
Host: Welcome to AI Talk Podcast! Today's guest is an AI researcher.
 
Guest: Thank you for having me! I'm excited to share what we've learned.
 
Host: Let's dive in. What's the biggest misconception about AI?
 
Guest: People often think AI can reason like humans, but...
"""
 
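from google import genai
from google.genai import types
 
client = genai.Client()  # reads GEMINI_API_KEY from the environment
 
def speaker(label, voice_name):
    # Map a label used in the script to one of the prebuilt voices
    return types.SpeakerVoiceConfig(speaker=label, voice_config=types.VoiceConfig(
        prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name=voice_name)))
 
# One request voices both speakers (two at most); labels must match the script
config = types.GenerateContentConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
        speaker_voice_configs=[speaker("Host", "Puck"), speaker("Guest", "Kore")])),
)
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="TTS the following conversation between Host and Guest:\n" + multi_speaker_text,
    config=config,
)
pcm = response.candidates[0].content.parts[0].inline_data.data  # raw 16-bit PCM, 24 kHz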
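Multi-speaker requests only work when the speaker names in the config match the labels in the script, and one request supports at most two speakers. A quick pre-flight check catches both problems before you spend a request. A sketch; speaker_labels and check_voice_mapping are hypothetical helpers, and the regular expression assumes each turn starts with "Label:" at the beginning of a line, as in the script above.

```python
import re

# A turn starts with "Label:" at the beginning of a line, e.g. "Host: ..."
TURN_RE = re.compile(r"^[ \t]*([A-Za-z][\w ]*?)[ \t]*:", re.MULTILINE)


def speaker_labels(script: str, max_speakers: int = 2) -> list:
    """Return speaker labels in order of first appearance, rejecting extras."""
    labels = []
    for match in TURN_RE.finditer(script):
        label = match.group(1)
        if label not in labels:
            labels.append(label)
    if len(labels) > max_speakers:
        raise ValueError(
            f"found {len(labels)} speakers {labels}; one request supports {max_speakers}"
        )
    return labels


def check_voice_mapping(script: str, voices: dict) -> None:
    """Fail early if the label-to-voice mapping and the script disagree."""
    labels = speaker_labels(script)
    missing = [label for label in labels if label not in voices]
    unused = [label for label in voices if label not in labels]
    if missing or unused:
        raise ValueError(f"unmapped labels: {missing}; unused voices: {unused}")


script = """
Host: Welcome to AI Talk Podcast!
Guest: Thank you for having me!
Host: Let's dive in.
"""
print(speaker_labels(script))  # ['Host', 'Guest']
check_voice_mapping(script, {"Host": "Puck", "Guest": "Kore"})  # passes silently
```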

Practical Use Cases

Use Case 1: Podcast Production

Automatically narrate scripts with multiple speakers, then add background music and sound effects for complete podcast automation.
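Adding background music comes down to mixing two PCM streams. A minimal sketch, assuming both tracks are already 16-bit little-endian mono at the same sample rate (24 kHz for Gemini TTS output; resample the music first if it differs); mix_pcm is a hypothetical helper. For production mixes, ffmpeg's amix filter is probably the better tool.

```python
import array
import struct
import sys


def _to_samples(pcm: bytes) -> array.array:
    """Decode little-endian 16-bit PCM into native int16 samples."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()
    return samples


def mix_pcm(voice: bytes, music: bytes, music_gain: float = 0.3) -> bytes:
    """Mix narration with background music.

    The music is scaled by music_gain and looped to the narration's length;
    the sum is clipped to the int16 range.
    """
    v, m = _to_samples(voice), _to_samples(music)
    out = array.array("h")
    for i, s in enumerate(v):
        bg = m[i % len(m)] * music_gain if len(m) else 0.0
        out.append(max(-32768, min(32767, round(s + bg))))
    if sys.byteorder == "big":
        out.byteswap()  # encode back to little-endian
    return out.tobytes()


voice = struct.pack("<4h", 100, 200, 30000, -30000)
music = struct.pack("<2h", 1000, -1000)
print(struct.unpack("<4h", mix_pcm(voice, music, music_gain=0.5)))
# (600, -300, 30500, -30500)
```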

Use Case 2: Video Narration

Auto-narrate YouTube and TikTok content with professional-quality voices, with no voice-over recording or re-takes needed.

Use Case 3: Game Development

Generate character dialogue on demand, supporting dynamic storyline branching with natural voice acting. Each request takes a moment to return, so pre-generate or cache lines where latency matters.
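Games repeat many lines (greetings, barks, menu prompts), and every TTS call costs time and money, so caching audio by request content is a common pattern. A sketch; tts_cache_key and DialogueCache are hypothetical names, and generate stands in for whatever function in your code calls the API and returns PCM bytes.

```python
import hashlib
import json
from typing import Callable, Dict

# generate(model, voice, prompt) -> PCM bytes; in a real app this would wrap
# client.models.generate_content and extract inline_data.data
Generator = Callable[[str, str, str], bytes]


def tts_cache_key(model: str, voice: str, prompt: str) -> str:
    """Stable key: the same model, voice, and prompt always give the same digest."""
    payload = json.dumps(
        {"model": model, "voice": voice, "prompt": prompt},
        sort_keys=True, ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DialogueCache:
    """In-memory cache of generated lines; swap the dict for disk storage."""

    def __init__(self, generate: Generator):
        self._generate = generate
        self._store: Dict[str, bytes] = {}
        self.misses = 0  # number of actual TTS calls made

    def get(self, model: str, voice: str, prompt: str) -> bytes:
        key = tts_cache_key(model, voice, prompt)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(model, voice, prompt)
        return self._store[key]


# Usage with a stand-in generator (no API call)
cache = DialogueCache(lambda model, voice, prompt: prompt.encode("utf-8"))
cache.get("gemini-2.5-flash-preview-tts", "Fenrir", "Say gruffly: Halt! Who goes there?")
cache.get("gemini-2.5-flash-preview-tts", "Fenrir", "Say gruffly: Halt! Who goes there?")
print(cache.misses)  # 1
```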

Common Errors and Solutions

Error 1: "API quota exceeded"

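Cause: A rate limit (requests or tokens per minute, or requests per day) was exceeded; the API responds with HTTP 429 RESOURCE_EXHAUSTED
Solution: Retry with exponential backoff, spread requests out over time, or enable billing on your project for higher limits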
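Retrying a rate-limited call immediately usually fails again; exponential backoff with jitter spaces the retries out instead. A generic sketch: with_backoff is a hypothetical helper, and is_retryable is yours to supply. In practice it would check whether the SDK's exception carries HTTP status 429; confirm the exact exception class and attribute in the google-genai documentation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a retryable error wait base_delay * 2**attempt plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise  # out of attempts, or an error retrying won't fix
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))  # up to 10% jitter
    raise ValueError("max_attempts must be at least 1")


class RateLimited(Exception):
    """Stand-in for the SDK's 429 error in this demo."""


attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited("429 RESOURCE_EXHAUSTED")
    return "audio bytes"


delays = []
result = with_backoff(flaky, lambda e: isinstance(e, RateLimited), sleep=delays.append)
print(result, len(delays))  # audio bytes 2
```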

Error 2: "Invalid audio format"

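Cause: The API returns raw 16-bit PCM (24 kHz, mono) with no file header, so saving those bytes as .mp3 (or as .wav without a header) produces a file players reject
Solution: Wrap the PCM in a WAV header (for example with Python's wave module), or convert it with ffmpeg: ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.mp3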
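To diagnose an unplayable file, look at its first bytes. A sketch; sniff_audio_format is a hypothetical helper that checks a few well-known signatures (RIFF/WAVE, an ID3 tag or MPEG frame sync for MP3, OggS). It can only guess, since raw PCM has no signature of its own.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container from the first bytes; raw PCM has no header."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:3] == b"ID3":
        return "mp3"  # MP3 with an ID3v2 tag
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"  # MPEG frame sync (can false-positive on raw PCM)
    if data[:4] == b"OggS":
        return "ogg"
    return "unknown (likely raw PCM)"


# Usage on a saved file:
# with open("output.mp3", "rb") as f:
#     print(sniff_audio_format(f.read(16)))
print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))  # wav
print(sniff_audio_format(bytes(16)))  # unknown (likely raw PCM)
```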

Error 3: Long text generation failure

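Cause: The input is too long for one request (the TTS models have a 32k-token context window), and very long single generations are also more likely to fail
Solution: Split text into chunks at sentence boundaries, make one API call per chunk, and concatenate the resulting PCM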
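Splitting at sentence boundaries keeps each request small without cutting phrases in half. Because every chunk comes back in the same PCM format, the results can be concatenated with b"".join(...) before adding a WAV header. A sketch; chunk_text is a hypothetical helper, max_chars=2000 is an arbitrary conservative default rather than a documented limit, and a single word longer than max_chars is passed through unsplit. Tone can drift between separately generated chunks, so repeat the same style instructions in each request.

```python
import re
from typing import List


def _split_words(sentence: str, max_chars: int) -> List[str]:
    """Greedy word-level split for sentences longer than max_chars."""
    parts: List[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts


def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        pieces = _split_words(sentence, max_chars) if len(sentence) > max_chars else [sentence]
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two! Three? Four.", max_chars=10))
# ['One. Two!', 'Three?', 'Four.']
```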

Wrapping up

Gemini TTS API is a powerful tool for converting text into human-sounding speech. Its key strengths are flexible prompt-based expression control, multi-speaker support, and affordable pricing. Whether you're creating podcasts, narrating videos, or developing games, Gemini TTS enables new possibilities for audio content creation.

Get your API key today and start with "Hello, world!"

