API / SDK · 2026-04-18 · Advanced

Building an Automated Content Pipeline with Veo 3 & Lyria 3 Pro API — Mass-Producing Video + Music

Learn how to combine Veo 3 and Lyria 3 Pro APIs to automatically generate and merge video and music from text prompts. Covers setup, production-ready Python code, error handling, common pitfalls, and cost optimization strategies.

If you've tried to produce short-form video content at scale as a solo creator, you've probably run into the same wall I did: generating the visuals is one problem, generating decent background music is another, and stitching them together is a third. Doing all three manually for ten videos a day is tedious. Doing it for a hundred is impossible.

Veo 3 and Lyria 3 Pro are now both available through the Google Gen AI API, which means the entire workflow — text prompt → video → music → merged output — can run in a single Python script with no human in the loop. In this guide I'll walk through the production pipeline I've built and currently run, including the parts that took me the longest to get right.

Fair warning: there are several undocumented behaviors in both APIs that will silently break your pipeline if you don't account for them. I'll call each one out explicitly.

What You Need to Know About These Two APIs Before Starting

Veo 3 is Google's video generation model. Given a text prompt (or an image), it produces up to 8 seconds of video. You call it via client.models.generate_videos() in the Google Gen AI Python SDK. The response is an Operation (an asynchronous job reference), not the video itself. You have to poll until the job completes, then download the output.

Lyria 3 Pro is Google's music generation model. You describe the style, genre, mood, instruments, and tempo in a text prompt, and it generates up to 3 minutes and 30 seconds of music. Like Veo 3, it's async, though music generation typically finishes faster.

A few facts worth having upfront as of April 2026:

Veo 3: Maximum 8 seconds per video, supports 16:9 and 9:16 aspect ratios, billed by resolution and duration
Lyria 3 Pro: Maximum 210 seconds per track, WAV or MP3 output, stereo, detailed style control via prompt
Quota independence: The two APIs have separate quotas. Hitting the Veo 3 rate limit does not affect Lyria, and vice versa. Design your pipeline to handle each independently.

The most important architectural decision you'll make is whether to run the two generations sequentially or in parallel. Sequential is simpler to implement. Parallel is roughly 40% faster in wall-clock time because Veo 3 generation (60–120 seconds typical) and Lyria generation (20–60 seconds typical) can overlap. This guide uses the parallel approach.
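
As a rough sanity check on that figure, the arithmetic can be sketched in a few lines. The 90 s and 40 s inputs are simply the midpoints of the typical ranges quoted above, not measurements, and wall_clock_estimate is a hypothetical helper name used only for this illustration.

```python
def wall_clock_estimate(video_s: float, music_s: float) -> tuple[float, float]:
    """Return (sequential, parallel) wall-clock estimates in seconds."""
    sequential = video_s + music_s     # one job after the other
    parallel = max(video_s, music_s)   # jobs overlap; the slower one dominates
    return sequential, parallel


# Midpoints of the typical ranges quoted in the text (assumed, not measured)
sequential, parallel = wall_clock_estimate(video_s=90, music_s=40)
speedup_pct = (sequential / parallel - 1) * 100
print(f"sequential={sequential}s parallel={parallel}s speedup={speedup_pct:.0f}%")
```

With these inputs the estimate comes to 130 s sequential versus 90 s parallel, a speedup of about 44%, which is in line with the rough 40% figure. The gain shrinks when music generation is much shorter than video generation, because the slower job sets the floor.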

Pipeline Architecture Overview

Here's the shape of what we're building:

Input: A content description, split into a video prompt and a music prompt
Step 1: Veo 3 generates the video file (async, polled to completion)
Step 2: Lyria 3 Pro generates the music file (async, run in parallel with Step 1)
Step 3: ffmpeg merges the two files, adjusts audio volume, applies a fade-out
Output: A finished .mp4 saved to the output directory

Each step has its own retry logic with exponential backoff. The pipeline returns a structured result dict so you can log successes and failures and feed the output into downstream automation (social media scheduling, CDN upload, etc.).
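
The retry pattern itself can be sketched independently of either API. This is a minimal illustration, not the code used in the modules later in this guide: retry_with_backoff is a hypothetical helper, it re-raises on the final attempt where the real modules return None, and the sleep function is injectable so the demo runs without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait 2**attempt * base_delay seconds and retry."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            sleep(2 ** attempt * base_delay)
    raise RuntimeError("max_retries must be at least 1")


# Demo: a function that fails twice, then succeeds. The fake sleep only
# records the requested waits instead of blocking.
waits: list[float] = []
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = retry_with_backoff(flaky, sleep=waits.append)
print(result, waits)
```

With the default of three attempts, only two waits ever happen (10 s, then 20 s); a third wait of 40 s would only occur if max_retries were raised to four or more.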

Environment Setup and Authentication

Install the required packages first.

# Python packages
pip install google-genai python-dotenv ffmpeg-python
 
# ffmpeg binary (macOS)
brew install ffmpeg
 
# ffmpeg binary (Ubuntu/Debian)
# apt-get install -y ffmpeg

The project configuration and authentication setup:

# config.py — project configuration and client initialization
import os
from dotenv import load_dotenv
from google import genai
 
load_dotenv()
 
def get_client() -> genai.Client:
    """
    Initialize and return the Google Gen AI client.
    API key is read from the GOOGLE_AI_API_KEY environment variable.
    """
    api_key = os.getenv("GOOGLE_AI_API_KEY")
    if not api_key:
        raise ValueError(
            "GOOGLE_AI_API_KEY is not set. "
            "Add GOOGLE_AI_API_KEY=your_key_here to your .env file."
        )
    return genai.Client(api_key=api_key)
 
# Model name constants — update these when new versions release
VEO3_MODEL = "veo-3.0-generate-preview"
LYRIA3_MODEL = "lyria-3.0-pro"
 
# Output directories
OUTPUT_DIR = "output"
TEMP_DIR = "temp"
 
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(TEMP_DIR, exist_ok=True)

Your .env file should contain:

GOOGLE_AI_API_KEY=YOUR_GOOGLE_AI_API_KEY

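Never hardcode the API key in your source code. If it ends up in a public Git repository, automated scrapers can pick it up within minutes, and secret-scanning services may flag it and get it revoked. Keep the .env file out of version control by adding it to .gitignore.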

Generating Video with the Veo 3 API

The video generation module handles the full lifecycle: submitting the request, polling until completion, and downloading the result.

# video_generator.py — Veo 3 video generation
import time
import logging
from pathlib import Path
from google import genai
from google.genai import types
from config import get_client, VEO3_MODEL, TEMP_DIR
 
logger = logging.getLogger(__name__)
 
def generate_video(
    prompt: str,
    output_filename: str,
    aspect_ratio: str = "16:9",
    duration_seconds: int = 8,
    max_retries: int = 3,
    poll_interval: int = 10,
    timeout_seconds: int = 300,
) -> Path | None:
    """
    Generate a video from a text prompt using Veo 3 and save it to disk.
 
    Args:
        prompt: Text describing the video content (English yields best results)
        output_filename: Output file name without extension
        aspect_ratio: "16:9" for landscape, "9:16" for portrait/Shorts
        duration_seconds: Video length in seconds (max 8)
        max_retries: Number of retry attempts on failure
        poll_interval: Seconds between polling requests
        timeout_seconds: Total timeout before giving up
 
    Returns:
        Path to the saved video file, or None on failure.
    """
    client = get_client()
    output_path = Path(TEMP_DIR) / f"{output_filename}.mp4"
 
    for attempt in range(1, max_retries + 1):
        try:
            logger.info(f"Veo 3 generation started (attempt {attempt}/{max_retries}): {prompt[:60]}...")
 
            # Submit the generation request; this returns an Operation, not the video.
            # Note the plural names in the SDK: generate_videos / GenerateVideosConfig.
            operation = client.models.generate_videos(
                model=VEO3_MODEL,
                prompt=prompt,
                config=types.GenerateVideosConfig(
                    aspect_ratio=aspect_ratio,
                    duration_seconds=duration_seconds,
                    number_of_videos=1,
                    enhance_prompt=True,  # Let the model improve the prompt automatically
                ),
            )
 
            # Poll until done or timeout
            start_time = time.time()
            while not operation.done:
                elapsed = time.time() - start_time
                if elapsed > timeout_seconds:
                    raise TimeoutError(
                        f"Video generation timed out after {timeout_seconds}s. "
                        "This often means the prompt triggered a safety filter without "
                        "raising an explicit error. Try rephrasing the prompt."
                    )
                logger.info(f"  Generating... {elapsed:.0f}s elapsed")
                time.sleep(poll_interval)
                operation = client.operations.get(operation)
 
            # Check for explicit errors
            if operation.error:
                raise RuntimeError(f"Generation error: {operation.error.message}")
 
            # Download and save the video
            video = operation.result.generated_videos[0]
            video_bytes = client.files.download(file=video.video)
 
            with open(output_path, "wb") as f:
                f.write(video_bytes)
 
            size_kb = output_path.stat().st_size // 1024
            logger.info(f"✅ Video saved: {output_path} ({size_kb} KB)")
            return output_path
 
        except TimeoutError as e:
            logger.warning(f"⚠️ Timeout (attempt {attempt}): {e}")
            if attempt == max_retries:
                logger.error("Max retries reached — skipping this video")
                return None
 
        except Exception as e:
            logger.error(f"❌ Error (attempt {attempt}): {type(e).__name__}: {e}")
            if attempt < max_retries:
                wait = 2 ** attempt * 5  # 10s → 20s → 40s
                logger.info(f"  Retrying in {wait}s...")
                time.sleep(wait)
            else:
                logger.error("Max retries reached — skipping this video")
                return None
 
    return None

One thing worth calling out: enhance_prompt=True tells Veo 3 to automatically expand your prompt with additional detail. This improves generation quality noticeably when your prompts are short or abstract. For fine-grained control over exactly what appears in the video, set it to False and write a detailed prompt yourself — around 3–5 sentences describing composition, lighting, movement, and style.

Generating Music with the Lyria 3 Pro API

The music generation module follows a similar pattern. I've also included a prompt builder helper that structures the musical specification in a format Lyria responds well to.

# music_generator.py — Lyria 3 Pro music generation
import time
import logging
from pathlib import Path
from google import genai
from google.genai import types
from config import get_client, LYRIA3_MODEL, TEMP_DIR
 
logger = logging.getLogger(__name__)
 
def generate_music(
    prompt: str,
    output_filename: str,
    duration_seconds: int = 30,
    output_format: str = "mp3",
    max_retries: int = 3,
) -> Path | None:
    """
    Generate music from a text prompt using Lyria 3 Pro and save it to disk.
 
    Args:
        prompt: Text describing the musical style, mood, and instrumentation
        output_filename: Output file name without extension
        duration_seconds: Track length in seconds (max 210)
        output_format: "mp3" or "wav"
        max_retries: Number of retry attempts on failure
 
    Returns:
        Path to the saved audio file, or None on failure.
    """
    client = get_client()
    output_path = Path(TEMP_DIR) / f"{output_filename}.{output_format}"
 
    for attempt in range(1, max_retries + 1):
        try:
            logger.info(f"Lyria 3 Pro generation started (attempt {attempt}/{max_retries})")
 
            response = client.models.generate_music(
                model=LYRIA3_MODEL,
                prompt=prompt,
                config=types.GenerateMusicConfig(
                    duration_seconds=duration_seconds,
                    output_format=output_format,
                    seamless_loop=False,
                ),
            )
 
            if not response.audio_data:
                raise ValueError("Response returned empty audio data")
 
            with open(output_path, "wb") as f:
                f.write(response.audio_data)
 
            logger.info(f"✅ Music saved: {output_path}")
            return output_path
 
        except Exception as e:
            logger.error(f"❌ Error (attempt {attempt}): {type(e).__name__}: {e}")
            if attempt < max_retries:
                wait = 2 ** attempt * 3  # 6s → 12s → 24s
                logger.info(f"  Retrying in {wait}s...")
                time.sleep(wait)
            else:
                logger.error("Max retries reached — skipping this track")
                return None
 
    return None
 
def build_music_prompt(
    genre: str,
    mood: str,
    tempo: str = "moderate",
    instruments: list[str] | None = None,
    no_lyrics: bool = True,
) -> str:
    """
    Build a structured music prompt that Lyria 3 Pro responds well to.
 
    Lyria performs best with explicit, comma-separated style descriptors
    rather than natural language sentences. This function assembles them
    in the right order.
 
    Example:
        build_music_prompt("ambient", "peaceful", "slow", ["koto", "piano"])
        → "ambient music, peaceful mood, slow tempo, featuring koto, piano,
           instrumental, no vocals, no lyrics, high quality, studio quality"
    """
    parts = [f"{genre} music", f"{mood} mood", f"{tempo} tempo"]
    if instruments:
        parts.append(f"featuring {', '.join(instruments)}")
    if no_lyrics:
        parts.append("instrumental, no vocals, no lyrics")
    parts.append("high quality, professional recording, studio quality")
    return ", ".join(parts)

One practical note on music duration: set it to the video duration plus two seconds rather than generating a long track and trimming it. Over-generating wastes API quota. If your video is 8 seconds, generate 10 seconds of music — the extra 2 seconds gives the ffmpeg fade-out headroom without wasting budget.
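
That rule of thumb can be written down as a small helper. This is a sketch only: music_duration_for and its floor_seconds parameter are hypothetical names, and the 210-second cap is the Lyria 3 Pro maximum quoted earlier in this guide.

```python
LYRIA_MAX_SECONDS = 210  # maximum track length quoted earlier in this guide


def music_duration_for(
    video_seconds: int,
    buffer_seconds: int = 2,
    floor_seconds: int = 0,
) -> int:
    """Video length plus fade-out headroom, clamped to [floor, Lyria maximum]."""
    wanted = max(video_seconds + buffer_seconds, floor_seconds)
    return min(wanted, LYRIA_MAX_SECONDS)


print(music_duration_for(8))                    # 8 s video + 2 s headroom
print(music_duration_for(8, floor_seconds=15))  # with a minimum track length
```

The optional floor is there for cases where very short tracks come out poorly and you decide to enforce a minimum length; leave it at 0 to apply the plain video-plus-two rule.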

The Full Integration Pipeline

Here's the complete pipeline that runs both generations in parallel and merges the results:

# pipeline.py — the full automated pipeline
import asyncio
import logging
import ffmpeg
from pathlib import Path
from datetime import datetime
from config import OUTPUT_DIR, TEMP_DIR
from video_generator import generate_video
from music_generator import generate_music, build_music_prompt
 
logger = logging.getLogger(__name__)
 
def merge_video_audio(
    video_path: Path,
    audio_path: Path,
    output_filename: str,
    audio_volume: float = 0.5,
    fade_out_seconds: float = 1.0,
) -> Path | None:
    """
    Merge a video file and an audio file using ffmpeg.
 
    - Audio is trimmed to match the video duration
    - A fade-out is applied to the last `fade_out_seconds` of audio
    - Volume is controlled by `audio_volume` (0.0–1.0)
    - Video stream is copied without re-encoding (fast)
    - Audio is encoded as AAC 192k
    """
    output_path = Path(OUTPUT_DIR) / f"{output_filename}.mp4"
 
    try:
        probe = ffmpeg.probe(str(video_path))
        video_duration = float(probe["format"]["duration"])
 
        video_in = ffmpeg.input(str(video_path))
        audio_in = ffmpeg.input(str(audio_path))
 
        fade_start = max(0.0, video_duration - fade_out_seconds)
        audio_processed = (
            audio_in.audio
            .filter("atrim", duration=video_duration)
            .filter("asetpts", "PTS-STARTPTS")
            .filter("volume", audio_volume)
            .filter("afade", type="out", start_time=fade_start, duration=fade_out_seconds)
        )
 
        out = ffmpeg.output(
            video_in.video,
            audio_processed,
            str(output_path),
            vcodec="copy",
            acodec="aac",
            audio_bitrate="192k",
        )
        ffmpeg.run(out, overwrite_output=True, quiet=True)
 
        logger.info(f"✅ Merge complete: {output_path}")
        return output_path
 
    except ffmpeg.Error as e:
        logger.error(f"❌ ffmpeg error: {e.stderr.decode()}")
        return None
 
async def run_pipeline(
    video_prompt: str,
    music_prompt: str,
    output_name: str,
    video_duration: int = 8,
    music_duration: int = 10,
) -> dict:
    """
    Run video and music generation in parallel, then merge the results.
 
    Returns a result dict with keys: status, output_path, elapsed_seconds
    """
    start_time = asyncio.get_running_loop().time()
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    base = f"{output_name}_{timestamp}"
 
    logger.info(f"🎬 Pipeline started: {output_name}")
 
    # Inside a coroutine, get_running_loop() is the preferred way to reach the loop
    loop = asyncio.get_running_loop()
 
    # Run both generations concurrently
    video_task = loop.run_in_executor(
        None, generate_video, video_prompt, f"{base}_video", "16:9", video_duration
    )
    music_task = loop.run_in_executor(
        None, generate_music, music_prompt, f"{base}_music", music_duration
    )
 
    video_path, music_path = await asyncio.gather(video_task, music_task)
 
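    if not video_path or not music_path:
        # Remove whichever temp file was written so partial failures don't leak files
        for leftover in (video_path, music_path):
            if leftover:
                leftover.unlink(missing_ok=True)
        failed = "video" if not video_path else "music"
        return {"status": "FAILED", "reason": f"{failed} generation failed"}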
 
    output_path = merge_video_audio(video_path, music_path, base)
 
    elapsed = round(asyncio.get_running_loop().time() - start_time, 1)
 
    # Clean up temp files
    video_path.unlink(missing_ok=True)
    music_path.unlink(missing_ok=True)
 
    if output_path:
        return {"status": "SUCCESS", "output_path": str(output_path), "elapsed_seconds": elapsed}
    return {"status": "FAILED", "reason": "merge failed"}
 
# ── Example usage ──────────────────────────────────────────────────────────
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
 
    result = asyncio.run(run_pipeline(
        video_prompt=(
            "A serene Japanese garden in autumn, koi pond with red and orange maple leaves "
            "falling gently onto the water surface, soft morning light filtering through bamboo, "
            "cinematic 4K quality, slow motion"
        ),
        music_prompt=build_music_prompt(
            genre="ambient",
            mood="peaceful and contemplative",
            tempo="slow",
            instruments=["koto", "shakuhachi flute", "gentle piano"],
        ),
        output_name="japanese_garden",
        video_duration=8,
        music_duration=10,
    ))
 
    print(result)
    # Expected output (times will vary):
    # {'status': 'SUCCESS', 'output_path': 'output/japanese_garden_20260418_104500.mp4', 'elapsed_seconds': 87.3}

The key to the parallel execution is asyncio.gather(). Video generation typically takes 60–120 seconds; music takes 20–60. Run them sequentially and you're waiting up to 3 minutes per video. Run them in parallel and total wall time drops to roughly the Veo 3 generation time — around 87 seconds on average in my setup.
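
To see the overlap without spending any API quota, the same run_in_executor plus gather pattern can be exercised with stand-in functions. Everything here is illustrative: fake_generation just sleeps, and the 0.4 s and 0.3 s durations are arbitrary small values chosen so the demo finishes quickly.

```python
import asyncio
import time


def fake_generation(seconds: float, label: str) -> str:
    """Stand-in for a blocking API call; sleeps instead of generating."""
    time.sleep(seconds)
    return label


async def run_both() -> tuple[list[str], float]:
    loop = asyncio.get_running_loop()
    start = time.monotonic()
    # Both blocking calls run in the default thread pool at the same time
    results = await asyncio.gather(
        loop.run_in_executor(None, fake_generation, 0.4, "video"),
        loop.run_in_executor(None, fake_generation, 0.3, "music"),
    )
    return list(results), time.monotonic() - start


results, elapsed = asyncio.run(run_both())
print(results, f"{elapsed:.2f}s")
```

The elapsed time should land close to the longer of the two sleeps rather than their sum, and gather() returns results in the order the tasks were passed in, which is why the pipeline can unpack them as (video_path, music_path).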

Common Pitfalls and How to Fix Them

These are the issues that aren't documented and will quietly break your pipeline.

Pitfall 1: The Operation's done flag never becomes True

If your prompt contains content that triggers Veo 3's safety filters — specific real people's names, copyrighted character names, certain violent or explicit themes — the operation sometimes enters a state where operation.done stays False indefinitely and operation.error is also None. You'll sit in your polling loop until the heat death of the universe.

The fix is straightforward: always set a timeout. The timeout_seconds=300 parameter in generate_video() handles this. After timeout, log the failure and move on. If you keep hitting timeouts on the same prompt, rewrite it to avoid the triggering content.
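
The timeout guard can be isolated into a small generic helper, sketched here under assumptions: wait_until_done is a hypothetical name, and the clock and sleep functions are injectable so the "job that never finishes" case can be simulated instantly.

```python
import time
from typing import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    timeout_seconds: float = 300,
    poll_interval: float = 10,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll is_done() until True; raise TimeoutError once the deadline passes."""
    start = clock()
    while not is_done():
        elapsed = clock() - start
        if elapsed > timeout_seconds:
            raise TimeoutError(f"gave up after {elapsed:.0f}s")
        sleep(poll_interval)
    return clock() - start


# Simulate an operation whose done flag never flips, using a fake clock
fake_now = [0.0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds  # advance the fake clock instead of blocking


try:
    wait_until_done(lambda: False, clock=lambda: fake_now[0], sleep=fake_sleep)
    timed_out = False
except TimeoutError:
    timed_out = True

print(timed_out, fake_now[0])
```

Because the deadline is only checked between polls, the loop can overshoot the timeout by up to one poll interval. That is usually acceptable, but worth knowing when you budget batch run times.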

Pitfall 2: Lyria generates noise or a single sustained tone instead of music

This happens in two situations. First, contradictory style descriptors in the prompt ("upbeat and melancholic," "fast and slow") confuse the model and produce incoherent output. Keep descriptors internally consistent. Second, very short durations (under 15 seconds) don't give the model enough time to develop a coherent musical phrase. The output comes out truncated and awkward. I recommend a minimum of 15 seconds even if you'll trim it later; for an 8-second clip that means overriding the video-duration-plus-two rule of thumb and accepting a few seconds of extra generation.

Pitfall 3: The merged video has no audio

When you use vcodec="copy" in ffmpeg, you're copying the raw video stream without re-encoding, which is the right choice here. merge_video_audio() maps only the video stream from the Veo 3 file, so the soundtrack comes entirely from the Lyria track, whether or not the Veo 3 file carries audio of its own. The problem usually comes from the audio side: if Lyria outputs WAV at an unusual sample rate, ffmpeg may silently discard the audio stream rather than raising an error.

Two fixes: either request MP3 output from Lyria (more universally compatible), or resample to a standard rate by adding -ar 44100 to your ffmpeg output options (with ffmpeg-python, pass ar=44100 as a keyword argument to ffmpeg.output()). The MP3 approach is simpler.
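
Whichever fix you choose, it helps to detect a silent output automatically instead of finding out after publishing. A minimal sketch, assuming the standard ffprobe JSON layout with a "streams" list whose entries carry a "codec_type" field; has_audio_stream is a hypothetical helper and the dicts below are hand-written stand-ins for real probe results.

```python
def has_audio_stream(probe: dict) -> bool:
    """True if an ffprobe result lists at least one audio stream."""
    return any(s.get("codec_type") == "audio" for s in probe.get("streams", []))


# In the pipeline the dict would come from ffmpeg.probe(str(output_path)).
# These hand-written dicts only mimic the fields the check reads.
silent = {"streams": [{"codec_type": "video"}]}
merged = {"streams": [{"codec_type": "video"}, {"codec_type": "audio"}]}

print(has_audio_stream(silent), has_audio_stream(merged))
```

Calling this on the merged file right after ffmpeg.run() turns a silent failure into an explicit one that the pipeline can report as FAILED.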

Pitfall 4: Batch runs hit rate limits after the first few videos

Veo 3 and Lyria each have per-minute request limits. In a tight loop without throttling, you'll hit 429s within the first minute. The exponential backoff in the retry logic handles individual failures, but it doesn't prevent you from hammering the API repeatedly.

Add inter-request spacing in your batch loop. If your limit is N requests per minute, sleep 60 / N seconds between submissions. This is especially important for Lyria, which tends to have a tighter per-minute quota than Veo 3 in my experience.
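
The spacing rule is simple enough to compute up front. In this sketch the limit of 5 requests per minute is an invented example value, not a published quota, and both function names are hypothetical.

```python
def spacing_seconds(requests_per_minute: int) -> float:
    """Minimum gap between submissions to stay under a per-minute limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60 / requests_per_minute


def submission_offsets(n_jobs: int, requests_per_minute: int) -> list[float]:
    """Seconds from batch start at which each job may be submitted."""
    gap = spacing_seconds(requests_per_minute)
    return [i * gap for i in range(n_jobs)]


# Example limit of 5 requests/minute (made up for illustration)
print(spacing_seconds(5))
print(submission_offsets(4, 5))
```

Since the two APIs have separate quotas, compute the gap for each one and use the larger of the two when every pipeline run makes one request to each.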

Pitfall 5: Memory exhaustion when running multiple pipelines concurrently

An 8-second 1080p video from Veo 3 is typically 40–80 MB. client.files.download() loads the entire file into memory before you write it to disk. If you run 5 parallel pipelines, you could have 400 MB of video data in memory simultaneously, which will crash a small cloud instance.

The pattern in generate_video() above writes the bytes to disk immediately after downloading. Don't accumulate them in a list or return them from the function — always write to disk first.
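
Writing to disk promptly limits how long each download stays in memory, but it does not limit how many downloads happen at once. One additional option, which is not part of the pipeline above, is to cap concurrency with an asyncio.Semaphore. A minimal sketch with a stand-in job; MAX_CONCURRENT = 2 is an assumed value to tune against your instance's memory.

```python
import asyncio

MAX_CONCURRENT = 2  # assumed cap; tune to the memory available on your instance


async def run_bounded(n_jobs: int) -> int:
    """Run n_jobs fake pipelines, never more than MAX_CONCURRENT at once.

    Returns the peak number of jobs that were in flight together.
    """
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    in_flight = 0
    peak = 0

    async def one_job() -> None:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for `await run_pipeline(...)`
            in_flight -= 1

    await asyncio.gather(*(one_job() for _ in range(n_jobs)))
    return peak


peak = asyncio.run(run_bounded(5))
print(peak)
```

With five jobs and a cap of two, at most two video downloads can be resident at once, which bounds the worst case at roughly two files' worth of memory instead of five.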

Cost Management and Optimization

Veo 3 and Lyria are usage-billed. Without management, generating 10 videos a day adds up to a meaningful monthly expense. Here are the strategies that have made the biggest difference for me.

Prototype prompts with text-only generation first

Before calling Veo 3, use Gemini 2.5 Pro to generate 5–10 variations of your video prompt and evaluate which one is most likely to produce the output you want. Text generation costs a fraction of video generation. Iterating on prompts in text is essentially free compared to iterating in video.

Match music duration to video duration plus a small buffer

Generate music at video_duration + 2 seconds, not at the maximum 210 seconds. The API bills by duration — there's no reason to generate 3 minutes of music and trim it to 8 seconds.

Cache generated assets by prompt hash

For content that reuses similar styles (the same genre of background music, the same visual aesthetic), cache outputs keyed on a hash of the prompt. A simple file-based cache that checks for cache/{hash}.mp4 before making an API call eliminates redundant generation entirely.
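
A file-based cache along those lines might look like the following. This is a sketch under assumptions: cache_path and get_or_generate are hypothetical helpers, the generator is a fake that returns bytes, and the key is a SHA-256 of the prompt text alone.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cache_path(prompt: str, cache_dir: Path, ext: str = "mp4") -> Path:
    """Map a prompt to {cache_dir}/{sha256-of-prompt}.{ext}."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.{ext}"


def get_or_generate(
    prompt: str, cache_dir: Path, generate: Callable[[str], bytes]
) -> Path:
    """Return the cached file for prompt, calling generate() only on a miss."""
    path = cache_path(prompt, cache_dir)
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(generate(prompt))
    return path


# Demo with a fake generator that records how often it is called
calls: list[str] = []


def fake_generate(prompt: str) -> bytes:
    calls.append(prompt)
    return b"fake video bytes"


with tempfile.TemporaryDirectory() as tmp:
    first = get_or_generate("misty forest at dawn", Path(tmp), fake_generate)
    second = get_or_generate("misty forest at dawn", Path(tmp), fake_generate)
    same_file = first == second

print(len(calls), same_file)
```

Note that a hash only matches byte-identical prompts, so normalize whitespace and casing before hashing if you want near-duplicates to hit the cache. If you vary duration or aspect ratio, include those settings in the hashed string as well.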

Forecast your monthly cost before scaling

The billing model for each API is documented on the Google AI pricing page. Calculate cost per video × videos per day × 30 before you scale up. It's easy to underestimate, especially if you're generating at 1080p.
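
The forecast is plain multiplication, sketched below. The per-second rates are PLACEHOLDERS invented for this example, not real prices, and the per-second billing shape is an assumption; substitute the figures and units from the pricing page before relying on the result. Function names are hypothetical.

```python
# PLACEHOLDER rates, invented for illustration only. Replace with real pricing.
PLACEHOLDER_VIDEO_RATE = 0.50  # currency units per second of video
PLACEHOLDER_MUSIC_RATE = 0.25  # currency units per second of music


def per_video_cost(
    video_seconds: float,
    music_seconds: float,
    video_rate: float = PLACEHOLDER_VIDEO_RATE,
    music_rate: float = PLACEHOLDER_MUSIC_RATE,
) -> float:
    """Cost of one finished clip, assuming both APIs bill per second."""
    return video_seconds * video_rate + music_seconds * music_rate


def monthly_cost(cost_per_video: float, videos_per_day: int, days: int = 30) -> float:
    """cost per video × videos per day × days."""
    return cost_per_video * videos_per_day * days


clip = per_video_cost(video_seconds=8, music_seconds=10)
print(clip, monthly_cost(clip, videos_per_day=10))
```

Retries and discarded outputs are billed too, so it is worth multiplying the result by a safety factor based on the failure rate you observe in your own batch logs.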

What to Do Next

The simplest first step is to run generate_video() in isolation with a short test prompt and confirm you get a file back. Once that works, add generate_music() and verify the audio file. Then integrate merge_video_audio() and run the full run_pipeline() function end to end.

Once the basic pipeline is solid, the natural next step is feeding it from a content schedule — a spreadsheet or a database of topic prompts — and running it on a cron job. Pair it with an upload script for YouTube Shorts or Instagram Reels and you have a fully automated short-video content operation.

Both Veo 3 and Lyria 3 Pro are evolving quickly. Model versions will update, API interfaces may change, and quotas will expand. The architecture in this guide is designed to be model-agnostic — swap in new model name constants in config.py and the rest of the pipeline follows.

Building a Batch Content Scheduler

Once the single-video pipeline is working reliably, the next practical step is scaling it to process a queue of content items automatically. Here's a scheduler that reads prompts from a JSON file and processes them sequentially with proper rate limiting.

# scheduler.py — batch pipeline runner with queue and logging
import asyncio
import json
import logging
import time
from datetime import datetime
from pathlib import Path
from music_generator import build_music_prompt  # defined here, not in pipeline.py
from pipeline import run_pipeline
 
logger = logging.getLogger(__name__)
 
LOG_DIR = Path("logs")
LOG_DIR.mkdir(exist_ok=True)
 
# How many seconds to wait between jobs to avoid rate limit 429s
# Adjust based on your quota tier
INTER_JOB_DELAY_SECONDS = 90
 
def load_job_queue(queue_file: str) -> list[dict]:
    """
    Load the content queue from a JSON file.
 
    Expected format:
    [
      {
        "name": "autumn_garden",
        "video_prompt": "...",
        "music_genre": "ambient",
        "music_mood": "calm",
        "music_tempo": "slow",
        "music_instruments": ["piano", "strings"]
      },
      ...
    ]
    """
    with open(queue_file, "r", encoding="utf-8") as f:
        jobs = json.load(f)
    logger.info(f"Loaded {len(jobs)} jobs from {queue_file}")
    return jobs
 
async def process_queue(queue_file: str = "content_queue.json") -> list[dict]:
    """
    Process all jobs in the queue file, one at a time with rate-limit spacing.
 
    Returns a list of result dicts for downstream logging or alerting.
    """
    jobs = load_job_queue(queue_file)
    results = []
    log_file = LOG_DIR / f"batch_{datetime.now().strftime('%Y%m%d_%H%M%S')}.jsonl"
 
    for i, job in enumerate(jobs, start=1):
        logger.info(f"\n{'='*60}")
        logger.info(f"Job {i}/{len(jobs)}: {job['name']}")
 
        music_prompt = build_music_prompt(
            genre=job.get("music_genre", "ambient"),
            mood=job.get("music_mood", "calm"),
            tempo=job.get("music_tempo", "moderate"),
            instruments=job.get("music_instruments"),
        )
 
        result = await run_pipeline(
            video_prompt=job["video_prompt"],
            music_prompt=music_prompt,
            output_name=job["name"],
        )
 
        result["job_name"] = job["name"]
        result["timestamp"] = datetime.utcnow().isoformat()
        results.append(result)
 
        # Append to JSONL log for easy parsing
        with open(log_file, "a", encoding="utf-8") as lf:
            lf.write(json.dumps(result, ensure_ascii=False) + "\n")
 
        status_icon = "✅" if result["status"] == "SUCCESS" else "❌"
        logger.info(f"{status_icon} {job['name']}: {result['status']}")
 
        # Rate-limit spacing — skip delay after the last job
        if i < len(jobs):
            logger.info(f"Waiting {INTER_JOB_DELAY_SECONDS}s before next job...")
            await asyncio.sleep(INTER_JOB_DELAY_SECONDS)
 
    successes = sum(1 for r in results if r["status"] == "SUCCESS")
    logger.info(f"\n🏁 Batch complete: {successes}/{len(jobs)} succeeded")
    logger.info(f"Log written to: {log_file}")
    return results
 
if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[
            logging.StreamHandler(),
            logging.FileHandler(LOG_DIR / "scheduler.log"),
        ],
    )
    asyncio.run(process_queue("content_queue.json"))

A sample content_queue.json file for testing:

[
  {
    "name": "morning_forest",
    "video_prompt": "Dense forest at dawn, sunlight breaking through tall cedar trees, mist rising from the ground, birds visible in the canopy, cinematic wide shot, 4K quality",
    "music_genre": "acoustic",
    "music_mood": "uplifting and serene",
    "music_tempo": "slow",
    "music_instruments": ["acoustic guitar", "light percussion", "flute"]
  },
  {
    "name": "ocean_timelapse",
    "video_prompt": "Ocean waves crashing against dark volcanic rocks at golden hour, seafoam swirling in slow motion, warm orange and purple sky, ultra high definition",
    "music_genre": "cinematic orchestral",
    "music_mood": "epic and expansive",
    "music_tempo": "moderate",
    "music_instruments": ["strings", "piano", "french horn"]
  }
]

The INTER_JOB_DELAY_SECONDS = 90 setting is conservative for most quota tiers. You can reduce it to 60 if you're on a paid plan with higher rate limits, or increase it to 120 if you're on the free tier. The JSONL log format is intentionally machine-readable so you can feed it into a monitoring dashboard or a webhook that alerts you when the batch finishes.
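
Before choosing a delay, it can help to estimate how long a whole batch will take. A rough sketch: batch_runtime_seconds is a hypothetical helper, and the 87-second per-job figure is the average wall time mentioned earlier in this guide, which will differ in your environment.

```python
def batch_runtime_seconds(
    n_jobs: int, per_job_seconds: float, delay_seconds: float
) -> float:
    """Total wall time: every job runs, with a delay between jobs but not after the last."""
    if n_jobs <= 0:
        return 0.0
    return n_jobs * per_job_seconds + (n_jobs - 1) * delay_seconds


total = batch_runtime_seconds(10, per_job_seconds=87, delay_seconds=90)
print(f"{total:.0f}s ≈ {total / 60:.0f} minutes")
```

For ten jobs this comes to 1,680 seconds, or about 28 minutes, of which nearly half is idle delay. That ratio is the main thing to look at when deciding whether a shorter delay is worth the risk of 429s.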

Quality Control: Filtering Out Failed or Low-Quality Outputs

The pipeline as written handles hard failures (API errors, timeouts) but not soft failures — videos that generated successfully but don't look right. For automated content production, some degree of automated quality filtering is useful before the files hit your publishing queue.

The practical approach I use is a two-stage filter. The first stage checks file size: Veo 3 generates approximately 5–10 MB per second of 1080p video, so an 8-second output should be 40–80 MB. Files outside this range usually indicate a generation failure that wasn't caught as an explicit error (very small files are often glitched outputs; very large files are rare but can happen with high-motion scenes).

# quality_filter.py — basic output validation
import logging
from pathlib import Path
 
logger = logging.getLogger(__name__)
 
# Loose sanity bounds for an 8-second 1080p Veo 3 output, in bytes.
# Typical files are 40–80 MB; these limits only catch obvious outliers.
MIN_VIDEO_SIZE_BYTES = 5 * 1024 * 1024   # 5 MB
MAX_VIDEO_SIZE_BYTES = 150 * 1024 * 1024 # 150 MB
 
def validate_video_output(video_path: Path, duration_seconds: int = 8) -> bool:
    """
    Run basic validation checks on a generated video file.
 
    Checks:
    1. File exists and is non-empty
    2. File size is within the expected range for the given duration
    3. File extension is .mp4
 
    Args:
        video_path: Path to the video file to validate
        duration_seconds: Expected video duration for size range scaling
 
    Returns:
        True if the file passes all checks, False otherwise.
    """
    if not video_path.exists():
        logger.warning(f"Validation FAIL: file does not exist: {video_path}")
        return False
 
    size = video_path.stat().st_size
    scaled_min = MIN_VIDEO_SIZE_BYTES * (duration_seconds / 8)
    scaled_max = MAX_VIDEO_SIZE_BYTES * (duration_seconds / 8)
 
    if size < scaled_min:
        logger.warning(
            f"Validation FAIL: file too small ({size // 1024}KB < {scaled_min // 1024:.0f}KB min). "
            "Likely a glitched or empty output."
        )
        return False
 
    if size > scaled_max:
        logger.warning(
            f"Validation FAIL: file unexpectedly large ({size // (1024*1024)}MB > {scaled_max // (1024*1024):.0f}MB max)."
        )
        return False
 
    if video_path.suffix.lower() != ".mp4":
        logger.warning(f"Validation FAIL: unexpected extension: {video_path.suffix}")
        return False
 
    logger.info(f"Validation PASS: {video_path.name} ({size // (1024*1024)}MB)")
    return True

Plug this into run_pipeline() after the merge step:

from quality_filter import validate_video_output
 
# In run_pipeline(), after the temp-file cleanup and before the final `if output_path:` check:
if output_path and not validate_video_output(output_path, duration_seconds=video_duration):
    logger.warning(f"Output failed quality validation: {output_path}")
    output_path.unlink(missing_ok=True)  # Remove the bad file
    return {"status": "FAILED", "reason": "quality validation failed"}

The second stage — visual content review — requires human judgment or a vision model. If you're building a fully automated publishing pipeline, consider routing outputs through a Gemini vision check that verifies the video content matches the intended theme before it gets published. A quick client.models.generate_content() call with a frame extracted from the video and a prompt like "Does this video show [intended subject]? Answer yes or no." adds a meaningful safety net for a fraction of a cent per check.
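
The model call itself depends on your client setup, but the part that decides what to do with the reply is easy to get wrong and easy to test. A minimal sketch, assuming the reply arrives as plain text; parse_yes_no and should_publish are hypothetical helpers, and anything other than a clear "yes" is treated as a reason to hold the video back.

```python
from typing import Optional


def parse_yes_no(answer: str) -> Optional[bool]:
    """Interpret a model reply as yes (True), no (False), or unclear (None)."""
    words = answer.strip().lower().replace(".", " ").replace(",", " ").split()
    if not words:
        return None
    if words[0] == "yes":
        return True
    if words[0] == "no":
        return False
    return None  # anything else is treated as unclear, not as a pass


def should_publish(answer: str) -> bool:
    """Publish only on an unambiguous yes."""
    return parse_yes_no(answer) is True


for reply in ["Yes.", "No, it shows a city street.", "It might be a garden", ""]:
    print(repr(reply), parse_yes_no(reply), should_publish(reply))
```

Failing closed on unclear replies costs a few extra manual reviews, which is usually cheaper than publishing an off-theme video.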

Wrapping Up

Start small. Run generate_video() standalone with a simple test prompt and confirm you get back a file. Then test generate_music() independently. Then wire them together with merge_video_audio(). Only after each piece works in isolation should you run the full run_pipeline() end to end.

The error handling and retry logic in this guide will catch most failure modes automatically. The pitfalls section covers the silent failures that the retry logic can't catch — make sure to read those carefully before you run any unattended batch jobs.

When you're ready to scale, the batch scheduler gives you a straightforward path from single-video testing to queue-based production. Pair it with a cron job, a cloud storage upload step, and a social media scheduling tool, and you have a content production system that runs without manual intervention.

One final note on model versions: both veo-3.0-generate-preview and lyria-3.0-pro are current as of April 2026. Google releases new model versions regularly. When they do, update the constants in config.py — the rest of the pipeline will follow without changes.
