Building an Automated Content Pipeline with Veo 3 & Lyria 3 Pro API — Mass-Producing Video + Music
Learn how to combine Veo 3 and Lyria 3 Pro APIs to automatically generate and merge video and music from text prompts. Covers setup, production-ready Python code, error handling, common pitfalls, and cost optimization strategies.
If you've tried to produce short-form video content at scale as a solo creator, you've probably run into the same wall I did: generating the visuals is one problem, generating decent background music is another, and stitching them together is a third. Doing all three manually for ten videos a day is tedious. Doing it for a hundred is impossible.
Veo 3 and Lyria 3 Pro are now both available through the Google Gen AI API, which means the entire workflow — text prompt → video → music → merged output — can run in a single Python script with no human in the loop. In this guide I'll walk through the production pipeline I've built and currently run, including the parts that took me the longest to get right.
Fair warning: there are several undocumented behaviors in both APIs that will silently break your pipeline if you don't account for them. I'll call each one out explicitly.
What You Need to Know About These Two APIs Before Starting
Veo 3 is Google's video generation model. Given a text prompt (or an image), it produces up to 8 seconds of video. You call it via client.models.generate_video() in the Google Gen AI Python SDK. The response is an Operation — an asynchronous job reference — not the video itself. You have to poll until the job completes, then download the output.
Lyria 3 Pro is Google's music generation model. You describe the style, genre, mood, instruments, and tempo in a text prompt, and it generates up to 3 minutes and 30 seconds of music. Like Veo 3, it's async, though music generation typically finishes faster.
A few facts worth having upfront as of April 2026:
Veo 3: Maximum 8 seconds per video, supports 16:9 and 9:16 aspect ratios, billed by resolution and duration
Lyria 3 Pro: Maximum 210 seconds per track, WAV or MP3 output, stereo, detailed style control via prompt
Quota independence: The two APIs have separate quotas. Hitting the Veo 3 rate limit does not affect Lyria, and vice versa. Design your pipeline to handle each independently.
The most important architectural decision you'll make is whether to run the two generations sequentially or in parallel. Sequential is simpler to implement. Parallel is roughly 40% faster in wall-clock time because Veo 3 generation (60–120 seconds typical) and Lyria generation (20–60 seconds typical) can overlap. This guide uses the parallel approach.
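As a rough check on the timing claim, the sketch below compares sequential and parallel wall-clock time across the typical ranges quoted above. The function name `wall_clock` is made up for illustration, and the model assumes the two generations overlap perfectly with no other overhead.

```python
def wall_clock(video_s: float, music_s: float) -> dict:
    """Compare sequential and parallel wall-clock time for one video."""
    sequential = video_s + music_s      # one generation after the other
    parallel = max(video_s, music_s)    # both overlap; the slower one dominates
    saving = 1 - parallel / sequential  # fraction of wall-clock time saved
    return {
        "sequential": sequential,
        "parallel": parallel,
        "saving": round(saving, 3),
    }


# Corners of the typical ranges: Veo 3 at 60-120 s, Lyria at 20-60 s
for video_s, music_s in [(60, 20), (60, 60), (120, 20), (120, 60)]:
    print(video_s, music_s, wall_clock(video_s, music_s))
```

The saving depends on how close the two durations are: it is largest when music takes about as long as video and smallest when music finishes far earlier.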
Pipeline Architecture Overview
Here's the shape of what we're building:
Input: A content description, split into a video prompt and a music prompt
Step 1: Veo 3 generates the video file (async, polled to completion)
Step 2: Lyria 3 Pro generates the music file (async, run in parallel with Step 1)
Step 3: ffmpeg merges the two files, adjusts audio volume, applies a fade-out
Output: A finished .mp4 saved to the output directory
Each step has its own retry logic with exponential backoff. The pipeline returns a structured result dict so you can log successes and failures and feed the output into downstream automation (social media scheduling, CDN upload, etc.).
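The retry-with-backoff idea can be sketched as a small standalone helper. This is an illustration, not the code the pipeline modules use: `retry_with_backoff` and its parameters are hypothetical names, and the `sleep` argument is injectable only so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fn() up to max_retries times, doubling the wait after each failure."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_retries:
                print(f"Giving up after {attempt} attempts: {exc}")
                return None
            delay = base_delay * 2 ** attempt  # 10s, 20s, ... when base_delay is 5
            sleep(delay)
    return None


# Demo: a call that fails twice, then succeeds. Delays are recorded, not slept.
attempts = {"count": 0}
recorded_delays: list[float] = []


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


outcome = retry_with_backoff(flaky_call, sleep=recorded_delays.append)
print(outcome, recorded_delays)
```

Returning None on final failure rather than raising matches the convention the generation functions in this guide follow, so the caller can turn it into a FAILED result.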
The project configuration and authentication setup:
# config.py: project configuration and client initialization
import os

from dotenv import load_dotenv
from google import genai

load_dotenv()


def get_client() -> genai.Client:
    """
    Initialize and return the Google Gen AI client.
    API key is read from the GOOGLE_AI_API_KEY environment variable.
    """
    api_key = os.getenv("GOOGLE_AI_API_KEY")
    if not api_key:
        raise ValueError(
            "GOOGLE_AI_API_KEY is not set. "
            "Add GOOGLE_AI_API_KEY=your_key_here to your .env file."
        )
    return genai.Client(api_key=api_key)


# Model name constants: update these when new versions release
VEO3_MODEL = "veo-3.0-generate-preview"
LYRIA3_MODEL = "lyria-3.0-pro"

# Output directories
OUTPUT_DIR = "output"
TEMP_DIR = "temp"
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(TEMP_DIR, exist_ok=True)
Your .env file should contain:
GOOGLE_AI_API_KEY=YOUR_GOOGLE_AI_API_KEY
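Never hardcode the API key in your source code. If it ends up in a public Git repository, treat it as compromised: automated scrapers harvest exposed keys quickly, and GitHub's secret scanning may report the key to Google, which can then disable it. Keep .env listed in your .gitignore.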
Generating Video with the Veo 3 API
The video generation module handles the full lifecycle: submitting the request, polling until completion, and downloading the result.
# video_generator.py: Veo 3 video generation
import logging
import time
from pathlib import Path

from google import genai
from google.genai import types

from config import get_client, VEO3_MODEL, TEMP_DIR

logger = logging.getLogger(__name__)


def generate_video(
    prompt: str,
    output_filename: str,
    aspect_ratio: str = "16:9",
    duration_seconds: int = 8,
    max_retries: int = 3,
    poll_interval: int = 10,
    timeout_seconds: int = 300,
) -> Path | None:
    """
    Generate a video from a text prompt using Veo 3 and save it to disk.

    Args:
        prompt: Text describing the video content (English yields best results)
        output_filename: Output file name without extension
        aspect_ratio: "16:9" for landscape, "9:16" for portrait/Shorts
        duration_seconds: Video length in seconds (max 8)
        max_retries: Number of retry attempts on failure
        poll_interval: Seconds between polling requests
        timeout_seconds: Total timeout before giving up

    Returns:
        Path to the saved video file, or None on failure.
    """
    client = get_client()
    output_path = Path(TEMP_DIR) / f"{output_filename}.mp4"

    for attempt in range(1, max_retries + 1):
        try:
            logger.info(f"Veo 3 generation started (attempt {attempt}/{max_retries}): {prompt[:60]}...")

            # Submit the generation request; this returns an Operation, not the video
            operation = client.models.generate_video(
                model=VEO3_MODEL,
                prompt=prompt,
                config=types.GenerateVideoConfig(
                    aspect_ratio=aspect_ratio,
                    duration_seconds=duration_seconds,
                    number_of_videos=1,
                    enhance_prompt=True,  # Let the model improve the prompt automatically
                ),
            )

            # Poll until done or timeout
            start_time = time.time()
            while not operation.done:
                elapsed = time.time() - start_time
                if elapsed > timeout_seconds:
                    raise TimeoutError(
                        f"Video generation timed out after {timeout_seconds}s. "
                        "This often means the prompt triggered a safety filter without "
                        "raising an explicit error. Try rephrasing the prompt."
                    )
                logger.info(f"  Generating... {elapsed:.0f}s elapsed")
                time.sleep(poll_interval)
                operation = client.operations.get(operation)

            # Check for explicit errors
            if operation.error:
                raise RuntimeError(f"Generation error: {operation.error.message}")

            # Download and save the video
            video = operation.result.generated_videos[0]
            video_bytes = client.files.download(file=video.video)
            with open(output_path, "wb") as f:
                f.write(video_bytes)

            size_kb = output_path.stat().st_size // 1024
            logger.info(f"✅ Video saved: {output_path} ({size_kb} KB)")
            return output_path

        except TimeoutError as e:
            logger.warning(f"⚠️ Timeout (attempt {attempt}): {e}")
            if attempt == max_retries:
                logger.error("Max retries reached, skipping this video")
                return None

        except Exception as e:
            logger.error(f"❌ Error (attempt {attempt}): {type(e).__name__}: {e}")
            if attempt < max_retries:
                wait = 2 ** attempt * 5  # 10s → 20s → 40s
                logger.info(f"  Retrying in {wait}s...")
                time.sleep(wait)
            else:
                logger.error("Max retries reached, skipping this video")
                return None

    return None
One thing worth calling out: enhance_prompt=True tells Veo 3 to automatically expand your prompt with additional detail. This improves generation quality noticeably when your prompts are short or abstract. For fine-grained control over exactly what appears in the video, set it to False and write a detailed prompt yourself — around 3–5 sentences describing composition, lighting, movement, and style.
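When writing the detailed prompt yourself, a small helper keeps the four aspects (composition, lighting, movement, style) from being forgotten. This is only a sketch: `build_video_prompt` is a hypothetical helper, and the one-sentence-per-aspect layout is an assumption, not a format Veo 3 is documented to require.

```python
def build_video_prompt(
    subject: str,
    composition: str,
    lighting: str,
    movement: str,
    style: str,
) -> str:
    """Join subject, composition, lighting, movement, and style into one prompt."""
    sentences = [subject, composition, lighting, movement, style]
    # Drop empty parts and trailing periods so the joined text stays clean
    cleaned = [s.strip().rstrip(".") for s in sentences if s and s.strip()]
    return ". ".join(cleaned) + "."


prompt = build_video_prompt(
    subject="A serene Japanese garden in autumn with a koi pond",
    composition="Wide shot framed by maple branches",
    lighting="Soft morning light filtering through bamboo",
    movement="Leaves fall slowly onto the water surface",
    style="Cinematic, shallow depth of field",
)
print(prompt)
```

Pass the result as `prompt` with enhance_prompt set to False so the model works from your wording rather than its own expansion.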
Generating Music with the Lyria 3 Pro API
The music generation module follows a similar pattern. I've also included a prompt builder helper that structures the musical specification in a format Lyria responds well to.
# music_generator.py: Lyria 3 Pro music generation
import logging
import time
from pathlib import Path

from google import genai
from google.genai import types

from config import get_client, LYRIA3_MODEL, TEMP_DIR

logger = logging.getLogger(__name__)


def generate_music(
    prompt: str,
    output_filename: str,
    duration_seconds: int = 30,
    output_format: str = "mp3",
    max_retries: int = 3,
) -> Path | None:
    """
    Generate music from a text prompt using Lyria 3 Pro and save it to disk.

    Args:
        prompt: Text describing the musical style, mood, and instrumentation
        output_filename: Output file name without extension
        duration_seconds: Track length in seconds (max 210)
        output_format: "mp3" or "wav"
        max_retries: Number of retry attempts on failure

    Returns:
        Path to the saved audio file, or None on failure.
    """
    client = get_client()
    output_path = Path(TEMP_DIR) / f"{output_filename}.{output_format}"

    for attempt in range(1, max_retries + 1):
        try:
            logger.info(f"Lyria 3 Pro generation started (attempt {attempt}/{max_retries})")

            response = client.models.generate_music(
                model=LYRIA3_MODEL,
                prompt=prompt,
                config=types.GenerateMusicConfig(
                    duration_seconds=duration_seconds,
                    output_format=output_format,
                    seamless_loop=False,
                ),
            )

            if not response.audio_data:
                raise ValueError("Response returned empty audio data")

            with open(output_path, "wb") as f:
                f.write(response.audio_data)

            logger.info(f"✅ Music saved: {output_path}")
            return output_path

        except Exception as e:
            logger.error(f"❌ Error (attempt {attempt}): {type(e).__name__}: {e}")
            if attempt < max_retries:
                wait = 2 ** attempt * 3  # 6s → 12s → 24s
                logger.info(f"  Retrying in {wait}s...")
                time.sleep(wait)
            else:
                logger.error("Max retries reached, skipping this track")
                return None

    return None


def build_music_prompt(
    genre: str,
    mood: str,
    tempo: str = "moderate",
    instruments: list[str] | None = None,
    no_lyrics: bool = True,
) -> str:
    """
    Build a structured music prompt that Lyria 3 Pro responds well to.

    Lyria performs best with explicit, comma-separated style descriptors
    rather than natural language sentences. This function assembles them
    in the right order.

    Example:
        build_music_prompt("ambient", "peaceful", "slow", ["koto", "piano"])
        → "ambient music, peaceful mood, slow tempo, featuring koto, piano,
           instrumental, no vocals, no lyrics, high quality,
           professional recording, studio quality"
    """
    parts = [f"{genre} music", f"{mood} mood", f"{tempo} tempo"]
    if instruments:
        parts.append(f"featuring {', '.join(instruments)}")
    if no_lyrics:
        parts.append("instrumental, no vocals, no lyrics")
    parts.append("high quality, professional recording, studio quality")
    return ", ".join(parts)
One practical note on music duration: set it to the video duration plus two seconds rather than generating a long track and trimming it. Over-generating wastes API quota. If your video is 8 seconds, generate 10 seconds of music — the extra 2 seconds gives the ffmpeg fade-out headroom without wasting budget.
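The duration rule is simple enough to put in a helper so every caller applies it the same way. A minimal sketch: `music_duration_for` is a hypothetical name, and the 210-second cap is the Lyria 3 Pro maximum quoted earlier in this guide.

```python
LYRIA_MAX_SECONDS = 210  # maximum track length quoted earlier in this guide


def music_duration_for(video_seconds: int, fade_headroom: int = 2) -> int:
    """Return the music length to request: video length plus fade-out headroom."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    # Never request more than the model's maximum track length
    return min(video_seconds + fade_headroom, LYRIA_MAX_SECONDS)


print(music_duration_for(8))
```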
The Full Integration Pipeline
Here's the complete pipeline that runs both generations in parallel and merges the results:
# pipeline.py: the full automated pipeline
import asyncio
import logging
from datetime import datetime
from pathlib import Path

import ffmpeg

from config import OUTPUT_DIR, TEMP_DIR
from video_generator import generate_video
from music_generator import generate_music, build_music_prompt

logger = logging.getLogger(__name__)


def merge_video_audio(
    video_path: Path,
    audio_path: Path,
    output_filename: str,
    audio_volume: float = 0.5,
    fade_out_seconds: float = 1.0,
) -> Path | None:
    """
    Merge a video file and an audio file using ffmpeg.

    - Audio is trimmed to match the video duration
    - A fade-out is applied to the last `fade_out_seconds` of audio
    - Volume is controlled by `audio_volume` (0.0–1.0)
    - Video stream is copied without re-encoding (fast)
    - Audio is encoded as AAC 192k
    """
    output_path = Path(OUTPUT_DIR) / f"{output_filename}.mp4"

    try:
        probe = ffmpeg.probe(str(video_path))
        video_duration = float(probe["format"]["duration"])

        video_in = ffmpeg.input(str(video_path))
        audio_in = ffmpeg.input(str(audio_path))

        fade_start = max(0.0, video_duration - fade_out_seconds)
        audio_processed = (
            audio_in.audio
            .filter("atrim", duration=video_duration)
            .filter("asetpts", "PTS-STARTPTS")
            .filter("volume", audio_volume)
            .filter("afade", type="out", start_time=fade_start, duration=fade_out_seconds)
        )

        out = ffmpeg.output(
            video_in.video,
            audio_processed,
            str(output_path),
            vcodec="copy",
            acodec="aac",
            audio_bitrate="192k",
        )
        ffmpeg.run(out, overwrite_output=True, quiet=True)

        logger.info(f"✅ Merge complete: {output_path}")
        return output_path

    except ffmpeg.Error as e:
        logger.error(f"❌ ffmpeg error: {e.stderr.decode()}")
        return None


async def run_pipeline(
    video_prompt: str,
    music_prompt: str,
    output_name: str,
    video_duration: int = 8,
    music_duration: int = 10,
) -> dict:
    """
    Run video and music generation in parallel, then merge the results.

    Returns a result dict with keys: status, output_path, elapsed_seconds
    """
    loop = asyncio.get_running_loop()
    start_time = loop.time()
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    base = f"{output_name}_{timestamp}"

    logger.info(f"🎬 Pipeline started: {output_name}")

    # Run both blocking generation calls concurrently in worker threads
    video_task = loop.run_in_executor(
        None, generate_video, video_prompt, f"{base}_video", "16:9", video_duration
    )
    music_task = loop.run_in_executor(
        None, generate_music, music_prompt, f"{base}_music", music_duration
    )
    video_path, music_path = await asyncio.gather(video_task, music_task)

    if not video_path:
        return {"status": "FAILED", "reason": "video generation failed"}
    if not music_path:
        return {"status": "FAILED", "reason": "music generation failed"}

    output_path = merge_video_audio(video_path, music_path, base)
    elapsed = round(loop.time() - start_time, 1)

    # Clean up temp files
    video_path.unlink(missing_ok=True)
    music_path.unlink(missing_ok=True)

    if output_path:
        return {"status": "SUCCESS", "output_path": str(output_path), "elapsed_seconds": elapsed}
    return {"status": "FAILED", "reason": "merge failed"}


# ── Example usage ──────────────────────────────────────────────────────────
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

    result = asyncio.run(run_pipeline(
        video_prompt=(
            "A serene Japanese garden in autumn, koi pond with red and orange maple leaves "
            "falling gently onto the water surface, soft morning light filtering through bamboo, "
            "cinematic 4K quality, slow motion"
        ),
        music_prompt=build_music_prompt(
            genre="ambient",
            mood="peaceful and contemplative",
            tempo="slow",
            instruments=["koto", "shakuhachi flute", "gentle piano"],
        ),
        output_name="japanese_garden",
        video_duration=8,
        music_duration=10,
    ))
    print(result)
    # Expected output (times will vary):
    # {'status': 'SUCCESS', 'output_path': 'output/japanese_garden_20260418_104500.mp4', 'elapsed_seconds': 87.3}
The key to the parallel execution is asyncio.gather(). Video generation typically takes 60–120 seconds; music takes 20–60. Run them sequentially and you're waiting up to 3 minutes per video. Run them in parallel and total wall time drops to roughly the Veo 3 generation time — around 87 seconds on average in my setup.
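The overlap is easy to see with stand-in functions that only sleep. In this sketch `fake_generation` is a placeholder for the blocking generate calls, and `asyncio.to_thread` is used as a shorter equivalent of running a function in the default executor.

```python
import asyncio
import time


def fake_generation(name: str, seconds: float) -> str:
    """Stand-in for a blocking API call plus polling."""
    time.sleep(seconds)
    return name


async def run_both() -> tuple[list[str], float]:
    start = time.monotonic()
    # Each blocking call runs in its own worker thread, so the waits overlap
    results = await asyncio.gather(
        asyncio.to_thread(fake_generation, "video", 0.3),
        asyncio.to_thread(fake_generation, "music", 0.1),
    )
    return list(results), time.monotonic() - start


results, elapsed = asyncio.run(run_both())
print(results, f"{elapsed:.2f}s")
```

The elapsed time lands close to the longer of the two sleeps rather than their sum, and `gather` returns results in the order the tasks were passed in, not the order they finished.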
Common Pitfalls and How to Fix Them
These are the issues that aren't documented and will quietly break your pipeline.
Pitfall 1: The Operation's done flag never becomes True
If your prompt contains content that triggers Veo 3's safety filters — specific real people's names, copyrighted character names, certain violent or explicit themes — the operation sometimes enters a state where operation.done stays False indefinitely and operation.error is also None. You'll sit in your polling loop until the heat death of the universe.
The fix is straightforward: always set a timeout. The timeout_seconds=300 parameter in generate_video() handles this. After timeout, log the failure and move on. If you keep hitting timeouts on the same prompt, rewrite it to avoid the triggering content.
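The timeout pattern can be isolated from the SDK entirely. The sketch below is a generic deadline-bounded polling loop; `poll_until` and its parameters are hypothetical names, and the clock and sleep functions are injectable only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def poll_until(
    is_done: Callable[[], bool],
    timeout_seconds: float,
    poll_interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_done() until it returns True or the deadline passes.

    Returns True on completion and False on timeout.
    """
    deadline = clock() + timeout_seconds
    while not is_done():
        if clock() >= deadline:
            return False  # never completed: treat as a failed job
        sleep(poll_interval)
    return True


# Demo with a fake clock: a job that never finishes hits the 300 s deadline
fake_now = [0.0]
timed_out = not poll_until(
    is_done=lambda: False,
    timeout_seconds=300,
    poll_interval=10,
    sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s),
    clock=lambda: fake_now[0],
)
print(timed_out, fake_now[0])
```

Using a monotonic clock for the deadline avoids surprises if the system time is adjusted while a batch is running.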
Pitfall 2: Lyria generates noise or a single sustained tone instead of music
This happens in two situations. First, contradictory style descriptors in the prompt ("upbeat and melancholic," "fast and slow") confuse the model and produce incoherent output. Keep descriptors internally consistent. Second, very short durations (under 15 seconds) don't always give the model enough time to develop a coherent musical phrase, and the output comes out truncated and awkward. If a 10-second track for an 8-second video shows this problem, raise the duration to at least 15 seconds and let the ffmpeg atrim step cut it back to the video length.
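A cheap pre-flight check can catch the contradictory-descriptor case before a request is spent on it. This is a sketch under a loud assumption: the pair list below is illustrative, not a list of terms Lyria is known to reject, and you would extend it from your own failures.

```python
import re

# Illustrative pairs only; extend this from prompts that produced bad output
CONTRADICTORY_PAIRS = [
    ("upbeat", "melancholic"),
    ("fast", "slow"),
    ("calm", "aggressive"),
]


def find_contradictions(prompt: str) -> list[tuple[str, str]]:
    """Return every known contradictory pair where both words appear in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return [(a, b) for a, b in CONTRADICTORY_PAIRS if a in words and b in words]


print(find_contradictions("upbeat and melancholic piano, slow tempo"))
print(find_contradictions("ambient music, peaceful mood, slow tempo"))
```

Matching on whole words keeps "slow" from triggering on something like "slowly", which is a deliberate trade-off: the check is conservative and will miss synonyms.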
Pitfall 3: The merged video has no audio
When you use vcodec="copy" in ffmpeg, you're copying the video stream without re-encoding, and the merge maps only that stream (video_in.video) from the Veo 3 file. Any audio track embedded in the Veo 3 output is dropped by design, so the final audio comes entirely from Lyria. When the merged file is silent, the problem usually comes from the audio side: if Lyria outputs WAV at an unusual sample rate, ffmpeg may silently discard the audio stream rather than raising an error.
Two fixes: either request MP3 output from Lyria (more universally compatible), or add -ar 44100 to your ffmpeg output options to resample to a standard rate. The MP3 approach is simpler.
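For reference, here is what the resampling option looks like as a plain ffmpeg command line, built as an argument list for subprocess. It is a simplified sketch: `build_merge_command` is a hypothetical helper, and it leaves out the volume and fade filters that merge_video_audio() applies. With ffmpeg-python, passing ar=44100 as a keyword to ffmpeg.output() should produce the same -ar flag.

```python
def build_merge_command(
    video_path: str,
    audio_path: str,
    output_path: str,
    sample_rate: int = 44100,
) -> list[str]:
    """Build an ffmpeg command that muxes video and audio and resamples the audio."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", video_path,         # input 0: the Veo 3 video
        "-i", audio_path,         # input 1: the Lyria track
        "-map", "0:v:0",          # take video only from input 0
        "-map", "1:a:0",          # take audio only from input 1
        "-c:v", "copy",           # copy the video stream without re-encoding
        "-c:a", "aac", "-b:a", "192k",
        "-ar", str(sample_rate),  # resample audio to a standard rate
        "-shortest",              # stop at the end of the shorter stream
        output_path,
    ]


cmd = build_merge_command("temp/clip.mp4", "temp/track.wav", "output/final.mp4")
print(" ".join(cmd))
# To execute: subprocess.run(cmd, check=True)
```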
Pitfall 4: Batch runs hit rate limits after the first few videos
Veo 3 and Lyria each have per-minute request limits. In a tight loop without throttling, you'll hit 429s within the first minute. The exponential backoff in the retry logic handles individual failures, but it doesn't prevent you from hammering the API repeatedly.
Add inter-request spacing in your batch loop. If your limit is N requests per minute, sleep 60 / N seconds between submissions. This is especially important for Lyria, which tends to have a tighter per-minute quota than Veo 3 in my experience.
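The 60 / N rule can be wrapped in a small throttle that only sleeps when the previous request was too recent. A minimal sketch: `Throttle` and `spacing_seconds` are hypothetical names, and the limit of 6 requests per minute is a placeholder, not an actual quota.

```python
import time
from typing import Callable


def spacing_seconds(requests_per_minute: int) -> float:
    """Minimum gap between submissions for a given per-minute limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60 / requests_per_minute


class Throttle:
    """Blocks just long enough to keep submissions at or under the limit."""

    def __init__(
        self,
        requests_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = spacing_seconds(requests_per_minute)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> None:
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
        # Schedule the next slot one interval after this one
        self._next_allowed = max(now, self._next_allowed) + self.interval


print(spacing_seconds(6))  # placeholder limit of 6 requests per minute
```

Call throttle.wait() immediately before each submission. Because the two APIs have separate quotas, one Throttle instance per API is the natural arrangement.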
Pitfall 5: Memory exhaustion when running multiple pipelines concurrently
An 8-second 1080p video from Veo 3 is typically 40–80 MB. client.files.download() loads the entire file into memory before you write it to disk. If you run 5 parallel pipelines, you could have 400 MB of video data in memory simultaneously, which will crash a small cloud instance.
The pattern in generate_video() above writes the bytes to disk immediately after downloading. Don't accumulate them in a list or return them from the function — always write to disk first.
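Capping how many pipelines run at once bounds the peak memory regardless of queue length. The sketch below uses asyncio.Semaphore for that; `run_bounded` and `fake_pipeline` are hypothetical names, and the fake worker only sleeps in place of a real run_pipeline() call.

```python
import asyncio
from typing import Awaitable, Callable


async def run_bounded(
    jobs: list[str],
    worker: Callable[[str], Awaitable[str]],
    max_concurrent: int = 2,
) -> list[str]:
    """Run worker(job) for every job with at most max_concurrent active at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job: str) -> str:
        async with semaphore:  # waits here while max_concurrent jobs are running
            return await worker(job)

    return await asyncio.gather(*(guarded(job) for job in jobs))


# Demo: track how many fake pipelines are active at the same time
state = {"active": 0, "peak": 0}


async def fake_pipeline(name: str) -> str:
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)  # stands in for generation and download
    state["active"] -= 1
    return name.upper()


bounded_results = asyncio.run(
    run_bounded(["a", "b", "c", "d", "e"], fake_pipeline, max_concurrent=2)
)
print(bounded_results, "peak concurrency:", state["peak"])
```

With 40–80 MB per download, a limit of 2 keeps the worst case around 160 MB of video bytes in memory at a time.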
Cost Management and Optimization
Veo 3 and Lyria are usage-billed. Without management, generating 10 videos a day adds up to a meaningful monthly expense. Here are the strategies that have made the biggest difference for me.
Prototype prompts with text-only generation first
Before calling Veo 3, use Gemini 2.5 Pro to generate 5–10 variations of your video prompt and evaluate which one is most likely to produce the output you want. Text generation costs a fraction of video generation. Iterating on prompts in text is essentially free compared to iterating in video.
Match music duration to video duration plus a small buffer
Generate music at video_duration + 2 seconds, not at the maximum 210 seconds. The API bills by duration — there's no reason to generate 3 minutes of music and trim it to 8 seconds.
Cache generated assets by prompt hash
For content that reuses similar styles (the same genre of background music, the same visual aesthetic), cache outputs keyed on a hash of the prompt. A simple file-based cache that checks for cache/{hash}.mp4 before making an API call eliminates redundant generation entirely.
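A file-based cache along these lines takes only a few lines. This is a sketch: `cache_path_for` and `get_or_generate` are hypothetical helpers, and `generate` stands for any callable that returns the asset bytes. Note that hashing the prompt text only matches exact repeats, so if you vary settings such as duration or aspect ratio, those should be folded into the hashed string as well.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_path_for(prompt: str, cache_dir: Path = Path("cache"), suffix: str = ".mp4") -> Path:
    """Map a prompt to a stable file path using a SHA-256 digest."""
    digest = hashlib.sha256(prompt.strip().encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{digest}{suffix}"


def get_or_generate(
    prompt: str,
    generate: Callable[[str], bytes],
    cache_dir: Path = Path("cache"),
    suffix: str = ".mp4",
) -> Path:
    """Return the cached file for this prompt, generating it only on a miss."""
    target = cache_path_for(prompt, cache_dir, suffix)
    if target.exists():
        return target  # cache hit: no API call
    cache_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(generate(prompt))
    return target


print(cache_path_for("ambient music, peaceful mood, slow tempo", suffix=".mp3").name)
```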
Forecast your monthly cost before scaling
The billing model for each API is documented on the Google AI pricing page. Calculate cost per video × videos per day × 30 before you scale up. It's easy to underestimate, especially if you're generating at 1080p.
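The forecast is a one-line formula once you have the rates. In the sketch below, `monthly_cost` is a hypothetical helper that assumes both APIs bill per second of output, and the prices in the example are placeholders chosen for easy arithmetic, not real rates. Take the actual numbers from the pricing page.

```python
def monthly_cost(
    videos_per_day: int,
    video_seconds: float,
    music_seconds: float,
    video_price_per_second: float,
    music_price_per_second: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend: cost per video x videos per day x days."""
    cost_per_video = (
        video_seconds * video_price_per_second
        + music_seconds * music_price_per_second
    )
    return round(cost_per_video * videos_per_day * days, 2)


# Placeholder prices (NOT real rates): 0.50 per video second, 0.01 per music second
estimate = monthly_cost(
    videos_per_day=10,
    video_seconds=8,
    music_seconds=10,
    video_price_per_second=0.50,
    music_price_per_second=0.01,
)
print(estimate)
```

With these placeholder rates, each video costs 8 × 0.50 + 10 × 0.01 = 4.10, so ten a day for 30 days comes to 1230. The video term dominates, which is why resolution and clip length matter far more than music length.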
What to Do Next
The simplest first step is to run generate_video() in isolation with a short test prompt and confirm you get a file back. Once that works, add generate_music() and verify the audio file. Then integrate merge_video_audio() and run the full run_pipeline() function end to end.
Once the basic pipeline is solid, the natural next step is feeding it from a content schedule — a spreadsheet or a database of topic prompts — and running it on a cron job. Pair it with an upload script for YouTube Shorts or Instagram Reels and you have a fully automated short-video content operation.
Both Veo 3 and Lyria 3 Pro are evolving quickly. Model versions will update, API interfaces may change, and quotas will expand. The architecture in this guide is designed to be model-agnostic — swap in new model name constants in config.py and the rest of the pipeline follows.
Building a Batch Content Scheduler
Once the single-video pipeline is working reliably, the next practical step is scaling it to process a queue of content items automatically. Here's a scheduler that reads prompts from a JSON file and processes them sequentially with proper rate limiting.
# scheduler.py: batch pipeline runner with queue and logging
import asyncio
import json
import logging
from datetime import datetime, timezone
from pathlib import Path

from music_generator import build_music_prompt
from pipeline import run_pipeline

logger = logging.getLogger(__name__)

LOG_DIR = Path("logs")
LOG_DIR.mkdir(exist_ok=True)

# How many seconds to wait between jobs to avoid rate limit 429s
# Adjust based on your quota tier
INTER_JOB_DELAY_SECONDS = 90


def load_job_queue(queue_file: str) -> list[dict]:
    """
    Load the content queue from a JSON file.

    Expected format:
    [
      {
        "name": "autumn_garden",
        "video_prompt": "...",
        "music_genre": "ambient",
        "music_mood": "calm",
        "music_tempo": "slow",
        "music_instruments": ["piano", "strings"]
      },
      ...
    ]
    """
    with open(queue_file, "r", encoding="utf-8") as f:
        jobs = json.load(f)
    logger.info(f"Loaded {len(jobs)} jobs from {queue_file}")
    return jobs


async def process_queue(queue_file: str = "content_queue.json") -> list[dict]:
    """
    Process all jobs in the queue file, one at a time with rate-limit spacing.

    Returns a list of result dicts for downstream logging or alerting.
    """
    jobs = load_job_queue(queue_file)
    results = []
    log_file = LOG_DIR / f"batch_{datetime.now().strftime('%Y%m%d_%H%M%S')}.jsonl"

    for i, job in enumerate(jobs, start=1):
        logger.info(f"\n{'='*60}")
        logger.info(f"Job {i}/{len(jobs)}: {job['name']}")

        music_prompt = build_music_prompt(
            genre=job.get("music_genre", "ambient"),
            mood=job.get("music_mood", "calm"),
            tempo=job.get("music_tempo", "moderate"),
            instruments=job.get("music_instruments"),
        )

        result = await run_pipeline(
            video_prompt=job["video_prompt"],
            music_prompt=music_prompt,
            output_name=job["name"],
        )
        result["job_name"] = job["name"]
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        result["timestamp"] = datetime.now(timezone.utc).isoformat()
        results.append(result)

        # Append to JSONL log for easy parsing
        with open(log_file, "a", encoding="utf-8") as lf:
            lf.write(json.dumps(result, ensure_ascii=False) + "\n")

        status_icon = "✅" if result["status"] == "SUCCESS" else "❌"
        logger.info(f"{status_icon} {job['name']}: {result['status']}")

        # Rate-limit spacing; skip the delay after the last job
        if i < len(jobs):
            logger.info(f"Waiting {INTER_JOB_DELAY_SECONDS}s before next job...")
            await asyncio.sleep(INTER_JOB_DELAY_SECONDS)

    successes = sum(1 for r in results if r["status"] == "SUCCESS")
    logger.info(f"\n🏁 Batch complete: {successes}/{len(jobs)} succeeded")
    logger.info(f"Log written to: {log_file}")
    return results


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[
            logging.StreamHandler(),
            logging.FileHandler(LOG_DIR / "scheduler.log"),
        ],
    )
    asyncio.run(process_queue("content_queue.json"))
A sample content_queue.json file for testing:
[
  {
    "name": "morning_forest",
    "video_prompt": "Dense forest at dawn, sunlight breaking through tall cedar trees, mist rising from the ground, birds visible in the canopy, cinematic wide shot, 4K quality",
    "music_genre": "acoustic",
    "music_mood": "uplifting and serene",
    "music_tempo": "slow",
    "music_instruments": ["acoustic guitar", "light percussion", "flute"]
  },
  {
    "name": "ocean_timelapse",
    "video_prompt": "Ocean waves crashing against dark volcanic rocks at golden hour, seafoam swirling in slow motion, warm orange and purple sky, ultra high definition",
    "music_genre": "cinematic orchestral",
    "music_mood": "epic and expansive",
    "music_tempo": "moderate",
    "music_instruments": ["strings", "piano", "french horn"]
  }
]
The INTER_JOB_DELAY_SECONDS = 90 setting is conservative for most quota tiers. You can reduce it to 60 if you're on a paid plan with higher rate limits, or increase it to 120 if you're on the free tier. The JSONL log format is intentionally machine-readable so you can feed it into a monitoring dashboard or a webhook that alerts you when the batch finishes.
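Because each log line is a self-contained JSON object, summarizing a batch for a dashboard or alert takes very little code. The sketch below assumes the record fields written by the scheduler above (status, job_name, reason); `summarize_batch` is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_batch(log_file: Path) -> dict:
    """Count successes and collect failures from a JSONL batch log."""
    succeeded = 0
    failed = []
    for line in log_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("status") == "SUCCESS":
            succeeded += 1
        else:
            failed.append(
                {"job_name": record.get("job_name"), "reason": record.get("reason")}
            )
    return {"total": succeeded + len(failed), "succeeded": succeeded, "failed": failed}
```

Calling summarize_batch(Path("logs/batch_<timestamp>.jsonl")) after a run gives a dict you can post to a webhook as-is. The file name here is a placeholder for whatever the scheduler wrote.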
Quality Control: Filtering Out Failed or Low-Quality Outputs
The pipeline as written handles hard failures (API errors, timeouts) but not soft failures — videos that generated successfully but don't look right. For automated content production, some degree of automated quality filtering is useful before the files hit your publishing queue.
The practical approach I use is a two-stage filter. The first stage checks file size: Veo 3 generates approximately 5–10 MB per second of 1080p video, so an 8-second output should be 40–80 MB. Files outside this range usually indicate a generation failure that wasn't caught as an explicit error (very small files are often glitched outputs; very large files are rare but can happen with high-motion scenes).
# quality_filter.py: basic output validation
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# Loose sanity bounds for an 8-second 1080p Veo 3 output (bytes).
# Typical files are 40-80 MB; these limits only catch obvious outliers.
MIN_VIDEO_SIZE_BYTES = 5 * 1024 * 1024    # 5 MB
MAX_VIDEO_SIZE_BYTES = 150 * 1024 * 1024  # 150 MB


def validate_video_output(video_path: Path, duration_seconds: int = 8) -> bool:
    """
    Run basic validation checks on a generated video file.

    Checks:
    1. File exists and is non-empty
    2. File size is within the expected range for the given duration
    3. File extension is .mp4

    Args:
        video_path: Path to the video file to validate
        duration_seconds: Expected video duration for size range scaling

    Returns:
        True if the file passes all checks, False otherwise.
    """
    if not video_path.exists():
        logger.warning(f"Validation FAIL: file does not exist: {video_path}")
        return False

    size = video_path.stat().st_size
    # Scale the 8-second bounds linearly with the requested duration
    scaled_min = MIN_VIDEO_SIZE_BYTES * (duration_seconds / 8)
    scaled_max = MAX_VIDEO_SIZE_BYTES * (duration_seconds / 8)

    if size < scaled_min:
        logger.warning(
            f"Validation FAIL: file too small ({size // 1024}KB < {scaled_min // 1024:.0f}KB min). "
            "Likely a glitched or empty output."
        )
        return False

    if size > scaled_max:
        logger.warning(
            f"Validation FAIL: file unexpectedly large ({size // (1024*1024)}MB > {scaled_max // (1024*1024):.0f}MB max)."
        )
        return False

    if video_path.suffix.lower() != ".mp4":
        logger.warning(f"Validation FAIL: unexpected extension: {video_path.suffix}")
        return False

    logger.info(f"Validation PASS: {video_path.name} ({size // (1024*1024)}MB)")
    return True
Plug this into run_pipeline() after the merge step:
from quality_filter import validate_video_output

# In run_pipeline(), right after merge_video_audio() returns output_path:
if output_path and not validate_video_output(output_path, duration_seconds=video_duration):
    logger.warning(f"Output failed quality validation: {output_path}")
    output_path.unlink(missing_ok=True)  # Remove the bad file
    return {"status": "FAILED", "reason": "quality validation failed"}
The second stage — visual content review — requires human judgment or a vision model. If you're building a fully automated publishing pipeline, consider routing outputs through a Gemini vision check that verifies the video content matches the intended theme before it gets published. A quick client.models.generate_content() call with a frame extracted from the video and a prompt like "Does this video show [intended subject]? Answer yes or no." adds a meaningful safety net for a fraction of a cent per check.
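One way to wire up that check is sketched below. The helper names are hypothetical, the frame is extracted with a plain ffmpeg command, and `frame_matches_subject` expects you to pass in an initialized client and the name of a vision-capable Gemini model. Treating any reply that starts with "yes" as a pass is a simplification you may want to tighten.

```python
from pathlib import Path


def build_frame_command(video_path: str, frame_path: str, at_seconds: float = 1.0) -> list[str]:
    """ffmpeg command that saves a single frame from the video as an image."""
    return [
        "ffmpeg", "-y",
        "-ss", str(at_seconds),  # seek to the chosen timestamp
        "-i", video_path,
        "-frames:v", "1",        # write exactly one video frame
        frame_path,
    ]


def build_check_prompt(subject: str) -> str:
    return f"Does this video frame show {subject}? Answer yes or no."


def parse_yes_no(answer: str) -> bool:
    """Treat any reply that starts with 'yes' as a pass."""
    return answer.strip().lower().startswith("yes")


def frame_matches_subject(client, model: str, frame_path: str, subject: str) -> bool:
    """Ask a vision-capable model whether the extracted frame shows the subject."""
    # Imported lazily so the helpers above work without the SDK installed
    from google.genai import types

    image_part = types.Part.from_bytes(
        data=Path(frame_path).read_bytes(), mime_type="image/jpeg"
    )
    response = client.models.generate_content(
        model=model,
        contents=[image_part, build_check_prompt(subject)],
    )
    return parse_yes_no(response.text or "")
```

Run the command from build_frame_command() with subprocess.run(cmd, check=True) to produce the JPEG, then call frame_matches_subject() with the client from config.py. A single frame can miss problems later in the clip, so sampling two or three timestamps is a reasonable extension.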
Wrapping Up
Start small. Run generate_video() standalone with a simple test prompt and confirm you get back a file. Then test generate_music() independently. Then wire them together with merge_video_audio(). Only after each piece works in isolation should you run the full run_pipeline() end to end.
The error handling and retry logic in this guide will catch most failure modes automatically. The pitfalls section covers the silent failures that the retry logic can't catch — make sure to read those carefully before you run any unattended batch jobs.
When you're ready to scale, the batch scheduler gives you a straightforward path from single-video testing to queue-based production. Pair it with a cron job, a cloud storage upload step, and a social media scheduling tool, and you have a content production system that runs without manual intervention.
One final note on model versions: both veo-3.0-generate-preview and lyria-3.0-pro are current as of April 2026. Google releases new model versions regularly. When they do, update the constants in config.py — the rest of the pipeline will follow without changes.