All Articles
Building an LLM-as-Judge Evaluation Pipeline with Gemini — Production-Grade Design and Implementation
A practical guide to building an LLM-as-Judge evaluation pipeline using Gemini 2.5 Pro / 3 Pro as the judge. Covers pointwise and pairwise judging, bias mitigation, human-correlation measurement, and cost optimization, with working Python code for production use.
Why Gemini Says It Cannot See Your Image — A Practical Diagnosis Guide
If the Gemini API replies 'I don't see an image' despite an attached file, the cause is almost always client-side. This guide walks through the four checks (mime_type, payload size, SDK version, and model selection) with copy-pasteable fixes.
Precise Output Control in Gemini API: A Practical Guide to maxOutputTokens and stopSequences
Combine maxOutputTokens and stopSequences in the Gemini API to shape response length exactly the way you need. Stop responses from being cut off, going over budget, or breaking JSON parsing — with production-tested patterns.
Production Streaming UI with Gemini API + TanStack Query — Cancellation, Retries, and Cache Coherence
TanStack Query is optimized for one-shot REST/JSON requests, so streaming responses don't fit naturally. This guide walks through the gotchas of using Gemini API SSE with TanStack Query and the production-grade design patterns that hold up in real apps.
Gemini API × Inngest: Building Fault-Tolerant AI Workflows for Production
A practical guide to building durable, fault-tolerant Gemini API workflows with Inngest — covering retries, fan-out/fan-in, human approval, throttling, and dead-letter patterns.
When Gemini API URL Context Returns Nothing: A Diagnostic Walkthrough
If Gemini's URL Context tool stays silent or returns generic answers, the cause is almost always one of three things: tool configuration, URL formatting, or site-side restrictions. Here's how to isolate which one.
Fixing 'Thoughts must be present in conversation history' in Gemini API: A Practical Guide to Thought Signatures in Multi-Turn Tool Calls
If you're hitting 'Thoughts must be present in conversation history when using thinking signature' in Gemini 2.5/3.x with multi-turn function calling, this guide walks through what's actually happening and how to fix it in five minutes — Python SDK, REST, and streaming all covered.
Building a Production-Grade Gemini API Backend with NestJS — DI, Filters, and Guards
A practical pattern for wrapping the Gemini API in a NestJS backend. Covers DI-based service design, SSE streaming, exception filters for 429/safety errors, and guards for API-key auth and rate limiting.
Dynamic Few-Shot for Gemini API — A Self-Improving Prompt That Picks Examples by Vector Search
Hand-picked, hard-coded few-shot examples stop scaling once your inputs drift. This guide builds a Gemini Embeddings + vector search pipeline that selects the best 3-5 examples per request and grows them from production feedback, with copy-paste code.
Best Gemini API Temperature for Translation Tasks: Optimal Values by Use Case
Choosing the right temperature for Gemini API translation tasks is harder than the docs let on. This guide gives you tested values, side-by-side outputs, and production patterns by use case.
Track Gemini API Costs in Production with usageMetadata — A Per-Request Logging Pattern That Reconciles With Your Bill
A practical pattern for capturing the Gemini API's usageMetadata on every request so you can attribute spend by endpoint, user, and model, and reconcile it against the Google Cloud bill at the end of the month. Covers cached and thoughts tokens, JSONL logging, and a daily budget alert.
Beyond Embeddings: Production Reranking with Vertex AI Ranking and Gemini-as-Judge
When pure embedding search nails the top-3 but buries the right answer at rank 4, you need a reranker. This guide walks through a production-grade two-stage architecture using Vertex AI Ranking API and Gemini-as-judge — with cost, latency, and evaluation patterns that hold up under load.