GEMINI LABJP
API — The Gemini API now processes over 16 billion tokens per minute, roughly on par with OpenAIENTERPRISE — Gemini Enterprise passes 8 million paid seats across more than 2,800 companiesAGENT — Claude Opus 4.8 arrives on Gemini Enterprise Agent Platform, expanding multi-vendor choicesSPEECH — gemini-3.1-flash-tts-preview adds streaming speech generation via streamGenerateContentDATA — Crossbeam data stores can now connect to Gemini Enterprise in public previewMODEL — Gemini 3.5 Flash GA and Gemma 4 round out options for agentic and lightweight workloadsAPI — The Gemini API now processes over 16 billion tokens per minute, roughly on par with OpenAIENTERPRISE — Gemini Enterprise passes 8 million paid seats across more than 2,800 companiesAGENT — Claude Opus 4.8 arrives on Gemini Enterprise Agent Platform, expanding multi-vendor choicesSPEECH — gemini-3.1-flash-tts-preview adds streaming speech generation via streamGenerateContentDATA — Crossbeam data stores can now connect to Gemini Enterprise in public previewMODEL — Gemini 3.5 Flash GA and Gemma 4 round out options for agentic and lightweight workloads
Articles/Advanced
Advanced/2026-07-03Advanced

Your Tool Results Are Quietly Eating the Conversation — Handle Passing to Keep Gemini Function Calling Contexts Lean

Tool results linger in Function Calling history and compound your input tokens every turn. Two implementations — a token-budgeted compactor and handle passing — cut my measured input by roughly 8x, with the pitfalls I hit along the way.

Gemini API166Function Calling16agents7context managementcost optimization7

Premium Article

When I pointed an agent at 312 app reviews, everything slowed down from turn three onward. Responses that came back in two seconds early in the loop were taking over ten by the end, and at month's close, this one job's line on the invoice looked oddly inflated.

As an indie developer, I run a nightly agent that classifies and aggregates reviews for my apps using Gemini's Function Calling. The naive implementation — stuff whatever fetch_reviews returns straight into the functionResponse — worked fine while review counts were small. Then a busy month arrived, and the behavior flipped. The culprit wasn't generation output at all: the huge JSON my tool returned was sitting in the conversation history and getting re-sent as input on every subsequent turn.

In a Function Calling loop, you append the model's functionCall and your functionResponse to contents and send the whole thing again. So a tool result weighing 48,000 tokens gets billed five more times across a six-turn loop. If you only watch output tokens, this input-side compounding is easy to miss.

Measure Input Tokens per Turn — the Culprit Is Residency

Before fixing anything, put numbers on it. On each turn, run the full contents you're about to send through count_tokens.

# turn_meter.py — chart per-turn input token growth in your agent loop
# Solves: turning "the later turns feel slow" into a number
from google import genai
 
client = genai.Client()  # API key from GEMINI_API_KEY env var
MODEL = "gemini-flash-latest"
 
def log_turn_tokens(turn: int, contents: list) -> int:
    """Record token count of the full contents just before sending"""
    result = client.models.count_tokens(model=MODEL, contents=contents)
    print(f"turn={turn} input_tokens={result.total_tokens}")
    return result.total_tokens

Counting is free-tier friendly and it's one extra line in the loop. My review-analysis agent (a six-turn run that fetches reviews from two stores, classifies, and aggregates) looked like this before any fixes:

TurnWhat's in contextInput tokens (measured)
1Instructions only1,240
2+ fetch_reviews result (store A, 312 rows)49,800
3+ intermediate classification output52,300
4+ fetch_reviews result (store B, 287 rows)101,900
5+ intermediate aggregation output104,500
6Final report generation105,100

One run consumed about 410,000 input tokens in total, and roughly 80% of that was re-sent review JSON. I was paying most of the job's latency and cost for repetition that contributed nothing to output quality.

Three Ways to Hand Over a Tool Result — Full, Compacted, or by Handle

How a tool result reaches the model is a design choice, even though it rarely gets treated as one. I've settled on this three-way framing:

ApproachWhat the model seesGood fitRisk
Full payloadThe entire result JSONSmall results (roughly under 2,000 tokens)History residency compounds input
Budgeted compactionA slimmed version keeping priority fieldsOnly some fields matter to the taskA dropped field turns out to be needed later
Handle passingRow count, schema, digest + a reference IDLarge results, detail lookups are rareEach detail lookup costs an extra turn

The deciding question is simple: does the model genuinely need to read the full payload? Data like reviews — processed once, never re-read — has no business living in the history.

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN
You'll be able to diagnose why your agent gets slower and pricier with every turn, using countTokens to chart per-turn input growth instead of guessing
You can drop in two copy-paste implementations — a token-budgeted compactor and handle passing — that cut a review-analysis agent's input tokens by roughly 8x in my measurements
You'll sidestep the traps of context compaction ahead of time: breaking thought signatures, over-compaction re-fetch loops, and parallel-call response ordering
Secure payment via Stripe · Cancel anytime

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

or
Unlock all articles with Membership →
Share

Thank You for Reading

Gemini Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.

  • Copy-paste ready implementation code
  • New advanced guides published daily
  • $5/mo or $10 for lifetime access
View Membership →

Related Articles

Advanced2026-06-13
A Minimal Autonomous Agent with Gemini — Tool-Loop Design Lessons
Building an autonomous agent from a minimal setup with the google-genai SDK's automatic function calling — plus the step limits, tool allowlists, and retry decisions learned from automating real blog operations.
Advanced2026-03-31
Build a Personal AI Secretary with Gemini API — Task Automation, Email Summaries & Schedule Optimization for Solopreneurs
A complete guide to building a production-grade AI secretary system for freelancers and solopreneurs using Gemini API. Covers Function Calling implementation for task automation, email summarization, and schedule optimization, all the way through Cloud Run deployment.
Advanced2026-03-25
Automated Monetization Infrastructure with Gemini API — 6 Revenue Engines Powered by Multimodal AI and Function Calling
A comprehensive guide to 6 automated revenue engines built on Gemini API's multimodal processing, Function Calling, and context caching. Covers SaaS, API services, content pipelines, data analysis, Workspace integration, and education platforms.
📚RECOMMENDED BOOKS
Build a Large Language Model (From Scratch)
Sebastian Raschka
LLM Dev
Prompt Engineering for LLMs
Berryman & Ziegler
Prompting
AI Engineering
Chip Huyen
AI Eng
* Contains affiliate links
See all →