API / SDK · 2026-05-04 · Intermediate

Implementing Structured Output with Gemini Function Calling — Multi-Tool Design Patterns

A practical guide to reliable structured output with Gemini API Function Calling — covering tool definition best practices, multi-tool coordination, and error handling.

When I first tried Gemini's Function Calling, I assumed it was just a formalized version of asking for JSON output. Once I used it in production, the difference became clear.

The key distinction: the model decides when and whether to call tools based on context. It selects which data it needs, requests it through a tool call, and integrates the results your code returns, without you scripting each step. This guide explains that mechanism with practical code you can adapt directly.

The Basic Structure of Function Calling

Function Calling follows a three-step cycle:

  1. Send a request with tool definitions attached
  2. The model returns a function_call — it has not executed the tool yet
  3. Your code runs the tool and sends the result back to the model, which then writes its final answer

The separation — the model signals tool use, your application executes it — is architecturally important. You retain full control of what actually runs.

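import os

# google-generativeai is the legacy Python SDK; the newer google-genai package uses a different client API.
import google.generativeai as genai
 
# Read the key from an environment variable (name assumed here) instead of hardcoding it.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])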
 
tools = [
    {
        "function_declarations": [
            {
                "name": "get_weather",
                "description": "Get current weather for a specified city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "City name (e.g., Tokyo, New York)"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "Temperature unit"
                        }
                    },
                    "required": ["city"]
                }
            }
        ]
    }
]
 
model = genai.GenerativeModel("gemini-2.0-flash", tools=tools)
response = model.generate_content("What is the weather in Tokyo today?")
 
for part in response.candidates[0].content.parts:
    # hasattr() is always True for this field, so check that a call is actually populated.
    fc = getattr(part, "function_call", None)
    if fc is not None and fc.name:
        print(f"Tool called: {fc.name}")
        print(f"Arguments: {dict(fc.args)}")
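The snippet above stops at step 2. A minimal sketch of the full cycle, including step 3, might look like the following. It is deliberately SDK-agnostic: `send` is a hypothetical callable standing in for whatever wraps your model call, and the `ToolCall` and `ModelTurn` types are illustrative stand-ins, not part of the Gemini SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCall:
    """Hypothetical stand-in for a function_call part (name plus arguments)."""
    name: str
    args: dict[str, Any]


@dataclass
class ModelTurn:
    """Hypothetical stand-in for one model response."""
    text: str = ""
    tool_calls: list[ToolCall] = field(default_factory=list)


def run_tool_cycle(
    send: Callable[[Any], ModelTurn],
    prompt: str,
    tools: dict[str, Callable[..., Any]],
    max_rounds: int = 5,
) -> str:
    """Drive the request -> function_call -> result cycle until the model answers in text."""
    turn = send(prompt)  # step 1: the request (tool definitions live inside `send`)
    rounds = 0
    while turn.tool_calls:  # step 2: the model only asked for these calls
        if rounds == max_rounds:
            raise RuntimeError("model kept requesting tools; giving up")
        rounds += 1
        results = []
        for call in turn.tool_calls:
            output = tools[call.name](**call.args)  # step 3a: our code executes the tool
            results.append({"name": call.name, "response": {"result": output}})
        turn = send(results)  # step 3b: the results go back to the model
    return turn.text  # no more function calls: this is the final answer
```

The `max_rounds` cap is a design choice of this sketch, not an SDK feature; it keeps a confused model from looping on tool requests indefinitely.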

Writing Tool Definitions That Actually Work

The quality of description fields directly affects model decision accuracy. This is mentioned briefly in official docs, but the impact in practice is significant.

Weak: "description": "Get weather"

Strong: "description": "Retrieve current temperature, weather condition, and humidity for a city. Use this when the user asks about weather, or when weather conditions are needed to make a decision."

The "use this when..." clause suppresses over-triggering — the model calling a tool when it's not needed. When multiple tools have overlapping capabilities, this differentiation instruction is particularly effective.

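Using enum to constrain parameter values is also worth the effort. Specifying enum: ["celsius", "fahrenheit"] for the unit parameter greatly reduces the risk of the model passing unexpected values. It is a strong hint rather than a guarantee, so validating arguments in your own code before executing a tool is still worthwhile.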
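As a complement to the enum constraint, the arguments the model sends can be checked against the declaration before anything runs. The following is a minimal sketch, not an SDK feature: `validate_args` is a hypothetical helper, and it only covers required parameters, unknown parameters, and enum values.

```python
from typing import Any


def validate_args(declaration: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of problems found in `args`; an empty list means the call looks valid."""
    schema = declaration.get("parameters", {})
    properties = schema.get("properties", {})
    problems = []

    # Every parameter listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")

    # Every supplied parameter must be declared and respect its enum, if any.
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected parameter: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}, got {value!r}")

    return problems


# Same shape as the get_weather declaration used earlier in this guide.
weather_declaration = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

print(validate_args(weather_declaration, {"city": "Tokyo", "unit": "kelvin"}))
```

Returning the problem list to the model as an error response gives it a concrete reason to correct the call instead of guessing.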

Coordinating Multiple Tools

Most real applications combine multiple tools. Gemini can return multiple tool calls in a single response (parallel function calling).

def handle_function_calls(response, tool_implementations):
    """Process function calls and return results"""
    function_results = []
 
    for part in response.candidates[0].content.parts:
        # hasattr() is always True for this field, so check that a call is populated.
        fc = getattr(part, "function_call", None)
        if fc is None or not fc.name:
            continue
        
        func_name = fc.name
        func_args = dict(fc.args)
        
        if func_name not in tool_implementations:
            function_results.append({
                "function_response": {
                    "name": func_name,
                    "response": {"error": f"Unknown function: {func_name}"}
                }
            })
            continue
        
        try:
            result = tool_implementations[func_name](**func_args)
            function_results.append({
                "function_response": {
                    "name": func_name,
                    "response": {"result": result}
                }
            })
        except Exception as e:
            function_results.append({
                "function_response": {
                    "name": func_name,
                    "response": {"error": str(e)}
                }
            })
    
    return function_results
 
def get_weather(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "temperature": 22, "condition": "Sunny", "unit": unit}
 
def get_news(topic: str, max_results: int = 3) -> list:
    # Numeric arguments can arrive as floats (e.g. 3.0), so coerce before range().
    count = int(max_results)
    return [{"title": f"{topic} news {i}", "url": f"https://example.com/{i}"} for i in range(count)]
 
tool_implementations = {
    "get_weather": get_weather,
    "get_news": get_news
}

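When Gemini returns multiple tool calls in one response, none of them depends on another's result, so they can usually run in parallel, provided your tool implementations do not share mutable state or rely on ordered side effects. Running them concurrently with asyncio.gather() brings the wait down from the sum of all tool latencies to roughly that of the slowest one.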
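A minimal sketch of that concurrent pattern, assuming the tools are ordinary blocking functions like the ones above. The helper name `run_calls_concurrently` and the `(name, args)` tuple format are assumptions of this example; you would extract those pairs from the response parts yourself.

```python
import asyncio
from typing import Any, Callable


async def run_calls_concurrently(
    calls: list[tuple[str, dict[str, Any]]],
    tool_implementations: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Run every (name, args) pair concurrently and return results in the original order."""

    async def run_one(name: str, args: dict[str, Any]) -> dict[str, Any]:
        func = tool_implementations.get(name)
        if func is None:
            return {"name": name, "response": {"error": f"Unknown function: {name}"}}
        try:
            # The tools are blocking functions, so each one runs in a worker thread.
            result = await asyncio.to_thread(func, **args)
            return {"name": name, "response": {"result": result}}
        except Exception as exc:
            # Catch per call so one failing tool does not cancel the others.
            return {"name": name, "response": {"error": str(exc)}}

    return await asyncio.gather(*(run_one(name, args) for name, args in calls))
```

From synchronous code this can be driven with `asyncio.run(run_calls_concurrently(calls, tool_implementations))`. Results come back in the same order as the calls, which makes it straightforward to pair each response with its request.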

Handling Errors and Uncertainty

When a tool returns an error, Gemini adjusts its response accordingly — but only if you pass error information in a structured, informative way. Vague errors cause the model to speculate.

def safe_tool_response(func_name: str, result=None, error=None) -> dict:
    if error:
        return {
            "function_response": {
                "name": func_name,
                "response": {
                    "success": False,
                    "error": str(error),
                    "suggestion": "Try an alternative approach"
                }
            }
        }
    return {
        "function_response": {
            "name": func_name,
            "response": {
                "success": True,
                "data": result
            }
        }
    }

Including a success flag and suggestion field lets the model make informed decisions — "this tool failed, let me try a different path" — rather than silently producing a hallucinated response.
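To show how such an envelope might be produced at the call site, here is a small self-contained sketch. `call_tool_safely` is a hypothetical wrapper, and the suggestion strings are placeholders; in practice you would tailor them to each tool.

```python
from typing import Any, Callable


def call_tool_safely(func_name: str, func: Callable[..., Any], args: dict[str, Any]) -> dict[str, Any]:
    """Execute one tool and wrap the outcome in a success/error envelope."""
    try:
        data = func(**args)
    except TypeError as exc:
        # Usually means the model supplied missing or unexpected arguments.
        error, suggestion = str(exc), "Check the argument names and retry the call"
    except Exception as exc:
        # Include the exception type so the model sees what kind of failure occurred.
        error, suggestion = f"{type(exc).__name__}: {exc}", "Try an alternative approach"
    else:
        return {"function_response": {"name": func_name, "response": {"success": True, "data": data}}}
    return {
        "function_response": {
            "name": func_name,
            "response": {"success": False, "error": error, "suggestion": suggestion},
        }
    }
```

Separating argument errors from other failures is a judgment call in this sketch: the first kind is something the model can often fix by retrying, while the second usually calls for a different tool or an honest "I could not find that" answer.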

A Real-World Use Case

I built an app review analysis tool using this approach. I passed App Store review text to Gemini and defined three tools:

  • categorize_review — classify into UI/bug/feature request/etc.
  • extract_sentiment — score sentiment and extract key phrases
  • flag_urgent — mark reviews reporting crashes or critical issues

Implementing this as three separate JSON-mode prompts takes three API calls per review. With Function Calling, Gemini infers from context that all three tools should run and returns the calls together in one response, and the arguments of those calls are the structured output. API call count dropped to one-third, with a corresponding reduction in cost and latency.
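One way to turn those grouped calls into a single record per review is to treat each call's arguments as the structured output and merge them. This is a rough sketch only: the helper `build_review_record`, the `review_id` key, and the `missing` field are hypothetical, not the schema the app actually used.

```python
from typing import Any

# Tool names taken from the list above.
EXPECTED_TOOLS = ("categorize_review", "extract_sentiment", "flag_urgent")


def build_review_record(review_id: str, calls: list[tuple[str, dict[str, Any]]]) -> dict[str, Any]:
    """Merge the tool calls returned for one review into a single structured record."""
    record: dict[str, Any] = {"review_id": review_id}
    for name, args in calls:
        if name in EXPECTED_TOOLS:
            record[name] = dict(args)  # the call arguments are the structured output
    # Record which tools the model skipped so gaps are visible downstream.
    record["missing"] = [name for name in EXPECTED_TOOLS if name not in record]
    return record
```

Tracking skipped tools matters because the model decides which tools to call; a review with no `flag_urgent` call is not necessarily the same as a review judged non-urgent.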

Where to Start

Begin with a single tool and experiment with description phrasing. Watching how changes to the "use this when..." clause affect when the model calls the tool is the fastest way to build intuition.

The official Gemini API documentation on Function Calling covers the fundamentals. The parallel execution pattern and structured error responses described here are practical additions I arrived at through real usage — consider them field notes alongside the official guide.
