Implementing Structured Output with Gemini Function Calling — Multi-Tool Design Patterns

When I first tried Gemini's Function Calling, I assumed it was just a formalized version of asking for JSON output. Once I used it in production, the difference became clear.

The key distinction: the model decides when and whether to call tools based on context. It selects which data it needs, fetches it, and integrates the results — autonomously. This guide explains that mechanism with practical code you can adapt directly.

The Basic Structure of Function Calling

Function Calling follows a three-step cycle:

1. Send a request with tool definitions attached.
2. The model returns a function_call. It has not executed the tool yet.
3. Your code runs the tool and sends the result back to the model.

The separation — the model signals tool use, your application executes it — is architecturally important. You retain full control of what actually runs.

import google.generativeai as genai
 
genai.configure(api_key="YOUR_GEMINI_API_KEY")
 
tools = [
    {
        "function_declarations": [
            {
                "name": "get_weather",
                "description": "Get current weather for a specified city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "City name (e.g., Tokyo, New York)"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "Temperature unit"
                        }
                    },
                    "required": ["city"]
                }
            }
        ]
    }
]
 
model = genai.GenerativeModel("gemini-2.0-flash", tools=tools)
response = model.generate_content("What is the weather in Tokyo today?")
 
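for part in response.candidates[0].content.parts:
    # Every Part exposes a function_call attribute, so hasattr() is always True.
    # A part that carries no call has an empty function name.
    fc = part.function_call
    if fc.name:
        print(f"Tool called: {fc.name}")
        print(f"Arguments: {dict(fc.args)}")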
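
The snippet above stops at step 2. Here is a minimal sketch of the full cycle, including step 3. It assumes a chat-style object with a send_message() method (for example, the session returned by model.start_chat()) and assumes the SDK accepts function_response parts in the dict form used later in this article. The names extract_function_calls, run_tool_cycle, and max_rounds are my own, not part of the SDK.

```python
def extract_function_calls(response):
    """Collect (name, args) for each function call in the first candidate."""
    calls = []
    for part in response.candidates[0].content.parts:
        fc = getattr(part, "function_call", None)
        # A part with no call has a missing or empty-named function_call
        if fc is not None and getattr(fc, "name", ""):
            calls.append((fc.name, dict(fc.args)))
    return calls


def run_tool_cycle(chat, prompt, tool_implementations, max_rounds=5):
    """Send a prompt, execute requested tools, and return the final response.

    `chat` is assumed to be any object with a send_message() method.
    """
    response = chat.send_message(prompt)  # step 1: request with tools attached
    for _ in range(max_rounds):
        calls = extract_function_calls(response)  # step 2: model asks for tools
        if not calls:
            return response  # no tool requested, so this is the final answer
        results = []
        for name, args in calls:
            output = tool_implementations[name](**args)  # step 3: our code runs it
            results.append(
                {"function_response": {"name": name, "response": {"result": output}}}
            )
        response = chat.send_message(results)  # hand the results back to the model
    return response
```

The max_rounds cap is a design choice: it keeps a confused model from looping on tool calls forever. An unknown tool name raises KeyError in this sketch; the handler in the multi-tool section deals with that case.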

Writing Tool Definitions That Actually Work

The quality of each description field directly affects how accurately the model decides whether, and which, tool to call. The official docs mention this only briefly, but the impact in practice is significant.

Weak: "description": "Get weather"

Strong: "description": "Retrieve current temperature, weather condition, and humidity for a city. Use this when the user asks about weather, or when weather conditions are needed to make a decision."

The "use this when..." clause suppresses over-triggering — the model calling a tool when it's not needed. When multiple tools have overlapping capabilities, this differentiation instruction is particularly effective.
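
To make the differentiation idea concrete, here is a small sketch with two overlapping tools plus a lint check that flags descriptions lacking a usage clause. get_forecast is a hypothetical second tool, the "Get news" description is deliberately weak, and the parameters blocks are omitted for brevity.

```python
TOOL_DECLARATIONS = [
    {
        "name": "get_weather",
        "description": (
            "Retrieve current temperature, weather condition, and humidity for a city. "
            "Use this when the user asks about conditions right now."
        ),
    },
    {
        "name": "get_forecast",  # hypothetical tool that overlaps with get_weather
        "description": (
            "Retrieve the daily forecast for the coming days for a city. "
            "Use this when the user asks about tomorrow or later, not the present."
        ),
    },
    {"name": "get_news", "description": "Get news"},  # weak on purpose
]


def missing_usage_hint(declarations, marker="use this when"):
    """Return the names of tools whose description lacks a usage clause."""
    return [d["name"] for d in declarations if marker not in d["description"].lower()]


print(missing_usage_hint(TOOL_DECLARATIONS))  # ['get_news']
```

A check like this only enforces the convention. Whether a given phrasing actually changes model behavior still has to be tested against real prompts.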

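Using enum to constrain parameter values is also worth the effort. Specifying enum: ["celsius", "fahrenheit"] for the unit parameter makes it far less likely that the model passes an unexpected value. Treat it as a strong hint rather than a guarantee, and still validate arguments before running the tool.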
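
Since the application executes the tool, it can also check the arguments first. A minimal sketch of that check, reusing the parameters block from the get_weather declaration; validate_args is a hypothetical helper, and it only covers required fields, unknown fields, and enum values.

```python
def validate_args(parameters: dict, args: dict) -> list:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    properties = parameters.get("properties", {})
    for name in parameters.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}, got {value!r}")
    return problems


weather_parameters = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

print(validate_args(weather_parameters, {"city": "Tokyo", "unit": "kelvin"}))
print(validate_args(weather_parameters, {"city": "Tokyo"}))  # []
```

Returning the problem list to the model as an error response, rather than raising, gives it a chance to correct the call.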

Coordinating Multiple Tools

Most real applications combine multiple tools. Gemini can return multiple tool calls in a single response (parallel function calling).

def handle_function_calls(response, tool_implementations):
    """Process function calls and return results"""
    function_results = []
 
    for part in response.candidates[0].content.parts:
        # hasattr() is always True on a Part; a part without a call has an empty name
        fc = part.function_call
        if not fc.name:
            continue

        func_name = fc.name
        func_args = dict(fc.args)
        
        if func_name not in tool_implementations:
            function_results.append({
                "function_response": {
                    "name": func_name,
                    "response": {"error": f"Unknown function: {func_name}"}
                }
            })
            continue
        
        try:
            result = tool_implementations[func_name](**func_args)
            function_results.append({
                "function_response": {
                    "name": func_name,
                    "response": {"result": result}
                }
            })
        except Exception as e:
            function_results.append({
                "function_response": {
                    "name": func_name,
                    "response": {"error": str(e)}
                }
            })
    
    return function_results
 
def get_weather(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "temperature": 22, "condition": "Sunny", "unit": unit}
 
def get_news(topic: str, max_results: int = 3) -> list:
    return [{"title": f"{topic} news {i}", "url": f"https://example.com/{i}"} for i in range(max_results)]
 
tool_implementations = {
    "get_weather": get_weather,
    "get_news": get_news
}

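When Gemini returns multiple tool calls in one response, none of them depends on another's result, so they can usually run in parallel (take care if your tools write to shared state). For I/O-bound tools, running them concurrently with asyncio.gather() brings latency down from the sum of the call times to roughly the time of the slowest call.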
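
A minimal sketch of that concurrent pattern, assuming the tools are ordinary blocking functions and the calls have already been extracted as (name, args) pairs. run_calls_concurrently and the two slow_* tools are stand-ins I made up for illustration; asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import time


async def run_calls_concurrently(calls, tool_implementations):
    """Run (name, args) pairs concurrently; results keep the input order."""

    async def run_one(name, args):
        try:
            # to_thread keeps a blocking tool from stalling the event loop
            result = await asyncio.to_thread(tool_implementations[name], **args)
            payload = {"result": result}
        except Exception as exc:  # report the failure instead of cancelling siblings
            payload = {"error": str(exc)}
        return {"function_response": {"name": name, "response": payload}}

    return await asyncio.gather(*(run_one(n, a) for n, a in calls))


def slow_weather(city):
    time.sleep(0.2)  # stand-in for a network call
    return {"city": city, "condition": "Sunny"}


def slow_news(topic):
    time.sleep(0.2)  # stand-in for a network call
    return [f"{topic} headline"]


calls = [("get_weather", {"city": "Tokyo"}), ("get_news", {"topic": "AI"})]
tools = {"get_weather": slow_weather, "get_news": slow_news}

start = time.perf_counter()
responses = asyncio.run(run_calls_concurrently(calls, tools))
print([r["function_response"]["name"] for r in responses])
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Catching exceptions inside run_one means one failing tool does not discard the results of the others, and each response still maps back to the call that produced it.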

Handling Errors and Uncertainty

When a tool returns an error, Gemini adjusts its response accordingly — but only if you pass error information in a structured, informative way. Vague errors cause the model to speculate.

def safe_tool_response(func_name: str, result=None, error=None) -> dict:
    if error:
        return {
            "function_response": {
                "name": func_name,
                "response": {
                    "success": False,
                    "error": str(error),
                    "suggestion": "Try an alternative approach"
                }
            }
        }
    return {
        "function_response": {
            "name": func_name,
            "response": {
                "success": True,
                "data": result
            }
        }
    }

Including a success flag and suggestion field lets the model make informed decisions — "this tool failed, let me try a different path" — rather than silently producing a hallucinated response.
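
One way to make the suggestion field more specific is to choose it by failure type. The sketch below is an assumption of mine rather than anything the Gemini API prescribes: run_tool_safely is a hypothetical wrapper, and the suggestion wording is illustrative.

```python
def run_tool_safely(func_name, func, args):
    """Run one tool and always return a function_response dict."""
    try:
        data = func(**args)
    except TypeError as exc:
        # Usually a missing or unexpected argument supplied by the model
        response = {
            "success": False,
            "error": f"Bad arguments: {exc}",
            "suggestion": "Check the parameter names and call the tool again",
        }
    except TimeoutError as exc:
        response = {
            "success": False,
            "error": f"Timed out: {exc}",
            "suggestion": "Retry once, or answer without this data and say so",
        }
    except Exception as exc:
        response = {
            "success": False,
            "error": f"{type(exc).__name__}: {exc}",
            "suggestion": "Try an alternative approach",
        }
    else:
        response = {"success": True, "data": data}
    return {"function_response": {"name": func_name, "response": response}}


def get_weather(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "temperature": 22, "condition": "Sunny", "unit": unit}


print(run_tool_safely("get_weather", get_weather, {"city": "Tokyo"}))
print(run_tool_safely("get_weather", get_weather, {"town": "Tokyo"}))
```

The second call fails because the argument is named town instead of city, and the response tells the model what kind of mistake it made.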

A Real-World Use Case

I built an app review analysis tool using this approach. I passed App Store review text to Gemini and defined three tools:

categorize_review — classify into UI/bug/feature request/etc.
extract_sentiment — score sentiment and extract key phrases
flag_urgent — mark reviews reporting crashes or critical issues

Implementing this with plain JSON mode would require three separate API calls per review. With Function Calling, Gemini infers from context that all three tools should run and returns the calls together. The number of API calls per review dropped from three to one, with a corresponding reduction in cost and latency.
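
For illustration, here is roughly how the three declarations and the merge step could look. The tool names come from the list above, but every parameter name, enum value, and description below is an assumption, as are the declare and merge_review_calls helpers. This is not the schema from the actual app.

```python
def declare(name, description, properties):
    """Build one function declaration in which every property is required."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }


# Hypothetical schemas: parameter names and enum values are illustrative only
REVIEW_TOOLS = [{"function_declarations": [
    declare(
        "categorize_review",
        "Classify the review. Use this for every review.",
        {"category": {"type": "string",
                      "enum": ["ui", "bug", "feature_request", "other"]}},
    ),
    declare(
        "extract_sentiment",
        "Score sentiment and extract key phrases. Use this for every review.",
        {"score": {"type": "number", "description": "-1.0 to 1.0"},
         "key_phrases": {"type": "array", "items": {"type": "string"}}},
    ),
    declare(
        "flag_urgent",
        "Mark whether the review reports a crash or critical issue. "
        "Use this for every review.",
        {"urgent": {"type": "boolean"}},
    ),
]}]


def merge_review_calls(calls):
    """Fold the (name, args) pairs returned for one review into one record."""
    record = {}
    for _name, args in calls:
        record.update(args)  # assumes parameter names do not collide across tools
    return record


calls = [
    ("categorize_review", {"category": "bug"}),
    ("extract_sentiment", {"score": -0.8, "key_phrases": ["crashes on launch"]}),
    ("flag_urgent", {"urgent": True}),
]
print(merge_review_calls(calls))
```

In this setup the arguments of each call are the structured output, so the merged record can be stored directly.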

Where to Start

Begin with a single tool and experiment with description phrasing. Watching how changes to the "use this when..." clause affect when the model calls the tool is the fastest way to build intuition.
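
A small harness makes that experiment repeatable. In this sketch, ask is a function you supply: it sends one request with the tool declared under the given description and returns True when the response contains a function call. The names trigger_report and ask are hypothetical.

```python
def trigger_report(ask, descriptions, prompts):
    """Map each description variant to the prompts that triggered the tool.

    `ask(description, prompt)` is caller-supplied and must return True when
    the model's response contains a function call for that prompt.
    """
    return {
        label: [prompt for prompt in prompts if ask(text, prompt)]
        for label, text in descriptions.items()
    }
```

Run it with a fixed prompt set that mixes questions that should trigger the tool and ones that should not. Comparing the lists across description variants shows which phrasing over-triggers.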

The official Gemini API documentation on Function Calling covers the fundamentals. The parallel execution pattern and structured error responses described here are practical additions I arrived at through real usage — consider them field notes alongside the official guide.