●SIRI — WWDC 2026 confirms the revamped Siri runs on a Google Gemini model, though it won't ship in the EU at iOS 27 due to the DMA●FLASH3.5 — Gemini 3.5 Flash is now GA, the top Flash model for sustained frontier performance on agentic and coding tasks●IMAGE-GA — Gemini 3.1 Flash Image and 3.1 Pro Image are GA as native visual models; the preview versions shut down Jun 25●MANAGED-AGENTS — Managed Agents launch in public preview in the Gemini API, running autonomous agents in Google-hosted isolated Linux sandboxes●FILE-SEARCH — File Search now supports multimodal search, with native image embedding and retrieval via gemini-embedding-2●DEPRECATION — gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shut down Jun 25 — migrate to the GA models soon●SIRI — WWDC 2026 confirms the revamped Siri runs on a Google Gemini model, though it won't ship in the EU at iOS 27 due to the DMA●FLASH3.5 — Gemini 3.5 Flash is now GA, the top Flash model for sustained frontier performance on agentic and coding tasks●IMAGE-GA — Gemini 3.1 Flash Image and 3.1 Pro Image are GA as native visual models; the preview versions shut down Jun 25●MANAGED-AGENTS — Managed Agents launch in public preview in the Gemini API, running autonomous agents in Google-hosted isolated Linux sandboxes●FILE-SEARCH — File Search now supports multimodal search, with native image embedding and retrieval via gemini-embedding-2●DEPRECATION — gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shut down Jun 25 — migrate to the GA models soon
to Gemini API Function Calling: Tool Integration and Practical Usage
A practical deep dive into using Gemini API's Function Calling to give AI real tools and external API access. From design patterns to production implementation, covered systematically.
Gemini API's Function Calling lets AI models invoke external functions and APIs during a conversation. This moves AI beyond text generation alone — enabling real-time data retrieval, computation, and integration with external services that affect the real world.
As of 2026, Gemini API Function Calling has matured considerably. Parallel tool invocation, forced tool-use mode, and well-structured tool definitions are all production-ready. This guide covers everything from first-time setup to advanced patterns.
How Function Calling Works
The End-to-End Flow
Function Calling operates in the following sequence:
The developer defines available tools (functions) in the API request
The user sends a message
The Gemini model decides which tool to call, and with what arguments
The model returns a tool_calls response with those instructions
The application executes the tool and passes results back to the model
The model generates a final response incorporating the tool output
The critical point: the Gemini model itself doesn't execute the tools. It only decides which tool to call and with what arguments. Actual execution happens in your application code. This keeps security and control in your hands.
Basic Tool Definition Structure
import google.generativeai as genai# Define a toolweather_function = { "name": "get_current_weather", "description": "Retrieves the current weather for a specified city. Temperature unit can be Celsius or Fahrenheit.", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "City name and country code (e.g., Tokyo, JP)" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit" } }, "required": ["location"] }}# Pass the tool to the modelmodel = genai.GenerativeModel( model_name="gemini-2.0-flash", tools=[weather_function])
The quality of the description field is what matters most. The model uses it to decide when to invoke the tool. Vague descriptions lead to wrong tool selection. Be explicit: what does this tool do, when should it be used, and what does it take and return?
✦
Thank you for reading this far.
Continue Reading
What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.
WHAT YOU'LL LEARN
✦How Function Calling works and how to design effective tool definitions — ready to implement today
✦Controlling parallel and sequential tool calls to build complex workflows
✦Production-ready implementation with robust error handling and security best practices
Secure payment via Stripe · Cancel anytime
Practical Tool Implementation
Basic Single-Tool Call
import google.generativeai as genaiimport jsongenai.configure(api_key="YOUR_GEMINI_API_KEY")def get_current_weather(location: str, unit: str = "celsius") -> dict: """Calls a real weather API — using dummy data here""" return { "location": location, "temperature": 22 if unit == "celsius" else 71.6, "unit": unit, "condition": "Sunny", "humidity": 65 }def run_function_call_example(): model = genai.GenerativeModel( model_name="gemini-2.0-flash", tools=[get_current_weather] # Pass function directly — schema is auto-generated ) chat = model.start_chat() response = chat.send_message("What's the weather in Tokyo right now?") # Process function calls until none remain while response.candidates[0].content.parts[0].function_call.name: fc = response.candidates[0].content.parts[0].function_call # Execute the tool if fc.name == "get_current_weather": result = get_current_weather(**fc.args) # Return results to the model response = chat.send_message( genai.protos.Content( parts=[genai.protos.Part( function_response=genai.protos.FunctionResponse( name=fc.name, response={"result": json.dumps(result)} ) )] ) ) return response.text
Implementing Parallel Tool Calls
Gemini 2.0 and later support calling multiple tools in parallel, allowing multiple independent pieces of information to be fetched simultaneously:
def handle_parallel_function_calls(response, tools_map): """Generic handler for parallel tool invocations""" function_calls = [ part.function_call for part in response.candidates[0].content.parts if hasattr(part, 'function_call') and part.function_call.name ] if not function_calls: return None # Execute all tools (use asyncio or ThreadPool in production) results = [] for fc in function_calls: if fc.name in tools_map: result = tools_map[fc.name](**fc.args) results.append( genai.protos.Part( function_response=genai.protos.FunctionResponse( name=fc.name, response={"result": json.dumps(result)} ) ) ) return genai.protos.Content(parts=results)
Advanced Function Calling Design Patterns
Controlling Tool Selection Mode
Gemini API provides configuration options to control how tools are used:
from google.generativeai.types import ToolConfig, FunctionCallingConfig# Mode 1: AUTO (default) — model decides on its ownauto_config = ToolConfig( function_calling_config=FunctionCallingConfig(mode="AUTO"))# Mode 2: ANY — must use at least one tool (forced)forced_config = ToolConfig( function_calling_config=FunctionCallingConfig( mode="ANY", allowed_function_names=["get_current_weather", "search_database"] ))# Mode 3: NONE — no tool use (text response only)no_tools_config = ToolConfig( function_calling_config=FunctionCallingConfig(mode="NONE"))
The ANY mode is particularly useful when you always need structured output — for example, when parsing user input into a database schema regardless of how the user phrases their request.
Stateful Multi-Turn Conversations
Function Calling really shines across multi-turn conversations. Here's an example of a data analysis assistant:
class DataAnalysisAssistant: def __init__(self): self.tools = [ self.query_database, self.calculate_statistics, self.create_visualization, self.export_report ] self.model = genai.GenerativeModel( model_name="gemini-2.0-pro", tools=self.tools, system_instruction="""You are a data analysis expert.Use tools as needed to answer the user's questions.When multiple tools are required, use them in logical order. """ ) self.chat = self.model.start_chat() def analyze(self, user_query: str) -> str: response = self.chat.send_message(user_query) # Loop until no more tool calls while True: function_calls = self._extract_function_calls(response) if not function_calls: break results = self._execute_function_calls(function_calls) response = self.chat.send_message(results) return response.text
Error Handling and Security
Robust Error Handling
Production environments will encounter failures during tool execution. Proper error handling lets the AI understand what went wrong and try alternatives:
def safe_execute_tool(tool_name: str, args: dict, tools_map: dict) -> dict: """Tool execution wrapper with graceful error handling""" try: if tool_name not in tools_map: return { "error": f"Tool '{tool_name}' not found", "available_tools": list(tools_map.keys()) } result = tools_map[tool_name](**args) return {"success": True, "result": result} except ValueError as e: return { "error": "Invalid argument value", "details": str(e), "suggestion": "Check argument types and acceptable ranges" } except TimeoutError: return { "error": "Tool execution timed out", "suggestion": "Try again later or consider a different approach" } except Exception as e: return { "error": "Unexpected error occurred", "details": str(e) }
Security Considerations
Security deserves careful attention when using Function Calling in production:
Input validation: Never pass AI-generated tool arguments directly to your systems without validation. Arguments that might contain file paths, SQL queries, or shell commands must always be validated and sanitized first.
Least privilege: Each tool should have only the permissions it needs. A read-only data retrieval tool should never have write access.
Rate limiting: Prevent AI from triggering excessive API calls by rate-limiting tool execution — especially for tools that call external services.
Audit logging: Log every tool invocation in production — what was called, when, and with what arguments. Critical for debugging and detecting anomalous patterns.
customer_support_tools = [ { "name": "get_order_status", "description": "Retrieves order status, shipping info, and estimated delivery date for a given order number", "parameters": { "type": "object", "properties": { "order_id": {"type": "string", "description": "Order number (e.g., ORD-2026-12345)"} }, "required": ["order_id"] } }, { "name": "process_return_request", "description": "Processes a return or exchange request. Only use after getting explicit user confirmation.", "parameters": { "type": "object", "properties": { "order_id": {"type": "string"}, "reason": {"type": "string"}, "type": {"type": "string", "enum": ["return", "exchange"]} }, "required": ["order_id", "reason", "type"] } }]
Use Case 2: Code Review Assistant
code_review_tools = [ "analyze_code_complexity", # Measure code complexity "check_security_vulnerabilities", # Scan for security issues "suggest_refactoring", # Propose refactoring opportunities "run_tests", # Execute test suite "check_dependency_updates" # Check for dependency updates]
Orchestrating these tools creates a fully automated pipeline from code submission to comprehensive review report.
Performance Optimization
Caching Tool Definitions
When using the same toolset repeatedly, caching tool definitions saves costs. The Gemini API supports caching system prompts that include tool definitions — especially valuable when you have many tools.
Async Parallelization for Speed
When multiple tools can run independently, async parallel execution dramatically reduces latency:
import asyncioasync def execute_parallel_tools(function_calls: list, tools_map: dict) -> list: """Execute multiple tools in parallel using async""" async def execute_single(fc): tool = tools_map.get(fc.name) if not tool: return {"error": f"Unknown tool: {fc.name}"} if asyncio.iscoroutinefunction(tool): result = await tool(**fc.args) else: # Run synchronous functions in a thread pool loop = asyncio.get_event_loop() result = await loop.run_in_executor(None, lambda: tool(**fc.args)) return result return await asyncio.gather(*[execute_single(fc) for fc in function_calls])
A Note from an Indie Developer
Closing Thoughts
Gemini API's Function Calling opens a window from AI into the real world. With well-designed tool definitions, robust error handling, and security-conscious implementation, you can build AI assistants that are genuinely useful in production.
Start with a simple single-tool implementation and work your way up to parallel calls and complex workflows as you get comfortable. Mastering Function Calling substantially expands what's possible with AI application development.
We're rooting for you to build something truly impactful.
Streaming Responses: The Basics
With standard API requests, the model generates the entire response before sending it back. Streaming delivers text incrementally as it's generated, dramatically improving the user experience.
Streaming in Python
import google.generativeai as genai# Configure your API keygenai.configure(api_key="YOUR_API_KEY")# Initialize the modelmodel = genai.GenerativeModel("gemini-2.5-pro")# Receive streaming responseresponse = model.generate_content( "Explain how to build a web scraper in Python, step by step.", stream=True)# Output each chunk in real timefor chunk in response: if chunk.text: print(chunk.text, end="", flush=True)print() # Final newline# Expected behavior:# Text appears in real time (token by token)# No waiting for the complete response to generate
Streaming in TypeScript/JavaScript
import { GoogleGenerativeAI } from "@google/generative-ai";const genAI = new GoogleGenerativeAI("YOUR_API_KEY");const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });async function streamResponse(prompt: string) { const result = await model.generateContentStream(prompt); // Process chunks as they arrive for await (const chunk of result.stream) { const text = chunk.text(); process.stdout.write(text); } // You can also get the complete final response const finalResponse = await result.response; console.log("\n\nTotal tokens:", finalResponse.usageMetadata?.totalTokenCount);}// UsagestreamResponse("What are the best practices for designing REST APIs in TypeScript?");// Expected output:// (Text streams in real time)// Total tokens: 1234
Combining Streaming with Function Calling
By combining streaming and Function Calling, you can build advanced applications that respond in real time while also fetching external data.
import google.generativeai as genaigenai.configure(api_key="YOUR_API_KEY")model = genai.GenerativeModel( model_name="gemini-2.5-pro", tools=[get_weather])# Streaming + Function Callingresponse = model.generate_content( "Help me plan a picnic for tomorrow. Check Tokyo's weather and create an appropriate packing list.", stream=True)for chunk in response: # Handle Function Call requests in stream for part in chunk.parts: if hasattr(part, 'function_call'): result = get_weather(city="Tokyo") print(f"\n[Weather data fetched: {result}]\n") # Handle text responses if chunk.text: print(chunk.text, end="", flush=True)
Wrapping Up
Gemini API's streaming and Function Calling capabilities are powerful building blocks for real-time AI applications. Streaming improves the user experience with instant feedback, while Function Calling seamlessly connects your AI to external data and tools.
Start with a simple streaming chat, then layer in Function Calling to integrate external APIs like weather services or news feeds. This incremental approach is the most reliable way to build sophisticated AI-powered applications.
Share
Thank You for Reading
Gemini Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.