If you've built agentic AI systems before, you've faced this challenge: you want to delegate complex multi-step reasoning to a language model and let it combine multiple tools to answer user queries, yet most open-weight models make it surprisingly difficult to control the output format reliably.
This is exactly why Gemma 4, released in April 2026, represents a significant shift for the open-source community. Gemma 4 natively supports function calling, structured JSON output, and system instruction following, which makes it possible to build robust agentic systems without relying on closed commercial models.
Why Function Calling Support Changes Everything
With earlier open-weight models (Llama 2/3 and similar), reliable function calling was hard:
- Prompt engineering became the bottleneck: instructions such as "Output JSON format" were frequently ignored
- Fragile output parsing: Malformed JSON forced you to build retry logic around unreliable outputs
- Scale didn't guarantee reliability: A larger model didn't necessarily adhere to JSON more strictly
Gemma 4 shifts this paradigm.
Gemma 4 natively supports system prompts and accepts function definitions via JSON Schema. When you provide tool definitions, the model outputs structured JSON reliably. This means function calling becomes an API contract, not a prompt engineering guessing game.
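For contrast, here is roughly what the defensive parsing around prompt-only JSON output tends to look like. This is a minimal sketch, not code from any particular project: `parse_json_reply` and `call_with_retries` are hypothetical helpers, and `generate` stands in for whatever function sends a prompt to your model and returns its raw text.

```python
import json
import re


def parse_json_reply(text):
    """Try to recover a JSON object from free-form model text.

    Returns the parsed dict, or None if nothing parseable is found.
    """
    # 1. Happy path: the whole reply is valid JSON.
    try:
        parsed = json.loads(text)
        return parsed if isinstance(parsed, dict) else None
    except json.JSONDecodeError:
        pass

    # 2. Fallback: take the span from the first "{" to the last "}",
    #    which handles replies wrapped in chatty prose or a code fence.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
            return parsed if isinstance(parsed, dict) else None
        except json.JSONDecodeError:
            pass
    return None


def call_with_retries(generate, prompt, max_attempts=3):
    """Call `generate(prompt)` until its reply parses as a JSON object."""
    for _ in range(max_attempts):
        parsed = parse_json_reply(generate(prompt))
        if parsed is not None:
            return parsed
    raise ValueError(f"no valid JSON after {max_attempts} attempts")
```

Every retry here costs a full model call, which is the overhead that native function calling is meant to remove.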
Tools Parameter: How to Define Functions at the API Level
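Gemma 4's Function Calling interface is compatible with Google Gemini API conventions. Note that the example code in this article is written against the Anthropic Python SDK's Messages API (`tools`, `input_schema`, `tool_use` blocks), so adapt the client and model name to whichever endpoint actually serves your model.
Here's an example agent that retrieves weather and shipping status based on user queries: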
```python
import json

import anthropic

# NOTE: this snippet uses the Anthropic Python SDK's Messages API shape.
# The API key and model name are placeholders; point the client at
# whichever endpoint actually serves your model.
client = anthropic.Anthropic(api_key="YOUR_GEMINI_API_KEY")

# Each tool is described by a name, a description, and a JSON Schema for its input.
tools = [
    {
        "name": "get_weather",
        "description": "Get weather information for a specified city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit",
                },
            },
            "required": ["city"],
        },
    },
    {
        "name": "track_shipment",
        "description": "Track shipping status",
        "input_schema": {
            "type": "object",
            "properties": {
                "tracking_number": {"type": "string", "description": "Tracking number"},
            },
            "required": ["tracking_number"],
        },
    },
]

user_message = "What's the weather in Tokyo? Also, check tracking number SHP-12345 for me."

response = client.messages.create(
    model="gemini-2.0-flash",
    max_tokens=1024,
    system="You are an assistant that combines multiple tools to answer user questions.",
    tools=tools,
    messages=[{"role": "user", "content": user_message}],
)

# Inspect why generation stopped and what each content block contains.
print(f"Stop Reason: {response.stop_reason}")
for block in response.content:
    if hasattr(block, "type"):
        print(f"Content Type: {block.type}")
        if block.type == "tool_use":
            print(f"  Tool Name: {block.name}")
            print(f"  Input: {json.dumps(block.input, indent=2)}")
```
Three critical points here:
1. `input_schema` must be strict JSON Schema
Gemma 4 parses this schema to constrain output. Type definitions (`string`, `number`, `array`) and `enum` constraints reduce invalid outputs significantly.
2. The `tools` parameter accepts multiple functions simultaneously
The model selects which function(s) to invoke based on the user query. In this example, the model might call both weather and shipment tracking.
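3. Detect tool invocation via `stop_reason == "tool_use"`
When the API response has `stop_reason == "tool_use"`, the model has decided to call functions. This is the value reported by the Anthropic SDK used in these snippets; `tool_calls` is the OpenAI-style name for the same condition. The `tool_use` blocks in `response.content` contain the function invocations.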
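Even when the schema constrains output, it can be worth checking the arguments before running a tool. The sketch below assumes only the small subset of JSON Schema used in the tool definitions above (`required`, `type`, `enum`); `validate_tool_input` is a hypothetical helper, not part of any SDK, and a full validator library would be the better choice for richer schemas.

```python
# Maps JSON Schema type names to Python types (illustrative subset only).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_tool_input(schema, arguments):
    """Return a list of problems found in `arguments`; empty means OK."""
    problems = []
    properties = schema.get("properties", {})

    # Every field listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required field: {name}")

    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            # Stricter than JSON Schema's default, which allows extra fields.
            problems.append(f"unexpected field: {name}")
            continue
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name)
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if expected is not None and (not isinstance(value, expected) or is_bool_as_number):
            problems.append(f"{name}: expected {type_name}")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: must be one of {spec['enum']}")
    return problems


weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

# "kelvin" is a string, so the type check passes but the enum check fails.
print(validate_tool_input(weather_schema, {"city": "Tokyo", "unit": "kelvin"}))
```

Returning a list of problems, rather than raising on the first one, makes it easy to send the whole list back to the model as a tool error so it can correct the call.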
The Implementation Pitfall: Handling Multiple Concurrent Tool Calls
Here's a common mistake when implementing Function Calling:
```python
# ❌ Common mistake
if response.stop_reason == "tool_use":
    for block in response.content:
        if block.type == "tool_use":
            result = execute_function(block.name, block.input)
            break  # ← This skips remaining tool calls!
```
This pattern breaks when users ask questions requiring multiple tool invocations. Gemma 4 can return multiple function calls in a single response, so you can't break out of the loop early.
The correct approach:
```python
# ✅ Correct implementation
def process_tool_calls(response):
    """Execute every tool call in the response and collect the results."""
    results = []
    if response.stop_reason == "tool_use":
        for block in response.content:
            if block.type == "tool_use":
                function_result = execute_function(block.name, block.input)
                results.append({
                    "tool_use_id": block.id,
                    "function_name": block.name,
                    "result": function_result,
                })
    return results


def run_agent_loop(user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="gemini-2.0-flash",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        tool_results = process_tool_calls(response)

        # No tool calls requested: print the final answer and stop.
        if response.stop_reason != "tool_use":
            for block in response.content:
                if block.type == "text":
                    print(f"Assistant: {block.text}")
            break

        # Echo the assistant turn, then return every tool result in one user turn.
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": result["tool_use_id"],
                    "content": str(result["result"]),
                }
                for result in tool_results
            ],
        })
```
This loop is essential. Gemma 4 (like Gemini's function calling) supports multi-turn conversations. The model requests function calls in its first response, you execute them and return the results, and it continues reasoning until it produces the final text output.
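The snippets above call `execute_function` without defining it. One possible shape is a name-to-callable registry, sketched here under stated assumptions: the two tool functions are stubs that return placeholder data, and `TOOL_REGISTRY` is a hypothetical name introduced for illustration.

```python
def get_weather(city, unit="celsius"):
    # Stub: a real implementation would call a weather service here.
    return {"city": city, "unit": unit, "temperature": None, "note": "stub data"}


def track_shipment(tracking_number):
    # Stub: a real implementation would query a carrier API here.
    return {"tracking_number": tracking_number, "status": "unknown", "note": "stub data"}


# Registry mapping the tool names declared in `tools` to Python callables.
TOOL_REGISTRY = {
    "get_weather": get_weather,
    "track_shipment": track_shipment,
}


def execute_function(name, arguments):
    """Run the named tool; report failures as data instead of raising."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**arguments)
    except TypeError as exc:
        # Typically raised when the model sent missing or unexpected arguments.
        return {"error": f"bad arguments for {name}: {exc}"}
```

Returning errors as ordinary results keeps the agent loop running: the error text goes back to the model as a tool result, and the model gets a chance to retry with corrected arguments.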
System Prompts and Function Calling Behavior
Whether Gemma 4 effectively uses functions depends partly on system prompt quality:
```python
system_prompt = """
You are an agentic assistant that uses tools to answer user questions.

Critical rules:
- Understand the user question and identify ALL required tools
- If multiple tools can run concurrently, invoke them in a single step
- After execution, interpret results and explain them clearly to the user
- When uncertain, explicitly state missing information
- If a question doesn't require tools, answer directly without function calls
"""

response = client.messages.create(
    model="gemini-2.0-flash",
    max_tokens=1024,
    system=system_prompt,
    tools=tools,
    messages=[{"role": "user", "content": user_message}],
)
```
The key insight here is explicitly teaching the model when to use tools versus when to respond directly. Many open models default to invoking tools whenever they're available. But simple conversational questions don't need function calls. Clear system prompt separation reduces API costs and latency.
Gemma 4 26B MoE vs 31B Dense: Which to Choose for Function Calling?
Gemma 4 offers two primary variants:
- 26B MoE (active ~4B): Lightweight and fast. Suitable for edge deployment
- 31B Dense: Larger with higher accuracy. Better for complex multi-step agents
For Function Calling, use 31B Dense. Complex tool selection and multi-step reasoning are more reliable with the dense architecture. 26B MoE is lighter but may struggle with intricate agentic scenarios.
That said, if latency or on-device execution is critical, validate 26B MoE's Function Calling behavior before deciding on a larger model.
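One way to run that validation is a small tool-selection test suite that you score against each variant. The sketch below is an assumption-laden illustration: `select_tools` is a hypothetical callable that sends a query to the model under test and returns the names of the tools it chose, and `fake_model` is a stand-in so the example runs offline.

```python
def tool_selection_accuracy(select_tools, cases):
    """Fraction of cases where the model picked exactly the expected tools."""
    if not cases:
        return 0.0
    hits = 0
    for query, expected in cases:
        # Compare as sets: the order in which tools are called does not matter here.
        if set(select_tools(query)) == set(expected):
            hits += 1
    return hits / len(cases)


# Illustrative cases; a real suite would be drawn from your own traffic.
CASES = [
    ("What's the weather in Tokyo?", ["get_weather"]),
    ("Where is package SHP-12345?", ["track_shipment"]),
    ("Weather in Tokyo, and track SHP-12345.", ["get_weather", "track_shipment"]),
    ("Thanks, that's all!", []),
]


def fake_model(query):
    # Stand-in for a real model call. It never selects the shipment tool,
    # so it deliberately fails the two cases that need it.
    return ["get_weather"] if "weather" in query.lower() else []


print(f"accuracy: {tool_selection_accuracy(fake_model, CASES):.2f}")
```

Including cases that expect no tool call at all matters as much as the multi-tool cases, since over-eager tool use is one of the failure modes described earlier.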
Moving Forward: Building Production-Grade Agents
You've now seen the core of implementing Function Calling with Gemma 4. To scale this into production systems, add:
- Reliability patterns: Retry logic for network failures and timeouts
- Prompt caching: Cache system prompts and tool definitions for longer conversations
- Output validation: Ensure tool results match expected formats
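As a starting point for the first item, retry logic can be a thin wrapper with exponential backoff. This is a minimal sketch: `with_retries` is a hypothetical helper, and the default exception types are Python built-ins, so pass your client library's own timeout and connection error classes via `retry_on`.

```python
import random
import time


def with_retries(call, max_attempts=4, base_delay=0.5,
                 retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call `call()` and retry transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Delays double each time (0.5s, 1s, 2s, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

In the agent loop, this would wrap the model request, for example `with_retries(lambda: client.messages.create(...))`. The `sleep` parameter exists so tests can substitute a fake and run instantly.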
This foundation is expanded in a premium deep-dive covering advanced agentic patterns and optimization techniques. For now, mastering Function Calling mechanics in Gemma 4 is your launchpad for building reliable agentic AI.