Why Bring Gemini AI Into Your Android App
Google's Gemini API goes beyond text generation: it supports image recognition, audio understanding, Function Calling, and other multimodal capabilities through a single unified API. Android, as Google's own mobile platform, is a natural place to build Gemini into native mobile experiences.
This guide walks you through the entire process of adding Gemini to a Kotlin-based Android app using the Firebase AI Logic SDK (formerly the Vertex AI in Firebase SDK). From initial setup to a streaming chat screen with production-oriented error handling, you'll have working code at every step.
If you'd like a general overview of the Gemini API before diving in, check out [Gemini API Quickstart](/articles/gemini-api/gemini-api-quickstart).
Prerequisites and Environment Setup
Development Requirements
To follow along, you'll need:
- Android Studio Ladybug (2024.2.1) or later
- Kotlin 1.9+
- Android SDK API level 21+ (minSdk)
- Firebase project (Blaze plan recommended)
- API key from Google AI Studio, or Gemini API enabled in your Firebase console
Setting Up Your Firebase Project
The Firebase AI Logic SDK requires a Firebase project with your Android app registered. In the Firebase Console:
- Go to Project Settings → "Add app" and register your Android app
- Download google-services.json and place it in your app/ directory
- Navigate to the "AI Logic" section and enable the Gemini API
Adding Gradle Dependencies
Add the Firebase BOM and AI Logic SDK to your module-level build.gradle.kts:
// build.gradle.kts (Module: app)
plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
    id("com.google.gms.google-services")
}

dependencies {
    // Firebase BOM manages all Firebase library versions
    implementation(platform("com.google.firebase:firebase-bom:33.12.0"))
    // Firebase AI Logic SDK for Gemini API integration
    implementation("com.google.firebase:firebase-ai")
    // Coroutines for streaming responses
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.8.1")
    // Lifecycle ViewModel for UI integration
    implementation("androidx.lifecycle:lifecycle-viewmodel-ktx:2.8.7")
    implementation("androidx.lifecycle:lifecycle-runtime-ktx:2.8.7")
}

Run a Gradle sync to make sure all dependencies resolve correctly. If firebase-ai does not resolve, check that the BOM version you pinned is recent enough to include it, then bump the BOM and sync again.
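The module-level file applies the com.google.gms.google-services plugin, which also has to be declared once at the project level. A minimal sketch of that declaration, assuming the Kotlin DSL and a plugins block in the root build file; the version number shown is an assumption, so take the current one from the Firebase setup documentation:

```kotlin
// build.gradle.kts (Project level)
plugins {
    // Declared here, applied in the app module.
    // The version is an assumption; use the latest published release.
    id("com.google.gms.google-services") version "4.4.2" apply false
}
```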
Basic Text Generation — Your First Gemini Call
Let's start with the simplest possible interaction: sending a text prompt and receiving a response. Initialize a GenerativeModel instance and call generateContent.
import com.google.firebase.ai.FirebaseAI
import com.google.firebase.ai.GenerativeModel
import com.google.firebase.ai.type.GenerativeBackend

// Initialize the GenerativeModel.
// The backend is selected when obtaining the FirebaseAI instance,
// not as an argument to generativeModel().
val model: GenerativeModel = FirebaseAI
    .getInstance(backend = GenerativeBackend.googleAI()) // Gemini Developer API backend
    .generativeModel(modelName = "gemini-3-flash")       // Fast, cost-effective model

// Generate text (call within a coroutine)
suspend fun generateResponse(prompt: String): String {
    val response = model.generateContent(prompt)
    // response.text is null when the response contains no text part
    return response.text ?: "Failed to get a response"
}

// Usage example:
// viewModelScope.launch {
//     val result = generateResponse("Give me 3 useful Kotlin extension functions")
//     println(result)
//     // Expected output:
//     // 1. String.isEmailValid() - Email validation
//     // 2. View.visible() - Toggle view visibility
//     // 3. Context.toast(message) - Quick Toast display
// }

GenerativeBackend.googleAI() routes requests through the Gemini Developer API, the same API that Google AI Studio uses. If your organization requires VPC or data residency controls, switch to GenerativeBackend.vertexAI().
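The generateResponse function above returns a fallback string when the response has no text, which makes a failure look like ordinary model output to the caller. One possible alternative is to return an explicit result type so the UI can tell the two cases apart. The sketch below is an illustration only: GenerationResult and toGenerationResult are hypothetical names, not part of the Firebase AI Logic SDK.

```kotlin
// Hypothetical result type; not part of the Firebase AI Logic SDK.
sealed interface GenerationResult {
    data class Success(val text: String) : GenerationResult
    object Empty : GenerationResult
}

// Maps the nullable text of a response to an explicit result.
// Blank text is treated the same as missing text.
fun toGenerationResult(text: String?): GenerationResult =
    if (text.isNullOrBlank()) GenerationResult.Empty
    else GenerationResult.Success(text)
```

A caller would pass response.text to toGenerationResult and branch on the result with a when expression, showing an error state for Empty instead of rendering a placeholder sentence as if the model had written it.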
Streaming Responses for Real-Time Display
In a chat interface, streaming tokens as they're generated dramatically improves perceived responsiveness. The Firebase AI Logic SDK supports Kotlin's Flow for streaming.
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import com.google.firebase.ai.type.GenerateContentResponse
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.launch

fun streamResponse(prompt: String): Flow<String> = flow {
    val stream: Flow<GenerateContentResponse> =
        model.generateContentStream(prompt)
    stream.collect { chunk ->
        chunk.text?.let { text ->
            emit(text) // Emit partial text as it arrives
        }
    }
}

// ViewModel usage
class ChatViewModel : ViewModel() {
    private val _response = MutableStateFlow("")
    val response: StateFlow<String> = _response

    fun askGemini(prompt: String) {
        viewModelScope.launch {
            _response.value = "" // Clear the previous answer
            streamResponse(prompt).collect { partial ->
                _response.value += partial // Append each chunk as it arrives
            }
        }
    }
}

// Expected behavior:
// "Kotlin" → "Kotlin is" → "Kotlin is a modern" → ...
// UI updates incrementally without waiting for the full response

For deeper coverage of streaming patterns, see [Gemini API Streaming × Function Calling Integration Guide](/articles/gemini-api/gemini-api-streaming-function-calling-guide).
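The incremental display described in the streaming section boils down to a running concatenation of chunks. The sketch below shows that accumulation on a plain list using the standard library's runningFold, so it can be checked without a network call. accumulateChunks is a hypothetical helper name, and the chunk boundaries are made up for illustration; real chunk sizes depend on the model.

```kotlin
// Returns the text the UI would show after each chunk arrives.
fun accumulateChunks(chunks: List<String>): List<String> =
    chunks
        .runningFold("") { shown, chunk -> shown + chunk }
        .drop(1) // Drop the initial empty string that runningFold emits first

// accumulateChunks(listOf("Kotlin", " is", " a modern"))
// returns ["Kotlin", "Kotlin is", "Kotlin is a modern"]
```

On a Flow, the scan operator applies the same idea to chunks as they arrive.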
Multimodal Input — Analyzing Camera Images with Gemini
One of Gemini's standout features is multimodal support. You can send images captured by the device camera directly to Gemini for analysis.
import android.graphics.Bitmap
import com.google.firebase.ai.type.content

suspend fun analyzeImage(bitmap: Bitmap, question: String): String {
    // Build a multimodal prompt with text and image
    val inputContent = content {
        image(bitmap) // Pass the Bitmap directly
        text(question)
    }
    val response = model.generateContent(inputContent)
    return response.text ?: "Could not analyze the image"
}

// Example: Analyzing a photo of food
// val result = analyzeImage(
//     bitmap = cameraBitmap,
//     question = "What dish is this and roughly how many calories does it have?"
// )
// Expected output:
// "This appears to be Carbonara. A typical serving contains approximately
// 650-800 calories. Key ingredients include pasta, egg yolk, pancetta,
// and Parmigiano-Reggiano cheese."

The content {} builder also accepts PDF and video binary data. For larger files, consider uploading the file first and passing a reference to it rather than inlining the bytes; which upload mechanism is available depends on the backend you chose.
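Full-resolution camera bitmaps can be far larger than the model needs for a question like the one above, so it may be worth downscaling before sending. The sketch below only computes the target dimensions while preserving the aspect ratio. scaledSize is a hypothetical helper, and any particular maxSide value is an assumption rather than a documented limit.

```kotlin
// Hypothetical helper: computes dimensions that fit within maxSide while
// keeping the aspect ratio. Images that already fit are returned unchanged.
fun scaledSize(width: Int, height: Int, maxSide: Int): Pair<Int, Int> {
    require(width > 0 && height > 0 && maxSide > 0) { "dimensions must be positive" }
    val longest = maxOf(width, height)
    if (longest <= maxSide) return width to height
    // Integer arithmetic in Long avoids both rounding drift and overflow.
    // coerceAtLeast(1) guards against a zero-sized side on extreme aspect ratios.
    val w = (width.toLong() * maxSide / longest).toInt().coerceAtLeast(1)
    val h = (height.toLong() * maxSide / longest).toInt().coerceAtLeast(1)
    return w to h
}
```

The resulting pair can be passed to Bitmap.createScaledBitmap before calling analyzeImage.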
Function Calling — Letting AI Invoke App Features
Function Calling allows Gemini to recognize user intent (like "check the weather") and request your app to execute a specific function. The AI doesn't call external APIs directly — your app acts as the intermediary.
import com.google.firebase.ai.FirebaseAI
import com.google.firebase.ai.type.FunctionDeclaration
import com.google.firebase.ai.type.FunctionResponsePart
import com.google.firebase.ai.type.GenerativeBackend
import com.google.firebase.ai.type.Schema
import com.google.firebase.ai.type.Tool
import com.google.firebase.ai.type.content
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonPrimitive
import kotlinx.serialization.json.jsonPrimitive

// 1. Declare the function (tell Gemini what's available)
val getWeatherFunc = FunctionDeclaration(
    name = "getWeather",
    description = "Get current weather information for a specified city",
    parameters = mapOf(
        "city" to Schema.string("City name to get weather for (e.g., Tokyo, New York)")
    )
)

// 2. Create model with tools
val modelWithTools = FirebaseAI
    .getInstance(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-3-flash",
        tools = listOf(Tool.functionDeclarations(listOf(getWeatherFunc)))
    )

// 3. Handle Function Calling in the conversation
suspend fun chatWithFunctions(userMessage: String): String {
    val chat = modelWithTools.startChat()
    val response = chat.sendMessage(userMessage)

    // Check if Gemini requested a function call
    val functionCall = response.functionCalls.firstOrNull()
    if (functionCall != null) {
        // Arguments arrive as JSON elements, so read the string content
        // instead of casting to String
        val city = functionCall.args["city"]?.jsonPrimitive?.content ?: "Tokyo"
        val weatherData = fetchWeatherFromApi(city) // Your own API call

        // Send the result back to Gemini using the "function" role
        val functionResponse = content(role = "function") {
            part(FunctionResponsePart(
                name = functionCall.name,
                response = JsonObject(mapOf(
                    "temperature" to JsonPrimitive(weatherData.temp),
                    "condition" to JsonPrimitive(weatherData.condition)
                ))
            ))
        }
        val finalResponse = chat.sendMessage(functionResponse)
        return finalResponse.text ?: ""
    }
    return response.text ?: ""
}

// Expected output (when user asks "What's the weather in Tokyo?"):
// "The current weather in Tokyo is sunny with a temperature of 22°C.
// It's a great day to be outside, though clouds may move in by evening,
// so you might want to carry a small umbrella."

To explore advanced Function Calling design patterns, see [Gemini Function Calling Practical Guide](/articles/gemini-api/gemini-function-calling-practical-guide).
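The example above handles a single declared function. Once an app declares several, one way to keep the handling code flat is to look up a handler by the name the model requested. The sketch below is a simplified illustration under stated assumptions: FunctionRegistry is a hypothetical class, arguments are modeled as plain strings rather than the SDK's JSON elements, and the weather handler is a stub with no network call.

```kotlin
// Hypothetical registry mapping function names to app-side handlers.
// Arguments and results are simplified to string maps for illustration.
class FunctionRegistry(
    private val handlers: Map<String, (Map<String, String>) -> Map<String, String>>
) {
    // Runs the handler registered under `name`. An unknown name yields an
    // error payload the model can read, instead of throwing.
    fun dispatch(name: String, args: Map<String, String>): Map<String, String> {
        val handler = handlers[name]
            ?: return mapOf("error" to "Unknown function: $name")
        return handler(args)
    }
}

// Example registration with a stubbed weather lookup.
val registry = FunctionRegistry(
    mapOf(
        "getWeather" to { args: Map<String, String> ->
            val city = args["city"] ?: "Tokyo"
            mapOf("city" to city, "condition" to "sunny")
        }
    )
)
```

Returning an error payload for an unknown name lets the model explain the problem to the user, where an exception would end the chat turn.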
Building a Chat UI with ViewModel and Jetpack Compose
Now let's bring everything together into a practical chat interface. This follows the MVVM architecture with real-time streaming display.
// ChatViewModel.kt
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import com.google.firebase.ai.FirebaseAI
import com.google.firebase.ai.type.GenerativeBackend
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

class ChatViewModel : ViewModel() {
    private val model = FirebaseAI
        .getInstance(backend = GenerativeBackend.googleAI())
        .generativeModel(modelName = "gemini-3-flash")

    // The chat object keeps the conversation history between turns
    private val chat = model.startChat()

    data class Message(
        val text: String,
        val isUser: Boolean,
        val isStreaming: Boolean = false
    )

    private val _messages = MutableStateFlow<List<Message>>(emptyList())
    val messages: StateFlow<List<Message>> = _messages

    private val _isLoading = MutableStateFlow(false)
    val isLoading: StateFlow<Boolean> = _isLoading

    fun sendMessage(userText: String) {
        // Add user message
        _messages.value += Message(text = userText, isUser = true)
        _isLoading.value = true

        viewModelScope.launch {
            try {
                // Receive streaming response
                var aiResponse = ""
                _messages.value += Message(
                    text = "", isUser = false, isStreaming = true
                )
                chat.sendMessageStream(userText).collect { chunk ->
                    chunk.text?.let { partial ->
                        aiResponse += partial
                        // Update the last message
                        _messages.value = _messages.value.dropLast(1) +
                            Message(
                                text = aiResponse,
                                isUser = false,
                                isStreaming = true
                            )
                    }
                }
                // Mark streaming as complete
                _messages.value = _messages.value.dropLast(1) +
                    Message(text = aiResponse, isUser = false)
            } catch (e: Exception) {
                // Remove the unfinished streaming placeholder, then show the error
                _messages.value = _messages.value.filterNot { it.isStreaming } +
                    Message(
                        text = "An error occurred: ${e.localizedMessage}",
                        isUser = false
                    )
            } finally {
                _isLoading.value = false
            }
        }
    }
}

Observe this ViewModel from a @Composable function using collectAsState(), and both new messages and streaming updates will appear in your UI reactively.
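The ViewModel updates the in-progress reply by dropping the last list element and appending a new one. Because StateFlow only notifies observers when it receives a new value, the list is rebuilt rather than mutated in place. That step can be pulled into a small extension function, sketched here with a stand-in type so it runs on its own. ChatMessage and replaceLast are hypothetical names that mirror, but are not, the article's Message class.

```kotlin
// Hypothetical stand-in for ChatViewModel.Message so the sketch is self-contained.
data class ChatMessage(
    val text: String,
    val isUser: Boolean,
    val isStreaming: Boolean = false
)

// Returns a new list with the final element swapped for `updated`.
// An empty list simply becomes a one-element list.
fun List<ChatMessage>.replaceLast(updated: ChatMessage): List<ChatMessage> =
    dropLast(1) + updated
```

With a helper like this, each streaming update in sendMessage would read as a single assignment of the form messages.replaceLast(newMessage).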
Error Handling and Production Best Practices
To keep your app running smoothly in production, here are essential patterns to implement.
Rate Limiting and Retry Logic
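Retrying a rate-limited request works best when each wait is longer than the previous one. As a quick illustration of the schedule that exponential backoff produces, the sketch below lists the delay before each retry. backoffDelays is a hypothetical helper, and the upper bound maxDelayMs is an added assumption that the retry helper in this section does not include.

```kotlin
// Hypothetical helper: the wait before each retry, doubling every time and
// never exceeding maxDelayMs.
fun backoffDelays(
    retries: Int,
    initialDelayMs: Long = 1000L,
    maxDelayMs: Long = 8000L
): List<Long> {
    require(retries >= 0) { "retries must not be negative" }
    return generateSequence(initialDelayMs.coerceAtMost(maxDelayMs)) { previous ->
        (previous * 2).coerceAtMost(maxDelayMs)
    }.take(retries).toList()
}

// backoffDelays(2) returns [1000, 2000]
// backoffDelays(5) returns [1000, 2000, 4000, 8000, 8000]
```

With maxRetries = 3, the retry helper in this section makes up to three attempts and therefore waits at most twice, matching backoffDelays(2).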
import kotlinx.coroutines.delay

suspend fun <T> retryWithBackoff(
    maxRetries: Int = 3,
    initialDelay: Long = 1000L,
    block: suspend () -> T
): T {
    var currentDelay = initialDelay
    repeat(maxRetries - 1) { attempt ->
        try {
            return block()
        } catch (e: Exception) {
            // Retry on 429 (Rate Limit) or 503 (Service Unavailable)
            if (e.message?.contains("429") == true ||
                e.message?.contains("503") == true) {
                delay(currentDelay)
                currentDelay *= 2 // Exponential backoff
            } else {
                throw e // Rethrow other errors immediately
            }
        }
    }
    return block() // Final attempt
}

// Usage:
// val result = retryWithBackoff {
//     model.generateContent("Your prompt here")
// }

Configuring Safety Settings
When passing user input directly to Gemini, explicitly set Safety Settings to prevent inappropriate content generation.
import com.google.firebase.ai.FirebaseAI
import com.google.firebase.ai.type.GenerativeBackend
import com.google.firebase.ai.type.HarmBlockThreshold
import com.google.firebase.ai.type.HarmCategory
import com.google.firebase.ai.type.SafetySetting

val safeModel = FirebaseAI
    .getInstance(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-3-flash",
        // Block content rated medium or higher in each harm category
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.MEDIUM_AND_ABOVE)
        )
    )

For a comprehensive guide on API key management and prompt injection defense, see [Gemini API Production Security Complete Guide](/articles/gemini-api/gemini-api-production-security).
Looking Back
In this guide, we covered how to integrate the Gemini API into a Kotlin Android app using the Firebase AI Logic SDK. Starting from basic text generation, we progressed through streaming responses, multimodal image analysis with the device camera, Function Calling for extending app capabilities, and production-grade error handling.
The combination of Gemini and Android represents the most natural AI integration pattern within Google's ecosystem. Start by building a prototype with the code in this guide, then customize it to fit your specific app's needs.