Building AI-Powered Android Apps with Gemini API and Kotlin

Why Bring Gemini AI Into Your Android App

Google's Gemini API goes beyond text generation: it supports image recognition, audio understanding, Function Calling, and other multimodal capabilities through a single unified API. Android, as Google's own mobile platform, is a natural place to build these capabilities into native apps.

This guide walks you through the entire process of adding Gemini to a Kotlin-based Android app using the Firebase AI Logic SDK (formerly the Vertex AI in Firebase SDK). From initial setup to production-quality streaming chat, you'll have working code at every step.

If you'd like a general overview of the Gemini API before diving in, check out [Gemini API Quickstart](/articles/gemini-api/gemini-api-quickstart).

Prerequisites and Environment Setup

Development Requirements

To follow along, you'll need:

Android Studio Ladybug (2024.2.1) or later
Kotlin 1.9+
Android SDK API level 21+ (minSdk)
Firebase project (Blaze plan recommended)
API key from Google AI Studio, or Gemini API enabled in your Firebase console

Setting Up Your Firebase Project

The Firebase AI Logic SDK requires a Firebase project with your Android app registered. In the Firebase Console:

Go to Project Settings → "Add app" and register your Android app
Download google-services.json and place it in your app/ directory
Navigate to the "AI Logic" section and enable the Gemini API

Adding Gradle Dependencies

Add the Firebase BOM and AI Logic SDK to your module-level build.gradle.kts:

// build.gradle.kts (Module: app)
plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
    id("com.google.gms.google-services")
}
 
dependencies {
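    // Firebase BoM manages all Firebase library versions.
    // The firebase-ai artifact is not part of older BoMs (it appears to have
    // first shipped around 33.14.0), so check the release notes if it fails to resolve.
    implementation(platform("com.google.firebase:firebase-bom:33.14.0"))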
 
    // Firebase AI Logic SDK for Gemini API integration
    implementation("com.google.firebase:firebase-ai")
 
    // Coroutines for streaming responses
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.8.1")
 
    // Lifecycle ViewModel for UI integration
    implementation("androidx.lifecycle:lifecycle-viewmodel-ktx:2.8.7")
    implementation("androidx.lifecycle:lifecycle-runtime-ktx:2.8.7")
}

Run a Gradle sync to make sure all dependencies resolve correctly.
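
The `com.google.gms.google-services` plugin applied in the module file also has to be declared at the project level, otherwise the sync usually fails because the plugin cannot be found. A minimal sketch of the top-level file follows, assuming you declare plugins directly rather than through a version catalog. The version numbers are illustrative assumptions; match them to the Android Gradle Plugin and Kotlin versions your project already uses.

```kotlin
// build.gradle.kts (Project level)
plugins {
    // Versions below are placeholders (assumptions), not requirements
    id("com.android.application") version "8.7.3" apply false
    id("org.jetbrains.kotlin.android") version "2.0.21" apply false
    id("com.google.gms.google-services") version "4.4.2" apply false
}
```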

Basic Text Generation — Your First Gemini Call

Let's start with the simplest possible interaction: sending a text prompt and receiving a response. Initialize a GenerativeModel instance and call generateContent.

import com.google.firebase.Firebase
import com.google.firebase.ai.GenerativeModel
import com.google.firebase.ai.ai
import com.google.firebase.ai.type.GenerativeBackend

// Initialize the GenerativeModel.
// The backend is selected when obtaining the FirebaseAI instance,
// not passed to generativeModel().
val model: GenerativeModel = Firebase
    .ai(backend = GenerativeBackend.googleAI())    // Gemini Developer API backend
    .generativeModel(modelName = "gemini-3-flash") // Fast, cost-effective model
 
// Generate text (call within a coroutine)
suspend fun generateResponse(prompt: String): String {
    val response = model.generateContent(prompt)
    return response.text ?: "Failed to get a response"
}
 
// Usage example:
// viewModelScope.launch {
//     val result = generateResponse("Give me 3 useful Kotlin extension functions")
//     println(result)
//     // Expected output:
//     // 1. String.isEmailValid() - Email validation
//     // 2. View.visible() - Toggle view visibility
//     // 3. Context.toast(message) - Quick Toast display
// }

GenerativeBackend.googleAI() routes requests through the Gemini Developer API, the same API that Google AI Studio keys use. If your organization requires VPC or data residency controls, switch to GenerativeBackend.vertexAI().

Streaming Responses for Real-Time Display

In a chat interface, streaming tokens as they're generated dramatically improves perceived responsiveness. The Firebase AI Logic SDK supports Kotlin's Flow for streaming.

import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.flow.mapNotNull
import kotlinx.coroutines.flow.update
import kotlinx.coroutines.launch

// generateContentStream returns a Flow of partial responses;
// keep only the chunks that actually carry text.
fun streamResponse(prompt: String): Flow<String> =
    model.generateContentStream(prompt).mapNotNull { chunk -> chunk.text }

// ViewModel usage
class ChatViewModel : ViewModel() {
    private val _response = MutableStateFlow("")
    val response: StateFlow<String> = _response.asStateFlow()

    fun askGemini(prompt: String) {
        viewModelScope.launch {
            _response.value = ""
            streamResponse(prompt).collect { partial ->
                _response.update { it + partial } // Atomic append
            }
        }
    }
}
 
// Expected behavior:
// "Kotlin" → "Kotlin is" → "Kotlin is a modern" → ...
// UI updates incrementally without waiting for the full response
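
The incremental behavior described above is a running accumulation over the chunks. The following standalone sketch reproduces it with a plain list and the standard library's `runningReduce`, with no SDK or coroutines involved. The sample chunks and the name `cumulativeText` are hypothetical, chosen only for illustration.

```kotlin
// Chunks as they might arrive from a streaming call (hypothetical sample data).
val sampleChunks = listOf("Kotlin", " is", " a modern")

// Each element of the result is the text the UI would show after that chunk arrives.
fun cumulativeText(chunks: List<String>): List<String> =
    chunks.runningReduce { shown, next -> shown + next }

// cumulativeText(sampleChunks)
// → ["Kotlin", "Kotlin is", "Kotlin is a modern"]
```

kotlinx.coroutines offers a `runningReduce` operator on `Flow` as well, so the same idea can be applied directly to the stream if you prefer to emit cumulative text rather than append in the ViewModel.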

For deeper coverage of streaming patterns, see [Gemini API Streaming × Function Calling Integration Guide](/articles/gemini-api/gemini-api-streaming-function-calling-guide).

Multimodal Input — Analyzing Camera Images with Gemini

One of Gemini's standout features is multimodal support. You can send images captured by the device camera directly to Gemini for analysis.

import android.graphics.Bitmap
import com.google.firebase.ai.type.content
 
suspend fun analyzeImage(bitmap: Bitmap, question: String): String {
    // Build a multimodal prompt with text and image
    val inputContent = content {
        image(bitmap)   // Pass the Bitmap directly
        text(question)
    }
 
    val response = model.generateContent(inputContent)
    return response.text ?: "Could not analyze the image"
}
 
// Example: Analyzing a photo of food
// val result = analyzeImage(
//     bitmap = cameraBitmap,
//     question = "What dish is this and roughly how many calories does it have?"
// )
// Expected output:
// "This appears to be Carbonara. A typical serving contains approximately
//  650-800 calories. Key ingredients include pasta, egg yolk, pancetta,
//  and Parmigiano-Reggiano cheese."

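The content {} builder also accepts other binary data, such as PDF or video, through inlineData(bytes, mimeType). Inline payloads count toward the request size limit, so larger files are better uploaded first and passed by reference. With the Firebase AI Logic SDK that typically means a Cloud Storage for Firebase URL rather than the Gemini Developer API's Files API, so check which option your chosen backend supports.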
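
Camera captures are often several megapixels, which inflates request size and upload time. One common mitigation is to downscale before sending. The sketch below only computes the target dimensions, so it runs without Android classes. The function name `scaledSize` and the 1024-pixel default are assumptions for illustration, not values recommended by the SDK.

```kotlin
// Computes target dimensions so the longer side is at most maxSide,
// preserving aspect ratio. Images already within the limit are unchanged.
fun scaledSize(width: Int, height: Int, maxSide: Int = 1024): Pair<Int, Int> {
    require(width > 0 && height > 0 && maxSide > 0) { "dimensions must be positive" }
    val longest = maxOf(width, height)
    if (longest <= maxSide) return width to height
    val scale = maxSide.toDouble() / longest
    // Round to nearest; coerceAtLeast(1) guards extreme aspect ratios
    val w = (width * scale + 0.5).toInt().coerceAtLeast(1)
    val h = (height * scale + 0.5).toInt().coerceAtLeast(1)
    return w to h
}
```

On Android, the resulting pair can be passed to `Bitmap.createScaledBitmap(bitmap, w, h, true)` before calling `analyzeImage`.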

Function Calling — Letting AI Invoke App Features

Function Calling allows Gemini to recognize user intent (like "check the weather") and request your app to execute a specific function. The AI doesn't call external APIs directly — your app acts as the intermediary.

import com.google.firebase.Firebase
import com.google.firebase.ai.ai
import com.google.firebase.ai.type.FunctionDeclaration
import com.google.firebase.ai.type.FunctionResponsePart
import com.google.firebase.ai.type.GenerativeBackend
import com.google.firebase.ai.type.Schema
import com.google.firebase.ai.type.Tool
import com.google.firebase.ai.type.content
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonPrimitive
import kotlinx.serialization.json.contentOrNull

// 1. Declare the function (tell Gemini what's available)
val getWeatherFunc = FunctionDeclaration(
    name = "getWeather",
    description = "Get current weather information for a specified city",
    parameters = mapOf(
        "city" to Schema.string("City name to get weather for (e.g., Tokyo, New York)")
    )
)

// 2. Create model with tools
val modelWithTools = Firebase
    .ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-3-flash",
        tools = listOf(Tool.functionDeclarations(listOf(getWeatherFunc)))
    )

// 3. Handle Function Calling in the conversation
suspend fun chatWithFunctions(userMessage: String): String {
    val chat = modelWithTools.startChat()
    val response = chat.sendMessage(userMessage)

    // Check if Gemini requested the function we declared
    val functionCall = response.functionCalls.firstOrNull { it.name == "getWeather" }
    if (functionCall != null) {
        // Arguments arrive as JSON elements, so unwrap the string explicitly
        // (a plain `as? String` cast would always yield null)
        val city = (functionCall.args["city"] as? JsonPrimitive)?.contentOrNull ?: "Tokyo"
        val weatherData = fetchWeatherFromApi(city) // Your own API call (not shown)

        // Send the result back to Gemini using the "function" role
        val functionResponse = content(role = "function") {
            part(FunctionResponsePart(
                name = functionCall.name,
                response = JsonObject(mapOf(
                    "temperature" to JsonPrimitive(weatherData.temp),
                    "condition" to JsonPrimitive(weatherData.condition)
                ))
            ))
        }
        val finalResponse = chat.sendMessage(functionResponse)
        return finalResponse.text ?: ""
    }

    return response.text ?: ""
}
 
// Expected output (when user asks "What's the weather in Tokyo?"):
// "The current weather in Tokyo is sunny with a temperature of 22°C.
//  It's a great day to be outside, though clouds may move in by evening,
//  so you might want to carry a small umbrella."
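
Once an app declares more than one function, the single `if` check above tends to grow into a dispatcher that maps the requested name to a handler. The following is a simplified, SDK-free sketch of that idea. `ToolCall`, `handlers`, `dispatch`, and the `getTime` function are hypothetical names, and arguments are modeled as plain strings, whereas the real SDK delivers them as JSON elements.

```kotlin
// Simplified stand-in for the SDK's function-call object (assumption):
// a function name plus string arguments.
data class ToolCall(val name: String, val args: Map<String, String>)

// Hypothetical handlers keyed by the function name declared to the model.
val handlers: Map<String, (Map<String, String>) -> Map<String, String>> = mapOf(
    "getWeather" to { args ->
        val city = args["city"] ?: "Tokyo"
        // A real app would call its weather API here; this returns canned data.
        mapOf("city" to city, "condition" to "sunny")
    },
    "getTime" to { args ->
        mapOf("timezone" to (args["timezone"] ?: "UTC"))
    }
)

// Returns the handler's result, or an error payload the model can read
// when it asks for a function the app never declared.
fun dispatch(call: ToolCall): Map<String, String> {
    val handler = handlers[call.name]
        ?: return mapOf("error" to "Unknown function: ${call.name}")
    return handler(call.args)
}
```

Returning an error payload instead of throwing lets the model explain the failure to the user, which is one reasonable design choice among several.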

To explore advanced Function Calling design patterns, see [Gemini Function Calling Practical Guide](/articles/gemini-api/gemini-function-calling-practical-guide).

Building a Chat UI with ViewModel and Jetpack Compose

Now let's bring everything together into a practical chat interface. This follows the MVVM architecture with real-time streaming display.

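// ChatViewModel.kt
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import com.google.firebase.Firebase
import com.google.firebase.ai.ai
import com.google.firebase.ai.type.GenerativeBackend
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

class ChatViewModel : ViewModel() {
    // The backend is selected on the FirebaseAI instance, not per model
    private val model = Firebase
        .ai(backend = GenerativeBackend.googleAI())
        .generativeModel(modelName = "gemini-3-flash")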
 
    private val chat = model.startChat()
 
    data class Message(
        val text: String,
        val isUser: Boolean,
        val isStreaming: Boolean = false
    )
 
    private val _messages = MutableStateFlow<List<Message>>(emptyList())
    val messages: StateFlow<List<Message>> = _messages
 
    private val _isLoading = MutableStateFlow(false)
    val isLoading: StateFlow<Boolean> = _isLoading
 
    fun sendMessage(userText: String) {
        // Add user message
        _messages.value += Message(text = userText, isUser = true)
        _isLoading.value = true
 
        viewModelScope.launch {
            try {
                // Receive streaming response
                var aiResponse = ""
                _messages.value += Message(
                    text = "", isUser = false, isStreaming = true
                )
 
                chat.sendMessageStream(userText).collect { chunk ->
                    chunk.text?.let { partial ->
                        aiResponse += partial
                        // Update the last message
                        _messages.value = _messages.value.dropLast(1) +
                            Message(
                                text = aiResponse,
                                isUser = false,
                                isStreaming = true
                            )
                    }
                }
 
                // Mark streaming as complete
                _messages.value = _messages.value.dropLast(1) +
                    Message(text = aiResponse, isUser = false)
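            } catch (e: kotlinx.coroutines.CancellationException) {
                throw e // Let coroutine cancellation propagate
            } catch (e: Exception) {
                // Replace the streaming placeholder so it is not left
                // stuck in the isStreaming state
                _messages.value = _messages.value.dropLast(1) + Message(
                    text = "An error occurred: ${e.localizedMessage}",
                    isUser = false
                )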
            } finally {
                _isLoading.value = false
            }
        }
    }
}

Observe this ViewModel from a @Composable function with collectAsState(). Each new message and each streamed chunk then triggers recomposition, so the UI updates as text arrives.
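
The streaming update inside the ViewModel boils down to an immutable "replace the last element" operation on the message list. The standalone sketch below isolates that step so it can be unit-tested without Android or Firebase. The extension names `appendChunk` and `finishStreaming` are hypothetical, and `Message` mirrors the data class from the ViewModel.

```kotlin
data class Message(
    val text: String,
    val isUser: Boolean,
    val isStreaming: Boolean = false
)

// Appends a streamed chunk to the last message and returns a new list.
// The input list is never mutated, so a StateFlow sees a distinct value.
fun List<Message>.appendChunk(chunk: String): List<Message> {
    val last = lastOrNull() ?: return this
    return dropLast(1) + last.copy(text = last.text + chunk, isStreaming = true)
}

// Clears the streaming flag on the last message once the stream completes.
fun List<Message>.finishStreaming(): List<Message> {
    val last = lastOrNull() ?: return this
    return dropLast(1) + last.copy(isStreaming = false)
}
```

With helpers like these, the collector body could shrink to something like `_messages.update { it.appendChunk(partial) }`, using the `update` extension from kotlinx.coroutines.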

Error Handling and Production Best Practices

To keep your app running smoothly in production, here are essential patterns to implement.

Rate Limiting and Retry Logic

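import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.delay

suspend fun <T> retryWithBackoff(
    maxRetries: Int = 3, // Total number of attempts, including the first
    initialDelay: Long = 1000L,
    block: suspend () -> T
): T {
    var currentDelay = initialDelay
    repeat(maxRetries - 1) {
        try {
            return block()
        } catch (e: CancellationException) {
            throw e // Never retry a cancelled coroutine
        } catch (e: Exception) {
            // Retry on 429 (Rate Limit) or 503 (Service Unavailable).
            // Matching on the message text is a heuristic; prefer typed
            // exceptions if your SDK version exposes them.
            val retryable = e.message?.contains("429") == true ||
                e.message?.contains("503") == true
            if (!retryable) throw e // Rethrow other errors immediately
            delay(currentDelay)
            currentDelay *= 2 // Exponential backoff
        }
    }
    return block() // Final attempt
}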
 
// Usage:
// val result = retryWithBackoff {
//     model.generateContent("Your prompt here")
// }
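
To make the backoff schedule concrete, the following sketch computes the waits that precede each retry. It is a standalone illustration rather than part of the retry function above. The name `backoffDelays`, the `factor` parameter, and the 30-second cap are assumptions added here.

```kotlin
// Delay in milliseconds before each retry:
// initialDelay * factor^n, capped at maxDelay.
fun backoffDelays(
    retries: Int,
    initialDelay: Long = 1000L,
    factor: Double = 2.0,
    maxDelay: Long = 30_000L
): List<Long> =
    generateSequence(initialDelay.toDouble()) { it * factor }
        .map { it.toLong().coerceAtMost(maxDelay) }
        .take(retries)
        .toList()

// backoffDelays(3) → [1000, 2000, 4000]
```

With the defaults of `retryWithBackoff`, three attempts mean two waits, which corresponds to `backoffDelays(2)`. Capping the delay keeps long retry chains from stalling the UI indefinitely, and adding a small random jitter to each value is a common refinement when many clients retry at once.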

Configuring Safety Settings

When passing user input directly to Gemini, explicitly set Safety Settings to prevent inappropriate content generation.

import com.google.firebase.Firebase
import com.google.firebase.ai.ai
import com.google.firebase.ai.type.GenerativeBackend
import com.google.firebase.ai.type.HarmBlockThreshold
import com.google.firebase.ai.type.HarmCategory
import com.google.firebase.ai.type.SafetySetting

val safeModel = Firebase
    .ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-3-flash",
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.MEDIUM_AND_ABOVE)
        )
    )

For a comprehensive guide on API key management and prompt injection defense, see [Gemini API Production Security Complete Guide](/articles/gemini-api/gemini-api-production-security).

Looking back

In this guide, we covered how to integrate the Gemini API into a Kotlin Android app using the Firebase AI Logic SDK. Starting from basic text generation, we progressed through streaming responses, multimodal image analysis with the device camera, Function Calling for extending app capabilities, and production-grade error handling.

The combination of Gemini and Android represents the most natural AI integration pattern within Google's ecosystem. Start by building a prototype with the code in this guide, then customize it to fit your specific app's needs.