Running Gemma 4 Locally in Android Studio via Ollama — Setup, Performance, and Real-World Development Experience

The concern about sending proprietary code to cloud AI APIs is legitimate — especially in enterprise or regulated environments. Android Studio's local LLM support, paired with Ollama and Gemma 4, offers a practical alternative: AI-powered coding assistance that never leaves your machine.

This article documents the setup process on macOS and what the day-to-day development experience looks like once everything is running.

Prerequisites

macOS (Apple Silicon strongly recommended, since M-series chips provide significant GPU acceleration)
Android Studio Narwhal or later
Ollama (download at ollama.com)
Available RAM: at least 32GB recommended for Gemma 4 26B

If you're on an Intel Mac or a Windows machine, the Ollama and Android Studio steps work the same way; only the launch agent section below is macOS-specific. For lower-spec systems, the 4B or 9B variants are more practical.

Step 1: Get Gemma 4 Running in Ollama

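# Start the Ollama server (skip this if the Ollama desktop app is already running; it starts the server for you)
ollama serve

# In a separate terminal, download Gemma 4 26B (~17GB); pull needs the server to be running
ollama pull gemma4:26b

# Verify it's working
ollama run gemma4:26b "Explain ViewModels in Android development"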

Ollama serves the model API at http://localhost:11434 by default. Android Studio connects to this endpoint.
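Because the IDE only needs that HTTP endpoint, you can also exercise it from code, which helps when something misbehaves inside Android Studio. The following is a minimal Kotlin sketch, assuming the default port and the gemma4:26b tag pulled above, that sends the same prompt as the ollama run check to Ollama's POST /api/generate endpoint using the JDK's built-in java.net.http client (Java 11+). Setting "stream" to false asks for a single JSON object instead of a stream of chunks; the generated text is in that object's "response" field. The hand-written JSON escaping is a shortcut for illustration; a real project would use a JSON library.

```kotlin
import java.io.IOException
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Escapes a string as a JSON string literal (quotes, backslashes, control characters).
fun jsonString(s: String): String = buildString {
    append('"')
    for (c in s) {
        when {
            c == '"' -> append("\\\"")
            c == '\\' -> append("\\\\")
            c == '\n' -> append("\\n")
            c < ' ' -> append("\\u%04x".format(c.code))
            else -> append(c)
        }
    }
    append('"')
}

// Request body for one non-streaming completion from Ollama's /api/generate.
fun generateBody(model: String, prompt: String): String =
    """{"model":${jsonString(model)},"prompt":${jsonString(prompt)},"stream":false}"""

fun main() {
    val request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/generate"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(
            generateBody("gemma4:26b", "Explain ViewModels in Android development")))
        .build()
    try {
        val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
        println("HTTP ${response.statusCode()}")
        // The generated text is in the "response" field of this JSON object.
        println(response.body())
    } catch (e: IOException) {
        println("Could not reach Ollama on localhost:11434: ${e.message}")
    }
}
```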

For lighter hardware, the smaller variants are worth trying:

ollama pull gemma4:4b   # ~3GB, fastest
ollama pull gemma4:9b   # ~6GB, better quality
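If you're unsure which tag fits your machine, the RAM guidance can be captured in a small helper. This Kotlin sketch reads total physical memory through com.sun.management.OperatingSystemMXBean, which is specific to HotSpot/OpenJDK builds (its totalMemorySize getter needs Java 14+), and maps it to a tag. The 32GB cutoff for gemma4:26b follows this article's prerequisites; the 16GB cutoff between gemma4:9b and gemma4:4b is a hypothetical threshold, not an official recommendation, so adjust it to what you observe.

```kotlin
import java.lang.management.ManagementFactory

// Maps installed RAM (in GB) to one of the Gemma 4 tags used in this article.
// 32GB for 26b follows the prerequisites; the 16GB cutoff for 9b is an assumption.
fun pickGemmaTag(totalRamGb: Long): String = when {
    totalRamGb >= 32 -> "gemma4:26b"
    totalRamGb >= 16 -> "gemma4:9b"
    else -> "gemma4:4b"
}

fun main() {
    // HotSpot/OpenJDK-specific MXBean; totalMemorySize is available from Java 14.
    val os = ManagementFactory.getOperatingSystemMXBean() as com.sun.management.OperatingSystemMXBean
    val ramGb = os.totalMemorySize / (1024L * 1024 * 1024)
    println("Detected ${ramGb}GB RAM -> ollama pull ${pickGemmaTag(ramGb)}")
}
```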

Step 2: Configure Android Studio

Open Preferences > Tools > Gemini Code Assist > AI Provider. Select Local LLM and enter:

Endpoint URL: http://localhost:11434
Model name: gemma4:26b (or whichever variant you downloaded)

Use the connection test if available. A successful test means Ollama is reachable and the model responds. After saving, the AI assistant in your code editor will use your local Gemma 4 instead of the cloud Gemini API.
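If the connection test fails, or your build doesn't offer one, you can check the same two things from outside the IDE: that the server answers, and that the model tag you entered is actually installed. The Kotlin sketch below calls Ollama's GET /api/tags endpoint, which lists locally pulled models, and looks for the tag as a quoted JSON string. That substring match is a deliberately crude stand-in for real JSON parsing; the endpoint and the gemma4:26b tag are the values from this step.

```kotlin
import java.io.IOException
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// True if the /api/tags JSON mentions the model tag as a quoted string.
// The surrounding quotes avoid prefix matches; this is not a JSON parser.
fun modelListed(tagsJson: String, model: String): Boolean = tagsJson.contains("\"$model\"")

fun main() {
    val endpoint = "http://localhost:11434"
    val model = "gemma4:26b"
    val request = HttpRequest.newBuilder(URI.create("$endpoint/api/tags")).GET().build()
    try {
        val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
        when {
            response.statusCode() != 200 -> println("Server answered with HTTP ${response.statusCode()}")
            modelListed(response.body(), model) -> println("OK: $model is installed")
            else -> println("Server is up, but $model is missing; run: ollama pull $model")
        }
    } catch (e: IOException) {
        println("No server at $endpoint (${e.message}); is ollama serve running?")
    }
}
```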

What Works Well

With the 26B model on Apple Silicon, code completion quality is genuinely usable. Kotlin idioms, standard Android architecture patterns, and common library usage are handled well:

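// Repository pattern with error handling: the model picks up the context.
// UserDao, ApiService and User are project types defined elsewhere.
import javax.inject.Inject
import kotlinx.coroutines.CancellationException

class UserRepository @Inject constructor(
    private val userDao: UserDao,
    private val apiService: ApiService
) {
    // Local-first lookup: serve the cached user, otherwise fetch it and cache the result.
    suspend fun getUser(id: String): Result<User> {
        return try {
            val localUser = userDao.findById(id)
            if (localUser != null) {
                Result.success(localUser)
            } else {
                val remoteUser = apiService.fetchUser(id)
                userDao.insert(remoteUser)
                Result.success(remoteUser)
            }
        } catch (e: CancellationException) {
            throw e // let coroutine cancellation propagate instead of wrapping it in Result
        } catch (e: Exception) {
            Result.failure(e)
        }
    }
}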

The model recognizes dependency injection patterns, coroutine conventions, and the Repository pattern without needing explicit explanation.

Where It Falls Short

Response latency is noticeably higher than with cloud APIs. On an M3 Max (96GB RAM) running Gemma 4 26B, completions arrive in 2–4 seconds. That's workable, but you feel it, especially coming from near-instant cloud completions.
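The 2–4 second figure is specific to that machine, so it's worth measuring on yours. A non-streaming /api/generate response includes timing fields in nanoseconds, among them total_duration, eval_count (tokens generated) and eval_duration, so generation throughput is eval_count / eval_duration × 10^9 tokens per second. The Kotlin sketch below extracts those fields with a regular expression (a shortcut in place of a JSON library) and reports both numbers; the model tag and prompt are placeholder choices.

```kotlin
import java.io.IOException
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Pulls a numeric field such as "eval_count":123 out of an Ollama JSON response.
// The leading quote keeps "eval_count" from matching "prompt_eval_count".
fun numericField(json: String, name: String): Long? =
    Regex("\"${Regex.escape(name)}\"\\s*:\\s*(\\d+)").find(json)?.groupValues?.get(1)?.toLong()

data class Timing(val totalSeconds: Double, val tokensPerSecond: Double)

// Ollama reports durations in nanoseconds.
fun timingOf(json: String): Timing? {
    val total = numericField(json, "total_duration") ?: return null
    val count = numericField(json, "eval_count") ?: return null
    val evalNs = numericField(json, "eval_duration") ?: return null
    if (evalNs == 0L) return null
    return Timing(total / 1e9, count * 1e9 / evalNs)
}

fun main() {
    val body = """{"model":"gemma4:26b","prompt":"Write a Kotlin data class for a User","stream":false}"""
    val request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/generate"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
    try {
        val json = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).body()
        val t = timingOf(json)
        if (t == null) println("No timing fields in response")
        else println("total %.2f s, %.1f tokens/s".format(t.totalSeconds, t.tokensPerSecond))
    } catch (e: IOException) {
        println("Could not reach Ollama: ${e.message}")
    }
}
```

Run it several times and discard the first result: total_duration typically includes loading the model into memory, so a cold start looks slower than steady-state use.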

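Jetpack Compose suggestions are less accurate than those from the cloud Gemini model. Compose evolves quickly, and a local model's knowledge stays fixed at its training cutoff until you pull a newer release, so recently added or changed APIs are more likely to be missed. For Compose-heavy projects, the tradeoff may not be worth it.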

The 4B and 9B models are faster but struggle with complex multi-file context. They're fine for boilerplate generation but miss nuances in larger codebases.

Auto-Starting Ollama at Login

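Running ollama serve manually every session gets old. Register it as a macOS launch agent instead. The plist below assumes the binary lives at /usr/local/bin/ollama; run which ollama and adjust the path if yours differs (Homebrew on Apple Silicon, for example, installs to /opt/homebrew/bin/ollama):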

mkdir -p ~/Library/LaunchAgents
cat > ~/Library/LaunchAgents/com.ollama.serve.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.serve</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
EOF
 
launchctl load ~/Library/LaunchAgents/com.ollama.serve.plist

Android Studio will have local AI available from the moment you open it.
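launchd starts the agent at login, but the server may need a moment before it accepts connections, so a script or tool that runs early can hit the port too soon. One simple guard is to poll the server's root URL and retry a few times before giving up. The Kotlin sketch below assumes that a plain GET on http://localhost:11434/ returning HTTP 200 means the server is ready; the probe is passed in as a lambda so the retry logic can be exercised without a live server, and the attempt count and delay are arbitrary example values.

```kotlin
import java.io.IOException
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.time.Duration

// Calls probe() up to `attempts` times, sleeping delayMs between failed attempts.
// Returns true as soon as one probe succeeds.
fun waitUntil(attempts: Int, delayMs: Long, probe: () -> Boolean): Boolean {
    repeat(attempts) { i ->
        if (probe()) return true
        if (i < attempts - 1) Thread.sleep(delayMs)
    }
    return false
}

// Probe: does the Ollama server answer GET on its root URL with HTTP 200?
fun ollamaUp(client: HttpClient, baseUrl: String): Boolean = try {
    val request = HttpRequest.newBuilder(URI.create(baseUrl))
        .timeout(Duration.ofSeconds(2))
        .GET()
        .build()
    client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode() == 200
} catch (e: IOException) {
    false // connection refused or timed out: not ready yet
}

fun main() {
    val client = HttpClient.newHttpClient()
    val ready = waitUntil(attempts = 10, delayMs = 1_000) { ollamaUp(client, "http://localhost:11434/") }
    println(if (ready) "Ollama is up" else "Ollama did not respond; check launchctl list | grep ollama")
}
```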

When Local Makes Sense

The case for local LLM in Android development is strongest when:

Your codebase contains unreleased features, trade secrets, or regulated data
Your organization prohibits code from leaving internal infrastructure
You want reproducible AI behavior: a local model changes only when you pull a new version, not when a cloud provider updates it
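Fixed local weights are only part of reproducibility: Ollama samples tokens randomly by default, so identical prompts can still yield different completions. Its API accepts an "options" object in which temperature and seed can be pinned; with temperature 0 and a fixed seed, repeated requests against the same model version on the same setup should return the same output, though this is not a formal bit-for-bit guarantee across Ollama versions or hardware. The Kotlin sketch below only builds such a request body for /api/generate; the seed value 42 is arbitrary.

```kotlin
// Escapes quotes, backslashes and control characters for a JSON string literal.
fun jsonString(s: String): String = buildString {
    append('"')
    for (c in s) when {
        c == '"' -> append("\\\"")
        c == '\\' -> append("\\\\")
        c < ' ' -> append("\\u%04x".format(c.code))
        else -> append(c)
    }
    append('"')
}

// Request body for /api/generate with sampling pinned for repeatable output.
// seed = 42 is an arbitrary example value.
fun reproducibleBody(model: String, prompt: String, seed: Int = 42): String =
    """{"model":${jsonString(model)},"prompt":${jsonString(prompt)},"stream":false,"options":{"temperature":0,"seed":$seed}}"""

fun main() {
    println(reproducibleBody("gemma4:26b", "Generate a Room entity for a User"))
}
```

For requests you don't build yourself, such as the IDE's, Ollama's Modelfile format accepts PARAMETER temperature and PARAMETER seed lines, and ollama create can save the result under a new tag to point Android Studio at; those defaults hold only if the client doesn't override them in its own requests.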

The tradeoff is real: lower raw performance, slower updates, and higher hardware requirements. But for teams where code privacy is non-negotiable, this setup makes AI assistance accessible where it previously wasn't an option.
