Dev Tools / 2026-05-06 / Intermediate

Running Gemma 4 Locally in Android Studio via Ollama — Setup, Performance, and Real-World Development Experience

A hands-on guide to connecting Android Studio's local LLM feature with Gemma 4 via Ollama. Covers macOS setup, model selection, practical coding experience, and when local AI makes more sense than cloud APIs.


The concern about sending proprietary code to cloud AI APIs is legitimate — especially in enterprise or regulated environments. Android Studio's local LLM support, paired with Ollama and Gemma 4, offers a practical alternative: AI-powered coding assistance that never leaves your machine.

This article documents the setup process on macOS and what the actual development experience looks like once everything is running.

Prerequisites

  • macOS (Apple Silicon strongly recommended: M-series chips provide significant GPU acceleration)
  • Android Studio Narwhal or later
  • Ollama (download at ollama.com)
  • Available RAM: at least 32GB recommended for Gemma 4 26B

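If you're on an Intel Mac or a Windows machine, the Ollama and Android Studio configuration steps are the same; only the launch agent section later in this article is macOS-specific. For lower-spec systems, the 4B or 9B variants are more practical.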

Step 1: Get Gemma 4 Running in Ollama

# Download Gemma 4 26B (~17GB)
ollama pull gemma4:26b

# Start the Ollama server
# (skip this if the Ollama desktop app is already running; it starts the server itself)
ollama serve

# In a separate terminal, verify it's working
ollama run gemma4:26b "Explain ViewModels in Android development"

Ollama serves the model API at http://localhost:11434 by default. Android Studio connects to this endpoint.
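
Before switching to Android Studio, it can help to confirm that the endpoint answers and that the model you pulled is actually listed. The following is a minimal sketch, assuming curl is installed. GET /api/tags is Ollama's endpoint for listing local models; OLLAMA_URL, has_model, and check_ollama are hypothetical names chosen for this example, not part of Ollama.

```shell
# Base URL of the local Ollama server (OLLAMA_URL is a name chosen for this sketch).
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# has_model TAG: reads the JSON returned by GET /api/tags on stdin and
# succeeds if a model with exactly that name is listed.
has_model() {
    # Strip whitespace so the match works for compact and pretty-printed JSON.
    tr -d '[:space:]' | grep -Fq "\"name\":\"$1\""
}

# check_ollama TAG: fetches the model list and reports whether TAG is present.
check_ollama() {
    if curl -fsS "$OLLAMA_URL/api/tags" | has_model "$1"; then
        echo "ok: $1 is available at $OLLAMA_URL"
    else
        echo "missing: $1 not found (server down or model not pulled)" >&2
        return 1
    fi
}
```

With the server running, check_ollama gemma4:26b should print the "ok" line. A failure means either the server is not reachable or the tag has not been pulled yet.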

For lighter hardware, the smaller variants are worth trying:

ollama pull gemma4:4b   # ~3GB, fastest
ollama pull gemma4:9b   # ~6GB, better quality
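
If you're unsure which variant fits your machine, a rough rule of thumb can be scripted. This sketch is only a starting point: the 32GB cutoff comes from the prerequisites in this article, while the 16GB cutoff is an assumption made for illustration and has not been measured. suggest_variant and total_ram_gb are hypothetical helper names, and total_ram_gb works on macOS only.

```shell
# suggest_variant RAM_GB: prints a Gemma 4 tag for the given amount of RAM.
# 32 GB follows the article's prerequisites; 16 GB is an unverified assumption.
suggest_variant() {
    if [ "$1" -ge 32 ]; then
        echo "gemma4:26b"
    elif [ "$1" -ge 16 ]; then
        echo "gemma4:9b"
    else
        echo "gemma4:4b"
    fi
}

# total_ram_gb: installed RAM in whole GB (macOS only; hw.memsize is in bytes).
total_ram_gb() {
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
}
```

On a Mac, ollama pull "$(suggest_variant "$(total_ram_gb)")" would then pull the suggested tag. Treat the result as a first guess and adjust based on how the model actually performs.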

Step 2: Configure Android Studio

Open Preferences > Tools > Gemini Code Assist > AI Provider. Select Local LLM and enter:

  • Endpoint URL: http://localhost:11434
  • Model name: gemma4:26b (or whichever variant you downloaded)

Use the connection test if available. A successful test means Ollama is reachable and the model responds. After saving, the AI assistant in your code editor will use your local Gemma 4 instead of the cloud Gemini API.
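
If the connection test isn't available in your version, a manual request against the same endpoint and model name gives a comparable signal. This is a sketch under assumptions: it uses Ollama's POST /api/generate endpoint with streaming disabled, which is not necessarily the request Android Studio itself sends. OLLAMA_URL, generate_payload, and smoke_test are hypothetical names.

```shell
# Base URL of the local Ollama server (name chosen for this sketch).
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# generate_payload MODEL PROMPT: builds the JSON body for POST /api/generate.
# "stream": false asks for a single JSON object instead of a token stream.
# The prompt is inserted as-is, so keep it free of double quotes and backslashes.
generate_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# smoke_test MODEL: sends a one-line prompt and prints the raw JSON reply.
smoke_test() {
    curl -fsS "$OLLAMA_URL/api/generate" \
        -H 'Content-Type: application/json' \
        -d "$(generate_payload "$1" "Reply with the single word: ready")"
}
```

Running smoke_test gemma4:26b should return a JSON object with a "response" field. An error here points at Ollama or the model name rather than at the IDE configuration.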

What Works Well

With the 26B model on Apple Silicon, code completion quality is genuinely usable. Kotlin idioms, standard Android architecture patterns, and common library usage are handled well:

// Repository pattern with error handling — model understands the context
class UserRepository @Inject constructor(
    private val userDao: UserDao,
    private val apiService: ApiService
) {
    suspend fun getUser(id: String): Result<User> {
        return try {
            val localUser = userDao.findById(id)
            if (localUser != null) {
                Result.success(localUser)
            } else {
                val remoteUser = apiService.fetchUser(id)
                userDao.insert(remoteUser)
                Result.success(remoteUser)
            }
        } catch (e: Exception) {
            Result.failure(e)
        }
    }
}

The model recognizes dependency injection patterns, coroutine conventions, and the Repository pattern without needing explicit explanation.

Where It Falls Short

Response latency is noticeably higher than with cloud APIs. On an M3 Max (96GB RAM) with Gemma 4 26B, completions arrive in 2–4 seconds. That's workable, but you feel it, especially if you're used to near-instant cloud completions.
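
To get a number for your own hardware, you can read the timing fields Ollama includes in a non-streaming /api/generate reply: eval_count (generated tokens) and eval_duration (nanoseconds). The sketch below measures raw generation throughput, which is related to but not the same as in-editor completion latency. json_number, tokens_per_sec, and measure are hypothetical helper names, and the prompt is an arbitrary example.

```shell
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# json_number FIELD: reads JSON on stdin and prints the integer value of FIELD.
# A simple text match, adequate for Ollama's flat timing fields.
json_number() {
    tr -d '[:space:]' | sed -n "s/.*\"$1\":\([0-9][0-9]*\).*/\1/p"
}

# tokens_per_sec EVAL_COUNT EVAL_DURATION_NS: generation speed in tokens/second.
tokens_per_sec() {
    awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

# measure MODEL: runs one fixed prompt and prints the generation speed.
measure() {
    reply=$(curl -fsS "$OLLAMA_URL/api/generate" \
        -H 'Content-Type: application/json' \
        -d "{\"model\":\"$1\",\"prompt\":\"Write a Kotlin data class for a user.\",\"stream\":false}") || return 1
    count=$(printf '%s' "$reply" | json_number eval_count)
    dur=$(printf '%s' "$reply" | json_number eval_duration)
    echo "$1: $(tokens_per_sec "$count" "$dur") tokens/s"
}
```

Running measure for each variant you have pulled gives a like-for-like comparison on the same machine. The first call after a model loads also includes load time in total_duration, which is why the sketch reads eval_duration instead.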

Jetpack Compose suggestions are less accurate than those from the cloud Gemini model. Compose evolves quickly, and a downloaded model's knowledge stays frozen at its training cutoff until you pull a newer version. For Compose-heavy projects, the tradeoff may not be worth it.

The 4B and 9B models are faster but struggle with complex multi-file context. They're fine for boilerplate generation but miss nuances in larger codebases.

Auto-Starting Ollama at Login

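Running ollama serve manually every session gets old. Register it as a macOS launch agent. The plist assumes the binary lives at /usr/local/bin/ollama; run which ollama to confirm, since Homebrew on Apple Silicon installs to /opt/homebrew/bin instead: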

mkdir -p ~/Library/LaunchAgents
cat > ~/Library/LaunchAgents/com.ollama.serve.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.serve</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
EOF
 
launchctl load ~/Library/LaunchAgents/com.ollama.serve.plist

Android Studio will have local AI available from the moment you open it.
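
To confirm the agent was actually registered, you can look for its label in the output of launchctl list, which prints three columns: PID, Status, and Label. The helper below is a small sketch; agent_loaded is a hypothetical name, and it only checks that the label is present, not that the server is healthy.

```shell
# agent_loaded LABEL: reads `launchctl list` output on stdin
# (columns: PID, Status, Label) and succeeds if LABEL is registered.
agent_loaded() {
    awk -v label="$1" '$3 == label { found = 1 } END { exit !found }'
}
```

Usage: launchctl list | agent_loaded com.ollama.serve && echo "agent registered". If the label is missing, plutil -lint ~/Library/LaunchAgents/com.ollama.serve.plist will report whether the plist itself is malformed.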

When Local Makes Sense

The case for local LLM in Android development is strongest when:

  • Your codebase contains unreleased features, trade secrets, or regulated data
  • Your organization prohibits code from leaving internal infrastructure
  • You want AI behavior pinned to a model version you control, unaffected by cloud model updates

The tradeoff is real: lower raw performance, slower updates, and higher hardware requirements. But for teams where code privacy is non-negotiable, this setup makes AI assistance accessible where it previously wasn't an option.

The premium article covers the deeper configuration: switching between models by task, fine-tuning Gemma 4 for project-specific patterns, and integrating local LLM into your CI environment.
