GEMINI LAB
Dev Tools / 2026-06-24 / Beginner

Running Gemma 4 Locally on Windows — A Hands-On LLM in Two Commands with Ollama

How to run Google's lightweight open model Gemma 4 locally on a Windows laptop. With Ollama, you go from install to a running model in effectively two commands. The article also covers how to split work between the cloud Gemini API and a local Gemma.

Tags: Gemma, Gemma 4, Ollama, local LLM, Windows, Gemini

Cloud APIs are convenient, but when you only want to try something quickly, the overhead of issuing a key and watching your billing quota adds up. Having one small model that runs entirely on your own PC removes that friction.

As an indie developer running apps and blogs, I increasingly reach for an LLM to do research or draft text. For light tasks that do not quite warrant a cloud call, a model that runs locally turns out to be more useful than I expected. Here is the shortest path to running Google's lightweight open model Gemma 4 on a Windows laptop.

Why it helps to keep one locally running LLM

This does not make the cloud Gemini API unnecessary. It is a question of dividing the work. A locally running model has a few advantages the cloud does not.

  • No key or quota to mind — try it the moment the idea strikes
  • Input never leaves your machine — hand it drafts and notes you would rather not send out
  • Works offline — it does not stall in places with shaky connectivity

Conversely, tasks that need the reasoning power of the latest large models, or a huge context, suit the cloud better. Keep a two-tier setup, local for light tasks and cloud for heavy ones, and you get the best of both. Since I became deliberate about this switch, my wasted API calls have dropped.
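As a rough illustration of that split, the decision can be written down as a small routing function. This is only a sketch under my own assumptions: the Task fields, the pick_backend name, and the 8,000-character cutoff are hypothetical and not part of Ollama or the Gemini API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a request; the fields are illustrative."""
    prompt: str
    private: bool = False               # drafts or notes that should stay on this machine
    needs_deep_reasoning: bool = False  # work that wants a large cloud model
    online: bool = True                 # whether a network connection is available


# Assumed cutoff for "huge context"; tune it for your own model and hardware.
MAX_LOCAL_PROMPT_CHARS = 8_000


def pick_backend(task: Task) -> str:
    """Return "local" or "cloud" for a task, following the two-tier split."""
    # Private input and offline work can only go to the local model.
    if task.private or not task.online:
        return "local"
    # Heavy reasoning or a very long prompt goes to the cloud.
    if task.needs_deep_reasoning or len(task.prompt) > MAX_LOCAL_PROMPT_CHARS:
        return "cloud"
    # Everything else counts as a light task and stays local.
    return "local"
```

The privacy and offline checks come first on purpose: a note you would rather not send out should stay local even when it is long.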

A rough guide to the hardware

Gemma 4 comes in a lightweight small variant that runs even on an ordinary laptop. For reference, on the setup I saw, a Windows 11 laptop with an Intel Core i7-class CPU and 32GB of memory, the small variant held a conversation without trouble.

More memory gives more headroom: the small variant might run on 16GB, but 32GB is more comfortable. It runs on CPU without a GPU, though response speed depends on the hardware. I recommend starting with the small variant on the PC you have, and moving up a size if it feels lacking.

Two commands with Ollama get you running

The star of setup is Ollama. It handles downloading and launching the model, so you can start a conversation without fiddly configuration. Open PowerShell and run these three lines in order.

irm https://ollama.com/install.ps1 | iex
ollama --version
ollama run gemma4:e2b

Line one installs Ollama itself, line two confirms the install succeeded, and line three downloads and launches the small Gemma 4 variant. What you actually do comes down to two moves — "install" and "run" — and typing that final ollama run fetches the model and drops you straight into a conversation.

Check the exchange after it launches

Once it finishes launching, it waits for your input. Send a greeting and it replies, and it converses in Japanese just fine too. To quit, type /bye to leave conversation mode.

You can also ask a single one-off question without entering conversation mode.

ollama run gemma4:e2b "What is the second element in the periodic table?"

Passing the prompt as an argument like this hands you just the answer and returns you to the command line right away. For short checks, or calls from a script, this is the handy form.
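For calls from a script, one way to wrap the one-off form is a thin Python helper around the same command line. This is a minimal sketch, not an official client: build_command, ask, and the 300-second timeout are names and values I chose for illustration, and the captured text may need extra cleanup depending on your Ollama version.

```python
import shutil
import subprocess

MODEL = "gemma4:e2b"  # the model tag used in this article


def build_command(prompt: str, model: str = MODEL) -> list[str]:
    """Build the argument list for a one-off `ollama run MODEL "PROMPT"` call."""
    # Passing a list avoids shell quoting problems with quotes inside the prompt.
    return ["ollama", "run", model, prompt]


def ask(prompt: str, model: str = MODEL, timeout: float = 300.0) -> str:
    """Run a single prompt through the Ollama CLI and return the answer text."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama was not found on PATH; install it first")
    result = subprocess.run(
        build_command(prompt, model),
        capture_output=True,
        text=True,
        encoding="utf-8",
        timeout=timeout,  # assumed limit; the first run also downloads the model
        check=True,       # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

With Ollama installed, `print(ask("What is the second element in the periodic table?"))` would send the same question as the command above and print whatever the model returns.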

Check which models are installed

You can list the models you have downloaded with this command.

ollama list

It lines up the name, ID, size, and modified time, which helps keep things tidy when you have several models. The small variant is only a few GB, so the storage hit is limited. When space gets tight, delete models you no longer use (for example with ollama rm followed by the model name).
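If you want that listing inside a script, the table can be split into rows. The sketch below assumes the columns are separated by tabs or by runs of two or more spaces, which I have not verified against every Ollama version; parse_ollama_list and installed_models are hypothetical helper names.

```python
import re
import subprocess

# Assumption: columns are separated by a tab or by two or more whitespace characters.
COLUMN_GAP = re.compile(r"\s{2,}|\t")


def parse_ollama_list(output: str) -> list[dict[str, str]]:
    """Turn the text printed by `ollama list` into one dict per model."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    # The first line is the header row, e.g. NAME, ID, SIZE, MODIFIED.
    header = [name.lower() for name in COLUMN_GAP.split(lines[0])]
    # Pair each remaining row's cells with the header names.
    return [dict(zip(header, COLUMN_GAP.split(line))) for line in lines[1:]]


def installed_models() -> list[dict[str, str]]:
    """Run `ollama list` and return the parsed rows (requires Ollama on PATH)."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return parse_ollama_list(result.stdout)
```

Each row comes back as a dict keyed by the lowercased column names, so `for m in installed_models(): print(m["name"], m["size"])` gives a quick view of what is taking up space.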

How to divide the work

A local Gemma 4 suits light tasks — quick drafts, short checks, and consulting on notes you would rather not send out. For involved design discussions or long-form generation grounded in the latest information, the cloud Gemini API gives steadier results.

In my case, I settled into running a rough first pass locally and re-routing to the cloud when the result falls short. As an indie developer, this two-tier flow helps me cut API cost while giving me more options in my workflow. In the follow-up, I dig into how to fold a local Gemma 4 into real work: tricks to improve perceived response speed, and wiring it into a simple pipeline. For now, type three lines on the PC you have and feel a model come up in your own hands.
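The "local first, cloud when it falls short" flow can be sketched as a small function that takes the two backends as plain callables. Everything here is an assumption for illustration: two_tier, looks_too_thin, and the 40-character threshold are hypothetical, and a real check for "falls short" would depend on your task.

```python
from typing import Callable


def looks_too_thin(answer: str, min_chars: int = 40) -> bool:
    """Hypothetical quality check: treat empty or very short answers as falling short."""
    return len(answer.strip()) < min_chars


def two_tier(
    prompt: str,
    local: Callable[[str], str],
    cloud: Callable[[str], str],
    falls_short: Callable[[str], bool] = looks_too_thin,
) -> tuple[str, str]:
    """Try the local model first; re-route to the cloud if it fails or falls short.

    Returns (backend_used, answer).
    """
    try:
        answer = local(prompt)
    except Exception:
        # Broad catch for the sketch: any local failure falls back to the cloud.
        return "cloud", cloud(prompt)
    if falls_short(answer):
        return "cloud", cloud(prompt)
    return "local", answer
```

Here local and cloud stand for any functions that send a prompt to the local Gemma 4 and to the cloud Gemini API respectively. Keeping them as parameters means the routing logic can be tried with stand-in functions before any real API call is spent.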
