Running Gemma 4 Locally on Windows — A Hands-On LLM in Two Commands with Ollama

Cloud APIs are convenient, but when you only want to try something quickly, the preliminaries of issuing a key and watching your billing quota can feel surprisingly burdensome. Keeping one small model that runs entirely on your own PC removes that mental barrier.

As an indie developer running apps and blogs, I increasingly reach for an LLM to do research or draft text. For light tasks that do not quite warrant a cloud call, a model that runs locally turns out to be more useful than I expected. Here is the shortest path to running Google's lightweight open model Gemma 4 on a Windows laptop.

Why it helps to keep one locally running LLM

This does not make the cloud Gemini API unnecessary. It is a question of dividing the work. A locally running model has a few advantages the cloud does not.

No key or quota to mind — try it the moment the idea strikes
Input never leaves your machine — hand it drafts and notes you would rather not send out
Works offline — it does not stall in places with shaky connectivity

Conversely, work that needs the reasoning power of the latest large models, or a huge context, suits the cloud better. Keep a two-tier setup, local for light tasks and cloud for heavy ones, and you get the best of both. Since I became deliberate about this split, my wasted API calls have dropped.

A rough guide to the hardware

Gemma 4 comes in a lightweight small variant that runs even on an ordinary laptop. For reference, in the setup I saw, the small variant held a conversation without trouble on a Windows 11 laptop with an Intel Core i7-class CPU and 32GB of memory.

More memory gives more headroom. The small variant may run on 16GB, but 32GB is the more comfortable choice. It runs on CPU without a GPU, though response speed depends on the hardware. I recommend starting with the small variant on the PC you have, and moving up a size if the answers fall short.

Two commands with Ollama get you running

The star of the setup is Ollama. It handles downloading and launching the model, so you can start a conversation without fiddly configuration. Open PowerShell and run these three lines in order.

irm https://ollama.com/install.ps1 | iex
ollama --version
ollama run gemma4:e2b

Line one installs Ollama itself, line two confirms the install succeeded, and line three downloads and launches the small Gemma 4 variant. What you actually do comes down to two moves — "install" and "run" — and typing that final ollama run fetches the model and drops you straight into a conversation.

Check the exchange after it launches

Once it finishes launching, it waits for your input. Send a greeting and it replies, and it converses in Japanese just fine too. To quit, type /bye to leave conversation mode.

You can also ask a single one-off question without entering conversation mode.

ollama run gemma4:e2b "What is the second element in the periodic table?"

Passing the prompt as an argument like this prints just the answer and returns you to the command line right away. For short checks, or calls from a script, this is the handy form.
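For calls from a script, the one-off form above can be wrapped in a few lines of Python. This is only a minimal sketch: it assumes `ollama` is on your PATH and that the model has already been downloaded, and the helper names `build_command` and `ask` are my own, not part of Ollama.

```python
import subprocess

MODEL = "gemma4:e2b"  # model tag used in this article


def build_command(prompt, model=MODEL):
    """Return the argument list for a one-off `ollama run` call."""
    return ["ollama", "run", model, prompt]


def ask(prompt, model=MODEL, timeout=300):
    """Send one prompt through the Ollama CLI and return the reply text.

    Assumption: `ollama` is on PATH and the model is already pulled.
    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        build_command(prompt, model),
        capture_output=True,   # collect stdout instead of printing it
        text=True,
        encoding="utf-8",      # avoid the Windows default code page
        timeout=timeout,       # seconds; CPU-only runs can be slow
        check=True,
    )
    return result.stdout.strip()
```

Calling `ask("What is the second element in the periodic table?")` should then return the model's answer as a string. Passing the command as a list rather than a single string sidesteps shell quoting problems when the prompt contains quotes or spaces.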

Check which models are installed

You can list the models you have downloaded with this command.

ollama list

It lists the name, ID, size, and modified time of each model, which helps keep things tidy when you have several. The small variant is only a few GB, so the storage hit is limited. When space gets tight, delete the models you no longer use with ollama rm followed by the model name.
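If you want that listing inside a script, the table can be parsed into dictionaries. The sketch below assumes the output is a header row followed by one row per model, with columns separated by a tab or by two or more spaces; the exact layout may differ between Ollama versions, and the function names are hypothetical.

```python
import re
import subprocess


def parse_ollama_list(text):
    """Parse `ollama list` output into a list of dicts keyed by column name.

    Assumption: columns are separated by a tab or by 2+ spaces, so values
    that contain single spaces (sizes, relative times) stay in one piece.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []

    def split_row(line):
        return [cell.strip() for cell in re.split(r"\s{2,}|\t", line)]

    headers = [h.lower() for h in split_row(lines[0])]
    return [dict(zip(headers, split_row(ln))) for ln in lines[1:]]


def installed_models():
    """Run `ollama list` and return the parsed rows (requires Ollama on PATH)."""
    result = subprocess.run(
        ["ollama", "list"],
        capture_output=True, text=True, encoding="utf-8", check=True,
    )
    return parse_ollama_list(result.stdout)
```

With this, something like `[m["name"] for m in installed_models()]` gives the installed model names, which is enough to check whether a model is present before calling it.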

How to divide the work

A local Gemma 4 suits light tasks — quick drafts, short checks, and consulting on notes you would rather not send out. For involved design discussions or long-form generation grounded in the latest information, the cloud Gemini API gives steadier results.
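The division described above can be written down as a small decision function. This is a rough sketch of my own rule of thumb, not a feature of Ollama or the Gemini API: the `Task` fields and the `choose_tier` name are hypothetical, and you would set the flags yourself per task.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    private: bool = False            # notes you would rather not send out
    needs_latest_info: bool = False  # answer depends on recent information
    long_form: bool = False          # involved design talk or long output


def choose_tier(task: Task) -> str:
    """Return "local" or "cloud" following the rough division in the text."""
    if task.private:
        return "local"   # private input stays on the machine, whatever else
    if task.needs_latest_info or task.long_form:
        return "cloud"   # heavy work goes to the cloud model
    return "local"       # quick drafts and short checks stay local
```

The private check comes first on purpose: a long private note still stays local, even if the result is rougher than the cloud would give.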

In my case, I settled into running a rough first pass locally and re-routing to the cloud when it falls short. As an indie developer, I find this two-tier flow quietly cuts API cost while giving me more options in my workflow. In the follow-up, I dig into how to fold a local Gemma 4 into real work: tricks to improve perceived response speed, and wiring it into a simple pipeline. For now, type three lines on the PC you have and feel a model come up in your own hands.