Dev Tools · 2026-06-17 · Advanced

Running Gemini Chat History on Redis — Field Notes on Not Losing Conversation State in Production

Keep a Gemini ChatSession in process memory and it evaporates on every redeploy or scale event. Here is how I back it with Redis in production, covering token budgets, concurrent sends, SDK coupling, and graceful degradation, with the code I actually run.



The first bug report I got after shipping a chat feature was "I can't pick up where I left off yesterday." Locally everything worked. The moment it ran on Cloud Run, the assistant reintroduced itself every time a user reopened the app. The cause was embarrassingly simple: I was holding the ChatSession object in process memory.

Gemini's chats.create(history=...) responds with full context as long as you hand it the history. Starting from history=[], as the official sample does, is correct, but if you scale the service without deciding where that history lives, it breaks quietly on day one. What follows are the parts of my chat state management that turned out to matter while running it as an indie developer, framed around backing it with Redis: where the naive implementation falls apart and how I closed each gap, with the code attached.

A note on models: this article assumes gemini-3.5-flash, the default as of June 2026, and gemini-3.5-pro when you want heavier reasoning. Because the default model can shift under you and change the shape of outputs, decoupling your stored history from the SDK (covered later) lets you migrate without rewriting the storage layer each time.

The three failure modes I watch in production

In-memory state breaks down for one root reason: the lifetime of a container does not match the lifetime of a conversation. These are the three symptoms I actively monitor.

First, containers are short-lived. A Cloud Run instance shuts down minutes after traffic stops. On restart your global variables are empty and the in-flight conversation is gone. No exception is logged, so you usually learn about it from a user, which makes it the nastiest of the three.

Second, request routing under horizontal scale. If a user's first and second messages land on different instances, each sees its own isolated memory and the history never connects. Sticky sessions can pin a user to one instance, but you give up scaling flexibility — pushing state outside the process is the cleaner answer.

Third, client reconnection. When a mobile app is killed and reopened hours later, there is no guarantee the server still holds that state. Designing around "a session can disappear at any time" turns out to be the more robust posture.
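One way to express "a session can disappear at any time" in code is to make the read path tolerant by construction. The sketch below is an illustration, not the article's production code: `decode_history` is a hypothetical helper that treats a missing key, corrupt JSON, or an unexpected shape as an empty history, so the caller starts a fresh conversation instead of raising.

```python
import json
from typing import Any, Optional


def decode_history(raw: Optional[str]) -> list[dict[str, Any]]:
    """Turn whatever the external store returned into a usable history list.

    `raw` is the stored string, or None when the key is missing or expired.
    Anything missing or unreadable becomes an empty history.
    """
    if not raw:
        return []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Corrupt payload: prefer a fresh conversation over a 500.
        return []
    # Accept only the expected shape: a list of message dicts with a role.
    if not isinstance(data, list):
        return []
    return [m for m in data if isinstance(m, dict) and "role" in m]
```

The design choice here is that losing context is treated as a normal state rather than an error, which keeps all three failure modes on the same code path.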

Redis fits this role well: millisecond reads and writes, automatic expiry via TTL, and Pub/Sub for real-time notification if you need it. As a relay point for chat state, it is easy to work with.

A "just works" version, and where it frays

Let's start from the minimal version, then knock down the gaps one at a time.

# requirements: google-genai, redis
import json
import os

import redis
from google import genai

# decode_responses=True makes Redis return str instead of bytes.
r = redis.Redis.from_url(os.environ["REDIS_URL"], decode_responses=True)
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])


def chat_once(session_id: str, user_message: str) -> str:
    # Load the stored history; a missing key means a new conversation.
    raw = r.get(f"chat:{session_id}")
    history = json.loads(raw) if raw else []

    # Rebuild the chat from the stored history and send the new message.
    chat = client.chats.create(model="gemini-3.5-flash", history=history)
    response = chat.send_message(user_message)

    # Serialize the full history, including this turn, back to Redis.
    # Only text parts are kept; this is the naive version.
    new_history = [
        {"role": msg.role, "parts": [{"text": p.text} for p in msg.parts]}
        for msg in chat.get_history()
    ]
    r.set(f"chat:{session_id}", json.dumps(new_history))
    return response.text

This runs. But after a few days in production, the following cracks appeared, in this order.

First, history grows without bound. Each round adds hundreds to thousands of tokens; on a chatty session I saw it cross 100k tokens in about a week. Because the whole history is resent on every call, response latency degrades visibly and the input-token cost of each request climbs linearly with history length.
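A minimal sketch of the sliding-window half of a fix, under stated assumptions: messages use the `{"role", "parts": [{"text": ...}]}` shape from the code above, and tokens are estimated at roughly four characters each, which is a rough heuristic rather than a measured figure. `estimate_tokens` and `trim_history` are hypothetical names. Summarizing the dropped turns is not shown.

```python
from typing import Any


def estimate_tokens(message: dict[str, Any]) -> int:
    """Rough size estimate: about 4 characters per token (an assumption)."""
    chars = sum(len(part.get("text") or "") for part in message.get("parts", []))
    return max(1, chars // 4)


def trim_history(history: list[dict[str, Any]], max_tokens: int) -> list[dict[str, Any]]:
    """Keep the most recent messages whose estimated total fits max_tokens."""
    kept: list[dict[str, Any]] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that overflows.
    for message in reversed(history):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # Drop leading non-user turns so the window opens on a user message.
    while kept and kept[0].get("role") != "user":
        kept.pop(0)
    return kept
```

Calling `trim_history(history, max_tokens=...)` before `client.chats.create(...)` would cap what is resent on each call. For exact numbers, the SDK's token-counting call would be the better source than a character heuristic.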

Second, there is no TTL. You write and never expire, so abandoned sessions linger. Redis memory climbs, and depending on maxmemory and the eviction policy, new sessions eventually fail to persist.
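One possible shape for the write path, sketched with assumptions: `store` is expected to behave like the `redis.Redis` client created earlier (redis-py's `set` accepts an `ex` argument in seconds), and the one-week `SESSION_TTL_SECONDS` is a hypothetical value, not a recommendation from the article.

```python
import json
from typing import Any

# Hypothetical value: expire a session after one week without writes.
SESSION_TTL_SECONDS = 7 * 24 * 3600


def save_history(
    store: Any,
    session_id: str,
    history: list[dict[str, Any]],
    ttl_seconds: int = SESSION_TTL_SECONDS,
) -> None:
    """Write the history and restart its expiry clock in one SET ... EX call."""
    # Setting the value and the TTL together avoids a window where the key
    # exists without an expiry.
    store.set(f"chat:{session_id}", json.dumps(history), ex=ttl_seconds)
```

Because every save resets the TTL, active conversations stay alive while abandoned ones age out on their own.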

Third, concurrent writes corrupt history. If the same user sends from two tabs, one save overwrites the other wholesale. Combined with optimistic UI updates, the display and the stored state drift apart into a hard-to-reproduce bug.
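A common way to serialize sends per session is a short-lived lock taken with SET NX EX and released with a compare-and-delete Lua script. The following is a sketch under assumptions, not the article's implementation: `store` is expected to be a `redis.Redis` client with `decode_responses=True`, and the `lock:chat:` key prefix, the 30-second TTL, and the `SessionBusy` exception are hypothetical choices.

```python
import uuid
from contextlib import contextmanager
from typing import Any, Iterator

# Delete the lock only if it still holds our token (compare-and-delete),
# so a request whose lock already expired cannot remove someone else's lock.
RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
end
return 0
"""


class SessionBusy(Exception):
    """Raised when another request already holds the session lock."""


@contextmanager
def session_lock(store: Any, session_id: str, ttl_seconds: int = 30) -> Iterator[None]:
    key = f"lock:chat:{session_id}"
    token = uuid.uuid4().hex
    # SET NX EX succeeds only if nobody holds the lock, and the lock expires
    # on its own if this process dies before releasing it.
    if not store.set(key, token, nx=True, ex=ttl_seconds):
        raise SessionBusy(session_id)
    try:
        yield
    finally:
        store.eval(RELEASE_SCRIPT, 1, key, token)
```

Wrapping the load, send, and save steps in `with session_lock(r, session_id):` would make the second tab fail fast with `SessionBusy` instead of overwriting the first tab's history. The TTL needs to exceed the slowest expected model call, otherwise the lock can expire mid-request.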

Fourth, it couples to the Gemini SDK's internals. Serializing the output of chat.get_history() directly means old data stops loading when an SDK upgrade changes the structure.
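One way to loosen that coupling is to store a small, versioned format of your own and convert at the boundary. This is a sketch with assumptions: the `{"v", "turns"}` layout, `SCHEMA_VERSION`, `to_stored`, and `to_history` are all hypothetical, and only text parts are kept, so function calls or media would need their own fields.

```python
from typing import Any, Iterable

SCHEMA_VERSION = 1  # hypothetical version tag for the stored format


def to_stored(messages: Iterable[Any]) -> dict[str, Any]:
    """Flatten SDK message objects into a plain, versioned dict.

    Only `.role` and each part's `.text` are read, via getattr, so a missing
    attribute degrades to "skip" instead of an AttributeError.
    """
    turns = []
    for msg in messages:
        role = getattr(msg, "role", None)
        parts = getattr(msg, "parts", None) or []
        text = "".join(getattr(p, "text", None) or "" for p in parts)
        if role and text:
            turns.append({"role": role, "text": text})
    return {"v": SCHEMA_VERSION, "turns": turns}


def to_history(stored: dict[str, Any]) -> list[dict[str, Any]]:
    """Rebuild the list-of-dicts history shape from the stored format."""
    if stored.get("v") != SCHEMA_VERSION:
        raise ValueError(f"unsupported history schema: {stored.get('v')!r}")
    return [
        {"role": turn["role"], "parts": [{"text": turn["text"]}]}
        for turn in stored["turns"]
    ]
```

With this in place, `json.dumps(to_stored(chat.get_history()))` is what goes into Redis, and an SDK upgrade only requires touching these two functions. The version field gives old payloads an explicit migration point instead of a silent load failure.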

Let's close them in turn.

WHAT YOU'LL LEARN
How to move conversation state out of the process, and how to tell apart the three failure modes that erase history on redeploy and horizontal scale
A sliding-window-plus-summary implementation that caps tokens from measured data, and why the summary step belongs on a lightweight model
The full production skeleton: safe Lua lock release, an SDK-independent storage format, and graceful degradation when Redis is down