API / SDK · 2026-06-16 · Advanced

Your Gemini Live API session forgets the conversation every time it reconnects — field notes on token refresh and session resumption

Why a Gemini Live API WebSocket drops the conversation and the user's in-flight speech on every reconnect, and how to close the gap with single-use ephemeral tokens, session resumption handles, and the goAway warning.



A train enters a tunnel, the WebSocket drops for a few seconds, and when it comes back the assistant has forgotten how the conversation started. The Gemini Live API voice assistant I was building as an indie developer ran flawlessly in the demo, then started dropping turns like this every day once it shipped to real devices.

The disconnects themselves are unavoidable. Mobile networks drop, and Live API sessions have limits. What production really tests is how you come back. Plenty of implementations get as far as reconnect logic, yet lose two things the moment they return: authentication, and the conversation context. These notes build a reconnect that keeps both, in the order I actually hit the problems.

One note on models: this assumes the gemini-2.5-flash family that reached general availability in June 2026. If you are still on gemini-2.0-flash, fold the model ID migration into this reconnect work rather than doing it separately.

Reconnects break authentication — tokens are not reusable

The first wall was getting rejected with a 401 on every reconnect. The initial connect succeeds, but every attempt after that fails.

The cause is how ephemeral tokens work. The short-lived token your backend issues here is single-use (uses: 1), so it is spent as soon as a connection is established: one token, one WebSocket. If your reconnect code holds the first token in a variable and reuses it, the second attempt sends a spent token and authentication fails.
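To make the failure mode concrete, here is a toy in-memory model of single-use tokens. It is not the Live API: the class and method names are hypothetical, and the only point is to show why a cached token works once and then fails.

```typescript
// Toy model only. SingleUseTokenIssuer, issue and redeem are hypothetical names.
class SingleUseTokenIssuer {
  private readonly live = new Set<string>();
  private counter = 0;

  // Issue a new token that can be redeemed exactly once.
  issue(): string {
    const token = `tok-${++this.counter}`;
    this.live.add(token);
    return token;
  }

  // Redeem a token: true on first use, false (think "401") on any later use.
  redeem(token: string): boolean {
    return this.live.delete(token);
  }
}

const issuer = new SingleUseTokenIssuer();

// Buggy client: caches the first token and reuses it on reconnect.
const cached = issuer.issue();
const firstConnect = issuer.redeem(cached); // true: the initial connect succeeds
const reconnect = issuer.redeem(cached);    // false: the token is already spent

// A client that asks for a new token before the attempt gets through again.
const freshReconnect = issuer.redeem(issuer.issue()); // true

console.log({ firstConnect, reconnect, freshReconnect });
```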

The fix is simple ordering: fetch a fresh token from the backend every time you try to connect. Here is the issuing endpoint.

// app/api/live/token/route.ts
import { NextRequest, NextResponse } from "next/server";
// Assumption: getServerSession is your app's own session helper; this import path is a placeholder.
import { getServerSession } from "@/lib/auth";

const TOKEN_ENDPOINT =
  "https://generativelanguage.googleapis.com/v1alpha/ephemeralTokens:create";

export async function POST(req: NextRequest) {
  // Always check your own auth first. Skip it and this becomes an open token vending machine.
  const session = await getServerSession(req);
  if (!session?.user) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  const apiKey = process.env.GEMINI_API_KEY;
  if (!apiKey) {
    return NextResponse.json({ error: "Server misconfigured" }, { status: 500 });
  }

  // Take one timestamp so both expiry values are measured from the same instant.
  const now = Date.now();

  const res = await fetch(TOKEN_ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Send the key as a header so it never ends up in URLs or access logs.
      "x-goog-api-key": apiKey,
    },
    body: JSON.stringify({
      // Separate the window to open a session (newSessionExpireTime) from the overall lifetime (expireTime).
      uses: 1,
      expireTime: new Date(now + 30 * 60 * 1000).toISOString(),
      newSessionExpireTime: new Date(now + 60 * 1000).toISOString(),
      liveConnectConstraints: {
        model: "models/gemini-2.5-flash",
        config: {
          responseModalities: ["AUDIO"],
          systemInstruction: {
            parts: [{ text: "You are this app's dedicated assistant." }],
          },
        },
      },
    }),
  });

  if (!res.ok) {
    // Log the upstream error server-side only; the client gets a generic message.
    console.error("token issue failed:", await res.text());
    return NextResponse.json({ error: "Failed to issue token" }, { status: 502 });
  }

  const data = await res.json();
  return NextResponse.json({ token: data.name, expiresAt: data.expireTime });
}

The lever here is newSessionExpireTime. It is separate from the token's overall lifetime (expireTime) and lets you keep the window for opening a new session short. With the values above, a leaked token gives an attacker at most 60 seconds to open a session with it. When you connect immediately after issuing, a one-minute window is plenty.
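The two windows can be pulled into a small pure function, which makes the relationship between them easy to test. This is a sketch: tokenWindows and its parameters are hypothetical names, and the check that the connect window must not outlive the token is my own sanity check, not a documented API rule.

```typescript
// Hypothetical helper: computes the two expiry timestamps sent when issuing a token.
interface TokenWindows {
  expireTime: string;           // overall lifetime of the token
  newSessionExpireTime: string; // window for opening a new session with it
}

function tokenWindows(
  now: Date,
  lifetimeMs: number = 30 * 60 * 1000,
  connectWindowMs: number = 60 * 1000,
): TokenWindows {
  // Assumption: a connect window longer than the lifetime is a configuration mistake.
  if (connectWindowMs > lifetimeMs) {
    throw new RangeError("connect window must not outlive the token");
  }
  const at = (ms: number): string => new Date(now.getTime() + ms).toISOString();
  return {
    expireTime: at(lifetimeMs),
    newSessionExpireTime: at(connectWindowMs),
  };
}

const windows = tokenWindows(new Date("2026-06-16T00:00:00.000Z"));
console.log(windows.expireTime);           // 2026-06-16T00:30:00.000Z
console.log(windows.newSessionExpireTime); // 2026-06-16T00:01:00.000Z
```

Measuring both values from a single `now` keeps the gap between them exact, which two separate Date.now() calls do not guarantee.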

On the client, do not hold the token. Pass a function that fetches one right before connecting.

// Hand the connector a token-getter, not the token itself.
const getToken = async (): Promise<string> => {
  const res = await fetch("/api/live/token", { method: "POST" });
  if (!res.ok) throw new Error("token fetch failed");
  const { token } = await res.json();
  return token;
};

That one change ends the 401 loop. Keep no token as state; pull a fresh one the instant you need it. That is the baseline posture for auth when reconnects are expected.
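As a minimal sketch of that posture, the loop below fetches a token inside every attempt instead of once up front. The names connectWithFreshToken, GetToken and Dial are hypothetical; Dial stands in for "open a WebSocket with this token", and the fakes at the bottom only simulate one dropped attempt.

```typescript
// Hypothetical types: Dial resolves true when the connection is accepted.
type GetToken = () => Promise<string>;
type Dial = (token: string) => Promise<boolean>;

// Try to connect up to maxAttempts times, fetching a new token before each attempt.
// Returns the tokens that were used, in order, so the behavior is easy to inspect.
async function connectWithFreshToken(
  getToken: GetToken,
  dial: Dial,
  maxAttempts: number,
): Promise<string[]> {
  const used: string[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const token = await getToken(); // never carried over from a previous attempt
    used.push(token);
    if (await dial(token)) return used;
  }
  throw new Error("all connection attempts failed");
}

// Demo with fakes: the first dial fails (a dropped network), the second succeeds.
let issued = 0;
const fakeGetToken: GetToken = async () => `tok-${++issued}`;
let dials = 0;
const fakeDial: Dial = async () => ++dials >= 2;

const demo = connectWithFreshToken(fakeGetToken, fakeDial, 3);
demo.then((used) => console.log(used)); // two attempts, two different tokens
```

A real connector would add a delay between attempts; that is left out here to keep the token handling in focus.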

Session resumption handles — return without losing context

Even with auth fixed, a problem remains: the reconnect succeeds, but the assistant no longer remembers the previous exchange.

When you open a fresh WebSocket, Live API sees a brand-new session. Send only the setup message and every prior turn is gone. The defense is session resumption.

It works like this. During a connection, the server periodically sends a sessionResumptionUpdate message. Its newHandle is a ticket pointing at the current session state. The client keeps the latest one and passes it inside setup on reconnect, and Live API carries the old context into the new connection.
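The outgoing half of that exchange can be sketched as a small builder. The field names (setup, sessionResumption, handle) are the ones used in this article; buildSetup and the SetupMessage type are hypothetical names introduced only for illustration.

```typescript
// Shape of the setup message as used in this article; the type name is hypothetical.
interface SetupMessage {
  setup: {
    model: string;
    generationConfig: { responseModalities: string[] };
    sessionResumption: { handle?: string };
  };
}

// Build a setup message that resumes when a handle is available
// and starts a new session otherwise.
function buildSetup(model: string, handle: string | null): SetupMessage {
  return {
    setup: {
      model,
      generationConfig: { responseModalities: ["AUDIO"] },
      sessionResumption: handle ? { handle } : {},
    },
  };
}

const fresh = buildSetup("models/gemini-2.5-flash", null);
const resumed = buildSetup("models/gemini-2.5-flash", "handle-from-last-update");

console.log(JSON.stringify(fresh.setup.sessionResumption));   // {}
console.log(JSON.stringify(resumed.setup.sessionResumption)); // {"handle":"handle-from-last-update"}
```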

Keep the latest handle and send a resuming setup in one place.

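// lib/LiveSession.ts
type Json = Record<string, unknown>;

export class LiveSession {
  private ws: WebSocket | null = null;
  private resumptionHandle: string | null = null;  // latest resumption handle
  private attempts = 0;
  private closedByUser = false;
  private retryTimer: ReturnType<typeof setTimeout> | null = null;
  private readonly maxAttempts = 6;
  private readonly base = 1000;
  private readonly decoder = new TextDecoder();

  constructor(
    private readonly getToken: () => Promise<string>,
    private readonly wsBase: string,
    private readonly onMessage: (m: Json) => void,
  ) {}

  async connect() {
    this.closedByUser = false;
    this.attempts = 0;  // a manual (re)start gets a full retry budget
    await this.open();
  }

  private async open() {
    const token = await this.getToken();  // always a fresh token
    if (this.closedByUser) return;  // disconnect() was called while the token was in flight

    const previous = this.ws;
    const ws = new WebSocket(`${this.wsBase}?access_token=${encodeURIComponent(token)}`);
    ws.binaryType = "arraybuffer";  // lets binary frames be decoded synchronously below
    this.ws = ws;
    // Retire a socket that is still open (e.g. after goAway) so only one is ever live.
    previous?.close();

    ws.onopen = () => {
      this.attempts = 0;
      ws.send(JSON.stringify({
        setup: {
          model: "models/gemini-2.5-flash",
          generationConfig: { responseModalities: ["AUDIO"] },
          // Pass the handle if we have one; send an empty object for a fresh start.
          sessionResumption: this.resumptionHandle
            ? { handle: this.resumptionHandle }
            : {},
        },
      }));
    };

    ws.onmessage = (e) => {
      // JSON may arrive in a text frame or a binary frame; normalize before parsing.
      const text = typeof e.data === "string"
        ? e.data
        : this.decoder.decode(e.data as ArrayBuffer);
      const msg = JSON.parse(text) as Json;
      // Overwrite the handle on every update so it stays current.
      const update = msg.sessionResumptionUpdate as
        | { resumable?: boolean; newHandle?: string }
        | undefined;
      if (update?.resumable && update.newHandle) {
        this.resumptionHandle = update.newHandle;
      }
      // Treat the server's disconnect warning as a trigger to reconnect early.
      if ("goAway" in msg) {
        this.reconnectSoon();
        return;
      }
      this.onMessage(msg);
    };

    ws.onclose = () => {
      if (this.ws !== ws) return;  // a newer socket has already replaced this one
      this.ws = null;
      if (!this.closedByUser) this.reconnectSoon();
    };
    ws.onerror = (err) => console.error("ws error", err);
  }

  private reconnectSoon() {
    // goAway and onclose can both land here; schedule at most one reconnect at a time.
    if (this.closedByUser || this.retryTimer) return;
    if (this.attempts >= this.maxAttempts) {
      console.error("reconnect limit reached; prompt the user to resume manually");
      return;
    }
    // Exponential backoff + jitter: 1s, 2s, 4s, ... plus randomness to avoid a thundering herd.
    const delay = this.base * 2 ** this.attempts + Math.random() * 500;
    this.attempts++;
    this.retryTimer = setTimeout(() => {
      this.retryTimer = null;
      // A failed token fetch must not end the retry loop.
      this.open().catch((err) => {
        console.error("reconnect failed", err);
        this.reconnectSoon();
      });
    }, delay);
  }

  send(data: Json): boolean {
    if (this.ws?.readyState !== WebSocket.OPEN) return false;  // not sent; the caller decides whether to buffer
    this.ws.send(JSON.stringify(data));
    return true;
  }

  disconnect() {
    this.closedByUser = true;
    if (this.retryTimer) {
      clearTimeout(this.retryTimer);  // cancel a pending reconnect
      this.retryTimer = null;
    }
    this.ws?.close();
    this.ws = null;
  }
}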

Two things to watch in this code.

First, update the handle on every arrival. As the conversation advances, sessionResumptionUpdate swaps in a ticket pointing at newer state. Hold onto an old handle and resumption still works, but it rewinds the context by several turns. Keep overwriting with the latest.

Second, do not store the handle when resumable is false. The server also sends updates at moments when a resumption point is not yet settled. Resuming from such a ticket fails, so only store it when resumable is true and newHandle is present.
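Both rules fit in one pure function, which makes them easy to test apart from the WebSocket. This is a sketch: nextHandle and the ResumptionUpdate type are hypothetical names, and the field names mirror the sessionResumptionUpdate message described above.

```typescript
// Fields of a sessionResumptionUpdate as used in this article; the type name is hypothetical.
interface ResumptionUpdate {
  resumable?: boolean;
  newHandle?: string;
}

// Returns the handle to keep after seeing one update:
// take the new one only when it is resumable and non-empty, otherwise keep the current one.
function nextHandle(current: string | null, update: ResumptionUpdate): string | null {
  if (update.resumable && update.newHandle) return update.newHandle;
  return current;
}

// Replay a sequence of updates the way a connection would see them.
const updates: ResumptionUpdate[] = [
  { resumable: true, newHandle: "h1" },
  { resumable: false },                 // no settled resumption point: keep h1
  { resumable: true, newHandle: "h2" }, // newer state replaces h1
  { resumable: false, newHandle: "" },  // still not resumable: keep h2
];

const finalHandle = updates.reduce<string | null>(nextHandle, null);
console.log(finalHandle); // h2
```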

