Dev Tools · 2026-03-28 · Advanced

Building Production Full-Stack AI Apps with Gemini API & Supabase

A practical guide to building production-grade full-stack AI apps with Gemini API and Supabase—covering auth, pgvector, Edge Functions, RLS, and cost control, plus the tuning lessons (IVFFlat to HNSW recall recovery, the service_role RLS bypass) you only learn in production.

Gemini API · Supabase · pgvector · Edge Functions · Full-Stack Development


A RAG chat endpoint I had running happily on a Supabase Edge Function suddenly started returning visibly worse matches the moment my document set grew from 10,000 to 120,000 rows — without a single line of code changing. The culprit was the pgvector index configuration, the kind of "only shows up at scale" trap that quickstart docs never mention.

Having built and run my own apps as an indie developer for a long time, I find the Gemini API + Supabase combination one of the few stacks an independent developer can actually run in production alone. This guide walks through wiring up auth, pgvector, Edge Functions, RLS, and cost control end to end — and then goes into the tuning decisions you only discover once real traffic hits.

A Stack One Developer Can Actually Run

Gemini API and Supabase together cover most of what a modern AI application needs. Supabase provides PostgreSQL, authentication, Realtime subscriptions, and Edge Functions in one managed project, while Gemini API handles text generation, multimodal input, and embeddings. On that single, small stack you can build AI chatbots, RAG systems, and semantic search platforms.

The path below goes in the order the work actually happens: architecture, authentication, pgvector schema design, security, then the performance tuning that only becomes urgent once the table grows.

Supabase & Gemini Architecture Patterns

A well-designed Supabase + Gemini architecture consists of several interconnected layers:

Frontend Layer

React, Next.js, or similar client application
Real-time UI updates via Supabase Realtime client
Streaming response handling from Gemini API

API & Edge Functions Layer

Supabase Edge Functions (TypeScript/Deno runtime)
Authenticated requests to Gemini API
Request validation and rate limiting
Caching strategies

Data Layer

PostgreSQL (Supabase-hosted)
pgvector extension for semantic vector storage
User data, conversation history, document metadata
Row Level Security (RLS) for multi-tenant isolation

External Services

Gemini API (text generation, embeddings)
Storage (Supabase Storage or S3)
Optional: Redis or Vercel KV for caching

Why This Architecture Works

PostgreSQL with pgvector removes the need for a separate vector database: semantic search runs inside your primary database, next to the rows it describes. Edge Functions keep the Gemini API key on the server instead of in the client bundle, and they run close to your users. RLS enforces per-user data isolation in the database itself, without extra middleware.

The same architecture carries an app from prototype to real production traffic without adding another datastore, which keeps operational costs reasonable. You also keep transactions, joins and other complex queries, and relational integrity, which standalone vector databases generally don't offer.

WHAT YOU'LL LEARN

✦The exact pgvector parameters and trade-offs for moving from IVFFlat to HNSW to recover search recall from 0.78 to 0.93

✦The trap where a service_role key silently bypasses RLS, and how to scope permissions correctly with a user-scoped client

✦Avoiding 429s in embedding batches (concurrency cap + exponential backoff) and the real monthly cost at 8,000 MAU


Setting Up Supabase and Authentication

Creating Your Supabase Project

Start by creating a new project at supabase.com. Once initialized, capture these credentials in your .env.local:

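# .env.local
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
NEXT_PUBLIC_APP_URL=http://localhost:3000
# Server-only: never prefix these with NEXT_PUBLIC_
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
GEMINI_API_KEY=your-gemini-api-key
# Edge Functions read secrets set with `supabase secrets set`, not this file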

Install the Supabase JavaScript client:

npm install @supabase/supabase-js @supabase/ssr

Email/Password Authentication

Implement email-based sign-up with email confirmation:

// lib/auth.ts
import { createClient } from '@supabase/supabase-js'
 
const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
)
 
export async function signUpWithEmail(email: string, password: string) {
  const { data, error } = await supabase.auth.signUp({
    email,
    password,
    options: {
      emailRedirectTo: `${process.env.NEXT_PUBLIC_APP_URL}/auth/callback`,
    },
  })
 
  if (error) throw new Error(error.message)
  return data
}
 
export async function signInWithEmail(email: string, password: string) {
  const { data, error } = await supabase.auth.signInWithPassword({
    email,
    password,
  })
 
  if (error) throw new Error(error.message)
  return data.session
}
 
export async function signOut() {
  const { error } = await supabase.auth.signOut()
  if (error) throw new Error(error.message)
}

In production, keep email confirmation enabled (or use OAuth) so nobody can register an account with an email address they don't control.

User Profiles Table

Store additional user information beyond Supabase auth:

CREATE TABLE profiles (
  id UUID REFERENCES auth.users(id) ON DELETE CASCADE PRIMARY KEY,
  display_name TEXT,
  avatar_url TEXT,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
 
ALTER TABLE profiles ENABLE ROW LEVEL SECURITY;
 
CREATE POLICY "Users read own profile"
  ON profiles FOR SELECT
  USING (auth.uid() = id);
 
CREATE POLICY "Users update own profile"
  ON profiles FOR UPDATE
  USING (auth.uid() = id)
  WITH CHECK (auth.uid() = id);

-- Auto-create a profile row on signup.
-- SECURITY DEFINER lets the trigger write to public.profiles; the empty
-- search_path stops it from resolving objects in other schemas.
CREATE FUNCTION public.handle_new_user()
RETURNS TRIGGER
LANGUAGE plpgsql
SECURITY DEFINER SET search_path = ''
AS $$
BEGIN
  INSERT INTO public.profiles (id, display_name)
  VALUES (new.id, new.email);
  RETURN new;
END;
$$;

CREATE TRIGGER on_auth_user_created
  AFTER INSERT ON auth.users
  FOR EACH ROW EXECUTE FUNCTION public.handle_new_user();

Building Semantic Search with pgvector

Enabling pgvector

pgvector is available on every Supabase project but is not enabled by default. Enable it from the SQL editor:

CREATE EXTENSION IF NOT EXISTS vector;

Documents and Embeddings Schema

Design tables for storing documents and their vector embeddings:

CREATE TABLE documents (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE NOT NULL,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  source_url TEXT,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
 
CREATE TABLE document_chunks (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE NOT NULL,
  chunk_index INT NOT NULL,
  content TEXT NOT NULL,
  -- Gemini embedding-001 produces 768-dimensional vectors
  embedding vector(768),
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
 
-- Create HNSW index for fast semantic search
CREATE INDEX ON document_chunks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);
 
-- Enable RLS
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE document_chunks ENABLE ROW LEVEL SECURITY;
 
CREATE POLICY "Users access own documents"
  ON documents FOR SELECT
  USING (auth.uid() = user_id);
 
CREATE POLICY "Users access own chunks"
  ON document_chunks FOR SELECT
  USING (
    document_id IN (SELECT id FROM documents WHERE user_id = auth.uid())
  );

Semantic Search Query

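Query similar chunks using cosine distance (the <=> operator). Here query_embedding stands for the embedding of the user's question, which the RPC function below receives as a parameter: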

SELECT
  dc.id,
  dc.content,
  1 - (dc.embedding <=> query_embedding) AS similarity
FROM document_chunks dc
WHERE dc.document_id IN (
  SELECT id FROM documents WHERE user_id = auth.uid()
)
ORDER BY dc.embedding <=> query_embedding
LIMIT 5;
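pgvector's <=> operator returns cosine distance, which is 1 minus cosine similarity. That is why the query selects 1 - (embedding <=> query_embedding) as similarity but orders by the raw distance ascending. A small sketch of the same arithmetic, handy for sanity-checking values that come back from the RPC; the function names are illustrative, not part of any library:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same dimension')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) throw new Error('A zero vector has no direction')
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Cosine distance as pgvector's <=> computes it: 1 - similarity, in [0, 2].
// Smaller means closer, so ORDER BY distance ASC puts the best match first.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b)
}

console.log(cosineDistance([1, 0], [0, 1])) // 1 (orthogonal)
console.log(cosineDistance([3, 0], [5, 0])) // 0 (same direction)
```

Only direction matters here: [3, 0] and [5, 0] are a perfect match despite different lengths, which is why cosine distance suits embeddings.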

Wrap this in a SQL function so Edge Functions can call it through supabase.rpc():

CREATE OR REPLACE FUNCTION search_documents(
  query_embedding vector(768),
  user_id uuid,
  match_limit int DEFAULT 5
)
RETURNS TABLE (id uuid, content text, similarity float8)
LANGUAGE sql STABLE
AS $$
  -- Explicit owner filter: service_role callers bypass RLS, so this WHERE
  -- clause is their only isolation. Take user_id from a verified JWT,
  -- never from untrusted request input.
  SELECT dc.id, dc.content, 1 - (dc.embedding <=> query_embedding) AS similarity
  FROM document_chunks dc
  JOIN documents d ON d.id = dc.document_id
  WHERE d.user_id = search_documents.user_id
  ORDER BY dc.embedding <=> query_embedding
  LIMIT match_limit;
$$;

Gemini Embeddings Pipeline

Document Upload → Embedding Workflow

When users upload documents, automatically chunk and embed them:

// supabase/functions/embed-document/index.ts
import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'
 
const supabaseUrl = Deno.env.get('SUPABASE_URL')!
const supabaseKey = Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
const geminiApiKey = Deno.env.get('GEMINI_API_KEY')!
 
const supabase = createClient(supabaseUrl, supabaseKey)
 
// Split text into chunks of at most `maxWords` words.
// Words are only a rough proxy for tokens; measure against your real documents.
function chunkText(text: string, maxWords: number = 500): string[] {
  // filter(Boolean) drops the empty strings left by leading/trailing whitespace
  const words = text.split(/\s+/).filter(Boolean)
  const chunks: string[] = []

  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '))
  }

  return chunks
}
 
// Call Gemini Embedding API
async function generateEmbedding(text: string): Promise<number[]> {
  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/embedding-001:embedContent',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': geminiApiKey,
      },
      body: JSON.stringify({
        model: 'models/embedding-001',
        content: { parts: [{ text }] },
      }),
    }
  )
 
  if (!response.ok) {
    throw new Error(`Embedding API error: ${response.statusText}`)
  }
 
  const data = await response.json()
  return data.embedding.values
}
 
serve(async (req) => {
  const { documentId, content, userId } = await req.json()
 
  const chunks = chunkText(content, 500)
 
  for (let i = 0; i < chunks.length; i++) {
    const embedding = await generateEmbedding(chunks[i])
 
    const { error } = await supabase.from('document_chunks').insert({
      document_id: documentId,
      chunk_index: i,
      content: chunks[i],
      embedding,
    })
 
    if (error) {
      console.error('Insert error:', error)
      return new Response(`Error: ${error.message}`, { status: 500 })
    }
  }
 
  return new Response(
    JSON.stringify({ success: true, chunkCount: chunks.length }),
    { headers: { 'Content-Type': 'application/json' } }
  )
})

Batch Processing & Rate Limiting

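If chunks are inserted first with a NULL embedding, a scheduled function can fill them in small batches, pausing between requests to stay within Gemini API rate limits:

// Reuses `serve`, `supabase`, and `generateEmbedding` from embed-document above.
serve(async () => {
  // Process up to 10 pending chunks per invocation
  const { data: pending, error } = await supabase
    .from('document_chunks')
    .select('id, content')
    .is('embedding', null)
    .limit(10)

  if (error) return new Response(`Error: ${error.message}`, { status: 500 })

  for (const chunk of pending ?? []) {
    const embedding = await generateEmbedding(chunk.content)
    const { error: updateError } = await supabase
      .from('document_chunks')
      .update({ embedding })
      .eq('id', chunk.id)
    if (updateError) console.error(`Update failed for ${chunk.id}:`, updateError)

    // Small pause between requests to smooth out the request rate
    await new Promise((r) => setTimeout(r, 100))
  }

  return new Response(JSON.stringify({ processed: pending?.length ?? 0 }), {
    headers: { 'Content-Type': 'application/json' },
  })
})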

Building RAG with Edge Functions

RAG Chat Endpoint

Create an Edge Function that retrieves relevant documents and uses Gemini to generate answers:

// supabase/functions/rag-chat/index.ts
import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'
 
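const supabaseUrl = Deno.env.get('SUPABASE_URL')!
// Caution: a service_role client bypasses RLS. Isolation here depends entirely on
// the userId passed to search_documents; the production lessons below show a
// user-scoped client instead.
const supabaseKey = Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
const geminiApiKey = Deno.env.get('GEMINI_API_KEY')!

const supabase = createClient(supabaseUrl, supabaseKey)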
 
async function generateEmbedding(text: string): Promise<number[]> {
  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/embedding-001:embedContent',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': geminiApiKey,
      },
      body: JSON.stringify({
        model: 'models/embedding-001',
        content: { parts: [{ text }] },
      }),
    }
  )
 
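  if (!response.ok) {
    throw new Error(`Embedding API error: ${response.status} ${await response.text()}`)
  }

  const data = await response.json()
  return data.embedding.values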
}
 
async function ragChat(
  userId: string,
  question: string,
  conversationId: string
): Promise<string> {
  // Step 1: Embed the user's question
  const questionEmbedding = await generateEmbedding(question)
 
  // Step 2: Semantic search for relevant documents
  const { data: relevantChunks, error: searchError } = await supabase.rpc('search_documents', {
    query_embedding: questionEmbedding,
    user_id: userId,
    match_limit: 5,
  })
  if (searchError) throw new Error(`Search failed: ${searchError.message}`)
 
  // Step 3: Build context from retrieved chunks
  const context = (relevantChunks || [])
    .map((chunk: any) => chunk.content)
    .join('\n\n')
 
  // Step 4: Call Gemini with context
  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': geminiApiKey,
      },
      body: JSON.stringify({
        contents: [
          {
            parts: [
              {
                text: `Use the following context to answer the user's question.
 
Context:
${context}
 
Question: ${question}
 
Answer:`,
              },
            ],
          },
        ],
        generationConfig: {
          temperature: 0.7,
          maxOutputTokens: 1024,
        },
      }),
    }
  )
 
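  if (!response.ok) {
    throw new Error(`Gemini API error: ${response.status} ${await response.text()}`)
  }

  const result = await response.json()
  // candidates can be missing, e.g. when the prompt is blocked by safety filters
  const answer = result.candidates?.[0]?.content?.parts?.[0]?.text
  if (!answer) throw new Error('Gemini returned no answer')
  return answer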
}
 
serve(async (req) => {
  if (req.method !== 'POST') {
    return new Response('Method not allowed', { status: 405 })
  }
 
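  // userId is trusted from the request body here; in production derive it from the
  // verified JWT instead (see Token Verification and the production lessons below)
  const { userId, question, conversationId } = await req.json()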
 
  try {
    const answer = await ragChat(userId, question, conversationId)
 
    // Save conversation history
    await supabase.from('messages').insert({
      conversation_id: conversationId,
      user_id: userId,
      role: 'user',
      content: question,
    })
 
    await supabase.from('messages').insert({
      conversation_id: conversationId,
      user_id: userId,
      role: 'assistant',
      content: answer,
    })
 
    return new Response(JSON.stringify({ answer }), {
      headers: { 'Content-Type': 'application/json' },
    })
  } catch (error: any) {
    return new Response(JSON.stringify({ error: error.message }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' },
    })
  }
})
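Step 3 concatenates every retrieved chunk into the prompt, so prompt size (and cost) grows with match_limit and chunk length. A minimal sketch of capping the context, assuming rows shaped like the search_documents output; buildContext, the character budget, and the similarity floor are illustrative choices, not part of the Gemini or Supabase APIs:

```typescript
interface RetrievedChunk {
  id: string
  content: string
  similarity: number
}

// Keep the most similar chunks until the character budget is spent.
// Characters are a rough proxy for tokens; tune the budget to your model and cost target.
function buildContext(chunks: RetrievedChunk[], maxChars = 8000, minSimilarity = 0): string {
  const ranked = [...chunks].sort((a, b) => b.similarity - a.similarity)
  const parts: string[] = []
  let used = 0

  for (const chunk of ranked) {
    if (chunk.similarity < minSimilarity) break // ranked, so everything after is weaker
    const separatorCost = parts.length > 0 ? 2 : 0 // the '\n\n' joiner
    if (used + separatorCost + chunk.content.length > maxChars) break
    parts.push(chunk.content)
    used += separatorCost + chunk.content.length
  }

  return parts.join('\n\n')
}

// Usage inside ragChat: const context = buildContext(relevantChunks ?? [], 8000, 0.5)
```

Stopping at the first chunk that doesn't fit keeps the context strictly in similarity order; skipping ahead to smaller chunks would pack more text but could favor weaker matches.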

Real-Time Updates & Streaming

Supabase Realtime for Live Chat

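Subscribe to new messages as they arrive. Postgres Changes only fires for tables in the supabase_realtime publication, so add the table once (ALTER PUBLICATION supabase_realtime ADD TABLE messages;). Subscribers then receive only the rows their RLS SELECT policies allow: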

// hooks/useRealtimeMessages.ts
import { useEffect, useState } from 'react'
import { supabase } from '@/lib/supabase'
 
export function useRealtimeMessages(conversationId: string) {
  const [messages, setMessages] = useState<any[]>([])
 
  useEffect(() => {
    // Fetch existing messages
    const loadMessages = async () => {
      const { data } = await supabase
        .from('messages')
        .select('*')
        .eq('conversation_id', conversationId)
        .order('created_at', { ascending: true })
 
      setMessages(data || [])
    }
 
    loadMessages()
 
    // Subscribe to new messages
    const channel = supabase
      .channel(`messages:${conversationId}`)
      .on(
        'postgres_changes',
        {
          event: 'INSERT',
          schema: 'public',
          table: 'messages',
          filter: `conversation_id=eq.${conversationId}`,
        },
        (payload) => {
          setMessages((prev) => [...prev, payload.new])
        }
      )
      .subscribe()
 
    return () => {
      supabase.removeChannel(channel)
    }
  }, [conversationId])
 
  return messages
}

Streaming Gemini Responses

Handle token-by-token streaming from Gemini API:

// lib/gemini-stream.ts
export async function* streamGeminiResponse(
  prompt: string,
  apiKey: string
) {
  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:streamGenerateContent?alt=sse',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': apiKey,
      },
      body: JSON.stringify({
        contents: [{ parts: [{ text: prompt }] }],
        generationConfig: {
          temperature: 0.7,
          maxOutputTokens: 1024,
        },
      }),
    }
  )
 
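  if (!response.ok || !response.body) {
    throw new Error(`Gemini API error: ${response.status}`)
  }

  // Extract the text delta from one SSE line ('' for non-data lines)
  const textFromLine = (line: string): string => {
    if (!line.startsWith('data: ')) return ''
    const json = JSON.parse(line.slice(6))
    return json.candidates?.[0]?.content?.parts?.[0]?.text ?? ''
  }

  // Parse Server-Sent Events. A network chunk can end mid-line (or mid-character),
  // so decode in streaming mode and keep the trailing partial line for the next read.
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''

  while (true) {
    const { done, value } = await reader.read()
    if (done) break

    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''

    for (const line of lines) {
      const text = textFromLine(line)
      if (text) yield text
    }
  }

  // Flush a final line that arrived without a trailing newline
  const last = textFromLine(buffer.trim())
  if (last) yield last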
}

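Don't call this from the browser: any NEXT_PUBLIC_ variable is inlined into the client bundle, so a NEXT_PUBLIC_GEMINI_API_KEY would be readable by anyone. Run streamGeminiResponse on the server (an Edge Function or a route handler) and have the component read the streamed text:

// components/StreamingChat.tsx
'use client'

import { useState } from 'react'

export function StreamingChat() {
  const [response, setResponse] = useState('')

  const handleChat = async (prompt: string) => {
    setResponse('')

    // Hypothetical server route that runs streamGeminiResponse with a
    // server-only GEMINI_API_KEY and streams plain text back
    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
    })
    if (!res.ok || !res.body) throw new Error(`Chat request failed: ${res.status}`)

    const reader = res.body.getReader()
    const decoder = new TextDecoder()
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      const text = decoder.decode(value, { stream: true })
      setResponse((prev) => prev + text)
    }
  }

  return (
    <div>
      <div className="whitespace-pre-wrap mb-4">{response}</div>
      <button onClick={() => handleChat('Hello!')}>Start Chat</button>
    </div>
  )
}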

Row Level Security & Access Control

RLS Policies for Multi-Tenancy

Enable RLS on all tables and define policies:

CREATE TABLE conversations (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE NOT NULL,
  title TEXT,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
 
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
 
CREATE POLICY "Users access own conversations"
  ON conversations FOR SELECT
  USING (auth.uid() = user_id);
 
CREATE POLICY "Users create conversations"
  ON conversations FOR INSERT
  WITH CHECK (auth.uid() = user_id);
 
CREATE POLICY "Users update own conversations"
  ON conversations FOR UPDATE
  USING (auth.uid() = user_id);
 
CREATE POLICY "Users delete own conversations"
  ON conversations FOR DELETE
  USING (auth.uid() = user_id);
 
-- Messages table
CREATE TABLE messages (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  conversation_id UUID REFERENCES conversations(id) ON DELETE CASCADE NOT NULL,
  user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE NOT NULL,
  role TEXT CHECK (role IN ('user', 'assistant')),
  content TEXT NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
 
ALTER TABLE messages ENABLE ROW LEVEL SECURITY;
 
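CREATE POLICY "Users access own conversation messages"
  ON messages FOR SELECT
  USING (
    conversation_id IN (
      SELECT id FROM conversations WHERE user_id = auth.uid()
    )
  );

-- Needed once writes go through a user-scoped client instead of service_role
CREATE POLICY "Users insert own conversation messages"
  ON messages FOR INSERT
  WITH CHECK (
    user_id = auth.uid()
    AND conversation_id IN (
      SELECT id FROM conversations WHERE user_id = auth.uid()
    )
  );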

Token Verification in Edge Functions

Verify JWT tokens before processing requests:

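import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import * as jose from 'https://deno.land/x/jose@v4.14.1/index.ts'

// Verifies an HS256 Supabase access token and returns the user ID (the `sub` claim).
// Assumes the project's legacy JWT secret is stored as a function secret named
// JWT_SECRET (custom secret names can't start with SUPABASE_). Projects using
// asymmetric signing keys should call supabase.auth.getUser(token) instead.
async function verifyToken(token: string): Promise<string> {
  const secret = new TextEncoder().encode(Deno.env.get('JWT_SECRET')!)
  const { payload } = await jose.jwtVerify(token, secret) // also rejects expired tokens
  if (!payload.sub) throw new Error('Token has no subject')
  return payload.sub
}

serve(async (req) => {
  const authHeader = req.headers.get('Authorization')
  if (!authHeader?.startsWith('Bearer ')) {
    return new Response('Unauthorized', { status: 401 })
  }

  let userId: string
  try {
    userId = await verifyToken(authHeader.slice('Bearer '.length))
  } catch {
    return new Response('Invalid token', { status: 401 })
  }

  // Now safely process the request with the verified userId
  return new Response(JSON.stringify({ userId }), {
    headers: { 'Content-Type': 'application/json' },
  })
})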

Deployment & Optimization

Caching Strategy

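Cache embeddings, keyed by a SHA-256 hash of the input, so identical text is embedded only once. This assumes an embedding_cache table with a unique text_hash column: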

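// lib/embedding-cache.ts
// Assumes a server-side `supabase` client and `generateEmbedding` from the pipeline above.
export async function getOrCreateEmbedding(text: string): Promise<number[]> {
  const hash = await hashText(text)

  // Check cache; maybeSingle() returns null (not an error) when nothing matches
  const { data: cached } = await supabase
    .from('embedding_cache')
    .select('embedding')
    .eq('text_hash', hash)
    .maybeSingle()

  if (cached) {
    // pgvector columns come back from the REST API as strings like "[0.1,0.2,...]"
    return typeof cached.embedding === 'string'
      ? JSON.parse(cached.embedding)
      : cached.embedding
  }

  // Generate and cache; skip the write if a concurrent request cached it first
  const embedding = await generateEmbedding(text)
  await supabase
    .from('embedding_cache')
    .upsert({ text_hash: hash, text, embedding }, { onConflict: 'text_hash', ignoreDuplicates: true })

  return embedding
}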
 
async function hashText(text: string): Promise<string> {
  const buffer = await crypto.subtle.digest(
    'SHA-256',
    new TextEncoder().encode(text)
  )
  return Array.from(new Uint8Array(buffer))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('')
}

Index Optimization

If you started with an IVFFlat index, replace it with HNSW for fast semantic search:

-- Drop old IVFFLAT and create HNSW
DROP INDEX IF EXISTS document_chunks_embedding_idx;
 
CREATE INDEX ON document_chunks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);
 
-- Update statistics
ANALYZE document_chunks;

Monitoring & Observability

Log API usage and track performance:

CREATE TABLE api_usage (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID REFERENCES auth.users(id),
  api_type TEXT CHECK (api_type IN ('embedding', 'generation')),
  tokens_used INT,
  cost DECIMAL(10, 6),
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
 
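ALTER TABLE api_usage ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users read own usage"
  ON api_usage FOR SELECT
  USING (auth.uid() = user_id);

-- Monthly aggregation. security_invoker (Postgres 15+) makes the view run with
-- the caller's permissions, so RLS on api_usage still applies; a default view
-- runs as its owner and would show every user's totals to any client.
CREATE VIEW monthly_usage
WITH (security_invoker = true) AS
SELECT
  user_id,
  DATE_TRUNC('month', created_at) AS month,
  api_type,
  SUM(tokens_used) AS total_tokens,
  SUM(cost) AS total_cost
FROM api_usage
GROUP BY user_id, DATE_TRUNC('month', created_at), api_type;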

Cost Management & Monitoring

Reducing Gemini API Expenses

Batch embeddings: Process multiple texts in one request

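// Embeds several texts in one request. The API caps how many requests a single
// batch may contain, so split large inputs into groups first.
async function batchEmbeddings(texts: string[]): Promise<number[][]> {
  const requests = texts.map((text) => ({
    model: 'models/embedding-001', // must match the model in the URL
    content: { parts: [{ text }] },
  }))

  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/embedding-001:batchEmbedContents',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': Deno.env.get('GEMINI_API_KEY')!,
      },
      body: JSON.stringify({ requests }),
    }
  )

  if (!response.ok) {
    throw new Error(`Batch embedding error: ${response.status} ${await response.text()}`)
  }

  // One embedding per request, in the same order as `texts`
  const data = await response.json()
  return data.embeddings.map((e: { values: number[] }) => e.values)
}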
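batchEmbedContents accepts a limited number of requests per call (check the current limit in the Gemini API docs), so a large upload has to be split into groups and the results stitched back in input order. A minimal sketch; toBatches, embedInBatches, the injected embedBatch callback (for example, a wrapper around batchEmbeddings that returns the values arrays), and the default group size of 100 are all illustrative assumptions:

```typescript
// Split an array into consecutive groups of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new Error('size must be a positive integer')
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Embed texts batch by batch (sequentially), preserving input order in the output.
async function embedInBatches(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 100,
): Promise<number[][]> {
  const results: number[][] = []
  for (const batch of toBatches(texts, batchSize)) {
    const vectors = await embedBatch(batch)
    // A short response would silently shift every later vector onto the wrong chunk
    if (vectors.length !== batch.length) {
      throw new Error(`Expected ${batch.length} embeddings, got ${vectors.length}`)
    }
    results.push(...vectors)
  }
  return results
}
```

Running batches one after another keeps the request rate predictable; combine it with the retry helper below if a batch can come back as 429.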

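Model selection: use a Flash-tier model for generation and a dedicated embedding model for embeddings; both are cost-effective. The examples here use gemini-1.5-flash and embedding-001, but Google retires older model IDs over time, so check the current model list before deploying. Keep one embedding model for both documents and queries, since vectors from different models aren't comparable.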

Retry strategy: Implement exponential backoff for rate limits

async function callGeminiWithRetry(
  url: string,
  options: RequestInit,
  maxRetries: number = 5
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options)
 
    if (response.status === 429) {
      const delay = Math.pow(2, attempt) * 1000
      await new Promise((r) => setTimeout(r, delay))
      continue
    }
 
    return response
  }
 
  throw new Error('Max retries exceeded')
}

Usage Dashboard

Display API usage to users:

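// lib/usage.ts
export async function getMonthlyUsage(userId: string) {
  // monthly_usage.month is DATE_TRUNC('month', created_at): a timestamptz at the
  // start of the month, assuming the database timezone is UTC (Supabase's default)
  const now = new Date()
  const monthStart = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1)).toISOString()

  const { data, error } = await supabase
    .from('monthly_usage')
    .select('*')
    .eq('user_id', userId)
    .eq('month', monthStart)

  if (error) throw new Error(error.message)
  return data ?? []
}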

Error Logging & Alerts

Capture and monitor Edge Function errors:

serve(async (req) => {
  try {
    // Main logic
  } catch (error: any) {
    await supabase.from('function_logs').insert({
      function_name: 'rag-chat',
      error_message: error.message,
      error_stack: error.stack,
      created_at: new Date(),
    })
 
    return new Response(JSON.stringify({ error: error.message }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' },
    })
  }
})

What the docs don't tell you: lessons from production

Everything above runs as-is, but once your data grows or real users show up, you'll need adjustments the quickstart never covers. Here they are, in the order I actually hit them.

1. Revisit your pgvector index once row counts grow

IVFFlat with lists = 100 was fine at first. But somewhere around 120,000 documents (up from 10,000), the same query started surfacing noticeably worse results. Measuring recall by hand (how many of my known top-5 came back), it had dropped from ~0.95 to ~0.78.
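The recall figure here is the fraction of known-relevant results that show up in the top k. A minimal sketch of that measurement, assuming you keep a small hand-labeled set of queries with their expected top-5 chunk IDs (for example, taken from an exact scan without the index); recallAtK and meanRecall are illustrative helpers, not a pgvector feature:

```typescript
// Recall@k: share of the expected IDs that appear in the first k returned IDs.
function recallAtK(expected: string[], returned: string[], k = expected.length): number {
  if (expected.length === 0) throw new Error('expected must not be empty')
  const top = new Set(returned.slice(0, k))
  const hits = expected.filter((id) => top.has(id)).length
  return hits / expected.length
}

// Average recall over a labeled query set.
function meanRecall(cases: { expected: string[]; returned: string[] }[], k?: number): number {
  if (cases.length === 0) throw new Error('no cases to evaluate')
  const total = cases.reduce((sum, c) => sum + recallAtK(c.expected, c.returned, k), 0)
  return total / cases.length
}

// 3 of the 5 expected chunks came back: recall 0.6
console.log(recallAtK(['a', 'b', 'c', 'd', 'e'], ['a', 'x', 'c', 'y', 'e']))
```

Re-running the same labeled set after every data milestone turns a vague "results feel worse" into a number you can compare before and after an index change.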

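IVFFlat computes its cluster centroids once, when the index is built, and expects lists to scale with your data (the pgvector README suggests rows / 1000 as a starting point). If you leave an index built on a small table in place as the data grows, the clusters stop reflecting the data and searches start missing hits. For an indie app where row counts are unpredictable, switching to HNSW, which needs no count-dependent tuning, made operations much simpler.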
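If you do stay on IVFFlat, the pgvector README's starting points are lists = rows / 1000 up to 1M rows, sqrt(rows) beyond that, and ivfflat.probes = sqrt(lists) at query time. A small helper that turns a row count into those numbers; the rounding and the document_chunks DDL are assumptions, and the output is a starting point to validate against measured recall, not a guarantee:

```typescript
interface IvfflatParams {
  lists: number
  probes: number
}

// Starting points from the pgvector README (rounded, minimum 1).
function suggestIvfflatParams(rows: number): IvfflatParams {
  if (!Number.isFinite(rows) || rows < 0) throw new Error('rows must be a non-negative number')
  const rawLists = rows <= 1_000_000 ? rows / 1000 : Math.sqrt(rows)
  const lists = Math.max(1, Math.round(rawLists))
  const probes = Math.max(1, Math.round(Math.sqrt(lists)))
  return { lists, probes }
}

// Render the index DDL and the session setting for a given row count.
function ivfflatSql(rows: number): string {
  const { lists, probes } = suggestIvfflatParams(rows)
  return [
    'CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops)',
    `  WITH (lists = ${lists});`,
    `SET ivfflat.probes = ${probes};`,
  ].join('\n')
}

console.log(ivfflatSql(120_000))
```

For 120,000 rows this suggests lists = 120 and probes = 11. Because centroids are fixed at build time, rebuilding the index after large growth matters as much as the lists value itself.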

-- Move from IVFFlat (needs lists re-tuning as rows grow)
-- to HNSW (count-independent, stable recall)
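DROP INDEX IF EXISTS document_chunks_embedding_idx;

-- Embeddings live in document_chunks (documents has no embedding column)
CREATE INDEX document_chunks_embedding_idx
  ON document_chunks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Tune precision/speed per session at query time
SET hnsw.ef_search = 40;  -- default 40; raise to 80-100 only when you need more recall
-- Over pooled connections (e.g. supabase-js/PostgREST) a session SET may not carry
-- over; for a single query, use SET LOCAL inside the same transaction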

After the switch, recall recovered to ~0.93 on the same data. The trade-off: roughly 2x index build time and ~30% more storage. For a read-heavy RAG workload, that's an easy trade.

2. The service_role key silently bypasses RLS

This one gave me a scare. If you use a service_role Supabase client inside an Edge Function, the Row Level Security you carefully set up is completely ignored. One day, reading logs, I realized the setup could return one user's conversations mixed into another's.

The fix: verify the request JWT and touch the database through a user-scoped client that carries that JWT. Reserve service_role for admin-only work like embedding generation.

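import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

const SUPABASE_URL = Deno.env.get('SUPABASE_URL')!
const ANON_KEY = Deno.env.get('SUPABASE_ANON_KEY')!

// This bypasses RLS: a multi-tenant accident waiting to happen (admin-only jobs)
const admin = createClient(SUPABASE_URL, Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!)

serve(async (req) => {
  const authHeader = req.headers.get('Authorization')
  if (!authHeader) return new Response('Unauthorized', { status: 401 })

  // Forward the request's Authorization header
  //   -> RLS automatically scopes queries to that user's rows
  const userClient = createClient(SUPABASE_URL, ANON_KEY, {
    global: { headers: { Authorization: authHeader } },
  })

  // Validate the JWT against Supabase Auth; on the server, pass the token explicitly
  const { data: { user } } = await userClient.auth.getUser(authHeader.replace('Bearer ', ''))
  if (!user) {
    return new Response('Unauthorized', { status: 401 })
  }
  // All reads/writes via userClient are now confined to this user by RLS
  const { data } = await userClient.from('conversations').select('id, title')
  return new Response(JSON.stringify(data ?? []), { headers: { 'Content-Type': 'application/json' } })
})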

To stay "deny by default," separate your admin and user clients clearly and keep the number of functions that touch service_role small enough to count.

3. Design embedding batches assuming 429s

During the initial bulk load, text-embedding-004 returned 429 (rate limited) almost immediately. Running without a concurrency cap, I plateaued around 1,500 requests per minute. Capping concurrency at 5 and adding exponential backoff let the job run to completion without stalling.

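import { GoogleGenAI } from '@google/genai'

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY })

// Absorb 429s with exponential backoff plus jitter. The concurrency cap lives in
// whatever schedules these calls. Use the same embedding model for documents and
// queries: vectors from different models are not comparable.
async function embedWithRetry(text: string, attempt = 0): Promise<number[]> {
  try {
    const res = await ai.models.embedContent({
      model: 'text-embedding-004',
      contents: text,
    })
    const values = res.embeddings?.[0]?.values
    if (!values) throw new Error('No embedding returned')
    return values
  } catch (e: any) {
    if (e?.status === 429 && attempt < 5) {
      const waitMs = Math.min(2 ** attempt * 500, 16000) + Math.random() * 300
      await new Promise((r) => setTimeout(r, waitMs))
      return embedWithRetry(text, attempt + 1)
    }
    throw e
  }
}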

The key detail is the small random jitter on the retry delay. If several workers back off on the same cycle, they'll all hit 429 again together.
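The retry function handles backoff, but the concurrency cap has to live in whatever drives the batch. A minimal sketch of a fixed-size worker pool, assuming each task is a call like embedWithRetry; mapWithConcurrency is an illustrative helper rather than a library API, and 5 in the usage line mirrors the cap mentioned above:

```typescript
// Run `fn` over `items` with at most `limit` calls in flight, preserving result order.
// If any call rejects, the returned promise rejects; workers already running keep
// draining the queue in the background.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new Error('limit must be a positive integer')
  const results = new Array<R>(items.length)
  let next = 0 // next unclaimed index; safe because JS runs these workers on one thread

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++
      results[index] = await fn(items[index], index)
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker())
  await Promise.all(workers)
  return results
}

// Usage: const vectors = await mapWithConcurrency(chunks, 5, (text) => embedWithRetry(text))
```

A pool keeps exactly `limit` requests busy, whereas chunking the array into fixed groups with Promise.all waits for the slowest call in each group before starting the next.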

4. Stream from Edge Functions — return early

Supabase Edge Functions (Deno) have an execution-time limit, so a function that waits for Gemini's entire response before returning can be cut off on long generations. Piping tokens out through a ReadableStream as they arrive lowers perceived latency and leaves headroom under the limit. In my RAG chat, time-to-first-token ran around 0.9s median and ~2s at p95 with Flash.
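A minimal sketch of that early return, assuming the Gemini call is exposed as an async iterable of text pieces (such as streamGeminiResponse from the streaming section). toStreamingResponse is an illustrative helper; it relies only on the web-standard ReadableStream, TextEncoder, and Response, which Deno provides:

```typescript
// Wrap an async iterable of text in a streaming Response so the first tokens
// reach the client immediately instead of after the full generation.
function toStreamingResponse(tokens: AsyncIterable<string>, init: ResponseInit = {}): Response {
  const encoder = new TextEncoder()
  const iterator = tokens[Symbol.asyncIterator]()

  const body = new ReadableStream<Uint8Array>({
    // Pull-based: the next token is requested only when the consumer is ready for it
    async pull(controller) {
      try {
        const { value, done } = await iterator.next()
        if (done) {
          controller.close()
        } else {
          controller.enqueue(encoder.encode(value))
        }
      } catch (err) {
        controller.error(err)
      }
    },
    // Client went away: stop the upstream generator
    async cancel() {
      await iterator.return?.()
    },
  })

  const headers = new Headers(init.headers)
  if (!headers.has('Content-Type')) headers.set('Content-Type', 'text/plain; charset=utf-8')
  return new Response(body, { ...init, headers })
}

// Usage inside an Edge Function handler:
// return toStreamingResponse(streamGeminiResponse(prompt, geminiApiKey))
```

Pulling on demand means a slow client applies backpressure instead of the function buffering the whole answer in memory.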

Pre-launch checklist I always run

Is the pgvector index HNSW — or, if IVFFlat, is lists sized to the row count?
Does every DB access in an Edge Function use a user-scoped client (no service_role abuse)?
Do embedding batches have a concurrency cap and backoff?
Are Gemini responses streamed and returned early?
Do errors land in something like function_logs and feed an alert?

What it actually costs

For reference, a small app with around 8,000 monthly active users runs me roughly ¥4,000-6,000 / month combined for Gemini (embeddings + Flash for RAG) and Supabase (Pro plan). I deliberately design the in-app AdMob revenue to cover this infrastructure, so each month I check this number to confirm per-user inference cost stays under ad ARPU.
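The per-user check is simple arithmetic: at the top of that range, ¥6,000 / 8,000 MAU = ¥0.75 per MAU per month. A small sketch of the comparison; checkCostCoverage is an illustrative name, and the ARPU in the usage line is a hypothetical placeholder, not the author's actual ad revenue:

```typescript
interface CostCheck {
  costPerMau: number
  arpu: number
  margin: number // ARPU minus cost per MAU; negative means infra costs exceed ad revenue
  covered: boolean
}

// Compare monthly infrastructure cost per active user against ad ARPU.
function checkCostCoverage(monthlyCostYen: number, mau: number, arpuYen: number): CostCheck {
  if (mau <= 0) throw new Error('mau must be positive')
  const costPerMau = monthlyCostYen / mau
  const margin = arpuYen - costPerMau
  return { costPerMau, arpu: arpuYen, margin, covered: margin >= 0 }
}

// Hypothetical ARPU of ¥2 per MAU, for illustration only
console.log(checkCostCoverage(6000, 8000, 2))
```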

Embeddings are a one-time cost you reuse, so most of the spend is on the RAG response side. Caching frequent questions, using Flash for summaries, and routing only genuinely hard reasoning to Pro kept costs down 30-40% with no noticeable quality drop.
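Caching frequent questions can start as small as an in-memory map with a time-to-live. A minimal sketch under stated assumptions: normalization only folds case and whitespace, the TTL and size cap are arbitrary, and an in-memory map lives only as long as the process holding it, so a shared store (a Postgres table or a KV service) is the durable version. AnswerCache is an illustrative name:

```typescript
interface CacheEntry {
  answer: string
  expiresAt: number
}

// In-memory TTL cache for RAG answers. Keys include the user ID because answers are
// built from that user's documents and must never be served to anyone else.
class AnswerCache {
  private readonly entries = new Map<string, CacheEntry>()
  private readonly ttlMs: number
  private readonly maxEntries: number
  private readonly now: () => number

  constructor(ttlMs = 10 * 60 * 1000, maxEntries = 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs
    this.maxEntries = maxEntries
    this.now = now
  }

  // Fold case and spacing differences so near-identical questions share an entry
  private key(userId: string, question: string): string {
    return `${userId}:${question.trim().toLowerCase().replace(/\s+/g, ' ')}`
  }

  get(userId: string, question: string): string | undefined {
    const k = this.key(userId, question)
    const entry = this.entries.get(k)
    if (!entry) return undefined
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(k) // expired: drop it and report a miss
      return undefined
    }
    return entry.answer
  }

  set(userId: string, question: string, answer: string): void {
    const k = this.key(userId, question)
    this.entries.delete(k) // re-inserting moves the key to the newest position
    if (this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest entry
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
    this.entries.set(k, { answer, expiresAt: this.now() + this.ttlMs })
  }
}
```

On a hit, skip both retrieval and generation; on a miss, run ragChat and store the result.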

When the Frontend Is Nuxt 3 — Drawing the Line Between Server Routes and Edge Functions

Everything so far lives on the Supabase side. When you turn this into a real product, many of you will reach for Nuxt 3 (Vue) on the frontend — and the first thing you'll wrestle with is where the Gemini call belongs: a Nuxt server/api/ route, or a Supabase Edge Function. As an indie developer who builds and runs my own apps under Dolice, I once started a project without deciding this up front, and the same RAG logic ended up scattered across both sides until I no longer knew which copy to fix.

The rule is simple: keep the core of RAG (retrieve → build the prompt → call Gemini) on one side. I keep it in the Edge Function and let the Nuxt server route act as a thin, authenticated relay.

Concern	(A) Nuxt relays to the Edge Function	(B) Nuxt calls Gemini directly
Where RAG logic lives	Centralized in the Edge Function	Centralized in the Nuxt server route
How RLS stays in effect	Forward the JWT and it applies automatically	Build a user-scoped client each time on the Nuxt side
Where embeddings run	Can sit alongside in the Edge Function	Needs a separate batch job
Best fit	You may add mobile or other non-Nuxt clients later	The frontend is Nuxt and nothing else

When in doubt I pick (A). With the core in the Edge Function, adding a mobile app or a second frontend later means just hitting the same endpoint.

Keep secrets out of public — the runtimeConfig trap

The first thing that bites people in Nuxt is the exposure scope of environment variables. Keys under runtimeConfig are server-only, but anything under runtimeConfig.public is baked straight into the client bundle. Put GEMINI_API_KEY or Supabase's service_role in public by mistake and they're visible in your build output.

// nuxt.config.ts
export default defineNuxtConfig({
  runtimeConfig: {
    // Server-only — never reaches the client
    geminiApiKey: process.env.GEMINI_API_KEY,
    supabaseServiceRole: process.env.SUPABASE_SERVICE_ROLE_KEY,
    public: {
      // Anything here is exposed to the client
      supabaseUrl: process.env.SUPABASE_URL,
      supabaseAnonKey: process.env.SUPABASE_ANON_KEY, // anon is meant to be public, so this is fine
    },
  },
})

The anon key is designed to be public, so it belongs in public. That turns the whole rule into something simple to audit: just confirm no key other than anon has slipped into public.
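That audit can be automated, for example as a test in CI. A minimal sketch, assuming you pass it the object you assign to runtimeConfig.public; the name heuristic and the allowlist are illustrative choices, not a Nuxt feature, and a clean result only means nothing matched the pattern:

```typescript
// Names that suggest a credential. Heuristic only: review anything it flags.
const SECRET_NAME_PATTERN = /(key|secret|token|password|service.?role)/i

// Keys allowed in runtimeConfig.public even though their names look sensitive.
const PUBLIC_ALLOWLIST = new Set(['supabaseAnonKey'])

// Return dotted paths of public config entries that look like leaked secrets.
function findSuspiciousPublicKeys(publicConfig: Record<string, unknown>, prefix = ''): string[] {
  const flagged: string[] = []
  for (const [name, value] of Object.entries(publicConfig)) {
    const path = prefix ? `${prefix}.${name}` : name
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      flagged.push(...findSuspiciousPublicKeys(value as Record<string, unknown>, path))
    } else if (SECRET_NAME_PATTERN.test(name) && !PUBLIC_ALLOWLIST.has(name)) {
      flagged.push(path)
    }
  }
  return flagged
}

console.log(findSuspiciousPublicKeys({ supabaseUrl: 'https://x.supabase.co', geminiApiKey: '...' }))
```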

Forward the JWT so RLS still applies

The "service_role silently bypasses RLS" lesson from the production section above doesn't change just because Nuxt sits in front. Your server route forwards the Supabase JWT it received from the browser straight through to the Edge Function. Skip this and your carefully written RLS no longer scopes data per user.

// server/api/chat.post.ts
export default defineEventHandler(async (event) => {
  const config = useRuntimeConfig()
  const auth = getHeader(event, 'authorization')
  if (!auth) {
    throw createError({ statusCode: 401, statusMessage: 'Unauthorized' })
  }
 
  // Forward the user's JWT to the Edge Function → RLS scopes rows to that user
  const upstream = await fetch(`${config.public.supabaseUrl}/functions/v1/rag-chat`, {
    method: 'POST',
    headers: {
      Authorization: auth, // pass through the JWT from the browser
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(await readBody(event)),
  })
 
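  if (!upstream.ok || !upstream.body) {
    throw createError({ statusCode: upstream.ok ? 502 : upstream.status, statusMessage: 'Upstream error' })
  }

  // Pass the stream straight downstream without buffering (next section)
  return sendStream(event, upstream.body)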
})

Keep the relay route's responsibility to authentication and JWT forwarding only. The moment you start writing retrieval or prompt assembly here, you've drifted back to topology (B) and your logic is duplicated.

Don't buffer the stream at the relay

When you stream the RAG response, receiving the whole thing in the Nuxt server route before returning it throws away the "first token arrives fast" win you earned on the Edge Function side. Passing upstream.body (a ReadableStream) straight through with sendStream adds almost no relay overhead, and time-to-first-token felt identical to calling the Edge Function directly.

The trap is slipping an await upstream.text() into the middle to log the output. Read the body in full even once and it stops being a stream — the user just stares at a silent wait. If you need logs, peek with a TransformStream while it flows, or record on the Edge Function side instead.
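A minimal sketch of peeking at the stream while it flows, assuming a byte count and a short text preview are enough for your logs; createLoggingTap and the onDone callback are illustrative names. Each chunk is forwarded before anything is recorded, so time-to-first-token is unaffected:

```typescript
interface StreamSummary {
  bytes: number
  preview: string
}

// Pass chunks through unchanged while recording total size and a short preview.
function createLoggingTap(
  onDone: (summary: StreamSummary) => void,
  previewChars = 200,
): TransformStream<Uint8Array, Uint8Array> {
  const decoder = new TextDecoder()
  let bytes = 0
  let preview = ''

  return new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      controller.enqueue(chunk) // forward first; logging must never delay the client
      bytes += chunk.byteLength
      if (preview.length < previewChars) {
        preview = (preview + decoder.decode(chunk, { stream: true })).slice(0, previewChars)
      }
    },
    // Called once the upstream finishes, after the last chunk has been forwarded
    flush() {
      onDone({ bytes, preview })
    },
  })
}

// Usage in the Nuxt relay:
// return sendStream(event, upstream.body.pipeThrough(createLoggingTap((s) => console.log('rag-chat', s))))
```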

Wrapping up — your next step

Wire up one RAG chat endpoint on a small dataset, then deliberately grow your documents 10x and measure how recall and response time shift. The issues that only appear when scale changes are exactly the ones that matter in production.

If you put Nuxt in front, decide early to keep the core of RAG on the Edge Function side — that one boundary keeps the whole thing from drifting apart later. I hope this helps anyone trying to take an indie app all the way to production on their own. Thanks for reading.

Thank You for Reading

Gemini Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.