
How to Build a Mobile RAG Application in React Native

Complete guide to building Retrieval Augmented Generation (RAG) apps in React Native. Learn document processing, embeddings, vector search, and AI-powered Q&A for mobile devices.

Building a mobile RAG (Retrieval Augmented Generation) application requires four core components: document upload and parsing (PDF, DOCX, TXT), text chunking and embedding generation, vector storage for semantic search (Supabase pgvector or on-device storage), and an AI chat interface that retrieves relevant context before generating responses. This architecture lets mobile apps answer questions about custom documents with accurate, cited information.

What is RAG and why build it on mobile?

RAG (Retrieval Augmented Generation) is an AI technique that combines document retrieval with language models. Instead of relying solely on an AI's training data, RAG apps:

  • Let users upload their own documents (contracts, manuals, research papers)
  • Convert documents into searchable embeddings
  • Find relevant sections when users ask questions
  • Feed those sections to the AI as context
  • Generate accurate answers with source citations

Why mobile RAG matters

Mobile RAG applications solve real problems:

  • Healthcare: Doctors query patient records and medical literature on-the-go
  • Legal: Lawyers search case files and legal documents from their phone
  • Education: Students chat with their textbooks and lecture notes
  • Enterprise: Sales teams access product documentation during client meetings
  • Personal: Users build second brains from their personal documents

Traditional mobile apps can't do this: they either have no AI at all, or they rely on a generic model that has never seen your specific documents.

RAG architecture for mobile apps

A production-ready mobile RAG system has these components:

1. Document ingestion pipeline

  • File upload from device (DocumentPicker, Camera, File System)
  • Document parsing (PDF → text, DOCX → text, images → OCR)
  • Text preprocessing (remove formatting, handle special characters)

2. Chunking and embedding generation

  • Split documents into semantic chunks (usually 500-1500 characters)
  • Generate vector embeddings for each chunk (OpenAI, Voyage, Cohere)
  • Store embeddings with metadata (source document, page number, timestamp)

3. Vector storage and search

  • Database with vector similarity search (Supabase pgvector, Pinecone, local SQLite + vector extension)
  • Hybrid search combining semantic similarity + keyword matching
  • Ranking algorithm to surface the most relevant chunks

4. AI chat interface with context injection

  • User asks a question
  • System searches vector database for relevant chunks
  • Top chunks are injected into the AI prompt as context
  • AI generates an answer based on the retrieved context
  • Citations link back to source documents
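
To make the data flow concrete, here is a minimal TypeScript sketch of the shapes that move through this pipeline (the type names are illustrative, not from a specific library):

// A chunk produced by the ingestion pipeline, ready to embed and store
interface DocumentChunk {
  documentId: string;
  chunkIndex: number;
  content: string;          // raw text of this chunk
  embedding?: number[];     // filled in after the embedding step
  metadata: {
    filename: string;
    page?: number;          // used later for citations
    createdAt: string;
  };
}

// A chunk returned by vector search, ranked by similarity to the question
interface RetrievedChunk {
  documentId: string;
  content: string;
  similarity: number;       // e.g. cosine similarity
}

// The answer rendered in the chat UI, with citations back to sources
interface RagAnswer {
  text: string;
  sources: RetrievedChunk[];
}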

Step-by-step: Building mobile RAG in React Native

Step 1: Document upload and parsing

Start with file selection using Expo's DocumentPicker:

import * as DocumentPicker from 'expo-document-picker';

async function uploadDocument() {
  const result = await DocumentPicker.getDocumentAsync({
    type: [
      'application/pdf',
      'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
      'text/plain',
    ],
    copyToCacheDirectory: true,
  });

  // Recent expo-document-picker versions return { canceled, assets } instead of { type: 'success' }
  if (!result.canceled && result.assets.length > 0) {
    const file = result.assets[0];
    // Send to backend for processing
    await processDocument(file.uri, file.name, file.mimeType ?? 'application/octet-stream');
  }
}

async function processDocument(uri: string, name: string, mimeType: string) {
  const formData = new FormData();
  // React Native's FormData accepts a { uri, name, type } object for file parts
  formData.append('file', { uri, name, type: mimeType } as any);

  const response = await fetch('https://your-api.com/api/documents/upload', {
    method: 'POST',
    body: formData,
    headers: {
      'Authorization': `Bearer ${userToken}`, // userToken comes from your auth/session state
    },
  });

  const data = await response.json();
  console.log('Document processed:', data.documentId);
}

Step 2: Backend document processing

On the backend, extract text from documents:

// /api/documents/upload/route.ts
import { PDFLoader } from '@langchain/community/document_loaders/fs/pdf';
import { DocxLoader } from '@langchain/community/document_loaders/fs/docx';

export async function POST(req: Request) {
  // getUserIdFromRequest is your own auth helper (e.g. verify the Bearer token)
  const userId = await getUserIdFromRequest(req);
  const formData = await req.formData();
  const file = formData.get('file') as File;

  let text = '';

  // Parse based on file type (both loaders accept a Blob/File)
  if (file.type === 'application/pdf') {
    const loader = new PDFLoader(file);
    const docs = await loader.load();
    text = docs.map(doc => doc.pageContent).join('\n');
  } else if (file.type === 'application/vnd.openxmlformats-officedocument.wordprocessingml.document') {
    const loader = new DocxLoader(file);
    const docs = await loader.load();
    text = docs[0].pageContent;
  } else if (file.type === 'text/plain') {
    text = await file.text();
  }

  // Store raw text and proceed to chunking
  const documentId = await saveDocument({
    userId,
    filename: file.name,
    text,
  });

  // Process in background: chunk and embed
  await processDocumentEmbeddings(documentId);

  return Response.json({ documentId, status: 'processing' });
}

Step 3: Text chunking with overlap

Split documents into chunks that preserve context:

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

async function chunkDocument(text: string) {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,        // ~1000 characters per chunk
    chunkOverlap: 200,       // 200 character overlap between chunks
    separators: ['\n\n', '\n', '. ', ' ', ''],
  });

  const chunks = await splitter.splitText(text);
  return chunks;
}

// Why overlap matters:
// Without overlap:
// Chunk 1: "...the company was founded in 1998."
// Chunk 2: "The founders were John and Jane..."
// Question: "Who founded the company, and when?" → no single chunk contains both facts
//
// With overlap:
// Chunk 1: "...the company was founded in 1998. The founders..."
// Chunk 2: "...founded in 1998. The founders were John and Jane..."
// Question: "Who founded the company, and when?" → either chunk carries enough context

Step 4: Generate embeddings

Convert chunks into vector embeddings for semantic search:

import OpenAI from 'openai';

async function generateEmbeddings(chunks: string[]) {
  const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  });

  const embeddings = await Promise.all(
    chunks.map(async (chunk, index) => {
      const response = await openai.embeddings.create({
        model: 'text-embedding-3-small', // Cheaper and faster
        input: chunk,
      });

      return {
        chunkIndex: index,
        embedding: response.data[0].embedding,
        text: chunk,
      };
    })
  );

  return embeddings;
}

// Alternative: Use Voyage AI for better accuracy
import { VoyageEmbeddings } from '@langchain/community/embeddings/voyage';

async function generateVoyageEmbeddings(chunks: string[]) {
  const embeddings = new VoyageEmbeddings({
    apiKey: process.env.VOYAGE_API_KEY,
    model: 'voyage-2',
  });

  const vectors = await embeddings.embedDocuments(chunks);
  return vectors;
}

Step 5: Store embeddings in vector database

Use Supabase with pgvector extension for vector storage:

// First, enable pgvector in Supabase:
// Run this SQL in Supabase SQL Editor:
create extension if not exists vector;

create table document_embeddings (
  id bigserial primary key,
  user_id uuid references auth.users not null,
  document_id bigint references documents not null,
  chunk_index int not null,
  content text not null,
  embedding vector(1536), -- 1536 for OpenAI text-embedding-3-small
  created_at timestamptz default now()
);

create index on document_embeddings using ivfflat (embedding vector_cosine_ops);

// Store embeddings:
async function storeEmbeddings(documentId: number, userId: string, embeddings: any[]) {
  const supabase = createClient();

  const rows = embeddings.map(emb => ({
    user_id: userId,
    document_id: documentId,
    chunk_index: emb.chunkIndex,
    content: emb.text,
    embedding: emb.embedding,
  }));

  const { error } = await supabase
    .from('document_embeddings')
    .insert(rows);

  if (error) throw error;
}

Step 6: Semantic search for relevant chunks

When a user asks a question, find relevant document chunks:

async function searchDocuments(query: string, userId: string, topK: number = 5) {
  // 1. Generate embedding for the query
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  const queryEmbedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });

  const embedding = queryEmbedding.data[0].embedding;

  // 2. Search for similar chunks using cosine similarity
  const supabase = createClient();
  const { data, error } = await supabase.rpc('match_documents', {
    query_embedding: embedding,
    match_threshold: 0.78, // Minimum similarity score
    match_count: topK,
    user_id_filter: userId,
  });

  return data; // Returns top K most similar chunks
}

// SQL function for vector similarity search:
create or replace function match_documents (
  query_embedding vector(1536),
  match_threshold float,
  match_count int,
  user_id_filter uuid
)
returns table (
  id bigint,
  document_id bigint,
  content text,
  similarity float
)
language sql stable
as $$
  select
    id,
    document_id,
    content,
    1 - (embedding <=> query_embedding) as similarity
  from document_embeddings
  where user_id = user_id_filter
    and 1 - (embedding <=> query_embedding) > match_threshold
  order by embedding <=> query_embedding
  limit match_count;
$$;

Step 7: Build RAG chat interface in React Native

Create a chat UI that uses retrieved context:

// Note: React Native's built-in fetch does not expose response.body streams.
// Use a streaming-capable fetch (e.g. expo/fetch on recent Expo SDKs) or await the full response instead.
async function askQuestion(question: string) {
  // 1. Search for relevant document chunks
  const relevantChunks = await fetch('https://your-api.com/api/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: question }),
  }).then(res => res.json());

  // 2. Send question + context to AI
  const response = await fetch('https://your-api.com/api/chat-rag', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      question: question,
      context: relevantChunks,
    }),
  });

  // 3. Stream the AI response
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let answer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    answer += decoder.decode(value, { stream: true }); // stream: true handles characters split across chunks
    setMessages(prev => [...prev.slice(0, -1), {
      role: 'assistant',
      content: answer,
      sources: relevantChunks, // Include sources for citations
    }]);
  }
}

Step 8: Backend RAG endpoint with context injection

// /api/chat-rag/route.ts
export async function POST(req: Request) {
  const { question, context } = await req.json();

  // Build prompt with retrieved context
  const contextText = context
    .map((chunk, i) => `[Source ${i+1}] ${chunk.content}`)
    .join('\n\n');

  const prompt = `Answer the following question based ONLY on the provided context. If the answer is not in the context, say "I don't have enough information to answer that."

Context:
${contextText}

Question: ${question}

Answer:`;

  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  // Stream response back to client
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const text = chunk.choices[0]?.delta?.content || '';
        controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });

  return new Response(readable);
}

Advanced RAG techniques for mobile

1. Hybrid search (semantic + keyword)

Combine vector similarity with full-text search for better results:

// Add full-text search index to your documents table
create index document_content_fts on document_embeddings using gin(to_tsvector('english', content));

// Hybrid search function
async function hybridSearch(query: string, userId: string) {
  // Vector search
  const semanticResults = await vectorSearch(query, userId, 10);

  // Keyword search
  const { data: keywordResults } = await supabase
    .from('document_embeddings')
    .select('*')
    .textSearch('content', query)
    .eq('user_id', userId)
    .limit(10);

  // Combine and re-rank results
  const combined = mergeAndRank(semanticResults, keywordResults);
  return combined.slice(0, 5);
}
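
The mergeAndRank helper above is left abstract; one common choice is Reciprocal Rank Fusion (RRF), which rewards chunks that rank highly in either result list. A minimal sketch, assuming each result row carries an id:

// Reciprocal Rank Fusion: score(chunk) = sum over lists of 1 / (k + rank)
// k (commonly 60) dampens the influence of lower-ranked items.
function mergeAndRank(semanticResults: any[], keywordResults: any[], k: number = 60) {
  const scores = new Map<number, { row: any; score: number }>();

  const addList = (list: any[]) => {
    list.forEach((row, rank) => {
      const entry = scores.get(row.id);
      const increment = 1 / (k + rank + 1);
      if (entry) {
        entry.score += increment;
      } else {
        scores.set(row.id, { row, score: increment });
      }
    });
  };

  addList(semanticResults);
  addList(keywordResults);

  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map(entry => entry.row);
}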

2. Re-ranking with cross-encoders

After retrieval, re-rank results for better relevance:

// Cross-encoders score (query, passage) pairs jointly, so the pair must be encoded together.
// Run this on the backend: the model is heavy for phones, and it needs an ONNX export
// to load in Transformers.js (e.g. a Xenova/* conversion of cross-encoder/ms-marco-MiniLM-L-6-v2).
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

async function rerank(query: string, chunks: any[]) {
  const modelId = 'cross-encoder/ms-marco-MiniLM-L-6-v2'; // swap in an ONNX-converted build of this model
  const tokenizer = await AutoTokenizer.from_pretrained(modelId);
  const model = await AutoModelForSequenceClassification.from_pretrained(modelId);

  // Tokenize the query against every chunk as text pairs
  const inputs = tokenizer(
    new Array(chunks.length).fill(query),
    { text_pair: chunks.map(chunk => chunk.content), padding: true, truncation: true }
  );

  // One relevance logit per (query, chunk) pair
  const { logits } = await model(inputs);

  return chunks
    .map((chunk, i) => ({ ...chunk, score: logits.data[i] }))
    .sort((a, b) => b.score - a.score);
}

3. Multi-query retrieval

Generate multiple variations of the user's question for better coverage:

async function multiQueryRetrieval(userQuery: string, userId: string) {
  // Use AI to generate query variations
  const variations = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{
      role: 'user',
      content: `Generate 3 different ways to ask this question: "${userQuery}"`,
    }],
  });

  const queries = [userQuery, ...parseVariations(variations)];

  // Search with all queries
  const allResults = await Promise.all(
    queries.map(q => vectorSearch(q, userId, 3))
  );

  // Deduplicate and merge
  return deduplicateChunks(allResults.flat());
}
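
parseVariations and deduplicateChunks are left as helpers above; here is one way to fill them in, assuming the variation prompt returns one question per line and each retrieved chunk carries an id:

// Hypothetical helpers for the multi-query flow above
function parseVariations(completion: any): string[] {
  const text = completion.choices[0]?.message?.content ?? '';
  return text
    .split('\n')
    .map((line: string) => line.replace(/^\d+[\).\s]*/, '').trim()) // strip "1." / "2)" prefixes
    .filter((line: string) => line.length > 0);
}

function deduplicateChunks(chunks: any[]): any[] {
  const seen = new Set<number>();
  return chunks.filter(chunk => {
    if (seen.has(chunk.id)) return false;
    seen.add(chunk.id);
    return true;
  });
}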

4. Local embeddings for offline RAG

Run embeddings on-device for privacy and offline support:

// Note: running Transformers.js inside React Native requires extra native setup (an ONNX runtime for React Native).
import { pipeline } from '@xenova/transformers';

async function generateLocalEmbeddings(text: string) {
  const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  const embeddings = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(embeddings.data);
}

// Store in local SQLite (embeddings serialized as JSON; similarity is computed in JS at query time)
import * as SQLite from 'expo-sqlite';

async function storeLocalEmbedding(text: string, embedding: number[]) {
  const db = await SQLite.openDatabaseAsync('rag.db');
  await db.execAsync(`
    CREATE TABLE IF NOT EXISTS embeddings (
      id INTEGER PRIMARY KEY,
      content TEXT,
      embedding TEXT
    )
  `);

  await db.runAsync(
    'INSERT INTO embeddings (content, embedding) VALUES (?, ?)',
    [text, JSON.stringify(embedding)]
  );
}
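
The retrieval side of offline RAG is then a brute-force cosine-similarity scan over the stored rows. A minimal sketch, assuming the embeddings table above and at most a few thousand chunks:

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function searchLocalEmbeddings(query: string, topK: number = 5) {
  const queryEmbedding = await generateLocalEmbeddings(query);
  const db = await SQLite.openDatabaseAsync('rag.db');

  // Full scan of stored chunks; fine at mobile scale, no vector index required
  const rows = await db.getAllAsync<{ content: string; embedding: string }>(
    'SELECT content, embedding FROM embeddings'
  );

  return rows
    .map(row => ({
      content: row.content,
      similarity: cosineSimilarity(queryEmbedding as number[], JSON.parse(row.embedding)),
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}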

RAG performance optimization for mobile

1. Lazy loading embeddings

Don't load all embeddings into memory. Query on-demand from the database.

2. Caching search results

Cache frequent queries to avoid repeated vector searches:

// Simple in-memory cache; in production, bound it (e.g. LRU with a max size or TTL) so it doesn't grow unbounded
const searchCache = new Map();

async function cachedSearch(query: string) {
  const cacheKey = hashQuery(query); // any stable key works, e.g. the normalized query string
  if (searchCache.has(cacheKey)) {
    return searchCache.get(cacheKey);
  }

  const results = await vectorSearch(query);
  searchCache.set(cacheKey, results);
  return results;
}

3. Progressive chunk loading

Start with top 3 chunks, load more if AI needs additional context.
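
One way to sketch this, assuming your search function accepts a topK parameter and your RAG prompt uses the refusal phrase from Step 8 (askWithContext is a hypothetical wrapper around the /api/chat-rag endpoint):

// Ask with a small context first; widen the search only if the model couldn't answer
async function askWithProgressiveContext(question: string, userId: string) {
  for (const topK of [3, 6, 10]) {
    const chunks = await vectorSearch(question, userId, topK);
    const answer = await askWithContext(question, chunks); // hypothetical wrapper around /api/chat-rag

    if (!answer.includes("I don't have enough information")) {
      return { answer, sources: chunks };
    }
  }

  return { answer: "I couldn't find this in your documents.", sources: [] };
}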

4. Batch embedding generation

Generate embeddings in batches to reduce API calls:

async function batchGenerateEmbeddings(chunks: string[], batchSize: number = 100) {
  const batches = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    batches.push(chunks.slice(i, i + batchSize));
  }

  const allEmbeddings = [];
  for (const batch of batches) {
    const embeddings = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: batch,
    });
    allEmbeddings.push(...embeddings.data.map(d => d.embedding));
  }

  return allEmbeddings;
}

Common RAG challenges in mobile apps

Challenge #1: Large document processing

Problem: 500-page PDFs take minutes to process on mobile.

Solution: Process documents on the backend. Show progress indicators in the app.

Challenge #2: Context window limits

Problem: You can't just dump every chunk into the prompt. Even with GPT-4o's 128K-token window, oversized contexts inflate cost and latency and dilute relevance, so in practice you only send the top 3-5 chunks.

Solution: Use hierarchical chunking. First retrieve sections, then drill down to specific paragraphs.
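
A minimal sketch of hierarchical chunking, reusing the RecursiveCharacterTextSplitter from Step 3 (the parent/child naming and sizes are illustrative choices, not fixed rules):

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

// Two-level chunking: search against small "child" chunks for precision,
// then inject the larger "parent" section into the prompt for fuller context.
async function hierarchicalChunk(text: string) {
  const parentSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 4000, chunkOverlap: 400 });
  const childSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 800, chunkOverlap: 150 });

  const parents = await parentSplitter.splitText(text);

  const children: { parentIndex: number; content: string }[] = [];
  for (let parentIndex = 0; parentIndex < parents.length; parentIndex++) {
    for (const content of await childSplitter.splitText(parents[parentIndex])) {
      children.push({ parentIndex, content }); // embed the children; store parentIndex as metadata
    }
  }

  return { parents, children };
}

At query time, search over the child embeddings but pass the matching parent sections to the model.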

Challenge #3: Inconsistent chunk relevance

Problem: Vector search sometimes returns irrelevant chunks.

Solution: Set minimum similarity thresholds (0.75+). Use hybrid search. Implement re-ranking.

Challenge #4: Multi-document queries

Problem: A user asks, "Compare document A and document B."

Solution: Allow filtering by document ID in vector search. Retrieve from each document separately, then combine both contexts in the prompt (see the sketch below).
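
A sketch of per-document retrieval, assuming you add a document_id_filter parameter to the match_documents function from Step 6 (the SQL shown earlier does not include it yet):

// Retrieve top chunks from each document separately, then merge the contexts for the prompt
async function searchPerDocument(query: string, userId: string, documentIds: number[], topK: number = 3) {
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });
  const queryEmbedding = data[0].embedding;

  const supabase = createClient();
  const perDocument = await Promise.all(
    documentIds.map(documentId =>
      supabase.rpc('match_documents', {
        query_embedding: queryEmbedding,
        match_threshold: 0.78,
        match_count: topK,
        user_id_filter: userId,
        document_id_filter: documentId, // assumed extra parameter added to the SQL function
      })
    )
  );

  // Tag each chunk with its source so the prompt can label "From document A" / "From document B"
  return perDocument.flatMap((result, i) =>
    (result.data ?? []).map((chunk: any) => ({ ...chunk, documentId: documentIds[i] }))
  );
}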

How AI Mobile Launcher simplifies RAG development

Building production RAG from scratch takes 6-10 weeks. AI Mobile Launcher's RAG Pack provides everything:

Document Processing Module

  • Support for PDF, DOCX, TXT, images (with OCR)
  • Automatic chunking with configurable strategies
  • Progress indicators during processing
  • Error handling for corrupted files

Embeddings Management

  • Multi-provider support (OpenAI, Voyage, Cohere, local)
  • Batch processing for cost efficiency
  • Automatic retries on failures
  • Cost tracking per document

Vector Search Engine

  • Supabase pgvector integration out-of-box
  • Hybrid search (semantic + keyword)
  • Re-ranking for improved relevance
  • Multi-query retrieval

RAG Chat UI

  • Beautiful mobile-optimized interface
  • Source citations with document links
  • Context highlighting
  • Document filtering and management

Deploy a production RAG app in days, not months. Every module is battle-tested and production-ready.

For Developers: Get AI Mobile Launcher's RAG Pack and skip months of implementation. Document processing, embeddings, vector search, and chat UI—all ready to use.

For Founders: Building a document Q&A or knowledge base app? CasaInnov delivers custom RAG applications using AI Mobile Launcher in 6-8 weeks with fixed-price contracts.