How to Build a Mobile RAG Application in React Native
Complete guide to building Retrieval Augmented Generation (RAG) apps in React Native. Learn document processing, embeddings, vector search, and AI-powered Q&A for mobile devices.
Building a mobile RAG (Retrieval Augmented Generation) application requires four core components: document upload and parsing (PDF, DOCX, TXT), text chunking into searchable segments with embeddings generation, vector storage for semantic search (Supabase pgvector or local storage), and an AI chat interface that retrieves relevant context before generating responses. This architecture enables mobile apps to answer questions from custom documents with accurate, cited information.
What is RAG and why build it on mobile?
RAG (Retrieval Augmented Generation) is an AI technique that combines document retrieval with language models. Instead of relying solely on an AI's training data, RAG apps:
- Let users upload their own documents (contracts, manuals, research papers)
- Convert documents into searchable embeddings
- Find relevant sections when users ask questions
- Feed those sections to the AI as context
- Generate accurate answers with source citations
Why mobile RAG matters
Mobile RAG applications solve real problems:
- Healthcare: Doctors query patient records and medical literature on-the-go
- Legal: Lawyers search case files and legal documents from their phone
- Education: Students chat with their textbooks and lecture notes
- Enterprise: Sales teams access product documentation during client meetings
- Personal: Users build second brains from their personal documents
Traditional mobile apps can't do this. They either don't have AI, or they use generic AI without your specific documents.
RAG architecture for mobile apps
A production-ready mobile RAG system has these components:
1. Document ingestion pipeline
- File upload from device (DocumentPicker, Camera, File System)
- Document parsing (PDF → text, DOCX → text, images → OCR)
- Text preprocessing (remove formatting, handle special characters)
2. Chunking and embedding generation
- Split documents into semantic chunks (usually 500-1500 characters)
- Generate vector embeddings for each chunk (OpenAI, Voyage, Cohere)
- Store embeddings with metadata (source document, page number, timestamp)
3. Vector storage and search
- Database with vector similarity search (Supabase pgvector, Pinecone, local SQLite + vector extension)
- Hybrid search combining semantic similarity + keyword matching
- Ranking algorithm to surface the most relevant chunks
4. AI chat interface with context injection
- User asks a question
- System searches vector database for relevant chunks
- Top chunks are injected into the AI prompt as context
- AI generates an answer based on the retrieved context
- Citations link back to source documents
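Put together, the query-time path is a simple retrieve → augment → generate loop. The sketch below is an outline only; searchChunks, buildPrompt, and generateAnswer are hypothetical stand-ins for the components built in the steps that follow:
// Minimal sketch of the retrieve → augment → generate loop.
// searchChunks, buildPrompt, and generateAnswer are hypothetical helpers
// representing the pieces implemented in the steps below.
type Chunk = { content: string; documentId: number; similarity: number };
declare function searchChunks(query: string, userId: string, topK: number): Promise<Chunk[]>;
declare function buildPrompt(question: string, chunks: Chunk[]): string;
declare function generateAnswer(prompt: string): Promise<string>;
async function answerFromDocuments(question: string, userId: string): Promise<string> {
// 1. Retrieve: find the chunks most similar to the question
const chunks = await searchChunks(question, userId, 5);
// 2. Augment: inject the retrieved chunks into the prompt as context
const prompt = buildPrompt(question, chunks);
// 3. Generate: answer using only the provided context
return generateAnswer(prompt);
}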
Step-by-step: Building mobile RAG in React Native
Step 1: Document upload and parsing
Start with file selection using Expo's DocumentPicker:
import * as DocumentPicker from 'expo-document-picker';
async function uploadDocument() {
const result = await DocumentPicker.getDocumentAsync({
type: ['application/pdf', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'text/plain'],
copyToCacheDirectory: true,
});
// Newer expo-document-picker versions return { canceled, assets }
if (!result.canceled && result.assets?.length) {
const { uri, name, mimeType } = result.assets[0];
// Send to backend for processing
await processDocument(uri, name, mimeType ?? 'application/octet-stream');
}
}
async function processDocument(uri: string, name: string, mimeType: string) {
const formData = new FormData();
formData.append('file', {
uri,
name,
type: mimeType,
});
const response = await fetch('https://your-api.com/api/documents/upload', {
method: 'POST',
body: formData,
headers: {
// userToken: the signed-in user's auth token (from your auth provider)
'Authorization': `Bearer ${userToken}`,
},
});
const data = await response.json();
console.log('Document processed:', data.documentId);
}
Step 2: Backend document processing
On the backend, extract text from documents:
// /api/documents/upload/route.ts
import { PDFLoader } from '@langchain/community/document_loaders/fs/pdf';
import { DocxLoader } from '@langchain/community/document_loaders/fs/docx';
export async function POST(req: Request) {
const formData = await req.formData();
const file = formData.get('file') as File;
// A plain Request has no `user` property — resolve the user from your auth
// layer; getUserIdFromRequest is a placeholder for that lookup
const userId = await getUserIdFromRequest(req);
let text = '';
// Parse based on file type
if (file.type === 'application/pdf') {
const loader = new PDFLoader(file);
const docs = await loader.load();
text = docs.map(doc => doc.pageContent).join('\n');
} else if (file.type === 'application/vnd.openxmlformats-officedocument.wordprocessingml.document') {
const loader = new DocxLoader(file);
const docs = await loader.load();
text = docs[0].pageContent;
} else if (file.type === 'text/plain') {
text = await file.text();
}
// Store raw text and proceed to chunking
const documentId = await saveDocument({
userId,
filename: file.name,
text: text,
});
// Process in background: chunk and embed
await processDocumentEmbeddings(documentId);
return Response.json({ documentId, status: 'processing' });
}
Step 3: Text chunking with overlap
Split documents into chunks that preserve context:
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
async function chunkDocument(text: string) {
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000, // ~1000 characters per chunk
chunkOverlap: 200, // 200 character overlap between chunks
separators: ['\n\n', '\n', '. ', ' ', ''],
});
const chunks = await splitter.splitText(text);
return chunks;
}
// Why overlap matters:
// Without overlap:
// Chunk 1: "...the company was founded in 1998"
// Chunk 2: "The founders were John and Jane..."
// Question: "When was the company founded?" → might miss context
// With overlap:
// Chunk 1: "...the company was founded in 1998. The founders..."
// Chunk 2: "...founded in 1998. The founders were John and Jane..."
// Question: "When was the company founded?" → both chunks have contextStep 4: Generate embeddings
Convert chunks into vector embeddings for semantic search:
import OpenAI from 'openai';
async function generateEmbeddings(chunks: string[]) {
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
// One request per chunk for clarity — the API also accepts an array input
// (see the batching example in the optimization section below)
const embeddings = await Promise.all(
chunks.map(async (chunk, index) => {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small', // Cheaper and faster
input: chunk,
});
return {
chunkIndex: index,
embedding: response.data[0].embedding,
text: chunk,
};
})
);
return embeddings;
}
// Alternative: Use Voyage AI for better accuracy
import { VoyageEmbeddings } from '@langchain/community/embeddings/voyage';
async function generateVoyageEmbeddings(chunks: string[]) {
const embeddings = new VoyageEmbeddings({
apiKey: process.env.VOYAGE_API_KEY,
model: 'voyage-2',
});
const vectors = await embeddings.embedDocuments(chunks);
return vectors;
}
Step 5: Store embeddings in vector database
Use Supabase with pgvector extension for vector storage:
-- First, enable pgvector in Supabase.
-- Run this SQL in the Supabase SQL Editor:
create extension if not exists vector;
create table document_embeddings (
id bigserial primary key,
user_id uuid references auth.users not null,
document_id bigint references documents not null,
chunk_index int not null,
content text not null,
embedding vector(1536), -- 1536 for OpenAI text-embedding-3-small
created_at timestamptz default now()
);
create index on document_embeddings using ivfflat (embedding vector_cosine_ops);
// Store embeddings:
async function storeEmbeddings(documentId: number, userId: string, embeddings: any[]) {
const supabase = createClient();
const rows = embeddings.map(emb => ({
user_id: userId,
document_id: documentId,
chunk_index: emb.chunkIndex,
content: emb.text,
embedding: emb.embedding,
}));
const { error } = await supabase
.from('document_embeddings')
.insert(rows);
if (error) throw error;
}
Step 6: Semantic search for relevant chunks
When a user asks a question, find relevant document chunks:
async function searchDocuments(query: string, userId: string, topK: number = 5) {
// 1. Generate embedding for the query
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const queryEmbedding = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query,
});
const embedding = queryEmbedding.data[0].embedding;
// 2. Search for similar chunks using cosine similarity
const supabase = createClient();
const { data, error } = await supabase.rpc('match_documents', {
query_embedding: embedding,
match_threshold: 0.78, // Minimum similarity score
match_count: topK,
user_id_filter: userId,
});
if (error) throw error;
return data; // Returns the top K most similar chunks
}
-- SQL function for vector similarity search:
create or replace function match_documents (
query_embedding vector(1536),
match_threshold float,
match_count int,
user_id_filter uuid
)
returns table (
id bigint,
document_id bigint,
content text,
similarity float
)
language sql stable
as $$
select
id,
document_id,
content,
1 - (embedding <=> query_embedding) as similarity
from document_embeddings
where user_id = user_id_filter
and 1 - (embedding <=> query_embedding) > match_threshold
order by embedding <=> query_embedding
limit match_count;
$$;
Step 7: Build RAG chat interface in React Native
Create a chat UI that uses retrieved context:
async function askQuestion(question: string) {
// 1. Search for relevant document chunks
const relevantChunks = await fetch('https://your-api.com/api/search', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ query: question }),
}).then(res => res.json());
// 2. Send question + context to AI
const response = await fetch('https://your-api.com/api/chat-rag', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
question: question,
context: relevantChunks,
}),
});
// 3. Stream the AI response
// Note: React Native's built-in fetch does not expose response.body as a
// readable stream — use a streaming-capable client (such as expo/fetch)
// or fall back to reading the full response
const reader = response.body.getReader();
const decoder = new TextDecoder();
let answer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
answer += decoder.decode(value);
// Replace the last (placeholder) assistant message with the partial answer
setMessages(prev => [...prev.slice(0, -1), {
role: 'assistant',
content: answer,
sources: relevantChunks, // Include sources for citations
}]);
}
}
Step 8: Backend RAG endpoint with context injection
// /api/chat-rag/route.ts
export async function POST(req: Request) {
const { question, context } = await req.json();
// Build prompt with retrieved context
const contextText = context
.map((chunk, i) => `[Source ${i+1}] ${chunk.content}`)
.join('\n\n');
const prompt = `Answer the following question based ONLY on the provided context. If the answer is not in the context, say "I don't have enough information to answer that."
Context:
${contextText}
Question: ${question}
Answer:`;
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: prompt }],
stream: true,
});
// Stream response back to client
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content || '';
controller.enqueue(encoder.encode(text));
}
controller.close();
},
});
return new Response(readable);
}
Advanced RAG techniques for mobile
1. Hybrid search (semantic + keyword)
Combine vector similarity with full-text search for better results:
-- Add a full-text search index to the document_embeddings table
create index document_content_fts on document_embeddings using gin(to_tsvector('english', content));
// Hybrid search function
async function hybridSearch(query: string, userId: string) {
// Vector search
const semanticResults = await vectorSearch(query, userId, 10);
// Keyword search
const { data: keywordResults } = await supabase
.from('document_embeddings')
.select('*')
.textSearch('content', query)
.eq('user_id', userId)
.limit(10);
// Combine and re-rank results
const combined = mergeAndRank(semanticResults, keywordResults);
return combined.slice(0, 5);
}
2. Re-ranking with cross-encoders
After retrieval, re-rank results for better relevance:
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';
async function rerank(query: string, chunks: any[]) {
// transformers.js needs the ONNX-converted checkpoint (Xenova/...)
const model = await AutoModelForSequenceClassification.from_pretrained('Xenova/ms-marco-MiniLM-L-6-v2');
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/ms-marco-MiniLM-L-6-v2');
// Score every (query, chunk) pair with the cross-encoder in one batch
const features = tokenizer(chunks.map(() => query), {
text_pair: chunks.map(chunk => chunk.content),
padding: true,
truncation: true,
});
const { logits } = await model(features);
const scored = chunks.map((chunk, i) => ({ ...chunk, score: logits.data[i] }));
return scored.sort((a, b) => b.score - a.score);
}
3. Multi-query retrieval
Generate multiple variations of the user's question for better coverage:
async function multiQueryRetrieval(userQuery: string) {
// Use AI to generate query variations
const variations = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{
role: 'user',
content: `Generate 3 different ways to ask this question: "${userQuery}"`,
}],
});
// parseVariations: a helper that extracts the generated questions from the completion
const queries = [userQuery, ...parseVariations(variations)];
// Search with all queries
const allResults = await Promise.all(
queries.map(q => vectorSearch(q, userId, 3))
);
// Deduplicate and merge
return deduplicateChunks(allResults.flat());
}
4. Local embeddings for offline RAG
Run embeddings on-device for privacy and offline support (note that running transformers.js inside React Native usually needs extra native setup, such as an ONNX runtime and polyfills):
import { pipeline } from '@xenova/transformers';
async function generateLocalEmbeddings(text: string) {
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const embeddings = await embedder(text, { pooling: 'mean', normalize: true });
return Array.from(embeddings.data);
}
// Store in local SQLite with vector extension
import * as SQLite from 'expo-sqlite';
async function storeLocalEmbedding(text: string, embedding: number[]) {
const db = await SQLite.openDatabaseAsync('rag.db');
await db.execAsync(`
CREATE TABLE IF NOT EXISTS embeddings (
id INTEGER PRIMARY KEY,
content TEXT,
embedding BLOB
)
`);
// The vector is stored as a JSON string; parse it back when computing similarity
await db.runAsync(
'INSERT INTO embeddings (content, embedding) VALUES (?, ?)',
[text, JSON.stringify(embedding)]
);
}
RAG performance optimization for mobile
1. Lazy loading embeddings
Don't load all embeddings into memory. Query on-demand from the database.
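For on-device storage, this means scanning the table in pages and keeping only the current best matches rather than pulling every row into memory. A rough sketch, assuming the local SQLite embeddings table from the offline example above (vectors stored as JSON strings):
import * as SQLite from 'expo-sqlite';
// Cosine similarity between two vectors of equal length
function cosineSimilarity(a: number[], b: number[]): number {
let dot = 0, normA = 0, normB = 0;
for (let i = 0; i < a.length; i++) {
dot += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
async function searchLocal(queryEmbedding: number[], topK = 5, pageSize = 200) {
const db = await SQLite.openDatabaseAsync('rag.db');
let best: { content: string; score: number }[] = [];
for (let offset = 0; ; offset += pageSize) {
// Pull one page of rows at a time instead of loading the whole table
const rows = await db.getAllAsync<{ content: string; embedding: string }>(
'SELECT content, embedding FROM embeddings LIMIT ? OFFSET ?',
[pageSize, offset]
);
if (rows.length === 0) break;
for (const row of rows) {
best.push({ content: row.content, score: cosineSimilarity(queryEmbedding, JSON.parse(row.embedding)) });
}
// Keep only the current top-K so memory stays bounded
best = best.sort((a, b) => b.score - a.score).slice(0, topK);
}
return best;
}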
2. Caching search results
Cache frequent queries to avoid repeated vector searches:
const searchCache = new Map();
async function cachedSearch(query: string) {
const cacheKey = hashQuery(query);
if (searchCache.has(cacheKey)) {
return searchCache.get(cacheKey);
}
const results = await vectorSearch(query);
searchCache.set(cacheKey, results);
return results;
}
3. Progressive chunk loading
Start with top 3 chunks, load more if AI needs additional context.
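A rough sketch of that retry loop, where searchChunks and askWithContext are hypothetical wrappers around the search and chat endpoints built earlier:
declare function searchChunks(query: string, userId: string, topK: number): Promise<{ content: string }[]>;
declare function askWithContext(question: string, chunks: { content: string }[]): Promise<string>;
async function progressiveAnswer(question: string, userId: string): Promise<string> {
// Try a small context first, then widen the retrieval if the model
// reports that it lacks information (per the Step 8 prompt instruction)
for (const topK of [3, 8, 15]) {
const chunks = await searchChunks(question, userId, topK);
const answer = await askWithContext(question, chunks);
if (!answer.includes("I don't have enough information")) {
return answer;
}
}
return "I don't have enough information to answer that.";
}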
4. Batch embedding generation
Generate embeddings in batches to reduce API calls:
async function batchGenerateEmbeddings(chunks: string[], batchSize: number = 100) {
const batches = [];
for (let i = 0; i < chunks.length; i += batchSize) {
batches.push(chunks.slice(i, i + batchSize));
}
const allEmbeddings = [];
for (const batch of batches) {
const embeddings = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: batch,
});
allEmbeddings.push(...embeddings.data.map(d => d.embedding));
}
return allEmbeddings;
}
Common RAG challenges in mobile apps
Challenge #1: Large document processing
Problem: 500-page PDFs take minutes to process on mobile.
Solution: Process documents on the backend. Show progress indicators in the app.
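One simple way to surface progress is to poll a status endpoint after upload. A sketch, assuming a hypothetical /api/documents/:id/status route that reports on the background chunking and embedding job:
import { useEffect, useState } from 'react';
// Polls the (assumed) status endpoint until processing finishes
function useDocumentStatus(documentId: string, token: string) {
const [status, setStatus] = useState<'processing' | 'ready' | 'failed'>('processing');
useEffect(() => {
const interval = setInterval(async () => {
const res = await fetch(`https://your-api.com/api/documents/${documentId}/status`, {
headers: { Authorization: `Bearer ${token}` },
});
const body = await res.json();
setStatus(body.status);
if (body.status !== 'processing') clearInterval(interval);
}, 3000); // Check every 3 seconds
return () => clearInterval(interval);
}, [documentId, token]);
return status;
}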
Challenge #2: Context window limits
Problem: In practice you only want to send a handful of chunks (3-5) per request — even with GPT-4o's 128K-token context window, stuffing in more drives up cost and latency and dilutes answer quality.
Solution: Use hierarchical chunking. First retrieve sections, then drill down to specific paragraphs.
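A sketch of that two-stage retrieval. It assumes you also embed coarse section-level chunks and expose hypothetical match_sections and match_documents_in_sections SQL functions alongside match_documents:
// Hierarchical retrieval sketch: coarse sections first, then fine-grained
// chunks restricted to the matched sections. The two RPCs are assumed
// variants of the match_documents function shown in Step 6.
async function hierarchicalSearch(queryEmbedding: number[], userId: string) {
const supabase = createClient();
// Stage 1: find the most relevant sections (large chunks)
const { data: sections, error: sectionError } = await supabase.rpc('match_sections', {
query_embedding: queryEmbedding,
match_count: 3,
user_id_filter: userId,
});
if (sectionError) throw sectionError;
// Stage 2: search paragraph-level chunks only within those sections
const { data: paragraphs, error } = await supabase.rpc('match_documents_in_sections', {
query_embedding: queryEmbedding,
section_ids: sections.map((s: { id: number }) => s.id),
match_count: 5,
user_id_filter: userId,
});
if (error) throw error;
return paragraphs;
}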
Challenge #3: Inconsistent chunk relevance
Problem: Vector search sometimes returns irrelevant chunks.
Solution: Set minimum similarity thresholds (0.75+). Use hybrid search. Implement re-ranking.
Challenge #4: Multi-document queries
Problem: A user asks "Compare document A and document B."
Solution: Allow filtering by document ID in vector search. Retrieve from both documents separately, then combine contexts.
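A sketch of the retrieval side, where searchDocumentChunks is a hypothetical variant of the Step 6 search that also accepts a document ID filter:
declare function searchDocumentChunks(
query: string,
userId: string,
documentId: number,
topK: number
): Promise<{ content: string }[]>;
async function buildComparisonContext(query: string, userId: string, docA: number, docB: number) {
// Retrieve from each document separately so neither dominates the context
const [chunksA, chunksB] = await Promise.all([
searchDocumentChunks(query, userId, docA, 4),
searchDocumentChunks(query, userId, docB, 4),
]);
// Label each document's chunks so the model can attribute its comparison
return [
'--- Document A ---',
...chunksA.map(c => c.content),
'--- Document B ---',
...chunksB.map(c => c.content),
].join('\n\n');
}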
How AI Mobile Launcher simplifies RAG development
Building production RAG from scratch takes 6-10 weeks. AI Mobile Launcher's RAG Pack provides everything:
Document Processing Module
- Support for PDF, DOCX, TXT, images (with OCR)
- Automatic chunking with configurable strategies
- Progress indicators during processing
- Error handling for corrupted files
Embeddings Management
- Multi-provider support (OpenAI, Voyage, Cohere, local)
- Batch processing for cost efficiency
- Automatic retries on failures
- Cost tracking per document
Vector Search Engine
- Supabase pgvector integration out of the box
- Hybrid search (semantic + keyword)
- Re-ranking for improved relevance
- Multi-query retrieval
RAG Chat UI
- Beautiful mobile-optimized interface
- Source citations with document links
- Context highlighting
- Document filtering and management
Deploy a production RAG app in days, not months. Every module is battle-tested and production-ready.
For Developers: Get AI Mobile Launcher's RAG Pack and skip months of implementation. Document processing, embeddings, vector search, and chat UI—all ready to use.
For Founders: Building a document Q&A or knowledge base app? CasaInnov delivers custom RAG applications using AI Mobile Launcher in 6-8 weeks with fixed-price contracts.