By Malik Chohra
How to Build an AI Chat App Using React Native
Step-by-step guide to building a ChatGPT-like AI chat app in React Native. Learn streaming responses, conversation management, and UI best practices.
Building an AI chat app in React Native: the real implementation
Building a production AI chat app in React Native requires getting four things right: streaming responses over SSE, correct message state modeling, a typing indicator that does not lie to the user, and context window management that does not silently break at conversation turn 30. This guide covers all four, including the Hermes engine caveat that will bite you if you skip it.
I have built this three times across different projects. The first time I shipped polling with a 1-second interval and called it done. Users noticed immediately. The second time I used SSE but got the typing indicator state wrong, so the app showed a pulsing indicator even after the first token had already arrived. The third time I got it right, and this post is the distillation of that third attempt.
REST vs. streaming: why polling is the wrong call on mobile
The first architectural decision is how your app receives AI responses. You have two practical options: send a POST request and wait for the full response body, or open a streaming connection and receive tokens as the model generates them.
For a web app with a spinner, waiting for the complete response is tolerable. On mobile, it breaks the illusion of conversation. A typical Claude or GPT-4 response takes 3-8 seconds to complete. With a blocking REST call (or a polling loop around one), the user stares at a loading state for that entire window, then sees text appear all at once. That does not feel like talking to something intelligent. It feels like waiting for a file download.
Streaming changes the perception of latency fundamentally. The user sees the first token in 300-800ms and watches the response build in real time, the same way ChatGPT, Poe, and Claude.ai mobile all work. The actual time to complete the response is the same or slightly longer due to connection overhead, but the experience is categorically different. Users in native apps especially notice this: iOS and Android users are accustomed to interfaces that respond immediately to input. A multi-second blank wait before any feedback reads as broken.
The implementation uses fetch with ReadableStream. You point at a backend endpoint that proxies to your AI provider and returns a streaming response with Content-Type: text/event-stream. On the client, you consume the stream with a reader loop:
// hooks/useStreamingChat.ts
import { useCallback } from 'react';
import { useChatStore } from '../store/chatStore'; // assumed path; adjust to your store

export function useStreamingChat() {
  const { messages, addMessage, updateMessage, setLoading, setError } =
    useChatStore();

  const sendMessage = useCallback(async (content: string) => {
    // `messages` in this closure predates the addMessage calls below,
    // so add the new user message to the payload explicitly.
    const history = [...messages, { role: 'user' as const, content }];
    addMessage({ role: 'user', content });
    const assistantId = addMessage({ role: 'assistant', content: '' });
    setLoading(true);
    setError(null);
    try {
      const response = await fetch('https://your-api.com/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          messages: history.map(m => ({ role: m.role, content: m.content })),
        }),
      });
      if (!response.ok || !response.body) throw new Error('Stream failed');
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let accumulated = '';
      let buffer = ''; // holds a partial line when a chunk ends mid-line
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        // Parse SSE lines: each line is "data: <token>\n"
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? ''; // keep the incomplete tail for the next read
        for (const line of lines) {
          if (line.startsWith('data: ') && line !== 'data: [DONE]') {
            accumulated += line.slice(6);
            updateMessage(assistantId, accumulated);
          }
        }
      }
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Unknown error');
    } finally {
      setLoading(false);
    }
  }, [messages, addMessage, updateMessage, setLoading, setError]);

  return { sendMessage };
}
The Hermes engine caveat you need to know
React Native has used the Hermes JavaScript engine by default since React Native 0.70 and Expo SDK 48. Streaming over fetch in this environment has a catch that will surface as a silent hang or a cryptic error if you are not expecting it.
The fetch implementation that React Native provides does not expose response.body as a proper ReadableStream in all React Native versions. Before React Native 0.73, calling response.body.getReader() on a streamed response would either fail because the body was null or stall indefinitely. The fix is to polyfill the missing pieces. The snippet below registers a ReadableStream global from web-streams-polyfill alongside react-native-url-polyfill; the react-native-fetch-api package supplies a streaming-capable fetch if your version also needs one (follow its README for the registration steps). Install the packages, then import them at the top of your entry file before any other code runs:
// index.js or App.tsx: these must be the first imports
import 'react-native-url-polyfill/auto';
import { polyfillGlobal } from 'react-native/Libraries/Utilities/PolyfillFunctions';
import { ReadableStream } from 'web-streams-polyfill/ponyfill';

polyfillGlobal('ReadableStream', () => ReadableStream);
On React Native 0.73+ and Expo SDK 50+, the polyfill is less critical because the runtime ships a more complete implementation, but it is still worth including for older device compatibility. The cost is negligible: a few kilobytes in your bundle. The risk of omitting it on an older SDK is that your streaming appears to work in the simulator and breaks on real devices running Hermes.
Apps like ChatGPT for iOS and Claude.ai mobile avoid this entirely by maintaining their own native networking layer or using WebSockets rather than SSE. That is a valid architectural choice at scale, but for most apps the polyfill approach is the right tradeoff: it keeps your networking code in TypeScript and avoids writing a native module.
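One way to turn the silent hang into a loud failure is to check response.body before entering the reader loop. The sketch below is an illustration rather than part of any library: getBodyReader and the two structural types are hypothetical names, declared locally so the example does not depend on DOM typings.

```typescript
// Minimal structural types (assumptions for this sketch, not library types).
interface StreamReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

interface StreamableResponse {
  ok: boolean;
  status: number;
  body?: { getReader(): StreamReader } | null;
}

// Hypothetical helper: returns a reader, or throws a descriptive error
// instead of letting the app stall when streaming is unsupported.
function getBodyReader(response: StreamableResponse): StreamReader {
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  if (!response.body || typeof response.body.getReader !== 'function') {
    throw new Error(
      'response.body is not a ReadableStream; check that the streaming polyfill is loaded'
    );
  }
  return response.body.getReader();
}
```

Calling this right after fetch resolves means a missing polyfill shows up as a readable error message in your error state on the first request, rather than as a spinner that never ends.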
Message state: modeling conversation history correctly
The message type is deceptively simple. You need an id, a role, and a content string. The part that trips most implementations is that an in-progress streaming message needs to exist in state from the moment you open the connection, before any content has arrived. If you add it only after the stream completes, you cannot update it incrementally.
interface Message {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;
  isStreaming?: boolean; // true while tokens are arriving
}

// In your Zustand store:
addMessage: (msg) => {
  // crypto.randomUUID() may be undefined in React Native without a polyfill;
  // verify on device and swap in a UUID library (for example expo-crypto) if needed.
  const id = crypto.randomUUID();
  set((state) => ({
    messages: [...state.messages, { ...msg, id, timestamp: Date.now() }],
  }));
  return id;
},
// Omitting isStreaming clears the flag on the message.
updateMessage: (id, content, isStreaming) => {
  set((state) => ({
    messages: state.messages.map((m) =>
      m.id === id ? { ...m, content, isStreaming } : m
    ),
  }));
},
The isStreaming flag on the message itself matters. It lets your MessageBubble component decide whether to show a blinking cursor at the end of the text, independently of any global loading state. That distinction becomes important once you support multiple concurrent conversations or message regeneration.
Context window management is the other issue most developers encounter around conversation turn 15 to 20. Every major model has a token limit on the messages array you send. Claude 3 Haiku accepts 200k tokens of context, but you still pay per token, and very long conversation histories produce noticeably slower responses. The practical approach is to send only the last N messages, where N is tuned to your model and use case, and store the full history locally in AsyncStorage or SQLite. For most apps, sending the last 20 messages (10 turns) is the right balance. If your use case requires longer memory, implement a summarization step: after every 20 turns, ask the model to summarize the conversation so far, store that summary, and inject it as a system message in the next window.
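The summarization step described above can be sketched as three small pure functions: one that decides when a summary is due, one that builds the request asking the model for it, and one that folds the stored summary into the system prompt. All names here are hypothetical, the prompt wording is an assumption, and the actual model call is left out because it depends on your backend.

```typescript
type Role = 'user' | 'assistant' | 'system';

interface ChatMessage {
  role: Role;
  content: string;
  isStreaming?: boolean;
}

// Assumption: summarize after every 20 completed turns, as suggested above.
const SUMMARIZE_EVERY_TURNS = 20;

// One turn = one user message plus one completed assistant reply.
function countCompletedTurns(history: ChatMessage[]): number {
  return history.filter((m) => m.role === 'assistant' && !m.isStreaming).length;
}

// True once enough new turns have accumulated since the last summary.
function needsSummary(history: ChatMessage[], summarizedThroughTurn: number): boolean {
  return countCompletedTurns(history) - summarizedThroughTurn >= SUMMARIZE_EVERY_TURNS;
}

// Builds the messages for the summarization request (hypothetical prompt wording).
function buildSummaryRequest(history: ChatMessage[]): { role: Role; content: string }[] {
  const transcript = history
    .filter((m) => m.role !== 'system' && !m.isStreaming)
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n');
  return [
    { role: 'system', content: 'Summarize the conversation below in a short paragraph.' },
    { role: 'user', content: transcript },
  ];
}

// Folds a stored summary into the system prompt for the next window.
function systemPromptWithSummary(systemPrompt: string, summary: string | null): string {
  return summary
    ? `${systemPrompt}\n\nSummary of the earlier conversation:\n${summary}`
    : systemPrompt;
}
```

Keeping these as pure functions means the trigger and the prompt assembly can be unit-tested without a network call. The result of systemPromptWithSummary takes the place of the plain system prompt when you assemble the truncated payload.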
// Truncate before sending to the API; keep the full history in local state
const CONTEXT_WINDOW = 20;
const messagesToSend = [
  { role: 'system', content: systemPrompt },
  ...messages
    .filter((m) => !m.isStreaming)
    .slice(-CONTEXT_WINDOW)
    .map((m) => ({ role: m.role, content: m.content })),
];
The typing indicator problem: two distinct states
Most AI chat tutorials show a single typing indicator that appears when the user hits send and disappears when the response is complete. That works if you are polling, but it is incorrect for streaming, and the difference is visible to users.
With streaming, there are two distinct phases. Phase one: the request is in flight and no tokens have arrived yet. This is the period where you genuinely do not know if the model is processing, if there is a network issue, or if the backend is slow. Phase two: the first token has arrived and the model is actively generating. The UX for these two phases should be different. In phase one, you want a traditional typing indicator, three animated dots, the same pattern ChatGPT uses. In phase two, the dots should disappear and the content should appear in the message bubble, growing token by token.
The implementation adds a second piece of state: a boolean for whether the first token has been received. Toggle it when you write the first non-empty chunk to the message:
// Before the streaming loop: reset the flag for this request
let firstTokenReceived = false;
setFirstTokenReceived(false); // otherwise the previous reply's `true` hides the dots

// Inside the streaming loop, for each decoded chunk
for (const line of chunk.split('\n')) {
  if (line.startsWith('data: ') && line !== 'data: [DONE]') {
    const token = line.slice(6);
    accumulated += token;
    updateMessage(assistantId, accumulated, true); // isStreaming = true
    if (!firstTokenReceived && token.length > 0) {
      firstTokenReceived = true;
      setFirstTokenReceived(true); // a separate state atom in the store
    }
  }
}

// In ChatScreen.tsx:
const showTypingIndicator = isLoading && !firstTokenReceived;
// <FlatList ListFooterComponent={showTypingIndicator ? <TypingIndicator /> : null} />
Poe and Claude.ai mobile both implement this correctly. You can verify it by sending a long query on a slow connection: the dots appear, then transition smoothly to the growing text without a visible flash. If your dots disappear and the bubble appears simultaneously, you got phase two right but not the transition. If the dots never show, you have a race condition where the first chunk arrives before the typing indicator renders.
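The two booleans can also be collapsed into a single phase value, which makes invalid combinations (not loading, yet "first token received") impossible to represent. This is a minimal sketch of that idea, not the only way to model it; Phase, ChatEvent, nextPhase, and showTypingIndicator are hypothetical names.

```typescript
type Phase = 'idle' | 'waiting' | 'streaming';
type ChatEvent = 'send' | 'token' | 'done' | 'error';

// Pure transition function: given the current phase and an event,
// return the next phase.
function nextPhase(phase: Phase, event: ChatEvent): Phase {
  if (event === 'send') {
    return 'waiting'; // request in flight, no tokens yet (phase one)
  }
  if (event === 'token') {
    // Ignore stray tokens that arrive after the request finished or was cancelled.
    return phase === 'idle' ? 'idle' : 'streaming'; // phase two
  }
  return 'idle'; // 'done' or 'error'
}

// The dots are shown only in phase one.
function showTypingIndicator(phase: Phase): boolean {
  return phase === 'waiting';
}
```

Because a 'send' event always moves to 'waiting', the indicator is reset automatically for every new message, including a second message sent right after the first reply finishes.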
Tool calls: when you actually need them on mobile
Function calling (also called tool use) lets you give the model a set of typed actions it can invoke, and your app executes them. Most chat tutorials skip this entirely, but for any app where the AI needs to interact with device or backend data, tool calls are the correct architecture. Without them, you end up parsing free-form model output with regex, which breaks on every model update.
In a mobile context, the canonical tool call use cases are: booking and reservation actions (the model identifies a date and service the user wants and calls your booking API), search (retrieving live data the model was not trained on), and calendar access (reading or writing events via the device calendar API). A healthcare app might give the model a lookup_medication tool that queries a drug interaction database. A travel app might give it a search_flights tool.
Here is a minimal example with the Claude API. You pass a tools array in your request. When the model wants to invoke a tool, it returns a response with stop_reason: "tool_use" and a structured tool_use block in the content. You execute the tool, append the assistant's tool_use turn followed by a user message that contains a tool_result content block, and send the conversation back to get the final response:
// Backend: pass tools to Claude
const tools = [
  {
    name: "search_calendar",
    description: "Search the user's calendar for events in a date range",
    input_schema: {
      type: "object",
      properties: {
        start_date: { type: "string", description: "ISO 8601 date" },
        end_date: { type: "string", description: "ISO 8601 date" },
      },
      required: ["start_date", "end_date"],
    },
  },
];

// If stop_reason === 'tool_use', extract and run the tool:
if (response.stop_reason === 'tool_use') {
  const toolUse = response.content.find((b) => b.type === 'tool_use');
  const calendarResult = await searchUserCalendar(toolUse.input);

  // Echo the assistant turn that requested the tool, then append the
  // result and re-submit to get the final streamed answer
  messages.push({ role: 'assistant', content: response.content });
  messages.push({
    role: 'user',
    content: [{
      type: 'tool_result',
      tool_use_id: toolUse.id,
      content: JSON.stringify(calendarResult),
    }],
  });
}
On the React Native side, this means your sendMessage function needs to handle a possible round-trip: detect a tool call response, run the tool, re-submit, and then stream the final reply. The UI pattern for this is a subtle intermediate state message like "Checking your calendar..." between the user message and the final streamed response. ChatGPT does this with its browsing and code interpreter indicators. Users understand the pattern without needing it explained.
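The round-trip can be sketched as a small loop. This is an illustration under stated assumptions: callModel stands in for whatever function posts the conversation to your backend and returns the parsed response, runWithTools and the handler map are hypothetical names, and the types only mirror the stop_reason, tool_use, and tool_result shapes shown in the snippet above.

```typescript
interface TextBlock { type: 'text'; text: string }
interface ToolUseBlock { type: 'tool_use'; id: string; name: string; input: unknown }
interface ToolResultBlock { type: 'tool_result'; tool_use_id: string; content: string }
type ContentBlock = TextBlock | ToolUseBlock;

interface ApiMessage {
  role: 'user' | 'assistant';
  content: string | (ContentBlock | ToolResultBlock)[];
}
interface ModelResponse { stop_reason: string; content: ContentBlock[] }

type CallModel = (messages: ApiMessage[]) => Promise<ModelResponse>;
type ToolHandler = (input: unknown) => Promise<unknown>;

// Calls the model, runs any requested tools, and re-submits until the model
// stops asking for tools or maxRounds is reached.
async function runWithTools(
  callModel: CallModel,
  handlers: Record<string, ToolHandler>,
  messages: ApiMessage[],
  maxRounds = 3, // assumed safety cap against tool-call loops
): Promise<ModelResponse> {
  const conversation = [...messages];
  let response = await callModel(conversation);

  for (let round = 0; round < maxRounds && response.stop_reason === 'tool_use'; round++) {
    const toolUses = response.content.filter(
      (b): b is ToolUseBlock => b.type === 'tool_use',
    );
    // Echo the assistant turn that requested the tools.
    conversation.push({ role: 'assistant', content: response.content });

    const results: ToolResultBlock[] = [];
    for (const use of toolUses) {
      const handler = handlers[use.name];
      const output = handler
        ? await handler(use.input)
        : { error: `Unknown tool: ${use.name}` };
      results.push({
        type: 'tool_result',
        tool_use_id: use.id,
        content: JSON.stringify(output),
      });
    }
    // Tool results travel back inside a user-role message.
    conversation.push({ role: 'user', content: results });
    response = await callModel(conversation);
  }
  return response;
}
```

In the app, the moment stop_reason comes back as tool_use is where you would set the intermediate status message, and the final callModel is the one you would stream into the assistant bubble.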
What ChatGPT, Poe, and Claude.ai get right on native
Studying these apps before writing your own saves a lot of iteration. The patterns they share are worth copying directly.
All three keep the input field pinned just above the keyboard on both platforms. In React Native, the equivalent is KeyboardAvoidingView with behavior="padding" on iOS and behavior="height" on Android, plus a keyboardVerticalOffset that accounts for the navigation header. This is table stakes but easy to get wrong: if the input field slides under the keyboard on Android or overlaps the home indicator on iPhone 15 Pro, the app reads as unfinished regardless of how good the AI integration is.
Poe and Claude.ai both scroll the message list to the bottom incrementally as tokens arrive, not just when the response completes. This is more complex to implement correctly because you need to distinguish between user-initiated scroll (do not auto-scroll) and AI token arrival (do auto-scroll). The standard pattern is to track whether the user has manually scrolled up, and if they have, show a "scroll to bottom" button rather than forcing the scroll position back down.
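The "has the user scrolled up?" check reduces to a small calculation on the scroll event. The sketch below assumes a non-inverted list and uses the contentOffset, contentSize, and layoutMeasurement fields that React Native scroll events carry; the ScrollMetrics type, isNearBottom, and the 40-pixel threshold are hypothetical choices for illustration.

```typescript
// Structural subset of a React Native scroll event's nativeEvent.
interface ScrollMetrics {
  contentOffset: { y: number };
  contentSize: { height: number };
  layoutMeasurement: { height: number };
}

// Assumed tolerance: within this many pixels still counts as "at the bottom".
const BOTTOM_THRESHOLD_PX = 40;

function isNearBottom(
  metrics: ScrollMetrics,
  threshold: number = BOTTOM_THRESHOLD_PX,
): boolean {
  const visibleBottom = metrics.contentOffset.y + metrics.layoutMeasurement.height;
  const distanceFromBottom = metrics.contentSize.height - visibleBottom;
  return distanceFromBottom <= threshold;
}
```

In the list's onScroll handler you would store isNearBottom(event.nativeEvent) in a ref. When a token arrives, scroll to the end only if that ref is true; otherwise show the "scroll to bottom" button.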
ChatGPT on mobile handles markdown rendering inside message bubbles during streaming. This is a significant UX lift: it requires a markdown parser that can handle partial content without crashing on incomplete syntax like an unclosed code fence. Libraries like react-native-markdown-display work well for completed messages. For streaming, the pragmatic approach is to render plain text while tokens are arriving and switch to the markdown renderer once the isStreaming flag clears.
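The renderer switch can be expressed as a pure function. As an optional extra guard, this sketch also falls back to plain text when a finished message still has an unclosed code fence, for example after a truncated response. The names hasUnclosedCodeFence and chooseRenderer are hypothetical, and counting fence lines is a heuristic, not a full markdown parse.

```typescript
// Matches a line that opens or closes a fenced code block (three backticks).
const FENCE_LINE = /^\s*`{3}/;

// An odd number of fence lines means the last fence was never closed.
function hasUnclosedCodeFence(text: string): boolean {
  const fenceCount = text
    .split('\n')
    .filter((line) => FENCE_LINE.test(line)).length;
  return fenceCount % 2 === 1;
}

type Renderer = 'plain' | 'markdown';

function chooseRenderer(content: string, isStreaming: boolean | undefined): Renderer {
  // While tokens are arriving, always render plain text.
  if (isStreaming) return 'plain';
  // Once complete, use markdown unless the content is visibly malformed.
  return hasUnclosedCodeFence(content) ? 'plain' : 'markdown';
}
```

MessageBubble can call chooseRenderer(message.content, message.isStreaming) and pick its child component from the result, which keeps the switch logic out of the render tree.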
How AI Mobile Launcher handles this out of the box
Setting up streaming, the Hermes polyfill, state management, context truncation, and the typing indicator transitions from scratch takes 3-4 days of focused work. That estimate is not padded: the polyfill issue alone takes half a day to diagnose and fix if you have not seen it before, and getting the typing indicator state machine right, including the edge case where a user sends a second message before the first stream completes, adds another day.
AI Mobile Launcher ships with the Claude API integration and streaming already wired. The polyfill setup is in the entry file. The message store uses the two-phase typing indicator pattern described above. Context truncation is configured with a sensible default that you can adjust per conversation. The backend proxy endpoint is included so you do not expose your API key in the mobile bundle.
The practical effect is that your first working streaming chat screen takes an afternoon instead of a week. You start from a codebase where the hard parts are already solved and tested, and you spend your time on the product logic specific to your app: the system prompt, the tool definitions, the conversation persistence strategy, and the UI treatment that makes your chat experience distinct.
If you are building a chat feature into an existing app rather than a standalone chat product, the module structure in the boilerplate makes it straightforward to extract just the chat layer and drop it into your navigation tree. The store is self-contained and does not assume a specific navigation library or routing setup.
Common mistakes and how to avoid them
Forgetting to abort in-flight requests when the component unmounts or when the user navigates away is the most common production bug in this architecture. Use an AbortController and pass its signal to fetch. Cancel the controller in the cleanup function of your effect or in the store teardown. Without this, you will see state updates on unmounted components and warnings that are difficult to trace back to the streaming hook.
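A small wrapper around AbortController keeps the cancellation logic in one place. This is a sketch under stated assumptions: createStreamCanceller is a hypothetical helper, and it assumes you want at most one stream in flight at a time, so starting a new one aborts the previous one.

```typescript
// Hypothetical helper that owns the AbortController for the current stream.
function createStreamCanceller() {
  let controller: AbortController | null = null;

  return {
    // Call before each fetch. Aborts any stream still in flight and
    // returns a fresh signal to pass to fetch.
    start(): AbortSignal {
      controller?.abort();
      controller = new AbortController();
      return controller.signal;
    },
    // Call from the effect cleanup or the store teardown.
    cancel(): void {
      controller?.abort();
      controller = null;
    },
  };
}
```

In the hook, you would pass the result of start() as the signal option to fetch and call cancel() from the cleanup function of a useEffect. An aborted fetch rejects, so the catch block should recognize an abort and skip setError for it rather than showing the user an error they caused by navigating away.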
Sending the full message history including in-progress streaming messages to the API is another common mistake. Always filter your messages array before constructing the API payload: only include messages where isStreaming is false or undefined. Sending a partial assistant message as context confuses models and produces degraded responses in the same conversation thread.
Not handling the case where the server closes the stream early due to a rate limit or timeout will leave the app in a loading state indefinitely. Always enforce a maximum timeout on the request (fetch has no timeout option of its own, so use an AbortController or a timer), and treat a premature stream close, where the reader resolves with done: true but the accumulated content is still empty, as an error rather than a successful empty response.
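Both checks fit in a few lines. The sketch below uses a generic promise wrapper for the timeout and a separate assertion for the empty-stream case; withTimeout and assertNonEmptyStream are hypothetical names, and the timeout value you pass is yours to tune.

```typescript
// Rejects if `promise` has not settled within `ms` milliseconds.
// Note: this stops waiting but does not cancel the underlying work;
// pair it with an AbortController to actually close the connection.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms}ms`)),
      ms,
    );
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Call after the reader loop exits with done: true.
function assertNonEmptyStream(accumulated: string): void {
  if (accumulated.trim() === '') {
    throw new Error('Stream closed before any content arrived');
  }
}
```

Wrapping each reader.read() call, for example withTimeout(reader.read(), 30000), gives you an idle timeout that also catches a stream that stalls halfway through. Calling assertNonEmptyStream(accumulated) after the loop routes the empty case into the existing catch block.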
The implementation details in this guide reflect what shipping this feature multiple times looks like after finding the failure modes in production. The streaming and Hermes setup described here is correct for Expo SDK 50+ and React Native 0.73+. If you are on an older version, the polyfill section is more critical, not less.