Multimodal AI + RevenueCat + Supabase in a React Native boilerplate
Does AI Mobile Launcher support this combo?
Yes. The AI Pro tier ships a multimodal analyser: audio, image, and video, all powered by Gemini in a single REST endpoint. Two screens ship ready: AiAnalyserInputScreen (camera / mic / video picker) and AiAnalyserResultScreen (formatted analysis output). Prompts live in src/features/ai-analyser/prompts/. Speech-to-text is handled by expo-speech-recognition (native), not by a cloud Whisper call.
The stack
- React Native 0.83.6 + Expo SDK 55.0.17
- Gemini multimodal: src/features/gen-ui/api/gemini.api.ts (image + audio + video in one call)
- Analyser services + prompts: src/features/ai-analyser/services/, prompts/
- Camera: expo-camera
- Audio: expo-av + expo-speech-recognition
- Video: expo-video + expo-image-picker
- RevenueCat + Supabase wired as default
Setup in five steps
1. Clone AI Pro
git clone <ai-pro-tier-repo>
cd ai-mobile-launcher
pnpm install
2. Gemini key
# .env
GOOGLE_GEMINI_API_KEY=AIza...
GEMINI_MODEL=gemini-2.0-flash # multimodal-capable
3. Customize the analyser prompts
// src/features/ai-analyser/prompts/image.prompt.ts
// Replace with your domain-specific prompt:
// "Identify the plant species in this image. Return JSON: { species, confidence, careTips[] }."
// Zod schema in src/features/ai-analyser/schemas/ validates the parse.
4. Supabase + RevenueCat env
# .env
EXPO_PUBLIC_SUPABASE_URL=https://<project>.supabase.co
EXPO_PUBLIC_SUPABASE_ANON_KEY=<anon-key>
EXPO_PUBLIC_REVENUECAT_IOS_KEY=appl_...
EXPO_PUBLIC_REVENUECAT_ANDROID_KEY=goog_...
5. Real-device dev build
eas build --profile development --platform ios
# Camera + microphone require a real device. Simulator has no camera.
Why this combo works
Most successful AI mobile apps are camera-based. Calorie trackers, plant ID, dermatology pre-screening, document scan, receipt OCR, fitness form check, study buddy that reads a textbook page. Text-only AI chat is saturated on mobile; the camera is where the next consumer wave is.
Gemini takes an image plus a prompt in one REST call, then returns structured output. With OpenAI you wire Vision separately and pay more per call. With Gemini Flash, a typical "analyse this photo" round-trip costs around $0.0005, which is well below the ad-floor for a free-trial conversion.
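To make the "image plus a prompt in one REST call" idea concrete, here is a minimal sketch of how such a request could be assembled for Gemini's public generateContent endpoint. The helper name `buildGeminiImageRequest` and its parameter shape are hypothetical, chosen for illustration; the boilerplate's own gemini.api.ts may be structured differently.

```typescript
// Sketch only: builds a request for Gemini's generateContent REST endpoint.
// buildGeminiImageRequest and its types are hypothetical, not the boilerplate's API.

interface GeminiImageRequest {
  apiKey: string;
  model: string; // e.g. the GEMINI_MODEL value from .env
  prompt: string;
  imageBase64: string; // base64-encoded image bytes, without a "data:" prefix
  mimeType: string; // e.g. "image/jpeg"
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildGeminiImageRequest(req: GeminiImageRequest): BuiltRequest {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    encodeURIComponent(req.model) +
    ":generateContent";

  // One "contents" entry carries both the text prompt and the inline image.
  const body = {
    contents: [
      {
        parts: [
          { text: req.prompt },
          { inline_data: { mime_type: req.mimeType, data: req.imageBase64 } },
        ],
      },
    ],
    // Ask the model to answer with JSON so the reply can be schema-checked.
    generationConfig: { responseMimeType: "application/json" },
  };

  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": req.apiKey,
      },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`; the model's text reply is then expected under `candidates[0].content.parts[0].text` in the response JSON. Keeping the builder separate from the network call makes it easy to unit-test the payload without spending API credits.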
Zod schemas guard the parse. The analyser will not crash because the LLM returned malformed JSON. It falls back to a structured error you display, then offers a retry. That is the difference between a demo and a feature you ship.
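The parse-then-fall-back behaviour described above can be sketched as follows. To keep the example dependency-free, a hand-written type guard stands in for the Zod schema; the names `PlantAnalysis`, `AnalysisResult`, and `parseAnalysis` are hypothetical and borrow the plant-ID shape from the prompt example in step 3.

```typescript
// Sketch only: a dependency-free stand-in for the Zod check described above.
// PlantAnalysis, AnalysisResult, and parseAnalysis are hypothetical names.

interface PlantAnalysis {
  species: string;
  confidence: number;
  careTips: string[];
}

interface AnalysisError {
  code: "INVALID_JSON" | "SCHEMA_MISMATCH";
  message: string;
  retryable: boolean;
}

type AnalysisResult =
  | { ok: true; data: PlantAnalysis }
  | { ok: false; error: AnalysisError };

// Runtime check that an unknown value has the expected shape.
function isPlantAnalysis(value: unknown): value is PlantAnalysis {
  if (typeof value !== "object" || value === null) return false;
  const { species, confidence, careTips } = value as Record<string, unknown>;
  return (
    typeof species === "string" &&
    typeof confidence === "number" &&
    Array.isArray(careTips) &&
    careTips.every((tip: unknown) => typeof tip === "string")
  );
}

// Never throws: malformed output becomes a structured, displayable error.
function parseAnalysis(raw: string): AnalysisResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return {
      ok: false,
      error: { code: "INVALID_JSON", message: "The model did not return valid JSON.", retryable: true },
    };
  }
  if (!isPlantAnalysis(parsed)) {
    return {
      ok: false,
      error: { code: "SCHEMA_MISMATCH", message: "The model's answer was missing expected fields.", retryable: true },
    };
  }
  return { ok: true, data: parsed };
}
```

A result screen can then branch on `result.ok`: render `result.data` on success, or show `result.error.message` with a retry button when `retryable` is true. With Zod, a schema's `safeParse` would play the role of `isPlantAnalysis` here.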
What it costs at scale
| Line item | 1K MAU | 100K MAU |
|---|---|---|
| Gemini Flash (5 image analyses/user/mo) | ~$2.5 | ~$250 |
| RevenueCat | $0 | ~$200 |
| Supabase Pro (image bytes briefly cached) | $25 | ~$150 |
| Total (excl. store fees) | ~$28 | ~$600 |
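The Gemini Flash row is simple multiplication, and a small helper makes it easy to plug in your own usage assumptions. This is a sketch; the function name is hypothetical, and the per-call figure is the ~$0.0005 estimate quoted earlier on this page, not an official price.

```typescript
// Sketch only: reproduces the Gemini Flash row of the cost table.
// estimateMonthlyGeminiCostUsd is a hypothetical helper name.

// Approximate per-call figure quoted earlier on this page (an estimate, not a price sheet).
const COST_PER_IMAGE_ANALYSIS_USD = 0.0005;

function estimateMonthlyGeminiCostUsd(
  monthlyActiveUsers: number,
  analysesPerUserPerMonth: number,
  costPerAnalysisUsd: number = COST_PER_IMAGE_ANALYSIS_USD,
): number {
  return monthlyActiveUsers * analysesPerUserPerMonth * costPerAnalysisUsd;
}

// Mirrors the two table columns: 5 image analyses per user per month.
const costAt1kMau = estimateMonthlyGeminiCostUsd(1_000, 5);
const costAt100kMau = estimateMonthlyGeminiCostUsd(100_000, 5);
```

Doubling the analyses per user doubles the line item, so this is the number to watch if the analyser is the core loop of the app.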
Video analysis is the wild card. Per-second pricing applies, so a 30-second clip costs roughly 10x a still photo. Rate-limit video aggressively, or expect a surprise bill.
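One way to cap video spend is a per-user quota over a fixed time window. The sketch below is an in-memory illustration only: the class name `VideoQuota` is hypothetical, and a real deployment would need to enforce the limit server-side with persistent storage rather than in app memory.

```typescript
// Sketch only: a fixed-window, per-user quota for video analyses.
// VideoQuota is a hypothetical name; state is in-memory for illustration.

interface QuotaEntry {
  windowStart: number; // epoch milliseconds when the current window began
  count: number; // analyses consumed in the current window
}

class VideoQuota {
  private readonly usage = new Map<string, QuotaEntry>();
  private readonly maxPerWindow: number;
  private readonly windowMs: number;

  constructor(maxPerWindow: number, windowMs: number) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  // Returns true and records the use if the user is under quota, else false.
  tryConsume(userId: string, now: number = Date.now()): boolean {
    let entry = this.usage.get(userId);
    // Start a fresh window for new users or when the old window has expired.
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      entry = { windowStart: now, count: 0 };
      this.usage.set(userId, entry);
    }
    if (entry.count >= this.maxPerWindow) return false;
    entry.count += 1;
    return true;
  }
}
```

For example, `new VideoQuota(3, 24 * 60 * 60 * 1000)` would allow three video analyses per user per day; when `tryConsume` returns false, the app can show an upgrade prompt instead of calling Gemini.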
What this combo does NOT cover
- Image generation: analysis only, no text-to-image
- Real-time camera inference: calls round-trip through Gemini, ~1-3s latency
- On-device vision: llama.rn is text-only; vision requires a cloud provider
- Whisper voice processing: uses expo-speech-recognition (native) instead
Get this combo
Ships in AI Pro tier ($199). The two analyser screens, the prompts directory, and the Zod schemas are all ready to extend.