Multimodal AI + RevenueCat + Supabase in a React Native boilerplate

Does AI Mobile Launcher support this combo?

Yes. The AI Pro tier ships a multimodal analyser for audio, image, and video, all handled by Gemini through a single REST endpoint. Two screens ship ready to use: AiAnalyserInputScreen (camera / mic / video picker) and AiAnalyserResultScreen (formatted analysis output). Prompts live in src/features/ai-analyser/prompts/. Speech-to-text is handled natively by expo-speech-recognition, not by a cloud Whisper call.

The stack

React Native 0.83.6 + Expo SDK 55.0.17
Gemini multimodal: src/features/gen-ui/api/gemini.api.ts (image + audio + video in one call)
Analyser services + prompts: src/features/ai-analyser/services/ and src/features/ai-analyser/prompts/
Camera: expo-camera
Audio: expo-av + expo-speech-recognition
Video: expo-video + expo-image-picker
RevenueCat + Supabase wired as default

Setup in five steps

1. Clone AI Pro

git clone <ai-pro-tier-repo>
cd ai-mobile-launcher
pnpm install

2. Gemini key

# .env
GOOGLE_GEMINI_API_KEY=AIza...
GEMINI_MODEL=gemini-2.0-flash  # multimodal-capable

3. Customize the analyser prompts

// src/features/ai-analyser/prompts/image.prompt.ts
// Replace with your domain-specific prompt:
// "Identify the plant species in this image. Return JSON: { species, confidence, careTips[] }."
// Zod schema in src/features/ai-analyser/schemas/ validates the parse.
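A domain-specific prompt module could look roughly like the sketch below. It reuses the plant example from the comment above; the names PlantAnalysis and buildImagePrompt and the locale parameter are illustrative assumptions, not the boilerplate's actual exports.

```typescript
// Hypothetical contents for image.prompt.ts; the real file's exports may differ.

// Shape the prompt asks Gemini to return (taken from the plant example above).
interface PlantAnalysis {
  species: string;
  confidence: number; // assumed here to be a 0..1 score
  careTips: string[];
}

// Builds the instruction text sent alongside the image.
// `locale` is an illustrative parameter, not part of the boilerplate.
function buildImagePrompt(locale: string = "en"): string {
  return [
    "Identify the plant species in this image.",
    "Return JSON only, with no surrounding prose:",
    "{ species: string, confidence: number, careTips: string[] }.",
    `Write careTips in language "${locale}".`,
  ].join(" ");
}
```

Keeping the expected JSON shape in the prompt text and in a matching type makes it easier to keep the prompt and the validation schema in step.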

4. Supabase + RevenueCat env

# .env
EXPO_PUBLIC_SUPABASE_URL=https://<project>.supabase.co
EXPO_PUBLIC_SUPABASE_ANON_KEY=<anon-key>
EXPO_PUBLIC_REVENUECAT_IOS_KEY=appl_...
EXPO_PUBLIC_REVENUECAT_ANDROID_KEY=goog_...
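A small startup check can catch a missing key before the first paywall or login call fails. The sketch below is an assumption about how such a check might be written; findMissingEnv and pickRevenueCatKey are hypothetical helpers, and only the variable names come from the .env snippet above.

```typescript
// Names copied from the .env snippet above.
const REQUIRED_ENV = [
  "EXPO_PUBLIC_SUPABASE_URL",
  "EXPO_PUBLIC_SUPABASE_ANON_KEY",
  "EXPO_PUBLIC_REVENUECAT_IOS_KEY",
  "EXPO_PUBLIC_REVENUECAT_ANDROID_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function findMissingEnv(env: Env): string[] {
  return REQUIRED_ENV.filter((name) => (env[name] ?? "").trim() === "");
}

// Picks the RevenueCat key for the given platform.
function pickRevenueCatKey(platform: "ios" | "android", env: Env): string | undefined {
  return platform === "ios"
    ? env.EXPO_PUBLIC_REVENUECAT_IOS_KEY
    : env.EXPO_PUBLIC_REVENUECAT_ANDROID_KEY;
}
```

Expo inlines EXPO_PUBLIC_ variables only when they are read statically (process.env.EXPO_PUBLIC_SUPABASE_URL, and so on), so the object passed to these helpers should be built from explicit property reads rather than by passing process.env itself.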

5. Real-device dev build

eas build --profile development --platform ios
# Camera + microphone require a real device. The iOS Simulator has no camera.

Why this combo works

Many of the most successful AI mobile apps are camera-based: calorie trackers, plant ID, dermatology pre-screening, document scanning, receipt OCR, fitness form checks, a study buddy that reads a textbook page. Text-only AI chat is saturated on mobile; the camera is where the next consumer wave is.

Gemini takes an image plus a prompt in one REST call, then returns structured output. With OpenAI you wire Vision separately and pay more per call. With Gemini Flash, a typical "analyse this photo" round-trip costs around $0.0005, which is well below the ad-floor for a free-trial conversion.
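To make the "image plus prompt in one REST call" point concrete, here is a minimal sketch of how such a request to the public Gemini generateContent endpoint might be assembled. The function name, its parameters, and the GeminiRequest type are illustrative assumptions; the boilerplate's own gemini.api.ts may be structured differently.

```typescript
// Minimal request description so the sketch runs without DOM or Node typings.
interface GeminiRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a generateContent request carrying one prompt and one base64 image.
// Function and parameter names are illustrative, not the boilerplate's API.
function buildImageAnalysisRequest(
  apiKey: string,
  model: string,
  prompt: string,
  imageBase64: string,
  mimeType: string = "image/jpeg",
): GeminiRequest {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: prompt },
            { inline_data: { mime_type: mimeType, data: imageBase64 } },
          ],
        },
      ],
      // Ask for JSON so the response can go straight to schema validation.
      generationConfig: { responseMimeType: "application/json" },
    }),
  };
}
```

The result can be passed to fetch as fetch(req.url, { method: req.method, headers: req.headers, body: req.body }). The generated text normally arrives under candidates[0].content.parts[0].text in the response JSON and should be validated before use.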

Zod schemas guard the parse. The analyser will not crash when the LLM returns malformed JSON. Instead, it falls back to a structured error that you can display, then offers a retry. That is the difference between a demo and a feature you ship.
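The parse-guard pattern can be sketched as follows. The boilerplate uses Zod for this; to keep the example dependency-free, a hand-written check stands in for the schema here, and the PlantAnalysis shape, ParseResult type, and parsePlantAnalysis function are all assumptions for illustration.

```typescript
interface PlantAnalysis {
  species: string;
  confidence: number;
  careTips: string[];
}

// Either validated data or a structured error the UI can show with a retry button.
type ParseResult =
  | { ok: true; data: PlantAnalysis }
  | { ok: false; error: string; retryable: true };

function fail(error: string): ParseResult {
  return { ok: false, error, retryable: true };
}

// Never throws: malformed JSON and wrong shapes both become { ok: false }.
function parsePlantAnalysis(raw: string): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return fail("Response was not valid JSON");
  }
  if (typeof value !== "object" || value === null) {
    return fail("Response was not a JSON object");
  }
  const v = value as Record<string, unknown>;
  const { species, confidence, careTips } = v;
  if (typeof species !== "string") return fail("species must be a string");
  if (typeof confidence !== "number" || !Number.isFinite(confidence)) {
    return fail("confidence must be a finite number");
  }
  if (!Array.isArray(careTips) || !careTips.every((t) => typeof t === "string")) {
    return fail("careTips must be an array of strings");
  }
  return { ok: true, data: { species, confidence, careTips } };
}
```

With Zod, the same idea is usually expressed with a schema and its safeParse method, which likewise returns a success flag instead of throwing.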

What it costs at scale

Line item (monthly, USD)	1K MAU	100K MAU
Gemini Flash (5 image analyses/user/mo)	~$2.50	~$250
RevenueCat	$0	~$200
Supabase Pro (image bytes briefly cached)	$25	~$150
Total (excl. store fees)	~$28	~$600
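The Gemini row follows directly from users × analyses per user × cost per call. A minimal sketch of that arithmetic, assuming the ~$0.0005 per-photo figure quoted earlier (function names are illustrative):

```typescript
// Per-call figure quoted in this article for a Gemini Flash photo analysis.
const COST_PER_IMAGE_USD = 0.0005;

// Monthly Gemini spend: active users x analyses per user x cost per call.
function monthlyGeminiCostUsd(
  mau: number,
  analysesPerUser: number,
  costPerCall: number = COST_PER_IMAGE_USD,
): number {
  return mau * analysesPerUser * costPerCall;
}

// Sums the line items of one column of the table.
function monthlyTotalUsd(lineItems: number[]): number {
  return lineItems.reduce((sum, item) => sum + item, 0);
}
```

With 5 analyses per user this gives about $2.50 at 1,000 MAU and about $250 at 100,000 MAU. Adding the RevenueCat and Supabase rows gives roughly $27.50 (shown as ~$28) and $600.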

Video analysis is the wild card. Per-second pricing applies, so a 30-second clip costs roughly 10x a still photo. Rate-limit aggressively or expect a surprise bill.
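One way to rate-limit is a per-user monthly budget in which a video consumes more units than a photo. The sketch below is a hypothetical in-memory version; the 1:10 weighting mirrors the rough estimate above, and a production counter would need to live server-side (for example in a Supabase table) and reset each month, which is not shown.

```typescript
type MediaKind = "image" | "video";

// Relative cost weights: a 30-second clip is estimated at ~10x a still photo.
const COST_UNITS: Record<MediaKind, number> = { image: 1, video: 10 };

// Creates an in-memory budget tracker with a fixed allowance per user.
function createUsageBudget(monthlyUnits: number) {
  const used = new Map<string, number>();
  return {
    // Records the usage and returns true, or returns false if it would exceed the budget.
    tryConsume(userId: string, kind: MediaKind): boolean {
      const next = (used.get(userId) ?? 0) + COST_UNITS[kind];
      if (next > monthlyUnits) return false;
      used.set(userId, next);
      return true;
    },
    // Units the user can still spend.
    remaining(userId: string): number {
      return monthlyUnits - (used.get(userId) ?? 0);
    },
  };
}
```

Calling tryConsume before each analysis request lets the app show an upgrade prompt or a "limit reached" message instead of sending the call to Gemini.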

What this combo does NOT cover

Image generation: analysis only, no text-to-image
Real-time camera inference: calls round-trip through Gemini, ~1-3s latency
On-device vision: llama.rn is text-only; vision requires a cloud provider
Whisper voice processing: uses expo-speech-recognition native instead

Get this combo

Ships in the AI Pro tier ($199). The two analyser screens, the prompts directory, and the Zod schemas are all ready to extend.


Related combos

Gemini + RevenueCat + Supabase

Same Gemini key, text-first features.

OpenAI + RevenueCat + Supabase

If you need GPT-4o Vision instead of Gemini.