Description
Endemic runs large language models on your iPhone, iPad, and Mac. No API keys. No cloud. The model lives on your device and so do your conversations.
Download a model from the curated catalog, then chat offline from that point on. The catalog filters by your device's RAM so you only see models that actually fit your hardware.
Ships with Qwen 3.5 and Google's Gemma 4 in sizes from 0.8B (any recent iPhone) up to 9B (Mac and iPad Pro with 16GB+). Pick the family you prefer and switch any time.
QWEN 3.5 AND GEMMA 4, ON DEVICE
Qwen 3.5 is the included family and a strong all-rounder, from the tiny 0.8B that runs anywhere to the 9B quality pick for 16GB+ devices. Google's Gemma 4 is the newer flagship family, with full chat templating and function calling running locally in Endemic. The 4B Gemma is a solid everyday model on a modern phone; the 9B is the quality pick on iPad Pro and Mac. Both families work with Local Web Search and Local Web Browser, so you can ask the model to look something up and it actually does.
Gemma 4 is part of Endemic Pro, a small monthly subscription that also unlocks conversation folders and helps fund continued development by one person. Qwen 3.5 is included free. You can try Pro, cancel any time, and your downloaded models keep working.
WEB SEARCH AND BROWSING, ON DEVICE
Endemic can search the web and read web pages for you, right from the conversation. Ask it to look something up, check a site, or get current information, and the model calls local tools to fetch live results, then summarizes what it finds. The search and page fetch happen on your device through your network connection. No proxy, no Folding Sky server in the middle.
Enable Local Tools in settings and the model gains searchWeb and openWebPageLocally capabilities. Tool calls and results appear inline in the conversation so you can see exactly what the model did.
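For the technically curious, here is a minimal sketch of how an inline tool call like this can work. It is an illustration, not Endemic's actual code: only the tool names searchWeb and openWebPageLocally come from the app, while the argument names, the stub handlers (which do no real networking), and the transcript format are assumptions made for the example.

```python
import json
from typing import Callable, Dict, List

# Stub handlers standing in for the real tools. They perform no network
# access; the argument names (query, url) are assumptions for illustration.
def search_web(query: str) -> str:
    return f"[stub] results for {query!r}"

def open_web_page_locally(url: str) -> str:
    return f"[stub] text of {url}"

# Registry mapping the tool names the model may emit to local handlers.
TOOLS: Dict[str, Callable[..., str]] = {
    "searchWeb": search_web,
    "openWebPageLocally": open_web_page_locally,
}

def run_tool_call(call: dict, transcript: List[dict]) -> str:
    """Run one model-issued tool call and record both the call and its
    result in the transcript, so the user can see what happened."""
    name = call["name"]
    args = call.get("arguments", {})
    transcript.append({"role": "tool_call", "content": json.dumps(call)})
    handler = TOOLS.get(name)
    if handler is None:
        result = f"error: unknown tool {name!r}"
    else:
        result = handler(**args)
    transcript.append({"role": "tool_result", "content": result})
    return result
```

Recording the call before running it means the conversation still shows what the model attempted even if the tool fails.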
IMAGES IN, ANSWERS OUT
Vision-capable local models can now read images you attach to a message. Pick a photo, screenshot, or diagram and ask about it. The image stays on your device and the model describes, transcribes, or reasons about what it sees, all offline.
WHAT LOCAL INFERENCE ACTUALLY MEANS
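The model runs on your CPU and GPU. No request leaves your device to generate a response; the only network traffic the model can trigger is the web search and page fetching you turn on with Local Tools. This is not "private cloud" or "encrypted API calls." The computation happens on the hardware in your hand.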
The tradeoff is real: a 4B model on a phone produces useful results for many tasks, but it will not match a frontier model on datacenter hardware. If you need the best possible responses, cloud AI is the answer. If you want a model you own on a device you own, this is it.
MODELS AND DEVICES
Endemic detects your hardware and recommends the strongest model that fits. An iPhone with 8GB of RAM runs the 4B comfortably. iPads and Macs with 16GB+ handle the 9B. The 0.8B fits anything and responds fast.
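As a rough sketch of the idea, filtering a catalog by RAM and picking the largest model that fits could look like the following. The thresholds mirror the figures quoted above (8GB for the 4B, 16GB for the 9B, no floor for the 0.8B); the data structure and function names are hypothetical, and the app's real selection logic may weigh other factors.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class CatalogModel:
    name: str
    params_billions: float
    min_ram_gb: int  # illustrative threshold taken from the description

# Hypothetical catalog; sizes and RAM floors follow the text above.
CATALOG = [
    CatalogModel("0.8B", 0.8, 0),   # "fits anything"
    CatalogModel("4B", 4.0, 8),     # comfortable on an 8GB iPhone
    CatalogModel("9B", 9.0, 16),    # iPads and Macs with 16GB+
]

def models_that_fit(ram_gb: int, catalog: List[CatalogModel] = CATALOG) -> List[CatalogModel]:
    """Return only the models whose RAM floor the device meets."""
    return [m for m in catalog if m.min_ram_gb <= ram_gb]

def recommend(ram_gb: int, catalog: List[CatalogModel] = CATALOG) -> Optional[CatalogModel]:
    """Pick the largest model that fits, or None if nothing does."""
    fitting = models_that_fit(ram_gb, catalog)
    return max(fitting, key=lambda m: m.params_billions) if fitting else None
```

Under these assumed thresholds, a device with 8GB would be offered the 0.8B and 4B, with the 4B recommended.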
Models are open-weight GGUF files downloaded from public hosting. Each model downloads once, is stored locally, and is excluded from iCloud backup so it does not eat into your storage quota.
PRIVACY
There is no Folding Sky backend. No analytics on your conversations. Conversations sync through your personal iCloud if you want, and nowhere else.
Built by one person in Beaverton, Oregon.
Privacy Policy: https://folding-sky.com/privacy
Terms of Service: https://folding-sky.com/terms
What's new (v1.3.0)
Bug fixes and improvements.