Chat with AI models privately and entirely on your device. LlamaChat Local lets you download and run open-source large language models like Llama, Phi, Gemma, and Mistral — all powered by llama.cpp with Metal GPU acceleration for fast, efficient inference.
Your conversations never leave your iPhone. Once a model is downloaded, all AI processing happens on-device with no network calls. Conversations are saved locally so you can pick up right where you left off.
Features:
- Browse and download GGUF models directly from Hugging Face
- Run Llama 3.2 1B, Llama 3.2 3B, Phi-3.5 Mini, Gemma 2 2B, and more
- Real-time token streaming with tokens/second display
- Adjustable temperature, context window (512 to 8K tokens), and system prompt
- Track download progress and manage local model storage
- Dark mode interface with clean, modern design
- No account required — just download a model and start chatting
LlamaChat Local is perfect for anyone who values privacy, wants to experiment with open-source AI models, or needs an offline AI assistant.