NoemaAI

Productivity

Free

by Alexandru Stamate

4.2 (15) · v2.2 · 106 MB · Universal · 12+

Description

Noema brings large language model intelligence to all your devices, with full offline support. Download lightweight models directly from Hugging Face, connect supported remote endpoints, and pair models with curated textbooks and your own PDFs or EPUBs. The privacy-first design means your data never leaves your device when running locally, whether you are on iPhone, Mac, or a visionOS device.

- Native macOS app: Run the full Noema experience on your desktop with a rebuilt interface that feels at home on macOS.
- visionOS support: Use Noema in spatial computing environments, with windows you can place around your workspace.
- Noema Relay: Connect your iPhone to your Mac via CloudKit, with no local Wi-Fi required, so one device can host a model while another becomes the client.
- Vision support for models: Attach photographs to your prompts and use multimodal models for on-device image understanding and analysis.
- Open Textbook Library integration: Browse and import entire textbooks through the built-in Explore view; Noema indexes them locally so you can search and retrieve relevant passages on demand.
- Bring your own data: Add personal documents in PDF or EPUB formats, which are embedded and indexed on-device to power retrieval-augmented generation.
- Integrated Hugging Face search: Discover and install quantized models from the Hugging Face Hub with one-tap installation, automatic dependency management, and real-time download progress.
- Remote model support: Connect to supported remote endpoints including OpenRouter and LM Studio, with updated LM Studio REST v1 compatibility and a smoother model download flow through Explore.
- Expanded model runtime support: Run GGUF, MLX, ExecuTorch, and CoreML models, along with Apple Foundation Model support, giving you flexible on-device options across Apple hardware.
- RAM check and model size helper: A built-in advisor estimates each model’s memory footprint and shows when it fits your device’s budget; it can also estimate the maximum context length that fits in RAM.
- Advanced settings for power users: Fine-tune context length, quantization, and GPU acceleration; enable tool calling for built-in search and other functions; and customize model parameters for optimal performance.
- Built-in tool calling and Python support: Use integrated tools, including Python, to extend model capabilities for more advanced workflows.
- Built-in search and RAG: Use integrated search tools and retrieval-augmented generation to query your data without hitting context limits.
- Localization upgrades: Experience Noema in 10 languages, so international teams can work in the interface that suits them best.
- Private and offline by default: Local models run entirely on-device, and your conversations and files stay on your device unless you choose to use a connected remote provider.
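
To make the retrieval idea behind the "Bring your own data" and "Built-in search and RAG" features concrete, here is a minimal sketch of on-device passage retrieval. It is not Noema's implementation: it assumes a toy bag-of-words similarity in place of a real embedding model, and the names `chunk`, `cosine`, and `retrieve` are hypothetical helpers introduced only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphanumeric runs as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, size=40):
    # Split a document into passages of at most `size` words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    # Rank passages by similarity to the query and return the top k.
    # Assumption: word counts stand in for learned embeddings here.
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]
```

Only the top-ranked passages would be placed in the prompt, which is how retrieval lets a model answer questions about a long document without loading all of it into the context window.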

What's new (v2.2)

• Updated llama.cpp for Gemma 4 support, including fixes for previously known Gemma 4 issues
• Improved GGUF import reliability, with better detection for chat templates, JSON configs, and multimodal projector files
• Added a clearer download experience for CML models, including visible progress during downloads
• Improved smart retrieval so large-context models make better use of available context with PDFs and long documents
• Added a new Prompt Processing card in chat with live progress feedback
• Fixed prompt processing progress getting stuck at 0% and corrected its placement after tool calls
• Fixed scrolling issues in Model Settings caused by repeated memory fit checks
• Updated VRAM estimates and maximum context recommendations to reflect KV cache quantization changes
• Added support for Memory and system prompt customization
• Refreshed curated models with Gemma 4 and Qwen 3 1.7B support
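
As a rough illustration of why KV cache quantization changes the maximum context recommendation, the sketch below uses the standard transformer KV cache size formula (keys and values, per layer, per token). It is an assumption-laden estimate, not Noema's advisor: the function names are hypothetical, the model shape in the usage note is invented, and overheads such as quantization scales, activations, and runtime buffers are ignored.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # Each layer stores one key and one value vector per token,
    # each holding n_kv_heads * head_dim elements.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context(ram_budget_bytes, weights_bytes, per_token_bytes):
    # Tokens that fit in the memory left over after loading the weights.
    free = ram_budget_bytes - weights_bytes
    return max(0, int(free // per_token_bytes))


GIB = 1024 ** 3

# Hypothetical model shape: 28 layers, 8 KV heads, head dimension 128.
f16_per_token = kv_bytes_per_token(28, 8, 128, 2)  # 16-bit cache
q8_per_token = kv_bytes_per_token(28, 8, 128, 1)   # roughly 8-bit cache

# Hypothetical device: 4 GiB budget, 2 GiB taken by model weights.
ctx_f16 = max_context(4 * GIB, 2 * GIB, f16_per_token)
ctx_q8 = max_context(4 * GIB, 2 * GIB, q8_per_token)
```

Under these assumptions, `ctx_f16` is 18724 tokens and `ctx_q8` is 37449 tokens: halving the bytes per cached element roughly doubles the context that fits in the same budget.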