# Local AI Models Now Run Faster in Ollama on Apple Silicon Macs
If you’re unfamiliar with [Ollama](https://ollama.com/), it’s an app for Mac, Linux, and Windows that lets you run AI models locally on your own machine. Unlike cloud-based services such as ChatGPT, which require an internet connection, Ollama loads and runs models directly on your device. Models can be downloaded from open-source hubs such as Hugging Face, or straight from the model provider.
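Once a model is loaded, Ollama exposes it through a local HTTP API (by default on `localhost:11434`), so any script on the machine can query it without touching the cloud. As a minimal sketch, here is how a request to Ollama’s documented `/api/generate` endpoint could be built in Python; the model name `llama3.2` is just an example of a model available in the Ollama library:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint.

    "model", "prompt", and "stream" are fields from Ollama's API docs;
    stream=False asks for a single JSON response instead of a stream.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("llama3.2", "Why might local inference matter for privacy?")

# Actually sending the request requires a running Ollama server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Because everything stays on `localhost`, prompts and responses never leave the machine, which is the core appeal of running models locally.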
Running a large language model (LLM) locally can still be a challenge, though, since even smaller, lightweight LLMs typically demand substantial RAM and GPU memory. To address this, Ollama has released a preview version (Ollama 0.19) of its app built on Apple’s machine learning framework, MLX, which takes advantage of Apple silicon’s unified memory architecture so that local AI models run faster on those Macs.
The result is a notable speedup for Ollama across all Apple silicon Macs. On Apple’s M5, M5 Pro, and M5 Max chips, Ollama also uses the new GPU Neural Accelerators to improve both time to first token (TTFT) and generation rate (tokens per second). With this update, Ollama says it is now faster to run personal assistants like OpenClaw, as well as coding agents such as Claude Code, OpenCode, or Codex.
The caveat is that Ollama recommends a Mac with more than 32GB of unified memory, which may rule out many people who would otherwise be eager to run LLMs locally.
To learn more about Ollama, [visit its website](https://ollama.com/). For more on Apple’s MLX project, the details are [here](https://opensource.apple.com/projects/mlx/).