# Getting Started with gpt-oss on Macs

This week, OpenAI released its highly anticipated open weight model, gpt-oss. One of the key attractions of gpt-oss is that it can run locally on your own hardware, including Macs with Apple silicon. Here's how to get started and what to expect.

## Models and Macs

gpt-oss comes in two versions: gpt-oss-20b and gpt-oss-120b. The former is the medium-sized open weight model, while the latter is the much heavier of the two.

The medium model is the one that Macs with sufficient resources and Apple silicon can expect to run locally. What's the difference? Expect the smaller model to produce more inaccuracies than the much larger model because of its smaller parameter count (roughly 20 billion versus 120 billion, as the names suggest). That's the trade-off for a faster model that can actually run on high-end Macs.

Nonetheless, the smaller model is an impressive tool that’s freely accessible if you possess a Mac with adequate resources and a desire to run large language models locally.

You should also keep in mind the differences between running a local model and using, say, ChatGPT. The open weight local model typically lacks many of the modern chatbot features that make ChatGPT easier to use. For example, responses don't draw on web results, which often helps reduce inaccuracies.

OpenAI recommends at least 16GB of RAM to run gpt-oss-20b, but Macs with more RAM will certainly perform better. Based on early user reports, 16GB of RAM is really the bare minimum just to experiment.
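Not sure how much memory your Mac has? You can check from Terminal before you download anything; this is a standard macOS command, nothing Ollama-specific:

```
# Report the installed memory on this Mac
system_profiler SPHardwareDataType | grep -i "memory"
```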

## Setup and Use

Getting started is incredibly straightforward.

First, install Ollama on your Mac. This serves as the interface for interacting with gpt-oss-20b. The app can be found at [ollama.com/download](https://ollama.com/download), or you can download the Mac version from [this download link](https://ollama.com/download/Ollama.dmg).
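If you manage your software with Homebrew, you can likely install the app that way instead. This assumes Homebrew is already set up and that the cask name hasn't changed:

```
# Install the Ollama app via Homebrew; note that the plain formula
# (`brew install ollama`) installs only the command-line tool
brew install --cask ollama
```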

Next, launch Terminal on your Mac and enter the following commands:

```
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
```

This will prompt your Mac to retrieve gpt-oss-20b, which occupies approximately 15GB of disk space.
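To confirm the download finished and see how much space the model actually occupies, you can list your locally installed models:

```
# List installed models along with their size on disk
ollama list
```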

Lastly, open the Ollama app and choose gpt-oss-20b as your model. You can even toggle Ollama into airplane mode in the app settings to ensure that everything runs locally. No sign-in is necessary.

To test out gpt-oss-20b, simply type a prompt into the text box and observe the model in action. Again, your hardware resources will determine model performance here. Ollama will utilize all available resources while running the model, meaning your Mac may become noticeably slower during processing.
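You don't have to use the chat app, either. Ollama exposes a local HTTP API (on port 11434 by default), so you can send a prompt straight from Terminal. Here's a minimal sketch using curl:

```
# Send a one-off prompt to the locally running model via Ollama's REST API
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Who was the 13th president of the United States?",
  "stream": false
}'
```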

My best Mac is a 15-inch M4 MacBook Air with 16GB of RAM. While the model works, it’s quite demanding even for experimental use on my device. Responding to ‘hello’ took just over five minutes. Answering ‘who was the 13th president’ took even longer, at roughly 43 minutes. You’ll definitely want more RAM if you intend to experiment for an extended period.
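If you want to see what the model is costing you while it runs, Ollama can report its currently loaded models and their memory footprint:

```
# Show loaded models, their memory use, and CPU/GPU split
ollama ps
```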

Thinking about removing the local model to free up disk space? Enter this terminal command:

```
ollama rm gpt-oss:20b
```
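By default, Ollama keeps downloaded models under ~/.ollama/models. To confirm the space was actually reclaimed, you can check that folder's size (assuming the default storage location):

```
# Report the disk space still used by Ollama's model directory
du -sh ~/.ollama/models
```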

For further details on using Ollama with gpt-oss-20b on your Mac, refer to this [official resource](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama). Alternatively, you might consider [LM Studio](https://lmstudio.ai/download), another Mac app for working with AI models.