# OpenAI Launches “Strawberry” AI Model: A New Dawn for Reasoning and Problem-Solving?

On Thursday, OpenAI unveiled its eagerly awaited “Strawberry” AI language model, officially designated **OpenAI o1**. The new model family, which launches with two variants, **o1-preview** and **o1-mini**, promises considerable improvements in reasoning and problem-solving over its predecessors. The models are now available to **ChatGPT Plus** subscribers and select API users, marking a new phase in the development of large language models (LLMs).

## What’s Different in OpenAI o1?

OpenAI asserts that **o1-preview** outperforms its predecessor, **GPT-4o**, in several key domains, including competitive programming, mathematics, and scientific reasoning. According to OpenAI, the model scores in the **89th percentile** on competitive programming challenges from [Codeforces](https://codeforces.com/contests/) and achieves **83%** on a qualifying exam for the **International Mathematical Olympiad**, a steep jump from GPT-4o’s 13%.

Despite these gains, early users have observed that o1-preview does not consistently beat GPT-4o across all metrics. Some have also noted slower response times, a byproduct of the model’s new reasoning process, which works through problems step by step before producing an answer.

### A Fresh Perspective on Reasoning

The standout innovation in OpenAI o1 is its improved reasoning. OpenAI credits the gains to a new **reinforcement learning (RL)** training strategy that encourages the model to “think through” problems before responding, similar to the **chain-of-thought prompting** technique that has been shown to improve outputs in other LLMs. The model is trained to try different approaches, recognize its own mistakes, and refine its answers accordingly.
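For readers unfamiliar with the technique, the sketch below shows what chain-of-thought prompting looks like from the outside: a plain request to a chat model, with the prompt explicitly asking for step-by-step reasoning. The prompt text and model choice here are illustrative assumptions, not OpenAI’s training setup; o1 performs this kind of stepwise reasoning internally without being asked.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Chain-of-thought prompting: ask the model to show its intermediate
# steps before committing to a final answer. The question is made up.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": (
                "A train covers 120 km in 1.5 hours, then 80 km in 1 hour. "
                "What is its average speed for the whole trip? "
                "Think step by step before giving the final answer."
            ),
        }
    ],
)
print(response.choices[0].message.content)
```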

This training approach proves particularly effective for tasks that require planning and multi-step reasoning, such as intricate programming challenges or scientific questions. However, these advances do not make o1 a “miracle model” that eclipses earlier versions in every respect, as OpenAI product manager **Joanne Jang** cautioned in a [tweet](https://x.com/joannejang/status/1834286774140498368).

## Benchmark Performance: Remarkable Yet Not Without Challenges

OpenAI has presented several benchmark results to showcase o1’s strengths. Beyond its strong showing in competitive programming and mathematics, the company claims that o1 performs on par with **PhD students** on certain tasks in physics, chemistry, and biology. These claims will likely face scrutiny as independent researchers and users run their own evaluations.

It is worth noting that AI benchmarks are frequently criticized as unreliable and easy to game. Earlier this year, research out of MIT [questioned](https://www.fastcompany.com/91073277/did-openais-gpt-4-really-pass-the-bar-exam) some benchmark claims related to GPT-4, suggesting that OpenAI’s performance figures should be treated with skepticism until independently validated.

## o1-Mini: A Budget-Friendly Option

In addition to o1-preview, OpenAI has also rolled out **o1-mini**, a smaller, cheaper version of the model. Priced roughly **80% lower** than o1-preview, o1-mini is optimized for coding tasks, offering a cost-effective option for developers and companies that need AI-driven programming support.
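Assuming o1-mini is reachable through the standard Chat Completions endpoint of the official `openai` Python SDK (the prompt below is a made-up example), a typical call might look roughly like this:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical coding request; at launch, the o1 models reportedly
# accepted only user-role messages (no system prompt).
response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that merges two sorted "
                       "lists in O(n) time, with a short docstring.",
        }
    ],
)
print(response.choices[0].message.content)
```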

## A Varied Array of Abilities

While OpenAI has shared several demonstration videos of o1 tackling intricate programming tasks and logic puzzles, one demonstration has drawn attention for its simplicity: counting the number of “R”s in the word “strawberry.” The task, now something of a meme in the AI community, is notoriously difficult for LLMs because they process text as tokens (multi-character chunks) rather than individual letters. o1 nonetheless completed it correctly, illustrating its ability to “think about” its own process and arrive at the right answer.
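The tokenization issue is easy to see first-hand with OpenAI’s open-source `tiktoken` library. The short sketch below prints the token ids for “strawberry” and the text chunk behind each one; the exact split depends on the encoding, but the chunks rarely align with single letters:

```python
import tiktoken

# o200k_base is the public encoding associated with the GPT-4o family;
# using it here is an assumption for illustration.
enc = tiktoken.get_encoding("o200k_base")

tokens = enc.encode("strawberry")
print(tokens)                              # numeric token ids, not letters
print([enc.decode([t]) for t in tokens])   # the text chunk behind each id

# A model "sees" these multi-character chunks, so counting the letter "r"
# requires reasoning across token boundaries rather than a direct lookup.
```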

This seemingly minor demonstration brings to light one of o1’s most captivating abilities: its capacity to engage with tasks demanding a deeper comprehension of language structure, an area where earlier models have often faltered.

### Initial Hands-On Feedback

Early users of o1-preview have shared their impressions online, with many expressing measured optimism about the model’s potential. **Wharton Professor Ethan Mollick**, for instance, [commended](https://x.com/emollick/status/1834282947295461662) the model’s success in tackling challenging problems, especially those requiring planning and multi-step reasoning. However, he also tempered that praise, echoing other early users’ observation that o1 does not outperform GPT-4o on every task.