OpenAI’s ‘Strawberry’ Model Excels at Solving Complex Equations

Blog

On September 12, OpenAI announced a preview of its latest model, OpenAI o1, created to tackle complex tasks such as coding, math problem-solving, and in-depth reasoning. This model marks the first release in the highly anticipated next-generation AI series dubbed “Strawberry.”

Currently, users of ChatGPT Plus, Team users, and developers with OpenAI API usage Tier 5 can access the o1-preview model. They also have the option to use o1-mini—a smaller, quicker version of o1 optimized for coding tasks. OpenAI states that o1-mini is “80% cheaper than o1-preview,” making it an economical choice for applications requiring reasoning without extensive world knowledge.

Additionally, OpenAI announced that both models would be accessible to ChatGPT Enterprise and Edu users starting next week. “We are also planning to provide o1-mini access to all ChatGPT Free users,” the company noted in its announcement.

For more information about o1, which is part of a suite of sophisticated and aligned models, visit: link. Although o1 exhibits remarkable capabilities, it still has limitations and may feel less impressive after prolonged use, as indicated by Sam Altman.

Key Features of o1

OpenAI o1 is designed to take more time to analyze and solve challenging problems. Unlike GPT-4, which primarily enhances language functions, o1 and o1-mini concentrate on scientific applications, coding creation, debugging, and mathematical tasks. A demonstration video showcases the model’s ability to build a playable game reminiscent of the classic Snake games from the 1970s.

OpenAI highlights several potential applications for o1, including:

Assisting health care researchers in annotating cell sequencing data.
Aiding physicists in generating complex mathematical formulas necessary for quantum optics.
Supporting developers in various fields to design and execute multi-step workflows.

In competitive programming scenarios, o1 achieved a score in the 89th percentile on the Codeforces test and ranked among the top 500 students in the U.S. for the USA Math Olympiad qualifier. As expected, o1 takes longer to respond compared to ChatGPT or GPT-4, displaying a loading message to indicate that it is “thinking.”

The o1-preview model has a maximum output capacity of 32k tokens, while o1-mini can generate up to 64k tokens. Tokens range from a single character to a full word, depending on the complexity of the text. Both models currently support text input only, excluding audio and image inputs.

To help developers assess whether o1 fits their needs, OpenAI has created a best practices guide. In the model’s system card, outlining security and red-teaming efforts, o1 received a “medium” safety rating in two categories. The independent research group Apollo Research has noted that o1 possesses the fundamental capabilities for simple in-context scheming, indicating some potential for circumventing oversight processes. However, its advanced reasoning abilities also provide a better understanding of safety protocols.