OpenAI o1-preview has introduced a new set of AI capabilities, tailored to tackle the most complex reasoning tasks in science, coding, and mathematics. Released on September 12th, this series brings a major leap in AI performance, pushing beyond the limitations of earlier models.

The new model is available in both ChatGPT and via the API, offering an initial preview phase with regular updates to fine-tune its performance. Future model updates are already underway, with evaluations in progress for continual refinement and advancement.

What makes OpenAI o1-preview different from anything you’ve seen before

OpenAI o1-preview models are part of a new class of AI designed to handle intricate and demanding tasks by allocating more time to thinking through problems. Unlike its predecessors, o1-preview doesn’t rush to produce answers; it mimics human-like problem-solving, where careful reasoning and multiple strategies are applied before reaching a conclusion.

The new “think first” approach improves the model’s capacity to handle difficult questions and deliver accurate results across various domains.

With the initial release, OpenAI plans to regularly update the models, incorporating feedback and performance data to improve capabilities progressively. Evaluations of these future updates are already in development, making sure each iteration builds on the strengths of the previous one while addressing any gaps or limitations.

OpenAI’s o1-preview thinks deeper and solves harder

OpenAI o1-preview’s core strength lies in its improved problem-solving and reasoning abilities. These models are trained to adopt a methodical approach, considering different angles and possibilities before responding. Capacity to “think deeper” makes them far better-suited for solving complex challenges in science, coding, and mathematics.

  • Performance benchmarking: The o1-preview models show results comparable to PhD students when tested on rigorous tasks in physics, chemistry, and biology. For instance, in physics problems that typically require deep understanding and analytical thinking, o1-preview performs at a level similar to advanced graduate students.
  • Dramatic improvements in Mathematics: In a qualifying exam for the International Mathematics Olympiad (IMO), the o1-preview achieved a score of 83%, a dramatic improvement over GPT-4o’s 13%.
  • Coding prowess: In coding competitions, o1-preview models have been evaluated in Codeforces, a platform known for challenging competitive programming. They ranked in the 89th percentile, pointing out strength in generating and debugging complex code efficiently.

What OpenAI o1-preview can’t do yet but is learning fast

Currently, OpenAI’s o1-preview model doesn’t support features such as web browsing, file uploads, or image handling. These functionalities are key for many use cases, but the focus on complex reasoning tasks makes o1-preview a specialized tool for scenarios where critical thinking and deep analysis are required.

Even with these limitations, however, the model surpasses GPT-4o in handling complex reasoning tasks. Absence of certain features doesn’t detract from its core strength: its ability to solve harder problems with greater accuracy.

Innovative ways OpenAI o1-preview handles safety and keeps your data secure

OpenAI has introduced a new safety training approach that uses the model’s advanced reasoning capabilities to stay aligned with safety and ethical guidelines. This method improves the model’s ability to identify and avoid harmful content for safer interactions with users.

  • Resistance to “jailbreaking”: The o1-preview has shown a marked improvement in resisting unauthorized attempts to bypass its safety measures, known as “jailbreaking.” On one of the most challenging jailbreaking tests, the model scored 84 out of 100, compared to GPT-4o’s score of 22—indicating a higher level of resilience against manipulation, and ultimately reducing the risk of misuse.
  • Strict safety protocols: OpenAI has reinforced its internal governance by collaborating closely with federal agencies and improving oversight through frameworks like the Preparedness Framework. Additional safety measures include advanced “red teaming” (a process where the model is tested rigorously to find vulnerabilities) and oversight from the Safety & Security Committee at the board level.

How global partnerships are helping make OpenAI o1-preview safer

To further secure the models, OpenAI has established formal agreements with AI Safety Institutes in the U.S. and U.K. These partnerships aim to operationalize safety protocols by granting these institutes early access to a research version of the model.

This access allows for rigorous evaluation and testing, identifying potential issues before public release.

The goal here is to develop comprehensive testing protocols that can serve as a standard for future model releases, making sure each new iteration improves in capability and aligns closely with strict safety standards.

Who should be excited about OpenAI o1 and how it’ll impact them

OpenAI o1 models are designed to meet the needs of professionals dealing with complex problem-solving across many different fields:

  • Healthcare research: OpenAI o1 can help researchers analyze and annotate massive datasets, such as cell sequencing data, with increased speed and accuracy.
  • Physics and quantum research: Physicists can use o1 to generate and solve complex mathematical formulas required for advanced research in areas like quantum optics.
  • Software development: Developers can build and debug multi-step workflows more effectively, thanks to the model’s advanced coding and reasoning capabilities.

OpenAI o1-mini is a good pick for budget-friendly AI tasks

OpenAI o1-mini offers a more accessible alternative to the o1-preview model, specifically tailored for coding and targeted reasoning tasks.

As a smaller, faster version, o1-mini is 80% cheaper, making it an attractive option for applications where cost efficiency is essential. While it lacks the broader world knowledge of its bigger sibling, it excels in focused, reasoning-driven tasks, particularly in coding.

Who can start using OpenAI o1 now and what to expect

OpenAI is gradually rolling out access to o1 models:

  • Immediate access for Plus and Team users: ChatGPT Plus and Team users can manually select o1-preview or o1-mini models today, with weekly rate limits set at 30 messages for o1-preview and 50 for o1-mini.
  • Upcoming access for enterprise and edu users: Starting next week, ChatGPT Enterprise and Edu users will also have access to both models.
  • API access for developers: Developers at tier 5 can begin using both models now, with a rate limit of 20 Requests Per Minute (RPM). Note that current API features do not include function calling, streaming, or support for system messages.

What’s next for OpenAI o1 as it gets even more powerful and accessible

OpenAI plans to increase message rates and allow ChatGPT to automatically select the best model for a given prompt. Future updates will introduce browsing, file uploading, and image handling capabilities, expanding the practical applications of the o1 models.

Ongoing development aims to keep the OpenAI o1 series and the GPT series at the cutting edge of AI technology. Anticipated updates will bring new features, improved safety protocols, and broader accessibility, making these models even more versatile and powerful tools for users worldwide.

Tim Boesen

September 18, 2024

6 Min