OpenAI releases GPT-4o Mini for strong performance at a lower cost

OpenAI introduces GPT-4o Mini

OpenAI has released GPT-4o Mini, with the primary goal to make advanced intelligence both accessible and affordable, expanding the range of applications that can leverage AI technology.

GPT-4o Mini achieves this with solid performance, scoring 82% on the Massive Multitask Language Understanding (MMLU) benchmark, and outperforming GPT-4 on the LMSYS leaderboard for chat preferences.

In terms of cost, GPT-4o Mini is competitively priced. Input tokens are priced at just 15 cents per million, while output tokens cost 60 cents per million. This is an order of magnitude more affordable than previous frontier models and over 60% cheaper than GPT-3.5 Turbo, making it a cost-effective solution for businesses looking to integrate AI into their operations without incurring prohibitive expenses.

Powering AI applications with GPT-4o Mini

GPT-4o Mini opens up new possibilities for AI-driven tasks due to its low cost and latency—performing well in scenarios that require chaining or parallelizing multiple model calls, making it a good choice for applications needing high throughput and efficiency.

For instance, developers can call multiple APIs simultaneously, greatly improving the performance of complex systems.

Another key capability of GPT-4o Mini is how it handles large volumes of context, including tasks such as processing a full code base or maintaining extensive conversation histories, which are key for in-depth analyses and complex decision-making processes.

Support and specifications

Currently, GPT-4o Mini supports both text and vision inputs in the API, with future updates planned to include support for text, image, video, and audio inputs and outputs—broadening the scope of potential applications, letting businesses integrate multimodal AI functionalities into their workflows.

The model features a context window of 128K tokens, letting it process large data quantities in a single request. Adding to this, GPT-4o Mini also supports up to 16K output tokens per request, allowing detailed and comprehensive responses.

With knowledge up to October 2023, GPT-4o Mini has relatively up-to-date information, making it quite reliable for contemporary applications.

An improved tokenizer improves the cost-effectiveness of handling non-English text, so that GPT-4o Mini can be used for multilingual environments, broadening its global applicability.

GPT-4o Mini’s benchmark performance

Reasoning tasks: GPT-4o Mini showed proficiency with reasoning tasks, scoring 82.0% on the MMLU benchmark—a major improvement over Gemini Flash, which scored 77.9%, and Claude Haiku, which scored 73.8%. GPT-4o Mini’s can handle complex reasoning tasks more effectively, positioning it as a valuable asset for applications requiring high-level cognitive functions.

Math and coding: GPT-4o Mini does well in mathematical reasoning and coding tasks, as shown by its scores on the MGSM and HumanEval benchmarks. It achieved a score of 87.0% on MGSM, passing Gemini Flash’s 75.5% and Claude Haiku’s 71.7%. On the HumanEval benchmark, which measures coding performance, GPT-4o Mini scores 87.2%, outperforming Gemini Flash at 71.5% and Claude Haiku at 75.9%.

Multimodal reasoning: In multimodal reasoning, GPT-4o Mini scored 59.4% on the MMMU benchmark, beating Gemini Flash’s 56.1% and Claude Haiku’s 50.2%. GPT-4o Mini’s can integrate and reason across multiple types of media, making it a versatile tool for a range of different applications.

Real-world practical applications

One powerful use case is extracting structured data from receipt files, an essential task for businesses that handle large volumes of transactions, such as retail chains and financial institutions.

GPT-4o Mini automates the extraction of data from receipts, reducing manual data entry efforts, increasing accuracy, and speeding up processing times. This is particularly beneficial for expense management systems, where quick and precise extraction of purchase details is a priority for accurate tracking and reporting.

Another major practical application of GPT-4o Mini is generating high-quality email responses based on thread history. GPT-4o Mini’s can understand and respond appropriately to complex email threads helps companies maintain high standards of communication while reducing the workload on human agents. It’s particularly useful for customer support teams, sales departments, and any business unit that relies heavily on email communication.

Collaboration with partners

To fine-tune GPT-4o Mini’s capabilities and understand its practical limitations, OpenAI partnered with leading companies such as Ramp and Superhuman for valuable insights into the model’s performance in real-world scenarios.

Ramp, for instance, used GPT-4o Mini to extract structured data from receipt files, finding it superior to previous models in accuracy and speed. Superhuman used GPT-4o Mini to generate high-quality email responses, improving their service’s responsiveness and effectiveness.

Built-in safety and reliability

Safety is a foundational aspect of GPT-4o Mini’s development process. From pre-training to deployment, OpenAI integrates strict safety measures to make sure the model operates reliably and ethically.

Pre-training safety measures: During the pre-training phase, OpenAI implements strict filtering mechanisms to exclude undesirable content, including filtering out hate speech, adult content, spam, and sites that primarily aggregate personal information.

Post-training safety measures: Post-training, the model’s behavior is aligned with OpenAI’s policies through reinforcement learning with human feedback (RLHF). This typically involves human evaluators providing feedback on the model’s responses, which is then used to fine-tune its behavior.

Expert evaluations: To further improve safety, GPT-4o Mini goes through detailed evaluations by over 70 external experts from fields such as social psychology. Experts assess the model for potential risks and biases, providing insights that then inform safety improvements.

New safety techniques: GPT-4o Mini is the first model to apply OpenAI’s new instruction hierarchy method—a technique that strengthens the model’s resistance to common threats such as jailbreaks, prompt injections, and system prompt extractions.

GPT-4o Mini’s availability, access, and pricing

GPT-4o Mini is now available through multiple APIs, including the Assistants API, Chat Completions API, and Batch API. Developers can integrate GPT-4o Mini into a range of applications right away.

Developers will pay 15 cents per 1 million input tokens and 60 cents per 1 million output tokens. To put this into perspective, 1 million tokens are roughly equivalent to 2,500 pages of a standard book, which is valuable for high-volume applications.

Since the introduction of text-davinci-003 in 2022, the cost per token has dropped by 99%.

Starting today, GPT-4o Mini is accessible to Free, Plus, and Team users of ChatGPT, replacing GPT-3.5. Enterprise users will gain access later this week.

Paul

July 23, 2024

5 Min