OpenAI Launches GPT-5.3-Codex: A Self-Generating AI Model

Introduction

On February 6, 2026, shortly after Anthropic released Claude Opus 4.6, OpenAI unveiled its latest programming model: GPT-5.3-Codex. OpenAI claims this is the world’s most powerful agentic programming model.

Performance

GPT-5.3-Codex achieved state-of-the-art (SOTA) results on SWE-Bench Pro and Terminal-Bench 2.0, improving on GPT-5.2-Codex in agentic capabilities and real-world task evaluations. On Terminal-Bench 2.0, GPT-5.3-Codex scored 11.9 percentage points higher than Claude Opus 4.6.

However, OpenAI reported results on fewer benchmarks, and these overlap very little with the ones reported for Claude Opus 4.6, so the scores serve only as a rough reference point.

Demonstration

To showcase its programming abilities, OpenAI presented a racing game developed by GPT-5.3-Codex. The game features multiple cars racing on eight maps and includes items that are triggered with the spacebar, although the graphics are somewhat basic. We also tested the game and found it quite complete.

Experience link:
Play the Racing Game

OpenAI also revealed that GPT-5.3-Codex played a crucial role in its own creation. The Codex team used early versions of GPT-5.3-Codex to debug model training, manage deployments, diagnose test results, and evaluate performance, which accelerated development of the model.

Features

GPT-5.3-Codex combines the programming capabilities of GPT-5.2-Codex with the reasoning skills and knowledge of GPT-5.2, while running 25% faster.

This means GPT-5.3-Codex can be used not only for programming but also for other software engineering tasks, including debugging, deployment, monitoring, testing, and metrics analysis. It can also help create PowerPoint, Excel, Word, and other documents, with promising results in the cases OpenAI shared.

New Enterprise-Level AI Platform

Alongside GPT-5.3-Codex, OpenAI launched its latest enterprise-level AI platform, Frontier, which supports context sharing, learning from feedback, and continuous improvement, while allowing clear permissions and boundaries.

Currently, GPT-5.3-Codex is available to paid ChatGPT users in the Codex app, CLI, IDE plugins, and on the web. API access will follow later. Frontier is currently limited to select customers, with broader availability expected in the coming months.

Community Reception

Compared to Claude Opus 4.6, GPT-5.3-Codex and Frontier have seen significantly less discussion. Engagement on the release tweets was less than half that of the Claude Opus 4.6 announcement, and many comments expressed skepticism.

Users interested in programming capabilities believe GPT-5.3-Codex still lags behind Claude Opus 4.6 in practical use and safety, while those using OpenAI models for writing and other scenarios feel neglected. This highlights OpenAI’s ongoing challenge in balancing its consumer and business offerings.

Enhanced Bug Fixing and Reporting

OpenAI claims that with GPT-5.3-Codex, its programming tool Codex will evolve from merely writing and reviewing code to an agent capable of completing nearly all tasks a developer or professional can perform on a computer.

In web development, OpenAI showcased two games created by GPT-5.3-Codex: the previously mentioned racing game and a diving game similar to “Dave the Diver”.

GPT-5.3-Codex can autonomously iterate on these games when prompted with general follow-up requests like “fix this bug” or “improve the game”. It also demonstrates a clearer understanding of intent in web development, generating more complete and reasonable defaults for websites.

For example, when creating a homepage for a service called “Quiet KPI”, GPT-5.3-Codex automatically displayed the annual payment plan as a discounted monthly price and generated a carousel component with three different user reviews instead of just one. This made the overall page more complete and closer to a product ready for launch.
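The pricing default described above comes down to simple arithmetic. The sketch below is only an illustration of that calculation, not code from OpenAI’s demo: the function name `annual_as_monthly` and both prices are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def annual_as_monthly(annual_price: Decimal, monthly_price: Decimal) -> tuple[Decimal, int]:
    """Return (effective monthly price on the annual plan, percent saved vs. paying monthly)."""
    # Spread the annual price over 12 months and round to whole cents.
    effective = (annual_price / 12).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # Compare the annual price with twelve payments on the monthly plan.
    saved = (1 - annual_price / (monthly_price * 12)) * 100
    return effective, int(saved.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Hypothetical prices, not taken from the "Quiet KPI" page.
effective, saved = annual_as_monthly(Decimal("192.00"), Decimal("20.00"))
print(f"${effective}/month billed annually (save {saved}%)")
# prints: $16.00/month billed annually (save 20%)
```

Using `Decimal` rather than floats keeps the displayed price exact to the cent, which matters for anything shown on a pricing page.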

Comprehensive Software Lifecycle Support

The work of programmers, designers, product managers, and data scientists extends beyond just writing code. GPT-5.3-Codex supports tasks throughout the entire software lifecycle, including debugging, deployment, monitoring, writing PRDs, editing documents, user research, testing, metrics analysis, and more.

In evaluations like GDPval, GPT-5.3-Codex performed on par with GPT-5.2. On OSWorld-Verified it scored 64.7%, against a human average of around 72%. The OSWorld-Verified result marks a significant improvement over previous GPT models.
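To put the OSWorld-Verified figures in perspective, the short sketch below computes the remaining gap to the human average. It uses only the two numbers quoted above and treats the approximate human figure of 72% as exact, which is an assumption. The helper name `gap_to_baseline` is hypothetical.

```python
def gap_to_baseline(score: float, baseline: float) -> tuple[float, float]:
    """Return (gap in percentage points, score as a percentage of the baseline)."""
    points = round(baseline - score, 1)
    ratio = round(score / baseline * 100, 1)
    return points, ratio


# 64.7 is the reported model score; 72.0 is the approximate human average.
points, ratio = gap_to_baseline(64.7, 72.0)
print(f"{points} percentage points below the human average ({ratio}% of it)")
# prints: 7.3 percentage points below the human average (89.9% of it)
```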

With the release of GPT-5.3-Codex, Codex also introduced a new setting called “work-guided”. When enabled, GPT-5.3-Codex frequently reports key decisions and progress while it works, supports real-time dialogue, questions, and discussion, and continuously explains its reasoning and responds to feedback. This allows human users to manage and supervise multiple agents more efficiently.

OpenAI states that, thanks to improvements in its infrastructure and inference stack, Codex users get a 25% overall speed increase with GPT-5.3-Codex. OpenAI collaborated with NVIDIA during development: the model was designed for, trained on, and deployed on GB200 NVL72 systems.

Conclusion

In this release, OpenAI focused on enhancing AI productivity and deployment capabilities. Whether through the strengthened programming and software engineering capabilities of GPT-5.3-Codex or the Frontier platform aimed at creating effective agents, the goal is to integrate AI into production environments, making it manageable, trustworthy, and scalable.

The challenge remains for OpenAI to maintain developer and user trust while balancing its long-term business strategies with broad consumer impact.