On Thursday, OpenAI launched GPT-5.4, described as “our most capable and efficient frontier model for professional work.” GPT-5.4 is available as a reasoning model (GPT-5.4 Thinking) and a high-performance version (GPT-5.4 Pro), alongside the standard version.
The API version supports context windows up to 1 million tokens, the largest from OpenAI so far.
OpenAI highlighted improved token efficiency, indicating GPT-5.4 solves problems with fewer tokens compared to its predecessor.
The model posts improved benchmark results, achieving record scores on OSWorld-Verified and WebArena Verified, and 83% on OpenAI’s GDPval test for knowledge-work tasks.
According to Mercor CEO Brendan Foody, GPT-5.4 also excelled on Mercor’s APEX-Agents benchmark, which targets professional skills in law and finance.
“GPT-5.4 excels at creating long-horizon deliverables such as slide decks, financial models, and legal analysis,” Foody said, “delivering top performance while running faster and at a lower cost than competitive frontier models.”
OpenAI is continuing its efforts to limit hallucinations and errors: compared to GPT-5.2, GPT-5.4 is 33% less prone to errors in its claims and 18% less likely to contain errors in its responses.
OpenAI has also revamped how the API version of GPT-5.4 manages tool calling. A new system called Tool Search replaces the previous, token-heavy method, making requests faster and cheaper.
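The article does not detail how Tool Search works, but the general idea behind this kind of system can be sketched: rather than sending every tool definition with each request, the client indexes tool definitions and attaches only those relevant to the query. The snippet below is a minimal illustrative sketch using naive keyword matching; all names and the retrieval logic are assumptions for illustration, not OpenAI’s actual API.

```python
# Hypothetical sketch of a tool-search approach: instead of including every
# tool definition in the request context (consuming tokens), rank definitions
# by relevance to the query and send only the best matches.
# Tool names, descriptions, and the search logic are illustrative only.

TOOLS = [
    {"name": "get_weather", "description": "look up current weather for a city"},
    {"name": "send_email", "description": "send an email to a recipient"},
    {"name": "query_db", "description": "run a sql query against internal sales databases"},
]

def search_tools(query: str, tools=TOOLS, top_k: int = 1):
    """Return up to top_k tools ranked by naive keyword overlap with the query."""
    query_words = set(query.lower().split())
    scored = []
    for tool in tools:
        description_words = set(tool["description"].lower().split())
        overlap = len(query_words & description_words)
        scored.append((overlap, tool))
    # Highest-overlap tools first; drop tools with no overlap at all.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for score, tool in scored[:top_k] if score > 0]

# Only the matching tool definition would be attached to the request.
relevant = search_tools("what is the weather in Berlin?")
print([tool["name"] for tool in relevant])  # → ['get_weather']
```

A production system would presumably use embeddings or a dedicated index rather than keyword overlap, but the token saving comes from the same principle: the request carries a handful of relevant definitions instead of the full tool catalog.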
OpenAI also included a new safety evaluation that tests whether the model’s chain-of-thought accurately represents its reasoning through tasks. The evaluation suggests the Thinking version of GPT-5.4 is less likely to be deceptive, indicating that CoT monitoring can serve as an effective safety measure.
