How do the latest models from AI heavy hitters compare? We examine benchmarks, leaderboards, and overall feature sets.
OpenAI released its latest model, GPT-5.5, on April 23, shortly after Anthropic introduced Claude Opus 4.7. Since both come from leading AI labs, it’s worth seeing how the two models stack up.
Spoiler alert: Claude Opus 4.7 excels in advanced coding, whereas GPT-5.5 comes out ahead on most benchmarks.
GPT-5.5 is still being evaluated on all AI leaderboards but is expected to rival Claude Opus 4.7. On verified benchmarks like the Arc Prize, GPT-5.5 surpasses Opus 4.7. On the user-tested Arena leaderboard, Opus 4.7 Thinking holds the top spot, ranking below Opus 4.6 for now. Anthropic’s unreleased Claude Mythos, yet to be ranked, is reportedly superior to Opus 4.7.
In the Epoch Capabilities Index (ECI), GPT-5.4 Pro leads with the highest score, followed by Gemini 3.1 Pro and GPT-5.4.
On benchmarks, GPT-5.5 outshines Opus 4.7 on most tests, based on scores self-reported by OpenAI and Anthropic, plus verified ARC Prize results:
– SWE-Bench Pro: GPT-5.5 scored 58.6 percent; Opus 4.7 scored 64.3 percent
– Terminal-Bench 2.0: GPT-5.5 scored 82.7 percent; Opus 4.7 scored 69.4 percent
– Humanity’s Last Exam: GPT-5.5 scored 40.6 percent; Opus 4.7 scored 31.2 percent
– Humanity’s Last Exam (with tools): GPT-5.5 scored 52.2 percent; Opus 4.7 scored 54.7 percent
– BrowseComp: GPT-5.5 scored 84.4 percent; Opus 4.7 scored 79.3 percent
– GPQA Diamond: GPT-5.5 scored 93.6 percent; Opus 4.7 scored 94.2 percent
– ARC-AGI-1 (Verified): GPT-5.5 (High) scored 94.5 percent; Opus 4.7 (High) scored 92 percent
– ARC-AGI-2 (Verified): GPT-5.5 (High) scored 83.3 percent; Opus 4.7 (High) scored 68.3 percent
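For readers who want to check the head-to-head tally themselves, here is a minimal Python sketch. It only restates the scores listed above; the names `SCORES` and `tally` are made up for this illustration and do not come from either company.

```python
# Scores copied from the list above, in percent: (GPT-5.5, Opus 4.7).
SCORES = {
    "SWE-Bench Pro": (58.6, 64.3),
    "Terminal-Bench 2.0": (82.7, 69.4),
    "Humanity's Last Exam": (40.6, 31.2),
    "Humanity's Last Exam (with tools)": (52.2, 54.7),
    "BrowseComp": (84.4, 79.3),
    "GPQA Diamond": (93.6, 94.2),
    "ARC-AGI-1 (Verified)": (94.5, 92.0),
    "ARC-AGI-2 (Verified)": (83.3, 68.3),
}


def tally(scores):
    """Count benchmark wins per model and record GPT-5.5's margin on each test."""
    wins = {"GPT-5.5": 0, "Opus 4.7": 0}
    margins = {}
    for name, (gpt, opus) in scores.items():
        # Positive margin: GPT-5.5 leads. Negative margin: Opus 4.7 leads.
        margin = round(gpt - opus, 1)
        margins[name] = margin
        if margin > 0:
            wins["GPT-5.5"] += 1
        elif margin < 0:
            wins["Opus 4.7"] += 1
    return wins, margins


if __name__ == "__main__":
    wins, margins = tally(SCORES)
    print(wins)
    for name, margin in margins.items():
        print(f"{name}: {margin:+.1f} points")
```

By this count, GPT-5.5 leads on five of the eight tests and Opus 4.7 on three. The widest gaps are on ARC-AGI-2 (15.0 points in GPT-5.5’s favor) and Terminal-Bench 2.0 (13.3 points), while Opus 4.7’s biggest lead is on SWE-Bench Pro (5.7 points).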
Both models are available through paid subscriptions and via API. GPT-5.5 is accessible to ChatGPT Plus, Pro, Business, and Enterprise users. Through the API, the new model is priced at $5 per 1 million input tokens and $30 per 1 million output tokens. Opus 4.7 is available to Claude Pro and Max customers, and costs $5 per 1 million input tokens and $25 per 1 million output tokens through the API.
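To put the API prices in concrete terms, here is a rough Python sketch that estimates the bill for a given number of tokens. The workload in the example (2 million input tokens, 500,000 output tokens) is hypothetical, as are the names `PRICES_PER_MILLION` and `api_cost`. The sketch also assumes plain list pricing, with no discounts, and the same token counts for both models, which may not hold in practice.

```python
# List prices from the article, in US dollars per 1 million tokens: (input, output).
PRICES_PER_MILLION = {
    "GPT-5.5": (5.00, 30.00),
    "Opus 4.7": (5.00, 25.00),
}


def api_cost(model, input_tokens, output_tokens):
    """Estimate the API cost in dollars at list price (assumes no discounts)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens / 1_000_000) * input_price + (
        output_tokens / 1_000_000
    ) * output_price


if __name__ == "__main__":
    # Hypothetical workload: 2 million input tokens, 500,000 output tokens.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${api_cost(model, 2_000_000, 500_000):.2f}")
```

For that sample workload, the estimate comes to $25.00 for GPT-5.5 and $22.50 for Opus 4.7. Because input pricing is identical, the entire difference comes from output tokens.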
Both AI chatbots offer similar feature sets for research, coding, and creative projects. GPT-5.5 stands out in “agentic coding,” while Opus 4.7 excels in visual intelligence. ChatGPT boasts broader integrations, making GPT-5.5 a preferred choice for everyday professional work.
In conclusion, Claude Opus 4.7 leads for advanced coding, but GPT-5.5 offers a more comprehensive feature set for everyday professional tasks.
Disclosure: Ziff Davis, Mashable’s parent company, has filed a lawsuit against OpenAI for alleged copyright infringement.
