Google has introduced Gemini 2.5 as its most advanced AI model yet, delivering stronger performance across a range of benchmarks.

Gemini 2.5 is a “thinking model,” capable of reasoning through its thoughts before responding, leading to improved accuracy and decision-making, Koray Kavukcuoglu, CTO of Google DeepMind, said. The model builds on techniques such as reinforcement learning and chain-of-thought prompting, enhancing its ability to analyze information, incorporate context, and solve complex problems.
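For readers unfamiliar with chain-of-thought prompting, the sketch below illustrates the general idea using Google's google-generativeai Python SDK. The model id and the prompt are illustrative assumptions, not Google's published training recipe.

```python
# Illustrative chain-of-thought prompting sketch (an assumption for
# illustration, not Google's internal training method): the prompt asks
# the model to reason step by step before committing to an answer.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Model id assumed from the experimental release naming.
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

prompt = (
    "A train departs at 09:40 and arrives at 13:25. "
    "Work through the problem step by step, then state the "
    "total journey time on the final line."
)

response = model.generate_content(prompt)
print(response.text)  # intermediate reasoning followed by the answer
```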
Gemini 2.5 Pro Experimental is the most advanced version, topping the LMArena leaderboard with strong reasoning, coding, and math capabilities. It outperforms other models on benchmarks like GPQA and AIME 2025 and excels in agentic coding tasks, scoring 63.8 percent on SWE-Bench Verified. It can create visually compelling web apps, transform code, and generate executable code from a single-line prompt.
The model maintains native multimodality, supporting text, audio, images, video, and large datasets with a long context window of 1 million tokens (soon expanding to 2 million). It is available in Google AI Studio and the Gemini app for Gemini Advanced users, with Vertex AI integration and pricing plans coming soon.
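As a minimal sketch of what that multimodal, long-context access looks like from Google AI Studio, the snippet below sends an image alongside a text question; the model id and the local file name are assumptions for illustration.

```python
# Minimal multimodal call: text plus an image in a single request.
# Assumes google-generativeai and Pillow are installed and that the
# experimental model id below is enabled for your API key.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

image = Image.open("chart.png")  # hypothetical local file
contents = ["Summarize the trend shown in this chart.", image]

response = model.generate_content(contents)
print(response.text)

# With a 1-million-token context window, entire codebases or document
# sets can travel in one request; count_tokens checks a payload's size.
print(model.count_tokens(contents))
```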
Google has also released a performance comparison of Gemini 2.5 Pro against other AI models, such as OpenAI GPT-4.5, Claude 3.7 Sonnet, Grok 3 Beta, and DeepSeek R1, across multiple benchmarks. Here’s how Gemini 2.5 Pro stacks up against these models:
Reasoning & Knowledge: Gemini 2.5 Pro leads with 18.8 percent on Humanity’s Last Exam (without tools), surpassing OpenAI GPT-4.5 (6.4 percent) and Claude 3.7 Sonnet (8.9 percent).
Science: It scores 84.0 percent on GPQA Diamond, outperforming GPT-4.5 (71.4 percent) but slightly behind Claude 3.7 Sonnet (84.8 percent).
Mathematics: Gemini 2.5 Pro excels with 86.7 percent on AIME 2025 and 92.0 percent on AIME 2024, beating GPT-4.5 (36.7 percent) and Claude 3.7 Sonnet (61.3 percent). However, Grok 3 Beta outperforms it in multiple-attempt settings.
Coding & Code Editing: It performs well on LiveCodeBench v5 (70.4 percent) but is slightly behind OpenAI (74.1 percent). On SWE-bench verified, it scores 63.8 percent, exceeding GPT-4.5 (38.0 percent) but trailing Claude 3.7 Sonnet (70.3 percent).
Factuality: Gemini 2.5 Pro has a strong showing (52.9 percent) on SimpleQA, though GPT-4.5 leads with 62.5 percent.
Multimodal Capabilities: It dominates visual reasoning (MMMU, 81.7 percent) and image understanding (Vibe-Eval, 69.4 percent), benchmarks on which several of the compared models lack multimodal support entirely.
Long Context Understanding: Gemini 2.5 Pro achieves 94.5 percent in MRCR (128k tokens), far surpassing GPT-4.5 (64.0 percent) and OpenAI o3-mini (61.4 percent).
Multilingual Performance: Gemini 2.5 Pro scores 89.8 percent on Global MMLU (Lite), demonstrating strong handling of diverse languages.
Gemini 2.5 Pro outperforms its rivals in reasoning, math, and long-context understanding while leading in multimodal tasks. It competes closely with top models in science and factuality but trails slightly in coding. These results position Gemini 2.5 Pro as one of the most advanced AI models currently available.
Baburajan Kizhakedath