Nvidia has released new performance data indicating that its latest artificial intelligence server can deliver up to a 10-fold improvement when running next-generation AI models, including leading mixture-of-experts models developed in China.

The announcement comes as the AI industry shifts its attention from training massive models to deploying them efficiently for millions of users, an area where Nvidia faces stronger competition from companies such as AMD and Cerebras, according to a Reuters report.
Nvidia’s new data highlights advancements in serving mixture-of-experts (MoE) models, a technique that surged in popularity after China’s DeepSeek introduced a high-performing open-source model in early 2025. MoE models route each part of a user query to a small set of specialized “experts” within the model, so only a fraction of the model’s parameters run at once, significantly improving efficiency. Since DeepSeek’s breakthrough, the approach has been embraced by OpenAI, Mistral, and China’s Moonshot AI, which launched a highly ranked open-source model in July.
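The routing idea described above can be sketched in a few lines of code. This is an illustrative toy, not any vendor's implementation: the gate matrix, dimensions, and top-k value are all assumptions chosen for clarity.

```python
import numpy as np

NUM_EXPERTS = 8  # small specialized sub-networks ("experts")
TOP_K = 2        # each token activates only 2 of the 8 experts

def route(token_vec, gate_weights):
    """Score every expert for one token and keep only the top-k.

    Returns the chosen expert indices and their softmax mixing weights.
    """
    scores = gate_weights @ token_vec             # one score per expert
    top = np.argsort(scores)[-TOP_K:]             # indices of best-scoring experts
    w = np.exp(scores[top] - scores[top].max())   # stable softmax over the winners
    return top, w / w.sum()

rng = np.random.default_rng(0)
gate = rng.standard_normal((NUM_EXPERTS, 16))     # learned router (random here)
token = rng.standard_normal(16)                   # one token's hidden vector

experts, weights = route(token, gate)
print(experts, round(weights.sum(), 6))
```

Because only `TOP_K` experts run per token rather than all of them, a MoE model can be far larger than a dense model at the same per-query compute cost, which is what makes efficient serving of these models a hardware story.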
Despite concerns that MoE models need less compute to train, potentially reducing demand for Nvidia’s chips, the company is emphasizing its continued leadership in the deployment phase. According to Nvidia, its newest AI server integrates 72 high-end processors in a single system connected through ultra-fast links. This architecture has delivered a 10-fold performance boost for Moonshot AI’s Kimi K2 Thinking model compared with the previous generation of Nvidia servers, and Nvidia reports similar improvements when serving DeepSeek’s models.
Nvidia attributes the performance leap to the density of chips within each server and the speed of the interconnects linking them, an area where it maintains a competitive edge. The landscape is evolving quickly, however: AMD is developing a comparable multi-chip AI server expected to reach the market next year, signaling increased pressure on Nvidia’s dominance in the deployment segment of the AI ecosystem.
Rajani Baburajan

