Microsoft reveals the power of NVIDIA behind OpenAI’s ChatGPT

Two Microsoft blog posts have revealed the power of the NVIDIA supercomputing technology behind OpenAI’s ChatGPT.
The first blog provides new details about Microsoft’s OpenAI supercomputer, which used thousands of NVIDIA A100 GPUs and InfiniBand networking to train ChatGPT.

“Co-designing supercomputers with Azure has been crucial for scaling our demanding AI training needs, making our research and alignment work on systems like ChatGPT possible,” Greg Brockman, President and Co-Founder of OpenAI, said.

Microsoft has introduced the ND H100 v5 VM, which enables on-demand provisioning in sizes ranging from eight to thousands of NVIDIA H100 GPUs interconnected by NVIDIA Quantum-2 InfiniBand networking. Customers will see significantly faster performance for AI models than with the last-generation ND A100 v4 VMs, thanks to innovative technologies like the following (a quick arithmetic check of the bandwidth figures appears after the list):

8x NVIDIA H100 Tensor Core GPUs interconnected via next-gen NVSwitch and NVLink 4.0

400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand per GPU, with 3.2 Tb/s per VM in a non-blocking fat-tree network

NVSwitch and NVLink 4.0 with 3.6 TB/s bisection bandwidth between the 8 local GPUs within each VM

4th Gen Intel Xeon Scalable processors

PCIe Gen5 host-to-GPU interconnect with 64 GB/s bandwidth per GPU

16 channels of 4,800 MHz DDR5 DIMMs
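The per-VM figures above follow directly from the per-GPU numbers. The short Python sketch below is not from Microsoft’s blog; the constants are the published specs and the arithmetic is ours:

```python
# Back-of-the-envelope check of the ND H100 v5 per-VM bandwidth figures.
# The constants come from the spec list above; everything else is illustrative.
GPUS_PER_VM = 8
IB_GBITS_PER_GPU = 400      # Gb/s of Quantum-2 InfiniBand per GPU
PCIE_GBYTES_PER_GPU = 64    # GB/s of PCIe Gen5 host-to-GPU bandwidth per GPU

ib_per_vm_tbits = GPUS_PER_VM * IB_GBITS_PER_GPU / 1000
pcie_per_vm_gbytes = GPUS_PER_VM * PCIE_GBYTES_PER_GPU

print(f"InfiniBand per VM: {ib_per_vm_tbits:.1f} Tb/s")          # 3.2 Tb/s, as quoted
print(f"PCIe Gen5 per VM:  {pcie_per_vm_gbytes} GB/s aggregate")  # 512 GB/s across 8 GPUs
```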

Delivering exascale AI supercomputers to the cloud

Generative AI applications are rapidly evolving and adding unique value across nearly every industry. From reinventing search with a new AI-powered Microsoft Bing and Edge to AI-powered assistance in Microsoft Dynamics 365, AI is rapidly becoming a pervasive component of software and how we interact with it, and Microsoft’s AI infrastructure will be there to pave the way.

“With our experience of delivering multiple-ExaOP supercomputers to Azure customers, customers can trust that they can achieve true supercomputer performance with our infrastructure. For Microsoft and organizations like Inflection, NVIDIA, and OpenAI that have committed to large-scale deployments, this offering will enable a new class of large-scale AI models,” Matt Vegas, Principal Product Manager, Azure HPC+AI, said.

“Our focus on conversational AI requires us to develop and train some of the most complex large language models. Azure’s AI infrastructure provides us with the necessary performance to process these models efficiently and reliably at a huge scale. We are thrilled about the new VMs on Azure and the increased performance they will bring to our AI development efforts,” Mustafa Suleyman, CEO of Inflection, said.

“NVIDIA and Microsoft Azure have collaborated through multiple generations of products to bring leading AI innovations to enterprises around the world. The NDv5 H100 virtual machines will help power a new era of generative AI applications and services,” Ian Buck, Vice President of Hyperscale and High-Performance Computing at NVIDIA, said.

NVIDIA said the ND H100 v5 is available for preview and will become a standard offering in the Azure portfolio, allowing anyone to unlock the potential of AI at scale in the cloud.
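For readers who want to see when the preview reaches their region, here is a minimal sketch using the Azure SDK for Python. It is not Microsoft sample code: the subscription ID and region are placeholders, and the assumption that the new series surfaces with “H100” in its size name is ours.

```python
# Minimal sketch: list the VM sizes offered in a region and look for the
# ND H100 v5 series. Requires the azure-identity and azure-mgmt-compute packages.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder
REGION = "eastus"                           # assumed region

client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# virtual_machine_sizes.list enumerates every VM size available in the region.
for size in client.virtual_machine_sizes.list(location=REGION):
    if "H100" in size.name:  # assumption: the series name contains "H100"
        print(size.name, size.number_of_cores, "cores,", size.memory_in_mb, "MB RAM")
```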

The second blog covers Microsoft’s Azure ND H100 v5 virtual machines, which feature NVIDIA’s new H100 GPUs and Quantum-2 InfiniBand networking to accelerate generative AI.
Pictured: Nidhi Chappell and Phil Waymouth of Microsoft.

Microsoft announced new virtual machines that integrate the latest NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking. Virtual machines are how Microsoft delivers customers infrastructure that can scale to the size of any AI task. Azure’s new ND H100 v5 virtual machine provides AI developers exceptional performance and scaling across thousands of GPUs.

In 2019, Microsoft and OpenAI entered a partnership, which was extended this year, to collaborate on new Azure AI supercomputing technologies that accelerate breakthroughs in AI, deliver on the promise of large language models and help ensure AI’s benefits are shared broadly.

Microsoft and OpenAI began working to build supercomputing resources in Azure, designed and dedicated to allowing OpenAI to train an expanding suite of increasingly powerful AI models.

This infrastructure included thousands of NVIDIA AI-optimized GPUs linked together in a high-throughput, low-latency network based on NVIDIA Quantum InfiniBand communications for high-performance computing.
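To make concrete what that fabric does during training, here is an illustrative PyTorch sketch, not OpenAI’s code, of the gradient all-reduce that NCCL routes over InfiniBand when many GPUs train one model. It assumes a multi-node launch via torchrun.

```python
# Illustrative only: the collective step a Quantum InfiniBand fabric accelerates.
# Assumes PyTorch with CUDA and NCCL, launched with torchrun across GPU nodes.
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # NCCL uses the InfiniBand transport when present
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Stand-in for one shard of model gradients held on this GPU.
grads = torch.randn(4096, device="cuda")

# Sum gradients across every GPU in the job, then average them.
dist.all_reduce(grads, op=dist.ReduceOp.SUM)
grads /= dist.get_world_size()

if dist.get_rank() == 0:
    print(f"gradients synchronized across {dist.get_world_size()} GPUs")
dist.destroy_process_group()
```

At ChatGPT scale this operation runs on every training step, which is why the low-latency, high-throughput network matters as much as the GPUs themselves.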

“There was definitely a strong push to get bigger models trained for a longer period of time, which means not only do you need to have the biggest infrastructure, you have to be able to run it reliably for a long period of time,” Nidhi Chappell, Microsoft head of product for Azure high-performance computing and AI, said.
