OpenAI to release AI model GPT-4o supporting voice, text and image

ChatGPT maker OpenAI said it would release a new AI model called GPT-4o that will support voice conversation and interaction across text and image.
Performance of GPT-4o

GPT-4o accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds.

GPT-4o matches GPT-4 Turbo's performance on English text and code, with significant improvement on text in non-English languages, while also being much faster and 50 percent cheaper in the API. It is especially strong at vision and audio understanding compared with existing models.

OpenAI researchers demonstrated at a livestreamed event that the new audio capabilities let users speak to ChatGPT and get responses in real time, and interrupt ChatGPT while it is speaking. Both are hallmarks of realistic conversation that AI voice assistants have found challenging.

Microsoft-backed OpenAI faces growing competition and pressure to expand the user base of ChatGPT, its popular chatbot that wowed the world with its ability to produce human-like written content and top-notch software code, according to a Reuters report.

OpenAI’s chief technology officer, Mira Murati, said at the event that the new model would be offered for free because it is more cost-effective than the company’s previous models. Paid users will have higher capacity limits than free users, she said. GPT-4o will be available in ChatGPT over the next few weeks.

After launching in late 2022, ChatGPT became the fastest application to reach 100 million monthly active users. However, traffic to ChatGPT’s website has been on a roller-coaster ride over the past year and is only now returning to its May 2023 peak, according to analytics firm Similarweb.

GPT-4o is designed for more natural interaction: it accepts any combination of text, audio, and image inputs and can produce outputs in the same mix of formats. It responds to audio inputs in as little as 232 milliseconds, averaging 320 milliseconds, which OpenAI says is similar to human response times in conversation.

The model matches the performance of GPT-4 Turbo on English text and code while showing significant improvements on non-English languages. It is also notably stronger than existing models at vision and audio understanding.

Before GPT-4o, users could talk to ChatGPT through Voice Mode, but with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). Voice Mode relied on a pipeline of three separate models: one transcribed audio to text, GPT-3.5 or GPT-4 generated a text reply, and a third converted that reply back to audio. Because the main model saw only text, information such as tone, multiple speakers, background noise, and emotional nuance was lost along the way. GPT-4o instead processes all inputs and outputs in a single neural network trained end to end across text, vision, and audio, so that information is preserved across modalities.

In OpenAI’s evaluations across a range of benchmarks, GPT-4o achieved strong results on tasks ranging from general-knowledge questions to speech translation. Notably, it sets new high-water marks on multilingual understanding, speech recognition, and visual perception benchmarks.

Safety remains a central concern in AI development. OpenAI says GPT-4o includes safeguards across all of its modalities, and that the model was evaluated under the company’s Preparedness Framework and through extensive external red teaming to identify and mitigate risks.

OpenAI plans a phased rollout of GPT-4o’s capabilities, beginning with text and image. The model will initially be available to users in ChatGPT’s free tier and to Plus subscribers, who get up to five times higher message limits. Developers can also access GPT-4o through the API, where it is twice as fast, half the price, and has five times higher rate limits than GPT-4 Turbo.