OpenAI Unveils GPT-4o: A New, Free, and Rapid Model With a Voice Assistant So Natural, It May Seem Unbelievable

Forward-looking: OpenAI recently unveiled GPT-4o (the "o" stands for "omni"). This model matches GPT-4 in intelligence yet distinguishes itself with significant innovations: it processes text, visual, and audio inputs together, keeps latency in interactions to a minimum, and boasts an exceptionally human-like voice.

Even the most sophisticated chatbots to date suffer from notable latency, with response times varying widely from a second to several seconds. Companies like Apple are exploring on-device AI processing to tackle this issue. OpenAI, however, has pursued a different avenue with Omni.

Omni's responses were prompt in Monday's demonstration, producing a conversational flow that surpassed typical chatbot experiences. It also handled interruptions adeptly, pausing mid-response when spoken over rather than insisting on finishing its statement.

OpenAI attributes Omni's brief response time to its ability to process text, visuals, and audio together. Previously, ChatGPT handled mixed inputs via a network of separate models, whereas Omni consolidates everything into a seamless response with no delay from waiting on other models' outputs. It leverages the GPT-4 "brain" but adds the capability to handle additional forms of input, a feature OpenAI CTO Mira Murati believes should become a standard.

“GPT-4o delivers the intelligence of GPT-4 but at a much quicker pace,” Murati stated. “We see GPT-4o as a paradigm shift towards the future of collaboration, making interactions feel more natural and significantly easier.”

The voice(s) of Omni particularly impressed during the demo. Conversations with the bot featured casual diction mixed with natural pauses and even laughter, imbuing it with a human essence that left some questioning if it was genuine or not.

Both enthusiasts and skeptics will likely examine the demo closely, much like the reaction to Google's Duplex unveiling. Google's digital assistant was ultimately confirmed as legitimate, and we can anticipate similar validation for Omni, whose voice makes Duplex sound underwhelming by comparison.

Yet further scrutiny might be unnecessary. One demonstration had GPT-4o conversing with itself on two phones, and although the voices remained human-like, the interaction seemed more synthetic and less natural, which is unsurprising when the human element is absent.

Toward the demo’s close, the presenter prompted the bots to sing, leading to an awkward coordination attempt for a duet, further eroding the illusion. Omni’s overly enthusiastic tone could also benefit from some refinement.

Moreover, OpenAI announced the launch of a ChatGPT desktop application for macOS, with a Windows version planned for later in the year. The app is currently available to paid ChatGPT users, with a free version to follow. The web version of ChatGPT is already powered by GPT-4o, and the model is set to become accessible to free users, albeit with certain restrictions.