OpenAI: An improved GPT-4o, but no search engine

Last Modified: May 13th, 2024 - Category: Artificial Intelligence, Marketing, News, SEO Blog
OpenAI GPT-4o cover image

So, OpenAI fooled us, as you can see in our previous post. Contrary to what well-informed sources claimed, OpenAI hasn’t launched a search engine version of ChatGPT (but it’s coming, mark my words). Instead, they announced (and launched) GPT-4o, an improved version of GPT-4, their flagship model that was previously available only to paying users.

But not only that, they also added a lot of new features for free users. This is a big deal for everyone and an obvious direct response to Gemini and Perplexity.

OpenAI has unveiled GPT-4o, an advanced version of its AI that revolutionizes human-computer interaction. Contrary to expectations of competing directly with Google, GPT-4o is a multimodal assistant that integrates text, audio, and visuals for a more fluid and natural interaction. Sam Altman, CEO of OpenAI, has presented the innovations and benefits of this new version in a series of posts and a blog entry.

What’s in a name?

GPT stands for “Generative Pre-trained Transformer.” It is a type of artificial intelligence model developed by OpenAI. Here’s a brief breakdown of each term:

  • Generative: The model generates new content, such as text, based on the input it receives.
  • Pre-trained: The model is initially trained on a large dataset to learn language patterns before being fine-tuned for specific tasks.
  • Transformer: This refers to the architecture of the model, which uses attention mechanisms to process input data and generate output efficiently.

Overall, GPT is designed to understand and generate human-like text based on the input it receives.

…and why 4o?

The “o” in GPT-4o stands for “omni,” from the Latin for “all,” highlighting the model’s versatility and comprehensive capabilities. The term emphasizes that GPT-4o can handle a wide range of tasks across different modalities, including text, speech, and images. The “omni” designation signifies the model’s enhanced ability to process and generate content in multiple formats, making it more adaptable and useful for a variety of applications (source: OpenAI).

Innovations in GPT-4o

  • Multimodal Interaction: GPT-4o accepts and generates responses in text, audio, and image form, enabling richer and more diverse communication. This allows users to interact with the AI in the most convenient way for them, whether that be through typing, speaking, or showing visual information.
  • Response Speed: It can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, similar to human conversational response times. This rapid response ensures that interactions with the AI feel seamless and nearly instantaneous, enhancing user engagement and productivity.
  • Economy and Efficiency: The new version is 50% cheaper and faster than previous versions while maintaining the quality of GPT-4 Turbo and improving on it in languages other than English. This efficiency makes advanced AI technology more accessible to a broader audience, reducing costs for businesses and developers.
  • Image and Sound Improvement: Significant advances in audio and visual comprehension enable more accurate and natural interactions. The AI can now better understand and interpret complex visual and auditory inputs, making it more effective in a wide range of applications.
  • Enhanced Visual Understanding: The model excels at understanding and discussing images shared by users. For example, it can translate a menu in another language, explain the history and significance of the dishes, and offer recommendations. Future updates will enable more natural, real-time voice conversations and video interactions, such as explaining live sports games. A new Voice Mode with these capabilities is set to launch in an alpha version soon, with early access for Plus users.
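For developers, the multimodal input described above is exposed through the same chat-style request format as previous models. As a rough sketch (the message structure follows OpenAI’s published chat completions format; the image URL and question are placeholders, and no network call is made here), a text-plus-image request body could be assembled like this:

```python
# Sketch of a multimodal (text + image) chat request payload for GPT-4o.
# The structure follows OpenAI's chat completions format; the image URL
# below is a placeholder.

def build_multimodal_request(question: str, image_url: str) -> dict:
    """Assemble a chat request mixing text and an image in one user message."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_multimodal_request(
    "Translate this menu and recommend a dish.",
    "https://example.com/menu.jpg",  # placeholder image
)
print(payload["model"])  # -> gpt-4o
```

With the official `openai` Python package, a payload like this maps onto `client.chat.completions.create(**payload)`; note that at launch, audio input and output were demonstrated in the ChatGPT apps but not yet exposed through the public API.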

Impact on the User Experience (UX)

The features of this new ChatGPT model significantly improve the user experience. Multimodal interaction allows users to communicate more intuitively, reducing friction when interacting with the AI and increasing satisfaction. The speed of response leads to a smoother experience, in line with the Doherty Threshold, which holds that engagement and productivity rise sharply when system response times drop below about 400 milliseconds. The improvements in vision and hearing also mean better accessibility, allowing users with different abilities to interact effectively with the technology.

The enhanced multimodal capabilities mean that users can switch seamlessly between different modes of interaction, depending on their needs and preferences. This reduces cognitive load and makes the interaction with AI more natural and less stressful. Furthermore, the improved speed and accuracy contribute to a more responsive and reliable user experience, which is crucial in maintaining user trust and engagement.

Advances in the User Interface (UI)

OpenAI is introducing a new look and feel for ChatGPT designed to be friendlier and more conversational. Users will notice a new home screen, message layout, and more, aimed at making the interaction with ChatGPT more engaging and intuitive.

From a UI perspective, GPT-4o facilitates the development of more dynamic and responsive interfaces. The ability to process multimodal input enables the development of applications that respond to text, voice, and image commands, allowing for richer and more versatile interaction. This flexibility improves accessibility and usability, making applications more inclusive and universal. The generation of precise contextual responses in multiple languages also extends applicability around the world.

Developers can now design interfaces that are more engaging and interactive, taking full advantage of GPT-4o’s capabilities. For instance, voice-activated commands can enhance user interaction in scenarios where typing is inconvenient or impossible. Visual input processing can improve applications in fields like healthcare, education, and customer service by providing more intuitive ways to interact with AI.

Screen capture of ChatGPT-4o new user interface in dark mode
ChatGPT-4o new UI, dark mode

There are also some visual changes in the GPT-4o user interface (UI), with small rounded-corner boxes featuring colorful icons instead of the classic text-only boxes. Personally, I think they add nothing to the user experience; they look like a “hey, we want to jump on the Gemini and Perplexity bandwagon!” move. Which, if you think about it, is rather strange considering ChatGPT is the absolute market leader. But well, I assume they have done their user research, so this is nothing more than my subjective opinion.

Applications and Demonstrations

OpenAI has shown several demonstrations of GPT-4o, including interactions between two instances of ChatGPT and support for tasks such as content creation and real-time translation. One notable demonstration shows two ChatGPTs in dialogue with each other, reminiscent of Google’s 2018 Duplex demonstration. These applications highlight GPT-4o’s ability to perform complex tasks with high accuracy.

Example of text-to-font conversion in ChatGPT-4o
An example of what you can do. I couldn’t achieve this.

Other demonstrations have shown how GPT-4o can be used in customer support, providing instant and accurate responses to customer inquiries. In the field of education, GPT-4o has been used to create interactive learning experiences, where students can ask questions and receive detailed explanations in real-time. These applications illustrate the potential of GPT-4o to transform various industries by providing smarter, more efficient solutions.

However, I tried the exact same prompt examples and couldn’t achieve similar results. In fact, my results were quite poor. Nonetheless, we’ll be doing a comprehensive test of the new model in the near future, so stay tuned!

Making ChatGPT accessible

In his blog, Altman emphasized two important points:

  • Accessibility of AI Tools: OpenAI is committed to making advanced AI tools available for free or at low cost. This commitment ensures that more people, regardless of their financial resources, can benefit from cutting-edge AI technology.
  • Innovation in User Interfaces: GPT-4o’s new voice and video modalities are a significant advancement that makes interacting with computers more natural and intuitive. These innovations are designed to break down barriers and make AI more accessible to everyone, including people with disabilities.

Altman emphasized that GPT-4o is available to all ChatGPT users, even in the free version. In addition, the GPT-4o API is more affordable so that it can be used by developers and businesses. This move is part of OpenAI’s commitment to democratizing access to advanced AI tools.
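To put the “more affordable” API in concrete terms, here is a back-of-the-envelope cost sketch. The per-million-token prices below are the ones published at GPT-4o’s launch (an assumption worth re-checking, since OpenAI’s pricing changes over time):

```python
# Rough cost comparison illustrating the "50% cheaper than GPT-4 Turbo" claim.
# Prices are USD per 1M tokens as published at GPT-4o's launch; they may
# have changed since, so treat these numbers as illustrative.

PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request with the given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 2,000-token prompt with a 500-token answer:
print(request_cost("gpt-4-turbo", 2000, 500))  # 0.035
print(request_cost("gpt-4o", 2000, 500))       # 0.0175 (half the price)
```

At these rates the same workload costs exactly half as much on GPT-4o, which is what makes the model viable for high-volume developer and business use.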

In my opinion, OpenAI is making a clever move against Gemini here: the “good” GPT-4 was paywalled, while the free GPT-3.5 was much less effective. Heavy users, or anyone looking for better results, had to pay or turn to competitors like Gemini or Perplexity. With an improved free version, the tolerance threshold of users looking for free solutions will be much higher, so to speak.

Enhanced Language Capabilities and Accessibility

To make advanced AI more accessible and useful worldwide, GPT-4o’s language capabilities have been improved in both quality and speed. ChatGPT now supports more than 100 languages across sign-up and login, user settings, and more. This ensures that non-English-speaking users can fully benefit from the advanced capabilities of GPT-4o.

This is no surprise for major languages. I tested it in several, and the results were excellent. As someone who speaks five languages with varying degrees of fluency, I can confirm that the previous version already excelled at internationalization, or Universal UX. However, 107 languages is a significant feat, and I certainly don’t speak all of them, which means there’s bound to be room for improvement in the less common ones. Just look at the table below: it includes a whopping 107 languages!

Afrikaans, Albanian (new), Amharic (new), Arabic, Armenian, Azerbaijani (new), Basque (new), Belarusian (new), Bengali, Bosnian (new), Bulgarian, Catalan, Cebuano (new), Chichewa (new), Chinese (Simplified), Chinese (Traditional), Corsican (new), Croatian, Czech, Danish, Dutch, English, Esperanto (new), Estonian, Filipino, Finnish, French, Galician (new), Georgian (new), German, Greek, Gujarati, Haitian Creole (new), Hausa (new), Hawaiian (new), Hebrew, Hindi, Hmong (new), Hungarian, Icelandic, Igbo (new), Indonesian, Irish, Italian, Japanese, Javanese (new), Kannada, Kazakh (new), Khmer (new), Kinyarwanda (new), Korean, Kurdish (new), Kyrgyz (new), Lao (new), Latin (new), Latvian, Lithuanian, Luxembourgish (new), Macedonian (new), Malagasy (new), Malay, Malayalam, Maltese (new), Maori (new), Marathi, Mongolian (new), Myanmar (Burmese) (new), Nepali, Norwegian, Pashto (new), Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan (new), Scots Gaelic (new), Serbian, Sesotho (new), Shona (new), Sindhi (new), Sinhala, Slovak, Slovenian, Somali (new), Spanish, Sundanese (new), Swahili, Swedish, Tajik (new), Tamil, Tatar (new), Telugu, Thai, Tongan (new), Turkish, Turkmen (new), Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Xhosa (new), Yiddish (new), Yoruba, Zulu
List of languages supported by GPT-4o

More Advanced Tools for Free Users

As I said above, the announcement wasn’t only about the new and improved 4o model, but also about the new features available to free users (i.e. those previously limited to GPT-3.5).

When using GPT-4o, ChatGPT Free users will now have access to features such as:

  • Experience GPT-4 level intelligence
  • Get responses from both the model and the web
  • Analyze data and create charts
  • Chat about photos they take
  • Upload files for assistance in summarizing, writing, or analyzing
  • Discover and use GPTs and the GPT Store
  • Build a more helpful experience with Memory

There will be a limit on the number of messages that free users can send with GPT-4o depending on usage and demand. When the limit is reached, ChatGPT will automatically switch to GPT-3.5 so users can continue their conversations.

Streamlining Workflow in the New Desktop App

For both free and paid users, OpenAI is launching a new ChatGPT desktop app for macOS designed to integrate seamlessly into any activity on the computer. With a simple keyboard shortcut (Option + Space), users can instantly ask ChatGPT a question. They can also take and discuss screenshots directly in the app.

A screen capture of ChatGPT-4o MacOS app
Capture of ChatGPT-4o MacOS app

Users can now have voice conversations with ChatGPT directly from their computer, starting with the Voice Mode that has been available in ChatGPT since launch, with GPT-4o’s new audio and video capabilities coming in the future. Whether users want to brainstorm new ideas, prepare for an interview, or discuss a topic, they can tap the headphone icon in the bottom-right corner of the desktop app to start a voice conversation.

The macOS app is rolling out to Plus users starting today, with broader availability planned in the coming weeks. A Windows version is also in the works, set to launch later this year.

Availability and Future

In addition to the web version, OpenAI has launched a new desktop application for macOS that integrates GPT-4o and allows users to experience this advanced technology in an even more direct and accessible way. This application underscores OpenAI’s commitment to making artificial intelligence accessible to a wider audience.

The desktop application is designed to provide a seamless user experience, with an intuitive interface that makes it easy to access and use the advanced features of GPT-4o. This development is expected to drive greater adoption of AI technology across various sectors, from education and healthcare to entertainment and business.

For more information and access to the demonstrations, visit OpenAI’s official blog.

My take on the new GPT-4o

GPT-4o is a milestone in the development of artificial intelligence, combining speed, economy, and multimodal capabilities into a great user experience. That’s for sure. While I wasn’t able to achieve the results that OpenAI publishes (honestly, not even close), what I was able to test in the first couple of hours with this newborn is simply amazing. Yes, there’s a lot of hype, but come on…

From a more or less objective point of view, OpenAI’s release of GPT-4o represents a significant development in the field of large language models, and this is undeniable. The model’s expanded capabilities hold promise for advancements in practical AI applications. Potential areas of impact include improved personal assistants with more natural and effective user interaction, as well as the development of educational tools that can adapt to individual learning styles.

OpenAI’s decision to make the model widely accessible opens doors for broader participation in AI research and development. While the long-term impact on innovation and global quality of life remains to be seen, increased accessibility has the potential to accelerate progress in these areas.
