
What changed with GPT-4o?

"I will no longer talk about news," I had promised myself in the past months, there are too many, on topics not so important, and writing would only increase the confusion. Furthermore, I am writing the next book: “Hire an AI in your Organization” and I really couldn't find the space to write something new.


The AI marathon seen by Midjourney

The ongoing competition

In recent weeks, on the rankings of Chatbot Arena, a site where it is possible to compare different AI models, a model named 'im-also-a-good-gpt2-chatbot' appeared and made a lot of noise because it ranked first in speed and reliability. As William Fedus of OpenAI later revealed, it was actually GPT-4o.

But something has changed again with GPT-4o, the new LLM released by OpenAI; first of all, it comes with a name as horrible as ever (the ‘o’ stands for ‘omni,’ which would perhaps have been better left whole, so let me call it Omni).

The results, although not orders of magnitude apart, showed that a new model had appeared that clearly outdistanced GPT-4-Turbo, previously the best of all.



So, since March 2023, when GPT-4 was released, the 'other big players' have spent a few billion trying to catch up with a model more than a year old, without succeeding. With this release, OpenAI has once again pulled ahead of everyone and defined new paradigms, which we will look at shortly.

The release of GPT-4o / Omni has demonstrated to everyone OpenAI's current superiority from this point of view. Even in this case, we are looking at a model with a rather ‘old’ cut-off date: October 2023. (The cut-off date is the date on which the model's training data ends.) This suggests the model had probably been sitting in a drawer for a while, just like Sora.


The best interface is… an AI interface

I have always been interested in creating interfaces between humans and computers. I saw my first green-phosphor terminals at the beginning of my career, when punch cards were being forgotten. Then, in the 90s, I was an early adopter of the revolutionary new graphical interfaces brought by the Mac and Windows 3.0. From there, for 30 years, came all the evolution (and sometimes involution) that you may have partly experienced yourself: software extremely complex to use, hidden functions and buttons, cheat codes even on web pages (the latest? Yesterday, to activate GPT-4o a few minutes early: https://chatgpt.com/?oai-dm=1). And there have been constant efforts to simplify interaction: UX and UI designers have racked their brains to make the graphical interfaces of Software 1.0 easy to use.


Although yesterday, May 13th, 2024, was not a final step, it was a taste of the direction Golden Krishna already pointed to in 2015 with his book “The Best Interface Is No Interface”: direct interaction between human and machine without interfaces, at least graphical ones.

Omni is a multimodal model capable of reading text, listening to audio, seeing images, and responding accordingly almost in real time. (The previous pipeline transcribed audio into text, generated a text-to-text response, and then converted that text back into speech, and it was slow.)
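To make the difference concrete, here is a minimal sketch in Python, assuming the official `openai` client library. The file names and the example image URL are placeholders of mine, and at launch the native audio-in/audio-out path was demoed but not yet exposed in the public API, so the Omni call below uses text plus image. The point is structural: the old approach chains three separate calls, each adding latency, while Omni handles mixed inputs in a single call.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# --- Old cascaded pipeline: three round trips, latency at every hop ---
with open("question.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",            # speech -> text
        file=audio,
    )

answer = client.chat.completions.create(
    model="gpt-4-turbo",              # text -> text
    messages=[{"role": "user", "content": transcript.text}],
)

speech = client.audio.speech.create(
    model="tts-1",                    # text -> speech
    voice="alloy",
    input=answer.choices[0].message.content,
)
speech.write_to_file("answer_old.mp3")

# --- Omni: one multimodal call, no intermediate hops ---
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this picture?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/scene.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)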

Now, it is possible to speak with the model with almost zero latency. You can interrupt it as you would (rudely) interrupt any human, and you can talk about what the model is seeing through the cameras of our devices.

This brings us very close to the LLM-based Operating System theorized by Andrej Karpathy, as seen here.


To understand what we are talking about, once you've read this post, take half an hour and watch all the videos you find here, or here if you prefer X.

These features have not yet been released to everyone, but the videos should not have been faked. (Hence the conditional.)


New use cases for AI

For those who take my word for it, from today it is possible:

  • to instantly translate one language into another in a conversation, just as an interpreter would (a minimal sketch follows this list);

  • to let the AI watch a meeting and ask it to distinguish between the people it hears talking (without it participating as an agent: simply by showing it the video and audio on the screen);

  • to let the AI sing (yes, you had tried that with Alexa too, but here we are on another level);

  • to let two AIs interact with each other, making them converse in human language (and asking them to help each other discover the world);

  • to get advice on your appearance and tone of voice ("How do I look?" We have all asked ourselves this question but were afraid to voice it, fearing a negative human judgment);

  • to get help with tasks while doing them on a tablet or in a notebook (without writing text or complex prompts, just as you would with a real tutor);

  • to further evolve the concept of customer service, with very fast direct interactions between often-annoyed users and often too-stressed operators;

  • and, finally, to give it a humanly emotional voice (always with the distinctions related to the concepts of human versus synthetic emotion).
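For the interpreter use case in the first bullet, here is a minimal sketch, again assuming the `openai` Python client. The system prompt, the language pair, and the example output are my own illustrative choices, not OpenAI's recipe; a real interpreter would run this on each spoken turn.

from openai import OpenAI

client = OpenAI()

SYSTEM = ("You are a simultaneous interpreter. Translate every English "
          "utterance into Italian and every Italian utterance into English. "
          "Reply with the translation only.")

def interpret(utterance: str) -> str:
    """Translate one conversational turn, as a human interpreter would."""
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": utterance},
        ],
        temperature=0,  # keep translations literal and stable
    )
    return reply.choices[0].message.content

print(interpret("Com'è andata la riunione di oggi?"))
# -> "How did today's meeting go?"  (illustrative output)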


Let me dwell on this last point for a moment.

Voice AIs, although they have made giant strides, have always been toneless and a bit boring. Now you can joke with GPT-4o / Omni and ask her to be serious, radiant, or upbeat. And she herself, depending on what you tell her, will be amazed, worried, or will laugh at what you have told her. ('She,' even though it is an AI.)


On the concept of “she,” Sam Altman simply tweeted “her” yesterday, referring to the famous movie in which the protagonist falls in love with the voice of an AI.


Impacts on education

The tutoring part is quite interesting. Presented by none other than Sal Khan, the founder of Khan Academy, whom I have already talked about in the past, it shows how his son manages to do his math homework assisted by a tutor that observes in real time what he is doing, asks him questions, and offers suggestions. I will have to revise my school workshops now…


Impacts on business

At my last workshop, on Friday, it was evident that many people in the company were using free AIs like GPT-3.5. Maybe 5% of them were using GPT-4-class solutions. But now that GPT-4o is available to everyone for free, what will happen in companies?

It is almost like having given everyone ChatGPT with GPT-4, something that previously cost between 20 and 30 USD per user per month. Some parts of my book will need to be edited again.


A detail about privacy

For those already complaining about where their data ends up, OpenAI has also renewed ChatGPT's interface. Now, even free users can decide NOT TO HAND OVER THEIR CONVERSATION DATA TO OPENAI with a simple click.

Does this mean that the data is safe, in cloud environments with very high security standards (ISO 27001, HIPAA, SOC 2 Type II, and a host of other very important standards)?

It needs to be explored further, but at least an answer has been given to one of the concerns that most hindered the free use of AI.


Who knows how much it costs

The interesting thing is that GPT-4o has already been released to all OpenAI users, including free ones.

OpenAI has therefore decided to carry forward its 'creed' of 'accessible AI' by offering everyone the most powerful AI model ever, albeit with limits on usage and functionality (including, for example, the voice assistant).

And this is where all the considerations about business models, about who the product really is, and so on, can run wild. As long as we admit that we are not looking at a traditional business, and therefore the future is yet to be written.


So…

We have a new model, truly multimodal, fast (I am a little sorry for Groq, but NVIDIA's new GPUs come into their own on GPT-4o), which is still far from the concept of Artificial General Intelligence but still has a lot of itself to reveal.

Above all, it asks a lot of the competitors, who will now have to figure out when they can pull their own state of the art out of the drawer and maybe finally bring in something new.


