Technology

OpenAI's "HER" Moment

Mrigesh

May 20, 2024

OpenAI released its first multimodal model, GPT-4o, where the "o" stands for "omni." It is natively multimodal, which means it can "see," "hear," and "speak" in an integrated way with almost no delay.

It can see what you are doing, react to it, respond to interruptions, use realistic voice tones, create images with precise control, and more - all seamlessly. Basically, GPT-4o is a chatbot that can interact naturally with the world around it.

Simply reading about it does not fully convey its impact; watching it firsthand does. See these demo videos for a clearer understanding:

  • Two GPT-4os interact and describe a scene - Youtube Link

  • AI assistant in sarcastic mode. Remember TARS from Interstellar - Youtube Link

  • GPT-4o as a powerful tutor and teaching tool. Interactive AI tutor - Youtube Link


The demo videos give a sense of the significant changes ahead. Anyone familiar with the movie "HER" will recognize how close that reality now feels: a world where forming a deep connection with AI appears to be an inevitable progression.

TL;DR: Event summary and implications

  1. Multi-Modality - GPT-4o seamlessly integrates all inputs and outputs into one unified neural network, in contrast to earlier models that required distinct pipelines for audio and text processing. This holistic approach improves the model's comprehension of context, speech subtleties, and interactions involving multiple speakers. It can even produce nuanced responses such as laughter, singing, and emotional expressions.

  2. Speed makes it more natural - One of the most impressive features is its speed: GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average response time of 320 milliseconds. This makes interactions feel almost as natural as conversing with another person and paves the way for more natural human-computer interaction.

  3. Making GPT-4 accessible to everyone - GPT-4 vastly outperforms the free ChatGPT-3.5; it is like working with a seasoned professional rather than an intern. Previously, the $20 monthly fee restricted access to its benefits, but this barrier has now been removed. Everyday users will start realising how powerful AI is.

  4. Lower token prices, higher inference speed, larger context windows - The GPT-4o model is 2x faster, 50% cheaper, and has a 5x higher rate limit than GPT-4 Turbo. This hints at more agentic behaviour and natural integration of AI into your workflow.
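To make the "50% cheaper" claim concrete, here is a minimal back-of-the-envelope cost sketch. The per-token prices below are assumptions based on the launch list prices as I recall them (GPT-4 Turbo at $10/$30 per million input/output tokens, GPT-4o at $5/$15); check OpenAI's current pricing page before relying on them.

```python
# Back-of-the-envelope API cost comparison: GPT-4 Turbo vs GPT-4o.
# ASSUMED launch list prices, USD per 1M tokens (verify against OpenAI's pricing page).
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, given token counts and per-1M-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
turbo = cost_usd("gpt-4-turbo", 2_000, 500)
omni = cost_usd("gpt-4o", 2_000, 500)
print(f"GPT-4 Turbo: ${turbo:.4f}  GPT-4o: ${omni:.4f}  ratio: {omni / turbo:.2f}")
```

Under these assumed prices the same workload costs exactly half as much on GPT-4o, which is where the 50% figure comes from.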

  5. Huge impact on education - GPT-4 is a revolutionary educational aid. Unrestricted access to GPT-4o will significantly amplify AI's educational impact.

  6. Fewer tokens used for non-English languages - This increases quality and reduces prices when using the API, enabling more native-language applications.

  7. Vastly improved image generation - Native multimodality significantly enhances image generation. GPT-4o can transform broken text into stylistically consistent writing with correct spelling in a single iteration. The result isn't just "high quality" but genuinely realistic: it reflects what you'd see in the videos you watch or in everyday life, achieving actual realism rather than a semblance of it.


    The image below is not real; it was generated using GPT-4o



  8. Practical Applications - GPT-4o has diverse and impactful applications. It can be utilized for real-time translation, customer service, creative tasks like poster creation, and summarizing meetings with multiple speakers. Its versatility makes it an invaluable tool across various industries.


GPT-4o marks an important shift in human-computer interaction, embedding AI into our daily lives. AI is no longer just a tool; it's our coworker, our companion, a constant presence, bringing us closer to the world of "HER".


Sign up for the waitlist now!

We're currently developing Huku and are excited to share it with you soon.


Join the waitlist to get early access
