o1 - Dawn of AGI

OpenAI's o1 models tackle complex problems, Mistral's Pixtral 12B processes images and text, Adobe's Firefly revolutionizes video editing, and Google's NotebookLM turns notes into podcasts.

September 13, 2024

Just a few months ago, the AI world was abuzz with OpenAI's release of GPT-4o, a model that stunned us with its ability to pass bar exams and excel at complex reasoning tasks. We marveled at its multimodal capabilities, processing both text and images with unprecedented accuracy. It seemed like we'd reached the pinnacle of AI achievement.

OpenAI's latest offering, the o1 series of reasoning models, is a testament to how far we've come. Launched just this week, these models are pushing the boundaries of what AI can do, tackling complex problems with a level of sophistication that rivals human experts. Imagine an AI that can go toe-to-toe with PhD students in physics and chemistry, or one that can crack math problems that would make most of us break out in a cold sweat. That's the o1 series for you.

But OpenAI isn't the only player in this high-stakes game. Mistral, a French startup, is turning heads with Pixtral 12B, their multimodal AI that processes images and text with ease. Adobe's upcoming Firefly Video Model promises to revolutionize video editing, making content creation as simple as typing a prompt. Meanwhile, Google's latest innovation in NotebookLM transforms mundane notes into engaging AI-hosted discussions. These developments are pushing the boundaries of AI applications, from visual processing to content creation and information synthesis.

As we stand on the precipice of this AI revolution, one thing is clear: the future is here, and it's more exciting than we ever imagined. In this newsletter, we'll dive deep into these groundbreaking developments and explore what they mean for our world.

OpenAI Launches o1 Preview Advanced Reasoning Models

OpenAI has introduced the o1 series of reasoning models, designed to tackle complex problems more effectively than previous iterations. Launched on September 12, 2024, the o1-preview model spends more time reasoning before it responds, which helps it excel in areas such as science, coding, and mathematics. The model has shown impressive performance on benchmark tasks, achieving results comparable to PhD students in fields like physics and chemistry and scoring significantly higher on math problems than its predecessor, GPT-4o. The o1 series also incorporates enhanced safety measures that help the model adhere to safety guidelines more effectively. It is particularly useful for research and development work, such as annotating scientific data and generating complex mathematical formulas. Alongside o1-preview, OpenAI has released o1-mini, a more cost-effective model aimed at coding tasks. Both models are available to ChatGPT Plus, Team, and Enterprise users, with broader availability planned.

Mistral Launches Pixtral 12B, Its First Multimodal AI Model

Mistral, a French AI startup, has launched Pixtral 12B, its first multimodal model capable of processing both images and text. This model, which features 12 billion parameters and is approximately 24GB in size, builds on Mistral's previous text model, Nemo 12B. Pixtral 12B can answer questions about a variety of images, whether provided as URLs or encoded in base64 format, and is designed to perform tasks such as image captioning and object counting. The model is available for download on GitHub and Hugging Face under an Apache 2.0 license, allowing users to fine-tune and utilize it without restrictions. Mistral's release follows a significant funding round that valued the company at $6 billion, positioning it as a key player in the AI landscape, particularly in Europe. The company aims to provide open models while also offering managed versions and consulting services to corporate clients.
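For readers curious about the base64 option mentioned above, here is a minimal Python sketch of how raw image bytes can be packed into a base64 data URL, a common way to pass an image inline instead of linking to it. It uses only the standard library and does not call Mistral's API; the helper name `image_bytes_to_data_url` and the file name `chart.png` are hypothetical, and the exact request format Pixtral expects is not covered here.

```python
import base64
import mimetypes


def image_bytes_to_data_url(image_bytes: bytes, filename: str) -> str:
    """Return a base64 data URL for raw image bytes (hypothetical helper)."""
    # Infer the MIME type from the file extension, e.g. "image/png".
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    # base64 turns arbitrary bytes into plain ASCII text.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The 8-byte signature that opens every PNG file, used as a stand-in
# for a real image so the example runs without any files on disk.
png_signature = b"\x89PNG\r\n\x1a\n"
data_url = image_bytes_to_data_url(png_signature, "chart.png")
print(data_url)
```

In practice you would read the bytes from a real image file and pass the resulting string wherever an image URL is accepted, assuming the service supports data URLs.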

Adobe Unveils Firefly Video Model

Adobe has announced the upcoming Firefly Video Model, which will bring text-to-video and image-to-video capabilities to its Creative Cloud applications, including Adobe Premiere Pro, Adobe Express, and Adobe Digital Marketing workflows. The model is designed to assist video editors by generating video content that can fill gaps in timelines, help ideate creative concepts, and integrate new elements into existing footage, all while remaining commercially safe. Key features include creating video segments from text prompts, transforming still images into live-action clips, and a Generative Extend feature that lengthens clips for smoother transitions and more precise edits. The Firefly Video Model is set to enter beta testing later this year, and interested users can sign up for a waitlist on Adobe's official website. The release is part of Adobe's broader strategy to integrate generative AI into its suite of creative tools.

Google's NotebookLM Launches AI Audio Overviews for Notes

Google has introduced a new feature called "Audio Overview" for its AI-powered note-taking tool, NotebookLM. This innovative functionality allows users to transform their uploaded documents into engaging, podcast-style audio discussions featuring two AI hosts. The AI hosts summarize the content, make connections between topics, and even engage in light banter, creating a lively dialogue that enhances the learning experience. While the feature aims to provide a more interactive way to digest information, it is still in the experimental phase, with limitations such as only supporting English and potential inaccuracies in the generated content. Users can access this feature by navigating to the Notebook guide within NotebookLM, and the audio discussions can be downloaded for on-the-go listening. The introduction of Audio Overview is part of Google's efforts to leverage AI to improve information retention and engagement for users.

Hand Picked Video

In this video, we explore OpenAI's o1-preview update, showcasing its ability to create a Flappy Bird game with one prompt, despite struggling with a simple reasoning question. Full video out today!

Top AI Products from this week

Olly Social - Currently running 30% off; claim the discount on their website.
Serra (YC S23) - Serra is a GPT-powered search engine for recruiters. Instead of manually selecting keywords and reviewing candidates one by one, recruiters can search in plain English and instantly see the best matches, with all their research done for them.
Replit Agent - The Replit Agent is an AI-powered tool designed to assist users in building software projects. It can understand natural language prompts and help create applications from scratch, making software development more accessible to users of all skill levels.
Patched (YC S24) - With Patched, development teams can create custom AI workflows to automate code reviews, documentation, and patches. The workflows can be self-hosted through our open-source framework and integrated with your preferred LLM, ensuring full privacy and control.
Thunderbit - Thunderbit is a Chrome extension that automates your web tasks. Create a personalized AI web copilot in one click, and build AI apps and automations just by filling out an easy form. We're challenging the no-code status quo with AI.
Zight AI - Zight AI enhances communication with AI-driven video editing, screen recording, and GIF creation. Perfect for marketers, educators, and remote workers, it automates tasks so you can focus on your message. Just press record and let Zight AI boost productivity!
AIPhone.AI - AIPhone.AI translates phone calls in real-time, backed by a multilingual translation model, offering super accurate translations. Communicate confidently in your native language during calls, no matter what language the other person speaks!

This week in AI

Apple Unveils New Apple Intelligence Features - Apple is set to roll out its Apple Intelligence features for iPhone, iPad, and Mac starting next month, bringing smarter capabilities across its devices.
Freepik Launches Retouch Tool - Freepik's new Retouch tool allows users to edit images effortlessly using AI. After signing up for credits, upload an image, select areas to modify, and provide prompts for changes like hair or eye color.
Hume AI Launches Voice Mode Substitute - Hume AI has introduced a new tool to replace voice mode, allowing users to easily generate audio from text. After signing up, simply enter your text prompt and the AI will create a high-quality audio recording in seconds, without the need for a microphone.
Mapify: AI-Powered Mind Mapping Tool - Mapify (formerly Chatmind) enables users to create mind maps from various sources, like documents and videos, in seconds. With AI assistance, it simplifies complex information, enhances learning, and offers templates for quick mapping.