Ilya, I/O & AGI

A full summary of Google I/O: 13 updates covering pretty much everything that went down at the event. Plus Ilya leaving OpenAI, top AI tools & more.

May 16, 2024

This newsletter is mostly dedicated to Google I/O. Before we get to it, though: Ilya Sutskever, co-founder of OpenAI, just left the company after nearly a decade. He was leading the superalignment team, OpenAI's AGI safety project, and the future seems a little hazy now. Sigh. On the other side of the story, Google made a bunch of announcements at its I/O event. Let’s take a look.

Google announced at least 13 new things at this year's I/O event.

It was insane.

Two Gemini 1.5 models (Gemini 1.5 Pro and Flash)
New text-based image and video models (Imagen 3 and Veo)
A Music AI Sandbox for music producers
Enhanced Google Gemini with direct access to Google Search and Mail
A virtual assistant named Project Astra that engages in lifelike conversations
LearnLM, a family of models designed for education and studying

And a lot more. Let’s jump right in.

Gemini 1.5 Pro, Generally available

Google’s Gemini 1.5 Pro has a 1 million token context window - the longest of any widely available consumer chatbot. This allows it to analyze and summarize large inputs: up to 1,500 pages of documents, 100 emails, or 30,000 lines of code. Gemini 1.5 Pro also brings improvements to image understanding, such as providing recipes from photos of dishes or step-by-step solutions from photos of math problems.
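To get a feel for what a 1 million token window means in practice, here is a rough Python sketch that checks whether a batch of documents would fit. The 4-characters-per-token ratio is a common rule of thumb for English text, not Gemini's actual tokenizer, and the function names are made up for illustration.

```python
from typing import Iterable

# Context window size quoted in the text above.
CONTEXT_WINDOW_TOKENS = 1_000_000

# ASSUMPTION: roughly 4 characters per token. This is a rule of thumb for
# English text, not the tokenizer Gemini actually uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(documents: Iterable[str],
                    window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the combined estimate fits inside the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= window


if __name__ == "__main__":
    docs = ["Quarterly report text " * 1000, "Meeting notes " * 500]
    print("Estimated tokens:", sum(estimate_tokens(d) for d in docs))
    print("Fits in context window:", fits_in_context(docs))
```

For real numbers, use the token counting provided by the model's own API instead of a character heuristic.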

Gemini 1.5 Flash

Gemini 1.5 Flash is a new addition to Google's Gemini family of AI models, designed for speed and efficiency in handling high-volume, high-frequency tasks at scale. Gemini 1.5 Flash is available for developers to try in Google AI Studio and Vertex AI, with a 1 million token context window initially and 2 million tokens available upon request.

It excels in multimodal reasoning, allowing it to process various types of information like text, images, and videos to generate intelligent results without switching to a different model. Gemini 1.5 Flash features a breakthrough long context window, enabling it to analyze and summarize vast amounts of information efficiently.

Pricing for Gemini 1.5 Flash starts at $0.35 per million tokens, making it a cost-effective option for tasks that require quick turnaround times.
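As a quick back-of-the-envelope illustration of that price, here is a small Python sketch. It assumes a flat $0.35 per million tokens, as quoted above; actual billing may differ for input versus output tokens or for very long prompts, and the function name is made up for this example.

```python
# ASSUMPTION: a flat rate taken from the "starts at" figure quoted above.
# Real pricing may vary by input vs. output tokens and by prompt length.
PRICE_PER_MILLION_TOKENS_USD = 0.35


def estimate_cost_usd(num_tokens: int,
                      price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Estimate the cost in USD of processing `num_tokens` tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    for tokens in (10_000, 1_000_000, 25_000_000):
        print(f"{tokens:>12,} tokens -> ${estimate_cost_usd(tokens):.4f}")
```

At this assumed rate, filling the entire 1 million token context window once would cost about 35 cents.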

Imagen 3

Imagen 3 is the next generation of Google's text-to-image AI model. It arrived shortly after the previous version, Imagen 2, showcasing Google's commitment to advancing AI technologies. The model understands prompts written in natural, everyday language, making it easier for users to get the output they want without complex prompt engineering.

Imagen 3 enhances text-to-image generation with improved accuracy, natural language understanding, and built-in safety precautions for responsible AI.

Imagen 3 is part of the Imagen model family, offering photorealism and deep language understanding for various applications requiring high-quality image generation from text descriptions.

Music AI Sandbox

Google's Music AI Sandbox is a suite of AI tools for music creation. It lets users generate loops and instrumental sections from text prompts and transform sounds in new ways. The platform, a collaboration between Google DeepMind and YouTube, involves musicians like Wyclef Jean, Justin Tranter, and Marc Rebillet.

Google says it is mindful about advancing AI responsibly: the generated content includes watermarks to identify it as AI-made, and the company is taking measures to address the challenges raised by generative technologies.

Veo - Sora’s nightmare?

Google unveiled Veo, its new AI video generation model, at the Google I/O 2024 event. Veo is designed to compete with OpenAI's Sora video generation model. Veo is part of Google's foundation models that include text-to-image, text-to-code, and speech-to-text capabilities. It is designed to unlock visual creativity by generating high-quality videos from text prompts, with a focus on photorealism and deep language representations.

The Veo model is being incorporated into multiple Google products, including video generation in Google Slides, Cloud Vertex AI, and Android's Generative AI features. This will let users create videos more easily using natural language prompts. Google is taking a responsible approach to Veo's development: generated content includes watermarks to identify it as AI-made, and the company is working to address the challenges raised by generative technologies like Veo.

Their most powerful TPU so far!

Google's Trillium is the sixth generation of its Tensor Processing Unit (TPU), a custom-built ASIC designed to accelerate neural network computations. It represents a significant advancement in AI hardware, offering improved compute performance for AI workloads. Its neural-network-specific architecture is optimized for both training and inference, with a focus on efficient matrix multiplication.

This powerful TPU is available to Google Cloud customers for training and inference workloads, playing a crucial role in Google's AI infrastructure. Google supports the Trillium TPU with a rich software ecosystem, including TensorFlow and model implementation examples for tasks like image classification and machine translation, enhancing its usability and effectiveness in various AI applications.

Project Astra

Project Astra, unveiled at Google I/O 2024, is a revolutionary AI assistant that can see, identify objects, and assist in various tasks. Powered by an upgraded Gemini Ultra model, it processes audio, images, video, and text seamlessly, offering a conversational and intuitive experience. Astra showcased its abilities in modes like Storyteller and Pictionary, recognizing objects, people, moods, and textiles, and remembering scenarios.

Its potential applications include wearable technology, like future Google Glass models, promising more immersive user experiences.

Agent Chip

Google's AI Teammate, "Chip," unveiled at Google I/O 2024, redefines the AI chatbot experience by acting as a virtual coworker in group chats, emails, and documents. Chip has a distinct identity, workspace account, role, and objectives within the company. It learns from team input, shares collective knowledge, and interacts with all group chat members, enhancing productivity and communication within Google's Gemini for Workspace platform. Additionally, Gemini Live, part of "Project Astra," introduces a multi-modal AI model with advanced voice and video capabilities, akin to OpenAI's GPT-4o, enhancing smartphone AI experiences.

Gemini now in Search

Google's "Gemini" initiative aims to revolutionize search with AI-powered features like "AI Overviews" for concise answers, video queries, voice commands, and personalized suggestions for activities.

The gradual rollout, starting in the US and expanding globally, will transform Google Search into an intelligent assistant.

Users can opt into Search Labs experiments to access these features sooner. The company's commitment to enhancing search through advanced AI showcases its vision for the future of search.

Gemini Nano in Android & Chrome

Google is introducing full multimodal capabilities to Gemini Nano, enhancing user experiences on Android devices. With Gemini Nano, users can expect their phones to process not only text but also understand context from various inputs like images, sounds, and spoken language.

This advancement will bring clearer descriptions to TalkBack, aiding individuals with blindness or low vision by providing detailed image descriptions. Additionally, Gemini Nano will offer real-time alerts during phone calls to help users identify potential scams, enhancing privacy and security by detecting suspicious conversation patterns.

These features will be available on Pixel devices later this year, leveraging the power of on-device AI for quick responses and privacy protection.

Vision Model - PaliGemma

PaliGemma is an open vision-language model introduced by Google, inspired by PaLI-3 and built on open components like the SigLIP vision model and the Gemma language model. It is designed for strong fine-tuning performance across vision-language tasks, including image and short video captioning, visual question answering, understanding text in images, object detection, and object segmentation. Google offers both pretrained and fine-tuned checkpoints at multiple resolutions, as well as checkpoints tuned for a mix of tasks, so PaliGemma can be explored and used right away.

Gemma finetuned for 15 Indic languages!

Navarasa, a version of Google's Gemma model fine-tuned for 15 Indian languages, took center stage at Google I/O 2024. The project aims to make AI more accessible and useful for Indian users by supporting a wide range of local languages.

Navarasa 2.0 was highlighted during the event. It focuses on processing and generating content in 15 major Indian languages, including Hindi, Bengali, Telugu, Marathi, Tamil, and Urdu. The showcase fits Google's broader efforts to make AI helpful for everyone, as mentioned by Sundar Pichai during the Google I/O 2024 keynote.

LearnLM

LearnLM is Google's new family of models, fine-tuned for learning and grounded in educational research to enhance teaching and learning experiences. It aims to make learning more active, personal, and engaging by infusing principles of learning science into Google's models and products.

LearnLM is being integrated into various Google products like Search, YouTube, and Gemini to enhance learning experiences. For example, in Google Search, users will soon be able to simplify complex topics, while on Android, Circle to Search will assist with math and physics problems.

Moreover, LearnLM will be applied in schools through a pilot program in Google Classroom to simplify lesson planning for educators. The goal is to help teachers discover new ideas, find engaging materials, and differentiate lessons to meet students' needs.

There’s more. But these are the ones you need to know for now.

Let’s also take a look at the other updates from this week.

This week in AI

Elon Musk's xAI Deal - Elon Musk's xAI startup nears $10B deal with Oracle to rent AI servers, aiming to rival OpenAI and Google in the AI race.

Transform Customer Experience - Verizon introduces GenAI tools to enhance customer service, offering personalized solutions and improving customer interactions.

AI Video Translation - Notta Showcase lets users translate videos into 15+ languages while retaining the original voice style, making it easier to expand content to a global audience.

OctoAI - OctoStack is a turnkey GenAI serving stack to run your optimized models in your environment on your GPUs.

Ilya Sutskever saying Goodbye to OpenAI : OpenAI’s cofounder Ilya Sutskever announced he was leaving the company. He plans to focus on a “project that is very personally meaningful” to him.