Google AI Mode Adds Image Search
Google debuts multimodal AI search, OpenAI revives o3 and teases GPT-5, while Meta launches Llama 4 models that rival top systems on key benchmarks. The AI race just got hotter.
The AI race just leveled up. Google, OpenAI, and Meta are all making bold moves, each redefining how we search, reason, and interact with AI.
Google is rolling out multimodal search in AI Mode, blending Lens with Gemini to let users snap a photo and ask questions about it, getting rich, scene-aware answers in return.
OpenAI has updated its GPT-5 roadmap, reviving its o3 model and teasing the powerful o4-mini—while prepping the long-awaited GPT-5 with unified reasoning, voice, and research tools.
And Meta just dropped the Llama 4 family, a new generation of open-weight, multimodal models that match, and in some cases outperform, GPT-4.5 and Claude Sonnet 3.7 on key benchmarks.
Let’s break down what these moves mean—and where the AI landscape is heading next.
Google Introduces Multimodal Search in AI Mode
Google's AI Mode, now available to more Labs users in the U.S., adds multimodal search powered by Lens and Gemini. Users can snap a photo or upload an image, ask questions about it, and receive detailed, contextually relevant responses. By pairing Lens's visual search with Gemini's ability to understand an entire scene, AI Mode identifies objects, their relationships, and attributes such as materials and colors. Using a query fan-out technique, it issues multiple background queries about the image and the objects in it, then returns nuanced answers and recommendations, such as identifying the books on a shelf and suggesting similar titles. The feature can be tested in the Google app via Labs.
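To make the fan-out idea concrete, here is a minimal, hypothetical sketch of the pattern in Python. It is not Google's pipeline: detect_objects() and search() are placeholder stand-ins for a vision model and a search backend.

```python
# Illustrative sketch of a query fan-out pattern, not Google's implementation.
# detect_objects() and search() are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor


def detect_objects(image_path: str) -> list[str]:
    # Placeholder: a real system would run an image-understanding model here.
    return ["book: The Pragmatic Programmer", "book: Clean Code", "mug"]


def search(query: str) -> str:
    # Placeholder: a real system would call a search backend here.
    return f"results for '{query}'"


def fan_out(image_path: str, user_question: str) -> list[str]:
    objects = detect_objects(image_path)
    # Issue one sub-query about the whole scene plus one per detected object,
    # then gather everything so a single answer can be synthesized from it.
    queries = [f"{user_question} (whole scene)"] + [
        f"{user_question} about {obj}" for obj in objects
    ]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(search, queries))


if __name__ == "__main__":
    for result in fan_out("shelf.jpg", "recommend similar titles"):
        print(result)
```

The point of the pattern is that one user question becomes many narrower queries issued in parallel, whose results are combined into a single scene-aware answer.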
OpenAI Updates GPT-5 Roadmap

OpenAI has announced plans to release its o3 reasoning model and the successor o4-mini within weeks, reversing its earlier decision to cancel o3's consumer launch. The delay in GPT-5's release, now expected in a few months, is attributed to challenges in integrating features and ensuring sufficient capacity for anticipated high demand. GPT-5 is set to unify models with enhanced reasoning capabilities, voice integration, Canvas, search, and deep research tools. OpenAI also plans to debut its first open language model since GPT-2, featuring reasoning capabilities and additional safety evaluations, while facing competition from rivals adopting open approaches to AI development.
Meta Unveils Llama 4 Family
Meta has introduced the Llama 4 family of open-weight, natively multimodal AI models, including Llama 4 Scout and Llama 4 Maverick, designed to enable more personalized multimodal experiences. Llama 4 Scout, with 17 billion active parameters and a 10-million-token context window, is a leading model in its class, while Llama 4 Maverick, also with 17 billion active parameters, rivals GPT-4o and Gemini 2.0 Flash on benchmarks. Both are distilled from Llama 4 Behemoth, a 288-billion-active-parameter model that outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks. Llama 4 Scout and Llama 4 Maverick are available for download on llama.com and Hugging Face, and Meta AI built with Llama 4 can be tried in WhatsApp, Messenger, Instagram Direct, and on the web.
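For anyone who wants to pull the weights locally, here is a minimal sketch using the huggingface_hub client. The repository ID below is an assumption based on Meta's naming, and the checkpoints are gated, so confirm the exact name and accept the Llama 4 license on the model card before downloading.

```python
# Minimal sketch: downloading Llama 4 Scout checkpoints from Hugging Face.
# The repo_id is an assumption; check the meta-llama organization page for
# the exact gated repository name and accept the license there first.
import os
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed repo ID
    token=os.environ.get("HF_TOKEN"),  # gated repo: access token required
    local_dir="./llama-4-scout",
)
print(f"Checkpoint files saved to {local_dir}")
```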
Hand Picked Video
In this video, we'll look at the exciting new suite of AI development tools from OpenAI, including the Agents SDK framework for building autonomous agents, enhanced web search capabilities, and the powerful Computer Use tool, which is benchmarking impressively on OSWorld, WebArena, and WebVoyager.
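As a quick taste of the Agents SDK mentioned above, here is a minimal sketch following the pattern in OpenAI's published quickstart; exact package and attribute names may differ between SDK versions, so treat them as assumptions and check the current docs.

```python
# Minimal sketch based on the OpenAI Agents SDK quickstart pattern.
# Install with `pip install openai-agents` and set OPENAI_API_KEY in the
# environment; names below follow the published examples and may change.
from agents import Agent, Runner

agent = Agent(
    name="Research Assistant",
    instructions="Answer concisely and explain your reasoning briefly.",
)

result = Runner.run_sync(agent, "Summarize this week's biggest AI releases.")
print(result.final_output)
```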
Top AI Products from this week
Cua - Create and run local macOS and Linux sandboxes with near-native performance on Apple Silicon, powered by Lume. Features a computer-use interface (CUI) and a computer-use AI agent (CUA) for multi-app agentic workflow automation.
EverTutor Live - EverTutor is your AI-powered voice tutor, personalized to your learning style. Engage with it like an interactive Zoom call to master complex topics like GRE prep. Receive instant feedback as your AI tutor tailors lessons to your pace and needs.
Amurex - Amurex is an open-source AI that integrates into your workflow, connecting and organizing knowledge across your existing tooling. Search, organize, and retrieve insights effortlessly, without switching contexts. Fully self-hostable for complete data control.
Refgrow - Turn your users into affiliates — right inside your SaaS. Refgrow lets you embed a native affiliate dashboard into your product in minutes. No external tools. No earnings limits. From just $9/mo.
Sweep AI - Sweep is an AI coding assistant purpose-built for JetBrains IDEs. We bring features like multi-file code "apply" and automatic context awareness into a secure, on-prem ready AI plugin for JetBrains.
Flexprice - Flexprice is a billing platform that helps you launch and iterate pricing without dev bottlenecks: cloud or self-hosted, no-code UI, usage-based pricing and metering, credits and top-ups, feature access control, and integrations with Stripe, Chargebee, and more.
This week in AI
Midjourney V7 Update - Improved text & image quality, personalization, Draft Mode for fast rendering, Turbo & Relax modes available.
Microsoft's AI Quake II Demo - Microsoft's Muse AI generates a playable Quake II browser demo, showcasing real-time AI gameplay. It highlights game prototyping and preservation but has visual and interaction limits.
DeepSeek's AI Reasoning Breakthrough - DeepSeek, with Tsinghua University, unveiled GRM and SPCT techniques to enhance AI reasoning. Models align better with human preferences, outperform rivals, and may go open-source soon.
Your AI Companion: Microsoft's Copilot - Microsoft introduces Copilot, a personalized AI companion that learns your preferences, helps with tasks, and offers real-time assistance via memory, actions, vision, and more.
Amazon's 'Buy for Me' - Amazon tests 'Buy for Me,' letting users purchase from other brand sites within the Amazon Shopping app if Amazon doesn't sell the item directly. It uses AI for seamless purchases.