Google's Gemini Deep Research Gets 2.5 Pro🔎📝

Google's Gemini 2.5 Pro powers Deep Research, new Nvidia research uses TTT layers to generate minute-long story videos, and Microsoft's Copilot Vision brings visual search to mobile devices.

In today's fast-moving AI landscape, three major announcements push the boundaries of what AI can do in research, video generation, and visual search.

Google has upgraded Deep Research in Gemini Advanced to run on the new Gemini 2.5 Pro Experimental model. The upgrade brings stronger analytical reasoning and information synthesis, and in Google's testing its reports were preferred over those of competing services by more than 2 to 1. The feature also includes Audio Overviews for on-the-go listening across web and mobile.

Meanwhile, researchers have advanced AI video generation by adding Test-Time Training (TTT) layers to pre-trained Diffusion Transformers. The approach creates one-minute cartoon videos from text storyboards with better coherence and storytelling, outperforming previous methods by 34 Elo points in human evaluations.

Finally, Microsoft has expanded its Copilot Vision capability to mobile devices, transforming smartphones into interactive visual search tools. This feature allows users to analyze real-time video and stored photos, providing contextual information for tasks ranging from shopping to plant care, with further Windows integration currently in testing.

Embrace the future: AI tools are changing how we research, create, and interact with the world around us.

Gemini Advanced Unveils Deep Research on 2.5 Pro

Gemini Advanced subscribers can now use Deep Research powered by the Gemini 2.5 Pro Experimental model, which Google says is the most capable AI model according to industry benchmarks. In Google's testing, reports generated with the new model were preferred over those from other leading deep research providers by a margin of more than 2 to 1. Users also report improvements in analytical reasoning and information synthesis, resulting in more insightful research reports. Gemini Deep Research is available on the web, Android, and iOS, producing detailed reports on a wide range of topics to save users time. It also offers Audio Overviews, which turn reports into podcast-style conversations for listening on the go.

Generating One-Minute Cartoon Videos

The research adds Test-Time Training (TTT) layers to pre-trained Diffusion Transformers, enabling the generation of one-minute videos from text storyboards. This is a significant improvement over existing methods, which struggle with long contexts and complex multi-scene stories. TTT layers use neural networks as their hidden states, which makes them more expressive than other RNN layers such as Mamba or DeltaNet. The approach improves coherence and storytelling in the generated videos, outperforming baselines by 34 Elo points in human evaluations. The implementation still faces challenges, such as visual artifacts and resource constraints, but it holds promise for longer videos and broader applications. Sample videos, code, and annotations are available online for exploration.
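For readers curious how a layer can "train" while it generates, here is a minimal sketch of the general TTT idea in plain Python. It is not the released code and not the paper's exact layer: it uses a single linear map as the hidden state (the simplest case), whereas the layers described above use richer neural networks. It also assumes identity key/value/query views, a squared-error reconstruction loss, and a fixed learning rate; the function name `ttt_linear_layer` and its parameters are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a d x d matrix by a length-d vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def ttt_linear_layer(tokens: List[Vector], lr: float = 0.1) -> Tuple[List[Vector], List[float]]:
    """Toy TTT layer: the hidden state is the weight matrix W of a linear
    model, updated by one gradient step per token.

    Assumptions (hypothetical, not from the paper): key, value and query
    views are all the raw token (a real layer learns projections), and the
    self-supervised task is reconstruction with squared error.
    """
    d = len(tokens[0])
    w: Matrix = [[0.0] * d for _ in range(d)]  # hidden state starts at zero
    outputs: List[Vector] = []
    losses: List[float] = []
    for x in tokens:
        k, v, q = x, x, x  # hypothetical identity views
        # Self-supervised loss l(W) = ||W k - v||^2, measured before the update
        err = [p - t for p, t in zip(matvec(w, k), v)]
        losses.append(sum(e * e for e in err))
        # Gradient of l with respect to W is 2 * err * k^T; take one SGD step
        for i in range(d):
            for j in range(d):
                w[i][j] -= lr * 2.0 * err[i] * k[j]
        # Output: apply the freshly updated hidden state to the query view
        outputs.append(matvec(w, q))
    return outputs, losses


if __name__ == "__main__":
    outs, losses = ttt_linear_layer([[1.0, 0.0]] * 4)
    print(losses)  # the loss shrinks as W adapts to the repeated token
```

With a linear hidden state and squared-error loss, this per-token update is essentially the delta rule that DeltaNet uses; the extra expressiveness described above comes from replacing W with a multi-layer network, whose gradient update is no longer a simple linear recurrence.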

Microsoft Copilot Vision: AI-Powered Camera Assistant Now on Mobile

Microsoft has expanded its Copilot Vision feature to mobile devices, allowing users to turn their phone's camera into an interactive visual search tool. Initially introduced on the web, Copilot Vision can now analyze real-time camera video and photos stored on the device, providing additional information and help with tasks like shopping or research. The feature is available in the Copilot app for iOS and Android to Copilot Pro subscribers in the U.S., through the app's voice mode. Users can point their camera at objects and ask questions, such as whether a plant is healthy or how to decorate a room. Microsoft plans to integrate the feature further into Windows, and testing is currently underway with Windows Insiders. The expansion is part of Microsoft's broader strategy to build a more integrated AI assistant experience across devices, making Copilot more personalized and intuitive in daily life.

Hand-Picked Video

In this video, we'll look at building a thinking social media agent powered by Claude 3.7 Sonnet that can understand your voice, craft thoughtful posts, and engage authentically with your audience, all while saving you hours of content creation time.

Top AI Products This Week

  • Helix - An AI coding agent that helps you write production-quality code. It generates code, runs commands, and debugs existing code on its own. With built-in automation and an intuitive UI, Helix helps you build enterprise-grade software.

  • AI Command Bar - Imagine Spotlight with superpowers: that is what Stepsailor's AI Command Bar is aiming for. Let your users describe what they want to do, and the command bar automatically runs the required actions within your product. No prompt engineering needed.

  • GitHub MCP Server - GitHub's official Model Context Protocol (MCP) server. It lets AI agents securely call GitHub APIs locally and integrates with VS Code.

  • Okareo - Real-time LLM behavioral alerts and structured debugging for agents and RAG pipelines.

  • IBM z17 - IBM's new mainframe built for AI. It features the Telum II processor and the upcoming Spyre accelerator for on-prem inference, generative AI, and agentic AI, with high security and availability.

  • Relationchips - Connect your SaaS tools and database in minutes, explore data in natural language, build always-up-to-date dashboards, and trigger alerts or actions based on your business logic, all without writing a single line of SQL.

This week in AI

  • Mira Murati's AI Startup - Mira Murati's Thinking Machines Lab has added Bob McGrew and Alec Radford, both formerly of OpenAI, as advisers. The startup aims to build customizable AI systems.

  • WordPress AI Builder - WordPress.com launched an AI website builder. It creates complete websites with text, layouts, and images from a single prompt. Ideal for quick, easy site creation.

  • Anthropic's Max Plan - Anthropic introduced the Max plan, offering 5x or 20x the Claude usage of the Pro plan for $100 or $200 per month, respectively. It includes priority access to new features and models.

  • Google Geospatial AI - Google Research launched Geospatial Reasoning, which combines generative AI with foundation models to solve geospatial problems in crisis response, public health, and more.

  • Google & Ai2 Partner - Google Cloud is partnering with Ai2 to offer its open-source AI models (OLMo, Molmo) on the Google Cloud platform. The partnership aims to attract public-sector and regulated-industry customers with greater data control and customization.