AI Report by Explainx
Claude Sonnet 4 Can Now Process Your Entire Project in One Go

Claude Sonnet 4 gains 1M-token context on the Anthropic API; Nvidia unveils the Cosmos Reason vision-language model for robotics; Google’s Gemini Live integrates with Keep, Tasks, Calendar, and Maps.

Yash Thakker
August 13, 2025

The AI world is moving fast. Here are the big updates you shouldn’t miss this week.

🤖 Claude Sonnet 4
Now with 1M‑token context on the Anthropic API, handling entire codebases, large document sets, or long workflows in one go. Prompts over 200K tokens are billed at $6/MTok input and $22.50/MTok output. Also available on Amazon Bedrock, with Google Vertex AI coming soon.

🧠 Nvidia Cosmos
Nvidia debuts Cosmos Reason, a 7B open vision‑language model for robotics that fuses perception, physics, and common‑sense reasoning. Also launches Cosmos Transfer‑2 for rapid synthetic data, 3D reconstruction tools, Omniverse SDK updates, and new RTX Pro Blackwell & DGX Cloud hardware for next‑gen automation.

🎥 Gemini Live
Now works with Google Keep, Tasks, Calendar, and Maps, letting Android and iOS users manage notes, to‑dos, events, and navigation via real‑time voice or camera interactions. App “chips” show active integrations, turning Gemini into a central hub for Google‑powered assistance.

Let’s dive into the innovations shaping the future. 👇️

Claude Sonnet 4: Expanded 1M Token Context on Anthropic API

Claude Sonnet 4 now supports up to 1 million tokens of context, increasing its capacity fivefold and enabling the processing of entire codebases with more than 75,000 lines of code or sets of dozens of documents in a single API request. This long-context capability, available in public beta on the Anthropic API for Tier 4 and custom-rate customers, unlocks advanced use cases like large-scale code analysis (including architecture understanding and cross-file dependencies), document synthesis across hundreds of contracts or research papers, and building agents that maintain context through extensive multi-step workflows and API interaction histories. Pricing adjusts for longer prompts: requests over 200K tokens are billed at $6/MTok for input and $22.50/MTok for output, with prompt caching and batch processing offering further cost efficiencies. Sonnet 4’s enhanced context is also available in Amazon Bedrock, with Google Cloud’s Vertex AI support coming soon. Early adopters like Bolt.new and iGent AI highlight significant improvements in code generation reliability and agentic engineering for real-world, production-scale projects.
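To make the pricing concrete, here is a minimal sketch of a per-request cost estimate. The long-context rates ($6 and $22.50 per MTok) and the 200K threshold come from the announcement above. The standard rates used for shorter prompts ($3 and $15 per MTok) are not stated in this article and are an assumption, as is the rule that the long-context rate applies to the whole request once the input exceeds the threshold. The function name is hypothetical, and prompt caching and batch discounts are ignored.

```python
# Rough per-request cost estimate for Claude Sonnet 4 (illustrative only).
LONG_CONTEXT_THRESHOLD = 200_000    # input tokens; from the announcement

# (input $/MTok, output $/MTok)
STANDARD_RATES = (3.00, 15.00)      # ASSUMPTION: not stated in this article
LONG_CONTEXT_RATES = (6.00, 22.50)  # from the announcement


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, ignoring caching and batch discounts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # ASSUMPTION: the higher rate covers the whole request once the
    # input exceeds the threshold.
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = LONG_CONTEXT_RATES
    else:
        in_rate, out_rate = STANDARD_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # A 500K-token codebase prompt with a 10K-token answer.
    print(f"${estimate_cost_usd(500_000, 10_000):.3f}")
```

Under these assumptions, sending a large codebase costs a few dollars per request, which is why the announcement points to prompt caching and batch processing for repeated queries against the same context.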

Nvidia unveils new Cosmos world models, infra for robotics and physical uses

Nvidia Cosmos Reason is a 7-billion-parameter open, customizable vision-language model engineered for physical AI and robotics. Announced at SIGGRAPH 2025, Cosmos Reason bridges perception and logical reasoning for robots and embodied AI agents, using memory, physics understanding, and common sense to plan and act in the real world. The model excels at breaking down complex commands and adapting to novel or ambiguous situations, and it provides tools for video analytics, data curation and annotation, and deliberate robot planning. Joining Cosmos Reason are Cosmos Transfer-2, which accelerates synthetic data generation from 3D simulation scenes or spatial control data, and a distilled Transfer model tailored for ultra-fast training. Nvidia’s rollout also includes neural reconstruction libraries for 3D world simulation from sensor data and integration with platforms like CARLA, plus updates to the Omniverse SDK and server hardware such as the RTX Pro Blackwell Server and DGX Cloud. These advances position Nvidia at the forefront of robotics infrastructure, leveraging its AI GPUs for the next wave of automation and physical intelligence, with adoption across autonomous vehicles, industrial inspection, and city-scale video analytics workflows.

Gemini Live: Real-Time AI Conversations with Google Apps

Gemini Live now integrates with Google’s essential productivity apps (Keep, Tasks, Calendar, and Maps), bringing real-time, conversational access to your personal information. Available to both free and paid Gemini users on Android and iOS, the new feature shows connected app "chips" above the Live controls as you chat, indicating active integrations. Users can use voice commands to check daily events, add calendar appointments by talking or pointing their phone camera, view and manage Tasks lists, create or read notes in Keep, and even receive directions in Maps with tap-to-start navigation links. The addition streamlines daily planning, list-making, and navigation, making the Gemini app a true central hub for Google-powered assistance. To check if you have this update, open Gemini, tap the Live button, and look for the app chips when interacting with Gemini Live.

Hand Picked Video

bgremover is an AI-powered online tool that instantly removes backgrounds from videos without green screens. It supports popular formats, offers background replacement, real-time preview, and batch processing, and exports high-quality videos, making video background removal fast and easy for creators and professionals.

Top AI Products from this week

mcp-use – Open-source SDK and cloud platform for quickly building and deploying MCP-powered AI agents. Trusted by NASA, NVIDIA, and SAP, it unifies server management, security, and hosting to make MCP agent development fast, secure, and production-ready.
Bio Calls by Cross Paths – A next‑gen link‑in‑bio tool that lets you monetize your social media in 60 seconds by offering paid or free 1:1 calls with followers. Built for creators, experts, and problem‑solvers, it turns your time and skills into an instant income stream—no subscription required.
Inworld Runtime – AI-native backend that scales consumer apps from prototype to millions, with Adaptive Graphs for speed, Automated MLOps for maintenance-free ops, and Live Experiments for instant testing. Works with any ML stack and major model providers, used by NVIDIA, Google, and Xbox—free to use through August.
Fellow AI Meeting Assistant – Privacy‑first AI tool that records, transcribes, and summarizes meetings, integrates with major conferencing and productivity apps, and offers an API for secure, custom workflows with encryption, audit logs, and admin controls.
Compozy – Open‑source platform for orchestrating multi‑agent AI systems using simple YAML workflows. Built on Go and Temporal, it offers scalable, reliable automation with stateful workflows, parallel tasks, MCP support, scheduling, and full self‑hosting without vendor lock‑in.
VibeKit – Open‑source safety layer for AI coding agents that runs them in local Docker sandboxes, redacts sensitive data, and gives full observability into actions for secure, private, and controlled workflows.

This week in AI

Nvidia and AMD – Both will pay the US 15% of revenue from high‑end AI chip sales to China in exchange for export licenses, allowing sales of Nvidia’s H20 and AMD’s MI308 despite earlier restrictions and shifting the chip race focus from security to trade.
OpenArt – AI startup by ex-Googlers with 6M MAUs lets users turn text or songs into one‑minute “brain rot” videos via one‑click templates. Runs on a credit subscription model, projects $20M+ revenue, and is expanding features while managing IP risks.
NASA and Google AI Medical Assistant – NASA and Google are developing CMO‑DA, an AI tool to help astronauts handle medical issues on deep‑space missions without Earth contact, with potential uses for remote healthcare on Earth.
Seoul-based Datumo raises $15.5M to expand LLM evaluation – Datumo, once a data-labeling startup, now offers no‑code AI evaluation tools for safety and bias testing. Backed by Salesforce Ventures, the $15.5M round will fund R&D and expansion to Japan and the U.S., challenging Scale AI.
AI Companion Apps Market Growth – AI companion apps like Replika and Character.AI are surging, with 337 active apps generating $82M in H1 2025 and projected to top $120M by year-end. Downloads hit 220M, up 88% YoY, with most revenue driven by romantic-themed companions.

Paper of The Day

BrowseMaster is an open‑source, scalable web‑browsing framework that pairs a planner for long‑horizon reasoning with an executor for fast, programmatic search. The planner devises adaptive strategies and delegates targeted subtasks, while the executor uses code‑driven tool calls, standardized search primitives, and a stateful sandbox to retrieve and filter relevant web content efficiently. This separation boosts both search breadth and reasoning depth, enabling BrowseMaster to outperform leading proprietary and open‑source agents on complex English and Chinese benchmarks like BrowseComp and WebWalkerQA.
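The planner/executor split described above can be illustrated with a toy sketch. This is not BrowseMaster's code: the class names, the in-memory `CORPUS` standing in for the web, and the stopping rule are all hypothetical, chosen only to show a planner delegating keyword subtasks to a stateful executor.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the web: document id -> text.
CORPUS = {
    "doc1": "BrowseComp is a benchmark for browsing agents",
    "doc2": "WebWalkerQA evaluates web traversal question answering",
    "doc3": "Temporal is a workflow engine",
}


@dataclass
class Executor:
    """Runs programmatic searches and remembers what it has already returned."""
    corpus: dict
    seen: set = field(default_factory=set)

    def search(self, keyword: str) -> list:
        # Case-insensitive substring match, skipping documents already returned.
        hits = [
            doc_id
            for doc_id, text in sorted(self.corpus.items())
            if keyword.lower() in text.lower() and doc_id not in self.seen
        ]
        self.seen.update(hits)
        return hits


class Planner:
    """Delegates subtasks to the executor and stops once it has enough evidence."""

    def __init__(self, executor: Executor):
        self.executor = executor

    def run(self, subtasks: list, needed: int = 2) -> list:
        evidence = []
        for keyword in subtasks:
            evidence.extend(self.executor.search(keyword))
            if len(evidence) >= needed:
                break  # enough evidence gathered; skip remaining subtasks
        return evidence


if __name__ == "__main__":
    planner = Planner(Executor(CORPUS))
    print(planner.run(["benchmark", "web", "workflow"], needed=2))
```

In this sketch the planner never touches the corpus directly. It only decides which subtask to issue next and when to stop, while the executor owns retrieval and deduplication state, which mirrors the separation of reasoning and search that the paper summary describes.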

To read the whole paper, go here.