AI Report by Explainx
ElevenLabs Now Taking Action Beyond Conversation

ElevenLabs 11ai handles real tasks via voice, Mu brings fast on-device AI to Windows, and Magenta RT lets you generate music live.

Yash Thakker
June 25, 2025

June is wrapping up with a wave of next-gen AI breakthroughs that are redefining how we talk, work, and even create music. Whether you're deep into productivity tools, Windows systems, or creative expression, there's something exciting for you.

🎙️ ElevenLabs Launches 11ai – The Voice Assistant That Actually Gets Things Done
Forget just chit-chat—11ai is a powerful voice-first assistant built to handle real tasks. From sending emails and updating tickets to managing your calendar, it connects with tools like Gmail, Slack, Notion, and Salesforce using secure APIs. You can choose from 5,000+ voices—or even clone your own—and interact naturally through real-time voice or text. Currently in free alpha, it's a peek into the future of voice-driven productivity.

💻 Microsoft’s New On-Device AI, ‘Mu’, Makes Windows Settings Smarter & Faster
Meet Mu, Microsoft’s compact but powerful AI model running locally on Copilot+ PCs. It's lightning-fast (over 100 tokens/sec!) and helps you control Windows Settings just by asking. Think of it as your privacy-friendly, always-available settings assistant, optimized to run on-device with no lag and no internet dependency.

🎵 Magenta RT – Open-Source AI That Makes Music With You, Live
For creatives, Magenta RealTime (RT) is a game-changer. This 800M-parameter model generates high-fidelity 48 kHz stereo music on the fly, blending styles and adapting in real time. Trained on around 190,000 hours of mostly instrumental stock music, it’s suited to live performances, jamming, or experimenting with sound. The model and code are openly available, it runs on Colab TPUs for now, and you steer the output through customizable style embeddings.

From hands-free workflow automation to real-time music generation and AI that lives right on your device, the future of intelligent tools is getting smarter, faster, and more creative.

ElevenLabs’ Next-Gen Voice Assistant That Gets Things Done

11ai, launched by ElevenLabs on June 23, 2025, is a voice-first AI assistant designed to move beyond simple conversation and actively participate in your digital workflow. Built on ElevenLabs’ low-latency Conversational AI platform, 11ai integrates the Model Context Protocol (MCP), enabling secure, standardized connections to a wide range of tools and APIs—including Salesforce, HubSpot, Gmail, Zapier, Linear, Slack, Perplexity, Notion, Google Calendar, and custom internal systems—so it can not only answer questions but also take meaningful actions like planning your day, researching prospects, updating tickets, and sending messages through natural voice commands. Users can personalize their assistant by choosing from over 5,000 voices or creating a custom voice clone. The platform supports real-time, multimodal (voice and text) interaction, integrated Retrieval-Augmented Generation (RAG) for context-aware responses, automatic language detection, and enterprise-grade security. Currently available as a free alpha, 11ai showcases the future of voice-first productivity: natural conversation that results in real action, with ongoing feedback helping to expand integrations and capabilities for seamless, personalized workflow automation.
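To make the MCP connection concrete, here is a minimal sketch of the kind of message an MCP client sends when it asks a server to run a tool. MCP is built on JSON-RPC 2.0 and uses a `tools/call` method; the tool name `update_ticket` and its arguments are hypothetical, and nothing here is taken from 11ai's own implementation, which the announcement does not describe at this level.

```python
import json
from itertools import count

# JSON-RPC ids only need to be unique within a session.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool name and arguments, for illustration only.
message = build_tool_call(
    "update_ticket",
    {"ticket_id": "ENG-123", "status": "In Progress"},
)
print(message)
```

Because every integration speaks the same request shape, an assistant could in principle add a new tool without a custom connector for each service; only the tool name and argument schema change.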

Microsoft’s Fast, On-Device AI for Windows Settings

Mu is Microsoft’s new on-device small language model designed for efficient, high-performance operation on Neural Processing Units (NPUs), powering the agent in Windows Settings for Copilot+ PCs. Built with a transformer encoder–decoder architecture, Mu excels at mapping natural language queries to system actions quickly and accurately, delivering responses at over 100 tokens per second while running entirely on local hardware. Its design leverages advanced techniques like dual LayerNorm, rotary positional embeddings, and grouped-query attention, along with hardware-aware optimizations and quantization, to ensure low latency and minimal memory usage. Trained on high-quality educational data and fine-tuned with task-specific samples, Mu achieves impressive accuracy despite its compact size, enabling real-time, context-aware assistance for managing hundreds of Windows settings. This innovation allows users to interact naturally with Windows Settings, making system adjustments through simple language commands, all while maintaining privacy and responsiveness by processing everything locally on the device.
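Grouped-query attention is one reason a model like this can keep memory usage low. The sketch below shows only the core idea: several query heads share one key/value head, which shrinks the cache of keys and values kept during decoding. All sizes (8 query heads, 2 key/value heads, head dimension 64, 12 layers) are hypothetical and chosen for illustration; they are not Mu's actual configuration.

```python
def kv_head_for_query_head(query_head: int, num_query_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key/value head that a given query head reads from."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a multiple of num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def kv_cache_values_per_token(num_kv_heads: int, head_dim: int, num_layers: int) -> int:
    """Count cached numbers per token: one key and one value vector per head per layer."""
    return 2 * num_kv_heads * head_dim * num_layers


# Hypothetical sizes, for illustration only.
NUM_QUERY_HEADS = 8
NUM_KV_HEADS = 2
HEAD_DIM = 64
NUM_LAYERS = 12

mapping = [
    kv_head_for_query_head(h, NUM_QUERY_HEADS, NUM_KV_HEADS)
    for h in range(NUM_QUERY_HEADS)
]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]

full = kv_cache_values_per_token(NUM_QUERY_HEADS, HEAD_DIM, NUM_LAYERS)
grouped = kv_cache_values_per_token(NUM_KV_HEADS, HEAD_DIM, NUM_LAYERS)
print(full // grouped)  # 4
```

With these made-up numbers the cache is four times smaller than it would be if every query head had its own key/value head. That saving matters on an NPU with a tight memory budget.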

Magenta RT, an Open-Source Live Generative Music Model

Magenta RealTime (Magenta RT) is an open-weights, live generative music model developed by the Magenta Project, designed for interactive music creation, control, and performance in real time. Built as an 800-million-parameter autoregressive transformer, it generates high-fidelity (48kHz stereo) audio by producing music in sequential chunks, each influenced by previous outputs and customizable style embeddings, allowing users to blend and morph musical styles on the fly. Trained on around 190,000 hours of mostly instrumental stock music, Magenta RT enables real-time exploration of musical ideas and textures, supporting creative workflows from live performance to interactive soundscapes. While it currently runs on Colab TPUs and is aimed at eventual local deployment, the model and code are openly available for experimentation, with future plans for personal fine-tuning and on-device inference.
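The chunk-by-chunk generation and style blending described above can be sketched in a few lines. This is a toy illustration, not the Magenta RT API: `blend_styles`, `generate_stream`, the `context_chunks` parameter, and the stand-in `fake_chunk` generator are all hypothetical names, and a real model would return audio rather than a tuple.

```python
from typing import Callable, Iterator, List, Sequence


def blend_styles(embeddings: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Blend style embeddings with a normalized weighted average."""
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    dim = len(embeddings[0])
    return [
        sum(w * e[i] for e, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]


def generate_stream(
    generate_chunk: Callable[[list, List[float]], object],
    style_schedule: Sequence[List[float]],
    context_chunks: int = 5,  # hypothetical context length, in chunks
) -> Iterator[object]:
    """Yield chunks one at a time, each conditioned on recent chunks and a style."""
    history: list = []
    for style in style_schedule:
        context = history[-context_chunks:]
        chunk = generate_chunk(context, style)
        history.append(chunk)
        yield chunk


def fake_chunk(context: list, style: List[float]) -> tuple:
    """Stand-in for a model call: records how much context and which style it saw."""
    return (len(context), tuple(style))


# Morph from style A to style B over three chunks (toy 2-D embeddings).
style_a, style_b = [1.0, 0.0], [0.0, 1.0]
schedule = [blend_styles([style_a, style_b], w) for w in ([1, 0], [1, 1], [0, 1])]
for chunk in generate_stream(fake_chunk, schedule):
    print(chunk)
```

Changing the weights between chunks is what lets a performer morph styles mid-stream: each new chunk sees the previous audio as context but a freshly blended style vector.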

Hand Picked Video

In this video, we'll look at ElevenLabs Conversational AI Agents.

Top AI Products from this week

NativeMind - NativeMind brings the latest AI models to your browser—powered by Ollama and fully local. It gives you fast, private access to models like Deepseek, Qwen, and LLaMA—all running on your device.
Middleware - Middleware is a YC-backed full-stack AI observability platform that helps developers detect, diagnose, and resolve issues in real time, automatically. By providing deep insights across your entire stack, it helps ensure optimal performance and reliability for AI-powered applications.
FlashDocs API - FlashDocs lets you generate Google Slides and PowerPoint decks from your product — using just one API call. Markdown, charts, images, tables, merge tags, brand themes — all handled. Build it once, export stunning decks everywhere.
Hope AI - Build maintainable, production-grade applications. Control generation at component-level with prompts and design sketches. Compose with design system and reusable components. Deploy instantly. Generate code that developers love! By Bit Cloud.
Automaticall - I was sick of scam calls, and blocking them wasn't enough. So we built an AI to mess with them instead. Try for free today!
Mighty - Mighty brings enterprise-grade security to AI agents. Drop in our client-side Python SDK to spin up a data vault, secure key exchange and an OAuth-ready policy engine in minutes. Agents can now access private data with full auth, audit and compliance.

This week in AI

ElevenLabs Mobile App - The ElevenLabs app brings ultra-realistic AI voiceovers to iOS and Android, letting you create, export, and customize lifelike speech on the go with the expressive v3 model.
MIT SEAL Self-Learning AI Framework - MIT’s SEAL lets AI models teach themselves by generating their own training data and updating their knowledge, enabling continuous, autonomous adaptation.
H2L Capsule Interface - Japan’s H2L Capsule Interface uses muscle sensors to let users control humanoid robots remotely, capturing both movement and force for realistic, immersive teleoperation.
Grok Spreadsheet Editor Leak - Leaked code reveals xAI is developing a Grok file editor with spreadsheet support, letting users talk to Grok for real-time help while editing files—challenging Google and Microsoft.
Google Commerce Media Beta - Google’s Commerce Media suite, now in beta, uses AI to connect brands and retailers, expand reach, enable YouTube campaigns, and offer product-level sales measurement.