The Next-Gen Sora Rival ⚔️

AI is transforming video and accessibility! StepVideo-T2V generates stunning videos from text, Project Starlight upscales footage to 4K, and NVIDIA’s Signs enhances ASL learning.

Imagine typing a simple text prompt and watching it turn into a high-quality video. That’s what StepVideo-T2V delivers with its 30-billion-parameter model, generating 204-frame videos with stunning realism. Using Video-VAE compression, it keeps generation efficient without losing quality. With support for English and Chinese and an MIT license that permits commercial use, this model is unlocking new possibilities for creators and developers.

Now, picture restoring an old, blurry video into crisp 4K resolution with no manual effort. Project Starlight by Topaz Labs makes this a reality using diffusion AI, ensuring smoother motion and better detail than traditional upscaling. Available as a cloud-based research preview, it lets users enhance three 10-second clips per week for free. With automatic denoising and sharpening, Topaz Labs is pushing the boundaries of effortless video restoration.

AI is also transforming accessibility with NVIDIA’s Signs, a platform helping families learn American Sign Language (ASL). Developed with the American Society for Deaf Children, it features 3D avatars and real-time webcam feedback to teach and correct ASL signs. The project aims to build a dataset of 400,000 video clips covering 1,000 signed words, soon to be released publicly, paving the way for AI-powered accessibility tools in education and communication.

Let’s dive in and explore how these breakthroughs are reshaping the world! 🚀

StepVideo-T2V: 30B-Parameter Video Model

StepVideo-T2V is a powerful text-to-video model with 30 billion parameters that can generate videos up to 204 frames long. It relies on a compression technique called Video-VAE to keep generation efficient while preserving video quality, supports both English and Chinese prompts, and uses a DiT architecture with 3D full attention plus Video-DPO to produce visually appealing results. In evaluations it performs well against comparable systems, and it is released under the MIT license, allowing commercial use and modification. To run it, you'll need Python, PyTorch, and CUDA; the code is on GitHub, and the weights are available on Hugging Face and ModelScope.
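As a rough illustration of the setup step, here's a minimal sketch that pulls the weights with the huggingface_hub client. The repo id and local path below are assumptions based on the project name, so check the official GitHub and Hugging Face release pages for the exact identifiers and inference scripts.

```python
# pip install huggingface_hub
from huggingface_hub import snapshot_download

# Download the full model snapshot (tens of GB for a 30B-parameter model).
# NOTE: the repo id is an assumption -- confirm it against the official
# StepVideo-T2V release before running.
local_dir = snapshot_download(
    repo_id="stepfun-ai/stepvideo-t2v",
    local_dir="./stepvideo-t2v",
)

print(f"Model files downloaded to: {local_dir}")
# From here, generation is driven by the inference scripts in the
# project's GitHub repository (requires PyTorch + CUDA).
```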

Transform Any Video into Stunning HD with Project Starlight!

Project Starlight is a groundbreaking AI model by Topaz Labs that transforms low-resolution and degraded videos into high-definition quality, up to 4K resolution. It uses diffusion AI, which offers better temporal consistency and more natural motion than traditional GAN-based upscalers. Currently in research preview, it can be accessed online or through the Topaz Video AI platform, with rendering handled in the cloud due to its computational demands. The model upscales, denoises, de-aliases, and sharpens footage without manual adjustments. Processing three 10-second clips per week is free; anything beyond that requires cloud credits, with promotional offers available. Topaz Labs is working on optimizing the model for speed and size so it can eventually run locally.
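Starlight itself runs only in Topaz's cloud, so there's no local API to call yet. For contrast, here's a minimal sketch (file names are hypothetical) of the traditional per-frame upscaling it improves on: each frame is resized independently, which is exactly why conventional approaches struggle with temporal consistency and flicker.

```python
import cv2

# Naive per-frame upscaling: every frame is resized on its own, with nothing
# enforcing consistency between consecutive frames -- the flicker this causes
# is what diffusion-based restorers like Project Starlight aim to avoid.
SCALE = 4  # e.g. 960x540 -> 3840x2160 ("4K")

cap = cv2.VideoCapture("input_lowres.mp4")  # hypothetical input path
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) * SCALE
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) * SCALE
out = cv2.VideoWriter("output_upscaled.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    out.write(cv2.resize(frame, (w, h), interpolation=cv2.INTER_CUBIC))

cap.release()
out.release()
```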

NVIDIA's Signs: Revolutionizing ASL Education

NVIDIA has launched an innovative AI platform called Signs, designed to enhance American Sign Language (ASL) learning and accessibility. Developed in collaboration with the American Society for Deaf Children and creative agency Hello Monday, Signs uses a 3D avatar to demonstrate signs and provides real-time feedback via webcam analysis. The platform aims to create a comprehensive dataset of 400,000 video clips representing 1,000 signed words, validated by fluent ASL users and interpreters. This initiative helps bridge communication gaps, especially for families with deaf children, allowing them to start learning ASL early. NVIDIA plans to make the dataset publicly available later this year to foster the development of accessible technologies like AI agents and video conferencing tools.
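NVIDIA hasn't published how the Signs feedback loop is implemented, but webcam-based sign analysis typically starts with hand-landmark tracking. As a loose illustration only (not NVIDIA's code), here's a minimal sketch using the open-source MediaPipe Hands solution to extract and draw landmarks from a webcam feed, the kind of signal a sign-checking model would consume.

```python
# pip install mediapipe opencv-python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)  # default webcam

with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                # Draw the 21 hand landmarks; a real tutor would instead feed
                # these coordinates to a sign-classification model.
                mp.solutions.drawing_utils.draw_landmarks(
                    frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("hand landmarks", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```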

Hand Picked Video

In this video, we'll look at ElevenLabs Conversational AI Agents.

Top AI Products from this week

  • NEO Gamma - NEO Gamma is the next generation of home humanoids designed and engineered by 1X Technologies. The Gamma series includes improvements across NEO’s hardware and AI, featuring a new design that is deeply considerate of life at home.

  • Rabbit Android Agent - A research preview from Rabbit: an AI agent that can control Android apps via natural language, demonstrating the potential of AI-driven app automation.

  • NYX - NYX’s AI Co-pilot simplifies end-to-end campaign management, helping you create high-converting ads, launch multi-channel campaigns, and optimize performance with real-time analytics. It integrates with major ad platforms for AI-driven insights to maximize impact.

  • Trupeer Faces - For the first time ever, create studio-quality screen recordings without being on camera or recording your own voice. Trupeer's integration with HeyGen lets anyone create polished walkthroughs with avatars, AI voiceover, branding, and more in 30+ languages.

  • Helix - Helix, from Figure AI, is a Vision-Language-Action model for full upper-body humanoid control. Zero-shot generalization to new objects & tasks. Runs on embedded GPUs.

  • Chance AI for iOS - Chance AI is the world's most advanced visual search engine for curious minds. Snap a photo of anything—art, architecture, nature—and instantly uncover its history, meaning, and hidden connections. Perfect for creatives, designers, and lifelong learners.

This week in AI

  • Spotify & ElevenLabs - Spotify now accepts audiobooks from ElevenLabs, an AI voice narration platform. Authors can use ElevenLabs to narrate books in 29 languages and distribute to Spotify via Findaway Voices.

  • OpenAI Shifts Compute - OpenAI plans to shift its computing from Microsoft to SoftBank by 2030, with Stargate covering 75% of its needs and costs expected to rise significantly.

  • Meta's AI Robots - Meta is investing heavily in AI-powered humanoid robots, focusing on household chores. The project is part of Reality Labs, with plans to collaborate with other companies on hardware while Meta handles the AI and software.

  • Microsoft's Muse AI - Microsoft unveils Muse, a gaming-focused AI model that understands 3D game worlds. Trained on extensive gameplay data, Muse generates virtual gameplay and could revive classic games by optimizing them for modern devices. Weights are open source.