AI Report by Explainx
NVIDIA Canary Qwen-2.5B Elevates ASR
New research challenges AI reasoning vs memorization • UK launches 21-exaflop Isambard-AI supercomputer • Adobe Firefly adds video/audio tools • NVIDIA's 2.5B ASR model.
This Week's Highlights
Welcome to another edition of our AI Weekly Digest! This week brings exciting developments across the AI landscape, from groundbreaking research that challenges how we evaluate machine learning models to powerful new tools transforming creative workflows.
🔬 Research Spotlight: New evaluation methods reveal that even state-of-the-art AI models may be relying more on memorization than true reasoning—a finding that could reshape how we train and test AI systems.
💻 Infrastructure & Compute: The UK unveils its most powerful AI supercomputer, setting new standards for energy-efficient high-performance computing while advancing national AI capabilities.
🎨 Creative AI Tools: Adobe enhances its Firefly platform with advanced video generation and audio creation features, offering creators unprecedented flexibility and commercial-grade reliability.
Let's dive into the details of these game-changing developments...
NVIDIA Canary-Qwen-2.5B: High-Accuracy English Speech Recognition Model

NVIDIA NeMo Canary-Qwen-2.5B is a state-of-the-art English automatic speech recognition (ASR) model with 2.5 billion parameters, designed for high-accuracy transcription with punctuation and capitalization. Running at 418 RTFx, it operates in two distinct modes: ASR mode for pure speech-to-text conversion and LLM mode for leveraging large language model capabilities like summarization or question answering, using the transcript as input. The architecture utilizes a FastConformer encoder and Transformer decoder, with LoRA applied to the LLM, and supports .wav or .flac audio input at 16 kHz. Trained on 234,500 hours of English speech from diverse public datasets—including YouTube-Commons, YODAS2, and LibriLight—the model achieves leading WER scores across benchmarks (1.6% on LibriSpeech Clean, 3.1% on LibriSpeech Other, 5.6% on VoxPopuli) along with robust noise tolerance (2.41% WER at SNR 10). Fairness tests show WER varying by gender (13.85% female, 16.71% male, 29.46% other), with only minor differences across age groups. Deployment is optimized for NVIDIA GPUs (Ampere, Blackwell, Hopper, etc.) and supported on Linux and Windows. Released under the CC-BY-4.0 license, the model is ready for commercial use via NVIDIA’s NeMo toolkit, and is ideal for transcription, summarization, and transcript-based analytics tasks, although it is only reliably accurate for English-language input.
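The WER figures above measure word-level edit distance between the model's transcript and a reference. As a minimal sketch of how the metric works (this is illustrative code, not NVIDIA's evaluation pipeline, and it assumes simple whitespace tokenization):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 substitution / 6 words
```

Production evaluations typically add text normalization (lowercasing, punctuation stripping) before scoring, which is why benchmark numbers can differ slightly between reports.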
Isambard-AI Powers UK’s AI Future

Isambard-AI, based at the University of Bristol, is the UK’s most powerful AI supercomputer, built in under two years with a £225 million investment. It uses 5,448 NVIDIA GH200 Grace Hopper Superchips in an HPE Cray EX4000 system to deliver over 21 exaflops of AI performance and more than 250 petaflops of traditional scientific compute, all while consuming under 5 megawatts of power. The system features advanced direct liquid cooling, a power usage effectiveness (PUE) below 1.1, zero-carbon electricity, and 27 petabytes of all-flash storage divided between high-speed Cray ClusterStor and VAST Data systems. Isambard-AI supports projects ranging from AI-driven drug discovery and climate modeling to the development of UK-tuned large language models, with access centrally managed to foster innovation across research, government, and industry. Recognized for its speed and exceptional energy efficiency—ranking fourth in the world on the Green500—Isambard-AI is designed as a strategic national asset to accelerate scientific, healthcare, and technological breakthroughs while minimizing environmental impact.
Adobe Firefly Adds Advanced Video and Audio AI Tools

Adobe has announced major upgrades to its Firefly generative AI platform, boosting its video creation capabilities with improved motion fidelity and advanced controls for smoother, more lifelike animations. Users can select from several industry-leading AI models—including Firefly, Runway’s Gen-4, and Google Veo3—inside the Firefly web and mobile apps, enabling more flexibility and creative control. New tools include Composition Reference (mirroring the structure of a reference video), Style Presets for instant visual effects, and Keyframe Cropping to easily fit content to any aspect ratio. Firefly also introduces a Generate Sound Effects feature, letting users create custom AI-generated audio just by typing a prompt or using their voice, as well as the new Text to Avatar tool for quickly turning scripts into avatar-led videos. All features are commercially safe, as models are only trained on licensed content—giving professionals confidence for all their creative projects.
Hand Picked Video
In this video, we look at OpenManus, an open-source AI agent powered by GPT-4o that lets you build and automate without restrictions. It handles AI-driven workflows from website creation to stock analysis, and it is completely free and accessible to everyone.
Top AI Products from this week
Lexi AI - Create, launch, and optimize high-performing Meta (Facebook & Instagram) ads with Lexi AI. No experience needed—just results. Scale your business with smarter, automated ad management.
Aioly app - Aioly is an AI-powered short-video app just for food. Swipe through the latest food trends, ask AI what and where to eat, or connect with people over a meal. Don’t eat alone: create or join invites to eat together, and discover and connect over food.
Scrapeless - Scrapeless powers AI agents, automation workflows, and data extraction with browser-tool infrastructure.
Botdial - Botdial is a voice-first AI support assistant that handles calls and WhatsApp messages, answers customer questions, books appointments, and delivers predictive analytics, helping you grow revenue and improve customer experience.
ClearTerms - ClearTerms is a Chrome extension that reads and summarizes Terms & Conditions in plain English. It highlights privacy risks, flags shady clauses, auto-detects T&C pages, and saves past analyses — so you know what you’re agreeing to, instantly.
LingoClub - The first AI-native language platform built around real conversation. No flashcards, no memorization - just natural speaking practice with instant feedback from your AI tutor. It adapts to how you learn, so you can start talking from day one.
This week in AI
AI Echocardiogram Breakthrough - New AI model detects cardiac amyloidosis from a single echocardiogram clip with 85% sensitivity and 93% specificity, outperforming standard risk scores across diverse populations.
Udio Styles Upgraded - Udio now lets all users blend two style references, access a curated Styles Library, and try premium Artist Styles—free for one week. Manage, save, and mix styles easily.
Unilever opens AI design studios - Unilever’s new Sketch Pro studios use AI to create branded content 3x faster, but designers guide the creative process; 21 global studios planned by 2026.
Copilot Vision Desktop Share Launches - Windows Insiders can now share their desktop with Copilot, letting the AI see and assist in real time with projects, content, or questions—rolling out in the latest app update.
AI vs. Humans at AtCoder - AI took an early lead routing robots in AtCoder’s wall-planning contest, but human creativity with wall placement and grouping quickly narrowed the gap in this real-time challenge.
Paper of The Day
VAR-MATH introduces symbolic evaluation for mathematical reasoning in LLMs by converting fixed benchmark problems into parameterized templates that require consistent solutions across multiple variants. Tests reveal that RL-trained models show 40-60% performance drops on the variabilized benchmarks, suggesting reliance on memorization over true reasoning ability.
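The core idea can be illustrated with a toy sketch: a fixed problem becomes a parameterized template, and a model earns credit only when it answers every instantiation correctly. The template, the "models", and the scoring below are hypothetical illustrations, not the paper's actual benchmark or code:

```python
import re

def make_variants(params):
    """Instantiate a parameterized template (an illustrative one, not from the paper).

    Template: 'a workers finish a job in b days; how long for c workers?'
    Ground truth: a * b / c.
    """
    return [(f"If {a} workers finish a job in {b} days, "
             f"how many days do {c} workers need?", a * b / c)
            for a, b, c in params]

def per_instance_accuracy(model, variants):
    """Conventional scoring: fraction of variants answered correctly."""
    return sum(abs(model(q) - ans) < 1e-6 for q, ans in variants) / len(variants)

def consistent_score(model, variants):
    """VAR-MATH-style scoring: credit only if every variant is correct."""
    return 1.0 if all(abs(model(q) - ans) < 1e-6 for q, ans in variants) else 0.0

# A toy 'reasoner' that actually solves the template...
def reasoner(question):
    a, b, c = map(int, re.findall(r"\d+", question))
    return a * b / c

# ...versus a 'memorizer' that always replays one memorized answer.
memorizer = lambda question: 4.0

variants = make_variants([(2, 6, 3), (4, 5, 2), (3, 8, 6)])
print(per_instance_accuracy(memorizer, variants))  # 2/3: looks respectable
print(consistent_score(memorizer, variants))       # 0.0: consistency exposes it
print(consistent_score(reasoner, variants))        # 1.0
```

The gap between the memorizer's per-instance score and its consistency score mirrors the performance drops the paper reports when fixed benchmarks are variabilized.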
To read the whole paper, go here.