AI Report by Explainx
Posts
OpenAI's CHEAPEST Competitor

OpenAI's CHEAPEST Competitor

NovaSky releases affordable reasoning model Sky-T1, OpenAI adds ChatGPT personality customization, and TransPixar enables RGBA video generation with alpha channels.

January 13, 2025

Imagine a world where creating a sophisticated AI model costs less than your monthly grocery bill. That's exactly what happened in a quiet lab at UC Berkeley, where a team of researchers achieved what many thought impossible. With just $450 and a vision for democratizing AI, they brought to life Sky-T1-32B-Preview, an AI model that can reason, fact-check itself, and tackle complex mathematical problems. In just 19 hours, using eight Nvidia H100 GPUs, they trained a 32-billion-parameter model that's challenging the status quo of what's possible in artificial intelligence.

But how did they do it? The NovaSky team, as they're known, took an innovative approach. They began with Alibaba's QwQ-32B-Preview to generate initial data, then refined it using OpenAI's GPT-4o-mini. The result? A model that's not just affordable but also remarkably capable. While it outshines earlier versions of OpenAI's o1 in certain math and coding challenges, it's still learning when it comes to complex scientific questions. But that's just the beginning of their journey—NovaSky is already planning future iterations with even more advanced reasoning capabilities.

Meanwhile, in another corner of the AI world, a different kind of revolution is taking place. OpenAI has reimagined how we interact with AI assistants. Their latest ChatGPT update feels like giving your AI companion a personality makeover. Want a chatbot that speaks your language, whether that's professional jargon or Gen Z slang? Done. Share your name, profession, and preferences, and suddenly you're not just talking to an AI—you're conversing with a digital companion that gets you. It's like having a friend who's always ready to chat, but one who remembers exactly how you like your conversations served.

But the story doesn't end there. Enter TransPixar, a breakthrough that's turning the world of video creation on its head. Think of it as a master artist who can paint not just with colors, but with transparency itself. Traditional video models struggled with alpha channels—those invisible layers that make video effects possible. TransPixar changes all that. Using advanced diffusion transformer architecture and clever fine-tuning, it brings together RGB and alpha channels in perfect harmony. From a simple text description, it can create videos that aren't just consistent—they're transformative.

Ready to explore how these cutting-edge innovations are shaping our digital experiences? Let’s dive into this.

NovaSky Unveils Affordable Open-Source Reasoning AI Model

Researchers from UC Berkeley's Sky Computing Lab, known as NovaSky, unveiled Sky-T1-32B-Preview, an open-source reasoning AI model. This model is notable for its affordability, having been trained for less than $450, which showcases the potential for developing high-level reasoning capabilities efficiently. Sky-T1 is designed to fact-check its own outputs, making it more reliable in fields like physics and mathematics compared to traditional AI models.The training process involved using Alibaba's QwQ-32B-Preview to generate initial data, which was then refined with OpenAI’s GPT-4o-mini. The model, comprising 32 billion parameters, was trained over approximately 19 hours on a setup of eight Nvidia H100 GPUs. While Sky-T1 outperformed an earlier version of OpenAI's o1 on certain math and coding challenges, it did not match its performance on more complex scientific questions. NovaSky plans to continue improving open-source models with advanced reasoning capabilities and aims to enhance efficiency and accuracy in future iterations.

OpenAI Enhances ChatGPT with Custom Traits Feature

OpenAI has introduced an enhanced customization feature for ChatGPT, allowing users to personalize their interactions with the AI chatbot. The revamped custom instructions menu now includes fields where users can specify a preferred name or nickname, their profession, and additional details they want ChatGPT to know. A notable addition is the ability to assign personality traits to the chatbot, such as "Chatty," "Encouraging," or "Gen Z." This feature aims to provide more tailored and engaging responses by encouraging users to introduce themselves for better interaction. Importantly, this new customization option is distinct from ChatGPT's memory feature, which allows the AI to remember or forget specific user information. The updates are seen more as an aesthetic improvement rather than a significant technical overhaul, as they still rely on prompt engineering principles used in the previous version. Users have reported mixed experiences with the rollout, with some seeing the new options while others have yet to access them. Overall, this enhancement is designed to make ChatGPT more responsive to individual user preferences and needs.

TransPixar: Innovative Method for Generating RGBA Videos

TransPixar is a novel method developed to enhance pretrained video models for generating RGBA videos, which include alpha channels crucial for visual effects (VFX). Traditional text-to-video generative models have struggled with RGBA generation due to limited datasets and the complexities involved in adapting existing models. TransPixar addresses these challenges by utilizing a diffusion transformer (DiT) architecture that incorporates alpha-specific tokens and employs LoRA-based fine-tuning. This allows for the simultaneous generation of RGB and alpha channels while maintaining high consistency between them. The method optimizes attention mechanisms to preserve the original strengths of RGB models, achieving a strong alignment between RGB and alpha outputs even with limited training data. By introducing new tokens for alpha channel generation, reinitializing positional embeddings, and using a zero-initialized domain embedding, TransPixar effectively integrates text, RGB, and alpha tokens in a unified sequence. This advancement opens up new possibilities for creating diverse and consistent RGBA videos, significantly benefiting VFX and interactive content creation.

Hand Picked Video

In this video, we'll look at Elevenlabs Conversational AI Agents.

Top AI Products from this week

Minduck Discovery - Minduck Discovery is an AI-Search platform that fuels your curiosity with interactive Mind Maps, curating essential information and structuring it for easy understanding. For deeper insights, the Discovery Book offers traceable, in-depth content with insight.
Kolors - Kolors is a cutting-edge text-to-image model powered by latent diffusion. Trained on billions of pairs, it excels in visual quality, complex semantics, and text rendering, outperforming both open and closed-source models.
SmolAgents - Smolagents's Guides and News - HuggingFace's NEW Agent Framework ，Create Powerful AI Agents with Minimal Effort
TestSprite 1.0 - TestSprite is the first AI end-to-end testing agent for small and growing developer teams.
TIXAE Agents - TIXAE Agents is a single place for building multi-channel AI agents that work on Voice + Text channels like Web, Whatsapp, Instagram, FB Messenger, twilio and much more!
Topview 2.0 Product Avatar - Showcase your products with AI avatars. Upload product image, and let digital avatars hold and present it perfectly—ideal for eCommerce and marketing!
Humiris - Mixture of AI - Humiris automatically optimizes the accuracy & cost of your GenAI models. Up to 80% cost saving compared to o1 for equivalent quality. Seamlessly integrate GPT4o, Sonnet 3.5...with your custom models to achieve +70% accuracy than general Reasoning models.

This week in AI

AI Unveils Cell Secrets - A new AI developed by Columbia University predicts cellular functions, enhancing our understanding of cell behavior and potentially revolutionizing biological research.
Google Launches Daily Listen Feature - Google's Daily Listen feature generates personalized five-minute audio summaries of news stories based on users' interests from their Discover feed, available on Android and iOS.
Timekettle Launches Babel OS - Timekettle's new Babel OS powers W4 Pro Earbuds, enabling seamless real-time translation in over 40 languages during calls, enhancing global communication effortlessly.
VLC Introduces AI Subtitling - At CES 2025, VLC showcased an offline AI feature for real-time subtitling and translation in over 100 languages, enhancing video accessibility while ensuring user privacy.