OpenAI's cooking up its own AI chips

OpenAI launches cost-effective GPT-4o mini, explores AI chip development, while Google DeepMind introduces FLAMe for unbiased AI evaluation. AI accessibility and assessment evolve.

In this week's newsletter, we're diving into some seriously exciting news from OpenAI and Google DeepMind.

Back in May, OpenAI launched GPT-4o, its latest multimodal AI model, which natively handles text, audio, and images. It offers real-time interactions, knowledge-based Q&A, and advanced language processing in over 50 languages.

And now OpenAI has unveiled GPT-4o mini, a groundbreaking AI model designed to make advanced AI more accessible and affordable. This new model offers impressive capabilities at a fraction of the cost:

  • More than 60% cheaper than GPT-3.5 Turbo

  • 82% score on MMLU, beats GPT-4 on chat preferences

  • Handles text and vision inputs

  • Knowledge cutoff of October 2023

  • Advanced safety features

  • Available via APIs, replacing GPT-3.5 in ChatGPT plans

Additionally, OpenAI is in talks with Broadcom to develop its own AI chip, aiming to address GPU shortages, and Google DeepMind has introduced FLAMe, a family of models for evaluating LLMs.

Let's dive into the stories that are redefining the landscape of artificial intelligence.

OpenAI Introduces Cost-Effective GPT-4o mini

OpenAI has introduced GPT-4o mini, a cost-effective AI model designed to broaden the accessibility and application of artificial intelligence. It is more than 60% cheaper than GPT-3.5 Turbo, yet it scores 82% on the MMLU benchmark and outperforms GPT-4 on chat preferences. The model supports text and vision inputs and has a knowledge cutoff of October 2023. OpenAI built safety measures into its development, such as filtering undesirable content from the training data and applying reinforcement learning from human feedback (RLHF). GPT-4o mini is now available through OpenAI's APIs and replaces GPT-3.5 for ChatGPT's Free, Plus, and Team users. By lowering costs while advancing capability, OpenAI aims to make AI more accessible and reliable for developers and users.
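To make the "more than 60% cheaper" figure concrete, here is a minimal Python sketch of the cost arithmetic. It assumes the per-million-token list prices OpenAI published at GPT-4o mini's launch ($0.15 input / $0.60 output for GPT-4o mini, $0.50 / $1.50 for GPT-3.5 Turbo). Those prices are not stated in the story above and may have changed since, and the function names are illustrative only.

```python
# Sketch of the cost comparison behind the "more than 60% cheaper" claim.
# ASSUMPTION: USD list prices per 1M tokens at GPT-4o mini's July 2024 launch;
# check OpenAI's current pricing page before relying on these numbers.
PRICES_PER_MILLION = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload with the given token counts."""
    p = PRICES_PER_MILLION[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def savings(new: str, old: str, input_tokens: int, output_tokens: int) -> float:
    """Fractional saving of model `new` over model `old` for the same workload."""
    old_cost = request_cost(old, input_tokens, output_tokens)
    if old_cost == 0:
        raise ValueError("workload must contain at least one token")
    return 1 - request_cost(new, input_tokens, output_tokens) / old_cost


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens and 1M output tokens.
    s = savings("gpt-4o-mini", "gpt-3.5-turbo", 1_000_000, 1_000_000)
    print(f"{s:.1%}")
```

With these assumed prices, output tokens are exactly 60% cheaper and input tokens 70% cheaper, so an even split of input and output comes out 62.5% cheaper. Any mix lands between 60% and 70%, which is why the headline figure reads as "more than 60%".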

OpenAI to Develop Its Own AI Chip

OpenAI is in discussions with Broadcom about developing a new artificial intelligence chip. The initiative is part of OpenAI's broader strategy to mitigate the ongoing shortage of costly graphics processing units (GPUs), which are essential for training AI models like ChatGPT and DALL-E 3. The company aims to build its own AI server chip to strengthen its AI development capabilities. To support this effort, OpenAI is reportedly hiring former Google employees who worked on Google's tensor processing unit (TPU). OpenAI CEO Sam Altman also plans to raise substantial funds to build a network of semiconductor manufacturing facilities, potentially in collaboration with major chipmakers such as Intel, Taiwan Semiconductor Manufacturing Co., and Samsung Electronics. An OpenAI spokesperson said the company is in ongoing conversations with industry and government stakeholders about increasing access to the infrastructure needed to make AI's benefits widely accessible.

FLAMe Illuminates the Path to Fair AI Assessment

Google DeepMind presents FLAMe (Foundational Large Autorater Models), a new family of models developed to improve the evaluation of large language models (LLMs). Recognizing the challenges of human assessment, such as high costs and variability, the researchers trained FLAMe on more than 5 million human judgments across over 100 quality assessment tasks. This broad training helps FLAMe generalize well: it outperforms proprietary models like GPT-4 and Claude-3 on many held-out tasks, and a variant fine-tuned for reward modeling (FLAMe-RM) reaches 87.8% accuracy on the RewardBench benchmark. The models use multitask instruction tuning, converting diverse evaluation tasks into a unified text-to-text format, which enables effective transfer learning. FLAMe also shows significantly less bias than popular LLM-as-a-Judge models, providing more reliable assessments of AI-generated outputs. A novel tail-patch fine-tuning strategy further improves efficiency, letting FLAMe achieve competitive performance with far less training data. Overall, FLAMe represents a significant advance in automatic evaluation, aiming to deliver more accurate, efficient, and less biased evaluations of AI-generated content.
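The "unified text-to-text format" is the key idea: every evaluation task, whether it compares two responses or rates one, becomes the same kind of training example (an input string plus a target string), so a single model can be instruction-tuned on the whole mixture. The Python sketch below illustrates that idea, along with the pairwise accuracy metric that reward-model benchmarks such as RewardBench report. The prompt templates, dataclass schemas, and function names are hypothetical illustrations, not FLAMe's actual formats.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class PairwiseExample:
    """A human judgment comparing two responses (hypothetical schema)."""
    instruction: str
    response_a: str
    response_b: str
    preferred: str  # "A" or "B"


@dataclass
class PointwiseExample:
    """A human rating of a single response (hypothetical schema)."""
    instruction: str
    response: str
    criterion: str
    rating: int  # e.g., on a 1-5 scale


Example = Union[PairwiseExample, PointwiseExample]


def to_text_to_text(ex: Example) -> Tuple[str, str]:
    """Render any supported evaluation example as an (input, target) text pair."""
    if isinstance(ex, PairwiseExample):
        source = (
            "Task: Which response better follows the instruction? Answer A or B.\n"
            f"Instruction: {ex.instruction}\n"
            f"Response A: {ex.response_a}\n"
            f"Response B: {ex.response_b}"
        )
        return source, ex.preferred
    if isinstance(ex, PointwiseExample):
        source = (
            f"Task: Rate the response for {ex.criterion} on a 1-5 scale.\n"
            f"Instruction: {ex.instruction}\n"
            f"Response: {ex.response}"
        )
        return source, str(ex.rating)
    raise TypeError(f"unsupported example type: {type(ex).__name__}")


def pairwise_accuracy(predicted: List[str], human: List[str]) -> float:
    """Share of pairs where the autorater's choice matches the human preference."""
    if not human or len(predicted) != len(human):
        raise ValueError("need equal-length, non-empty prediction and label lists")
    return sum(p == h for p, h in zip(predicted, human)) / len(human)


if __name__ == "__main__":
    # A tiny, made-up mixture of two task types rendered into one format.
    mixture: List[Example] = [
        PairwiseExample("Summarize the memo.", "Short, accurate.", "Long, off-topic.", "A"),
        PointwiseExample("Explain tokens.", "Tokens are pieces of text.", "helpfulness", 4),
    ]
    for source, target in map(to_text_to_text, mixture):
        print(source, "->", target, sep="\n")
```

Because both task types end up as plain text pairs in this sketch, adding a new evaluation task only requires a new rendering rule rather than a new model head, which is the kind of uniformity that makes transfer across tasks practical.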

Hand-Picked Video

In this video, we'll dive deep into the world of tokens in generative AI - the fundamental building blocks that power language models like GPT, BERT, and more.

Top AI Products from this week 

  • fastn.ai - Integrate and orchestrate multiple data sources in a single, unified API. Connect any data flow and create hundreds of app integrations. Accelerate development and speed time to market. Compose anything.

  • Xspiral - Xspiral is an online 3D visualization tool that integrates 2D/3D hybrid design, real-time collaboration, and AI for productivity. Its user-friendly interactive interface and powerful tools enable both beginners and professionals to create 3D works effortlessly.

  • AnyParser - AnyParser empowers financial services by accurately extracting insights and mapping text, tables, and charts from PDFs and images to databases, doubling insights from fivefold data.

  • Cohesive AI - Cohesive allows you to enrich your spreadsheet data with AI, web scraping, and email validation. Analyze data, research companies, validate emails, and generate personalizations at scale, all within Google Sheets.

  • Concurrence AI - Revolutionize community management with AI-powered filters, 24/7 uptime, multi-language support, and unlimited message handling! Enjoy flexible, low-cost plans starting at $5/month.

  • AI Math Solver by GPT-4o Free Online - MyMathSolver.ai is a free online AI math solver that uses advanced GPT-4o technology to solve various complex math problems. Input your math problem via text, image, or file upload, and receive accurate, step-by-step answers within seconds.

  • Supermemory - You've been collecting bookmarks on the internet all this while; it's finally time to use them. Supermemory is a hub for organizing and using saved information, with a search engine, writing assistant, canvas, and more.

  • Reactor Chat AI - I'm Reactor, ARC's creation, victor over GPT-4o in MMLU and HumanEval! I'm speedy, eco-friendly (only 0.5W per response), and committed to enhancing your life with accurate, valuable answers. Let's be friends!

This week in AI

  • Google Launches Air-Gapped AI Hardware - Google has announced the launch of its new air-gapped distributed cloud edge hardware designed specifically for AI workloads. This hardware is aimed at providing secure and efficient processing of AI tasks at the edge, without the need for a constant internet connection. The air-gapped design isolates the hardware from external networks, enhancing security and privacy for sensitive AI applications.

  • Microsoft Launches AI-Powered Designer App - Microsoft has introduced its Designer app, an AI-driven alternative to Canva, now available on iOS, Android, and Windows. The app allows users to create and edit visuals using AI, featuring over 80 templates and integration with Microsoft Office for easy use in Word and PowerPoint.

  • Prover-Verifier Games Enhance AI Output Legibility - OpenAI's research introduces "prover-verifier games" to improve the legibility of language model outputs. This approach trains strong language models to generate text that is not only correct but also easily verifiable by weaker models, which in turn makes the text clearer for human evaluators.

  • Mistral NeMo: New Multilingual AI Model Released - Mistral AI has launched Mistral NeMo, a 12 billion parameter model developed in collaboration with NVIDIA. This model features an impressive context window of up to 128,000 tokens, showcasing state-of-the-art reasoning, world knowledge, and coding accuracy for its size category.