ChatGPT's Fairness Test

Nvidia's Llama-3.1 challenges GPT-4, Mistral's Ministrals go mobile, OpenAI pursues fairness, and Archetype's Newton learns physics solo. The AI race intensifies!

In the ever-evolving realm of artificial intelligence, four tech giants stood at the forefront of innovation, each pushing the boundaries of what machines could achieve. As the sun rose on a crisp autumn morning, the world of AI was about to witness a day like no other.

Nvidia, the graphics powerhouse turned AI titan, unveiled its latest creation with a flourish. The Llama-3.1-Nemotron emerged from its digital forge, a 70-billion-parameter colossus ready to challenge the reigning champions of language models. As it flexed its neural networks, even the mighty GPT-4 and Claude 3.5 Sonnet felt the tremors of competition.

Meanwhile, in a quaint Parisian office, the upstart Mistral AI celebrated its first anniversary with not one, but two new arrivals. The Ministral twins, 3B and 8B, were small but mighty, designed to bring AI's power to the palm of your hand. They whispered promises of a future where every device could think, reason, and create.

Across the Atlantic, OpenAI's researchers huddled around screens filled with data, their faces illuminated by the glow of countless evaluations. They were on a quest for fairness, determined to teach their AI child, ChatGPT, the delicate art of impartiality. Each line of code was a step towards a more equitable digital future.

And in a lab that seemed to defy the laws of nature itself, Archetype AI had given birth to Newton—not the man, but a machine with an insatiable appetite for physics. This digital prodigy consumed raw data like a black hole, spitting out equations and theories that would make even Einstein scratch his head in wonder.

As these four stories unfolded, the world held its breath, knowing that the future of AI—and perhaps humanity itself—hung in the balance of these silicon dreams.

Nvidia Launches Llama-3.1-Nemotron

Nvidia has introduced Llama-3.1-Nemotron-70B-Instruct, a large language model (LLM) positioned against leading models such as OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet. The 70-billion-parameter model is tuned to give helpful, coherent responses across a range of tasks, including coding questions. It posted high scores on three alignment benchmarks: Arena Hard (85.0), AlpacaEval 2 LC (57.6), and MT-Bench as judged by GPT-4-Turbo (8.98). Nvidia has released the model openly and made it available for public testing on Hugging Face, making the case that a comparatively compact model can challenge the industry leaders.

Mistral AI Launches Ministral 3B and 8B Models for On-Device Computing

Mistral AI has announced two new models, Ministral 3B and Ministral 8B, designed for on-device computing and edge applications. Introduced on the first anniversary of Mistral 7B, they focus on knowledge, reasoning, and efficiency in the sub-10-billion-parameter range. Both support a context length of up to 128k tokens, and Ministral 8B uses a sliding-window attention mechanism for faster, more memory-efficient inference. The models target local, privacy-first applications such as on-device translation, smart assistants, local analytics, and autonomous robotics, and Mistral reports that both beat comparable models like Gemma 2 and Llama 3.1 on its benchmarks. Ministral 8B is priced at $0.10 per million tokens and Ministral 3B at $0.04 per million tokens. Mistral AI also notes that even its smallest model, Ministral 3B, outperforms the original Mistral 7B on most benchmarks.
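To make the sliding-window idea concrete, here is a minimal Python sketch of a causal attention mask in which each token may only look back at a fixed number of recent tokens. It illustrates the general technique only, not Mistral's implementation: the function names, the sequence length of 6, and the window size of 3 are assumptions chosen for the example, and Ministral 8B's actual window size and layer layout are not given here.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a causal sliding-window mask (illustrative helper, not a library API).

    mask[i][j] is True when query position i may attend to key position j,
    i.e. when j is one of the `window` most recent positions up to and
    including i.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


# Assumed toy sizes: 6 tokens, window of 3.
full_causal = sliding_window_mask(6, 6)   # window covers the whole sequence
windowed = sliding_window_mask(6, 3)      # each token sees at most 3 positions

for row in windowed:
    print("".join("x" if allowed else "." for allowed in row))

print(attended_pairs(full_causal))  # 21 pairs for full causal attention
print(attended_pairs(windowed))     # 15 pairs with the window applied
```

With a full causal mask the number of allowed pairs grows quadratically with sequence length, while with a fixed window it grows roughly linearly. That is the usual source of the speed and memory savings attributed to this kind of attention.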

OpenAI Evaluates Fairness in ChatGPT to Mitigate Biases

OpenAI has published a detailed evaluation of fairness in ChatGPT, focusing on how the model handles various sensitive topics and demographic groups. The assessment aims to identify and mitigate biases in the model's responses, ensuring that it treats all users equitably. OpenAI has implemented a range of methodologies, including user studies and external audits, to gather data on potential biases related to race, gender, and other characteristics. The findings highlight areas where the model performs well and others that require improvement. By actively addressing these issues, OpenAI seeks to enhance the overall fairness and reliability of ChatGPT, reinforcing its commitment to ethical AI development.

Archetype AI's Newton Learns Physics from Raw Data Without Human Help

Archetype AI has introduced a groundbreaking model named Newton, which autonomously learns the principles of physics from raw data without any human intervention. This innovative AI system processes vast amounts of unstructured data to identify and understand physical laws, demonstrating an ability to derive equations and concepts that govern various phenomena. Newton's approach allows it to uncover insights that may not be immediately apparent, showcasing the potential for AI to revolutionize scientific research. By leveraging this technology, Archetype AI aims to enhance the efficiency of data analysis in physics, paving the way for new discoveries and applications in the field.

Hand Picked Video

In this video, we’ll explore Olly's Custom Actions feature, showcasing how to generate engaging human-like content.

Top AI Products from this week 

  • Code2.AI - Ever wanted to build a product but can't code? AI can now code anything you can explain, but it struggles to stay updated on your project.

  • Strella - Strella is an end-to-end customer research platform that uses AI-moderated interviews and real-time synthesis to deliver human insights in hours, not weeks.

  • Gradio 5.0 - An open-source library for building and sharing web-based AI apps with ease. Deploy, customize, and share machine learning model demos in just a few lines of Python.

  • FocusBuddy (YC S24) - Your AI productivity co-pilot that stays on calls with you while you work to help you get more done. FocusBuddy includes a seamless voice-first to-do list, a 24/7 focus coach for accountability, and personalised weekly insights into your productivity.

  • Tattoon App - Tattoon leverages cutting-edge AI tech to uniquely apply tattoos on individuals, considering factors such as body angle, volume, shape, and glare for a truly immersive experience.

  • Feta - Feta is the only video-calling tool built for product and engineering teams. Run seamless standups, retros, sprint sessions, or syncs with AI-powered documentation and automated workflows.

  • Kick - Backed by OpenAI, Kick is a personal bookkeeper made for the modern entrepreneur or accountant who wants to automate their business life. Kick is free to use, or pays for itself.

  • Hey Papaya - We are an AI-powered platform for businesses and artists in the music industry, consolidating and organizing data from various sources into a single interface.

This week in AI

  • Meta's AI Revolutionizes Movie Production - Meta's new AI tool generates video and sound for movies, collaborating with Blumhouse to enhance storytelling and streamline production processes in the film industry.

  • ChatGPT Windows App Now Available for Paid Users - ChatGPT Plus, Enterprise, Team, and Edu users can now test an early version of the Windows desktop app. Access ChatGPT quickly with the Alt + Space shortcut for enhanced productivity.

  • Pyramid Flow Launches - Pyramid Flow, an open-source AI video generator, creates high-quality 5- to 10-second videos using a unique technique, offering a robust alternative for developers and creators.

  • Launch of INTELLECT-1 - Prime Intellect introduces INTELLECT-1, a 10-billion-parameter AI model trained in a decentralized fashion, and invites contributors worldwide to pool compute and advance open-source AGI development.

  • Microsoft AI VP Bubeck Joins OpenAI - Sebastien Bubeck, Microsoft’s AI VP, is leaving to join OpenAI, focusing on advancing artificial general intelligence (AGI) while continuing collaboration with Microsoft.