Perplexity Goes Mobile for Android Users

Perplexity AI launches a smart Android assistant, Anthropic’s Claude gets reliable with Citations, and Humanity’s Last Exam pushes AI limits with tough benchmarks. Exciting strides in AI innovation!

This week, the world of AI brims with exciting breakthroughs, each transforming how we interact with technology.

First, Perplexity AI has launched its Android-exclusive Perplexity Assistant, revolutionizing smartphones with features like task automation, multimodal interactions, and seamless web integration, aiming to rival established digital assistants.

Next, Anthropic takes a bold step toward trustworthiness with Citations, a feature for its Claude models that grounds AI responses in source materials, making them more reliable and reducing the risk of misinformation.

Finally, the academic AI landscape gets a shake-up with the debut of Humanity’s Last Exam—a challenging benchmark designed to rigorously test the limits of today’s language models, signaling the need for continued innovation.

Each update tells a unique story, but together, they redefine what’s possible in AI.

Perplexity AI Launches Assistant for Android

Perplexity AI has introduced the **Perplexity Assistant**, a new feature exclusively for Android users, transforming its platform from an AI-powered answer engine to a comprehensive digital assistant. Available on the Google Play Store, this assistant supports 15 languages and offers capabilities such as **task automation** (like booking reservations and drafting emails), **context retention** for seamless interactions, **multimodal interaction** using the phone's camera, and **web integration** for real-time information retrieval. Users can access the assistant by downloading or updating the Perplexity app and setting it as their default assistant. This launch positions Perplexity to compete with established AI assistants like Siri and Google Assistant, with plans for ongoing improvements in response to user feedback.

Anthropic Unveils Citations for Trustworthy AI

Anthropic has launched a new feature called **Citations** for its Claude AI models, aimed at enhancing the trustworthiness of AI-generated responses by grounding them in source documents. This feature allows developers to provide source materials, such as PDFs or plain text files, which Claude then uses to automatically cite specific sentences and passages in its responses. This innovation addresses the challenge of verifying the accuracy of AI outputs, significantly reducing the risk of misinformation or "hallucinations." Citations is available on both Anthropic's API and Google Cloud's Vertex AI, and it is designed to streamline processes in applications like document summarization and customer support. The feature is currently accessible for Claude 3.5 Sonnet and Haiku models, with a pricing structure based on the length of the source documents used.

Humanity's Last Exam: A New Benchmark for AI

Humanity's Last Exam (HLE) is a newly introduced multi-modal benchmark aimed at evaluating the capabilities of large language models (LLMs) in a rigorous academic context. This initiative arises from the observation that existing benchmarks have become too easy for advanced models, with many achieving over 90% accuracy on popular tests. HLE presents a set of 3,000 challenging questions across various subjects, contributed by nearly 1,000 experts from over 500 institutions worldwide. The benchmark aims to provide a more accurate measure of LLM performance, showing that current models struggle significantly, with low accuracy rates indicating room for improvement. The dataset is intended to serve as a reference point for assessing AI advancements and fostering informed discussions about AI development and governance.

Hand Picked Video

ans do - moving cursors, clicking buttons, and navigating interfaces, though it's still in beta testing.

Top AI Products from this week

  • Soul Tarot - Soul Tarot combines AI with Tarot to help you make decisions and gain insights into your future. Ask a question, draw cards, and get personalized readings. If needed, connect with our AI Tarot guide via voice call for further explanations.

  • Gemini 2.0 Flash Thinking - Gemini 2.0 Flash Thinking Experimental is Google's enhanced reasoning model, capable of showing its thoughts to improve performance and explainability.

  • ARTLAS - ARTLAS brings art to life with an AI companion that deciphers masterpieces, crafts personalized museum tours, answers your art curiosities, and suggests must-see exhibitions—transforming how you experience and connect with art.

  • Wepost - Simplify your social media workflow. Wepost automates content creation, publishing, and analytics, so you can focus on building your brand.

  • InboxPilot - InboxPilot is a Chatbot for Emails that uses your company’s data to instantly draft or send replies. Perfect for automating responses to support@, info@, or high-volume accounts—saving you time and keeping your inbox under control.

  • HyperUGC - Replace expensive creators with AI avatars to generate authentic UGC videos in minutes. Create content for TikTok, Instagram & YouTube at a fraction of the cost.

This week in AI

  • SmolVLM Models Released - Hugging Face introduces SmolVLM-256M and SmolVLM-500M, the smallest Vision Language Models, enhancing performance with fewer parameters for efficient multimodal tasks1.

  • Ray 2 Tool Launch - Luma Labs has introduced Ray 2, a powerful tool for creating high-quality 3D content from images. It enhances user experience with improved AI capabilities, streamlining the content generation process.

  • LOKI Benchmark Overview - LOKI is a multimodal benchmark for evaluating large multimodal models (LMMs) in detecting synthetic data across 26 categories, including video, image, and audio. It features 13K questions to assess model capabilities in explainability and anomaly detection, revealing strengths and weaknesses in various modalities.

  • Virtuoso-Small Model Launch - Arcee.ai has released Virtuoso-Small, a generative AI model with 14 billion parameters, designed for efficient instruction-following and reasoning tasks. It supports business applications and is available via API.