- AI Report by Explainx
- Posts
- Claude AI Gets Eyes👁️📰
Claude AI Gets Eyes👁️📰
AMD just shared their AI secrets, Claude learned to read PDFs, and Runway ML turned us all into directors. The AI world just leveled up, making tech magic more accessible than ever.
Hey, you won't believe what just dropped in the AI world! So three massive things just happened that are actually pretty mind-blowing. First up, AMD – you know, the chip company? They just did something really cool. They basically said "Here, take our AI models, they're yours!" It's like they're handing out the keys to their Ferrari. Anyone can now grab these powerful AI models and tinker with them however they want. Pretty sweet, right?
But wait, it gets better. Remember Claude, that AI assistant that's been making waves? Well, it just got a serious upgrade. You know how annoying it is to copy-paste stuff from PDFs all the time? Claude just said "nah, I got this" and can now read PDFs directly. It's like giving your smart friend a document and they just... get it. No more hassle, just upload and go.
And here's the cherry on top – Runway ML just dropped this insane camera control feature for their video AI. Imagine being able to direct AI-made videos like you're Spielberg. Want to zoom in dramatically? Pan across the scene? Now you can do it with a few clicks. It's like having a Hollywood studio in your laptop.
The cool thing is, all this stuff is happening right now, making AI way more useful for regular people like us. Pretty wild how fast things are moving, huh?
AMD Unveils Open-Source 1 Billion Parameter Language Models

AMD has recently introduced its first series of 1 billion parameter language models, known as AMD OLMo, which are fully open-sourced. These models were trained from scratch using 1.3 trillion tokens on a cluster of AMD Instinct™ MI250 GPUs. The initiative aims to provide organizations with the ability to pre-train and fine-tune language models tailored to specific needs, enhancing scalability and specialization beyond off-the-shelf solutions. The AMD OLMo models are designed to improve performance in natural language processing tasks, including reasoning and instruction-following capabilities. They underwent a three-stage training process: initial pre-training on a large corpus of text, followed by supervised fine-tuning on instructional datasets, and finally alignment using Direct Preference Optimization (DPO) to better align outputs with human preferences. The release includes three checkpoints for the models, showcasing their capabilities against other similar-sized open-source models. AMD emphasizes that these models can be run efficiently on AMD Ryzen™ AI PCs equipped with Neural Processing Units (NPUs), promoting local deployment for enhanced data privacy and energy efficiency. The complete training details, model weights, and code have been made publicly available to encourage further innovation within the AI community.
Anthropic Introduces PDF Support for Claude AI Model

The article from Anthropic announces the introduction of PDF support for their AI model, Claude. This update enables users to upload PDF documents directly, allowing Claude to read and analyze the content within these files. The feature enhances user interaction by enabling Claude to extract information, summarize documents, and answer questions based on the text in the PDFs. This capability aims to improve the model's utility across various applications, making it easier for users to access and utilize information contained in PDF format. Additionally, the guide provides instructions on how to effectively use this feature, highlighting its potential to enhance productivity and streamline information retrieval processes.
Runway ML Introduces Advanced Camera Control for Gen-3 Alpha Turbo

Recently, Runway ML announced the introduction of Advanced Camera Control for its Gen-3 Alpha Turbo model. This feature enables users to manipulate both the direction and intensity of camera movements in AI-generated videos, enhancing the creative possibilities for scene creation. This update allows for more precise control over how scenes are presented, making it easier to achieve desired visual effects.The feature was highlighted in a brief video showcasing its capabilities, which was released just days ago. Users can now experiment with different camera angles and movements to create more dynamic and engaging content.
Hand Picked Video
In this video, we'll look at Prompt Caching vs RAG.
Top AI Products from this week
Truva AI - Truva handles the busywork intelligently, saving your sales team hours each week so they can focus on what matters most. Truva's AI agents automate tasks like email follow-up, CRM data entry, and process optimization—boosting sales by up to 25% for teams.
ChatGPT search - ChatGPT can now search the web in a much better way than before. You can get fast, timely answers with links to relevant web sources, which you would have previously needed to go to a search engine for.
Claude for Desktop - Download Claude for your desktop or mobile device.
Monica Code - Monica Code brings Claude 3.5 and GPT-4o directly into VSCode, providing in-depth project understanding without disrupting your workflow.
Pixyer.AI - Background removal, photo enhancement, and background generation—all in one place. Pixyer analyzes your product to generate a perfect-fit background, with automated lighting and tone adjustments — just like a professional photographer.
X to Voice - What does your X profile sound like? X to Voice analyzes your X profile to come up with a unique voice and avatar using ElevenLabs Voice Design API and Hedra Character 2
KLING AI - KLING AI, a cutting-edge creative studio by Kuaishou Tech, excels in image and video generation. It ignites creativity through prompts and images, producing realistic visuals with advanced text comprehension, intricate details, and diverse styles.
Slide Dish - Reach into your fridge and create custom recipes from your favorite ingredients. Adjust recipes for world cuisines or diets, scale servings, and get step-by-step instructions. Save your favorites, explore new flavors, and become a better home chef with ease.
This week in AI
Copilot Vision Enhances Visual Content Analysis - Copilot Vision will soon enable browsers to analyze and interpret visual content, enhancing user experience by providing insights based on what is seen in real-time.
Google’s AI Update - Google's Gemini API now features Grounding with Google Search, allowing real-time data integration into AI apps for improved accuracy and user engagement.
Oracle's AI EHR Launch - Oracle has introduced an AI-powered electronic health record system to improve patient care and streamline healthcare operations through automation.
MobileLLM Overview - MobileLLM optimizes sub-billion parameter language models for on-device use, achieving 2.7%/4.3% accuracy boosts on zero-shot tasks, with scalable designs for larger models.