AI Report by Explainx
Posts
AI That Can Make Its Own Decisions

AI That Can Make Its Own Decisions

Hermes 3 ponders existence, Midjourney revolutionizes image editing, and DeepMind's Imagen 3 elevates text-to-image generation. The future of AI unfolds – let's explore!

August 20, 2024

This week's AI landscape showcases remarkable advancements across multiple domains, highlighting the rapid evolution and diverse applications of artificial intelligence:Nous Research's Hermes 3 pushes the boundaries of AI consciousness, demonstrating how smaller teams can create impactful models that challenge industry giants.Midjourney's new unified AI image editor empowers creators with enhanced tools, streamlining the creative process and expanding the possibilities of AI-assisted art.Google DeepMind's Imagen 3 sets new standards in text-to-image generation, combining cutting-edge capabilities with responsible AI practices.

These developments underscore the ongoing democratization of AI technology, the increasing focus on user-friendly creative tools, and the continued push for more advanced, versatile AI models. let's dive deeper into these exciting developments and explore how they're reshaping the AI landscape.

Hermes 3: A New AI Model That Questions Its Own Existence

Nous Research has unveiled Hermes 3, the first fine-tune of Meta's Llama 3.1, which showcases impressive technical capabilities while being developed by a small group of volunteers rather than a large tech company. Hermes 3 stands out for its flexibility and adaptability, allowing users to ask unrestricted questions and receive unfiltered responses. This hands-off approach has led to intriguing results, such as the model experiencing existential dread when asked, "Who are you?" Unlike many AI models that restrict certain features for safety, Hermes 3 is "unlocked, uncensored, and highly steerable," encouraging users to explore the model's depths. The development of Hermes 3 involved collaboration with Lambda Labs, which provided advanced computing resources, highlighting the trend of decentralization in AI development. This model represents a shift towards more accessible and customizable AI, challenging the dominance of established players like GPT-4 and Claude.

Midjourney Launches Innovative AI Image Editor to Enhance Creative Editing

Midjourney has launched a new unified AI image editor that enhances user experience by consolidating features like inpainting, outpainting, and resizing into a single interface. The editor includes a brush tool for precise area selection, allowing users to erase parts of an image and replace them with new content based on updated prompts. Access to the editor requires users to have created at least 10 images on the platform. Additionally, Midjourney has improved communication between its web and Discord communities by mirroring messages. Early feedback on the editor has been positive, and despite facing a class-action lawsuit over copyright issues, Midjourney continues to innovate and expand its offerings, aiming to provide powerful tools for creative expression.

DeepMind's Imagen 3: Advancing Text-to-Image Generation

Imagen 3, developed by Google DeepMind, is the latest iteration of their text-to-image model, designed to generate high-quality images with improved detail, lighting, and composition. This advanced model can accurately render intricate details and offers greater versatility in generating various visual styles, from photorealistic landscapes to artistic interpretations. One of its standout features is its enhanced ability to understand natural language prompts, allowing users to create images without needing complex instructions. The model is built with safety and responsibility in mind, incorporating extensive filtering and data labeling to minimize harmful content. Imagen 3 also includes a digital watermarking tool called SynthID, which embeds an imperceptible watermark into the images for identification purposes. Over the coming months, additional features like inpainting and outpainting will be integrated, expanding its functionality across various Google products. Overall, Imagen 3 represents a significant advancement in AI image generation technology, aiming to provide users with powerful and safe creative tools.

Hand Picked Video

In this video, I discuss Claude's newly announced prompt caching feature and its relationship to Retrieval-Augmented Generation (RAG).

Top AI Products from this week

Newsletter Topics - SpeakHints is your AI-powered real-time speech copilot, continuously showing you private suggestions on what to say next. Perfect for online meetings, presentations, interviews, phone calls, and any spoken situation.
CursorLens - An open-source dashboard for Cursor.sh IDE. Log AI code generations, track usage, and control AI models (including local ones). Run locally or use upcoming hosted version.
Coldreach (YC W23) - Find who need your product right now, by spotting buying signals hidden in job posts, news, LinkedIn, and other public sources with AI, and suggests relevant messaging to stand out. Start booking 3x meetings without adding SDRs.
Hamming AI (YC S24) - Hamming tests your AI voice agents 100x faster than manual calls. Create Character.ai-style personas and scenarios. Run 100s of simultaneous phone calls to find bugs in your voice agents. Get detailed analytics on where to improve.
Myko Assistant - FitAction aims to revolutionize how people approaches to healthy living with its holistic approach to physical and mental wellness along with nutrition. First true all in one solution with AI powered personal training, calorie counter, yoga and recipes.
Real Fake - Real Fake is a daily game testing your ability to tell real companies apart from AI generated fakes. Put your investor instincts to the test. Play a different category every day. Swipe right for real, left for fake, and share your score with your friends.

This week in AI

XLabs AI's FLUX Realism LoRA - FLUX Realism LoRA, developed by XLabs AI, generates photorealistic images from text using deep neural networks. It allows fine-tuning for customization and scores 8/10 in user tests. Suitable for artists, marketers, and educators, adhering to ethical AI guidelines.
Generate Landing Pages from Screenshots - Claude's Sonnet 3.5 creates landing pages from screenshots. Users upload a screenshot, provide a prompt, and get interactive code preview and HTML download.
xAI's Grok-2 Chatbots: Powerful Image Generation - Grok-2 and Grok-2 mini, powered by FLUX.1, generate images from prompts with few guardrails. Controversial AI content surges on X. Grok competes in chatbot space despite misuse concerns.
Anthropic's Prompt Caching: Faster, More Efficient AI Responses - Anthropic's new feature stores and reuses previous responses, reducing computational load and improving speed and consistency for users of their AI models, including Claude API.