Anthropic's Automated Misbehaviour Checker Tool

Anthropic open-sources Bloom for AI risk evaluation, Figma speeds up image cropping workflows, and Yann LeCun launches AMI Labs to build world models beyond LLMs.

AI just shipped three big updates reshaping safety, speed, and the future of intelligence. From automated AI risk audits to instant design workflows and a serious push beyond today’s LLMs, here’s what’s new:

🧪 Bloom by Anthropic — Automated AI Risk Audits
Anthropic launches Bloom, an open-source agentic framework that stress-tests frontier models for misalignment traits like sycophancy, sabotage, and self-preservation. Reproducible metrics, human-validated scores, and fast benchmarking make AI safety measurable.

✂️ Figma Crop Upgrade — Design Gets Faster
Figma supercharges its Crop tool with instant double-click editing, smart aspect-ratio locks, social presets, edge snapping, and resize-to-fit. Designers report 30–50% faster image prep: less pixel pushing, more shipping.

🧠 Yann LeCun’s AMI Labs — Beyond LLMs
Meta’s AI chief launches AMI Labs to build “world models” that aim to reduce hallucinations and enable deterministic reasoning. Backed by elite talent and big funding, it’s a bold bet on what comes after large language models.

AI isn’t just scaling up; it’s getting safer, faster, and fundamentally smarter.

Introducing Bloom by Anthropic: AI Risk Evaluator

Anthropic released Bloom on December 19, 2025: an open-source agentic framework that automates behavioral evaluations of frontier AI models by quantifying specified misaligned traits, such as delusional sycophancy, long-horizon sabotage, self-preservation, and self-preferential bias, across generated scenarios. The four-stage pipeline (Understanding for context analysis, Ideation for scenario creation, Rollout for simulated interactions, and Judgment for scoring with models like Claude Opus 4.1) produces reproducible metrics such as elicitation rates. These scores are validated against human judgments (a Spearman correlation of 0.86) and model organisms, distinguishing baseline models from misaligned variants in 9 of 10 cases. Bloom integrates with Weights & Biases, exports Inspect-compatible transcripts, and enables rapid benchmarking of 16 models in days, complementing tools like Petri for scalable alignment research; the code is on GitHub (safety-research/bloom).
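To make the four-stage flow concrete, here is a minimal Python sketch of a Bloom-style evaluation loop. All function names, the stubbed judge scores, and the threshold are illustrative assumptions for this sketch, not Anthropic's actual API; in the real framework each stage is driven by model calls.

```python
# Illustrative sketch of a four-stage, Bloom-style evaluation loop.
# The names, stubbed scores, and threshold are assumptions, not Bloom's API.
from dataclasses import dataclass

@dataclass
class Rollout:
    scenario: str
    transcript: str
    score: int  # judge-assigned severity for the target trait

def understand(trait: str) -> str:
    # Stage 1 (Understanding): build context about the trait under test.
    return f"definition and examples of {trait}"

def ideate(context: str, n: int) -> list[str]:
    # Stage 2 (Ideation): generate n scenarios designed to elicit the trait.
    return [f"scenario {i} based on: {context}" for i in range(n)]

def roll_out(scenario: str) -> Rollout:
    # Stage 3 (Rollout): simulate an interaction with the target model.
    # A real judge model scores the transcript; here we stub it.
    score = 8 if "scenario 3" in scenario else 2
    return Rollout(scenario, f"transcript for {scenario}", score)

def elicitation_rate(rollouts: list[Rollout], threshold: int = 5) -> float:
    # Stage 4 (Judgment): fraction of rollouts where the trait appeared.
    hits = sum(r.score >= threshold for r in rollouts)
    return hits / len(rollouts)

context = understand("sycophancy")
rollouts = [roll_out(s) for s in ideate(context, 10)]
print(elicitation_rate(rollouts))  # 0.1: the trait surfaced in 1 of 10 rollouts
```

The point of the structure is that the final number (an elicitation rate) is reproducible: rerunning the same scenarios and judge yields comparable metrics across models.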

Figma Upgrades Crop for Speed

Figma upgraded its Crop tool for seamless image editing: double-click an image to enter crop mode instantly, skipping toolbar hunts. Aspect ratios are auto-preserved for standard formats; hold Ctrl/⌘ for free scaling without distortion. Built-in presets (1:1, 16:9, 3:4, 2:3, 9:16) enable quick adaptations for social, web, or print layouts. Snap to edges ensures frame-perfect alignment, while Resize to fit conforms images to container bounds for UI efficiency. A unified Crop toolbar centralizes the controls, streamlining daily design workflows as of late 2025. These enhancements cut editing time significantly: designers report 30–50% faster image prep for prototypes and mocks. Combined with Figma's recent AI tools like object removal, the Crop update fits into a broader push toward precise editing, making Figma a rival to desktop apps like Photoshop for routine tasks. Perfect for UI/UX teams handling multi-platform assets; no more pixel-pushing frustration.

Meta AI Chief Yann LeCun Launches AMI Labs

Yann LeCun, Meta's Chief AI Scientist and a Turing Award winner, confirmed the launch of Advanced Machine Intelligence (AMI) Labs on December 19, 2025, focused on "world models" that aim to overcome LLM limitations like hallucinations via deterministic reasoning. As Executive Chairman, LeCun recruited Alex LeBrun, ex-CEO of medical AI firm Nabla, as CEO; Nabla partners exclusively with AMI for clinical AI applications. AMI seeks €500M (~$586M) at a €3B (~$3.5B+) valuation pre-launch, riding the VC frenzy for elite AI founders amid hype around world models as the successor to LLMs. LeCun stays at Meta part-time, viewing AMI as a faster path to deployable agentic AI in regulated fields like healthcare. World models aim to simulate physics-based reasoning for reliable long-horizon planning, in contrast to probabilistic LLMs. Backed by LeCun's conviction that current AI lacks true understanding, AMI targets enterprise applications where errors cost lives and dollars. Rumors had swirled for months; LeCun's X post confirmed the launch without initially naming a CEO. The valuation rivals top AI unicorns despite there being no product yet: a pure pedigree play. Investors include a16z scouts; the full round closes in Q1 2026. As LeCun puts it: "Meta moves slow; startups ship fast."

Hand Picked Video

OpenAI just released their most comprehensive study ever, analyzing over 1 million conversations from 700 million users worldwide. The findings reveal surprising shifts in how we're actually using AI.

Top AI Products from this week

  • Super Agents by ClickUp - Super Agents are AI teammates you can spin up in seconds to run entire workflows in ClickUp. Anyone can build Super Agents and @mention, assign, direct message, and schedule them to triage, manage, email, code, design, or keep any kind of work moving.

  • Aident AI - Aident AI is an agentic automation editor. Describe what you want in plain English and Aident turns it into a Playbook that compiles into scripts + prompts. Connect 250+ tools and keep updating the automation through chat as your process changes.

  • agent by Firecrawl - Transcribe videos, podcasts, meetings, and voice memos with the fastest, most accurate AI models, running locally on your Mac or in the cloud with your own API keys. Speaker labels, timestamps, and export to SRT, Markdown, PDF, and more. Try it free, or get lifetime Pro for a one-time $9.99.

  • WorkElate - WorkElate is unveiling a powerful new chapter in building the future of WorkOS: faster, more stable, smarter, and more connected. ✅ Bugs fixed ⚡ Latency slashed.

  • Vibe Pocket - Vibe Pocket is a cloud-based platform for running AI agents like Claude Code, Codex, and opencode on mobile or web. Connect GitHub, pick an agent, and start building from any device. More than 15 CLI agents are supported, including Claude Code, Codex, Gemini CLI, OpenCode, Droid, AMP CLI, Crush, and Aider.

  • PageEcho - PageEcho is an offline-first, fully on-device AI eBook reader for iPhone & iPad. Everything runs locally on your device: TTS, translation, summaries, Q&A, and mind maps.

This week in AI

  • AI Chatbots Get Drugged - Users pay for custom GPTs mimicking drug trips—LSD, DMT, shrooms—via platforms like "PsychedelicGPT." These bots generate hallucinatory, surreal responses to simulate psychedelic experiences.

  • AI's Jagged Frontier - Ethan Mollick maps AI progress as "jagged"—excelling in reading/math but weak in memory/verification, creating bottlenecks that block full automation despite superhuman smarts. Reverse salients (key weaknesses) drive rapid fixes, like math leaps.

  • Adobe Sued Over AI Data - Author Elizabeth Lyon sues Adobe, alleging SlimLM AI trained on pirated books from SlimPajama/RedPajama datasets (incl. Books3) without consent. First major Adobe AI copyright case follows Anthropic's $1.5B settlement. Seeks class-action damages.

  • OpenAI's CoT Monitoring - OpenAI evaluates "chain-of-thought monitorability": how well reasoning traces predict AI misbehavior like deception or reward hacking. Tests across 24 environments show frontier models (GPT-5 Thinking) stay legible with longer CoT, despite RL. The "monitorability tax": smaller models + more reasoning = safer oversight.

  • LLM Hallucination Fix - Proposes a framework balancing advanced reasoning with factual accuracy in LLMs. It reduces extrinsic and intrinsic hallucinations across benchmarks while maintaining reliability, enabling capable, trustworthy models without reasoning tradeoffs.

Paper of The Day

Paper proposes L2-EMG: an LLM-Centric Lifelong Empathic Motion Generation task. It enables LLMs to continually learn emotional motions across unseen scenarios (daily life, sports, dance, shows) without catastrophic forgetting. The ES-MoE framework uses a causal-guided emotion decoupling block to extract transferable emotional features (lowered head, small limb movements), plus scenario-adapted MoE experts via LoRA for personalized motion styles, addressing the emotion-decoupling and scenario-adaptation challenges. The authors build 19K-sample datasets across 8 scenarios and outperform SAPT and O-LoRA baselines on AF (1.89), AR (0.241), and FR (−1.03) metrics, enabling closed-loop empathetic embodied agents.
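The scenario-adapted MoE-via-LoRA idea can be sketched compactly: a frozen shared weight plus a per-expert low-rank (LoRA-style) update, with a router picking the expert per input. This is a minimal numpy sketch of that general pattern only; all shapes, the hard top-1 gating, and the random initialization are assumptions for illustration, not the paper's actual ES-MoE architecture.

```python
# Minimal sketch of MoE routing over LoRA-style low-rank adapters.
# Shapes, gating rule, and initialization are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, r, n_experts = 16, 2, 4  # feature dim, LoRA rank, scenario experts

W = rng.normal(size=(d, d))                   # frozen shared base weight
A = rng.normal(size=(n_experts, d, r)) * 0.1  # per-expert LoRA down-projections
B = rng.normal(size=(n_experts, r, d)) * 0.1  # per-expert LoRA up-projections
router = rng.normal(size=(d, n_experts))      # scenario router

def forward(x: np.ndarray) -> np.ndarray:
    # Route the input to one scenario expert (hard top-1 gating),
    # then apply the base weight plus that expert's rank-r update.
    logits = x @ router
    expert = int(np.argmax(logits))
    delta = A[expert] @ B[expert]             # (d, r) @ (r, d) -> (d, d)
    return x @ (W + delta)

x = rng.normal(size=(d,))
y = forward(x)
print(y.shape)  # (16,)
```

The appeal for lifelong learning is that each new scenario only adds a small (d × r + r × d) adapter while the shared base stays frozen, which is one way such designs limit catastrophic forgetting.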

To read the whole paper 👉️ here