OpenAI Makes an Insane DevDay Announcement!

OpenAI’s AgentKit streamlines AI agent creation, Google DeepMind’s CodeMender automates code fixes to boost software security, and ElevenLabs’ Agent Workflows enables multi-LLM conversational design.

This week in AI, multi-agent orchestration, code security, and conversational intelligence are advancing at remarkable speed. From visual agent toolkits to autonomous code repair and modular conversation flows, here’s what’s making headlines:

🛠 AgentKit by OpenAI
A visual toolkit for building and evaluating AI agents, featuring multi-agent workflow design, unified data connectors, and customizable chat UIs, with automated prompt optimization and reinforcement fine-tuning.

🔒 CodeMender
Google DeepMind unveils CodeMender, an AI agent that automatically detects, fixes, and prevents code vulnerabilities. With validated patches and proactive rewriting, it’s already improved dozens of open‑source projects, showing how AI can make software safer and more reliable.

💬 Agent Workflows
ElevenLabs introduces Agent Workflows, a visual editor for designing branching, modular conversations powered by multiple LLMs. It enables adaptive Subagents, smart routing, and structured testing, bringing precision and scale to enterprise‑grade conversational AI.

Additionally, at OpenAI DevDay 2025, the company unveiled GPT‑5 Pro for advanced reasoning and creativity, Sora 2 for video generation, and major Codex updates with a new SDK and Slack integration. It also introduced cost‑efficient voice and image models, gpt-realtime-mini and gpt-image-1-mini, and enhanced enterprise security across ChatGPT and AgentKit for scalable deployments.

From seamless agent orchestration to autonomous debugging and modular AI dialogue design, these innovations signal a leap forward in intelligent workflow automation.

OpenAI Launches AgentKit, a Visual Toolkit to Simplify AI Agent Workflows

OpenAI has introduced AgentKit, a comprehensive toolkit aimed at simplifying the creation, deployment, and optimization of AI agents for developers and enterprises. Featuring a visual Agent Builder for designing multi-agent workflows, a Connector Registry for unified data integration, and ChatKit for embedding customizable chat UIs, AgentKit eliminates the complexity of managing fragmented tools and manual processes. Enhanced evaluation features now include automated prompt optimization, trace grading, and support for third-party models, making it easier to rigorously measure agent performance. With capabilities like reinforcement fine-tuning and custom tool calls, AgentKit is positioned to accelerate the rollout of sophisticated agentic solutions, boosting efficiency and allowing faster, safer deployment across a variety of real-world applications.

Google DeepMind Unveils CodeMender

Google has introduced CodeMender, an AI agent by Google DeepMind designed to automatically find and fix software vulnerabilities. Utilizing advanced program analysis and large language models, CodeMender identifies root causes of bugs, generates high-quality security patches, and proactively rewrites code to prevent future vulnerabilities. It incorporates rigorous validation steps to ensure functional correctness and avoid regressions, escalating only verified patches for human review. In six months, CodeMender contributed 72 fixes to major open-source projects, demonstrating how AI can significantly enhance software security and reduce developer burden.
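The validate-then-escalate idea described above can be illustrated with a small Python sketch. This is not CodeMender's implementation or API; `CandidatePatch`, its fields, the checks, and `select_for_review` are all hypothetical names chosen for illustration. It assumes each generated patch has already been run through a build step, a regression test suite, and a root-cause check.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class CandidatePatch:
    """Hypothetical record for a generated patch (not a CodeMender type)."""
    patch_id: str
    builds: bool            # assumed result of compiling the patched code
    tests_pass: bool        # assumed result of the regression test suite
    fixes_root_cause: bool  # assumed result of re-running the failing case


# A check returns True when the patch passes that validation step.
Check = Callable[[CandidatePatch], bool]

DEFAULT_CHECKS: List[Check] = [
    lambda p: p.builds,
    lambda p: p.tests_pass,
    lambda p: p.fixes_root_cause,
]


def select_for_review(
    patches: Sequence[CandidatePatch],
    checks: Sequence[Check] = DEFAULT_CHECKS,
) -> List[CandidatePatch]:
    """Keep only patches that pass every check; the rest are never escalated."""
    return [p for p in patches if all(check(p) for check in checks)]


if __name__ == "__main__":
    candidates = [
        CandidatePatch("patch-1", builds=True, tests_pass=True, fixes_root_cause=True),
        CandidatePatch("patch-2", builds=True, tests_pass=False, fixes_root_cause=True),
        CandidatePatch("patch-3", builds=False, tests_pass=False, fixes_root_cause=False),
    ]
    for patch in select_for_review(candidates):
        print(patch.patch_id)
```

The design point is that human reviewers only ever see patches that cleared every automated gate, which keeps their review queue short.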

ElevenLabs Launches Agent Workflows

ElevenLabs unveiled Agent Workflows, a powerful visual editor enabling developers to design sophisticated, branching conversation flows in its Agents Platform. This new tool allows the creation of adaptive AI agents that break complex interactions into manageable, modular Subagents, each with customized prompts, tools, and knowledge. By enabling smart routing, human handoffs, and the use of multiple LLMs per flow, Agent Workflows significantly enhances control, precision, and scalability for conversational AI, empowering businesses to deliver nuanced, enterprise-grade customer experiences. The feature fosters reliability with structured testing and seamless integration into existing systems, marking a major step forward in conversational AI orchestration.
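To make the routing idea concrete, here is a minimal Python sketch of a flow that sends each message to a Subagent or falls back to a human handoff. It does not use the ElevenLabs API; the `Subagent` class, the keyword-based `route` function, and the model labels `model-a` and `model-b` are hypothetical stand-ins, and a real flow would likely route on richer signals than keywords.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

HUMAN_HANDOFF = "human_handoff"  # hypothetical label for escalation to a person


@dataclass(frozen=True)
class Subagent:
    """Hypothetical subagent: its own prompt, its own LLM, its own triggers."""
    name: str
    llm: str                   # placeholder model label, not a real model ID
    system_prompt: str
    keywords: Tuple[str, ...]  # simple routing triggers, for illustration only


SUBAGENTS: Tuple[Subagent, ...] = (
    Subagent("billing", "model-a", "Handle billing questions.",
             ("invoice", "refund", "charge")),
    Subagent("tech_support", "model-b", "Troubleshoot product issues.",
             ("error", "crash", "login")),
)


def route(message: str, subagents: Sequence[Subagent] = SUBAGENTS) -> str:
    """Return the first subagent whose keyword appears in the message.

    Falls back to a human handoff when no subagent matches.
    """
    text = message.lower()
    for agent in subagents:
        if any(word in text for word in agent.keywords):
            return agent.name
    return HUMAN_HANDOFF


if __name__ == "__main__":
    for msg in ("I need a refund", "The app shows an error", "Can I speak to someone?"):
        print(msg, "->", route(msg))
```

Because each Subagent carries its own model label and prompt, a flow built this way can mix several LLMs while keeping the routing logic in one place.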

Hand-Picked Video

In this video, we'll look at what agentic AI actually is and why it's completely different from everything you've seen before. We'll explore AI agents like Comet Browser, ChatGPT Agent Mode, Browser Use, and Olly Socials to understand how autonomous AI is transforming web automation and productivity.

Top AI Products from this week

  • PromptCompose - PromptCompose is a visual prompt engineering tool that helps users create, test, and iterate AI prompts efficiently in one unified interface. It offers instant validation, schema suggestions, mustache syntax highlighting, and support for multiple LLM vendors, streamlining the development of AI-powered apps, assistants, and agents.

  • Skyllbox - Skyllbox is a secure, local AI platform that enables users to create, organize, and automate custom prompts and repetitive tasks while keeping data encrypted and stored only on the user’s machine.

  • OpenAI - OpenAI leads in advanced AI models like GPT-5 and o3, offering cutting-edge capabilities in natural language understanding, multimodal input (text, images, audio, video), real-time collaboration, personalized AI assistance, and AI-powered code generation.

  • NovaKitz - NovaKitz is an all-in-one AI toolkit designed to boost productivity across writing, studying, emailing, diagramming, and PDF processing. It offers AI-powered tools like paragraph rewriting, letter and content generation, study guides, flashcard creation, email writing and templates, mind map and flowchart generation, as well as document summarization and Q&A.

  • Instruct 2 - Instruct 2 enables users to build, edit, and run powerful AI agents using natural language prompts without coding complexity. It simplifies automations by letting users create robust workflows with conversational commands, making AI-driven task execution accessible to everyone.

  • BuyScout - BuyScout acts as a personal shopping assistant by providing real-time product insights, price tracking, and chat support directly on shopping pages, helping shoppers make smarter purchase decisions and enhancing the overall online shopping experience.

This week in AI

  • Anthropic Advances AI for Cyber Defenders - Anthropic enhanced Claude’s ability to help defenders detect, analyze, and fix code and system vulnerabilities. Claude Sonnet 4.5 matches or outperforms the recently released Opus 4.1 in finding vulnerabilities and other cyber skills.

  • Apps Inside ChatGPT - OpenAI introduces a new generation of interactive apps integrated directly into ChatGPT, allowing users to seamlessly access services like Spotify, Zillow, and Canva within conversations, while developers can build and deploy apps using the new Apps SDK preview.

  • Runway Teases Custom Workflow Platform - Runway is preparing to launch a next-gen tool that lets users build fully customizable workflows, offering creators and developers greater flexibility, automation, and creative freedom within its AI-powered suite.

  • Higgsfield Launches UNLIMITED Sora 2 & Sora 2 PRO - Higgsfield has made the world’s most wanted AI video model, Sora 2, available worldwide with unlimited, unrestricted access, featuring audio synchronization, 1080p quality, and multi-scene reasoning. Users can earn credits by engaging with its social campaigns for enhanced creativity and content production.

  • Dreamer 4 AI Agent - Dreamer 4 is a scalable AI agent that trains inside a fast, accurate world model, mastering complex tasks like mining diamonds in Minecraft purely from offline data, advancing robotics potential.

Paper of The Day

The paper introduces LMM-Incentive, a new incentive model using Large Multimodal Models (LMMs) to motivate users to create high-quality user-generated content (UGC) in Web 3.0. It addresses issues of information asymmetry, where users may submit low-effort content to exploit rewards. The framework designs contracts that users select based on reputation and effort levels. LMM agents evaluate content quality directly using advanced prompt techniques. An improved Mixture of Experts (MoE)-based Proximal Policy Optimization (PPO) algorithm optimizes contract design dynamically. Empirical results demonstrate that this method outperforms existing benchmarks. The model is deployed on Ethereum to support sustainable, high-quality content creation in decentralized environments.

To read the whole paper 👉️ here.