<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>VentureBeat</title>
        <link>https://venturebeat.com/feed/</link>
        <description>Transformative tech coverage that matters</description>
        <lastBuildDate>Wed, 15 Apr 2026 14:00:15 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright 2026, VentureBeat</copyright>
        <item>
            <title><![CDATA[Adobe’s new Firefly AI Assistant wants to run Photoshop, Premiere, Illustrator and more from one prompt]]></title>
            <link>https://venturebeat.com/technology/adobes-new-firefly-ai-assistant-wants-to-run-photoshop-premiere-illustrator-and-more-from-one-prompt</link>
            <guid isPermaLink="false">7efPNgDvD9GLodSCcgVgy2</guid>
            <pubDate>Wed, 15 Apr 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[<p><a href="https://www.adobe.com/">Adobe</a> today launched its most ambitious AI offensive to date, unveiling the <a href="https://news.adobe.com/">Firefly AI Assistant</a> — a new agentic creative tool that can orchestrate complex, multi-step workflows across the company&#x27;s entire <a href="https://www.adobe.com/creativecloud.html">Creative Cloud suite</a> from a single conversational interface — alongside a raft of new video, image, and collaboration features designed to position the company at the center of the rapidly evolving AI-powered content creation landscape.</p><p>The announcements, which also include a new Color Mode for <a href="https://www.adobe.com/products/premiere.html">Premiere Pro</a>, the addition of <a href="https://higgsfield.ai/kling-3.0">Kling 3.0 video models</a> to Firefly&#x27;s growing roster of third-party AI engines, and <a href="http://frame.io">Frame.io Drive </a>— a virtual filesystem that lets distributed teams work with cloud-stored media as though it lived on their local machines — represent Adobe&#x27;s clearest signal yet that it views agentic AI not as a feature upgrade but as a fundamental reshaping of how creative work gets done.</p><p>&quot;We want creators to tell us the destination and let the Firefly assistant — with its deep understanding of all the Adobe professional tools and generative tools — bring the tools to you right in the conversation,&quot; Alexandru Costin, Vice President of AI &amp; Innovation at Adobe, told VentureBeat in an exclusive interview ahead of the launch.</p><p>The stakes could hardly be higher. 
Adobe is fighting to convince Wall Street, creative professionals, and a wave of well-funded AI-native competitors that its decades-old software empire can not only survive the generative AI revolution but lead it.</p><h2><b>How Adobe turned a research prototype into a 100-tool creative agent</b></h2><p>The centerpiece of today&#x27;s announcement is the <a href="https://www.adobe.com/products/firefly.html">Firefly AI Assistant</a>, which Adobe describes as a fundamentally new way to interact with its creative tools. Rather than requiring users to manually navigate between <a href="https://www.adobe.com/products/photoshop.html">Photoshop</a>, <a href="https://www.adobe.com/products/premiere.html">Premiere</a>, <a href="https://www.adobe.com/products/illustrator.html">Illustrator</a>, <a href="https://lightroom.adobe.com/">Lightroom</a>, <a href="https://www.adobe.com/express/">Express</a>, and other apps — selecting the right tool for each step of a complex project — the assistant lets creators describe an outcome in natural language. The agent then figures out which tools to invoke, in what order, and executes the workflow.</p><p>The assistant is the productized version of <a href="https://blog.adobe.com/en/publish/2025/10/28/our-view-agentic-ai-assistants-that-work-you-in-your-favorite-apps">Project Moonlight</a>, a research prototype Adobe first previewed at its annual MAX conference in the fall of 2025 and subsequently refined through a private beta. &quot;This is basically [Project] Moonlight,&quot; Costin confirmed to VentureBeat. &quot;We started with all the learnings from Moonlight, and we engaged with customers. We looked internally. 
We evolved that architecture to make it more ambitious.&quot;</p><p>Under the hood, Adobe says it has assembled roughly 100 tools and skills that the assistant can call upon, spanning generative image and video creation, precision photo editing, layout adaptation, and even stakeholder review through <a href="http://frame.io">Frame.io</a>. The system is built around a single conversational interface inside the Firefly web app where users describe what they want and the assistant maintains context across sessions. Pre-built Creative Skills — purpose-built, multi-step workflow templates such as portrait retouching or social media asset generation — can be run from a single prompt and customized to match a creator&#x27;s own style. The assistant also learns a creator&#x27;s preferred tools, workflows, and aesthetic choices over time, and understands the content type being worked on — image, video, vector, brand assets — to make context-aware decisions.</p><p>Crucially, outputs use native Adobe file formats — PSD, AI, PRPROJ — meaning users can take any result into the corresponding flagship app for manual, pixel-level refinement at any point. &quot;We always imagine this continuum where you can have complete conversational edits and pixel-perfect edits, and you can decide, as a creative, where you want to land,&quot; Costin said. The <a href="https://www.adobe.com/products/firefly.html">Firefly AI Assistant</a> will enter public beta in the coming weeks, though Adobe did not specify an exact date.</p><h2><b>Why Wall Street is watching Adobe&#x27;s AI pricing model so closely</b></h2><p>For a company whose AI monetization story has faced persistent skepticism from investors, the pricing structure of the Firefly AI Assistant will be closely watched. 
Costin told VentureBeat that, at launch, using the assistant will require an active Adobe subscription that includes the relevant apps — meaning users who want the agent to invoke Photoshop cloud capabilities, for instance, will need an entitlement that includes the Photoshop SKU. Generative actions will consume the user&#x27;s existing pool of generative credits, consistent with how Firefly credits work across the rest of Adobe&#x27;s platform.</p><p>&quot;To use some of these cloud capabilities from Photoshop and other apps, you need to have a subscription that includes access to the Photoshop SKU,&quot; Costin explained. &quot;You&#x27;ll be consuming your credits when you use generative features.&quot; He acknowledged, however, that the model could evolve: &quot;As we better understand the value of this — and the costs of operating the brain, the conversation engine — things might change.&quot;</p><p>The question of whether Adobe can convert AI enthusiasm into meaningful revenue growth is anything but theoretical. When Adobe reported its <a href="https://www.adobe.com/cc-shared/assets/investor-relations/pdfs/21306202/ay45th643t5y46.pdf">most recent quarterly results</a> in March, it touted 10% year-over-year revenue growth to $6.4 billion and disclosed that annual recurring revenue from AI standalone and add-on products had reached $125 million — a figure CEO Shantanu Narayen projected would double within nine months.</p><h2><b>Adobe adds Chinese AI video models to Firefly, raising commercial safety questions</b></h2><p>Alongside the assistant, Adobe is expanding Firefly&#x27;s roster of third-party AI models to include <a href="https://kling.ai/">Kling 3.0</a> and <a href="https://kling.ai/app/omni/new">Kling 3.0 Omni</a>, two video generation models developed by Kuaishou, the Chinese technology company. 
Kling 3.0 focuses on fast, high-quality production with smart storyboarding and audio-visual sync, while the Omni variant adds professional controls for shot duration, camera angle, and character movement across multi-shot sequences. The additions bring Firefly&#x27;s model count to more than 30, joining Google&#x27;s <a href="https://blog.google/innovation-and-ai/technology/ai/nano-banana-2/">Nano Banana 2</a> and <a href="https://blog.google/innovation-and-ai/technology/ai/veo-3-1-lite/">Veo 3.1</a>, Runway&#x27;s <a href="https://firefly.adobe.com/generate/video">Gen-4.5</a>, Luma AI&#x27;s <a href="https://firefly.adobe.com/generate/video">Ray3.14</a>, Black Forest Labs&#x27; <a href="https://bfl.ai/models/flux-2">FLUX.2[pro]</a>, ElevenLabs&#x27; <a href="https://elevenlabs.io/blog/eleven-multilingual-v2">Multilingual v2</a>, and others.</p><p>When asked whether Adobe had concerns about integrating a model from a Chinese tech company given the current geopolitical climate, Costin was direct: &quot;We think choice is what we want to offer our customers.&quot; He explained that Adobe&#x27;s strategy distinguishes between its own commercially safe, first-party Firefly models — trained on licensed Adobe Stock imagery and public domain content — and third-party partner models, which carry different commercial safety profiles. &quot;For some use cases, like ideation, non-production use cases, we got requests from customers to support some external models,&quot; Costin said. 
&quot;If I&#x27;m in ideation, I might be more flexible with commercial safety. When I go into production, I’d want to have a model that gives you more confidence.&quot;</p><p>This raises an important nuance for the agentic era. When the <a href="https://www.adobe.com/products/firefly.html">Firefly AI Assistant</a> autonomously selects which model to use for a given task, the commercial safety guarantees may vary depending on which engine it invokes. Costin pointed to Adobe&#x27;s <a href="https://helpx.adobe.com/creative-cloud/apps/adobe-content-authenticity/content-credentials/overview.html">Content Credentials</a> system — the metadata-and-fingerprinting framework developed through the Content Authenticity Initiative — as the mechanism for maintaining transparency. &quot;The agentic power — and the fact that the assistant has access to all of those models — means it could decide to use a model that carries different content credentials,&quot; he acknowledged. &quot;But with the transparency of content credentials, the user will know how a particular piece of content was created and can decide whether that&#x27;s commercially safe or not.&quot; Adobe offers commercial indemnity for its first-party Firefly models but applies different indemnity levels for third-party models — a distinction that enterprise buyers, in particular, will need to carefully evaluate.</p><h2><b>Inside Adobe&#x27;s active collaboration with Nvidia on long-running AI agent infrastructure</b></h2><p>Adobe&#x27;s agentic ambitions also intersect with its strategic partnership with Nvidia, announced earlier this year at <a href="https://venturebeat.com/technology/nvidia-launches-enterprise-ai-agent-platform-with-adobe-salesforce-sap-among">Nvidia’s GTC conference</a>. 
When asked whether the Firefly AI Assistant&#x27;s agentic capabilities are built on Nvidia&#x27;s agent toolkit and NeMo infrastructure, Costin revealed that the collaboration is active but has not yet made it into a shipping product.</p><p>&quot;We&#x27;re in active discussions — investigating not only <a href="https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/">Nemotron</a>,&quot; Costin said. &quot;They have this technology called <a href="https://build.nvidia.com/openshell">Open Shell</a> and <a href="https://www.nvidia.com/en-us/ai/nemoclaw/">Nemo Claw</a>, which give us the ability to efficiently run long-running agentic workflows in a sandboxed environment.&quot; He said the technology would become increasingly important as Adobe pushes the assistant to handle longer, more autonomous creative tasks — but cautioned that &quot;it&#x27;s not shipping yet. It&#x27;s being actively explored.&quot;</p><p>For Nvidia, which is <a href="https://venturebeat.com/technology/nvidia-launches-enterprise-ai-agent-platform-with-adobe-salesforce-sap-among">building an ecosystem of enterprise AI agent platforms</a> with partners like Adobe, Salesforce, and SAP, the partnership could eventually serve as a high-profile proof point for its agent infrastructure stack in the creative vertical. For Adobe, the ability to run complex, long-duration agentic workflows efficiently and securely in sandboxed environments could be the technical foundation that separates the <a href="https://www.adobe.com/products/firefly.html">Firefly AI Assistant</a> from lighter-weight chatbot integrations offered by competitors. 
The partnership also signals Adobe&#x27;s recognition that the computational demands of agentic AI — where a single user request may trigger dozens of model calls and tool invocations — require infrastructure partnerships that go well beyond what a software company can build alone.</p><h2><b>Premiere Pro&#x27;s new color grading mode and the tools Adobe is shipping today</b></h2><p>Beyond the headline AI assistant announcement, Adobe&#x27;s broader set of updates reflects a company trying to strengthen its position across every phase of the content creation pipeline. Color Mode in <a href="https://www.adobe.com/products/premiere.html">Premiere Pro</a> may be the most significant near-term upgrade for working editors. Entering public beta today, Color Mode is described as a first-of-its-kind color grading experience built specifically for the way editors — rather than dedicated colorists — think and work. Adobe notes that it was developed through an extensive private beta with hundreds of working editors, and that participants reported they &quot;actually enjoy color grading&quot; — a sentiment suggesting Adobe may have found a way to democratize one of post-production&#x27;s most intimidating disciplines. General availability is expected later in 2026.</p><p>The <a href="https://www.adobe.com/products/firefly/features/ai-video-editor.html">Firefly Video Editor</a> gains audio upgrades including the Enhance Speech feature migrated from Premiere and Adobe Podcast, direct Adobe Stock integration with access to more than 800 million licensed assets, and simple color adjustment controls with intuitive sliders and one-click looks. 
On the image editing front, Adobe introduced Precision Flow, which generates a range of semantic variations from a single prompt and lets users browse them via an interactive slider — a novel approach that Costin described as &quot;the best slider-based control mixed with the best semantic understanding of not only the existing scene, but what the scene could be.&quot; AI Markup complements this by letting users draw directly on images to specify where and how edits should be applied. After Effects 26.2 adds an AI-powered Object Matte tool that dramatically accelerates rotoscoping and masking — create accurate mattes of moving subjects with a hover and click, refine with a Quick Selection brush, and perfect edges with a Refine Edge tool.</p><h2><b>Frame.io Drive wants to kill the shipped hard drive and make cloud media feel local</b></h2><p>Rounding out the announcements, <a href="http://frame.io">Frame.io Drive</a> addresses one of the most persistent pain points in distributed video production: getting media from point A to point B without losing hours — or days — to downloads, syncing, and shipped hard drives. Frame.io Drive is a desktop application that mounts Frame.io projects to a user&#x27;s computer so media appears in Finder or Explorer and behaves like local files. The underlying technology, called Frame.io Mounted Storage, streams media on demand as applications request it, while local caching ensures smooth playback. The product builds on streaming technology provided by Suite Studios, and the real-time file access capability is included with every Frame.io account. Adobe emphasized that all content lives solely within Frame.io and is never shared with third parties.</p><p>The move positions <a href="http://frame.io">Frame.io</a> not just as a review-and-approval tool at the end of the production pipeline but as the central media layer from the very beginning of a project — from first capture through final delivery. 
If successful, the strategy could significantly deepen Adobe&#x27;s lock-in with professional video teams by making Frame.io the single source of truth for distributed productions. Frame.io Drive and Mounted Storage will roll out in phases, with Enterprise customers gaining access starting today and accounts on other plans following shortly. Others can join a waitlist.</p><h2><b>Adobe&#x27;s biggest challenge isn&#x27;t building the AI — it&#x27;s convincing creators to trust it</b></h2><p>Taken together, today&#x27;s announcements paint a picture of a company executing aggressively across multiple fronts — but also one that is navigating a complex moment. Adobe first <a href="https://news.adobe.com/news/news-details/2023/adobe-unveils-firefly-a-family-of-new-creative-generative-ai">introduced Firefly in March 2023</a> as a family of generative AI models focused on image and text effects, with a strong emphasis on commercial safety through training on licensed Adobe Stock content. In the two years since, the company has rapidly expanded into video generation, multi-model access, and now agentic workflows — a trajectory that mirrors the broader industry&#x27;s shift from standalone AI features to AI-native systems.</p><p>But the competitive field has grown dramatically. <a href="https://runwayml.com/">Runway</a>, <a href="https://pika.art/login">Pika</a>, and a host of AI-native video generation startups have captured mindshare among creators. <a href="https://www.canva.com/">Canva</a> has aggressively integrated AI into its design platform. And the emergence of powerful foundation models from <a href="https://openai.com/">OpenAI</a>, <a href="https://www.google.com/">Google</a>, and <a href="https://www.anthropic.com/">Anthropic</a> — the latter of which Adobe says it will integrate with Firefly AI Assistant capabilities — means the barrier to building creative AI tools has never been lower. 
Adobe is also navigating these product ambitions against a complex corporate backdrop: the <a href="https://news.adobe.com/news/2026/03/leadership-update">impending departure of CEO Shantanu Narayen</a>, an actively exploited zero-day vulnerability in <a href="https://thehackernews.com/2026/04/adobe-patches-actively-exploited.html">Acrobat Reader (CVE-2026-34621)</a> that had been used by hackers for months before being patched this week, a U.K. antitrust investigation over cancellation fees, and a recent <a href="https://news.adobe.com/news/2026/03/adobe-statement">$75 million lawsuit settlement</a>.</p><p>Adobe&#x27;s response, articulated clearly through today&#x27;s launches, is to lean into what it believes is its deepest moat: the integration of AI into a set of professional-grade, category-leading applications that no startup can replicate overnight. Costin framed the agentic transition as empowering rather than threatening to creative professionals, comparing Creative Skills to a next-generation version of Photoshop Actions — the macro-recording feature that has long allowed power users to automate repetitive tasks. &quot;We want to help our customers become — from the ones doing all the work — to be creative directors, doing some of the work, but most importantly, guiding the assistant in executing some of those creative visions,&quot; he said.</p><p>It is a compelling pitch — and, in its own way, a revealing one. For three decades, Adobe made its fortune by selling the tools that turned creative vision into finished pixels. Now it is asking its customers to let an AI agent handle more of that translation, trusting that the human role will shift from operating the tools to directing the outcome. Whether creators embrace that bargain — and whether Wall Street rewards it — will determine not just Adobe&#x27;s trajectory but the shape of an entire industry learning to create alongside machines.
</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/54hgOf2v64G0I5GTThr0l4/c3f65ee3d8279d04c8296e82710a3985/nuneybits_Vector_art_of_a_glossy_monitor_displaying_Adobe_suite_c0dc8886-2da1-42d0-9883-277e655ffa8c.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Traza raises $2.1 million led by Base10 to automate procurement workflows with AI]]></title>
            <link>https://venturebeat.com/orchestration/traza-raises-usd2-1-million-led-by-base10-to-automate-procurement-workflows-with-ai</link>
            <guid isPermaLink="false">4ZP1UllkxTssTpK5UG4tr9</guid>
            <pubDate>Wed, 15 Apr 2026 12:55:00 GMT</pubDate>
            <description><![CDATA[<p>For decades, procurement has been the back office that enterprise software forgot. Billions of dollars flow through vendor negotiations, purchase orders, and supplier communications every year at the largest manufacturers and construction companies in the country — and the vast majority of that work still runs on email threads, spreadsheets, and phone calls.</p><p><a href="https://traza.ai/">Traza</a>, a newly launched startup headquartered in New York, believes the moment has arrived to change that. The company announced today the close of a $2.1 million pre-seed round led by <a href="https://base10.vc/">Base10 Partners</a>, with participation from <a href="https://www.kfund.vc/">Kfund</a>, <a href="https://superscout.co/program/a16z">a16z scouts</a>, <a href="https://www.clara.ventures/">Clara Ventures</a>, <a href="https://www.masia.vc/">Masia Ventures</a>, and a roster of angel investors including Pepe Agell, who scaled Chartboost to 700 million monthly users before its acquisition by Zynga.</p><p>The funding is modest by Silicon Valley standards. But Traza&#x27;s pitch is anything but incremental: the company deploys AI agents that don&#x27;t just recommend procurement actions — they execute them autonomously, handling vendor outreach, request-for-quote generation, order tracking, supplier communications, and invoice processing without continuous human supervision.</p><p>&quot;AI is redesigning the procurement category from the ground up,&quot; said Silvestre Jara Montes, Traza&#x27;s CEO and co-founder, in an exclusive interview with VentureBeat. &quot;This wave of AI won&#x27;t just build procurement software — it will rebuild how procurement works.&quot;</p><h2><b>Why procurement contracts silently lose millions after the ink dries</b></h2><p>The market Traza is targeting is enormous and, by the company&#x27;s framing, spectacularly underserved. 
The procurement software market alone <a href="https://www.precedenceresearch.com/procurement-software-market">exceeds $8 billion</a> and grows at roughly <a href="https://www.precedenceresearch.com/procurement-software-market">10% annually</a>. But the real cost sits in the labor — the armies of people, agencies, and ad hoc workarounds required to actually run procurement operations at scale. Most enterprises meaningfully engage with only their top 20% of suppliers. The remaining 80% of supplier relationships — and the vendor outreach, order tracking, invoice reconciliation, and compliance monitoring they entail — go largely unmanaged.</p><p>Research from <a href="https://info.worldcc.com/closing-the-procurement-value-gap">World Commerce &amp; Contracting and Ironclad</a> finds that organizations lose an average of 11% of total contract value after agreements are signed, a phenomenon described as &quot;post-signature value leakage.&quot; As Tim Cummins, President of WorldCC, put it: &quot;The research shows that the 11% value gap is not caused by poor negotiation, but by how contracts are managed after signature.&quot; For a large enterprise with $500 million in annual contracted spend, that represents $55 million vanishing each year — not from bad deals, but from the operational void between what gets agreed at the negotiating table and what actually gets executed on the ground. Missed savings, unauthorized changes, and poor renewal planning are responsible for the biggest losses.</p><p>Jara Montes argues that Traza sits precisely in this gap. &quot;The 11% spans commercial, operational, and compliance leakage. We own the operational layer — and that&#x27;s where the most recoverable value sits,&quot; he said. &quot;Supplier tail management that never happens, RFQ processes skipped because someone ran out of bandwidth, invoice discrepancies that slip through unnoticed. 
That&#x27;s where contracts bleed value after signing, and that&#x27;s exactly what we automate.&quot; The numbers from Traza&#x27;s early deployments, while nascent, are striking: the company claims a 70% reduction in human hours spent on procurement tasks and procurement cycles running three times faster than manual baselines.</p><h2><b>How AI agents crossed the line from procurement copilot to autonomous worker</b></h2><p>To understand what makes Traza&#x27;s approach different, it helps to know what &quot;AI for procurement&quot; has meant until now. For the past several years, the term largely described dashboards, analytics layers, and recommendation engines that surfaced insights but left every decision and action in a human&#x27;s hands. Products from incumbents like <a href="https://www.sap.com/products/spend-management/ariba-login.html">SAP Ariba</a> and <a href="https://www.coupa.com/">Coupa</a> — as well as newer entrants like <a href="https://ziphq.com/">Zip</a>, <a href="https://www.fairmarkit.com/">Fairmarkit</a>, and <a href="https://www.tonkean.com/">Tonkean</a> — have layered AI capabilities on top of existing systems of record. But the gap between piloting AI and achieving production-scale impact remains stark, with 49% of procurement teams running pilots but only 4% reaching meaningful deployment.</p><p>Traza&#x27;s bet is that 2026 represents an inflection point. AI agents now possess the multi-step reasoning, tool use, and contextual memory required to execute full procurement workflows autonomously — from vendor discovery through invoice processing. The company frames this not as an upgrade to existing procurement software, but as an entirely new product category. &quot;The incumbents built systems of record. They organize procurement data and they&#x27;ve never executed procurement work — and their AI additions don&#x27;t fundamentally change that,&quot; Jara Montes said. 
&quot;What they&#x27;re shipping is a recommendation layer on the same underlying architecture. A human still has to act on every suggestion. We replace the operational layer entirely.&quot;</p><p>Industry data supports the thesis that enterprises are hungry for this shift. According to the <a href="https://www.ey.com/content/dam/ey-unified-site/ey-com/en-gl/services/consulting/documents/ey-gl-cpo-survey-2025-outlook-report-02-2025.pdf">2025 Global CPO Survey</a> from EY, 80% of global chief procurement officers plan to deploy generative AI in some capacity over the next three years, and 66% consider it a high priority over the next 12 months. A <a href="https://www.abiresearch.com/press/ai-adoption-surges-in-supply-chains-as-companies-prioritize-network-intelligence">2025 ABI Research survey</a> found that 76% of supply chain professionals already see autonomous AI agents as ready to handle core tasks like reordering, supplier outreach, and shipment rerouting without human intervention — and early deployments are demonstrably reducing supply chain operational costs by 20 to 35%.</p><h2><b>Inside the workflow: what Traza&#x27;s AI does and where humans still make the call</b></h2><p>In a typical deployment, Traza&#x27;s AI agent takes over the operational labor that currently lives in inboxes, spreadsheets, and manual follow-up chains. In a standard RFQ workflow, the agent identifies suitable suppliers, drafts and sends the request for quotes, monitors supplier responses, follows up automatically when responses lag, parses incoming quotes regardless of their format, and builds a structured comparison table ready for a human decision-maker. The key design principle is deliberate: humans remain in the loop at critical junctures.</p><p>&quot;At critical steps — approving a purchase order, flagging a compliance issue, committing spend above a threshold — a human is always in the loop,&quot; Jara Montes explained. 
&quot;That&#x27;s not a limitation, it&#x27;s the design. It&#x27;s how you maintain the auditability enterprises require while moving faster than any manual process could. You earn expanded autonomy over time, as trust is built and results compound.&quot;</p><p>When asked about the risk of AI errors — a wrong purchase order or a missed compliance check that could prove costly — Jara Montes was direct: &quot;Anything with meaningful financial or compliance exposure requires human approval before it executes — that&#x27;s non-negotiable and baked into the architecture. Below those thresholds, the agent acts autonomously and logs everything.&quot; He added a point that reveals a subtler product insight: &quot;Most procurement operations today are a black box — nobody has a clear picture of what&#x27;s happening across the supplier tail. We make it legible.&quot; In other words, the transparency the AI agent provides may itself be a product — giving procurement leaders visibility they have never had into the long tail of supplier relationships that most enterprises simply ignore.</p><h2><b>How Traza plugs into legacy enterprise systems without ripping them out</b></h2><p>One of the recurring challenges for any enterprise AI startup is the integration question: How do you plug into the deeply entrenched, often decades-old technology stacks that large manufacturers and construction companies rely on? Traza&#x27;s answer is to sit on top of existing systems rather than replace them. &quot;We connect via API or direct integration into whatever the customer already runs — ERPs, email, supplier portals. We have reach across more than 200 enterprise tools,&quot; Jara Montes said. &quot;We don&#x27;t rip out their system, we sit on top of them.&quot;</p><p>The go-to-market motion mirrors this pragmatism. Instead of attempting a big-bang deployment, Traza runs a two-to-three-month proof of value focused on a single, specific workflow. 
Integrations are built at the key steps that matter for that particular use case, then expanded as the scope of the engagement grows. &quot;We don&#x27;t try to connect everything upfront — we compound integrations as we expand scope within each account,&quot; Jara Montes said. &quot;And every integration we build compounds across customers too. Each new deployment makes the next one faster.&quot; Throughout the process, the company works side by side with the customer&#x27;s team, managing complexity and helping them transition into a new way of operating. It is a notably high-touch approach for a company selling automation.</p><p>The company is already working with large manufacturers and construction companies and says they are paying, though it declines to name them publicly. &quot;We want to earn the right to grow inside each account, not land a pilot that goes nowhere,&quot; Jara Montes said. &quot;That&#x27;s how you build something that actually sticks in enterprise.&quot;</p><h2><b>Traza bets that vertical depth in physical industry will beat horizontal AI platforms</b></h2><p>Traza enters a market that is rapidly heating up. The leading AI procurement solutions include platforms from <a href="https://www.coupa.com/">Coupa</a>, <a href="https://www.ivalua.com/">Ivalua</a>, <a href="https://www.sap.com/products/spend-management/ariba-login.html">SAP Ariba</a>, <a href="https://ziphq.com/">Zip</a>, <a href="https://www.zycus.com/">Zycus</a>, and <a href="https://www.fairmarkit.com/">Fairmarkit</a>. Keelvar provides autonomous sourcing bots capable of launching RFQs, collecting bids, and recommending optimal awards, while Tonkean offers a no-code orchestration platform using NLP and generative AI to streamline procurement intake and tail-spend management. 
Against this crowded field, Jara Montes draws a sharp distinction between horizontal automation tools and Traza&#x27;s focus on physical industry.</p><p>&quot;We&#x27;re built specifically for the physical industry, where supplier relationships, compliance requirements, and workflow complexity are categorically different from software procurement,&quot; he said. &quot;A generic agent doesn&#x27;t survive contact with how procurement actually works in manufacturing or construction. Specificity is the moat.&quot; The competitive dynamics with major incumbents are perhaps even more consequential. SAP Ariba, Coupa, and their peers have massive installed bases and deep enterprise relationships. Jara Montes frames their AI initiatives as surface-level additions to legacy architectures — but whether Traza can convert that framing into market share at scale, especially given the gravitational pull of existing vendor relationships, remains the central strategic question.</p><p>Beneath Traza&#x27;s product pitch sits a deeper strategic thesis about compounding data advantages. The company describes a two-layered learning architecture: at the agent level, Traza gets smarter across every deployment by absorbing supplier behavior patterns, RFQ response dynamics, pricing anomalies, and workflow edge cases. At the data level, each customer&#x27;s information stays fully isolated. &quot;What we&#x27;re building is deep operational knowledge of how procurement actually runs in the physical industry — not how it&#x27;s supposed to run according to an RFP, but how it really runs, with all the exceptions and workarounds,&quot; Jara Montes said. 
&quot;That&#x27;s extraordinarily hard to replicate if you&#x27;re starting from scratch, and it gets harder to catch up with the more deployments we have.&quot;</p><h2><b>Three Spanish founders, one fellowship, and a plan to rewire industrial procurement</b></h2><p><a href="https://traza.ai/">Traza</a> was co-founded by three Spanish entrepreneurs — Silvestre Jara Montes, Santiago Martínez Bragado, and Sergio Ayala Miñano — who came to the United States through the <a href="https://www.goexponential.org/">Exponential Fellowship</a>, a program that brings Europe&#x27;s top technical talent to the U.S. to build companies at the frontier of AI. Their backgrounds span both sides of the problem Traza is trying to solve. Jara Montes worked at Amazon and CMA CGM — one of the world&#x27;s largest shipping groups — at the intersection of operations strategy and supply chain optimization. Martínez Bragado built and deployed agentic AI at Clarity AI before joining Concourse (backed by a16z, Y Combinator, and CRV) as Founding AI Engineer. Ayala Miñano comes from StackAI, one of the fastest-growing enterprise AI platforms in San Francisco, where he was a Founding Engineer.</p><p>None of the founders carry the title of Chief Procurement Officer, a gap that the company acknowledges has occasionally surfaced in buyer conversations. Jara Montes&#x27;s response is characteristically direct: &quot;Our work is the answer. The results we&#x27;re generating move that conversation quickly.&quot; He noted that the company has senior procurement leaders serving as advisors who have run procurement at the scale of its target customers.</p><p>Base10 Partners, the lead investor, is a San Francisco-based venture capital firm that invests in companies automating sectors of what it calls &quot;<a href="https://base10.vc/">the Real Economy</a>.&quot; Its portfolio includes Notion, Figma, Nubank, Stripe, and Aurora Solar. 
Rexhi Dollaku, General Partner at Base10, framed the investment in emphatic terms: &quot;Supply chain and procurement is one of the largest, most underautomated markets in the Real Economy. AI agents are finally capable of doing the work, not just assisting with it.&quot; The supporting cast of investors reinforces the immigrant-founder narrative. Clara Ventures — founded by the executives behind Olapic&#x27;s $130 million exit — specifically invests in driven foreign founders building in the United States, and Agell adds operational credibility from building Chartboost into a $100 million revenue business in under three years as a Spanish founder in Silicon Valley.</p><h2><b>Why $2.1 million may stretch further than it looks for an enterprise AI startup</b></h2><p>At $2.1 million, this is a deliberately small round for a company selling to large enterprises with notoriously long procurement cycles. Jara Montes argues it goes further than it appears for structural reasons. &quot;We leverage Europe as a tech talent hub, where we have a deep network of exceptional engineers — people who want to work at the frontier of AI but have far fewer opportunities to do so than their US counterparts,&quot; he said. &quot;We&#x27;re not just lean — we&#x27;re built to outcompete on capital efficiency while others are burning through runway trying to hire in San Francisco.&quot;</p><p>The go-to-market motion is designed for speed to revenue. Proofs of value are scoped, time-bounded, and converted to paying partnerships. The company says it is not running 18-month enterprise sales cycles before seeing a dollar. The milestone for the next raise is explicit: more paying customers, meaningfully stronger annual recurring revenue, and a repeatable sales motion that makes the seed round, as Jara Montes put it, &quot;an obvious conversation.&quot;</p><p>Looking ahead, he outlined an ambitious three-year target: 20 to 30 large industrial enterprises in the U.S. 
and Europe running Traza across their procurement operations, with over a billion dollars in procurement spend flowing through the platform. Whether that vision is achievable depends on several interlocking variables — the pace at which AI agent capabilities continue to improve, the speed of enterprise adoption in a traditionally conservative buyer segment, and Traza&#x27;s ability to navigate the competitive gauntlet of incumbents adding AI features and well-funded startups attacking adjacent workflows.</p><p>But the underlying math may be on Traza&#x27;s side. In procurement, the money that disappears does not look like waste. It vanishes into inefficiency, missed obligations, unmanaged risks, and forgotten commitments — the kind of silent losses that no one tracks because no one has the bandwidth to track them. The traditional mandate of procurement, as currently configured, ends where the value gap begins: at signature. Traza is building an AI workforce that picks up where the humans leave off. For an industry that has spent decades losing $55 million at a time to the back office nobody watches, that might be precisely the point.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Orchestration</category>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/1BlLS9asVgj2WDV5j09DTX/7aa5369b05fb9ad458c16a4a1e481ba8/nuneybits_Vector_art_of_a_stack_of_invoices_8a53c753-9472-420d-8a66-873e33846084.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Anthropic’s Claude Managed Agents gives enterprises a new one-stop shop but raises vendor 'lock-in' risk]]></title>
            <link>https://venturebeat.com/orchestration/anthropics-claude-managed-agents-gives-enterprises-a-new-one-stop-shop-but</link>
            <guid isPermaLink="false">1rKR8SHOqgGCJMpZeX4VlH</guid>
            <pubDate>Tue, 14 Apr 2026 16:57:09 GMT</pubDate>
            <description><![CDATA[<p>Anthropic announced a new platform last week, <a href="https://claude.com/blog/claude-managed-agents">Claude Managed Agents</a>, that aims to cut out the more complex parts of AI agent deployment for enterprises and competes with existing orchestration frameworks.</p><p>Claude Managed Agents is also an architectural shift: enterprises, already burdened with orchestrating an increasing number of agents, can now choose to embed the orchestration logic <i>in the AI model layer</i>. </p><p>While this comes with some potential advantages, such as speed (Anthropic proposes its customers can deploy agents in days instead of weeks or months), it also, of course, hands more control over the enterprise&#x27;s AI agent deployments and operations to the model provider — in this case, Anthropic — potentially resulting in greater &quot;lock-in&quot; for the enterprise customer, leaving them more subject to Anthropic&#x27;s terms, conditions, and any subsequent platform changes.</p><p>But maybe that is worth it for your enterprise, as Anthropic further claims that its platform “handles the complexity” by letting users define agent tasks, tools and guardrails with a built-in orchestration harness, all without the need to build sandboxed code execution, checkpointing, credential management, scoped permissions or end-to-end tracing themselves. </p><p>The framework manages state, execution graphs and routing, bringing managed agents into a vendor-controlled runtime loop.</p><p>Even before the release of Claude Managed Agents, new directional VentureBeat research showed that Anthropic was gaining traction at the orchestration level as enterprises adopted its native tooling. Claude Managed Agents represents a new attempt by the firm to widen its footprint as the orchestration method of choice for organizations. 
</p><h2><b>Anthropic is surging in orchestration interest</b></h2><p>Orchestration has emerged as an important segment for enterprises to address as they scale AI systems and deploy agentic workflows. </p><p>VentureBeat directional research of several dozen firms for the first quarter of 2026 found that enterprises mostly chose existing frameworks, such as Microsoft’s Copilot Studio/Azure AI Studio, with 38.6% of respondents in February reporting using Microsoft’s platform. VentureBeat surveyed 56 organizations with more than 100 employees in January and 70 in February.</p><p>OpenAI closely followed at 25.7%. Both showed strong growth between the first two months of the year.</p><p>Anthropic, driven by increased interest in its offerings, such as Claude Code, over the past year, is putting up a fight. </p><p>Adoption of the Anthropic tool-use and workflows API increased from 0% to 5.7% between January and February. This tracks closely with the growing adoption of Anthropic’s foundation models, showing that enterprises using Claude turn to the company’s native orchestration tooling instead of adding a third-party framework. </p><p>While VentureBeat surveyed before the launch of Claude Managed Agents, we can extrapolate that the new tool will build on that growth, especially if it delivers a more straightforward way to deploy agents.</p><h2><b>Collapsing the external orchestration layer</b></h2><p>Enterprises may find a streamlined, internal harness for agents compelling, but it does mean giving up certain controls. </p><p>Session data is stored in a database managed by Anthropic, increasing the risk that enterprises become locked into a system run by a single company. This may be less desirable for some firms and conflict with their desire to move away from locked-in software-as-a-service (SaaS) applications in their current stacks, a shift many hope AI will facilitate. 
</p><p>The specter of vendor lock-in means agent execution becomes model-driven rather than directed by the organization, happens in an environment enterprises don’t fully control, and its behavior becomes harder to guarantee. </p><p>It also opens the possibility of giving agents conflicting instructions, especially if the only way for users to exert any control over agents is to prompt them with more context. </p><p>Agents could have two control planes: one defined by the enterprises’ orchestration system through instructions and the other as an embedded skill from the Claude runtime.</p><p>This could pose an issue for highly sensitive and regulated workflows, such as financial analysis or customer-facing tasks. </p><h2><b>Pricing, control and competitive set</b></h2><p>Balancing control with ease is one thing; enterprises must also consider the cost structure of Claude Managed Agents. </p><p>Claude Managed Agents introduces a <a href="https://platform.claude.com/docs/en/about-claude/pricing">hybrid pricing model</a> that blends token-based billing with a usage-based runtime fee. </p><p>This makes Managed Agents more dynamic, though less predictable, when determining cost structures. Enterprises will be charged a standard rate of $0.08 per hour when agents are actively running. </p><p>For example, at $0.70 per hour, a one-hour session could cost up to $37 to process 10,000 support tickets, depending on how long each agent runs and how many steps it takes to complete a task. </p><p>Microsoft, currently the leader according to VentureBeat&#x27;s directional survey, offers several orchestration offerings. <a href="https://www.microsoft.com/en-us/microsoft-365-copilot/pricing/copilot-studio">Copilot Studio</a> uses a capacity-based billing structure, so enterprises pay for blocks of interactions between users and agents rather than the number of steps an agent takes. 
</p><p>Microsoft&#x27;s approach tends to be more predictable than Anthropic&#x27;s pricing plan: Copilot Studio starts at $200 per month for 25,000 messages.</p><p>Compared to similar competitors like OpenAI&#x27;s Agents SDK, the picture becomes murkier. The Agents SDK is technically free to use as an open-source project. However, OpenAI bills for the <a href="https://openai.com/api/pricing/">underlying API usage</a>. Agents built and orchestrated with the Agents SDK using GPT-5.4, for example, will cost $2.50 per 1 million input tokens and $15 per 1 million output tokens. </p><h2><b>The enterprise decision</b></h2><p>Claude Managed Agents does offer a reprieve to enterprises that find the actual deployment of production agents too complicated. It reduces their engineering overhead while adding speed and simplicity in a fast-changing enterprise environment. </p><p>But that comes with a choice: lose control, observability and portability and risk further vendor lock-in.</p><p>Anthropic just made a case for why its ecosystem is becoming not just the foundation model of choice for enterprises, but also the orchestration infrastructure. The onus is now on enterprises to balance that ease against the loss of control. </p>]]></description>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/1fKuGA9n3RlokiEFHb1eSO/f1e85403250c2347f81bf28a127a6a50/crimedy7_illustration_of_ai_vendor_lock-in_abstrack_--ar_169__b11f785c-c6f8-40fd-aa62-9dd0fa6adc03_0.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Google leaders including Demis Hassabis push back on claim of uneven AI adoption internally]]></title>
            <link>https://venturebeat.com/orchestration/google-leaders-including-demis-hassabis-push-back-on-claim-of-uneven-ai-adoption-internally</link>
            <guid isPermaLink="false">40CeV4BaWgQceBokejlemq</guid>
            <pubDate>Tue, 14 Apr 2026 16:51:00 GMT</pubDate>
            <description><![CDATA[<p>A <a href="https://x.com/Steve_Yegge/status/2043747998740689171">viral post on X</a> from veteran programmer and former Google engineer <a href="https://x.com/Steve_Yegge">Steve Yegge</a> set off a rhetorical firestorm this week, drawing sharp public rebuttals from some of Google’s most prominent AI leaders and reopening a sensitive question for the company: how deeply are its own engineers really using the latest generation of AI coding tools? </p><p>The debate began after Yegge summarized what he said was the view of his friend, a current and longtime Google employee (or Googler), who claimed the Gemini maker&#x27;s internal AI adoption looks much more ordinary and less cutting-edge than outsiders might expect.</p><p>Yegge said his Googler friend claimed Google engineering mirrors an “average” industry pattern of a 20%-60%-20% split: a small group of outright AI refusers (20%), a much larger middle still relying mainly on simpler chat and coding-assistant workflows (60%), and another small group of AI-first, cutting-edge engineers using agentic tools extensively and mastering them (20%).</p><p>A <a href="https://grok.com/share/bGVnYWN5LWNvcHk_6a0982f6-34cc-43fd-b185-02bd1410eb5a">VentureBeat search of X</a> using its parent company’s AI assistant Grok found that Yegge’s April 13 post spread quickly, topping 4,500 likes, 205 quote posts, 458 replies and 1.9 million views as of April 14. </p><p>We&#x27;ve reached out to Google for comment on the claims and will update when we receive a response. </p><h2><b>A veteran, outspoken Googler voice</b></h2><p>Why did the opinion of Yegge&#x27;s unnamed Googler friend land so hard? In part because Yegge is not just another commentator taking shots from the sidelines. </p><p>He spent about 13 years at Google after earlier stints at Amazon and GeoWorks, later joined Grab, and then became head of engineering at Sourcegraph in 2022. 
He has long been known in software circles for widely read essays on programming and engineering culture, and for <a href="https://courses.cs.washington.edu/courses/cse452/23wi/papers/yegge-platform-rant.html">an earlier internal Google memo that accidentally became public in 2011</a> and drew broad media attention. </p><p>That history helps explain why engineers and executives still take his critiques seriously, even when they reject them.</p><p>Yegge has built a reputation over many years as a blunt insider-outsider voice on software culture, someone with enough standing in the industry that his judgments can travel fast, especially when they touch nerves inside big technology companies. </p><p><a href="https://en.wikipedia.org/wiki/Steve_Yegge">Wikipedia’s summary of his career</a> notes his long Google tenure and the outsized attention his blog posts and prior Google critiques have received. </p><h2><b>Unpacking Yegge&#x27;s friend&#x27;s argument</b></h2><p>In this case, Yegge’s argument was not simply that Google uses too little AI. It was that the company’s adoption may be uneven, culturally constrained and less transformed than its branding implies. </p><p>His friend supposedly argued that some Googlers could not use Anthropic’s Claude Code because it was framed as “the enemy,” and that Gemini was not yet sufficient for the fullest agentic coding workflows. He contrasted Google with what he described as a smaller set of companies moving much faster. </p><h2><b>Pushback from Hassabis and current Googlers</b></h2><p>The first major pushback came from Demis Hassabis, the co-founder and CEO of Google DeepMind, who <a href="https://x.com/demishassabis/status/2043867486320222333">replied directly and forcefully</a>. “Maybe tell your buddy to do some actual work and to stop spreading absolute nonsense. This post is completely false and just pure clickbait,” Hassabis wrote.</p><p>Other Google leaders followed with lengthier defenses. 
</p><p><a href="https://x.com/addyosmani/status/2043812343508021460">Addy Osmani</a>, a director at Google Cloud AI, wrote that Yegge’s account “doesn’t match the state of agentic coding at our company.” He added, “Over 40K SWEs use agentic coding weekly here.” </p><p>Osmani said Googlers have access to internal tools and systems including “custom models, skills, CLIs and MCPs,” and pushed back on the idea that Google employees are sealed off from outside models, writing that “folks can even use @AnthropicAI’s models on Vertex” and concluding that “Google is anything but average.” </p><p>Other current Google employees reinforced that message. <a href="https://x.com/rakyll/status/2043859775985988053">Jaana Dogan</a>, a software engineer at Google, wrote in a quote tweet: “Everyone I work with uses @antigravity like every second of the day,” later following up with another <a href="https://x.com/rakyll/status/2044061875902824640">X post stating:</a> &quot;Unpopular opinion: If you think tokens burned is a productivity metric, no one should take you seriously. Imagine you are a top 0.0001% writer and they are only counting the tokens you produce.&quot;</p><p><a href="https://x.com/DynamicWebPaige/status/2043891932544544841">Paige Bailey</a>, a DevX engineering lead at Google DeepMind, said teams had agents “running 24/7.” </p><p>Several other Google and DeepMind figures also challenged Yegge’s characterization, some disputing the factual basis of his claims and others suggesting he lacked visibility into current internal usage. </p><h2><b>Yegge&#x27;s rebuttal</b></h2><p>Yegge, for his part, did not retreat. 
In a <a href="https://x.com/Steve_Yegge/status/2043925814996279601">follow-up to Hassabis</a>, he wrote, “I’m not trying to misrepresent anyone,” but argued that by his own standard for advanced AI adoption, Google still does not appear to be doing especially well.</p><p>He pointed to token usage and the replacement of older development habits with truly agentic workflows as the more meaningful benchmark, and said he would be willing to retract his criticism if Google could show its engineers were operating at that level. </p><h2><b>AI adoption vs. AI transformation</b></h2><p>That leaves the core dispute unresolved, but clearer. This is less a fight over whether Google engineers use AI at all than a fight over what should count as meaningful adoption. </p><p>Googlers are pointing to scale, weekly usage and the availability of internal and external tools. Yegge is arguing that those measures may capture broad exposure without proving a deeper change, an AI transformation, in how engineering work gets done. The clash reflects a wider industry split between visible usage metrics and more transformative, power-user behavior. </p><p>For Google, the subject is especially sensitive. Yegge has criticized the company before, including in a <a href="https://steve-yegge.medium.com/why-i-left-google-to-join-grab-86dfffc0be84">2018 essay explaining why he left</a>, where he argued Google had become too risk-averse and had lost much of its ability to innovate. </p><p>If his latest critique had come from a lesser-known poster, it might have faded. Coming from a former longtime Google engineer with a record of memorable public criticism, it instead drew direct responses from some of the company’s top AI figures — and turned a single post into a broader public argument about whether Google’s AI leadership is as deep internally as it looks from the outside. </p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/8xVMYUcgko6r0u1PgFA16/553af163b930a1986bd6aab9d8758bbe/Gemini_Generated_Image_cwb2kxcwb2kxcwb2.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Microsoft launches MAI-Image-2-Efficient, a cheaper and faster AI image model]]></title>
            <link>https://venturebeat.com/technology/microsoft-launches-mai-image-2-efficient-a-cheaper-and-faster-ai-image-model</link>
            <guid isPermaLink="false">6H1gSCd9gBQtvkV6UEVUWA</guid>
            <pubDate>Tue, 14 Apr 2026 16:00:00 GMT</pubDate>
            <description><![CDATA[<p><a href="https://www.microsoft.com/en-us">Microsoft</a> today launched <a href="https://microsoft.ai/news/mai-image-2e-flagship-quality-41-lower-cost/">MAI-Image-2-Efficient</a>, a lower-cost, higher-speed variant of its flagship text-to-image model that the company says delivers production-ready quality at nearly half the price. The release, available immediately in <a href="https://azure.microsoft.com/en-us/products/ai-foundry">Microsoft Foundry</a> and <a href="https://playground.microsoft.ai/chat">MAI Playground</a> with no waitlist, marks the fastest turnaround yet from Microsoft&#x27;s in-house AI superintelligence team — and the clearest signal that Redmond is serious about building a self-sufficient AI stack that doesn&#x27;t depend on OpenAI.</p><p>The new model is priced at $5 per million text input tokens and $19.50 per million image output tokens, a <a href="https://microsoft.ai/news/mai-image-2e-flagship-quality-41-lower-cost/">roughly 41% reduction</a> from MAI-Image-2&#x27;s pricing of $5 and $33, respectively, for those same tiers. Microsoft says the model runs 22% faster than its flagship sibling and achieves 4x greater throughput efficiency per GPU, as measured on NVIDIA H100 hardware at 1024×1024 resolution. 
The company also claims it outpaces competing hyperscaler models — specifically naming Google&#x27;s <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-live/">Gemini 3.1 Flash</a>, <a href="https://deepmind.google/models/gemini-image/flash/">Gemini 3.1 Flash Image</a>, and <a href="https://deepmind.google/models/gemini-image/pro/">Gemini 3 Pro Image</a> — by an average of 40% on p50 latency benchmarks.</p><p>The model is also rolling out across <a href="https://copilot.microsoft.com/">Copilot</a> and <a href="https://www.bing.com/">Bing</a>, Microsoft said, with additional product surfaces to follow.</p><h2><b>Microsoft&#x27;s two-model strategy borrows a page from the AI pricing playbook</b></h2><p>Microsoft is positioning <a href="https://microsoft.ai/news/mai-image-2e-flagship-quality-41-lower-cost/">MAI-Image-2-Efficient</a> and its flagship <a href="https://playground.microsoft.ai/chat">MAI-Image-2</a> as complementary tools rather than replacements for each other — a tiered pairing designed to cover the full spectrum of enterprise image generation needs.</p><p><a href="https://microsoft.ai/news/mai-image-2e-flagship-quality-41-lower-cost/">MAI-Image-2-Efficient</a> targets high-volume, cost-sensitive production workloads: product photography, marketing creative, UI mockups, branded asset pipelines, and real-time interactive applications. It handles short-form in-image text like headlines and labels cleanly, according to Microsoft, and is built to operate within the tight latency and budget constraints of batch processing environments. <a href="https://playground.microsoft.ai/chat">MAI-Image-2</a>, meanwhile, remains the company&#x27;s precision instrument — the model you reach for when the brief demands the highest photorealistic fidelity, complex stylization like anime or illustration, or longer, more intricate in-image typography. 
Microsoft is effectively telling enterprise customers: use the efficient model for your assembly line, and the flagship for your showcase.</p><p>This approach mirrors pricing strategies that have worked across the AI industry — OpenAI&#x27;s <a href="https://developers.openai.com/api/docs/models">GPT model tiers</a>, Anthropic&#x27;s <a href="https://platform.claude.com/docs/en/about-claude/models/overview">Haiku-Sonnet-Opus lineup</a>, Google&#x27;s <a href="https://developers.googleblog.com/en/gemini-2-family-expands/">Flash-Pro distinction</a> — but applies it specifically to image generation, a domain where cost-per-image economics can make or break production deployment at scale.</p><h2><b>How Microsoft shipped a production-optimized image model in under a month</b></h2><p>The speed of this release deserves attention. MAI-Image-2 itself only debuted on MAI Playground on March 19, <a href="https://venturebeat.com/technology/microsoft-launches-3-new-ai-models-in-direct-shot-at-openai-and-google">as VentureBeat previously reported</a>, with broader availability through <a href="https://azure.microsoft.com/en-us/products/ai-foundry">Microsoft Foundry</a> arriving on April 2 alongside two other new foundation models: <a href="https://microsoft.ai/pdf/MAI-Transcribe-1-Model-Card.pdf">MAI-Transcribe-1</a> (a speech-to-text model supporting 25 languages) and <a href="https://ai.azure.com/catalog/models/MAI-Voice-1">MAI-Voice-1</a> (an audio generation model). Less than a month later, Microsoft has shipped an optimized production variant.</p><p>That cadence suggests the <a href="https://microsoft.ai/">MAI Superintelligence team</a> — the research group led by Mustafa Suleyman, CEO of Microsoft AI, that was formed in November 2025 — is operating more like a startup shipping iterative products than a traditional corporate research lab publishing papers. 
When Suleyman wrote in his April 2 blog post that the team was &quot;<a href="https://microsoft.ai/news/today-were-announcing-3-new-world-class-mai-models-available-in-foundry/">building Humanist AI</a>&quot; with a focus on &quot;optimizing for how people actually communicate, training for practical use,&quot; he appears to have meant it literally: the models aren&#x27;t just shipping, they&#x27;re shipping fast enough to have product roadmaps.</p><p>The early reception for <a href="https://microsoft.ai/news/introducing-mai-image-2/">MAI-Image-2</a> has been notably positive. Decrypt reported in its <a href="https://decrypt.co/361791/microsoft-mai-image-2-text-image-model-review">hands-on review</a> that the model had already reached the No. 3 position on the <a href="http://arena.ai">Arena.ai leaderboard</a> for image generation, trailing only Google and OpenAI. Decrypt&#x27;s reviewer noted that the model&#x27;s photorealism was &quot;a real strength&quot; and that its text rendering was &quot;a legitimate highlight&quot; that &quot;handled complex typography with far more consistency than we expected.&quot; The review also found that in some direct comparisons, <a href="https://microsoft.ai/news/introducing-mai-image-2/">MAI-Image-2</a> outperformed OpenAI&#x27;s GPT-Image on image quality and text rendering despite sitting below it on the leaderboard — an observation that underscores how benchmark rankings don&#x27;t always capture real-world utility.</p><p>That said, the original model shipped with significant constraints that Decrypt flagged: a 30-second cooldown between generations, a 15-image daily cap in the native UI, only 1:1 aspect ratio output, no image-to-image capabilities, and aggressive content filtering that blocked even innocuous creative prompts. 
Whether <a href="https://microsoft.ai/news/mai-image-2e-flagship-quality-41-lower-cost/">MAI-Image-2-Efficient</a> inherits or relaxes any of these limitations isn&#x27;t addressed in today&#x27;s announcement, and enterprise customers accessing the model through the Foundry API will likely face different constraints than playground users.</p><h2><b>Inside the fraying Microsoft-OpenAI relationship that made in-house models inevitable</b></h2><p>Today&#x27;s launch cannot be understood in isolation. It arrives at a moment when the relationship between <a href="https://microsoft.com/">Microsoft</a> and <a href="https://openai.com/">OpenAI</a> — once the defining partnership of the generative AI era — is visibly fraying at the seams.</p><p>Just yesterday, CNBC reported that OpenAI&#x27;s newly appointed chief revenue officer, Denise Dresser, sent an <a href="https://www.cnbc.com/2026/04/13/openai-touts-amazon-alliance-in-memo-microsoft-limited-our-ability.html">internal memo to staff</a> explicitly stating that the Microsoft partnership &quot;has also limited our ability to meet enterprises where they are.&quot; The memo reportedly touted OpenAI&#x27;s new alliance with Amazon Web Services and the Bedrock platform as a key growth driver, describing inbound customer demand as &quot;frankly staggering&quot; since the partnership was announced in late February. <a href="https://www.cnbc.com/2024/07/31/microsoft-says-openai-is-now-a-competitor-in-ai-and-search.html">Microsoft added OpenAI to its list of competitors</a> in its annual report in mid-2024. 
OpenAI, meanwhile, has diversified its cloud infrastructure across <a href="https://coreweave.com/">CoreWeave</a>, <a href="https://www.google.com/">Google</a>, and <a href="https://www.oracle.com/">Oracle</a>, reducing its dependence on Microsoft Azure.</p><p>The <a href="https://microsoft.ai/news/today-were-announcing-3-new-world-class-mai-models-available-in-foundry/">MAI model family</a> is the most tangible expression of Microsoft&#x27;s side of that strategic uncoupling. When Microsoft can generate production-quality images with its own model at $19.50 per million output tokens, the calculus for continuing to license OpenAI&#x27;s image models — and paying OpenAI a share of the resulting revenue — shifts dramatically. Every MAI model that reaches production quality is a line item that Microsoft can potentially move off OpenAI&#x27;s balance sheet and onto its own.</p><p>The organizational infrastructure to support this shift is already in place. On March 17, as disclosed in communications posted on <a href="https://blogs.microsoft.com/blog/2026/03/17/announcing-copilot-leadership-update/">Microsoft&#x27;s official blog</a>, CEO Satya Nadella announced a sweeping reorganization that unified the company&#x27;s consumer and commercial Copilot efforts under a single leadership team, with Jacob Andreou elevated to EVP of Copilot reporting directly to Nadella. Critically, the reorganization also refocused Suleyman&#x27;s role. As Nadella wrote in his message to employees, the company is &quot;doubling down on our superintelligence mission with the talent and compute to build models that have real product impact, in terms of evals, COGS reduction, as well as advancing the frontier.&quot; That phrase — &quot;COGS reduction&quot; — is corporate-speak for reducing the cost of goods sold, and it points directly to the economic motivation behind models like MAI-Image-2-Efficient. 
Every dollar Microsoft saves by using its own models instead of licensing from partners flows straight to gross margin.</p><h2><b>Why cheap, fast image generation is the secret ingredient for Microsoft&#x27;s agentic AI future</b></h2><p>There&#x27;s one more dimension that makes today&#x27;s release strategically significant, and it may be the most important one: the rise of AI agents.</p><p><a href="https://techcrunch.com/2026/04/13/microsoft-is-working-on-yet-another-openclaw-like-agent/">TechCrunch reported</a> yesterday that Microsoft is testing ways to integrate OpenClaw-like features into Microsoft 365 Copilot, building toward an always-on agent that can execute multi-step tasks over extended periods. The company has also launched Copilot Cowork (an agent that takes actions within Microsoft 365 apps), Copilot Tasks (an agent for completing multi-step personal productivity tasks), and Agent 365 (referenced in Nadella&#x27;s March reorganization memo). Microsoft is expected to showcase these agentic capabilities at its Build conference in June.</p><p>In an agentic world — where AI systems don&#x27;t just answer questions but execute complex workflows autonomously — image generation becomes a primitive that agents call programmatically, not a standalone product that users interact with manually. An enterprise agent building a marketing campaign might need to generate dozens of product images, create social media assets, produce presentation graphics, and iterate on design concepts, all without human intervention at each step. The economics of that workflow are governed entirely by per-token pricing and latency, which is precisely what MAI-Image-2-Efficient optimizes for. If Microsoft&#x27;s vision for Copilot involves agents that generate images as a routine subtask within larger workflows, those agents need image generation that&#x27;s fast enough to not create bottlenecks and cheap enough to not blow up cost projections when called thousands of times per day. 
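</p><p>As a rough sketch of those economics: the $19.50 per million output tokens price appears in Microsoft&#x27;s announcement, but the tokens-per-image and call-volume figures below are hypothetical placeholders, since per-image token counts were not disclosed. A planner budgeting image generation as a routine agent subtask might estimate:</p>

```python
# Rough cost model for image generation inside an agentic workflow.
# Only the $19.50 per 1M output tokens price comes from Microsoft's
# announcement; TOKENS_PER_IMAGE and the workload volumes below are
# hypothetical assumptions, not disclosed figures.

PRICE_PER_MILLION_OUTPUT_TOKENS = 19.50  # USD, MAI-Image-2-Efficient

def daily_image_cost(images_per_day: int, tokens_per_image: int) -> float:
    """Estimated daily spend for an agent that generates images as a subtask."""
    total_tokens = images_per_day * tokens_per_image
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_OUTPUT_TOKENS

# A hypothetical marketing agent producing 5,000 assets per day at an
# assumed ~4,000 output tokens per image:
cost = daily_image_cost(5_000, 4_000)
print(f"${cost:,.2f} per day")  # $390.00 per day
```

<p>At these assumed volumes, per-token pricing rather than licensing fees sets the marginal cost of every additional asset the agent produces, which is why the price cut matters more to agents than to interactive users.</p><p>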
The 4x efficiency improvement and 41% price cut aren&#x27;t just nice marketing numbers — they&#x27;re architectural requirements for the agentic future Microsoft is betting the company on.</p><h2><b>What Microsoft still hasn&#x27;t answered about its new image model</b></h2><p>Several important questions remain unaddressed by today&#x27;s announcement. Microsoft didn&#x27;t disclose whether <a href="https://microsoft.ai/news/mai-image-2e-flagship-quality-41-lower-cost/">MAI-Image-2-Efficient</a> resolves the aspect ratio limitations and aggressive content filtering that reviewers flagged in the original model. The company also didn&#x27;t specify whether the quality-to-speed tradeoffs involve visible degradation on complex prompts — the announcement describes &quot;production-ready quality&quot; and &quot;flagship quality&quot; interchangeably, but distillation models of any kind typically involve some quality concession.</p><p>The footnotes in the press release also reveal the narrow conditions under which the benchmark claims were tested: efficiency figures were measured on NVIDIA H100 at 1024×1024 with &quot;optimized batch sizes and matched latency targets,&quot; and the latency comparisons against Google models were conducted at p50 (median) rather than p95 or p99, which would capture worst-case performance. Enterprise customers running diverse workloads at varying concurrency levels may see different results. MAI Playground is currently available only in select markets, including the U.S., with EU availability listed as &quot;coming soon.&quot; Copilot integration is underway but not complete. And the enterprise API through Foundry, while live, is still in early deployment.</p><p>But the trajectory is unmistakable. 
In less than five months since the <a href="https://microsoft.ai/">MAI Superintelligence team</a> was announced, Microsoft has shipped a <a href="https://microsoft.ai/news/introducing-mai-image-1-debuting-in-the-top-10-on-lmarena/">flagship image model</a>, <a href="https://venturebeat.com/technology/microsoft-launches-3-new-ai-models-in-direct-shot-at-openai-and-google">three additional foundation models</a>, and now a <a href="https://microsoft.ai/news/mai-image-2e-flagship-quality-41-lower-cost/">cost-optimized production variant </a>— all while reorganizing its entire Copilot organization, navigating a fracturing relationship with its most important AI partner, and laying the groundwork for agentic AI features that could redefine enterprise productivity. Whether all of that is fast enough to catch Anthropic&#x27;s momentum, contain OpenAI&#x27;s drift toward Amazon, and justify a $600 price target is the multi-hundred-billion-dollar question. But for a company that spent the first two years of the generative AI era mostly reselling someone else&#x27;s technology, Microsoft is now doing something it hasn&#x27;t done in a long time in AI: shipping its own work, on its own schedule, at its own price — and daring the market to keep up.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/17szvygLbh5CP67yGHGlNG/7a78dac28a40f270d5d5ef636980f606/nuneybits_Vector_art_of_the_iconic_Microsoft_Windows_logo_on_a__d3fc862c-d081-4a53-86a0-8b31f591dd93.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Databricks tested a stronger model against its multi-step agent on hybrid queries. The stronger model still lost by 21%.]]></title>
            <link>https://venturebeat.com/data/databricks-research-shows-multi-step-agents-consistently-outperform-single</link>
            <guid isPermaLink="false">3VeoTXwhwzC3OokZBsIQ82</guid>
            <pubDate>Tue, 14 Apr 2026 15:00:00 GMT</pubDate>
            <description><![CDATA[<p>Data teams building AI agents keep running into the same failure mode. Questions that require joining structured data with unstructured content (sales figures alongside customer reviews, or citation counts alongside academic papers) break single-turn RAG systems. </p><p>New research from Databricks puts a number on that failure gap. The company&#x27;s AI research team tested a multi-step agentic approach against state-of-the-art single-turn RAG baselines across nine enterprise knowledge tasks and reported gains of 20% or more on Stanford&#x27;s STaRK benchmark suite, along with consistent improvement across Databricks&#x27; own <a href="https://venturebeat.com/data/databricks-built-a-rag-agent-it-says-can-handle-every-kind-of-enterprise">KARLBench evaluation framework</a>. Databricks argues the performance gap between single-turn RAG and multi-step agents on hybrid data tasks is an architectural problem, not a model quality problem.</p><p>The work builds on Databricks&#x27; earlier <a href="https://venturebeat.com/data/databricks-instructed-retriever-beats-traditional-rag-data-retrieval-by-70">instructed retriever</a> research, which showed retrieval improvements on unstructured data using metadata-aware queries. This latest research adds structured data sources (relational tables and SQL warehouses) into the same reasoning loop, addressing the class of questions enterprises most commonly fail to answer with current agent architectures.</p><p>&quot;RAG works, but it doesn&#x27;t scale,&quot; Michael Bendersky, research director at Databricks, told VentureBeat. &quot;If you want to make your agent even better, and you want to understand why you have declining sales, now you have to help the agent see the tables and look at the sales data. 
Your RAG pipeline will become incompetent at that task.&quot;</p><h2>Single-turn retrieval cannot encode structural constraints</h2><p>The core finding is that standard RAG systems fail when a query mixes a precise structured filter with an open-ended semantic search. </p><p>Consider a question like &quot;Which of our products have had declining sales over the past three months, and what potentially related issues are brought up in customer reviews on various seller sites?&quot; The sales data lives in a warehouse. The review sentiment lives in unstructured documents across seller sites. A single-turn RAG system cannot split that query, route each half to the right data source and combine the results.</p><p>To confirm this is an architecture problem rather than a model quality problem, Databricks reran published STaRK baselines using a current state-of-the-art foundation model. The stronger model still lost to the multi-step agent by 21% on the academic domain and 38% on the biomedical domain, according to the research. </p><p>STaRK is a benchmark published by Stanford researchers covering three semi-structured retrieval domains: Amazon product data, the Microsoft Academic Graph and a biomedical knowledge base. </p><h2>How the Supervisor Agent handles what RAG cannot</h2><p>Databricks built the Supervisor Agent as the production implementation of this research approach, and its architecture illustrates why the gains are consistent across task types. The approach includes three core steps:</p><p><b>Parallel tool decomposition</b>. Rather than issuing one broad query and hoping the results cover both structured and unstructured needs, the agent fires SQL and vector search calls simultaneously, then analyzes the combined results before deciding what to do next. That parallel step is what allows it to handle queries that cross data type boundaries without requiring the data to be normalized first.</p><p><b>Self-correction. 
</b>When an initial retrieval attempt hits a dead end, the agent detects the failure, reformulates the query and tries a different path. On a STaRK benchmark task that requires finding a paper by an author with exactly 115 prior publications on a specific topic, the agent first queries both SQL and vector search in parallel. When the two result sets show no overlap, it adapts and issues a SQL JOIN across both constraints, then calls the vector search system to verify the result before returning the answer.</p><p><b>Declarative configuration.</b>  The agent is not tuned to any specific dataset or task. Connecting it to a new data source means writing a plain-language description of what that source contains and what kinds of questions it should answer. No custom code is required.</p><p>&quot;The agent can do things like decomposing the question into a SQL query and a search query out of the box,&quot; Bendersky said. &quot;It can combine the results of SQL and RAG, reason about those results, make follow-up queries and then reason about whether the final answer was actually found.&quot;</p><h2>It&#x27;s not just about hybrid retrieval</h2><p>The distinction Databricks draws isn&#x27;t about retrieval technique, it&#x27;s about architecture.</p><p>&quot;We almost don&#x27;t see it as a hybrid retrieval where you combine embeddings and search results, or embeddings and tables,&quot; he said. &quot;We see this more as an agent that has access to multiple tools.&quot;</p><p>The practical consequence of that framing is that adding a new data source means connecting it to the agent and writing a description of what it contains. The agent handles routing and orchestration without additional code. </p><p>Custom RAG pipelines require data to be converted into a format the retrieval system can read, typically text chunks with embeddings. SQL tables have to be flattened, JSON has to be normalized. Every new data source added to the pipeline means more conversion work. 
Databricks&#x27; research argues that as enterprise data grows to include more source types, that burden makes custom pipelines increasingly impractical compared to an agent that queries each source in its native format.</p><p>&quot;Just bring the agent to the data,&quot; Bendersky said. &quot;You basically give the agent more sources, and it will learn to use them pretty well.&quot;</p><h2>What this means for enterprises</h2><p>For data engineers evaluating whether to build custom RAG pipelines or adopt a declarative agent framework, the research offers a clear direction: if the task involves questions that span structured and unstructured data, building custom retrieval is the harder path. The research found that across all tested tasks, the only things that differed between deployments were instructions and tool descriptions. The agent handled the rest.</p><p><b>The practical limits are real but manageable.</b> The approach works well with five to ten data sources. Adding too many at once, without curating which sources are complementary rather than contradictory, makes the agent slower and less reliable. Bendersky recommends scaling incrementally and verifying results at each step rather than connecting all available data upfront.</p><p><b>Data accuracy is a prerequisite.</b> The agent can query across mismatched formats (JSON review feeds alongside SQL sales tables) without requiring normalization. It cannot fix source data that is factually wrong. Adding a plain-language description of each data source at ingestion time helps the agent route queries correctly from the start.</p><p>The research positions this as an early step in a longer trajectory. As enterprise AI workloads mature, agents will be expected to reason across dozens of source types, including dashboards, code repositories and external data feeds. 
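</p><p>The multi-step pattern described above (declarative source descriptions, parallel SQL-plus-vector decomposition, and a self-correction retry) can be sketched as follows. This is a minimal illustration with hypothetical names and stub tools, not Databricks&#x27; actual Supervisor Agent API:</p>

```python
# Minimal sketch of the multi-step pattern: declarative source descriptions,
# parallel decomposition into a SQL call and a vector search call, and one
# self-correction pass when the two result sets don't overlap. All names and
# the stub tools are hypothetical assumptions, not Databricks' API.
from concurrent.futures import ThreadPoolExecutor

SOURCES = {
    # Plain-language descriptions are the only per-source "configuration".
    # A real deployment would hand these to the planner for routing; here
    # they just document the two stub tools below.
    "sales_warehouse": "SQL tables of product sales by month",
    "review_index": "vector index over customer reviews from seller sites",
}

def run_sql(question: str) -> set[str]:        # stub structured tool
    return {"prod_17", "prod_42"}              # product IDs with declining sales

def vector_search(question: str) -> set[str]:  # stub unstructured tool
    return {"prod_42", "prod_99"}              # products flagged in bad reviews

def supervisor(question: str) -> set[str]:
    # Step 1: fire both tools in parallel rather than issuing one broad query.
    with ThreadPoolExecutor() as pool:
        sql_hits = pool.submit(run_sql, question)
        vec_hits = pool.submit(vector_search, question)
        answer = sql_hits.result() & vec_hits.result()
    if not answer:
        # Step 2: self-correct by reformulating and retrying instead of giving
        # up (a real agent might instead issue a SQL JOIN across constraints).
        answer = vector_search(f"{question} restricted to {sorted(sql_hits.result())}")
    return answer

print(supervisor("Which declining-sales products have related review issues?"))
# {'prod_42'}
```

<p>The point of the sketch is that adding a third source would mean adding one more entry and one more tool, not rewriting a retrieval pipeline.</p><p>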
The research argues the declarative approach is what makes that scaling tractable, because adding a new source stays a configuration problem rather than an engineering one.</p><p>&quot;This is kind of like a ladder,&quot; Bendersky said. &quot;The agent will slowly get more and more information and then slowly improve overall.&quot; </p>]]></description>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/67zWD4BorZFoyltqGXzV1O/94c41d56da0b69739197e37c2cab72e0/hybrid-reasoning-smk1.jpg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[43% of AI-generated code changes need debugging in production, survey finds]]></title>
            <link>https://venturebeat.com/technology/43-of-ai-generated-code-changes-need-debugging-in-production-survey-finds</link>
            <guid isPermaLink="false">5bWleqe2LUYYQuqucsMzV1</guid>
            <pubDate>Tue, 14 Apr 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[<p>The software industry is racing to write code with artificial intelligence. It is struggling, badly, to make sure that code holds up once it ships.</p><p>A survey of 200 senior site-reliability and DevOps leaders at large enterprises across the United States, United Kingdom, and European Union paints a stark picture of the hidden costs embedded in the AI coding boom. According to <a href="https://lightrun.com/ebooks/state-of-ai-powered-engineering-2026/">Lightrun&#x27;s 2026 State of AI-Powered Engineering Report</a>, shared exclusively with VentureBeat ahead of its public release, 43% of AI-generated code changes require manual debugging in production environments even after passing quality assurance and staging tests. Not a single respondent said their organization could verify an AI-suggested fix with just one redeploy cycle; 88% reported needing two to three cycles, while 11% required four to six.</p><p>The findings land at a moment when AI-generated code is proliferating across global enterprises at a breathtaking pace. Both <a href="https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-as-30percent-of-microsoft-code-is-written-by-ai.html">Microsoft CEO Satya Nadella</a> and <a href="https://arstechnica.com/ai/2024/10/google-ceo-says-over-25-of-new-google-code-is-generated-by-ai/">Google CEO Sundar Pichai</a> have claimed that around a quarter of their companies&#x27; code is now AI-generated. 
The AIOps market — the ecosystem of platforms and services designed to manage and monitor these AI-driven operations — stands at $18.95 billion in 2026 and is projected to reach $37.79 billion by 2031.</p><p>Yet the report suggests the infrastructure meant to catch AI-generated mistakes is badly lagging behind AI&#x27;s capacity to produce them.</p><p>&quot;The 0% figure signals that engineering is hitting a trust wall with AI adoption,&quot; said Or Maimon, Lightrun&#x27;s chief business officer, referring to the survey&#x27;s finding that zero percent of engineering leaders described themselves as &quot;very confident&quot; that AI-generated code will behave correctly once deployed. &quot;While the industry&#x27;s emphasis on increased productivity has made AI a necessity, we are seeing a direct negative impact. As AI-generated code enters the system, it doesn&#x27;t just increase volume; it slows down the entire deployment pipeline.&quot;</p><h2><b>Amazon&#x27;s March outages showed what happens when AI-generated code ships without safeguards</b></h2><p>The dangers are no longer theoretical. In early March 2026, Amazon suffered a series of <a href="https://www.reuters.com/business/retail-consumer/amazon-down-thousands-users-us-downdetector-shows-2026-03-05/">high-profile outages</a> that underscored exactly the kind of failure pattern the Lightrun survey describes. On March 2, Amazon.com experienced a disruption lasting nearly six hours, resulting in 120,000 lost orders and 1.6 million website errors. Three days later, on March 5, a more <a href="https://www.cnbc.com/2026/03/05/amazon-online-store-suffers-outage-for-some-users.html">severe outage hit the storefront</a> — lasting six hours and causing a 99% drop in U.S. order volume, with approximately 6.3 million lost orders. Both incidents were traced to AI-assisted code changes deployed to production without proper approval.</p><p>The fallout was swift. 
Amazon launched a 90-day code safety reset across 335 critical systems, and AI-assisted code changes must now be approved by senior engineers before they are deployed.</p><p>Maimon pointed directly to the Amazon episodes. &quot;This uncertainty isn&#x27;t based on a hypothesis,&quot; he said. &quot;We just need to look back to the start of March, when Amazon.com in North America went down due to an AI-assisted change being implemented without established safeguards.&quot;</p><p>The Amazon incidents illustrate the central tension the Lightrun report quantifies in survey data: AI tools can produce code at unprecedented speed, but the systems designed to validate, monitor, and trust that code in live environments have not kept pace. Google&#x27;s own <a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report">2025 DORA report</a> corroborates this dynamic, finding that AI adoption correlates with an increase in code instability, and that 30% of developers report little or no trust in AI-generated code.</p><p>Maimon cited that research directly: &quot;Google&#x27;s 2025 DORA report found that AI adoption correlates with an almost 10% increase in code instability. Our validation processes were built for the scale of human engineering, but today, engineers have become auditors for massive volumes of unfamiliar code.&quot;</p><h2><b>Developers are losing two days a week to debugging AI-generated code they didn&#x27;t write</b></h2><p>One of the report&#x27;s most striking findings is the scale of human capital being consumed by AI-related verification work. Developers now spend an average of 38% of their work week — roughly two full days — on debugging, verification, and environment-specific troubleshooting, according to the survey. 
For 88% of the companies polled, this &quot;reliability tax&quot; consumes between 26% and 50% of their developers&#x27; weekly capacity.</p><p>This is not the productivity dividend that enterprise leaders expected when they invested in AI coding assistants. Instead, the engineering bottleneck has simply migrated. Code gets written faster, but it takes far longer to confirm that it works.</p><p>&quot;In some senses, AI has made the debugging problem worse,&quot; Maimon said. &quot;The volume of change is overwhelming human validation, while the generated code itself frequently does not behave as expected when deployed in Production. AI coding agents cannot see how their code behaves in running environments.&quot;</p><p>The redeploy problem compounds the time drain. Every surveyed organization requires multiple deployment cycles to verify a single AI-suggested fix — and according to Google&#x27;s <a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report">2025 DORA report</a>, a single redeploy cycle takes a day to one week on average. In regulated industries such as healthcare and finance, deployment windows are often narrow, governed by mandated code freezes and strict change-management protocols. Requiring three or more cycles to validate a single AI fix can push resolution timelines from days to weeks.</p><p>Maimon rejected the idea that these multiple cycles represent prudent engineering discipline. &quot;This is not discipline, but an expensive bottleneck and a symptom of the fact that AI-generated fixes are often unreliable,&quot; he said. 
&quot;If we can move from three cycles to one, we reclaim a massive portion of that 38% lost engineering capacity.&quot;</p><h2><b>AI monitoring tools can&#x27;t see what&#x27;s happening inside running applications — and that&#x27;s the real problem</b></h2><p>If the productivity drain is the most visible cost, the <a href="https://lightrun.com/ebooks/state-of-ai-powered-engineering-2026/">Lightrun report</a> argues the deeper structural problem is what it calls &quot;the runtime visibility gap&quot; — the inability of AI tools and existing monitoring systems to observe what is actually happening inside running applications.</p><p>Sixty percent of the survey&#x27;s respondents identified a lack of visibility into live system behavior as the primary bottleneck in resolving production incidents. In 44% of cases where AI SRE or application performance monitoring tools attempted to investigate production issues, they failed because the necessary execution-level data — variable states, memory usage, request flow — had never been captured in the first place.</p><p>The report paints a picture of AI tools operating essentially blind in the environments that matter most. Ninety-seven percent of engineering leaders said their AI SRE agents operate without significant visibility into what is actually happening in production. Approximately half of all companies (49%) reported their AI agents have only limited visibility into live execution states. Only 1% reported extensive visibility, and not a single respondent claimed full visibility.</p><p>This is the gap that turns a minor software bug into a costly outage. When an AI-suggested fix fails in production — as 43% of them do — engineers cannot rely on their AI tools to diagnose the problem, because those tools cannot observe the code&#x27;s real-time behavior. 
Instead, teams fall back on what the report calls &quot;tribal knowledge&quot;: the institutional memory of senior engineers who have seen similar problems before and can intuit the root cause from experience rather than data. The survey found that 54% of resolutions to high-severity incidents rely on tribal knowledge rather than diagnostic evidence from AI SREs or APMs.</p><h2><b>In finance, 74% of engineering teams trust human intuition over AI diagnostics during serious incidents</b></h2><p>The trust deficit plays out with particular intensity in the finance sector. In an industry where a single application error can cascade into millions of dollars in losses per minute, the survey found that 74% of financial-services engineering teams rely on tribal knowledge over automated diagnostic data during serious incidents — far higher than the 44% figure in the technology sector.</p><p>&quot;Finance is a heavily regulated, high-stakes environment where a single application error can cost millions of dollars per minute,&quot; Maimon said. &quot;The data shows that these teams simply do not trust AI not to make a dangerous mistake in their Production environments. This is a rational response to tool failure.&quot;</p><p>The distrust extends beyond finance. Perhaps the most telling data point in the entire report is that not a single organization surveyed — across any industry — has moved its AI SRE tools into actual production workflows. Ninety percent remain in experimental or pilot mode. The remaining 10% evaluated AI SRE tools and chose not to adopt them at all. This represents an extraordinary gap between market enthusiasm and operational reality: enterprises are spending aggressively on AI for IT operations, but the tools they are buying remain quarantined from the environments where they would deliver the most value.</p><p>Maimon described this as one of the report&#x27;s most significant revelations. 
&quot;Leaders are eager to adopt these new AI tools, but they don&#x27;t trust AI to touch live environments,&quot; he said. &quot;The lack of trust is shown in the data; 98% have lower trust in AI operating in production than in coding assistants.&quot;</p><h2><b>The observability industry built for human-speed engineering is falling short in the age of AI</b></h2><p>The findings raise pointed questions about the current generation of observability tools from major vendors like <a href="https://www.datadoghq.com/">Datadog</a>, <a href="https://www.dynatrace.com/">Dynatrace</a>, and <a href="https://www.splunk.com/">Splunk</a>. Seventy-seven percent of the engineering leaders surveyed reported low or no confidence that their current observability stack provides enough information to support autonomous root cause analysis or automated incident remediation.</p><p>Maimon did not shy away from naming the structural problem. &quot;Major vendors often build &#x27;closed-garden&#x27; ecosystems where their AI SREs can only reason over data collected by their own proprietary agents,&quot; he said. &quot;In a modern enterprise, teams typically have a multi-tool stack to provide full coverage. By forcing a team into a single-vendor silo, these tools create an uncomfortable dependency and a strategic liability: if the vendor&#x27;s data coverage is missing a specific layer, the AI is effectively blind to the root cause.&quot;</p><p>The second issue, Maimon argued, is that current observability-backed AI SRE solutions offer only partial visibility — defined by what engineers thought to log at the time of deployment. Because failures rarely follow predefined paths, autonomous root cause analysis using only these tools will frequently miss the key diagnostic evidence. 
&quot;To move toward true autonomous remediation,&quot; he said, &quot;the industry must shift toward AI SRE without vendor lock-in; AI SREs must be an active participant that can connect across the entire stack and interrogate live code to capture the ground truth of a failure as it happens.&quot;</p><p>When asked what it would take to trust AI SREs, the survey&#x27;s respondents coalesced unanimously around live runtime visibility. Fifty-eight percent said they need the ability to provide &quot;evidence traces&quot; of variables at the point of failure, and 42% cited the ability to verify a suggested fix before it actually deploys. No respondents selected the ability to ingest multiple log sources or provide better natural language explanations — suggesting that engineering leaders do not want AI that talks better, but AI that can see better.</p><h2><b>The question is no longer whether to use AI for coding — it&#x27;s whether anyone can trust what it produces</b></h2><p>The <a href="https://lightrun.com/ebooks/state-of-ai-powered-engineering-2026/">survey</a> was administered by <a href="https://surveyz.io/">Global Surveyz Research</a>, an independent firm, and drew responses from Directors, VPs, and C-level executives in SRE and DevOps roles at enterprises with 1,500 or more employees across the finance, technology, and information technology sectors. 
Responses were collected during January and February 2026, with questions randomized to prevent order bias.</p><p><a href="https://lightrun.com/">Lightrun</a>, which is backed by $110 million in funding from Accel and Insight Partners and counts <a href="https://www.att.com/">AT&amp;T</a>, <a href="https://www.citi.com/">Citi</a>, <a href="https://www.microsoft.com/en-us">Microsoft</a>, <a href="https://www.salesforce.com/">Salesforce</a>, and <a href="https://www.unitedhealthgroup.com/">UnitedHealth Group</a> among its enterprise clients, has a clear commercial interest in the problem the report describes: the company sells a runtime observability platform designed to give AI agents and human engineers real-time visibility into live code execution. Its AI SRE product uses a Model Context Protocol connection to generate live diagnostic evidence at the point of failure without requiring redeployment. That commercial interest does not diminish the survey&#x27;s findings, which align closely with independent research from Google DORA and the real-world evidence of the Amazon outages.</p><p>Taken together, they describe an industry confronting an uncomfortable paradox. AI has solved the slowest part of building software — writing the code — only to reveal that writing was never the hard part. The hard part was always knowing whether it works. And on that question, the engineers closest to the problem are not optimistic.</p><p>&quot;If the live visibility gap is not closed, then teams are really just compounding instability through their adoption of AI,&quot; Maimon said. &quot;Organizations that don&#x27;t bridge this gap will find themselves stuck with long redeploy loops, to solve ever more complex challenges. They will lose their competitive speed to the very AI tools that were meant to provide it.&quot;</p><p>The machines learned to write the code. Nobody taught them to watch it run.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Security</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/5nAHuSU7TlSixVhQbV3Zpy/f97f9591cd1d877db961dac2be53b6cc/nuneybits_Vector_art_of_developer_mopping_code_spill_dbcceaac-fb6e-4e63-90cf-5774d34a0f44.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
    </channel>
</rss>