<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>VentureBeat</title>
        <link>https://venturebeat.com/feed/</link>
        <description>Transformative tech coverage that matters</description>
        <lastBuildDate>Fri, 05 Jun 2026 16:24:58 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright 2026, VentureBeat</copyright>
        <item>
            <title><![CDATA[Anthropic says 80% of its new production code is now authored by Claude — how your enterprise can keep up]]></title>
            <link>https://venturebeat.com/technology/anthropic-says-80-of-its-new-production-code-is-now-authored-by-claude-how-your-enterprise-can-keep-up</link>
            <guid isPermaLink="false">7jx1tXydvxRPlFtfPPjhby</guid>
            <pubDate>Thu, 04 Jun 2026 20:25:00 GMT</pubDate>
            <description><![CDATA[<p>Anthropic co-founder and CEO Dario Amodei <a href="https://medium.com/@coders.stop/dario-amodei-said-90-of-code-will-be-ai-written-in-6-months-6b8060720d97">said it was coming</a>, but it still feels like a milestone: More than 80% of the code merged into Anthropic’s production codebase in May wasn&#x27;t authored by humans, but by its own AI model, Claude, according to a <a href="https://www.anthropic.com/institute/recursive-self-improvement">new report shared by the record-breaking AI startup today.</a></p><p>This transformation has triggered an<a href="https://x.com/AnthropicAI/status/2062568864240836995"> 8x increase in the volume of code</a> shipped per engineer per quarter compared to the company’s 2021–2025 baseline, which the company notes means even more code someone or something must review.</p><p>For enterprise technical leaders, this is no longer a localized research curiosity; it&#x27;s a new, aggressive competitive baseline. </p><p>If a frontier AI laboratory can successfully offload the vast majority of its engineering output to autonomous agents — showing signs of the long-sought AI Holy Grail of &quot;<a href="https://en.wikipedia.org/wiki/Recursive_self-improvement">recursive self-improvement</a>,&quot; models that can independently research and upgrade themselves — what&#x27;s preventing enterprises across other sectors from automating more of their internal software development with AI agents, too? </p><p>Obviously, it&#x27;s easier said than done. Anthropic is one of the principle creators of the current gen AI boom, so you&#x27;d expect them to know how to deploy the technology effectively.</p><p>But for other enterprises looking to bump up the amount of code and workflows handled by agents, Anthropic&#x27;s new blog post details the outlines of a general plan they too can adopt to re-engineer their operations and workflows to take advantage of the latest AI advances. </p><h2><b>Anthropic&#x27;s roadmap that other enterprises can follow</b></h2><p>The transition from human-centric coding to autonomous orchestration requires understanding the evolution of AI capabilities. Anthropic outlines a clear historical continuum that enterprises can map onto their own digital transformation roadmaps: </p><ul><li><p><b>2021–2023 (Manual Writing):</b> Engineers write code and documentation natively within local text editors. </p></li><li><p><b>2023–2025 (Chatbot Assistance):</b> Developers use early models to generate brief code snippets, copying and pasting outputs manually into their environments. </p></li><li><p><b>2025–2026 (Coding Agents):</b> Capable agents actively write and edit entire files autonomously. </p></li><li><p><b>Present Day (Autonomous Agents):</b> Agents execute code independently, debug live environments, and delegate multi-hour work streams to specialized sub-agents. </p></li></ul><p>This rapid evolution is validated by external benchmarks. Software engineering evaluation frameworks like SWE-bench—which tasks models with resolving real bug reports in complex, open-source codebases—have saturated over a two-year window. </p><p>Furthermore, long-duration capability evaluations demonstrate that models like Claude Opus 4.6 can reliably sustain operations on 12-hour tasks, while Claude Mythos Preview pushes past 16 hours of continuous problem-solving. </p><p>Internally, the technological leap is even more stark. On highly complex, open-ended engineering problems where clear specifications are initially absent, Claude’s success rate climbed to 76% in May 2026 — a 50-point increase in a six-month window. </p><p>In isolated optimization benchmarks, where models are tasked with accelerating AI model training code, Anthropic’s internal Mythos Preview model achieved a 52x speedup. </p><p>For comparison, a skilled human developer typically requires four to eight hours of manual refactoring to achieve a mere 4x speedup on the exact same codebase. </p><h2><b>3-step plan to more complete production code automation</b></h2><p>For an enterprise to replicate Anthropic&#x27;s 80 percent milestone, technical decision-makers must abandon the &quot;developer assistant&quot; mental model and transition to an &quot;automated factory&quot; architecture. This shift impacts product management, operations, and developer workflows in three distinct ways: </p><h3>1. Shift from Code Execution to Architectural Oversight</h3><p>When code generation costs near zero in human time, the primary engineering role shifts from writing software to specifying goals and reviewing outputs. Enterprise leaders must retrain developers to act as systems architects and judges. As one Anthropic employee noted regarding the operational reality of this shift: </p><blockquote><p>&quot;The shape of stuff today is roughly ‘humans have ideas, and the models are able to implement, test and evaluate them an [order of magnitude] faster than before.’&quot; </p></blockquote><h3>2. Overcome The Code Review Bottleneck</h3><p>Injecting vast quantities of AI-generated code into an organization inevitably creates operational friction.</p><p>According to <a href="https://en.wikipedia.org/wiki/Amdahl%27s_law">Amdahl’s law</a>, the speedup of any process is strictly limited by its serial, non-automated bottlenecks.</p><p>At Anthropic, flooding the system with synthetic code instantly turned human code review into a critical bottleneck. </p><p>To counter this, enterprise teams must deploy automated AI code reviewers directly into their Continuous Integration/Continuous Deployment (CI/CD) pipelines. </p><p>Anthropic implemented an automated Claude reviewer (a publicly accessible version, <a href="https://venturebeat.com/technology/anthropic-rolls-out-code-review-for-claude-code-as-it-sues-over-pentagon">Claude Code Review</a> rolled out for commercial usage in March) tasked with analyzing every pull request for architectural defects, security flaws, and regression bugs before merging. Other dedicated firms like <a href="https://venturebeat.com/programming-development/qodo-teams-up-with-google-cloud-to-provide-devs-with-free-ai-code-review-tools-directly-within-platform">Qodo</a> offer tools tailor-made for this purpose, as well. </p><p>In Anthropic&#x27;s case, retrospective analyses indicated that the automated layer caught approximately one-third of the production bugs responsible for historical outages on the flagship claude.ai website.</p><h3>3. Target High-Volume Operational Debt</h3><p>Enterprises are frequently paralyzed by legacy code maintenance and long-deferred technical debt. Rather than deploying agents to write speculative new features, technical leaders should direct autonomous agents toward closed-loop, painstaking cleanup operations.</p><p>In April 2026, an Anthropic engineer deployed Claude to resolve a persistent class of API errors. Operating autonomously, the model shipped more than 800 individual fixes, successfully reducing the error rate by a factor of 1,000. </p><p>The supervising engineer estimated that a human developer would have spent four full years executing the same work, due to the cognitive load of holding massive, unfamiliar code context in their head simultaneously. </p><h2><b>Considerations for enterprises moving forward in an age of primarily AI-generated code</b></h2><p>Operating a codebase predominantly authored by AI introduces unique governance challenges that enterprise legal and security teams must navigate.</p><p>Unlike open-source licensing models (such as the permissive MIT license or copyleft GPL frameworks), enterprise codebases utilizing proprietary LLM infrastructure remain subject to the commercial terms of service of the respective AI vendor. </p><p>The deployment of autonomous agents requires rigorous verification protocols to ensure compliance, security, and intellectual property protection:</p><ul><li><p><b>Code Quality and Maintenance:</b> Anthropic’s internal data indicates that while AI-authored code was objectively lower in quality than human output in late 2025, it reached rough parity by mid-2026, with expectations to surpass human standards within the year. Enterprise governance must adapt to a reality where the baseline quality of automated output is structurally superior to average manual coding. </p></li><li><p><b>Security Auditing at Scale:</b> The sheer volume of automated code creation demands automated vulnerability discovery. Anthropic’s Project Glasswing illustrates the scale of this issue: utilizing Mythos Preview, the project identified more than 10,000 high- and critical-severity software vulnerabilities across global digital infrastructure within its first few weeks. This shifted the enterprise cybersecurity challenge entirely from vulnerability <i>discovery</i> to patch <i>deployment</i> velocity. </p></li><li><p><b>The Risk of Alignment Cascades:</b> Technical leaders must maintain strict verification gates. If an enterprise uses an AI system to continuously modify, maintain, and expand its proprietary software infrastructure, undetected errors or subtle misalignments can compound over successive agent sessions, gradually corrupting system integrity or introducing security exploits that escape human notice. </p></li></ul><h2><b>Brace for internal enterprise culture disruption</b></h2><p>The transition to an AI-dominated codebase is altering the cultural dynamics of engineering teams, introducing both unprecedented efficiency and deep psychological friction.</p><p>Publicly, Anthropic framed these metrics as a harbinger of a broader transformation. In an <a href="https://x.com/AnthropicAI/status/2062568862479208923">official statement on X</a>, the company observed:</p><blockquote><p>&quot;Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention.&quot; </p></blockquote><p>They expanded on the immediate productivity implications shortly thereafter:</p><blockquote><p>&quot;Today, Anthropic engineers on average ship 8x as much code per quarter as they did compared to 2021-2025... Many engineers also say Claude’s code quality is now on par with human code; we expect it to be better within the year.&quot; </p></blockquote><p>Behind these corporate metrics lies a complex human reality. Internal employee communications reveal a distinct erosion of traditional workplace collaboration, as peer-to-peer developer interaction is systematically replaced by asynchronous agent calls:</p><blockquote><p>&quot;Work (and life) ran on a gift economy of small favors between humans. ‘Can you help me get this script running?’ [...] each one created a little debt, a little mutual awareness. Claude has eaten the favors. It’s faster, it creates zero debt, but each of these is a lost bid for human collaboration.&quot; </p></blockquote><p>For individual contributors, the total automation of their primary skill set introduces acute professional anxiety regarding relevance and systemic control:</p><blockquote><p>&quot;I started leaning hard into Claudifying about a year ago. That’s been a crazy adventure and it’s now been ~5 months since I last wrote any code myself.&quot; </p></blockquote><blockquote><p>&quot;On days where everything works well, I can’t help but think nothing I do matters, everything is automated and better and faster than I ever will be. But then there are days where everything breaks and I don&#x27;t understand why and I realize I have no idea what I’ve been up to anymore.&quot; </p></blockquote><p>Enterprise leaders aiming to match Anthropic’s technical velocity cannot afford to ignore these psychological dynamics. </p><p>Achieving an 80 percent automated codebase requires more than purchasing API tokens or configuring agent loops; it demands a total cultural overhaul, a strategy for mitigating developer obsolescence anxiety, and the implementation of rigorous, automated verification guardrails to maintain ultimate human control over the software stack. </p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/6TDfdDR3BaglHMnVTBvvmB/ebc812673e673345d4466f174868cc17/ChatGPT_Image_Jun_4__2026__04_47_29_PM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Google's new open source Gemma 4 12B analyzes audio, video — and runs entirely locally on a typical 16GB enterprise laptop]]></title>
            <link>https://venturebeat.com/technology/googles-new-open-source-gemma-4-12b-analyzes-audio-video-and-runs-entirely-locally-on-a-typical-16gb-enterprise-laptop</link>
            <guid isPermaLink="false">5VNs54fBrd8WQpYGQFfw9a</guid>
            <pubDate>Wed, 03 Jun 2026 18:49:00 GMT</pubDate>
            <description><![CDATA[<p>While many AI open source model providers are pursuing larger and more powerful models, Google is still giving attention to the smaller, more local side of the market. Today, the <a href="https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12B/">tech giant released Gemma 4 12B</a>, an 11.95-billion-parameter open-weights model with permissive Apache 2.0 license optimized to execute locally on a standard enterprise laptop using just 16GB of VRAM or unified memory.</p><p>That means those enterprise users looking to keep working with AI while on a flight without WiFi, or trying to keep it offline for security reasons, can now do so far more easily and at far less cost (free to download and operate). </p><p>Gemma 4 12B&#x27;s most notable breakthrough is an encoder-free &quot;Unified&quot; architecture, which allows raw audio waveforms and visual patches to flow directly into the core LLM backbone without the latency or memory overhead of secondary processing modules. </p><p>Available immediately for download on <a href="https://huggingface.co/google/gemma-4-12B-it">Hugging Face</a> and <a href="https://www.kaggle.com/models/google/gemma-4">Kaggle</a> and for use on <a href="https://developers.google.com/edge/gallery">Google AI Edge Gallery</a>, Gemma 4 12B packs a 256K token context window, native agentic tool-use capabilities, and an explicit step-by-step reasoning mode into a highly optimized footprint that bridges the gap between mobile edge models and heavy data-center infrastructure.</p><h2><b>The Architectural Shift: Understanding the Encoder-Free Advantage</b></h2><p>Gemma 4 12B is highly relevant to enterprise architecture due to its novel &quot;Unified&quot; structure. </p><p>Traditional multimodal systems typically utilize discrete, separate encoders to translate audio waveforms and visual data into representations that the core language model can process. </p><p>This conventional approach inherently increases both inference latency and total memory consumption.</p><p>Gemma 4 12B radically alters this pipeline by functioning entirely without these secondary encoders. Instead, visual patches and raw audio waveforms are projected directly into the core large language model&#x27;s embedding space through lightweight linear layers. </p><p>The vision encoder is replaced by a 35-million-parameter module utilizing a single matrix multiplication, while the audio encoder is eliminated entirely. </p><p>For enterprise engineering teams, this unified architecture delivers distinct operational advantages: lower latency for multimodal tasks, reduced VRAM requirements (down to 16GB — typical for laptops), and the ability to fine-tune the entire multimodal system in a single, cohesive pass.</p><h2><b>Performance Metrics and Core Capabilities</b></h2><p>Despite its compact size, Gemma 4 12B achieves benchmarks nearing Google&#x27;s larger 26B Mixture-of-Experts model.</p><p>Beyond static benchmarks, the model supports a massive 256K token context window. This is critical for enterprises needing to process lengthy financial reports, extensive code repositories, or hour-long meeting transcripts. </p><p>Furthermore, Gemma 4 12B includes a native &quot;thinking&quot; mode to map out step-by-step reasoning before generating a response. It also features out-of-the-box support for native function calling and system prompts, which are essential prerequisites for building highly capable autonomous software agents.</p><h2><b>The Enterprise Verdict: Should You Adopt Gemma 4 12B?</b></h2><p>The short answer is yes, provided your operational needs align with edge computing, strict data privacy, or agentic automation. However, adoption should not be a blanket replacement for all existing AI infrastructure. Instead, technical leaders should view Gemma 4 12B as a specialized tool optimized for specific deployment conditions.</p><ul><li><p><b>Strict Data Privacy and Compliance Mandates</b>: Many enterprises operate in highly regulated sectors—such as healthcare, finance, or defense—where transmitting sensitive data, proprietary code, or confidential internal documents to third-party APIs is unacceptable. Because Gemma 4 12B is small enough to run locally on machines equipped with just 16GB of VRAM or unified memory, organizations can process sensitive multimodal data entirely on-premises or directly on employee laptops. This local execution eliminates the risk of data leakage and ensures compliance with strict regulatory frameworks.</p></li><li><p><b>Multimodal Autonomous Agent Workflows</b>: If your engineering roadmap involves autonomous agents interacting with real-world inputs, Gemma 4 12B is uniquely positioned to serve as the reasoning engine. The combination of native function calling, robust coding capabilities, and the capacity to ingest real-time audio and variable-resolution images makes it highly suitable for agentic tasks. Google has simultaneously released a dedicated Gemma Skills Repository to explicitly support agentic development with these new models.</p></li><li><p><b>Cost-Sensitive Edge Deployments</b>: For applications operating at the edge—such as retail inventory monitoring via cameras, localized customer service kiosks, or offline field-service applications—maintaining a persistent cloud connection is costly and sometimes impossible. The encoder-free architecture significantly lowers the total cost of ownership by reducing the hardware threshold needed for inference. Deploying a highly capable 12B model locally avoids recurring API costs and unpredictable cloud compute billing.</p></li></ul><h2><b>When to Consider Alternative Solutions</b></h2><p>While Gemma 4 12B is powerful, it has specific constraints that technical leaders must acknowledge.</p><ul><li><p><b>Massive Knowledge Retrieval</b>: Like all large language models, Gemma 4 12B is a reasoning engine, not a static database. If your primary use case relies on vast, generalized factual retrieval without leveraging a robust Retrieval-Augmented Generation pipeline, you may still require larger foundation models.</p></li><li><p><b>Extended Video and Audio Processing</b>: The model has hard limits on media ingestion. Audio inputs are strictly capped at 30 seconds of processing, and video understanding is limited to 60 seconds (assuming a processing rate of one frame per second). Enterprises looking to process feature-length videos or massive audio archives natively will hit bottlenecks and should consider API-based models or chunking architectures.</p></li></ul><h2><b>Implementation and Ecosystem Readiness</b></h2><p>One of the strongest arguments for enterprise adoption is the model&#x27;s immediate compatibility with the broader open-source development ecosystem. </p><p>Google has ensured that Gemma 4 12B is not an isolated experiment; it is ready for production. Weights are available on Hugging Face and Kaggle, and the <a href="https://x.com/googleaidevs/status/2062204434608771080">model integrates seamlessly</a> with industry-standard deployment frameworks such as vLLM, SGLang, MLX, and llama.cpp. </p><p>For organizations deeply embedded in Google Cloud, endpoints can be spun up quickly using the Gemini Enterprise Agent Platform Model Garden, Cloud Run, or Google Kubernetes Engine.</p><p>For enterprise leaders aiming to decentralize their AI workloads, Gemma 4 12B offers a rare combination of edge-friendly efficiency and frontier-class reasoning. If your organization requires highly private, multimodal processing without the latency and cost of cloud reliance, Gemma 4 12B should be heavily evaluated for your next production pipeline.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/6cqo2dZzZAqwTjt37B3cjc/fbb8eb55e17c2ce25514d21d3c5aca91/ChatGPT_Image_Jun_3__2026__02_38_37_PM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Enterprise AI agents keep creating data silos. Microsoft's Build answer is Microsoft IQ and Rayfin.]]></title>
            <link>https://venturebeat.com/data/enterprise-ai-agents-keep-creating-data-silos-microsofts-build-answer-is-microsoft-iq-and-rayfin</link>
            <guid isPermaLink="false">2XgnFfRlN9Nwxh5k9XSuPh</guid>
            <pubDate>Wed, 03 Jun 2026 01:55:14 GMT</pubDate>
            <description><![CDATA[<p>Every new AI agent your team deploys starts from scratch: no memory of how the business works, where data lives, or what rules apply. And as agentic coding tools spin up applications faster than anyone can govern them, each one risks becoming another silo outside your data layer entirely. Microsoft is addressing both problems directly at Build 2026.</p><p>According to <a href="https://venturebeat.com/data/the-retrieval-rebuild-why-hybrid-retrieval-intent-tripled-as-enterprise-rag-programs-hit-the-scale-wall">VentureBeat&#x27;s VB Pulse&#x27;s Q1 2026 RAG Infrastructure Market Tracker</a>, hybrid retrieval intent among 100-plus employee organizations tripled from 10.3% in January to 33.3% in March, a signal that enterprises have moved past expanding RAG coverage and are now focused on the architecture underneath it. Shared business context is the part retrieval does not solve.</p><p>On the context side, Microsoft is expanding Fabric IQ, its existing business data context layer, into a broader unified system called Microsoft IQ, adding three additional context sources covering how the organization works, what it knows and real-time global signals from the web, so any agent can tap all four as a single foundation. On the application side, Rayfin, a new open-source SDK and CLI, deploys agent-built applications directly to Fabric as a governed production backend, routing application data into the same platform rather than spinning up new silos.</p><p>Amir Netz, CTO of Microsoft Fabric, reached for a film analogy to explain where the data platform fits. The green screen of cascading code in &quot;The Matrix&quot; wasn&#x27;t atmosphere, it was the layer that built the world Agent Smith operated in.</p><p>&quot;Our job in the world of data is creating reality for agents based on data,&quot; Netz told VentureBeat.</p><h2>Microsoft IQ unifies four context sources into a single agent foundation</h2><p>Microsoft IQ brings together four context sources that until now existed separately, designed so a developer can connect a new agent to all four in a single integration step.</p><p><b>Work IQ.</b> Captures how the organization operates day to day, drawing on email, documents, meetings and schedules to give agents an understanding of people, teams and workflows.</p><p><b>Foundry IQ.</b> Manages institutional knowledge, curating and indexing knowledge bases so agents understand what it means to work within the organization, what rules apply and what procedures to follow.</p><p><b>Fabric IQ.</b> Models the live operational state of the business through data, defining entities, relationships and business rules grounded in real-time signals from Fabric Real-Time Intelligence. Ontologies, the layer that captures that operational context, are expected to reach GA in the coming months.</p><p><b>Web IQ.</b> Adds real-time global context from the web, giving agents a current picture of the world outside the organization alongside its internal data.</p><p>&quot;The agents are going to become highly informed virtual employees,&quot; Netz said. &quot;That&#x27;s where the world is heading.&quot;</p><div></div><h2>Rayfin routes agent-built applications into the same data foundation</h2><p>Building shared context solves one half of the problem. The other is what happens when agents start generating applications. Every new app needs a backend, and without a governed deployment path each one creates a new data silo outside the context layer entirely.</p><p>Rayfin provides an enterprise-grade back end and deploys agent-built applications directly to Fabric, so application data lands in Microsoft OneLake by default and feeds back into the Microsoft IQ context layer rather than accumulating outside it.</p><p>Microsoft positions Rayfin against Supabase and Neon, the Postgres-compatible backends that agentic coding tools default to. The differentiator is governance: Rayfin routes the entire application fleet through Fabric&#x27;s unified data and compliance layer rather than creating isolated silos.</p><p>Netz described the relationship as bidirectional. The agent building a Rayfin application draws from the organization&#x27;s ontology. The data that application generates then enriches that ontology for the next agent.</p><h2>Every major data platform is chasing the same answer, but execution is unproven</h2><p>Microsoft is not the only platform building a shared context layer for agents.<a href="https://venturebeat.com/data/ai-agents-keep-giving-confident-wrong-answers-the-context-layer-is-enterprise-ais-next-production-problem"> Snowflake announced</a> its own context capabilities this week with semantic capabilities.<a href="https://venturebeat.com/data/the-rag-era-is-ending-for-agentic-ai-a-new-compilation-stage-knowledge-layer-is-what-comes-next?_gl=1*vqdbsi*_up*MQ..*_ga*ODYxNzkxNzIzLjE3ODA0MTk1NjQ.*_ga_B8TDS1LEXQ*czE3ODA0MTk1NjIkbzEkZzEkdDE3ODA0MTk1NjIkajYwJGwwJGgw*_ga_SCH1J7LNKY*czE3ODA0MTk1NjIkbzEkZzAkdDE3ODA0MTk1NjIkajYwJGwwJGgw"> Pinecone</a> has its Nexus platform that expands the vector database to become a knowledge engine and Redis has developed its<a href="https://venturebeat.com/data/context-architecture-is-replacing-rag-as-agentic-ai-pushes-enterprise-retrieval-to-its-limits?_gl=1*vqdbsi*_up*MQ..*_ga*ODYxNzkxNzIzLjE3ODA0MTk1NjQ.*_ga_B8TDS1LEXQ*czE3ODA0MTk1NjIkbzEkZzEkdDE3ODA0MTk1NjIkajYwJGwwJGgw*_ga_SCH1J7LNKY*czE3ODA0MTk1NjIkbzEkZzAkdDE3ODA0MTk1NjIkajYwJGwwJGgw"> Iris context</a> and memory platform.</p><p>Microsoft&#x27;s approach further reinforces the trend that RAG and model availability aren&#x27;t the issue anymore.</p><p>&quot;Fabric IQ and Rayfin are important because the enterprise AI challenge is no longer just about the model availability,&quot; Robert Kramer, managing partner at KramerERP told VentureBeat. &quot;The real question is whether Microsoft simplifies execution and strengthens trust or adds another layer to an already complex environment.&quot;</p>]]></description>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/2pEzT3ZdMxDl52oJEjUEJK/97f59713b559f5e6ca0d7b7185014e79/matrix-msft-smk1.jpg?w=300&amp;q=30" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Alibaba's Qwen3.7-Plus supports text, video and imagery inputs at low cost of $0.4/$1.6 per 1M token — but it's proprietary]]></title>
            <link>https://venturebeat.com/technology/alibabas-qwen3-7-plus-supports-text-video-and-imagery-inputs-at-low-cost-of-0-4-1-6-per-1m-token-but-its-proprietary</link>
            <guid isPermaLink="false">2E7Qq6OZcrgIYq07uFmFvK</guid>
            <pubDate>Tue, 02 Jun 2026 22:40:00 GMT</pubDate>
            <description><![CDATA[<p>Alibaba <a href="https://x.com/Alibaba_Qwen/status/2061506641120641494?s=20">this week released Qwen3.7-Plus</a>, the latest AI large language model (LLM) in its globally beloved and increasingly expansive Qwen family, boasting more multimodal capabilities and a 60% lower cost than the <a href="https://venturebeat.com/technology/alibabas-proprietary-qwen3-7-max-can-run-for-35-hours-autonomously-and-supports-external-harnesses-like-anthropics-claude-code">prior, text-only Qwen3.7-Max model released just weeks ago. </a></p><p>However, like its immediate predecessor Qwen3.7-Plus is available only under a &quot;closed&quot; commercial license via <a href="https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&amp;url=2840914_2&amp;modelId=qwen3.7-plus&amp;serviceSite=international">proprietary application programming interfaces (API)</a> and Qwen Chat. </p><p>That marks a big departure from the Qwen strategy to date, which was focused mainly on releasing powerful,near state-of-the-art open source models. Those enterprises and users who relied on the open source Qwen models — among them, <a href="https://finance.yahoo.com/news/airbnb-picks-alibabas-qwen-over-093000045.html">U.S. giants such as Airbnb</a> — will no doubt be disappointed to see that Alibaba is going closed for its newer releases.</p><p>Still, the model is worth a look because of its low cost and high performance on multimodal tasks like creating enterprise-grade visuals or analyzing video, imagery and screenshots, which Qwen3.7-Max cannot do (it&#x27;s text-only). It is among the cheaper powerful AI models available now, coming in price-wise just above Chinese rival&#x27;s new <a href="https://venturebeat.com/technology/minimax-m3-debuts-eclipsing-gpt-5-5-and-gemini-3-1-pro-on-key-benchmark-performance-for-just-5-10-of-the-cost">MiniMax-M3&#x27;s limited-time discount pricing. </a></p><h2><b>VentureBeat Frontier AI Model API Pricing Snapshot</b></h2><table><tbody><tr><td><p><b>Model</b></p></td><td><p><b>Input</b></p></td><td><p><b>Output</b></p></td><td><p><b>Total Cost</b></p></td><td><p><b>Source</b></p></td></tr><tr><td><p>MiMo-V2.5 Flash</p></td><td><p>$0.10</p></td><td><p>$0.30</p></td><td><p>$0.40</p></td><td><p><a href="https://platform.xiaomimimo.com/docs/en-US/pricing">Xiaomi MiMo</a></p></td></tr><tr><td><p>deepseek-v4-flash</p></td><td><p>$0.14</p></td><td><p>$0.28</p></td><td><p>$0.42</p></td><td><p><a href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek</a></p></td></tr><tr><td><p>deepseek-v4-pro</p></td><td><p>$0.435</p></td><td><p>$0.87</p></td><td><p>$1.305</p></td><td><p><a href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek</a></p></td></tr><tr><td><p>MiniMax-M3</p></td><td><p>$0.30</p></td><td><p>$1.20</p></td><td><p>$1.50</p></td><td><p><a href="https://platform.minimax.io/subscribe/token-plan?tab=api-enterprise">MiniMax</a></p></td></tr><tr><td><p><b>Qwen3.7-Plus</b></p></td><td><p><b>$0.40</b></p></td><td><p><b>$1.60</b></p></td><td><p><b>$2.00</b></p></td><td><p><b></b><a href="https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&amp;url=2840914_2&amp;modelId=qwen3.7-plus&amp;serviceSite=international"><b>Alibaba Cloud</b></a></p></td></tr><tr><td><p>Gemini 3.1 Flash-Lite</p></td><td><p>$0.25</p></td><td><p>$1.50</p></td><td><p>$1.75</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>MiMo-V2.5</p></td><td><p>$0.40</p></td><td><p>$2.00</p></td><td><p>$2.40</p></td><td><p><a href="https://platform.xiaomimimo.com/docs/en-US/pricing">Xiaomi MiMo</a></p></td></tr><tr><td><p>Grok 4.3 low context</p></td><td><p>$1.25</p></td><td><p>$2.50</p></td><td><p>$3.75</p></td><td><p><a href="https://docs.x.ai/developers/models/grok-4.3">xAI</a></p></td></tr><tr><td><p>GLM-5</p></td><td><p>$1.00</p></td><td><p>$3.20</p></td><td><p>$4.20</p></td><td><p><a href="https://docs.z.ai/guides/overview/pricing">Z.ai</a></p></td></tr><tr><td><p>Kimi-K2.6</p></td><td><p>$0.95</p></td><td><p>$4.00</p></td><td><p>$4.95</p></td><td><p><a href="https://platform.kimi.ai/docs/pricing/chat-k26">Moonshot/Kimi</a></p></td></tr><tr><td><p>GLM-5.1</p></td><td><p>$1.40</p></td><td><p>$4.40</p></td><td><p>$5.80</p></td><td><p><a href="https://docs.z.ai/guides/overview/pricing">Z.ai</a></p></td></tr><tr><td><p>Grok 4.3 high context</p></td><td><p>$2.50</p></td><td><p>$5.00</p></td><td><p>$7.50</p></td><td><p><a href="https://docs.x.ai/developers/models/grok-4.3">xAI</a></p></td></tr><tr><td><p>Qwen3.7-Max</p></td><td><p>$2.50</p></td><td><p>$7.50</p></td><td><p>$10.00</p></td><td><p><a href="https://modelstudio.console.alibabacloud.com/ap-southeast-1?spm=a2ty_o05.31384571.0.0.52649f6b7G0D55&amp;tab=doc#/doc/?type=model&amp;url=2840914_2&amp;modelId=qwen3.7-max&amp;serviceSite=international">Alibaba Cloud</a></p></td></tr><tr><td><p>Gemini 3.5 Flash</p></td><td><p>$1.50</p></td><td><p>$9.00</p></td><td><p>$10.50</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>Gemini 3.1 Pro Preview ≤200K</p></td><td><p>$2.00</p></td><td><p>$12.00</p></td><td><p>$14.00</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>GPT-5.4</p></td><td><p>$2.50</p></td><td><p>$15.00</p></td><td><p>$17.50</p></td><td><p><a href="https://openai.com/api/pricing/">OpenAI</a></p></td></tr><tr><td><p>Gemini 3.1 Pro Preview &gt;200K</p></td><td><p>$4.00</p></td><td><p>$18.00</p></td><td><p>$22.00</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>Claude Opus 4.8</p></td><td><p>$5.00</p></td><td><p>$25.00</p></td><td><p>$30.00</p></td><td><p><a href="https://platform.claude.com/docs/en/about-claude/pricing">Anthropic</a></p></td></tr><tr><td><p>GPT-5.5</p></td><td><p>$5.00</p></td><td><p>$30.00</p></td><td><p>$35.00</p></td><td><p><a href="https://openai.com/api/pricing/">OpenAI</a></p></td></tr></tbody></table><h2><b>Maintaining continuity during complex tool execution loops </b></h2><p>For technical decision-makers deploying autonomous agents, the primary bottleneck has rarely been initial model intelligence. Instead, it is <b>state decay</b>—the tendency of an agent framework to lose its analytical trajectory over multi-step, long-horizon tasks. </p><p>Qwen3.7-Plus addresses this architectural vulnerability through a combined approach to context management and reasoning state preservation. </p><p>The model ships with a <b>1-million token context window</b> and allocates up to 256K tokens specifically for internal chain-of-thought processing. To contextualize this capacity, imagine an automated cloud migration agent: it can ingest an entire codebase, map out the dependencies, and spend thousands of tokens quietly evaluating edge cases before executing a single line of bash script.</p><p>Crucially, the API exposes a parameter called &#x27;<code>preserve_thinking</code>.&#x27; Across Alibaba&#x27;s ecosystem, the capability serves as a standardized architectural bridge rather than a tiered perk. Alibaba introduced the feature during the prior Qwen 3.6 generation, integrating it into both the open-weight<a href="https://huggingface.co/Qwen/Qwen3.6-27B"> Qwen3.6-27B</a> and the proprietary Max models. </p><p>At its core, the parameter operates at the API and template level to retain internal <code>&lt;think&gt;</code> blocks across continuous conversational turns.</p><p>This structural continuity solves a critical bottleneck for developers engineering long-horizon tasks. By keeping these internal logic loops intact, the feature prevents the model from dropping its context or needlessly recomputing its cached history midway through an operation. </p><p>When a model executes complex, multi-step agentic coding assignments, this retention allows the system to hold onto its original train of thought without losing the plot or forgetting the underlying logic of its previous actions.</p><p>Alibaba remains far from alone in recognizing this technical necessity, as the underlying concept now dictates the architecture of nearly all major artificial intelligence laboratories. </p><p>Anthropic deploys this exact capability under the moniker &quot;Extended Thinking&quot; for its advanced models, including its <a href="https://venturebeat.com/technology/anthropics-claude-opus-4-8-is-here-with-3x-cheaper-fast-mode-and-near-mythos-level-alignment">latest Claude Opus 4.8. </a>This framework requires developers to feed unmodified thinking blocks directly back into the API on subsequent turns to maintain an unbroken chain of reasoning. </p><p>OpenAI tackles the same challenge through an encrypted reasoning pass-back mechanism for models like GPT-5.5. Within the OpenAI ecosystem, developers must return specific reasoning items generated alongside previous function calls, ensuring the model explicitly remembers the rationale behind its tool executions. </p><p>Ultimately, <code>preserve_thinking</code> simply represents Alibaba&#x27;s terminology for what has rapidly become the undisputed table stakes for modern multi-turn reasoning.</p><h2><b>Benchmarks show a competitive, yet sub state-of-the-art model</b></h2><p>On raw capability metrics, this deep-thinking architecture translates to structural gains across multimodal and agentic benchmarks. However, it still falls below many of the leading and prior generations of U.S. proprietary models such as Anthropic&#x27;s Claude Opus 4.6 and OpenAI&#x27;s GPT-5.4.</p><p>On <b>Terminal Bench 2.0-Terminus</b>, which measures an model&#x27;s capability to run actual terminal-level code safely and iteratively, Qwen3.7-Plus scored <b>70.3</b>, outperforming DeepSeek-V4-Pro Max (67.9) and Gemini-3.1 Pro (63.5). </p><p>On computer vision benchmarks that demand localized interface understanding, such as <b>ScreenSpot Pro</b>, the model hit <b>79.0</b>, significantly outpacing legacy industry standouts like GPT-5.4 (xhigh) at 67.4 and Claude-Opus-4.6 at 49.5. Agent Evaluation Metrics (Selected Benchmarks)</p><h2><b>What should enterprises consider Qwen3.7-Plus for?</b></h2><p>For an enterprise architect, the key question when analyzing Qwen3.7-Plus is clear: <i>What does this replace in our current tech stack?</i></p><p>The model is designed to step in as a direct replacement for premier frontier models (such as GPT-5-tier or Claude-Max-tier models) within high-frequency developer workflows, robotic process automation (RPA), and data engineering pipelines. </p><p>Rather than deploying an expensive, general-purpose flagship model to handle repetitive system operations, technical teams can route these tasks to Qwen3.7-Plus. It handles visual interface interpretation, command execution, and code generation simultaneously. </p><p>Alibaba has structured its API delivery to align with existing open-source and proprietary enterprise frameworks. The endpoints are fully OpenAI-compatible, meaning swapping out existing dependencies requires minimal infrastructure adjustment. For groups leveraging autonomous terminal frameworks, the integration is natively supported across multiple environments.</p><p>Engineers can run Qwen3.7-Plus directly through their local terminal setups by altering base environment targets.</p><p>From a pure cost perspective, running an agent framework that constantly references massive code repositories or visual layout histories can quickly become cost-prohibitive. </p><p>Alibaba addresses this by exposing granular caching price points. </p><p>Standard input processing sits at $0.40 per million tokens, but if the agent is reading from an explicitly created cache (e.g., a massive base repository or standard enterprise UI kit that remains static over hundreds of automated loops), the cost drops sharply to $0.04 per 1M tokens for subsequent reads. </p><p>This tier makes high-frequency, multi-turn agent iterations economically practical at an enterprise scale. </p><h2><b>No open source license or open weights raises the compliance question for enterprises</b></h2><p>When evaluating any model in the Qwen ecosystem, a primary concern for legal and security teams is the licensing framework and operational boundary of the data pipeline. </p><p>While previous iterations of the Qwen family gained significant enterprise traction via fully open-source weight availability under the Apache 2.0 or customized open-use licenses, Qwen3.7-Plus is delivered strictly as a managed, commercial cloud API via Alibaba Cloud Model Studio. For enterprise risk management, this distinction carries specific implications:</p><ul><li><p><b>No Local Weight Deployment</b>: Organizations cannot download, sandbox, or locally host the weights of Qwen3.7-Plus within their completely air-gapped internal data centers. All data verification, visual processing, and execution calls must step through Alibaba Cloud&#x27;s international endpoints (e.g., the Singapore instance highlighted in developer documentation). </p></li><li><p><b>Compliance and Sovereignty</b>: Since the model requires cloud-based inference, companies operating under strict sovereign data boundaries (such as healthcare entities subject to local HIPAA/GDPR constraints or defense contractors) must explicitly evaluate whether external API routing complies with their specific data-residency obligations. </p></li><li><p><b>Managed Risk Mitigation</b>: Conversely, a managed API structure removes the internal infrastructure burden of provisioning, optimizing, and maintaining multi-GPU clusters (such as dedicated Nvidia H100 arrays) simply to host an internal agent network. </p></li></ul><h2><b>Still, Qwen3.7-Plus offers high intelligence across modalities at low cost</b></h2><p>The initial reception from developer communities and technical venture capital highlights the shifting economics of agent deployment. </p><p>Prominent industry voice and Web3 venture capitalist <a href="https://x.com/boxmining/status/2061687704518307918">@Boxmining </a>highlighted the strategic cost advantage, stating:</p><blockquote><p>&quot;Qwen 3.7 Plus being 40% cheaper than Max changes the conversation. If the output is close enough for most coding and much stronger for visual workflows, do you really need Max every day or only for the heavy terminal-only jobs?&quot; </p></blockquote><p>This perspective aligns with the current trend of optimizing enterprise operational budgets: shifting away from raw, unconstrained compute toward targeted task automation.At the same time, specialized researchers deep within the ecosystem point out that this isn&#x27;t merely an incremental optimization of text generation. </p><p><a href="https://x.com/DunjieLu1219/status/2061667080949342677">Dunjie Lu,</a> a research intern at Alibaba Qwen, remarked:</p><blockquote><p>&quot;It shows clear gains over Qwen3.6-Plus in computer-use capabilities, with stronger generalization beyond general desktop tasks into professional workflows such as data engineering and scientific research.&quot; </p></blockquote><p>Ultimately, for enterprise buyers deciding on their next infrastructure roadmap, Qwen3.7-Plus presents a practical alternative. If your organization&#x27;s primary objective is building resilient, visual-capable autonomous software loops that interact directly with developer environments and cloud consoles—without blowing out your inference budget—the model provides a compelling reason to shift execution away from more expensive frontier alternatives. </p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/3PD7bcZwfl7LUqX2wWEFux/05c4df4be513499b49ea1d9e79009cc7/Gemini_Generated_Image_tmex3utmex3utmex.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Perplexity AI unveils hybrid local-cloud inference system at Computex 2026]]></title>
            <link>https://venturebeat.com/technology/perplexity-ai-unveils-hybrid-local-cloud-inference-system-at-computex-2026</link>
            <guid isPermaLink="false">8xwmJEdTiQkxvQL0EMwu0</guid>
            <pubDate>Tue, 02 Jun 2026 19:08:17 GMT</pubDate>
            <description><![CDATA[<p><a href="http://perplexity.ai">Perplexity AI</a>, the fast-growing search startup now <a href="https://techcrunch.com/2025/09/10/perplexity-reportedly-raised-200m-at-20b-valuation/">valued at $20 billion</a>, unveiled what it calls the <a href="https://www.perplexity.ai/hub/blog/the-data-center-moves-to-your-machine">first hybrid local-server inference orchestrator</a> at <a href="https://www.computextaipei.com.tw/en/index.html">Computex 2026</a> on Monday night, demonstrating software that autonomously decides — in real time and mid-task — which AI workloads stay on a user&#x27;s device and which get routed to frontier models in the cloud.</p><p>CEO Aravind Srinivas demonstrated the system onstage alongside Intel CEO Lip-Bu Tan during Intel&#x27;s keynote address, using Perplexity&#x27;s &quot;<a href="https://www.perplexity.ai/personal-computer">Personal Computer</a>&quot; agent to process confidential deal materials. In the demonstration, local models running on <a href="https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html">Intel Core Ultra Series 3 </a>determined which information should remain on the device and which information could be sent to cloud-based models. Srinivas said the approach balances intelligence, accuracy, privacy, and cost.</p><p>The key claim is not that a model can run locally — dozens of tools already do that. It is that Perplexity&#x27;s system makes the routing decision itself, task by task, without requiring the user to choose in advance. Sensitive data like financial records or health information stays on the local machine; the heavier reasoning tasks that require frontier-scale models get sent to the cloud. One task, multiple execution locations, automatic orchestration.</p><p>&quot;No product has done this before,&quot; a Perplexity spokesperson said in an email to VentureBeat. The product is not yet available to users; according to the company, the hybrid inference feature will launch in the coming weeks.</p><h2>Perplexity&#x27;s road from cloud-only agents to on-device AI orchestration</h2><p>To understand why the Computex demonstration matters, it helps to trace the product arc Perplexity has been building since early this year.</p><p>On February 25, Perplexity launched <a href="https://venturebeat.com/article-pv/perplexity-launches-computer-ai-agent-that-coordinates-19-models-priced-at">Computer</a>, a multi-model AI agent that orchestrates 19 different AI models to complete complex, long-running tasks on behalf of users. The system ran entirely in the cloud, breaking goals into subtasks and routing each to whichever model — Claude, Gemini, GPT, Grok, or others — was best suited for the job. Perplexity Computer unified every current AI capability into a single system, functioning as a general-purpose digital worker that operates the same interfaces a user does.</p><p>Then, in March, Perplexity introduced <a href="https://www.perplexity.ai/hub/blog/personal-computer-is-here">Personal Computer</a> at its inaugural Ask 2026 developer conference. That product launched as a new Mac app with support for a hybrid local-cloud AI agent, which Perplexity described as a &quot;personal orchestrator&quot; that hybridizes local and server environments for security and productivity. Personal Computer could access the Mac&#x27;s file system and native Mac apps to create and execute entire workflows, with files created in a secure sandbox and all actions auditable and reversible.</p><p>What Srinivas demonstrated at Computex extends this architecture in a fundamental way. Previously, even the Personal Computer product divided labor along relatively clear lines: local file access on the device, heavy computation on Perplexity&#x27;s servers.</p><p>The new hybrid inference orchestrator gives the system itself the ability to reason about where each piece of a task should execute — not just which model to use, but which physical location should process it. The system reportedly asks for user permission before sending sensitive tasks to the cloud, a design choice that addresses one of the central anxieties enterprises have about agentic AI: data governance.</p><h2>Why Nvidia’s RTX Spark and Intel&#x27;s new silicon make the timing strategic</h2><p>The timing of the demonstration is not coincidental. <a href="https://www.computextaipei.com.tw/en/index.html">Computex 2026</a> has been dominated by a single theme: on-device AI. Just hours before the Intel keynote, Nvidia CEO Jensen Huang unveiled the <a href="https://nvidianews.nvidia.com/news/nvidia-microsoft-windows-pcs-agents-rtx-spark">RTX Spark</a>, a new Arm-based superchip that the company positions as the foundation for a new generation of AI-native Windows PCs.</p><p>At full strength, the <a href="https://nvidianews.nvidia.com/news/nvidia-microsoft-windows-pcs-agents-rtx-spark">RTX Spark Superchip</a> offers up to 20 Arm CPU cores, a <a href="https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/">Blackwell GPU</a> with 6,144 CUDA cores, 128GB of LPDDR5X RAM, and up to 300 GB/s of memory bandwidth — enough power and memory for AI agents and 120-billion-parameter models with context lengths stretching to a million tokens. RTX Spark systems will begin arriving in the fall.</p><p>Intel, not to be outdone, used its keynote to showcase <a href="https://www.intel.com/content/www/us/en/products/details/processors/xeon/6-plus-series.html">Xeon 6+ processors</a> with 288 efficiency cores built on 18A technology for the data center, and positioned its Core Ultra Series 3 as the client silicon that makes hybrid inference possible on the PC.</p><p>Perplexity&#x27;s hybrid orchestrator sits at the intersection of both strategies. If the system performs as advertised, it creates a direct economic incentive for users — and eventually enterprises — to invest in more powerful local silicon. The more capable the on-device chip, the more inference can run locally, reducing cloud costs and improving latency for sensitive workloads. That dynamic benefits Nvidia, Intel, and every other chipmaker competing for AI PC sockets.</p><p>The implications extend well beyond chip economics. &quot;As chips become more powerful, more intelligence moves onto a person&#x27;s machine, alongside server inference for the complex tasks that still need frontier models,&quot; a Perplexity spokesperson told VentureBeat. &quot;Sensitive and sovereign work can stay local, which changes the need for massive country-level infrastructure.&quot; </p><p>That last claim — about sovereign infrastructure — is the most provocative. Nations from the UAE to France to India have been investing billions in domestic AI compute capacity partly on the assumption that sensitive data must stay within their borders, which means building or buying access to local data centers. If meaningful inference can run on an end user&#x27;s device with no data leaving the machine, the calculus changes. It does not eliminate the need for data centers, but it could soften the urgency of the buildout.</p><h2>The model-agnostic architecture that makes hybrid inference possible</h2><p>Perplexity&#x27;s <a href="https://www.perplexity.ai/hub/blog/the-data-center-moves-to-your-machine">hybrid inference</a> play rests on the same architectural bet the company has been making all year: that the orchestration layer matters more than any individual model. For AI engineers, this signals a fundamental shift — the orchestration layer may matter more than the models themselves.</p><p>The key insight is separation of concerns: the orchestration layer handles task decomposition, state management, and tool coordination, while the model layer handles specific computations. This decoupling means teams can swap models as better alternatives emerge without redesigning the entire system.</p><p>Perplexity has leaned heavily into this philosophy. The company is doubling down on packaging frontier models in a consumer-friendly user experience, arguing that there is value in orchestrating multiple third-party LLMs to obtain the most cost-effective and accurate answers to queries. Models, in Perplexity&#x27;s view, are specializing, not commoditizing.</p><p>The hybrid inference extension takes that logic one step further. Perplexity is now orchestrating not just across models but across physical compute locations — choosing which model runs where. A lightweight local model might handle a privacy-sensitive document summarization task while a frontier cloud model tackles the complex reasoning required to analyze that summary against a broader market landscape. The orchestrator manages the handoff.</p><p>This is a technically ambitious claim. Making it work reliably in production will require the orchestrator to accurately assess the complexity of each subtask, understand the sensitivity of the data involved, know the capabilities and latency characteristics of whatever local hardware the user has, and manage the state of a task that may be bouncing between environments mid-execution.</p><p>It is easy to imagine edge cases where the routing logic fails, sends something sensitive to the cloud, or degrades performance by assigning a task to an underpowered local model. Perplexity says the system will be chip-agnostic, though the initial Computex demo ran on Intel silicon. The company expressed enthusiasm in its communications about the new AI chips announced at Computex this week, suggesting it intends to optimize across vendors.</p><h2>A $20 billion valuation, nine lawsuits, and the pressure to deliver</h2><p>The hybrid inference announcement arrives at a complicated moment for Perplexity. The company has been on a remarkable growth trajectory: It secured <a href="https://techcrunch.com/2025/09/10/perplexity-reportedly-raised-200m-at-20b-valuation/">$200 million in new capital</a> at a $20 billion valuation, just two months after <a href="https://www.bloomberg.com/news/articles/2025-07-17/ai-startup-perplexity-valued-at-18-billion-with-new-funding">raising $100 million</a> at an $18 billion valuation. Since its founding three years ago, the rapidly growing AI company has raised <a href="https://pitchbook.com/profiles/company/517947-04">$1.5 billion in total funding</a>, according to PitchBook data.</p><p>But the company also faces a mounting stack of legal challenges. Nine organizations have filed active suits against Perplexity for alleged copyright and trademark infringement as of May 31, 2026: <a href="https://www.cnn.com/2026/05/28/media/cnn-sues-perplexity-ai-copyright">CNN</a>, <a href="https://www.nytimes.com/2025/12/05/technology/new-york-times-perplexity-ai-lawsuit.html">the New York Times</a>, <a href="https://www.reuters.com/legal/murdoch-firms-dow-jones-new-york-post-sue-perplexity-ai-2024-10-21/">News Corp and Dow Jones</a>, <a href="https://nypost.com/2024/10/21/business/ny-post-wall-street-journal-sue-jeff-bezos-backed-perplexity-ai/">the New York Post</a>, <a href="https://techcrunch.com/2025/12/04/chicago-tribune-sues-perplexity/">the Chicago Tribune</a>, <a href="https://corporate.britannica.com/britannica-files-copyright-and-trademark-infringement-lawsuit-against-perplexity">Encyclopedia Britannica</a>, <a href="https://www.reuters.com/legal/litigation/encyclopedia-britannica-sues-perplexity-over-ai-answer-engine-2025-09-11/">Merriam-Webster</a>, <a href="https://www.reuters.com/world/reddit-sues-perplexity-scraping-data-train-ai-system-2025-10-22/">Reddit</a>, and Japan&#x27;s <a href="https://www.niemanlab.org/2025/08/japans-largest-newspaper-yomiuri-shimbun-sues-perplexity-for-copyright-violations/">Yomiuri Shimbun</a>. The CNN lawsuit, filed just days ago on May 28, is the most recent, accusing Perplexity of scraping more than 17,000 CNN stories, photos, videos, and other content and using that material to train its products. Perplexity has responded with a consistent message. &quot;You can&#x27;t copyright facts,&quot; the company&#x27;s chief communications officer Jesse Dwyer said in a statement.</p><p>Other publishers have opted for partnership over litigation. <a href="https://www.perplexity.ai/hub/blog/introducing-the-perplexity-publishers-program">Time</a>, <a href="https://www.perplexity.ai/hub/blog/welcoming-gannett-to-the-perplexity-publisher-program">Gannett</a>, <a href="https://www.lemonde.fr/en/">Le Monde</a>, and <a href="https://www.perplexity.ai/hub/blog/introducing-the-perplexity-publishers-program">Der Spiegel </a>have signed licensing arrangements with Perplexity. The company launched a Publishers Program in mid-2024 in which participating outlets receive a share of revenue generated when their content is cited in Perplexity answers. </p><p>According to <a href="https://www.cnbc.com/2024/07/30/perplexity-ai-to-share-revenue-with-publishers-after-plagiarism-accusations.html">CNBC</a>, Perplexity&#x27;s chief business officer Dmitry Shevelenko confirmed at the time that the flat rate was a double-digit percentage but declined to share specifics. As <a href="https://techcrunch.com/2024/12/05/perplexity-expands-its-publisher-program/">TechCrunch reported </a>in December 2024, additional publishers including the LA Times, Adweek, The Independent, and Lee Enterprises subsequently joined the program, though not without internal controversy — reporters at some outlets told TechCrunch they were not informed of the deals before they were announced publicly. </p><p>The legal risk is not existential, but it is material, and with enterprises increasingly evaluating Perplexity&#x27;s tools for sensitive workflows — precisely the use case the hybrid inference system is designed to serve — unresolved intellectual property questions could dampen adoption.</p><h2>How hybrid inference sharpens Perplexity&#x27;s enterprise ambitions</h2><p>The hybrid inference demo should be read alongside Perplexity&#x27;s broader push into enterprise software, a transformation that accelerated dramatically this year. At the <a href="https://events.perplexity.ai/ask2026">Ask 2026 </a>developer conference in March, VentureBeat reported that Perplexity announced <a href="https://venturebeat.com/technology/perplexity-takes-its-computer-ai-agent-into-the-enterprise-taking-aim-at">Computer for Enterprise</a>, positioning the three-year-old startup as a direct competitor to Microsoft, Salesforce, and the legacy enterprise software stack.</p><p>Beyond Computer&#x27;s existing 100-plus integrations, enterprise customers gained access to business-grade connectors for <a href="https://www.snowflake.com/en/">Snowflake</a>, <a href="https://www.datadoghq.com/">Datadog</a>, <a href="https://www.salesforce.com/">Salesforce</a>, <a href="https://www.microsoft.com/en-us/microsoft-365/sharepoint/collaboration">SharePoint</a>, and <a href="https://www.hubspot.com/homepage-3">HubSpot</a>, with administrators able to install custom connectors via the Model Context Protocol. The package also includes purpose-built workflow templates for legal contract review, finance audit support, sales call preparation, and customer support ticket triage, alongside SOC 2 Type II certification and the option for zero data retention.</p><p>Hybrid inference deepens this enterprise pitch considerably. For regulated industries — financial services, healthcare, defense, legal — the ability to keep sensitive data on a local device while still accessing the reasoning power of frontier cloud models is not a nice-to-have. It is a potential compliance requirement.</p><p>An investment bank parsing confidential deal documents, for instance, might be unable to send those materials to a third-party cloud under existing data handling agreements. A system that can run the sensitive parsing locally while routing non-sensitive analytical tasks to the cloud offers a middle path. IDC forecasts a <a href="https://www.idc.com/resource-center/blog/agent-adoption-the-it-industrys-next-great-inflection-point/">tenfold increase in agent usage</a> and a <a href="https://www.idc.com/resource-center/blog/agent-adoption-the-it-industrys-next-great-inflection-point/">thousandfold growth in inference demands</a> by 2027, and security and governance rank as the top evaluation factor for enterprise agentic platforms, according to a <a href="https://crewai.com/ai-agent-survey">CrewAI survey</a>. Hybrid inference speaks directly to that priority.</p><h2>The race to decide where AI actually runs is just getting started</h2><p>Several questions will determine whether Perplexity&#x27;s Computex demonstration becomes a landmark product or a compelling prototype.</p><p>The actual performance characteristics remain untested outside a controlled stage environment — how the routing logic handles varied hardware configurations, unreliable network connections, and ambiguous data sensitivity classifications is an open question.</p><p>The competitive response matters too: Google, Microsoft, Apple, and OpenAI are all building their own local-cloud AI architectures. <a href="https://www.apple.com/apple-intelligence/">Apple Intelligence</a> already routes some tasks locally and some to Private Cloud Compute servers, Google&#x27;s <a href="https://developer.android.com/ai/gemini-nano">Gemini Nano</a> runs on-device, and Microsoft&#x27;s <a href="https://blogs.microsoft.com/blog/2024/05/20/introducing-copilot-pcs/">Copilot+ PCs</a> are designed around local inference capabilities. None of these systems, however, currently offer the kind of dynamic, autonomous task-level routing Perplexity demonstrated on stage.</p><p>Then there is the business itself. Perplexity&#x27;s annualized recurring revenue <a href="https://www.ft.com/content/e9c28d31-a962-4684-8b58-c9e6bc68401f?syn-25a6b1a6=1">surged past $450 million </a>in March 2026, up from <a href="https://techcrunch.com/2025/09/10/perplexity-reportedly-raised-200m-at-20b-valuation/">roughly $200 million</a> six months earlier — rapid growth, but at a valuation north of $20 billion, the company still trades at a premium that demands the technology translate into sustained enterprise adoption.</p><p>Perplexity has built its business on a bet that the future belongs not to any single model but to the system that orchestrates all of them. At <a href="https://www.computextaipei.com.tw/en/index.html">Computex</a>, it extended that bet from the software layer to the physical layer — from which model to which machine. In the AI industry&#x27;s relentless race to build bigger data centers and train larger models, Perplexity just argued that the most important computer in the stack might be the one already sitting on your desk.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Infrastructure</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/7KmFf9Vapi9RYzj3aLtQPl/fdaf9045387338f3e44380d3ad9b5fdd/Nuneybits_a_nostalgic_surreal_photograph_of_an_old_computer_set_e1939de5-f0c8-4834-b6b4-c73efd5ac7d1.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Microsoft debuts Surface RTX Spark Dev Box to run large AI models without cloud costs]]></title>
            <link>https://venturebeat.com/infrastructure/microsoft-debuts-surface-rtx-spark-dev-box-to-run-large-ai-models-without-cloud-costs</link>
            <guid isPermaLink="false">3H7Ecmh0r78kzT6B7yRuRO</guid>
            <pubDate>Tue, 02 Jun 2026 16:30:00 GMT</pubDate>
            <description><![CDATA[<p><a href="https://microsoft.com/">Microsoft</a> on Monday unveiled the <a href="https://aka.ms/Windows-Build2026">Surface RTX Spark Dev Box</a>, a compact desktop computer designed to let software developers run large AI models on their desks instead of paying for cloud computing — a move that directly challenges the per-token pricing model that has defined the AI industry&#x27;s economics since ChatGPT launched three and a half years ago.</p><p>The device, announced at <a href="https://news.microsoft.com/build-2026/">Microsoft Build 2026</a>, packs Nvidia’s new Blackwell-architecture <a href="https://nvidianews.nvidia.com/news/nvidia-microsoft-windows-pcs-agents-rtx-spark">RTX Spark processor </a>and 128 gigabytes of unified memory into a small-form-factor chassis, delivering what Nvidia rates at one petaflop of AI compute. In practical terms, that means a developer can load, run and interact with AI models exceeding 120 billion parameters without sending a single API call to the cloud.</p><p>&quot;These class of devices, we think, will get to about 100 billion parameter model running,&quot; Pavan Davuluri, Microsoft&#x27;s executive vice president of Windows and Devices, said during a press briefing ahead of the event. He emphasized that raw model size is only part of the equation: &quot;The model size is one thing, but for the model to be effective, it kind of needs to be able to have enough context, because a larger model, you feed it larger context.&quot; At 100,000 tokens of context, he noted, the key-value cache alone can consume 40 to 50 gigabytes of memory — which is precisely why Microsoft and Nvidia engineered the device around a 128-gigabyte unified memory pool shared dynamically between the CPU and GPU.</p><p>The machine will be available later this year in the United States, sold exclusively through Microsoft.com. The company did not disclose pricing.</p><h2>Why Microsoft is betting that AI&#x27;s future runs on fixed costs, not cloud meters</h2><p>The <a href="https://aka.ms/Windows-Build2026">Surface RTX Spark Dev Box</a> arrives at a moment when the economics of AI development have become a boardroom-level concern. Companies large and small are grappling with cloud GPU bills that scale unpredictably: every fine-tuning run, every inference call, every agentic workflow that loops through a frontier model accumulates cost. For a developer iterating rapidly on a prototype — running the same model dozens or hundreds of times a day — those charges compound fast.</p><p>Microsoft is framing the Dev Box as a release valve for that pressure. Andrew Hill, corporate vice president of Surface, wrote in the announcement blog post that the device &quot;changes that equation&quot; by letting developers &quot;reserve frontier model calls for truly frontier problems and handle the rest on their own hardware.&quot; The pitch is not that cloud computing is obsolete, but that much of the work currently being sent to remote data centers does not require state-of-the-art models and would be better served by capable local hardware with predictable, fixed costs.</p><p>This is a significant strategic shift for Microsoft, a company that derives tens of billions of dollars in annual revenue from <a href="https://azure.microsoft.com/en-us">Azure cloud services</a>. By selling hardware that explicitly reduces customers&#x27; cloud dependency, Microsoft is acknowledging a tension that has been building across the industry: the marginal cost of AI inference at scale is unsustainable for many teams, and the market is demanding alternatives. The bet appears to be that developers who prototype locally will still deploy to Azure when they need to scale — and that owning both ends of that workflow is more valuable than owning only the cloud.</p><h2>Inside the 128GB unified memory architecture that makes local AI possible</h2><p>The technical architecture of the <a href="https://azure.microsoft.com/en-us">Dev Box</a> reflects a set of deliberate engineering choices aimed at sustained, not peak, performance — a distinction that matters enormously for AI workloads that can run for hours.</p><p>At the center is <a href="https://blogs.windows.com/windowsexperience/2026/05/31/introducing-a-powerful-new-chapter-for-windows-pcs-accelerated-by-nvidia-rtx-spark/">Nvidia’s RTX Spark system-on-chip</a>, which combines an ultra-efficient ARM-based CPU with a Blackwell-generation RTX GPU. In a traditional Windows PC, Davuluri explained during the briefing, this configuration would require four separate components: a CPU, a discrete GPU, dedicated graphics memory and system RAM. The RTX Spark collapses all of that into a single chip paired with a single unified memory pool.</p><p>That unification is the critical design decision. Conventional gaming laptops with high-end Nvidia GPUs top out at roughly 24 gigabytes of GPU-accessible memory. The Dev Box&#x27;s 128 gigabytes of unified memory — accessible to both the CPU and GPU through what Nvidia calls its <a href="https://docs.nvidia.com/cuda/cuda-programming-guide/02-basics/understanding-memory.html">Unified Memory Access</a> architecture — is what makes it possible to load models that would otherwise require cloud GPU instances with specialty high-bandwidth memory configurations.</p><p>Microsoft did substantial work at the operating system level to exploit this architecture. The company implemented new memory management logic in Windows that raises the ceiling on how much system memory the GPU can address, introduces smarter page-size allocation for shared memory regions and ensures that heavy GPU workloads do not starve the CPU of the resources it needs for multitasking. The Windows scheduler was also optimized for RTX Spark&#x27;s heterogeneous core layout, routing demanding workloads to performance cores while keeping efficiency cores available for background tasks.</p><h2>How a 3D-printed aluminum chassis doubles as a heatsink</h2><p>The thermal design is equally deliberate. The <a href="https://aka.ms/Windows-Build2026">Dev Box</a> operates within an approximately 100-watt sustained thermal envelope — modest by desktop standards, but meaningful for a device intended to run training jobs and inference workloads continuously. The aluminum chassis itself is engineered to function as a passive heatsink, and the method Microsoft used to build it is among the most striking details about the machine.</p><p>The top panel is manufactured using metal 3D printing, a process that enables internal geometries too complex for conventional CNC machining or injection molding. The perforations are not simple through-holes; they are angled in multiple directions around the internal fan to optimize airflow from cold-air intake through heat dissipation. During the press briefing, Harry, a Surface industrial designer, explained the rationale: &quot;The complexity is something other manufacturers wouldn&#x27;t be able to do, like CNC, or like any molding, because of the complexity of shape.&quot;</p><p>When asked whether 3D printing would constrain mass production, the designer acknowledged the challenge but suggested Microsoft had developed a process robust enough to scale. The result is a machine that runs quietly enough for an open office while sustaining the kind of continuous GPU workloads that would throttle most conventional desktops of similar size. For a device that Microsoft expects developers to leave running overnight on fine-tuning jobs, quiet sustained performance is not a luxury — it is a requirement.</p><h2>A developer-first setup that eliminates hours of configuration</h2><p>Microsoft is shipping the <a href="https://aka.ms/Windows-Build2026">Dev Box</a> with <a href="https://www.microsoft.com/en-us/d/windows-11-pro/dg7gmgf0d8h4">Windows 11 Pro</a> pre-configured at the image level for development work — a detail that sounds minor but reflects a growing recognition that the out-of-box experience for developer hardware has historically been poor.</p><p>The machine boots into a dark theme with a simplified taskbar, widgets removed and Do Not Disturb enabled. Developer Mode is turned on. PowerShell 7 is the default shell. WSL 2 — the Windows Subsystem for Linux — comes pre-installed with GPU passthrough and CUDA support already configured. Visual Studio Code, GitHub Copilot, Git, Python and Node.js are all installed and ready.</p><p>&quot;We&#x27;ve said, &#x27;Hey, you know what, we got you, you want to go fast,&#x27;&quot; a Microsoft engineer who demonstrated the configuration during the briefing told VentureBeat. The philosophy, he explained, is that developers were going to install all of these tools anyway — the friction was in the hours of setup and configuration that stood between unboxing a machine and writing the first line of code.</p><p>The <a href="https://aka.ms/Windows-Build2026">Dev Box</a> also ships with integration points across Microsoft&#x27;s AI stack: AI Toolkit for VS Code for model conversion and fine-tuning, Windows ML and Windows Copilot Runtime for local inference, and Microsoft Foundry for connecting local prototypes to cloud deployment pipelines. For enterprises, the device integrates with Entra ID and Intune for identity and device management, and includes Secured-core PC architecture, BitLocker encryption and Microsoft Defender.</p><h2>Why Apple&#x27;s Mac Mini may not be the real competition anymore</h2><p>The most obvious competitive comparison is Apple&#x27;s <a href="https://www.apple.com/mac-mini/">Mac Mini</a>, which has dominated the compact-desktop category and has been widely adopted by developers drawn to Apple Silicon&#x27;s unified memory architecture and power efficiency.</p><p>Davuluri addressed the comparison directly during the briefing, saying the Dev Box is &quot;in a different class of performance than Mac Minis, intentionally.&quot; He declined to share specific benchmarks, noting that detailed specifications and performance targets would come closer to the fall launch. But the architectural advantage Microsoft is claiming is clear: while the current <a href="https://www.apple.com/shop/buy-mac/mac-mini">Mac Mini with M4 Pro</a> tops out at 48 gigabytes of unified memory and the M4 Max configuration reaches 128 gigabytes, the <a href="https://aka.ms/Windows-Build2026">RTX Spark Dev Box</a> pairs its 128 gigabytes with a Blackwell-class GPU that has a fundamentally different CUDA-based compute model — one that the vast majority of the AI/ML ecosystem&#x27;s tooling (PyTorch, TensorRT, llama.cpp, Hugging Face frameworks) is already optimized for.</p><p>That CUDA ecosystem advantage is difficult to overstate. While Apple&#x27;s Metal framework has made progress, the overwhelming majority of AI training and inference frameworks are built and tested first against Nvidia’s CUDA stack. A developer running models on the Dev Box can use the same code, the same libraries and the same workflows they would use on a cloud GPU instance — a level of portability that Apple Silicon cannot currently match.</p><h2>From laptop to supercomputer: Microsoft&#x27;s three-tier plan for local AI hardware</h2><p>The <a href="https://aka.ms/Windows-Build2026">Dev Box</a> is one piece of a three-tier hardware strategy Microsoft laid out at Build. The <a href="https://blogs.windows.com/devices/2026/05/31/introducing-surface-laptop-ultra-made-for-world-makers/">Surface Laptop Ultra</a>, announced days earlier at Computex, brings the same RTX Spark silicon into a 15-inch laptop form factor for developers and creators who need portability. At the other end of the spectrum, the <a href="https://www.nvidia.com/en-us/products/workstations/dgx-station-for-windows/">DGX Station for Windows</a> — built on Nvidia&#x27;s GB300 Grace Blackwell Ultra Superchip — targets organizations that need to run frontier models up to one trillion parameters on a deskside system. That machine is expected in the fourth quarter of this year.</p><p>The three devices map to a tiered computing model that Microsoft is calling &quot;unmetered intelligence&quot;: small on-device language models (the company&#x27;s new Aion 1.0 family) handle lightweight tasks at zero marginal cost; RTX Spark-class hardware runs mid-range models locally for the bulk of development work; and cloud resources are reserved for genuinely frontier-scale problems.</p><p>The <a href="https://github.com/features/copilot/cli">GitHub Copilot CLI</a> is getting a concrete implementation of this model with a new feature called /fleet, which allows a cloud-based primary agent to build a plan, assess the complexity of each task and route appropriate subtasks to a local model running on the developer&#x27;s hardware. The cloud agent handles what requires frontier capability; the local model handles what does not. The result, in theory, is lower cost without lower quality.</p><h2>The real question is whether hybrid AI can shift from buzzword to business model</h2><p>Whether Microsoft&#x27;s bet pays off depends on questions that will take months to answer. How does the Dev Box actually perform under sustained, real-world workloads? What will it cost? How quickly will the open-source model ecosystem continue to produce capable models in the 70-to-120-billion-parameter range that fit within its memory envelope? And perhaps most critically: will enterprise procurement teams, trained to think of AI as a cloud line item, accept a capital expenditure on desk hardware as an alternative?</p><p>The strategic logic, however, is difficult to dismiss. For three years, the AI industry has operated on an implicit assumption: serious AI work happens in the cloud, and the economics of that arrangement are simply the cost of doing business. Microsoft, a company with every incentive to reinforce that assumption, is now selling a machine that undermines it. That is not a contradiction — it is a recognition that the market is moving, and that the company that controls the developer&#x27;s local environment and the cloud they deploy to has a more durable advantage than one that controls only the cloud.</p><p>Every dollar a developer does not spend on cloud inference is a dollar that can fund another experiment, another iteration, another prototype. For years, the AI industry told developers they needed to rent their intelligence by the token. Microsoft is now asking a different question: what if you could just buy it?</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Infrastructure</category>
            <category>Technology</category>
            <category>Business</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/9ZXML9iabyyDqXI5bDqHu/0be5bd55a52eb7b0ecc35bc7f4a6cb83/Surface_RTX_Spark_Image_4.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Microsoft launches MXC, an OS-level sandbox for AI agents, with OpenAI and Nvidia already on board]]></title>
            <link>https://venturebeat.com/security/microsoft-launches-mxc-an-os-level-sandbox-for-ai-agents-with-openai-and-nvidia-already-on-board</link>
            <guid isPermaLink="false">5lEA8BoEKR9x9BP7gxCkZO</guid>
            <pubDate>Tue, 02 Jun 2026 16:30:00 GMT</pubDate>
            <description><![CDATA[<p>For the past two years, the technology industry has raced to make AI agents more capable — teaching them to write code, navigate software interfaces, manage files, and orchestrate multi-step workflows with increasing autonomy. What the industry has not done, at least not with any consistency, is answer the question that keeps chief information security officers awake at night: what happens when an agent goes wrong?</p><p>On Tuesday at its annual <a href="https://news.microsoft.com/build-2026/">Build</a> developer conference, Microsoft offered what may become the definitive answer. The company introduced <a href="https://aka.ms/Windows-Build2026">Microsoft Execution Containers</a>, or MXC — a policy-driven execution layer, built into the Windows operating system itself, that lets developers and IT administrators declare exactly what an AI agent can and cannot access, with those boundaries enforced at runtime by the OS kernel.</p><p>The announcement, <a href="https://aka.ms/Windows-Build2026">buried within a sweeping set of developer-focused updates</a>, is arguably the most consequential platform move Microsoft made at Build this year, and it has the potential to reshape how every enterprise on Earth thinks about deploying autonomous AI software.</p><p>MXC is not a product you buy. It is an SDK and a policy model — a foundational primitive embedded in Windows and the Windows Subsystem for Linux — that provides what Microsoft calls a &quot;<a href="https://aka.ms/Windows-Build2026">composable sandbox spectrum</a>.&quot; That spectrum ranges from lightweight process isolation, already adopted by GitHub Copilot&#x27;s command-line interface, all the way up to micro-virtual machines, Linux containers, and full cloud instances running on Windows 365.</p><p>The system separates an agent&#x27;s execution from the user&#x27;s desktop, clipboard, user interface, and input devices. Critically, it binds every agent to a strong identity — either a local ID or a cloud-provisioned identity backed by Microsoft Entra — so that every action the agent takes can be attributed, audited, and governed.</p><p>The implications are enormous. Until now, the enterprise deployment of AI agents has been stuck in a paradox: the more autonomous and useful an agent becomes, the more dangerous it is to let it operate on a corporate network without guardrails. MXC is Microsoft&#x27;s attempt to break that paradox — not by making agents less capable, but by making the environment they operate in fundamentally more controlled.</p><h2>Why every autonomous AI agent is a security incident waiting to happen</h2><p>To understand why MXC matters, consider what an AI agent actually does when it runs on your computer. Unlike a traditional application, which operates within well-understood boundaries — a word processor reads and writes documents, a browser fetches web pages — an AI agent is, by design, unpredictable. It receives a goal in natural language, reasons about how to achieve it, and then takes actions: opening files, executing code, calling APIs, browsing the web, interacting with other software. Each of those interactions creates what security professionals call &quot;attack surface.&quot;</p><p>Microsoft&#x27;s own blog post framed the challenge in stark terms. The company wrote that &quot;as agents become more capable and autonomous, they&#x27;re delivering material productivity gains. But they&#x27;re also introducing new risk, and the issue isn&#x27;t just the agent. It&#x27;s the entire system the agent operates across.&quot; Every interaction between agents and humans, tools, applications, models, and other agents &quot;exposes new attack surface and introduces different failure modes.&quot; Microsoft characterized this as &quot;a multi-layer systems problem.&quot;</p><p>This is not a theoretical concern. In the months leading up to <a href="https://news.microsoft.com/build-2026/">Build</a>, security researchers demonstrated numerous ways that AI agents could be manipulated — through prompt injection, through malicious tool calls, through data exfiltration disguised as normal workflow. For enterprises that handle sensitive data, proprietary models, and regulated information, the absence of a trusted execution environment has been the single biggest barrier to moving agents from demo to deployment.</p><h2>Microsoft&#x27;s answer is a sandbox that scales from a single process to a full virtual machine</h2><p>MXC operates on a deceptively simple principle: declare what the agent can do before it runs, and let the operating system enforce those declarations at runtime. A developer or an IT administrator writes a policy that specifies which files, directories, and network resources an agent is allowed to access. MXC then creates a contained execution environment — a sandbox — that enforces those boundaries regardless of what the agent attempts to do.</p><p>What makes MXC unusual, and potentially very powerful, is the breadth of its isolation options. Microsoft designed the system so that a single SDK and policy model can map to the appropriate isolation construct for any given workload. For a lightweight coding assistant that just needs to read the current project directory, fast process isolation may be sufficient. For an autonomous agent that executes arbitrary code downloaded from the internet, a full micro-VM may be required. The system is designed to be &quot;dynamically composable based on intent and risk,&quot; meaning that the level of isolation can be adjusted based on what the agent is actually doing, not just what category it falls into.</p><p>Session isolation is a particularly important feature. MXC separates the agent&#x27;s execution from the user&#x27;s desktop, clipboard, UI, and input devices. This directly mitigates several classes of attacks that security researchers have identified as particularly dangerous for AI agents: UI spoofing, where an agent manipulates what the user sees to trick them into approving a malicious action; input injection, where an agent sends keystrokes or mouse clicks to other applications; and cross-session data leakage, where information from one user&#x27;s session bleeds into another.</p><h2>A live demo showed an AI agent trying to delete files — and failing, because the OS wouldn&#x27;t let it</h2><p>During a pre-briefing with VentureBeat the night before the announcement, a Microsoft developer offered a vivid demonstration of the technology in action. He had set up the open-source agent framework <a href="https://openclaw.ai/">OpenClaw</a> running inside MXC&#x27;s sandbox on his personal development machine. He then instructed the agent to delete all the files on his desktop. The agent attempted to comply — but the sandbox prevented it. &quot;If you look at my desktop here, you see how clean my desktop is,&quot; the developer said during the demo. &quot;That&#x27;s a lie.&quot; The files, he explained, were completely safe because &quot;the container won&#x27;t allow it.&quot;</p><p>The demonstration went further, showcasing the granularity of MXC&#x27;s controls. Users can mark specific files as read-only for the agent, restrict access to the browser and screen capture, control whether the agent can see location data, and have all of those permissions managed centrally by an enterprise IT department through Intune policies. The agent operates inside what is effectively a one-way mirror: it can do the work it has been asked to do, but it cannot see or touch anything outside the boundaries that its policy defines.</p><p>Pavan Davuluri, Microsoft&#x27;s Executive Vice President for Windows and Devices, underscored during the pre-briefing that the primitives MXC introduces — security, containment, isolation, and user control — are essential to making AI agents commercially viable.</p><p>He emphasized that these capabilities are &quot;not unique to OpenClaw&quot; and that &quot;this pattern repeats itself over and over&quot; for any agent running on a Windows device. The primitives that exist in the operating system now &quot;for the file around security, containment, isolating them, having users in control,&quot; he said, are what will make agents safe enough for ordinary consumers and corporate deployments alike.</p><h2>Defender, Entra, Intune, and Purview integration arriving in July turns MXC into an enterprise control plane</h2><p>For corporate IT departments, the most significant element of the <a href="https://openclaw.ai/">MXC announcement</a> is not the SDK itself but its integration with Microsoft&#x27;s existing enterprise security stack through what the company calls Agent 365. Arriving in preview in July, <a href="https://www.microsoft.com/en-us/microsoft-agent-365">Agent 365</a> layers Microsoft&#x27;s Entra identity service and Intune device management platform on top of MXC, so that IT administrators can govern agent containment centrally while developers choose the level of isolation their workload demands.</p><p>The integration goes further: <a href="https://www.microsoft.com/en-us/microsoft-365/microsoft-defender-for-individuals">Microsoft Defender</a> will provide runtime threat protection, <a href="https://www.microsoft.com/en-us/security/business/microsoft-entra">Entra</a> will handle identity and access management, Intune will enforce device-level policies, and <a href="https://www.microsoft.com/en-us/security/business/microsoft-purview">Microsoft Purview</a> will extend its data governance and compliance capabilities to agent activity. This means that an enterprise could, in theory, allow employees to run AI agents on their corporate machines — even powerful, autonomous agents that execute code and manage files — while maintaining the same kind of centralized visibility and control that IT departments currently have over traditional applications.</p><p>Microsoft described the identity layer in its <a href="https://aka.ms/Windows-Build2026">official blog</a>: &quot;Windows assigns agents a local ID or a cloud provisioned identity backed by Entra and attributes all activity from the container to that identity, so you can clearly differentiate human from agent.&quot; For regulated industries — financial services, healthcare, government — the ability to produce an audit trail that distinguishes between human actions and agent actions on the same machine could prove to be a regulatory requirement, not merely a nice-to-have feature. Every agent action attributable to a specific identity, every containment boundary enforceable through the same policy infrastructure that already governs hundreds of millions of Windows devices — this is the architecture that could finally move AI agents from pilot programs to production.</p><h2>OpenAI, Nvidia, Manus, and Nous Research are already building on MXC — and that changes the calculus</h2><p>Platform announcements at developer conferences are often aspirational. What distinguishes the MXC launch is the breadth and specificity of the partners already building on it. Microsoft named five: <a href="https://openai.com/">OpenAI</a>, <a href="https://www.nvidia.com/en-us/">Nvidia</a>, <a href="https://manus.im/">Manus</a>, <a href="https://nousresearch.com/">Nous Research</a> (maker of the Hermes agent), and the <a href="https://openclaw.ai/">OpenClaw</a> open-source project. Each is integrating MXC in a distinct way that illuminates a different use case for the technology.</p><p>OpenAI&#x27;s involvement is particularly striking. David Wiesen, a member of OpenAI&#x27;s technical staff, said that &quot;working with Microsoft on the Microsoft Execution Containers (MXC) allows us to explore new patterns for AI agents to safely and efficiently generate and execute code.&quot; He added that by combining Codex&#x27;s capabilities with MXC&#x27;s execution environment, the goal is &quot;to help developers move from intent to reliable execution faster, while maintaining the security and control enterprises need.&quot; The reference to <a href="https://openai.com/codex/">Codex</a> — OpenAI&#x27;s code-generation agent — suggests that MXC could become the default execution environment for one of the most widely anticipated agent products in the industry.</p><p>Nvidia is bringing its <a href="https://docs.nvidia.com/openshell/home">OpenShell framework</a> to Windows built on MXC, providing what Microsoft described as &quot;an easy-to-deploy package for autonomous, always-on agents safely.&quot; Manus, the Chinese-born AI agent startup that gained viral attention earlier this year, is also integrating. Tao Zhang, Manus&#x27;s Chief Product Officer, said that MXC &quot;gives developers a policy-driven way to define what an agent can access and enforce those boundaries at runtime, so more autonomous agents can operate safely in enterprise environments.&quot; And Dillon Rolnick, the CEO of Nous Research, offered what may be the most concise articulation of why MXC matters: &quot;Continuously-running local agents, like Hermes Agent, require intentional isolation. Developers need control over what an agent can access and trust that those controls will hold.&quot;</p><h2>How an open-source agent framework became Microsoft&#x27;s proving ground for AI safety on Windows</h2><p>One of the more revealing stories behind the MXC announcement involves <a href="https://openclaw.ai/">OpenClaw</a>. During the press pre-briefing, a Microsoft developer described how the partnership came together organically — Peter Steinberger, OpenClaw&#x27;s creator, sent him a direct message in January expressing interest in collaborating. What began as a casual conversation evolved into a full-fledged platform partnership, with Microsoft developers contributing to the OpenClaw Windows companion app, built as a native WinUI application rather than a wrapped web app.</p><p>The OpenClaw integration serves as what Scott called &quot;the ultimate test app for all the stuff that [the Windows platform team] is making.&quot; If OpenClaw — which by its nature gives agents broad autonomy to execute tasks on a user&#x27;s machine — can run securely within MXC&#x27;s containment boundaries, then the containment system is robust enough for any agent. Scott explained the philosophy driving the work: &quot;Think of OpenClaw Windows as the ultimate test app... If OpenClaw can succeed on Windows, that means that the Linux support is there, the container support is there, the containment is there.&quot;</p><p>The companion app demonstrates the full spectrum of MXC&#x27;s enterprise controls — file permissions, network access, screen capture restrictions, location data — all manageable centrally through Intune policies. Microsoft donated the project to OpenClaw and plans to continue contributing to it as open source. As one member of the Windows leadership team put it during the briefing: &quot;All agents, all comers, everyone is welcome on Windows... It&#x27;s going to run great on Windows, because the primitives are there. The base of the pyramid is solid.&quot;</p><h2>Building containment into the OS gives Microsoft a strategic edge over Apple&#x27;s walled garden and Google&#x27;s cloud-first model</h2><p>MXC arrives at a moment when the technology industry is grappling with a fundamental tension. AI agents represent what may be the most significant new category of software since mobile applications, and every major technology company is racing to build them. But the security and governance infrastructure required to deploy these agents responsibly in enterprise environments barely exists. Microsoft&#x27;s approach is distinctive because it locates the trust layer at the operating system level rather than in the agent framework, the model provider, or a third-party security product.</p><p>This is a deliberate architectural choice. By building containment into Windows itself, Microsoft ensures that the security guarantees hold regardless of which agent, which model, or which framework a developer chooses.</p><p>It also means that the hundreds of millions of Windows devices already managed through <a href="https://www.microsoft.com/en-us/security/business/microsoft-intune">Intune</a> and secured through <a href="https://www.microsoft.com/en-us/microsoft-365/microsoft-defender-for-individuals">Defender</a> can, in principle, become agent-ready through a software update rather than a rip-and-replace deployment.</p><p>Apple&#x27;s approach to AI agents leans heavily on its walled-garden ecosystem, offering security through restriction — limiting which agents can run and what they can do. Google&#x27;s approach, centered on its cloud infrastructure, offers security through centralization. Microsoft&#x27;s approach offers security through declaration and enforcement — allowing any agent to run, but containing its impact through OS-level policy.</p><p>For enterprises that operate in heterogeneous environments with diverse toolchains and multiple AI providers, the Microsoft model may prove the most practical. The competitive dynamics are already shifting: with OpenAI&#x27;s <a href="https://openai.com/codex/">Codex</a>, Nvidia’s <a href="https://build.nvidia.com/openshell">OpenShell</a>, and independent agent frameworks like <a href="https://manus.im/">Manus</a> and <a href="https://hermes-agent.nousresearch.com/">Hermes</a> all building on MXC, Microsoft is positioning Windows not just as the platform where agents run, but as the platform where agents can be trusted to run.</p><h2>The hardest part isn&#x27;t building the sandbox — it&#x27;s writing the policies that go inside it</h2><p>MXC is available now in early preview, meaning developers can begin building against the SDK and testing containment policies. The Agent 365 integration with Defender, Entra, Intune, and Purview is scheduled for preview in July — a timeline aggressive enough to suggest that much of the engineering work is already done, but far enough out to allow for refinement based on developer feedback.</p><p>The real test, however, will come when enterprises begin deploying agents at scale on production networks. Containment is only as good as the policies that govern it, and writing effective agent policies for complex enterprise environments will be an entirely new discipline — one that IT departments have not yet developed and that no vendor has yet figured out how to teach. The technology is promising, but an empty sandbox is just an empty box. Filling it with the right rules, for the right agents, in the right contexts, will require a level of organizational sophistication that most companies are only beginning to contemplate.</p><p>Still, the significance of what Microsoft announced on Tuesday is difficult to overstate. For the first time, a major operating system vendor has proposed a comprehensive, kernel-level answer to the question of how autonomous AI software should be contained, identified, and governed on the devices where most of the world&#x27;s work actually gets done. The industry spent two years teaching agents to act. Microsoft is now betting that the bigger business — and the harder engineering problem — is teaching the operating system to watch.</p><p>
</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Security</category>
            <category>Technology</category>
            <category>Infrastructure</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/2Bj8ehmUSTCeqnkJ3pPCjc/f9782b3575c73ccecb809afd58e7acd2/Nuneybits_Vector_art_of_the_iconic_Microsoft_Windows_logo_on_a__b8c7cdb1-4983-4e68-94a9-93fbef23357b.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
    </channel>
</rss>