<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>VentureBeat</title>
        <link>https://venturebeat.com/feed/</link>
        <description>Transformative tech coverage that matters</description>
        <lastBuildDate>Thu, 28 May 2026 17:30:50 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright 2026, VentureBeat</copyright>
        <item>
            <title><![CDATA[How DeepSeek’s radical architecture is shattering Silicon Valley's token moat]]></title>
            <link>https://venturebeat.com/infrastructure/how-deepseeks-radical-architecture-is-shattering-silicon-valleys-token-moat</link>
            <guid isPermaLink="false">5fqdHpqTh6516CePXPB38N</guid>
            <pubDate>Thu, 28 May 2026 16:57:24 GMT</pubDate>
            <description><![CDATA[<p>DeepSeek’s announcement over the weekend that it has made its <a href="https://www.engadget.com/2180062/deepseek-permanently-reduces-the-price-of-its-flagship-v4-model-by-75-percent/">75% price cut permanent on its flagship V4 Pro model</a> is a disruptive assault on the capital-heavy business models of Silicon Valley’s frontier labs. </p><p>The price cut on DeepSeek V4 Pro directly undercuts comparable Western models used as workhorses for enterprise production. It is <b>7x cheaper on inputs and 17x cheaper on outputs</b> than Anthropic’s Claude Sonnet or OpenAI’s GPT 5.5-Med, while the lightweight DeepSeek <b>V4 Flash</b> undercuts entry-tier alternatives like Claude Haiku by <b>10x to 25x</b>. </p><p>The price cuts are enabled by a series of hardware-software innovations, especially around caching, that make DeepSeek&#x27;s models radically more efficient to run. When hosted natively in China, DeepSeek’s cache-read pricing is a whopping <b>87x cheaper</b> than on Western clouds, a deflationary floor so aggressive that handset giant Xiaomi just <a href="https://x.com/XiaomiMiMo/status/2059314052892099070">moved to match the exact pricing tier</a> for its newly deployed MiMo architecture.</p><p>DeepSeek <a href="https://fe-static.deepseek.com/chat/transparency/deepseek-V4-model-card-EN.pdf">V4 Pro’s performance</a> ranks nearly on <a href="https://www.nist.gov/news-events/news/2026/05/caisi-evaluation-deepseek-v4-pro">par with Western frontier models</a>, hitting <b>80.6%</b> on coding-agent tasks on the <a href="https://www.demandsphere.com/research/demandsphere-radar/ai-frontier-model-tracker/benchmarks/swe-bench/"><b>SWE-bench Verified leaderboard</b></a> and an <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro">elite reasoning score of <b>87.5</b></a> on the advanced <a href="https://build.nvidia.com/deepseek-ai/deepseek-v4-pro/modelcard"><b>MMLU-Pro benchmark</b></a>. 
Both V4 Pro and V4 Flash, a hyper-optimized, faster version aimed at developers, are open-weight and released under a permissive MIT license, which gives enterprises complete flexibility over deployment. This dual-model strategy allows technical teams to route their highest-volume, multi-step autonomous agent workloads to the lightning-fast Flash model while reserving the heavier Pro model for deep reasoning tasks, drastically lowering costs at a time when budget concerns have grown considerably.</p><p>This also comes at a time when closed Western labs, in particular OpenAI and Anthropic, face intense return-on-investment scrutiny over their multi-billion-dollar investments in general-purpose hardware infrastructure. </p><p>This deflationary collapse will not affect all Silicon Valley labs equally, and it signals a permanent bifurcation of the enterprise AI market. While a premium, deterministic tier will endure for mission-critical engineering workflows, the high-volume background agentic layer is being completely commoditized by open weights. Ultimately, this creates a much more dangerous exposure for OpenAI, whose revenue mix relies heavily on general-purpose commodity API streams, than for software-insulated peers like Anthropic.</p><h2>The token cost crisis</h2><p>Uber says it burned through its entire 2026 budget for Claude Code and Cursor in just the first four months of the year; its COO said that the cost of high token usage by some of its engineers was <a href="https://gizmodo.com/ai-investment-is-harder-to-justify-as-productivity-returns-lag-uber-coo-says-2000763514">getting “harder to justify”</a> without better products to show for it. 
Airbnb CEO Brian Chesky said last year that while the company uses OpenAI&#x27;s latest models, <a href="https://www.bloomberg.com/news/articles/2025-10-21/airbnb-ceo-brian-chesky-says-chatgpt-integration-not-ready-for-airbnb-app">it doesn&#x27;t rely on them heavily in production</a>, favoring faster, cheaper alternatives like Alibaba&#x27;s Qwen. And in the latest episode of VentureBeat’s podcast Beyond the Pilot, Pinterest CTO Matt Madrigal confirmed that the company went <a href="https://www.youtube.com/watch?v=BvFanq9fTg0">all-in on an open-source AI strategy</a>, post-training Alibaba’s open Qwen model on the company’s proprietary &quot;taste graph&quot; to drive Pinterest’s assistant and achieving frontier-like quality at a 90% reduction in costs. DeepSeek’s subsequent price cut makes such cost differences <i>even</i> greater.</p><h2>Geopolitical headwinds and compliance defenses</h2><p>Widespread enterprise adoption of Chinese models faces major geopolitical headwinds in the West. For highly regulated U.S. giants in finance, healthcare, and defense, getting comfortable with DeepSeek will take time. </p><p>Even though an open-weights model under an MIT license allows a company to self-host it locally and prevent active data exfiltration to foreign servers, corporate compliance boards remain deeply wary of software supply chain risks, potential hidden backdoors, and the legal threat of sudden federal sanctions. </p><p>Smaller, more nimble software teams, on the other hand, face far less bureaucratic gridlock. Free from multi-month security review cycles, these fast-moving organizations view the immediate 75% infrastructure savings as a competitive edge worth seizing right now.</p><h2>The OpenRouter clearinghouse: mapping global token traffic</h2><p>Take the token usage metrics on OpenRouter, a leading public proxy for which models are most popular among developers. 
OpenRouter gives developers an easy way to compare and deploy models, and while its data is by no means a complete measure of real-world model popularity, it suggests this structural migration is already taking place within company data pipelines. DeepSeek’s V4 Flash model has captured the No. 1 position on the OpenRouter leaderboard over the past week, with token usage up 48%. Its more advanced counterpart, V4 Pro, sits at No. 6. DeepSeek’s top three models processed nearly 6 trillion tokens on OpenRouter over the past week, giving the lab a wide lead over competitors. OpenAI’s premium GPT-5.5 model, for example, has slipped to No. 15 with 470B tokens.</p><p>It’s not clear exactly how much of the world’s token traffic runs through OpenRouter; conservative estimates put it at about 3%. The platform does not capture the massive volume of tokens served through the APIs that companies like Anthropic, OpenAI and Google offer directly to developers. But recent estimates suggest OpenRouter processes a token volume <a href="https://menlovc.com/perspective/openrouter-now-processes-more-than-a-quadrillion-tokens-a-year/#:~:text=For%20comparison%2C%20that%27s%20~15%2D,round%20was%20done%20in%20February">equivalent to between 15% and 40% of OpenAI’s and of Google’s token usage</a>, and that share is growing, which makes it a meaningful indicator of relative trends regardless of the exact percentage it represents.</p><p>While skeptics often dismiss aggregator traffic as a signal from indie developers rather than a reflection of Fortune 500 IT spend, the corporate pipeline reality is shifting. An infrastructure analysis by leading venture capital firm <a href="https://a16z.com/the-state-of-generative-media-2026/">Andreessen Horowitz</a> found that enterprise production environments deploy a median of 14 different models simultaneously to price-route workloads and avoid single-vendor lock-in. 
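</p><p>The price-routing pattern described here can be sketched as a small function that sends each task to the cheapest model whose capability tier is high enough. The same pattern underlies the Flash-versus-Pro split discussed earlier. This is a minimal illustration under stated assumptions. It does not describe how OpenRouter or any production router actually works, and the model names, capability tiers and per-million-token prices in it are hypothetical placeholders.</p><pre><code class="language-python">from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str            # hypothetical model identifier
    tier: int            # capability tier: higher means stronger reasoning
    input_per_m: float   # USD per million input tokens (placeholder price)
    output_per_m: float  # USD per million output tokens (placeholder price)

    def cost(self, input_tokens, output_tokens):
        """Estimated USD cost of one request at this model's list prices."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


def route(models, required_tier, input_tokens, output_tokens):
    """Pick the cheapest model whose capability tier meets the requirement."""
    eligible = [m for m in models if m.tier >= required_tier]
    if not eligible:
        raise ValueError("no model meets the required capability tier")
    return min(eligible, key=lambda m: m.cost(input_tokens, output_tokens))


# Hypothetical catalog: names, tiers and prices are illustrative, not real quotes.
CATALOG = [
    ModelOption("fast-open-model", tier=1, input_per_m=0.10, output_per_m=0.20),
    ModelOption("reasoning-open-model", tier=2, input_per_m=0.40, output_per_m=0.90),
    ModelOption("premium-closed-model", tier=3, input_per_m=3.00, output_per_m=15.00),
]

if __name__ == "__main__":
    # A routine background step: every tier qualifies, so the cheapest wins.
    print(route(CATALOG, 1, 20_000, 500).name)    # fast-open-model
    # A hard reasoning step: only the premium tier qualifies.
    print(route(CATALOG, 3, 20_000, 2_000).name)  # premium-closed-model
</code></pre><p>The key design choice is to require a minimum tier before comparing prices. That keeps the cheap model on routine background steps, and it prevents cost alone from pushing a hard reasoning task onto a model that cannot handle it.</p><p>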
This architectural shift helps explain why OpenRouter recently secured a <a href="https://siliconangle.com/2026/05/26/openrouter-raises-113m-bring-order-enterprise-ai-inference-routing/">$113 million Series B funding round</a> backed directly by the big enterprise data and software vendors that serve corporate America, including ServiceNow Ventures, Snowflake Ventures, Databricks Ventures, Nvidia&#x27;s NVentures, and Google’s CapitalG. Stripe also cited OpenRouter’s enterprise customers in its decision to <a href="https://stripe.com/newsroom/news/openrouter-and-stripe">partner closely with the company</a>.</p><p>That’s why DeepSeek’s surge on this leaderboard is so eye-opening. DeepSeek also offers its own API directly to developers, so its total token traffic is larger than OpenRouter’s numbers show.</p><h2>Beyond chatbots: the rise of multi-step autonomous agents</h2><p>The DeepSeek spike on OpenRouter points to a deeper structural shift in how automated software consumes machine intelligence. Technical teams are moving beyond trivial, single-turn chatbots and starting to deploy more sophisticated autonomous agents that persist for hours at a time, recursively looping through codebases and data lakes. These agents make a huge number of tool calls and typically reread the accumulated context history at every step. As a result, their token consumption balloons: total tokens processed grow roughly with the square of session length, not linearly. </p><p>Running these recursive loops on closed, premium Western APIs quickly creates unsustainable infrastructure costs. While corporate tech teams spent last year experimenting freely with early, single-turn prototypes without worrying about budgets, the arrival of token-hungry autonomous agents has triggered an enterprise line-item crisis. VentureBeat&#x27;s Q1 2026 research, which surveyed enterprise users at organizations with over 100 employees (n=65, in the U.S. 
software, finance and healthcare industries), confirms the shift: “Cost per token or licensing model” jumped from 25.4% in January to 36.7% in March, trailing only raw performance as the primary selection criterion for enterprise buyers.</p><p>DeepSeek has optimized its models for this agentic, high-token pattern of use. Its standard pricing is now $0.435 per million input tokens and $0.87 per million output tokens, alongside a rock-bottom prefix-cached read price of $0.003625 per million tokens.</p><p>It&#x27;s this third price, for cache reads, that is arguably the most significant. “If you measure how all of these agents now are using tokens, 80 to 90% of the tokens are cache-read tokens,” said Val Bercovici, Chief AI Officer at WEKA, a company that provides fast storage for much of this cache. “Which means that [that price] is almost by far the most important price, making the others irrelevant — nearly a rounding error. So what DeepSeek did is not just say we&#x27;re going to be 5% cheaper, 10% cheaper, 20% cheaper. They&#x27;re like 87x cheaper on that cache-read price with DeepSeek V4 Pro. So that&#x27;s really set the industry on notice.”</p><h2>The infrastructure coup: decoupling HBM from context</h2><p>DeepSeek&#x27;s core innovations center on hardware-software alignment. This is where we get a little technical.</p><p>Western frontier labs like OpenAI have prioritized performance at all costs, investing billions in uncompressed &quot;dense&quot; neural architectures. DeepSeek, by contrast, has systematically sought to extract maximum intelligence from lower-grade hardware, given that it has lacked access to Nvidia’s top-tier GPUs. 
By pioneering deep software optimizations <a href="https://venturebeat.com/ai/deepseek-r1s-bold-bet-on-reinforcement-learning-how-it-outpaced-openai-at-3-of-the-cost">as early as its V2 architecture in 2024</a>, the lab engineered a series of four interconnected hardware-software breakthroughs that decoupled a model&#x27;s working context from expensive computing overhead:</p><p><b>Breakthrough 1: Sequence-dimension compression via CSA and HCA</b></p><p>The transformer architecture that most LLMs use is bottlenecked by something called the key-value (KV) cache, which stores the attention keys and values computed for every token already processed. As an agent executes long, multi-step sessions, the cached keys and values for that growing history fill the GPU&#x27;s high-bandwidth memory (HBM), causing severe latency spikes and an expensive infrastructure tax.</p><p>DeepSeek addressed this bottleneck by introducing a hybrid attention mechanism, documented in the <a href="https://aipapersacademy.com/deepseek-v4/">DeepSeek V4 Architecture Paper</a>, that combines <b>Compressed Sparse Attention (CSA)</b> and <b>Heavily Compressed Attention (HCA)</b> to cut overall KV-cache usage by 90% across its 1-million-token context window.</p><p>While traditional models keep a separate cache entry for every individual token, DeepSeek compresses the rows of its memory cache. CSA acts as a local filter, condensing small windows of text into concise, indexable blocks so the model doesn&#x27;t have to track every fine-grained detail. HCA acts as an aggressive global index, compressing massive spans of text deep within a session&#x27;s history into high-density summaries. By interleaving these layers, DeepSeek shrinks millions of memory rows down to a fraction of their size.</p><p><b>Breakthrough 2: Native memory offloading via Multi-head Latent Attention (MLA)</b></p><p>Using something called Multi-head Latent Attention (MLA), DeepSeek shrinks the active memory footprint of its context history to a fraction of what standard models need. 
It achieves this through a <a href="https://zhuanlan.zhihu.com/p/2034345100680114900?share_code=18r7heck652dt&amp;utm_psn=2036966822600292003">physical division of labor between hardware chips</a>. While traditional models force expensive GPUs to hold a session&#x27;s entire history, DeepSeek’s architecture keeps only the small, highly compressed search index tags (the keys) on the GPU and offloads the heavy data payloads (the values) <a href="https://arxiv.org/html/2507.19823v1#S3">into cheaper system memory and local storage tiers</a>. Once the GPU has done the high-speed matching to find relevant data, it fetches the corresponding values from those tiers only as needed.</p><p>DeepSeek’s architecture is so different that it is straining the inference engines that load a model&#x27;s weights into GPU memory and serve prompts. The three most popular engines, Nvidia&#x27;s TensorRT-LLM, UC Berkeley&#x27;s SGLang and the widely used vLLM, “are all being stretched to keep up with being able to offer it, which is not normal,” explains WEKA’s Bercovici. &quot;Every other open model has had some similarity to other open models. This one from DeepSeek is just built different.&quot;</p><p>DeepSeek&#x27;s software engineering means its massive 1.6-trillion-parameter model requires an astonishingly small <b>5.48 GB of HBM</b> to hold a 1-million-token context in production, <a href="https://x.com/bookwormengr/status/2057909493250539891">according to calculations by an analyst</a> using hardware modeling benchmarks. 
For comparison, smaller models built on conventional attention designs consume up to 89 GB of HBM under the same context load.</p><table><tbody><tr><td><p><b>Model (architecture)</b></p></td><td><p><b>Active HBM needed (1M-token context)</b></p></td><td><p><b>Context length capacity</b></p></td><td><p><b>Cached-token economics in multi-step workflows</b></p></td></tr><tr><td><p><b>DeepSeek V4-Pro</b> (1.6T MoE)</p></td><td><p><b>5.48 GB</b></p></td><td><p>1,000,000 tokens</p></td><td><p>80% to 90% of workflow tokens are cache reads</p></td></tr><tr><td><p><b>Qwen3-235B-A22B</b> (standard GQA)</p></td><td><p><b>89.00 GB</b></p></td><td><p>1,000,000 tokens</p></td><td><p>Subject to steep hardware tax</p></td></tr><tr><td><p><b>GPT-5.5 / Claude 4.7-class</b> (Western frontier / MoE)</p></td><td><p><b>180+ GB</b></p></td><td><p>1,000,000 tokens</p></td><td><p>Prohibitive premium infrastructure tax</p></td></tr></tbody></table><p>
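The memory figures in the table follow from a simple sizing rule. A conventional KV cache stores one key vector and one value vector per KV head, per layer, for every cached token, so its size grows linearly with context length. The sketch below applies that rule, and it rests on several assumptions. The Qwen3-style configuration values (94 layers, 4 KV heads, head dimension 128) are taken from that model's published configuration and should be checked before relying on them. The 1-byte (FP8) cache precision is also assumed. Finally, the tokens_per_entry parameter is a hypothetical stand-in for sequence-dimension compression; it does not describe how CSA or HCA actually work.</p><pre><code class="language-python">GIB = 1024 ** 3  # bytes per GiB


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value,
                   tokens_per_entry=1):
    """Estimate the KV-cache size, in bytes, for one sequence.

    A conventional cache keeps a key vector and a value vector for every
    KV head in every layer, for each cached entry. tokens_per_entry is a
    hypothetical knob for sequence-dimension compression: keeping one entry
    per block of tokens divides the number of cached entries by the block size.
    """
    entries = -(-tokens // tokens_per_entry)  # ceiling division
    per_entry = 2 * layers * kv_heads * head_dim * bytes_per_value  # 2 = key + value
    return entries * per_entry


# Assumed Qwen3-235B-A22B-style grouped-query attention (GQA) configuration,
# with an FP8 cache (1 byte per value) and a 1-million-token context.
baseline = kv_cache_bytes(1_000_000, layers=94, kv_heads=4, head_dim=128,
                          bytes_per_value=1)

# The same model with a hypothetical 4:1 compression along the sequence.
compressed = kv_cache_bytes(1_000_000, layers=94, kv_heads=4, head_dim=128,
                            bytes_per_value=1, tokens_per_entry=4)

print(f"baseline:   {baseline / GIB:.1f} GiB")    # baseline:   89.6 GiB
print(f"compressed: {compressed / GIB:.1f} GiB")  # compressed: 22.4 GiB
</code></pre><p>Under these assumptions, the uncompressed estimate lands near the table's 89 GB figure for Qwen3, although the analyst's exact inputs are not stated. The broader point holds regardless of the specific numbers. HBM demand scales with the number of cached entries, so storing one entry per block of tokens shrinks it roughly in proportion to the block size.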
</p><p>DeepSeek’s extreme compression of the KV cache down to 5.48 GB of HBM is also a <a href="https://x.com/bookwormengr/status/2057909493250539891">calculated geopolitical strategy to bypass U.S. export bans</a> on top-tier Nvidia GPUs. By reducing its dependence on HBM and Nvidia’s CUDA ecosystem, DeepSeek’s software design allows frontier AI to run efficiently on lower-cost, unsanctioned Chinese memory and storage tiers such as NAND flash, commodity SSDs, and LPDDR memory (produced by domestic giants like YMTC and CXMT). </p><p><b>Breakthrough 3: Ultra-low-footprint inference via FP4 quantization-aware training (QAT)</b></p><p>To keep compute costs low over massive context windows, DeepSeek moved away from the old approach of scanning bulky, full-precision numbers every time the model searches its memory. Instead, as detailed in the <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro">DeepSeek V4 Technical Report</a>, the lab applies quantization-aware training to the pathways the model uses to find information. Those pathways are trained with 4-bit (FP4) precision in the loop, so they can run in that compact format at inference time.</p><p>This compression slashes memory demands and delivers a 2x hardware speedup, while maintaining 99.7% accuracy in how the system targets and indexes specific data blocks. As a result, enterprise workflows can process massive, multi-step agent tasks smoothly. The model also keeps 83.5% retrieval accuracy on extreme, million-token &quot;needle-in-a-haystack&quot; benchmarks, without performance lags and without draining expensive GPU power.</p><p><b>Breakthrough 4: Ultra-scale training stability via manifold-constrained hyper-connections (mHC)</b></p><p>Training a 1.6-trillion-parameter model carries a risk of instability: signals passing through its many layers and data pathways can grow out of control and crash the run. 
DeepSeek resolved this with a framework called <a href="https://arxiv.org/abs/2512.24880">Manifold-Constrained Hyper-Connections</a> (mHC). It uses a normalization routine that forces the matrices mixing signals between the network&#x27;s parallel residual streams to have rows and columns that each sum to one. This acts as a mathematical safety valve that lets complex data move through deep networks without runaway spikes.</p><h2>The infrastructure pivot: rebuilding corporate plumbing</h2><p>DeepSeek’s architectural cache efficiency alters the underlying unit economics for the cloud platforms hosting these models. On developer aggregators like OpenRouter, third-party providers routinely offer advanced endpoints at a loss to capture developer mindshare, and there this hardware-software decoupling changes the balance sheet. DeepSeek&#x27;s extremely low serving costs likely leave it profitable even at these prices, at least when serving the model in China, Bercovici said.</p><p>This shift in provider-side unit economics is mirrored on the buy side, where enterprise IT budgets are changing structurally. VentureBeat&#x27;s Q1 2026 AI Infrastructure and Compute tracker survey tracks enterprise technology buyers at organizations with over 100 employees (n=53 in January, n=39 in February) across the software, financial services, healthcare, and manufacturing sectors. It found that enterprise adoption of custom, self-managed inference stacks built on open-source frameworks like Triton, vLLM, Ray, and Kubernetes surged from 11.3% to 17.9%. These software layers let corporate engineering teams deploy open-weights models natively across their own clusters, so they act as an operational escape hatch from closed cloud ecosystems. </p><p>This software shift is paired with an aggressive hardware migration: enterprise workloads moving to specialized, inference-first AI clouds like CoreWeave, Lambda, and Crusoe grew from 30.2% to 35.9% in the latest survey window. 
These infrastructure metrics indicate that corporate technology leaders are no longer just prototyping with open alternatives. They are actively laying down the physical plumbing required to host models like DeepSeek V4 independently, increasingly escaping the premium markup of Western API gatekeepers.</p><h2>The strategic split for Western labs</h2><p>This baseline cost reduction could soon fracture the competitive field in Silicon Valley by resetting expectations for labs trying to earn a return on massive infrastructure investments.</p><p>For now, though, the music is unlikely to stop in Silicon Valley. Anthropic remains on an extraordinary enterprise trajectory, driven by widespread adoption of Claude Code and its codebase-aware terminal execution. For enterprise engineering teams, paying a premium for Anthropic&#x27;s deterministic accuracy makes perfect sense for core production software development. Yet even an elite frontier lab scaling at this pace must watch DeepSeek warily. An open-weights model under an MIT license that offers near-frontier utility at a 75% cost reduction puts downward pricing pressure on the high-volume operational layers of any multi-agent system.</p><p>The sharpest margin squeeze may land on OpenAI, despite its aggressive pivot toward a multi-cloud footprint. To support its staggering consumer and API token volumes, OpenAI restructured its historic seven-year exclusive alliance with Microsoft, unbundling its distribution so it can serve models across Azure, Oracle, AWS, and Google Cloud. 
Yet this multi-cloud strategy, while providing raw capacity at scale, leaves the company heavily exposed to commodity pricing pressure on infrastructure.</p><p>Unlike Anthropic, which has insulated its margins by embedding its models in premium, high-utility software environments like Claude Code, OpenAI relies on high-volume, general-purpose API token streams for a large portion of its enterprise revenue. To be fair, Western labs have already begun quietly retreating from this territory, launching deep batch API discounts, prompt caching features, and lightweight entry models to stem the bleeding. Yet this tactical retreat only reinforces the structural crisis: Silicon Valley is conceding the high-volume commodity layer because the labs know they cannot defend its margins. When the same automated background workflows can be handled by highly capable open-weight models like DeepSeek V4, a premium price point for raw cloud text completion becomes impossible to defend.</p><p>More significantly, unlike OpenAI or Anthropic, DeepSeek has much less interest in rushing to build consumer wrappers or lock developers into subscription frameworks. Instead, DeepSeek is <a href="https://www.kucoin.com/news/flash/deepseek-s-10-trillion-dollar-strategy-open-source-and-ai-hardware-ecosystem">positioned for a longer-term ecosystem play</a>. 
Supported by a massive state-backed funding round led by China’s &quot;Big Fund,&quot; which has pushed the startup&#x27;s targeted valuation into the $10 billion to $45 billion range, the lab’s more likely objective is to prove the viability of a self-sufficient, independent Chinese AI hardware stack that could one day be <a href="https://x.com/bookwormengr/status/2057909493250539891">worth up to $10 trillion</a>.</p><table><tbody><tr><td><p><b>Premium deterministic tier (Anthropic / OpenAI / Google)</b></p></td><td><p><b>High-volume agentic tier (DeepSeek / open ecosystems)</b></p></td></tr><tr><td><p>• Core Codebase Refactoring</p><p>• Strict Corporate Compliance &amp; Guardrails</p><p>• Mission-Critical Financial/Legal Precision</p><p>• High CapEx / R&amp;D Premium Margins</p></td><td><p>• Recursive Multi-Agent Loops</p><p>• Prefix-Cached Autonomous Tool Swarms</p><p>• Massive Real-Time Ingestion Logs</p><p>• Bare-Metal / Optimized HBM Economics</p></td></tr></tbody></table><p>The operational division between Western labs and models like DeepSeek V4 Pro is already showing up. Fintech company Ramp benchmarked automated <a href="https://x.com/RampLabs/status/2059678575939273091">cybersecurity agent swarms</a> and found that while DeepSeek V4 Pro completely flatlines on the most complex security logic, it achieves a flawless 100% detection rate on high-volume baseline tasks like cloud configuration triage, significantly outperforming OpenAI’s GPT-5.5 (44%). For an enterprise CISO, the strategy is clear: you offload the high-volume token burn of routine background noise to cheap open weights, and reserve premium frontier models strictly for the high-level reasoning required to catch the most sophisticated flaws.</p><h2>The enterprise verdict</h2><p>For IT operations directors and data pipeline managers, the choice to migrate to an open model like DeepSeek V4-Pro is a smart governance decision. 
The open model gives companies total architectural control, allowing them to host it on-premises or on any specialized cloud layer they choose. Crucially, it provides enterprise infrastructure leads with a strategic operational fallback that closed vendors can’t match. If public cloud pricing or API access conditions change, they can download the raw model weights and run them privately, paying only for their own compute rather than per-token API fees.</p><p>The assumption that closed frontier labs hold a permanent monopoly on useful enterprise reasoning has collapsed. While engineering directors will continue to pay a premium to protect specialized, deterministic workflows, the financial foundation of the frontier lab model has fundamentally shifted. By diverting the immense, day-to-day token volume of recursive background agents onto highly optimized, open-source clusters, enterprise teams are starving proprietary clouds of their highest-margin fuel. Silicon Valley’s multi-billion-dollar token moat didn&#x27;t just narrow; it was completely drained from the bottom up.</p>]]></description>
            <author>mmarshall@venturebeat.com (Matt Marshall)</author>
            <category>Infrastructure</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/5f6tA2TPIBdaw9T4HWo7uQ/d114520ee379949d6f7f8aaf591e7754/Gemini_Generated_Image_a6l659a6l659a6l6.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Are designers the new SWEs? Figma Make's new two-way GitHub integration turns designs into live, production code — with built-in governance]]></title>
            <link>https://venturebeat.com/technology/are-designers-the-new-swes-figma-makes-new-two-way-github-integration-turns-designs-into-live-production-code-with-built-in-governance</link>
            <guid isPermaLink="false">7KWhdg1a1sXMDbZa5VbDKr</guid>
            <pubDate>Thu, 28 May 2026 15:22:00 GMT</pubDate>
            <description><![CDATA[<p>Cloud design software company <a href="https://www.figma.com/">Figma</a> is officially transforming its AI design assistant, Figma Make, from a prototyping sandbox into a live, visual software editor that connects natively to production codebases. </p><p>Announced today, <a href="https://www.figma.com/blog/figma-make-now-on-your-local-code/">the update</a> allows product managers, designers, and non-technical builders to import an existing Git repository directly into the Figma desktop app, visually edit the application&#x27;s underlying code via the canvas, and push those changes back to engineering through standard GitHub pull requests. </p><h2><b>Engineering Governance &amp; Licensing</b></h2><p>Crucially for enterprise deployments, this integration does not bypass established engineering guardrails. Figma Make operates entirely within a standard version control workflow. </p><p>The platform acts as a local development environment where design changes accumulate as local commits. </p><p>When a designer is ready to ship, they generate a branch and open a pull request (PR) directly from Figma Make.</p><p>From an enterprise governance perspective, this means visual AI edits are subject to the exact same continuous integration pipelines, security checks, and code reviews as any traditional engineering commit. </p><p>Figma Make remains a proprietary commercial service available to Full seats on Figma’s paid plans—ranging from $16 per month for Professional teams up to $90 per month for Enterprise deployments—but it interfaces cleanly with open-source and proprietary Git repositories without imposing new licensing restrictions on the generated code. 
</p><h2><b>Breaking the One-Way Barrier</b></h2><p>When <a href="https://www.figma.com/blog/introducing-figma-make/">Figma Make originally launched a year ago in May 2025</a>, it successfully bridged the gap between static wireframes and interactive prototypes, but it was structurally isolated from the real-world software lifecycle. </p><p>It operated on a rigid, one-way push mechanism: users could export an AI-generated project to a brand-new GitHub repository, but at the time, Figma Make could <i>not</i> receive upstream changes or sync with an existing codebase. </p><p>Today&#x27;s update fundamentally alters that architecture: by enabling a connection to any Git provider, builders no longer have to maintain parallel, out-of-sync environments. </p><p>Teams can connect a production or sandbox repository, highlight specific UI elements, and use natural language or contextual annotations to prompt Figma’s multi-model AI — which toggles between Anthropic’s Claude 3.7 Sonnet, Claude Opus, and Google’s Gemini models — to write the underlying code. </p><p>The agent dynamically reads the surrounding code architecture, applies the visual edits, and anchors the generated code to the team&#x27;s existing design system guidelines. </p><h2><b>The Competitive Landscape: Figma Make vs. Lovable vs. Claude Design</b></h2><p>As code generation becomes commoditized by large language models, the competition to own the visual layer of software development has fractured into distinct approaches. 
</p><p>Figma Make is no longer competing merely with other design canvases; it is contending with full-stack &quot;vibe coding&quot; platforms like <a href="https://venturebeat.com/security/vibe-coded-apps-shadow-ai-s3-bucket-crisis-ciso-audit-framework">Lovable</a> and LLM-native environments like <a href="https://venturebeat.com/technology/anthropic-just-launched-claude-design-an-ai-tool-that-turns-prompts-into-prototypes-and-challenges-figma">Anthropic&#x27;s Claude Design</a>, which just launched last month. Each platform targets a fundamentally different user and objective:</p><ul><li><p><b>Figma Make (Design-First Systems):</b> Operating at $16 to $90 per month for Full seats, Figma Make caters to established product teams that prioritize brand fidelity. It wins on design system adherence, automatically pulling from existing color tokens, typography rules, component variants, and auto-layout structures. It is built for teams that want deep, layer-based canvas manipulation while keeping code ownership strictly within their existing GitHub architecture. </p></li><li><p><b>Lovable (Code-First Production):</b> Priced at $25 per month for Pro and $50 per month for Business tiers, Lovable functions as a standalone, full-stack application builder. Unlike Figma Make, Lovable relies on a native backend architecture (often paired with databases like Supabase) and a slider-driven UI styling approach. It enforces a strict automatic two-way sync with GitHub, treating the repository as the ultimate source of truth, and is optimized for solo developers or lean startup teams looking to launch production-ready SaaS apps from scratch without maintaining heavy vector design files. </p></li><li><p><b>Claude Design (AI-Native Prototyping):</b> Anthropic’s built-in canvas environment is accessible to users on Claude Pro ($20 per month) or Max ($100–$200 per month) subscriptions. 
While lacking the granular vector control of Figma Make or the full-stack database integrations of Lovable, Claude Design is ideal for product managers and engineers who need to generate quick, functional UI prototypes and immediately hand them off to coding agents like Claude Code. However, heavy iterative design sprints can quickly burn through Anthropic&#x27;s strict token limits, making it less viable as a primary design hub. </p></li></ul><h2><b>Navigating the &quot;Vibe Coding&quot; Era</b></h2><p>The emergence of two-way repo synchronization crystallizes the enterprise reality of the &quot;vibe coding&quot; era: the primary bottleneck in product development is shifting from raw engineering bandwidth to architectural governance and design intent. Technical leaders navigating this fast-moving landscape must look past the initial marketing hype to understand exactly who stands to benefit from this new paradigm. </p><p>Figma Make is not a general-purpose, standalone application builder; instead, it is a highly specialized frontend optimization tool designed for established, mid-to-large cross-functional product teams.</p><p><b>Figma</b> notes in its documentation that designers who already have access to their company’s existing codebase are currently best suited to this functionality. Consequently, enterprise leaders should consider adopting Figma Make if they have a mature engineering organization with a well-defined design system, rigid repository guardrails, and a desire to unlock faster iteration cycles. It directly addresses the technical friction felt by the 45% of designers and 59% of product managers who already contribute to code on a regular basis but prefer to operate from a visual canvas rather than a command-line terminal. 
By turning the canvas into a local development environment, Figma Make lets these designers and product managers execute layout, typography, and color changes independently, offloading tedious frontend implementation from core engineers.</p><p>Conversely, organizations or teams launching zero-to-one skunkworks projects, or solo developers building lightweight SaaS products from scratch, will likely get more value from a code-first, full-stack platform like <b>Lovable</b>. Because Lovable natively orchestrates backend logic and database integrations like Supabase, it excels at spinning up functional applications rapidly without requiring existing vector design files or a legacy codebase to pull from. </p><p>Meanwhile, individual product managers or software engineers seeking rapid, text-prompt-driven UI wireframing without rigid design system constraints are better served by the immediacy of <b>Claude Design.</b></p><p>For the enterprise leader wary of overcommitting capital or locking custom builds into proprietary AI backends, the wisest path forward is compartmentalization. Because Figma Make works through standard Git workflows (local commits, isolated branches, and mandatory engineering pull request reviews), its changes pass through the same security and code quality gates that enterprise stability already requires. By using Figma Make as a targeted frontend bridge for existing systems, and platforms like Lovable for external, greenfield prototyping, leaders can adopt productive new AI tooling without risking their core architectural integrity.</p><h2><b>Why Figma Needs to Keep Innovating</b></h2><p>Figma completed its initial public offering on July 31, 2025, pricing its shares at $33 after institutional demand oversubscribed the deal 40 times over. The stock skyrocketed 250% on its first trading day to hit an intraday high of $115.50. 
</p><p>However, in the subsequent months, Figma&#x27;s stock (NYSE: FIG) experienced a severe correction, <a href="https://finance.yahoo.com/news/why-figma-stock-crashed-81-154000916.html">crashing 81% from its peak</a> to trade around the $21 to $22 range by May 2026, well below its IPO price. </p><p>This collapse reduced its market capitalization to approximately $11.3 billion. Financial analysts attribute this aggressive re-rating to structural IPO pricing mechanics, a low float, and the broader &quot;software apocalypse,&quot; as investors rapidly rotate capital out of traditional SaaS products and into AI-native workflows.</p><p>The stakes for Figma&#x27;s current positioning are existential. As enterprises increasingly shift their software spending toward generative AI models and AI design and coding tools like Claude Design, Claude Code, and OpenAI Codex, traditional &quot;vanilla&quot; cloud design software looks increasingly commoditized. </p><p>Figma Make represents the company&#x27;s critical counter-offensive in this era of &quot;vibe coding.&quot; To regain its premium valuation, Figma must prove to Wall Street that its platform is not merely a static vector canvas that AI tools can easily bypass, but an indispensable, live orchestration layer where human intent, enterprise design systems, and AI-generated production code seamlessly integrate. </p><p>With the new two-way GitHub integration and governance controls in Figma Make, the company appears well on its way to showing doubters that it has a path forward in the AI-powered &quot;vibe coding&quot; development era.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/4ZspLJ5z8U1am5iAvqvW27/8cce5f08e7d1c970ee18ee7eefbada18/ChatGPT_Image_May_28__2026__11_13_56_AM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[SQL query logs hold the context AI agents need to stop hallucinating joins]]></title>
            <link>https://venturebeat.com/data/sql-query-logs-hold-the-context-ai-agents-need-to-stop-hallucinating-joins</link>
            <guid isPermaLink="false">4IcRNRxzaix7iCtgBD9uK7</guid>
            <pubDate>Thu, 28 May 2026 15:00:00 GMT</pubDate>
            <description><![CDATA[<p>When Miro’s data team pointed AI agents directly at its Snowflake environment, the agents got the wrong answer more than 65% of the time. The problem wasn’t the model — it was context. With more than 10,000 tables and no semantic layer to guide routing, the agents had no way to know which data assets matched which business questions.</p><p>DataHub is releasing a context intelligence layer Thursday that mines existing SQL query history to build a semantic index — and exposes it to agents via MCP, LangChain, Google’s Agent Development Kit and CrewAI. The company calls it Context Intelligence, and it’s built on the same query-log infrastructure DataHub has used for lineage tracking in production deployments worldwide.</p><p>The company was founded by the team that built DataHub as an open source project at LinkedIn, where co-founder and CTO Shirshanka Das led data infrastructure for nearly 11 years. The open source project now has more than 15,000 contributors and 3,000 production deployments worldwide.</p><p>&quot;For the first time, enterprises can turn years of analyst query history into a living, retrievable knowledge base where agents stop hallucinating joins because they have access to the joins that have worked before, validated by the people who ran them,&quot; Shirshanka Das, co-founder and CTO of DataHub, told VentureBeat in an exclusive interview.</p><h2>Why query history beats raw schema for agent routing</h2><p>DataHub began as a metadata management project at LinkedIn, built to solve two problems simultaneously: making data easy to find and use across the organization while ensuring it was only used for the right reasons. Das open-sourced the project in early 2020 after nearly six years of internal development.</p><p>The primary use case in the years since has been lineage — understanding how data flows from operational systems through streaming infrastructure into warehouses and out to business tools. 
Regulatory compliance audits, operational triage and new engineer onboarding all depend on that lineage graph. Postgres is the most-connected source in the DataHub deployment base globally, followed by MySQL, Oracle and the major cloud warehouses including Snowflake and Google BigQuery. The platform supports more than 100 connected metadata sources.</p><p>That deployed base matters for what DataHub is releasing. The query log extraction and SQL parsing capabilities powering Context Intelligence were developed across years of production deployment, not built for this release. The same infrastructure now serves agents querying a semantic index at runtime.</p><p>&quot;The consumption layer has changed from humans to agents,&quot; Das said.</p><h2>Context Intelligence mines validated query history, not raw logs</h2><p>Context Intelligence is a new capability layer built on top of DataHub&#x27;s existing open source metadata foundation. The open source platform has spent years extracting and parsing query logs from connected warehouses for lineage tracking. That same infrastructure is what Context Intelligence draws on to build the semantic index. The capability is new. The underlying plumbing is not.</p><p><b>Filtering for signal.</b> Warehouse query logs contain too much noise to use directly. DataHub&#x27;s engine filters for what Das describes as the &quot;golden queries,&quot; meaning high-quality analyst queries and scheduled pipelines that represent proven business logic.</p><p><b>Inverting SQL into semantic definitions.</b> The engine extracts patterns from those queries and translates them into structured text definitions DataHub calls semantic anchors. Those anchors form the retrieval basis agents draw on before generating SQL.
&quot;You can almost think of it as inverting text to SQL,&quot; Das said.</p><p><b>Human validation on top.</b> Context Hub lets domain experts review AI-proposed context, resolve conflicting definitions and simulate the impact of changes before publishing. DataHub surfaces cases where different teams calculate the same metric differently and raises them for human resolution.</p><h2>How Miro got AI agents working across 10,000 Snowflake tables</h2><p>Miro, the digital collaboration platform, was already using DataHub for lineage tracking and impact analysis when it began testing analytics agents against its Snowflake environment. Ronald Angel, product manager for the data platform at Miro, told VentureBeat that the size of the data estate immediately became the problem. Sending natural language queries directly to the Snowflake MCP produced incorrect answers more than 65% of the time. With more than 10,000 tables exposed directly, agents were too confused to route queries reliably.</p><p>Miro addressed the problem by organizing data into well-defined data products that constrain what agents can see rather than exposing raw schema. The production architecture runs from user requests submitted via Claude Chat or Claude Cowork through a context layer where DataHub&#x27;s MCP maps natural language to the appropriate data assets, then hands off to Snowflake&#x27;s MCP for SQL generation.</p><p>Angel said the context layer pulls in metadata, entity relationships, query history and business intent for each Snowflake table, including the business question each entity is designed to answer. 
Those semantic signals allow the agent to identify the correct database entities before writing SQL rather than guessing from schema alone.</p><h2>Pinecone, Oracle, Redis, Microsoft: how DataHub fits the context stack</h2><p>Data vendors including <a href="https://venturebeat.com/data/the-rag-era-is-ending-for-agentic-ai-a-new-compilation-stage-knowledge-layer-is-what-comes-next">Pinecone</a>, <a href="https://venturebeat.com/data/oracle-converges-the-ai-data-stack-to-give-enterprise-agents-a-single">Oracle</a> and <a href="https://venturebeat.com/data/context-architecture-is-replacing-rag-as-agentic-ai-pushes-enterprise-retrieval-to-its-limits">Redis</a> all have contextual memory capabilities. On the platform side, Microsoft has built out its <a href="https://venturebeat.com/data/enterprise-ai-agents-keep-operating-from-different-versions-of-reality">Fabric IQ</a> as a semantic layer for context.</p><p>DataHub’s argument isn’t feature parity. The company is positioning the context layer as platform-neutral, provisioning context into existing endpoints like Snowflake semantic views and Microsoft Fabric IQ rather than replacing them.</p><p>&quot;A lot of times people want to be platform neutral when it comes to their context layer,&quot; Das said. </p><p>Kevin Petrie, an analyst at BARC, told VentureBeat that he sees DataHub&#x27;s ability to integrate diverse metadata for both structured and unstructured objects, including documents and images, as differentiating the company in the market. </p><p>&quot;Many other vendors are more focused on structured tables, which provide trusted facts but often lack the rich context of text objects,&quot; he said.</p><p>Michael Ni, VP and principal analyst at Constellation Research, told VentureBeat that what stands out to him about DataHub’s context layer is its support for the shift from passive cataloging to continuously refreshed semantic intelligence.</p><p>
Ni described the competition for context as the next major platform war, arguing that whoever controls context at runtime controls the decision layer for data, agents, workflows and decisions. </p><p>&quot;Buyers need to be careful, since many vendors only support a portion of the full context capabilities required for AI and agentic solutions,&quot; Ni said. &quot;Buyers should be clear on their context management requirements, as vector memory isn&#x27;t business meaning, business meaning isn&#x27;t governance, and governance isn&#x27;t execution.&quot;</p>]]></description>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/4duYnrZO66UDWj9I8CYJvF/e7f866540190ac5f7b57a52f8b75cdc8/mining-sql-queries-smk1.jpg?w=300&amp;q=30" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Control within connection: How data sovereignty is rewriting the rules of critical infrastructure]]></title>
            <link>https://venturebeat.com/data/control-within-connection-how-data-sovereignty-is-rewriting-the-rules-of-critical-infrastructure</link>
            <guid isPermaLink="false">41YJ6nHtJI3z6QApKx2OXV</guid>
            <pubDate>Thu, 28 May 2026 07:00:00 GMT</pubDate>
            <description><![CDATA[<p><i>Presented by Equinix</i></p><hr/><p>Digital systems are central to economic resilience. But the governance models supporting them were designed for a bygone era, when systems were smaller, often centralized, and rarely crossed multiple jurisdictions. This structural mismatch is driving a realization across boardrooms and governments that data sovereignty is not only core to critical infrastructure but also shapes the trajectory of the global economy.</p><p>The scale of change is forcing the issue. <a href="https://www.giiresearch.com/report/id1726448-worldwide-idc-global-datasphere-forecast.html?">IDC projects</a> the global datasphere will continue to grow at an extraordinary pace, driven by AI workloads, real-time analytics, and always-on digital services. This is placing unprecedented demands on data center capacity, interconnection density, and operational reliability, a trend highlighted by both <a href="https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/ai-power-expanding-data-center-capacity-to-meet-growing-demand?">McKinsey</a> and <a href="https://www.goldmansachs.com/insights/articles/how-ai-is-transforming-data-centers-and-ramping-up-power-demand?">Goldman Sachs</a> last year.</p><p>More data means demand for more infrastructure. Infrastructure expansion means more interconnected systems. And more interconnected systems mean greater exposure when control is unclear.</p><p>That is why sovereignty is now coming into focus for nation states and private sector actors alike. It’s more than an abstract legal concept: it raises practical questions about who holds authority when systems span countries, clouds, and ecosystems.</p><h2>Control determines resilience in a fragmented world</h2><p>Infrastructure resilience has always depended on clarity. Power grids work because ownership, responsibility, and control are well understood by stakeholders and the public. 
The same principle should apply to digital infrastructure, even if the underlying systems look much different.</p><p>Data sovereignty aligns authority with accountability. Organizations retain decision-making power over where data lives, how it moves, who can access it, and which technologies are allowed to touch it. When something breaks or regulators ask difficult questions, there is no ambiguity about who is responsible.</p><p><a href="https://www.gartner.com/en/articles/top-technology-trends-2026">Gartner’s Top Strategic Technology Trends for 2026</a> underscores this shift by emphasizing that modern infrastructure is inseparable from governance, resilience, and digital trust. Treating sovereignty as a bolt-on compliance requirement rather than an architectural principle is proving insufficient.</p><p>The challenge, of course, is that modern enterprises cannot simply look inward and ignore macro circumstances. Scale, performance, and innovation depend on participation in global digital ecosystems.</p><h2>A false paradox: scale vs. authority</h2><p>For years, organizations were told they had to choose. Either maintain tight control and accept limited connectivity, or embrace global platforms and accept reduced authority over data flows and infrastructure decisions. Neither holds up under real-world conditions.</p><p>Financial services firms require low-latency access to markets across regions, all while adhering to strict regulatory expectations. Healthcare organizations must have secure data control without walling themselves off from cloud-based analytics and AI innovation. Governments demand digital services that scale while remaining auditable and transparent.</p><p>This tension is why simplistic sovereignty narratives fail to pass muster. Sovereignty is more nuanced than isolation: the concept means control within connection.</p><p>The distinction is becoming clearer as hyperscalers, regulators, and enterprises sharpen their approaches. 
Public disclosures from leading hyperscalers demonstrate how sovereign cloud offerings attempt to address data residency and operational separation. However, most large organizations recognize that long-term control cannot rely on any single provider or managed platform alone.</p><h2>A distinction of responsibility leads to an industry inflection point</h2><p>The infrastructure strategies showing the most durability share a common theme: clean separation between infrastructure operations and data authority.</p><p>In this model, providers are responsible for running highly resilient facilities, physical security, power, cooling, and high-performance interconnection at scale. Customers are fully in control of their data, applications, security posture, and governance decisions. Authority stays with the party that owns the risk.</p><p>This is where a neutral infrastructure platform like Equinix comes in, not as a cloud service provider but as an interconnected foundation where customers deploy and control their own environments while accessing a broad ecosystem of networks, clouds, and partners. <a href="https://www.equinix.com/lp/equinix-sovereignty">Equinix views sovereignty</a> as customer-controlled by design, with clear boundaries around possession, custody, and control. That approach is in high demand from regulated industries.</p><p>The benefits show up in auditability, legal clarity, and operational confidence. Trust comes with verification. When responsibilities are clear, compliance is verifiable rather than assumed.</p><h2>Ambiguity is unacceptably expensive for AI workloads</h2><p>Artificial intelligence accelerates these dynamics. 
AI systems are data-hungry and regulation-sensitive, a combination that leaves little room for governance shortcuts.</p><p>Financial institutions like <a href="https://institute.bankofamerica.com/content/dam/economic-insights/utility-spending.pdf">Bank of America</a> and <a href="https://www.morganstanley.com/ideas/ai-energy-demand-infrastructure">Morgan Stanley</a> have forecasted AI-driven data center growth will place new pressure on infrastructure planning, energy availability, and geographic distribution. Simultaneously, AI models need to operate close to sensitive data, rather than exporting that data across borders for centralized processing.</p><p>Without a clear sovereignty framework, organizations face difficult compromises. But with one, they achieve flexibility. Models move to data. Data remains controlled. Innovation accelerates without triggering regulatory alarms.</p><p>That balance is emerging as a competitive differentiator.</p><h2>Infrastructure in 2026 looks different, and expectations are reset</h2><p>The critical infrastructure powering the digital economy goes beyond physical assets. It now includes governance models, legal posture, and control structures that determine how systems behave under pressure.</p><p><a href="https://digital-strategy.ec.europa.eu/en/library/state-digital-decade-2025-report?">European Commission updates</a> to data sovereignty and digital strategy frameworks reflect this, as governments increasingly treat data governance as a matter of economic and national resilience. <a href="https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2026/tech-sovereignty.html?">Deloitte’s digital sovereignty research for 2026</a> echoes that theme across global enterprises, especially those operating in multiple regulatory regimes.</p><p>The organizations adapting fastest are not retreating from global connectivity. 
Rather, they are designing for it and embedding sovereignty as an architectural requirement. As enterprises navigate more fragmented regulatory environments, the ability to maintain jurisdictional control across interconnected digital ecosystems is a baseline infrastructure expectation rather than a specialized requirement.</p><p>That expectation is now shaping how infrastructure is built. Enterprises increasingly require network-level sovereignty enforcement that operates across hybrid multicloud environments automatically, including during outages, failovers, and congestion events where data can cross borders invisibly. Capabilities such as <a href="https://newsroom.equinix.com/2026-05-14-Equinix-Puts-Enterprises-in-Control-of-Data-Sovereignty-Across-Hybrid-Multicloud-Environments">Equinix Fabric Geo Zones</a> reflect that demand; Equinix describes Geo Zones as the first network-level, multicloud sovereignty enforcement layer built natively into the interconnection fabric itself.</p><p>The rules of infrastructure are being rewritten. Data sovereignty is the architectural foundation that resilient, globally connected enterprises demand. Organizations that treat it as such will be better equipped to operate, compete, and withstand pressure. Those that do not will find the ambiguity of the status quo increasingly costly.</p><hr/><p><i>Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact </i><a href="mailto:sales@venturebeat.com"><i><u>sales@venturebeat.com</u></i></a><i>.</i></p>]]></description>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/40FfJJXZ08UU3l5DwjYbQt/40ddcb825820133d640aa9971817094b/AdobeStock_1221490644.jpeg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[MiniMax teases upcoming M3 model with new sparse attention mechanism and 15.6X long-context response speed boost]]></title>
            <link>https://venturebeat.com/technology/minimax-teases-upcoming-m3-model-with-new-sparse-attention-mechanism-and-15-6x-response-speed-boost</link>
            <guid isPermaLink="false">3FoRBcJFdIFfZum4tICG2X</guid>
            <pubDate>Wed, 27 May 2026 19:59:06 GMT</pubDate>
            <description><![CDATA[<p>Among the many Chinese AI companies and laboratories vying for market share and attention (no pun intended) on the global marketplace, <a href="https://www.minimax.io/">MiniMax</a> stands out for its commitment to providing frontier-level intelligence across a range of modalities, including text, coding, and video (through its <a href="https://hailuoai.video/">Hailuo</a> model series), often under permissive, enterprise-friendly, standard open source licenses. </p><p>Now, MiniMax is again raising the eyebrows of AI power users and developers around the world with a new, <a href="https://huggingface.co/papers/2605.26494">in-depth technical report</a> on the making of its popular M2 series of language models (<a href="https://venturebeat.com/ai/minimax-m2-is-the-new-king-of-open-source-llms-especially-for-agentic-tool">M2</a>, <a href="https://venturebeat.com/technology/minimaxs-new-open-m2-5-and-m2-5-lightning-near-state-of-the-art-while">M2.5</a>, and <a href="https://venturebeat.com/technology/new-minimax-m2-7-proprietary-ai-model-is-self-evolving-and-can-perform-30-50">M2.7</a>), which sheds light on its many engineering innovations. At the same time, the company and its leaders teased a new <a href="https://x.com/SkylerMiao7/status/2059285750458544561">sparse attention approach for its upcoming MiniMax M3 series of models</a>, which MiniMax says yields up to 15.6 times faster decoding (that is, LLM response) speed at long contexts (a million tokens) by adopting a custom sub-quadratic framework. In doing so, MiniMax aims to make ultra-long-context AI agent deployment economically viable.</p><p>The M2 report is noteworthy for any enterprise working with AI models, especially those looking to fine-tune or train their own models in-house. After all, MiniMax&#x27;s M2 series models often posted the top benchmark scores among open source models when they were released. 
</p><p>While that title has since been <a href="https://artificialanalysis.ai/models/open-source">eclipsed</a> by several other Chinese labs including DeepSeek and Xiaomi, MiniMax&#x27;s new report offers a blueprint that enterprises around the world can use to improve AI model and agent performance.</p><p>As Adina Yakup of Hugging Face <a href="https://x.com/AdinaYakup/status/2059567862134485043">observed on X</a>, &quot;Beyond the benchmarks, they’ve done some really solid work on MoE efficiency and agent oriented design. Excited to see where M3 goes next!&quot; </p><h2><b>The attention dilemma</b></h2><p>The M2 series is built on a sparse Mixture-of-Experts (MoE), decoder-only Transformer architecture, a layout shared by many other state-of-the-art LLMs.</p><p>The backbone holds 229.9 billion total parameters, yet keeps a remarkably lean operational footprint: each token activates just 9.8 billion parameters, drawn from a pool of 256 fine-grained experts. </p><p>To keep routing balanced across those experts without leaning heavily on restrictive auxiliary losses, MiniMax implemented sigmoid gating paired with learnable, expert-specific bias terms.</p><p>The most consequential engineering decision documented in the M2 paper was the strict adherence to full attention, implemented with Grouped Query Attention (GQA), across all 62 layers. </p><p>In large language models, &quot;quadratic scaling&quot; refers to the computationally expensive reality of standard full attention, where every token in a sequence is compared with every other token it can see. To use a real-world analogy, it is akin to attending a networking event and being forced to have a deep conversation with every single person in the room while simultaneously monitoring all other ongoing conversations. 
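</p><p>To make the quadratic cost concrete, here is a minimal Python sketch of naive single-head causal attention. It is a textbook illustration under simplifying assumptions (toy vectors, one head, no GQA, no batching), not MiniMax’s implementation, and the helper name <code>causal_attention</code> is hypothetical. Because each token in a decoder is scored against itself and every earlier token, the function also counts how many query-key scores it computes.</p><pre><code class="language-python">import math


def causal_attention(queries, keys, values):
    """Naive single-head causal attention (hypothetical helper).

    queries, keys, values: lists of equal-length float vectors, one per token.
    Returns (outputs, score_count): the attended output vector for every
    token, and the number of query-key dot products that were evaluated.
    """
    outputs = []
    score_count = 0
    for i, q in enumerate(queries):
        scale = 1.0 / math.sqrt(len(q))
        # Token i is scored against itself and every earlier token.
        scores = []
        for j in range(i + 1):
            scores.append(sum(a * b for a, b in zip(q, keys[j])) * scale)
            score_count += 1
        # Softmax over the visible positions (subtract the max for stability).
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted sum of the visible value vectors.
        out = [0.0] * len(values[0])
        for w, v in zip(weights, values[: i + 1]):
            for d in range(len(out)):
                out[d] += (w / total) * v[d]
        outputs.append(out)
    return outputs, score_count
</code></pre><p>For n tokens the count is n(n + 1)/2: 4 tokens need 10 scores and 8 tokens need 36. At realistic lengths the growth is stark: 1,000 tokens need 500,500 scores, while 2,000 tokens need 2,001,000, roughly four times as many for twice the input.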
</p><p>While this approach yields incredibly thorough context, the compute (and, in naive implementations, the memory) required grows with the square of the input length, creating a severe hardware bottleneck as models attempt to ingest hundreds of thousands of words.</p><h2><b>The problem with sub-quadratic scaling</b></h2><p>&quot;Sub-quadratic&quot; scaling introduces architectural shortcuts designed to avoid this quadratic computational load. Instead of mapping every possible connection, sub-quadratic methods, such as Sliding Window Attention or compressed linear attention, might only analyze a localized window of nearby words or generate a compressed summary of the broader text. </p><p>These efficient methods drastically reduce hardware costs and allow models to process massive documents at high speeds, but they have historically introduced severe trade-offs in accuracy, often causing the AI to miss the &quot;big picture&quot; or lose track of distant context.</p><p>This trade-off defines the architectural evolution from MiniMax&#x27;s M2 to its upcoming M3 series. During M2&#x27;s development, researchers rigorously tested sub-quadratic shortcuts but found they crippled the model&#x27;s &quot;multi-hop reasoning&quot; (its ability to connect disparate clues across a long document), forcing the team to absorb the heavy computational cost of full quadratic attention to maintain frontier-level intelligence. </p><p>The team benchmarked efficient attention alternatives extensively during pre-training but ultimately discarded them. They experimented with hybrid setups that interleaved full attention with sub-quadratic architectures like Lightning Attention, as well as hybrid Sliding Window Attention (SWA) configurations.</p><p>The empirical results were definitive: at larger scale, linear and windowed attention variants exhibited severe reasoning deficits. 
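</p><p>One intuition for why windowed attention loses distant context fits in a few lines of Python. This sketch assumes plain causal sliding-window attention with a fixed window of <code>window</code> tokens; the function names are hypothetical and this is not MiniMax’s experimental setup. Each token sees only its recent neighbors directly, so cost grows roughly linearly with length, but information from far back has to be relayed through intermediate tokens across stacked layers.</p><pre><code class="language-python">def visible_positions(i, window):
    """Positions token i can attend to under causal sliding-window attention.

    Token i sees itself and up to window - 1 earlier tokens.
    """
    start = max(0, i - window + 1)
    return list(range(start, i + 1))


def windowed_score_count(n, window):
    """Query-key scores needed for n tokens with a sliding window."""
    return sum(len(visible_positions(i, window)) for i in range(n))


def layers_to_reach(distance, window):
    """Fewest stacked layers before a token can receive information
    from a token `distance` positions earlier.

    Each layer extends the reach by window - 1 positions, so the answer
    is ceil(distance / (window - 1)).
    """
    if not window > 1:
        raise ValueError("window must be at least 2")
    step = window - 1
    return -(-distance // step)  # ceiling division on integers
</code></pre><p>With 8 tokens and a 4-token window, the sketch evaluates 26 scores instead of the 36 needed by full causal attention, and the gap widens as sequences grow. The catch shows up in <code>layers_to_reach</code>: with an assumed 4,096-token window, a fact 128,000 tokens back needs at least 32 stacked layers just to be relayed to the current token, and every hop is a chance to dilute it, one plausible reason windowed variants struggle with multi-hop reasoning.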
</p><p>On evaluations with contexts longer than 32K tokens, SWA variants performed significantly worse than full attention, dropping from a baseline score of 90.0 to 72.0 on the RULER 128K complex word extraction task. </p><p>Sub-quadratic configurations also proved prone to memory-bound constraints during training, lacked native prefix caching support, and did not align smoothly with the Multi-Token Prediction (MTP) modules used for speculative decoding. Full attention was deemed necessary to preserve multi-hop reasoning capability.</p><p>However, recognizing that physical hardware limits cannot sustain quadratic scaling indefinitely, MiniMax is designing the M3 series around a novel sub-quadratic framework intended to deliver both high-speed processing and uncompromised reasoning.</p><h2><b>MiniMax Sparse Attention (MSA) and sub-quadratic scaling incoming</b></h2><p>The upcoming MiniMax-M3 breaks away from the compute-heavy constraints of its predecessor. As disclosed by MiniMax’s engineering team under the banner &quot;Something BIG is coming,&quot; M3 introduces &quot;MiniMax Sparse Attention&quot; (MSA). </p><p>Unlike DeepSeek’s Multi-head Latent Attention (MLA), which compresses keys and values into a low-dimensional latent space, MSA runs on a standard GQA backbone and selects blocks of the sequence, then attends over their real, uncompressed key-value (KV) entries. </p><p>Elie Bakouch of Prime Intellect, an AI training infrastructure and platform lab, <a href="https://x.com/eliebakouch/status/2059321928205156568">noted on X</a> that the main changes feature &quot;block level selection like in CSA but attention is done on the real KV, not in [compressed space].&quot; </p><p>This approach targets the precision-loss and prefix-caching obstacles noted in the M2 paper. 
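</p><p>MiniMax has not published how MSA scores or selects blocks, so the following is only a generic block-sparse attention sketch for a single decoding step, under hypothetical assumptions: keys are grouped into fixed-size contiguous blocks, each block is scored by the query’s dot product with the block’s mean key, the <code>top_k</code> best blocks are kept, and exact softmax attention then runs over the real, uncompressed keys and values inside those blocks, the property Bakouch highlights. The function name and scoring rule are illustrative choices, not MiniMax’s API.</p><pre><code class="language-python">import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend from one query to the top_k highest-scoring key blocks.

    Hypothetical illustration: blocks are scored by the query's dot
    product with the block's mean key. Attention inside the selected
    blocks uses the original keys and values, not a compressed copy.
    Returns (output_vector, selected_block_indices).
    """
    dim = len(keys[0])
    # 1. Split positions into contiguous blocks (the last may be shorter).
    blocks = [list(range(s, min(s + block_size, len(keys))))
              for s in range(0, len(keys), block_size)]
    # 2. Score each block cheaply via its mean key.
    block_scores = []
    for b, positions in enumerate(blocks):
        mean_key = [sum(keys[p][d] for p in positions) / len(positions)
                    for d in range(dim)]
        block_scores.append((dot(query, mean_key), b))
    # 3. Keep the top_k blocks (ties broken by lower block index).
    chosen = sorted(block_scores, key=lambda t: (-t[0], t[1]))[:top_k]
    selected = sorted(b for _, b in chosen)
    # 4. Exact scaled dot-product softmax over the real keys in those blocks.
    positions = [p for b in selected for p in blocks[b]]
    scale = 1.0 / math.sqrt(dim)
    scores = [dot(query, keys[p]) * scale for p in positions]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, p in zip(weights, positions):
        for d in range(len(out)):
            out[d] += (w / total) * values[p][d]
    return out, selected
</code></pre><p>Per query, the work drops from one score per token to one cheap score per block plus exact scores for <code>top_k</code> × <code>block_size</code> tokens; setting <code>top_k</code> to the number of blocks recovers ordinary full attention. Because the selected keys and values are the originals rather than a compressed copy, nothing is approximated inside the chosen blocks, which is the contrast with MLA-style compression drawn above.</p><p>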
By filtering and selecting block-level sequences dynamically, MSA delivers an architectural leap: early hardware profiling indicates a 9.7x speedup in prefilling latency and a 15.6x speedup during decoding at a 1-million-token sequence length compared to the full-attention M2 architecture.</p><p>To understand why a speedup in the &quot;decoding phase&quot; is so significant, it helps to break down how an AI actually reads and writes information. When you interact with an AI, the processing happens in two distinct steps: prefilling and decoding.</p><p>When you hand an AI a prompt, whether it’s a short sentence or a massive 1,000-page document, it processes that entire chunk of text at once, in parallel, in a step known as &quot;prefilling.&quot; It essentially &quot;reads&quot; the input in one big gulp to build its initial understanding and establish context.</p><p>To generate a response, the AI must then enter a &quot;decoding phase.&quot; To predict the first word of its response, it looks at the prompt. To predict the second word, it has to look at the prompt <i>plus</i> the first word. To predict the hundredth word, it must look back over the prompt <i>and</i> the previous 99 words it just wrote. So each new word costs more to produce than the last, and the final words require a pass over everything that came before.</p><p>For a layperson, imagine reading a dense legal brief (prefilling) and then being forced to write a summary report where, before writing every single new word, you must rapidly reread the entire brief plus everything you&#x27;ve written so far to ensure your next word makes sense (decoding).</p><p>Because the AI must constantly and repetitively look backward to generate each new step forward, the decoding phase is the most severe computational bottleneck in generating text. 
It is why AI models often type out their answers word-by-word, and why they slow down significantly as conversations get longer.</p><p>So when MiniMax reports a 15.6x speedup during the decoding phase at a 1-million-token sequence length, it means the model has found a structural shortcut to generate its answer, token by token, nearly 16 times faster. It targets the exact bottleneck that normally makes AI chatbots freeze or stutter when handling massive amounts of information.</p><h2><b>The evolution of the MiniMax M series and the creation of &#x27;Forge&#x27;</b></h2><p>On a product level, MiniMax has consistently evolved its models from simple text generation interfaces into autonomous workers. </p><p>The M2 series pioneered an &quot;interleaved thinking&quot; protocol in which the model alternates between natural-language planning traces and explicit tool invocations inside a single trajectory. Rather than dropping the intermediate chain-of-thought blocks between execution turns, M2 appends the full thinking history directly into the conversation context. This planning persistence prevents state drift, allowing the model to recover gracefully from runtime errors and revise its strategies based on environment feedback.</p><p>To train these long-horizon workflows, MiniMax built &quot;Forge,&quot; a scalable agent-native reinforcement learning system. Forge decouples execution into three independent modules: the Agent Side, the middleware abstraction layer (Gateway Server and Data Pool), and the Training/Inference engines. </p><p>As MiniMax engineer <a href="https://thursdai.news/guests/olive_jy_song">Olive Song explained on the ThursdAI podcast</a>, &quot;What we realized is that there&#x27;s a lot of potential with a small model like this if we train reinforcement learning on it with a large amount of environments and agents... 
But it&#x27;s not a very easy thing to do,&quot; adding that this environmental training was where the team spent a significant portion of their development timeline. To absorb the extreme trajectory-length variance common in multi-step agent environments, Forge implements two vital engineering solutions:</p><ol><li><p><b>Windowed FIFO Scheduling: </b>A training scheduler that maps a sliding window over the generation queue. It permits greedy, high-throughput fetching of completed tasks within the window to prevent cluster idle time, while strictly enforcing FIFO boundaries to maintain distributional stability and avoid gradient oscillation.</p></li><li><p><b>Prefix Tree Merging:</b> An optimization that restructures batch training into tree computation. Completions sharing identical conversation prefixes are calculated exactly once in the forward pass before branching. This eliminates redundant calculations, yielding up to a 40x training speedup with zero approximation error.</p></li></ol><p>This reinforcement infrastructure directly spawned the M2.7 checkpoint, moving the series toward &quot;self-evolution.&quot; Operating inside an automated agent harness, M2.7 functions as an independent machine learning engineer. The model profiles its own active training runs, diagnoses anomalies, reads logs, and automatically modifies its own codebase and configurations. </p><p>According to MiniMax, M2.7 successfully handled between 30% and 50% of its own development workflow. </p><p>On OpenAI’s rigorous MLE Bench Lite suite, which tests autonomous ML research capability, M2.7 achieved a 66.6% medal rate across independent 24-hour trials, effectively tying Google’s closed-weight Gemini 3.1 Pro.</p><p>The steady cadence from M2 to M2.5, which MiniMax says completed 30% of internal tasks and wrote 80% of newly committed code at MiniMax HQ, underlines a broader vision. 
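</p><p>Forge&#x27;s Prefix Tree Merging, listed above, relies on a basic property of causal (decoder-only) transformers: a token&#x27;s activations depend only on the tokens before it, so completions that share a conversation prefix can share that prefix&#x27;s computation with no approximation. The Python sketch that follows is a hypothetical illustration, not MiniMax&#x27;s implementation, and its names (<code>PrefixTrieNode</code>, <code>token_positions</code>) are invented for the example. It only counts the token positions a forward pass would cover with and without merging; a real system would also need tree-shaped attention masks and matching gradient bookkeeping.</p>
```python
from typing import Dict, List, Sequence, Tuple


class PrefixTrieNode:
    """One token position; every sequence passing through it shares its computation."""

    def __init__(self) -> None:
        self.children: Dict[int, "PrefixTrieNode"] = {}


def build_prefix_tree(sequences: Sequence[Sequence[int]]) -> PrefixTrieNode:
    """Insert each token sequence; identical prefixes reuse existing nodes."""
    root = PrefixTrieNode()
    for seq in sequences:
        node = root
        for token in seq:
            node = node.children.setdefault(token, PrefixTrieNode())
    return root


def count_positions(root: PrefixTrieNode) -> int:
    """Count non-root nodes iteratively (avoids recursion limits on long prefixes)."""
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += len(node.children)
        stack.extend(node.children.values())
    return total


def token_positions(sequences: Sequence[Sequence[int]]) -> Tuple[int, int]:
    """Return (naive, merged): positions computed without and with prefix sharing."""
    naive = sum(len(seq) for seq in sequences)
    merged = count_positions(build_prefix_tree(sequences))
    return naive, merged


# Hypothetical batch: 8 rollouts share a 1,000-token prefix, then diverge for 50 tokens.
prompt: List[int] = list(range(1_000))
batch = [prompt + [10_000 + 100 * k + j for j in range(50)] for k in range(8)]
print(token_positions(batch))  # (8400, 1400)
```
<p>In this toy batch, merging cuts the token positions about sixfold. The savings grow with longer shared prefixes and more completions per prefix.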
</p><p>As the MiniMax team noted during that phase of deployment, &quot;we believe that M2.5 provides virtually limitless possibilities for the development and operation of agents in the economy.&quot; </p><p>With the technical report codifying the M2 generation&#x27;s successes and the MSA tech blog on the horizon, MiniMax is signaling that the next frontier of AI is explicitly about translating a mini-activation footprint into maximum real-world intelligence.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/5cHY46ueCDnRXIZBImepQn/55f72e82f91c3b690c7c5245d745f24b/Gemini_Generated_Image_xnaqxbxnaqxbxnaq__1_.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Merck and Mastercard are seeing real agentic AI results. Both say the plumbing came first.]]></title>
            <link>https://venturebeat.com/infrastructure/merck-and-mastercard-are-seeing-real-agentic-ai-results-both-say-the-plumbing-came-first</link>
            <guid isPermaLink="false">2bQbPPxFYTJ9lZk20p1OHC</guid>
            <pubDate>Wed, 27 May 2026 18:23:45 GMT</pubDate>
            <description><![CDATA[<p>Merck is using AI agents to cut one drug discovery cycle by a third and ship compliant marketing materials up to 80% faster, but VP of Digital Platforms Sean Finnerty says it is working only because the company built the infrastructure first.</p><p>And the pharmaceutical manufacturer is seeing promising early results: AI is generating marketing drafts that are “99% right” when it comes to compliance, shrinking review cycles from months to days and accelerating delivery by 70% to 80%. In the company’s medical research, meanwhile, one AI-assisted discovery cycle was reduced by 33%.</p><p>Still, agentic AI only works if companies first build the underlying “plumbing” of digital platforms and services, Finnerty said at a recent AI Impact Series event. </p><p>“If we do one-offs, we&#x27;re gonna end up with thousands and thousands of things that are ultimately just gonna be debt that we&#x27;ll have to deal with later,” he said. “And that&#x27;s gonna be a drag on any further innovation.” </p><h2>Starting with the plumbing</h2><p>Merck’s plumbing-first strategy comes from lessons learned during the early days of cloud in the 2010s “when nobody knew what the heck was going on,” Finnerty said. </p><p>Getting the cloud right meant building from the ground up; at Merck, that infrastructure now supports 2,500 AWS accounts, numerous Microsoft Azure subscriptions, and new Google Cloud Platform (GCP) integrations. </p><p>“AI is gonna be the same exact thing,” Finnerty said. “We&#x27;re going to have thousands and thousands of agents.” The questions then pile up: How do you register them? How do you secure them? How do you ensure they&#x27;re connected to the right tools, and have access to the right data and the right context? </p><p>Context delivery is also critical; Merck works with three hyperscalers and has 47 edge locations and hundreds of databases. 
“Many, many petabytes” of structured and unstructured data are stored in Oracle databases, SQL databases, Excel spreadsheets, phone transcripts, and other repositories, Finnerty said. </p><p>His team is building scaffolding to deliver meaningful context in various situations, he explained. Data must be organized and ingested into various platforms, because “there’s no one solution to solve every single problem.” Sometimes it&#x27;s Databricks, other times it&#x27;s Amazon Redshift, “plus four other things.” </p><p>The goal is: “Let&#x27;s make that easy and frictionless for people to do, and secure it, and make sure it&#x27;s well integrated with MCP [model context protocol], and A2A [Agent2Agent], and upstream compute,” Finnerty said. “If you wanna run stuff on GCP or you wanna run stuff on AWS, we&#x27;ve got the plumbing in place so you can run your adjacent workloads wherever you want.” </p><h2>How Merck is using agents</h2><p>As it builds out its technical plumbing, Merck is experimenting with agents across regulated enterprise operations, scientific discovery workflows, and app modernization. </p><p>Notably, AI is accelerating drug discovery. Finnerty explained that scientists look at molecular structures and disease states to determine if a given condition is druggable. But even if a disease state is known, developing a drug to target it can take years. </p><p>Now with AI, teams are starting to see “very promising things,” such as cutting one particular research cycle down by one-third. “That&#x27;s a year off of the life of the discovery cycle,” Finnerty said. “Which means, theoretically, we can get it to a patient who needs that therapy a year faster.” </p><p>Once developed and approved, these products are regulated and marketing materials around them must be clearly and explicitly articulated. “The way you communicate that information per market, per country, per state, per region, is all very carefully governed and regulated,” Finnerty said. 
It’s also variable: An ad campaign for a vaccine in the state of Georgia looks much different from one launched in Canada. </p><p>Historically, humans did the due diligence to make sure the company complied with various laws. Draft materials go through iterations of reviews; when a mistake is discovered, it gets “kicked back to the beginning, and it goes through it again, and then it takes another however many weeks and months,” Finnerty said. </p><p>But now, AI can do that “much, much more effectively,” and the process is increasingly evolving from human-in-the-loop to essentially &quot;human-as-governor.&quot; With human oversight, AI can deliver a first draft that is 99% there in a day or a week, allowing teams to ship materials up to 80% faster. </p><p>Meanwhile, in app modernization, AI can discover an application&#x27;s architecture, document its data interactions, APIs, and network paths, and run authentication and authorization checks; it can also write Terraform code for deployment and refactor JavaScript into Python. </p><p>Where the company previously would have spent weeks or months and hundreds of thousands of dollars to update one application, Finnerty said, agents are now handling the work through prompts.</p><h2>Running into &quot;wackiness&quot;</h2><p>That’s not to say there aren’t significant challenges. Finnerty noted that his team has run into some “wackiness,” for example in automated code and scenario testing. AI has blatantly made up scenarios, whether due to incorrect context, infrastructure, “or if it was just getting creative with, ‘You should be testing these three functions that don&#x27;t even exist in the code that you&#x27;re trying to test.’”</p><p>“That surprised me a little bit because I thought we were further past some of the hallucination challenges in these later models,” he said. 
</p><p>To address this, his team has engineered guardrails to keep hallucinations to a minimum, essentially using AI to supervise AI and applying confidence scores. For example, if Claude created the first output, they’ll instruct Microsoft Copilot to assess it. </p><p>“So if you ask something once, have AI check it, then ask it a third time, the confidence increases every time, and it minimizes some of the garbage that gets created in the early runs,” Finnerty said. </p><h2>Use cases for agentic AI in financial services</h2><p>Meanwhile, at Mastercard, Chief Data Officer Andrew Reiskind and his team are focusing agentic experimentation on highly orchestrated transaction and dispute workflows. As he noted, a chargeback or fraud dispute is not a single event.</p><p>When a consumer disputes a charge (typically online), that “kicks off an entire other process on the back-end that tends to be very labor-intensive,” Reiskind said. </p><p>Mastercard has to collect specifics about the actual dispute; then the merchant has its own investigations (Was the card reported as lost or stolen? Does the consumer dispute charges often?). Further, the network sitting in the middle has its own rules for timing and information submission. </p><p>“You have each and every one of these steps, many of which are unstructured, but there are also structured data elements to this,” Reiskind said. Whether a card was lost or stolen tends to be structured, but the consumer complaint is “unstructured data of questionable reliability.”</p><p>“So you&#x27;re sitting there with a decisioning system that has deterministic decisions, but also probabilistic decisions,” he said.</p><p>AI agents could speed up, and potentially solve, this process, but deploying them raises complex questions: Which tasks are you handing off to agents? When are they kicking things back to human reps? How many agents are you ultimately using? What are the cost implications? 
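</p><p>The mix Reiskind describes, structured facts that can settle a case outright alongside unstructured claims that can only be scored, maps naturally onto a two-stage router. The Python sketch that follows is a hypothetical illustration, not Mastercard&#x27;s system: the <code>Dispute</code> fields, the five-dispute rule, the 0.9 and 0.6 thresholds, and the outcome labels are all invented for the example, and the credibility score stands in for whatever model an agent might use.</p>
```python
from dataclasses import dataclass


@dataclass
class Dispute:
    # Hypothetical fields for illustration only.
    card_reported_lost_or_stolen: bool  # structured: feeds a deterministic rule
    disputes_last_12_months: int        # structured: feeds a deterministic rule
    complaint_credibility: float        # unstructured claim scored by a model, 0.0 to 1.0


def route_dispute(d: Dispute, auto_threshold: float = 0.9, human_threshold: float = 0.6) -> str:
    # Stage 1: deterministic rules on structured data settle or escalate the case.
    if d.card_reported_lost_or_stolen:
        return "approve_chargeback"
    if d.disputes_last_12_months >= 5:
        return "human_review"
    # Stage 2: probabilistic score on the unstructured complaint.
    if d.complaint_credibility >= auto_threshold:
        return "approve_chargeback"  # the agent acts alone only when confident
    if d.complaint_credibility >= human_threshold:
        return "human_review"        # ambiguous: kick the case back to a human rep
    return "request_more_evidence"   # ask for evidence rather than reject outright


print(route_dispute(Dispute(False, 0, 0.72)))  # human_review
```
<p>Running the deterministic checks first keeps the cheap, auditable rules ahead of the model, and the two thresholds become explicit knobs for deciding when an agent hands a case back to a human rep.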
</p><p>Then there are reputational questions and costs: Have you potentially just called a consumer a liar when they weren&#x27;t lying? </p><p>“It&#x27;s an exact problem where you want to, as a bank, maintain trust with your consumer,” Reiskind said. “But you also wanna make this efficient and take costs out of the system.” </p><h2>The PB&amp;J versus turkey mistake: Determine what risks are acceptable</h2><p>There’s always going to be risk with AI, and enterprises should assess it from the beginning of product design, Reiskind said. There’s also the question of acceptable risk. </p><p>As an example: Did you serve a customer a peanut butter and jelly sandwich instead of a turkey sandwich (a minor inconvenience)? Or did you serve gluten to someone with celiac disease?</p><p>“Is it an acceptable risk if one percent of the time it makes the mistake? If it is, let&#x27;s go to the next stage of how you&#x27;re mitigating that risk,” Reiskind said. </p><p>Leaders must perform cost-benefit analysis, break problems down to their “constituent pieces,” and calculate the cost of each one. But these are estimates; it’s near-impossible to forecast real usage, Reiskind said. “It is not a simple process to get to the cost,” he said. “But it is doable.”</p>]]></description>
            <author>taryn.plumb@venturebeat.com (Taryn Plumb)</author>
            <category>Infrastructure</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/4CBUUbOwnAvyCtaosmRxYr/eea81e58c1a207d16773ebad282b251f/Collaboration.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[DataGrail report finds your vendor may be sending data to AI models you never approved]]></title>
            <link>https://venturebeat.com/security/datagrail-report-finds-your-vendor-may-be-sending-data-to-ai-models-you-never-approved</link>
            <guid isPermaLink="false">6QkX7JTVdaJWQeypRbzgJ7</guid>
            <pubDate>Wed, 27 May 2026 16:00:00 GMT</pubDate>
            <description><![CDATA[<p>The data processing agreement (DPA), the bedrock contract companies use to evaluate how vendors handle personal data, can no longer be trusted at face value. That is the central, and arguably most alarming, conclusion of DataGrail&#x27;s <a href="https://www.datagrail.io/resources/interactive/data-privacy-trends-report-2026/"><i>Privacy and AI Trends Report 2026</i></a>, released today.</p><p>The San Francisco-based privacy platform analyzed 2,400 popular business software providers and found that 63.6% of vendors that prominently advertise AI capabilities do not disclose a third-party AI subprocessor in their legal documentation. The implication: the majority of companies purchasing AI-enabled software may be unknowingly exposing their customers&#x27; data to AI models and pipelines they never reviewed, never approved, and may not even know exist.</p><p>&quot;All software vendors are trying to move to become AI vendors, which makes sense, but the technologies are moving faster than AI governance can actually keep up,&quot; DataGrail co-founder and CEO Daniel Barber told VentureBeat in an exclusive interview ahead of the report&#x27;s release. &quot;The DPA should be the reliable document that teams use to evaluate AI risk, but based on that number, that&#x27;s not enough in 2026.&quot;</p><p>The finding drops into an enterprise landscape where organizations with high levels of shadow AI already experience average breach costs of $4.63 million, $670,000 more than those with low or no shadow AI, according to IBM&#x27;s <a href="https://www.ibm.com/reports/data-breach">2025 Cost of Data Breach Report</a>. 
And it arrives after U.S. states issued <a href="https://www.gartner.com/en/newsroom/press-releases/2026-04-28-gartner-estimates-us-states-privacy-fines-totaled-3-point-425-billion-dollars-in-2025-trend-expected-to-accelerate-through-2028?utm_campaign=SM_GB_YOY_GTR_SOC_SF1_SM-PR&amp;utm_source=threads,twitter&amp;utm_medium=social">$3.425 billion in privacy-related fines</a> in 2025, more than in the previous five years combined; Gartner expects the trend to accelerate through 2028.</p><h2><b>How researchers uncovered the growing gap between AI vendor contracts and reality</b></h2><p>DataGrail&#x27;s methodology for arriving at the 63.6% figure goes well beyond reading contracts. The company&#x27;s research team cross-referenced DPA disclosures against product documentation, GitHub environments, API connections, and marketing materials for each of the 2,400 vendors in its tracking universe.</p><p>Barber walked VentureBeat through the process: &quot;We looked at the DPA as the baseline, but then what we also looked at is the GitHub environment, the API connections that a particular vendor has, the product documentation, the marketing documentation, and triangulate that information to discern — okay, so the DPA document says use OpenAI, but actually you&#x27;ve got these three AI subprocessors over here in your product documentation outlining features and functionality, but that is not reflected in your DPA.&quot;</p><p>When asked directly how confident he was that these gaps represent actual shadow AI risk rather than vendors using proprietary technology, Barber was unequivocal. &quot;Very confident, because we looked at the sample of the 2,400 systems, and we spent a substantial amount of time actually looking at product documentation, GitHub environments, looking at actual API connections, because we integrate with these systems as well, so we know how they process personal information. 
It is from primary research.&quot;</p><p>The disclosure gap matters because it undermines the entire chain of trust that privacy programs rely on. Consider a scenario Barber described: A company invests in an AI recruiting tool. The tool&#x27;s DPA lists Claude as its foundational model. The company dutifully performs a security review of Anthropic&#x27;s AI. But the recruiting tool also quietly uses OpenAI and Gemini behind the scenes — models the company never evaluated. </p><p>Those undisclosed models then process thousands of resumes and execute automated hiring decisions. The company, without knowing it, has exposed sensitive personal information — home addresses, financial data, possibly Social Security numbers — to AI systems it never vetted, potentially violating FTC regulations on automated decision-making in employment. &quot;How those vendors are evaluating and performing that automated decision making could be really disastrous for a business,&quot; Barber said.</p><h2><b>One-third of AI systems also process sensitive data, and the true number is likely higher</b></h2><p>The disclosure gap alone would be concerning enough. But <a href="https://www.datagrail.io/resources/interactive/data-privacy-trends-report-2026/">DataGrail&#x27;s report</a> layers on another finding that makes the problem materially worse: 32.8% of AI systems that disclose AI capabilities also disclose at least one other high-risk activity, such as processing sensitive personal information or powering automated decision-making. Among AI systems with self-reported risk factors, 47.1% process personal data, 20.7% have the potential to power automated decision-making, 16.5% process sensitive data categories like health or financial information, and 7.5% process biometric data.</p><p>The report argues these figures almost certainly undercount actual exposure, since they reflect only what vendors have formally disclosed. 
Vendors could underreport access to personal data, and the inherent flexibility of AI means even good-faith vendors might not predict riskier user applications of their tools.</p><p>This has immediate regulatory implications. The <a href="https://cppa.ca.gov/announcements/2025/20250923.html">CCPA&#x27;s new risk assessment requirement</a>, effective January 1, 2026, requires businesses to conduct and document risk assessments for processing activities that present significant privacy risks — and will require submission to CalPrivacy by April 2028, with executive attestation under penalty of perjury. </p><p>Processing sensitive personal information with AI, or using AI for automated decision-making, are precisely the activities that trigger this obligation. The report finds that 42% of companies abandoned AI initiatives in 2025 with data privacy concerns cited as a primary obstacle — a statistic sourced to <a href="https://www.spglobal.com/market-intelligence/en/news-insights/research/2025/10/generative-ai-shows-rapid-growth-but-yields-mixed-results">S&amp;P Global research</a>. Privacy teams that engage early with AI projects, Barber argues, can prevent that waste by ensuring safeguards are in place before launch, with AI risk assessments serving as the right starting point.</p><h2><b>Why consent management became 2025&#x27;s most punished privacy failure</b></h2><p>While shadow AI is still a newer category of threat, the report makes clear that traditional privacy challenges have not eased — they have intensified. Consent management was the busiest enforcement topic of 2025. California alone publicly reported $4.3 million in CCPA consent settlements, and 2025 saw over 1,400 class action wiretapping suits driven by private firms investigating tracking pixels and session replay software.</p><p>Despite this enforcement wave, 63% of the 5,000 websites DataGrail audited still fail to comply with universal opt-out mechanisms such as the Global Privacy Control signal. 
While that figure represents an improvement over 75% non-compliance in 2023, the pace of improvement is slow relative to the acceleration in enforcement.</p><p>Barber pointed to the case of <a href="https://www.toddsnyder.com/">Todd Snyder</a>, the menswear retailer that the California Privacy Protection Agency <a href="https://cppa.ca.gov/announcements/2025/20250506.html">fined $345,178</a> in May 2025, as evidence that enforcement is no longer reserved for big tech. &quot;This is a business that has two or three stores across the U.S. They have 300 employees,&quot; he said. &quot;They run tight margins because they&#x27;re a consumer menswear clothing store.&quot;</p><p>The California Attorney General also reached a <a href="https://oag.ca.gov/news/press-releases/california-wont-let-it-go-attorney-general-bonta-announces-275-million">$2.75 million settlement with Disney</a> over failures to honor opt-out signals, while the California Privacy Protection Agency has brought enforcement actions against <a href="https://privacy.ca.gov/2026/03/youth-sports-media-company-to-pay-1-1-million-fine-change-practices-over-privacy-violations/">PlayOn Sports</a> and <a href="https://www.koleyjessen.com/insights/publications/lessons-for-businesses-from-2026s-first-california-privacy-enforcement-actions">Ford</a>, a pattern that demonstrates both the breadth and depth of regulatory activity. Among the trackers that fire even after a user sends a GPC signal, the report found that 27.1% come from Google Analytics and 43.8% are for targeted advertising via platforms like Meta and Microsoft.</p><p>Among users shown a consent banner, 48.3% click &quot;Accept all,&quot; while only 12.4% select &quot;Essential only&quot; and 2.3% customize their preferences. A full 37% simply exit the banner without making a selection. 
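</p><p>The four banner outcomes reported above sum to 100%, so they cover every visitor shown a banner, and the opt-out takeaway is simple arithmetic on them. The Python sketch that follows checks both. Treating &quot;Essential only&quot; plus customizing as deliberate opt-outs is an assumption, and an upper bound, since some users who customize may still allow some tracking.</p>
```python
from typing import Dict, Iterable

# Consent-banner outcomes reported by DataGrail, in percent of users shown a banner.
BANNER_OUTCOMES: Dict[str, float] = {
    "accept_all": 48.3,
    "essential_only": 12.4,
    "customize": 2.3,
    "exit_without_choice": 37.0,
}


def share(outcomes: Dict[str, float], keys: Iterable[str]) -> float:
    """Sum the percentages for the chosen outcomes, rounded to one decimal."""
    return round(sum(outcomes[k] for k in keys), 1)


total = share(BANNER_OUTCOMES, BANNER_OUTCOMES)  # iterating a dict yields its keys
# Assumption: "Essential only" and "customize" are the only deliberate opt-out paths.
deliberate_opt_out = share(BANNER_OUTCOMES, ["essential_only", "customize"])
print(total, deliberate_opt_out)  # 100.0 14.7
```
<p>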
The practical takeaway: less than 15% of users make a conscious choice to opt out of tracking, which means consent banners present relatively low business risk when properly configured — but enormous regulatory risk when they are not.</p><h2><b>Data deletion requests surge 567% as the cost of manual processing hits $1.5 million a year</b></h2><p>Data subject request volume hit an all-time high for the fifth consecutive year. Deletion requests have surged 567% since 2021 and now represent 87% of all data subject requests. Access requests, by contrast, have gradually declined as consumers skip visibility and reach straight for the delete button.</p><p>The cost is staggering. For a mid-sized organization receiving 5 million annual web visitors, the report estimates manual DSR management now runs approximately $1.5 million per year, based on Gartner&#x27;s <a href="https://trustarc.com/resource/dsr-request-management-global-comparison/">estimated cost of $1,524 per manual DSR</a>. The average cost has climbed from $238,000 in 2021 to $1.51 million in 2025 — a trajectory that makes manual processing not just inefficient but, as the report argues, &quot;irresponsible.&quot;</p><p>Barber emphasized that these numbers reflect verified human requests with bot and spam traffic excluded, and that data broker scenarios — which will see their own massive influx of requests under <a href="https://en.wikipedia.org/wiki/Delete_Act">California&#x27;s Delete Act</a> — are reported separately. &quot;That is a natural increase,&quot; Barber told VentureBeat. &quot;If you&#x27;ve now got 20-plus U.S. states with privacy regulation, it&#x27;s unlikely that we see a federal bill passed, even though we&#x27;ve seen one proposed. 
And while we don&#x27;t see federal awareness and regulation, we do see at the state level over 20 states, and that may actually increase awareness for the consumer even more.&quot;</p><p>He added a telling detail about how businesses are responding in practice: &quot;99% of DataGrail customers do process that deletion&quot; even for residents of states without privacy laws, &quot;simply because it&#x27;s too hard at this point. Discerning and even communicating to the person, &#x27;Hey, you live in Montana, sorry, you&#x27;re just in an unfortunate state without regulation&#x27; — you just can&#x27;t do that.&quot; Data brokers felt the impact most acutely, with a 398% increase in deletion requests compared to 2024 and an average of over 2,000 deletion requests handled per month.</p><h2><b>State regulators issued $3.4 billion in privacy fines last year, and both parties want more</b></h2><p>The regulatory landscape underpinning all of these trends has fundamentally shifted from education to punishment. Nearly half of U.S. states now have a <a href="https://pro.bloomberglaw.com/insights/privacy/state-privacy-legislation-tracker/">comprehensive privacy law</a> in effect, plus <a href="https://www.brookings.edu/articles/how-different-states-are-approaching-ai/">over 160 AI-specific laws</a>. State legislatures enacted 145 AI-related laws in 2025 alone, with another thousand introduced or reworked. According to Gartner, over 50% of the U.S. population is now covered by a comprehensive state privacy law, with 24 additional states expected to pass laws within five years. 
States have also begun pooling their resources, with ten forming the <a href="https://www.jdsupra.com/legalnews/two-more-states-join-consortium-of-6791648/">Consortium of Privacy Regulators</a> last year and pledging to coordinate investigations across state lines.</p><p>Barber argued that privacy enforcement is fundamentally bipartisan, which insulates it from the shifting political winds of the current administration. &quot;Privacy overall is a pretty bipartisan issue,&quot; he said. &quot;It&#x27;s easy to pass privacy regulation because constituents somewhat expect privacy in their day-to-day living. If you were flying on an airline and they said, &#x27;Okay, this seat, if you want your privacy, you&#x27;re going to have to pay $6 more,&#x27; you&#x27;re like, &#x27;I&#x27;m going to go to another airline.&#x27; It&#x27;s an expected part of a transaction at this stage.&quot;</p><p>He predicted that other states will replicate California&#x27;s enforcement model. &quot;California has their enforcement division, CalPrivacy. That group has one task: to ensure enforcement of privacy throughout businesses. Is it likely that we see other states get funding and support to fund these types of groups? Highly likely. The enforcement fines — the actual payments — go back to us as constituents. That type of model, you could imagine, being very popular across the country.&quot;</p><h2><b>Privacy teams are losing a third of their staff just as AI governance demands explode</b></h2><p>Perhaps the most paradoxical finding in the report is that privacy teams lost as much as <a href="https://www.isaca.org/resources/reports/state-of-privacy-2026">33% of their headcount last year</a>, even as their workloads expanded across every metric the report tracks. Cisco data cited in the report shows that 90% of privacy programs expanded in 2025 due to AI, while only 12% of AI governance programs are considered mature. 
Meanwhile, 74% of privacy teams planned to apply AI to privacy-related tasks in 2026, according to <a href="https://www.isaca.org/about-us/newsroom/press-releases/2026/new-isaca-study-privacy-teams-are-shrinking-increasingly-stressed">ISACA&#x27;s State of Privacy 2026 survey</a>.</p><p>Barber sees this as part of a broader macroeconomic pattern rather than a sign that organizations do not value privacy. &quot;It&#x27;s actually a fascinating macro trend, and probably one you&#x27;ve seen across all functions,&quot; he said. &quot;Businesses are driving more efficiency in all parts of the business. Privacy teams, five years ago, we would have said, &#x27;Well, there&#x27;s more regulation, the volume of deletions have increased 500%, we need more humans.&#x27; It&#x27;s become clear that AI provides capabilities that can do the work for privacy individuals.&quot; He drew an analogy: &quot;They might have had a design team of 20 people five years ago, now they have a design team of five, courtesy of Claude Design or Gamma or whatever the tool may be. I think that&#x27;s what we&#x27;re seeing here as well.&quot;</p><p>DataGrail has positioned its own AI agent, <a href="https://www.datagrail.io/blog/product/introducing-vera-the-first-complete-ai-privacy-agent/">Vera</a> — launched in March 2026 — as part of the answer. Vera is embedded within DataGrail&#x27;s existing platform and aims to automate privacy workflows across multiple jurisdictions. 
DataGrail's offering was also named the first production-ready <a href="https://www.datagrail.io/blog/product/whats-new-from-datagrail-february-2026/">Model Context Protocol server for privacy</a>. It uses the standard Anthropic created to let customers launch DataGrail tools from whatever application they already work in, whether Slack, email, or Claude.</p><h2><b>Can a vendor-produced report be trusted to diagnose the problems that vendor sells solutions for?</b></h2><p>DataGrail is, of course, a company that directly benefits from the problems its report identifies. The company has raised a total of $84.2 million over five rounds, with its largest being a <a href="https://www.datagrail.io/press/datagrail-raises-45-million/">$45 million Series C</a> in October 2022 led by Third Point Ventures. Its platform addresses precisely the data mapping, DSR automation, consent management, and risk assessment challenges the report spotlights.</p><p>Barber acknowledged the tension directly. &quot;It&#x27;s a fair statement,&quot; he said when asked about potential skepticism. &quot;DataGrail doesn&#x27;t provide a service to keep DPAs up to date — that&#x27;s on a business to evaluate how they work with a vendor. What DataGrail does help to do is assessments, and automate those assessments using our AI agent, Vera, to assess that increased risk.&quot;</p><p>He argued that the more neutral reading of the data is structural: &quot;This is evidence to show that the DPA unfortunately is not keeping up with technology and the speed at which technology is innovating. That&#x27;s both exciting but also we need to accept that&#x27;s where we are.&quot; The methodology does lend some credibility to this claim.</p><p>The report draws on anonymized privacy operations data from hundreds of enterprise customers, the 2,400-system AI tracking database, and the 5,000-website consent audit. These sources are at least partially independent of DataGrail&#x27;s commercial interests.
And the broader findings on enforcement spending, DSR volume trends, and regulatory expansion align closely with independently published data from Gartner, Cisco, and state enforcement agencies.</p><h2><b>The next frontier: agentic AI could spread unvetted data across entire organizations autonomously</b></h2><p>When asked about the most important trend that did not make it into the report, Barber pointed to a next-generation risk that extends the shadow AI problem into far more dangerous territory: agentic AI workflows. Gartner predicts <a href="https://www.pagerduty.com/resources/itops/analyst-report/gartner-predicts-report-2026-ai-agents-transform-it-infrastructure-operations/">40% of enterprise applications</a> will feature task-specific AI agents by the end of 2026, up from under 5% in 2025. That pace of adoption could rapidly outstrip the governance mechanisms companies are only now beginning to build.</p><p>&quot;Where we go next with this research is agent processing,&quot; Barber said. &quot;How are agents then leveraging that information? Because the downstream ramifications would be far more concerning for a business. One particular system is using shadow AI, the business has no idea that that&#x27;s happening, and then an agent is propagating that information across a whole bunch of other places. The guardrails of you and I checking the system will be lower than maybe what we&#x27;ve seen in the past with agentic workflows.&quot;</p><p>He framed the distinction in human terms: &quot;The identity of an agent is different than a human. There is thought that goes into what am I about to use here, where did this information come from, how was it collected — that may not be considered in the same way for an agentic workflow. We need to solve the root of the problem, which is how are these businesses leveraging AI subprocessors.
But this quickly becomes an agentic problem that could be far more concerning.&quot;</p><p>For the enterprise privacy and security leaders absorbing this report today, the uncomfortable truth is that the foundational documents and processes they have relied on to manage vendor risk for years are decomposing in real time. The DPA is breaking down as a reliable instrument. State enforcement is accelerating on a bipartisan basis. Privacy teams are shrinking even as their mandates expand. And the next wave of agentic AI systems threatens to distribute unvetted data processing across networks of autonomous agents that operate with even less human oversight than today&#x27;s tools.</p><p>Five years ago, when DataGrail published its first trends report, deletion requests were a fraction of what they are today, only a handful of states had privacy laws on the books, and the phrase &quot;shadow AI&quot; did not exist. Every year since, the report has warned that the problem was getting worse. Every year, the data has proved it right. The companies that survive the next chapter will not be the ones with the biggest compliance teams or the thickest policy binders. They will be the ones that accept a disorienting new reality: in 2026, the contracts you signed may not describe the AI that is already processing your customers&#x27; data — and by 2027, autonomous agents may be deciding what to do with it.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Security</category>
            <category>Data</category>
            <category>Business</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/7AZgKVWkH5KjpSGWlPf4sO/8ccdefa0059b057a7fc9950fc323ac5a/Listing_image.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
    </channel>
</rss>