<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>VentureBeat</title>
        <link>https://venturebeat.com/feed/</link>
        <description>Transformative tech coverage that matters</description>
        <lastBuildDate>Wed, 13 May 2026 21:53:08 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright 2026, VentureBeat</copyright>
        <item>
            <title><![CDATA[Anthropic finally beat OpenAI in business AI adoption — but 3 big threats could erase its lead]]></title>
            <link>https://venturebeat.com/technology/anthropic-finally-beat-openai-in-business-ai-adoption-but-3-big-threats-could-erase-its-lead</link>
            <guid isPermaLink="false">vDhn8EUlHvFIuZ0z264X8</guid>
            <pubDate>Wed, 13 May 2026 21:53:05 GMT</pubDate>
            <description><![CDATA[<p>For the first time since the AI race began, more American businesses are paying for Anthropic&#x27;s Claude than for OpenAI&#x27;s ChatGPT. </p><p><a href="https://ramp.com/leading-indicators/ai-index-may-2026">Adoption of Anthropic rose 3.8 percentage points in April to 34.4% of businesses</a>, according to the May 2026 release of the <a href="https://ramp.com/data/ai-index">Ramp AI Index</a>. OpenAI&#x27;s adoption fell 2.9 points to 32.3%. Overall AI adoption among businesses rose 0.2 percentage points to 50.6%.</p><p>The crossover — published Tuesday by <a href="https://ramp.com/">Ramp</a>, the corporate card and finance automation platform that tracks spending patterns across more than 50,000 U.S. businesses — marks the culmination of a yearlong surge by Anthropic that few in the industry predicted. Anthropic has quadrupled its business adoption over the past year, while OpenAI grew its business adoption by only 0.3 points.</p><p>But the same report that crowns a new market leader also warns that Anthropic&#x27;s position may be more fragile than it appears — threatened by escalating costs, compute constraints, and the very token-based pricing model that has fueled the company&#x27;s extraordinary revenue growth.</p><h2><b>How Anthropic went from a niche player to the most popular AI model in corporate America</b></h2><p>To appreciate the scale of the shift, consider where the two companies stood a year ago. In April 2025, <a href="https://ramp.com/leading-indicators/ai-index-may-2026">OpenAI commanded roughly 32% of business AI adoption </a>according to Ramp&#x27;s underlying data, while Anthropic stood at under 8%. OpenAI had built an early, commanding lead as the consumer default — ChatGPT was where most people first encountered AI, and that momentum carried into corporate purchasing decisions.</p><p>Anthropic&#x27;s path was different. The company won over the earliest adopters — engineers, AI evangelists, the technical vanguard inside organizations. As Ramp lead economist <a href="https://ramp.com/leading-indicators/top-saas-vendors-on-ramp-may-2026">Ara Kharazian</a> noted in the March 2026 edition of the index, Anthropic leveraged that early-adopter base to go mainstream. By February, Anthropic was winning about 70% of head-to-head matchups against OpenAI among businesses purchasing AI services for the first time — a complete reversal of the trends observed in 2025.</p><p>The trajectory is visible in Ramp&#x27;s underlying data. The company&#x27;s adoption figures show Anthropic climbing from 0.03% of businesses in June 2023 to 7.94% by April 2025, then rocketing to 34.44% by April 2026.</p><p>OpenAI, meanwhile, peaked near 36.5% in mid-2025 and has been slowly declining since. The engine behind much of Anthropic&#x27;s growth is a single product: <a href="https://code.claude.com/docs/en/desktop">Claude Code</a>, the company&#x27;s agentic AI coding tool, which has become the fastest-growing product in Anthropic&#x27;s history. A recent analysis estimated that 4% of all GitHub public commits worldwide were being authored by Claude Code — double the percentage from just one month prior.</p><p>Business Insider reported in April that the <a href="https://www.businessinsider.com/anthropic-may-soon-pass-openai-measure-ai-business-spending-ramp-2026-4">crossover was imminent</a>. 
A Ramp spokesperson told the outlet that &quot;at the current pace, Anthropic is on track to surpass OpenAI within the next two months,&quot; noting that it already led &quot;among early adopters, including VC-backed companies, and in key sectors like software, finance, and professional services.&quot; That prediction proved accurate almost to the day.</p><h2><b>AI adoption reaches a workplace tipping point, but the productivity revolution hasn&#x27;t arrived yet</b></h2><p>The Ramp data on business spending finds its complement in a separate workforce survey that underscores just how deeply AI has embedded itself into American economic life. For the first time in Gallup&#x27;s measurement, <a href="https://www.gallup.com/workplace/704225/rising-adoption-spurs-workforce-changes.aspx">half of employed American adults say they use AI in their role at least a few times a year</a>, up from 46% the previous quarter. Frequent use is also increasing, with 13% of employees now saying they use AI daily and 28% reporting they use it a few times a week or more.</p><p>But the Gallup data, based on a <a href="https://www.gallup.com/699797/indicator-artificial-intelligence.aspx">February 2026 survey of 23,717 U.S. employees</a>, also suggests that the benefits of AI remain concentrated at the level of individual tasks rather than organizational transformation. Only about one in 10 employees in AI-adopting organizations strongly agree that artificial intelligence has transformed how work gets done. That finding is consistent with firm-level studies across the U.S., U.K., Germany, and Australia showing chief executives reporting minimal broad productivity effects from AI over the past three years — a notable gap between the hype cycle and operational reality.</p><p>The <a href="https://ramp.com/data/ai-index">Ramp methodology </a>captures a different but complementary signal. Where Gallup asks employees whether they use AI, Ramp measures whether their employer is writing checks for it. The index counts corporate card and invoice-based payments, identifying firms as AI adopters if they have a positive transaction amount for an AI product or service in a given month. As Ramp&#x27;s methodology page notes, its results likely underestimate actual adoption because many employees use free AI tools or personal accounts for work tasks. Taken together, the two datasets paint a picture of AI that is ubiquitous in the American workplace but has not yet delivered on its promise to fundamentally transform how organizations operate.</p><h2><b>Why Anthropic&#x27;s biggest threat might be the success of its own best-selling product</b></h2><p>Perhaps the most striking aspect of Ramp&#x27;s analysis is its refusal to declare a lasting winner. Kharazian identified three specific risks facing Anthropic even as the company takes the lead — and the most serious one stems from a structural tension baked into the company&#x27;s business model.</p><p>Anthropic <a href="https://ramp.com/leading-indicators/ai-index-may-2026">makes more money when businesses purchase more tokens</a>, meaning the company is incentivized to drive users toward more expensive models even when cheaper ones are sufficient. This dynamic is already creating budget crises at major enterprises. 
Uber&#x27;s CTO revealed that <a href="https://finance.yahoo.com/sectors/technology/articles/ubers-anthropic-ai-push-hits-223109852.html">the company spent its entire 2026 AI budget in just four months</a>, largely on Claude Code and Cursor, with engineers reporting monthly API costs <a href="https://byteiota.com/uber-blows-2026-ai-budget-on-claude-code-in-4-months/">between $500 and $2,000 per person</a>. Adoption jumped from 32% to 84% of Uber engineers in a matter of months, and about 70% of committed code at Uber now comes from AI. The Uber case is a microcosm of a broader tension: Claude Code works — perhaps too well. When a productivity tool becomes so valuable that an organization&#x27;s $3.4 billion R&amp;D operation can&#x27;t afford to keep the lights on, the resulting cost scrutiny could push enterprises toward cheaper alternatives.</p><p>At the same time, quality and reliability have suffered under the weight of demand. In recent weeks, users have experienced <a href="https://www.cnbc.com/2026/05/06/anthropic-spacex-data-center-capacity.html">frequent outages</a>, <a href="https://www.anthropic.com/engineering/april-23-postmortem">rate limits</a>, and <a href="https://fortune.com/2026/04/14/anthropic-claude-performance-decline-user-complaints-backlash-lack-of-transparency-accusations-compute-crunch/">increasing dissatisfaction with Claude&#x27;s results</a>. Anthropic has responded by <a href="https://www.anthropic.com/engineering/april-23-postmortem">resetting usage limits</a> and by <a href="https://www.cnbc.com/2026/05/06/anthropic-spacex-data-center-capacity.html">striking a compute deal with SpaceX</a> to access more than 300 megawatts of new capacity at the Colossus 1 data center in Memphis. CEO Dario Amodei said the company saw &quot;<a href="https://venturebeat.com/technology/anthropic-says-it-hit-a-30-billion-revenue-run-rate-after-crazy-80x-growth">80x growth per year in revenue and usage</a>&quot; for Q1 2026, when it had only planned for 10x. And Ramp economist Rafael Hajjar found that Anthropic&#x27;s latest model update would triple token costs for any prompt that includes an image — a change that seems at odds with the company&#x27;s already-acute cost and compute problems.</p><h2><b>Open-source models and OpenAI&#x27;s Codex could quickly erode Anthropic&#x27;s narrow lead</b></h2><p>The <a href="https://ramp.com/leading-indicators/ai-index-may-2026">Ramp report</a> points to competitive dynamics that could reshape the market within months. Some of the fastest-growing vendors on Ramp&#x27;s platform in April were AI inference platforms that give companies access to cheap, open-source models — offering enterprises a way to get &quot;good enough&quot; AI at a fraction of the cost, particularly for routine tasks that don&#x27;t require frontier model capabilities.</p><p>OpenAI&#x27;s Codex presents an even more direct threat. By most measures, it is a strong product that does many of the <a href="https://composio.dev/content/claude-code-vs-openai-codex">same tasks as Claude Code at a lower price point</a> — and the switching cost between models is minimal. <a href="https://newsletter.pragmaticengineer.com/p/how-uber-uses-ai-for-development">Uber itself is already testing Codex as a hedge</a>, a move that could preview a broader pattern across enterprise tech. OpenAI also retains enormous structural advantages. 
<a href="https://searchengineland.com/chatgpt-900-million-weekly-active-users-470492">ChatGPT reached 900 million weekly active users by March 2026</a>, dwarfing Claude&#x27;s consumer footprint. Enterprise revenue now makes up more than 40% of OpenAI&#x27;s total and is on track to reach parity with consumer revenue by the end of 2026. And <a href="https://openai.com/index/accelerating-the-next-phase-ai/">OpenAI&#x27;s $122 billion funding round</a>, closed in March at an $852 billion valuation, gives it vast resources to compete on pricing, capacity, and product development.</p><p>Anthropic is not standing still on distribution. AWS recently launched <a href="https://aws.amazon.com/claude-platform/">Claude Platform on AWS</a>, giving enterprises direct access to Anthropic&#x27;s native platform through existing AWS credentials, billing, and access controls — a move that lowers procurement friction considerably. Anthropic has also announced <a href="https://www.anthropic.com/news/microsoft-nvidia-anthropic-announce-strategic-partnerships">compute agreements totaling billions of dollars</a> with Amazon, Google, Microsoft, Nvidia, and others, though <a href="https://www.anthropic.com/news/google-broadcom-partnership-compute">much of that capacity won&#x27;t come online until late 2026</a> or 2027. Anthropic is reportedly in talks to raise another $50 billion at a valuation approaching $900 billion.</p><h2><b>The unlikely reason businesses are choosing Claude over cheaper alternatives</b></h2><p>Beneath the spending data and market share charts lies a more intriguing question: Why are businesses choosing Anthropic over a cheaper, comparably performing alternative?</p><p>Kharazian explored this in his March analysis. <a href="https://www.leanware.co/insights/codex-vs-claude-code">Claude Code and OpenAI&#x27;s Codex are roughly comparable products</a> — on certain benchmarks, Codex is arguably better, and it&#x27;s also cheaper. Yet <a href="https://www.cnbc.com/2026/04/17/ai-tokens-anthropic-openai-nvidia.html">Anthropic can&#x27;t meet its own demand</a>. Every plan still has usage limits and rate caps. The company is actively turning away revenue because it doesn&#x27;t have the compute to serve it. Despite charging more for roughly equivalent performance, Anthropic&#x27;s demand is growing.</p><p>Kharazian suggested the answer might be cultural. Earlier this year, <a href="https://www.reuters.com/world/us-judge-blocks-pentagons-anthropic-blacklisting-now-2026-03-26/">Anthropic refused to agree to the Pentagon&#x27;s terms of use for Claude</a>, resulting in a blacklisting by the Department of Defense. OpenAI stepped in to offer its services in Anthropic&#x27;s place. In the wake of that episode, users rallied around Anthropic, and Claude temporarily surpassed ChatGPT on the App Store. The question, Kharazian wrote, is whether choosing an AI model is becoming less like an enterprise procurement decision and &quot;more like the <a href="https://time.com/article/2026/03/11/anthropic-claude-disruptive-company-pentagon/">green bubble/blue bubble distinction in iMessage</a>: a signal of identity as much as a choice of technology.&quot;</p><p>That observation may sound absurd for an enterprise software category. But Ramp&#x27;s data tells a story that pure economics cannot fully explain. 
In a market where the products perform similarly, where the cheaper option is arguably better on benchmarks, and where switching costs are negligible, something other than spreadsheet logic is driving the biggest shift in AI market share since the industry began. As Kharazian noted in his report: &quot;We have never seen a software industry as dynamic, where newcomers can disrupt market leaders in a matter of months, and where the pace of development overrides the typical forces of vendor stickiness.&quot;</p><p>That dynamism cuts both ways. The same forces that propelled a company from 8% to 34% market share in twelve months could just as easily work in reverse. Anthropic&#x27;s two-point lead was earned in the <a href="https://www.wsj.com/finance/stocks/the-1-6-trillion-meltdown-that-swept-through-software-stocks-86c8b3a2">most volatile software market in modern history</a> — and in this market, the distance between the throne and the floor has never been shorter.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Business</category>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/4m169U8ajMEpWjEn6pQgzK/7690906968897882b8756a902d8848c6/Nuneybits_Vector_art_of_two_rising_lines_on_a_graph_burnt_orang_937edfc7-d114-495e-aad5-a2f1297757c6.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Frontier AI models don't just delete document content — they rewrite it, and the errors are nearly impossible to catch]]></title>
            <link>https://venturebeat.com/orchestration/frontier-ai-models-dont-just-delete-document-content-they-rewrite-it-and-the-errors-are-nearly-impossible-to-catch</link>
            <guid isPermaLink="false">26YbOQWu3qmG1REz9ivhZd</guid>
            <pubDate>Wed, 13 May 2026 20:10:54 GMT</pubDate>
            <description><![CDATA[<p>As large language models become more capable, users are tempted to delegate knowledge tasks where models process documents on their behalf and provide the finished results. But how far can you trust the model to stay faithful to the content of your documents when it has to iterate over them across multiple rounds?</p><p>A <a href="https://arxiv.org/abs/2604.15597">new study</a> by researchers at Microsoft shows that large language models silently corrupt documents that they work on by introducing errors. The researchers developed a benchmark that simulates multi-step autonomous workflows across 52 professional domains, using a method that automatically measures how much content degrades over time.</p><p>Their findings show that even top-tier frontier models corrupt an average of 25% of document content by the end of these workflows. And providing models with agentic tools or realistic distractor documents actually worsens their performance.</p><p>This serves as a warning that while there is increasing pressure to automate knowledge work, current language models are not fully reliable for these tasks.</p><h2>The mechanics of delegated work</h2><p>The Microsoft study focuses on “delegated work,” an emerging paradigm where users allow LLMs to complete knowledge tasks on their behalf by analyzing and modifying documents.</p><p>A prominent example of this paradigm is <a href="https://venturebeat.com/orchestration/vibe-coding-with-overeager-ai-lessons-learned-from-treating-google-ai-studio">vibe coding</a>, where a user delegates software development and code editing to an AI. But delegated workflows extend far beyond programming into other domains. In accounting, for example, a user might supply a dense ledger and instruct the model to split the document into separate files organized by specific expense categories.</p><p>Because users might lack the time or the specialized expertise to manually review every modification the AI implements, delegation often hinges on trust. Users expect that the model will faithfully complete tasks without introducing unchecked errors, unauthorized deletions, or hallucinations in the documents.</p><p>To measure how far AI systems can be trusted in extended, iterative delegated workflows, the researchers developed the <a href="https://github.com/microsoft/DELEGATE52">DELEGATE-52 benchmark</a>. The benchmark is composed of 310 work environments spanning 52 diverse professional domains, including financial accounting, software engineering, crystallography, and music notation.</p><p>Each work environment relies on real-world seed text documents ranging from 2,000 to 5,000 tokens. Alongside the seed document, the environments include five to ten complex, non-trivial editing tasks.</p><p>Grading a complex, multi-step editing process usually requires expensive human review. DELEGATE-52 bypasses this by using a “round-trip relay” simulation method that evaluates answers without requiring human-annotated reference solutions. The approach is inspired by the backtranslation technique used in machine translation evaluation, where an AI model is told to translate a document from one language to another and back to see how perfectly it reproduces the original version.</p><p>Accordingly, every edit task in DELEGATE-52 is designed to be fully reversible, pairing a forward instruction with its precise inverse. 
For example, an instruction to split the ledger into separate files by expense category is paired with an instruction to merge all category files back into a single ledger.</p><p>In comments provided to VentureBeat, Philippe Laban, Senior Researcher at Microsoft Research and co-author of the paper, clarified that this is not simply a test of whether an AI can hit &quot;undo.&quot; Because human workers cannot be forced to instantly &quot;forget&quot; a task they just did, this round-trip evaluation is uniquely suited for AI. By starting a new conversational session, the researchers force the model to attempt the inverse task completely independently.</p><p>The models in their experiments &quot;do not know whether a task is a forward or backward step and are unaware of the overall experiment design,&quot; Laban explained. &quot;They are simply attempting each task as thoroughly as they can at each step.&quot;</p><p>These round-trip tasks are chained together into a continuous relay to simulate long-horizon workflows spanning 20 consecutive interactions. To make the environment more realistic, the benchmark introduces distractor files in the context of each task. These contain 8,000 to 12,000 tokens of topically related but completely irrelevant documents. Distractors measure whether the AI can maintain focus or if it gets confused and pulls in the wrong data.</p><h2>Testing frontier models in the relay</h2><p>To understand how different architectures and scales handle delegated work, the researchers tested 19 different language models from OpenAI, Anthropic, Google, Mistral, xAI, and Moonshot. The main experiment subjected these models to a simulation of 20 consecutive editing interactions.</p><p>Across all models, documents suffered an average degradation of 50% by the end of the simulation. Even the best frontier models in the experiment, specifically Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4, corrupted an average of 25% of the document content.</p><p>Out of 52 professional domains, Python was the only one where most models achieved a ready status with a score of 98% or higher. Models excel in programmatic tasks but struggle severely in natural language and niche domains like fiction, earnings statements, or recipes. The overall top model, Gemini 3.1 Pro, was deemed ready for delegated work in only 11 out of the 52 domains.</p><p>Interestingly, the corruption was not a death by a thousand cuts in which the models slowly accumulate tiny errors. Instead, about 80% of total degradation is caused by sparse but massive critical failures: single interactions where a model suddenly drops at least 10% of the document&#x27;s content. The frontier models do not necessarily avoid small errors better. They simply delay these catastrophic failures to later rounds.</p><p>Another important observation is that when weaker models fail, their degradation originates primarily from content deletion. However, when frontier models fail, they actively corrupt the existing content. The text is still there, but it has been subtly distorted or hallucinated, making it much harder for a human overseer to detect the error.</p><p>Notably, giving models an agentic harness with generic tools for code execution and file read/write access actually worsened their performance, adding an average of 6% more degradation. 
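</p><p>Before turning to why tooling hurt, the round-trip relay itself is compact enough to sketch in code. The following is a minimal illustration, assuming a generic call_model(instruction, document) wrapper around any LLM API; the task pair and the token-overlap score are illustrative stand-ins, not DELEGATE-52&#x27;s domain-specific parsers and similarity functions.</p><pre><code>from collections import Counter

def overlap(original, current):
    # Crude content-preservation score: the fraction of the seed
    # document's tokens still present after the round trip.
    orig, cur = Counter(original.split()), Counter(current.split())
    kept = sum(min(n, cur[tok]) for tok, n in orig.items())
    return kept / max(sum(orig.values()), 1)

def run_relay(seed, task_pairs, call_model, rounds=10):
    # Chain forward/inverse edits into a relay; at two calls per
    # round, rounds=10 matches the paper's 20 interactions.
    doc, scores = seed, []
    for i in range(rounds):
        forward, inverse = task_pairs[i % len(task_pairs)]
        doc = call_model(forward, doc)     # fresh session: forward edit
        doc = call_model(inverse, doc)     # fresh session: independent inverse
        scores.append(overlap(seed, doc))  # degradation vs. the seed
    return scores

# Example pair, echoing the ledger task above:
# ('Split this ledger into one file per expense category',
#  'Merge the category files back into a single ledger')
</code></pre>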
<p>Laban attributed that agentic-harness failure to models relying on generic tools rather than domain-specific ones.</p><p>&quot;Models lack the capability to write effective programs on the fly that can manipulate files across diverse domains without mistakes,&quot; he noted. &quot;When they cannot do something programmatically, they resort to reading and rewriting entire files, which is less efficient and more error prone.&quot; The solution for developers is to build tightly scoped tools (such as specific functions to calculate or move entries within .ledger files) to keep agents on track.</p><p>Degradation also snowballs as documents get larger or as more distractor files are added to the workspace. For enterprise teams investing heavily in retrieval-augmented generation (RAG), these distractor documents serve as a direct warning about the compounding cost of messy context. While a noisy context window might cause a minimal 1% performance drop after just two interactions, that degradation compounds to a 2-8% drop over a long simulation.</p><p>&quot;For the retrieval community: RAG pipelines should be evaluated over multi-step workflows, not just single-turn retrieval benchmarks,&quot; Laban said. &quot;Single-turn measurements systematically underestimate the harm of imprecise retrieval.&quot;</p><h2>Reality check for the autonomous enterprise</h2><p>The findings from the DELEGATE-52 benchmark offer a critical reality check for the current hype surrounding fully autonomous AI agents.</p><p>The benchmark&#x27;s design also implies a practical constraint: because models can maintain a clean record for several steps before a sudden catastrophic failure, incremental human review is necessary — not a single final check. Laban recommends building AI applications around short, transparent tasks rather than complex long-horizon agents.</p><p>For organizations wanting to deploy autonomous agents safely today, the DELEGATE-52 methodology provides a practical blueprint for testing in-house data pipelines. Laban explained that &quot;… an enterprise team wanting to adopt this framework needs to build three components: (a) a set of reversible editing tasks representative of their workflows, (b) a parser that converts their domain documents into a structured representation, and (c) a similarity function that compares two parsed representations.&quot; Teams do not even need to build parsers from scratch. The Microsoft research team successfully repurposed existing parsing libraries for 30 out of the 52 domains tested.</p><p>Laban is optimistic about the rate of improvement. &quot;Progress is real and fast. Looking at the GPT family alone, models go from scoring below 20% to around 70% in 18 months,&quot; Laban said. &quot;If that trajectory continues, models will soon be able to achieve saturated scores on DELEGATE-52.&quot;</p><p>However, Laban cautioned that DELEGATE-52 is purposefully small compared to massive enterprise environments. Even as foundation models inevitably master this benchmark, the endless long tail of unique enterprise data and workflows means organizations will always need to invest in custom, domain-specific tooling to keep their autonomous agents reliable.</p>]]></description>
            <author>bendee983@gmail.com (Ben Dickson)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/5A114QnbRiFMtlZuY18ROe/8a6fb481f9761188b83d3ec44d26714e/LLM_data_corruption.jpg?w=300&amp;q=30" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Protect your enterprise now from the Shai-Hulud worm and npm vulnerability in 6 actionable steps]]></title>
            <link>https://venturebeat.com/security/shai-hulud-worm-172-npm-pypi-packages-valid-provenance-ci-cd-audit</link>
            <guid isPermaLink="false">7cicO7UI0zXAqaiain0QwJ</guid>
            <pubDate>Tue, 12 May 2026 18:49:52 GMT</pubDate>
            <description><![CDATA[<p>Any development environment that installed or imported one of the 172 compromised npm or PyPI packages published since May 11 should be treated as potentially compromised. On affected developer workstations, the worm <a href="https://www.stepsecurity.io/blog/mini-shai-hulud-is-back-a-self-spreading-supply-chain-attack-hits-the-npm-ecosystem">harvests credentials from over 100 file paths</a>: AWS keys, SSH private keys, npm tokens, GitHub PATs, HashiCorp Vault tokens, Kubernetes service accounts, Docker configs, shell history, and cryptocurrency wallets. For the first time in a TeamPCP campaign, it targets password managers including 1Password and Bitwarden, according to <a href="https://www.securityweek.com/tanstack-mistral-ai-uipath-hit-in-fresh-supply-chain-attack/">SecurityWeek</a>. </p><p>It steals Claude and Kiro AI agent configurations, including MCP server auth tokens for every external service an agent connects to. And it does <i>not</i> leave when the package is removed.</p><p>The worm installs persistence hooks in Claude Code (.claude/settings.json) and VS Code (.vscode/tasks.json with runOn: folderOpen) that re-execute on every project open, plus a system daemon (macOS LaunchAgent / Linux systemd) that survives reboots. These live in the project tree, not in node_modules. Uninstalling the package does not remove them. On Linux-based CI runners, the worm <a href="https://tanstack.com/blog/npm-supply-chain-compromise-postmortem">reads runner process memory directly</a> via /proc/pid/mem to extract secrets, including masked ones. If you revoke tokens before isolating the machine, <a href="https://www.wiz.io/blog/mini-shai-hulud-strikes-again-tanstack-more-npm-packages-compromised">Wiz’s analysis found</a>, a destructive daemon wipes your home directory.</p><p>Between 19:20 and 19:26 UTC on May 11, the Mini Shai-Hulud worm published 84 malicious versions across 42 @tanstack/* npm packages. Within 48 hours the campaign expanded to 172 packages across 403 malicious versions spanning npm and PyPI, according to <a href="https://www.mend.io/blog/mini-shai-hulud-is-back-172-npm-and-pypi-packages-compromised-in-latest-wave/">Mend’s tracking</a>. @tanstack/react-router alone receives 12.7 million weekly downloads. <a href="https://github.com/TanStack/router/issues/7383">CVE-2026-45321</a>, CVSS 9.6. <a href="https://thehackernews.com/2026/05/mini-shai-hulud-worm-compromises.html">OX Security</a> reported 518 million cumulative downloads affected. Every malicious version carried a valid SLSA Build Level 3 provenance attestation. The provenance was real. The packages were poisoned.</p><p>“TanStack had the right setup on paper: OIDC trusted publishing, signed provenance, 2FA on every maintainer account. The attack worked anyway,” Peyton Kennedy, senior security researcher at <a href="https://www.endorlabs.com/learn/shai-hulud-compromises-the-tanstack-ecosystem-80-packages-compromised">Endor Labs</a>, told VentureBeat in an exclusive interview. “What the orphaned commit technique shows is that OIDC scope is the actual control that matters here, not provenance, not 2FA. If your publish pipeline trusts the entire repository rather than a specific workflow on a specific branch, a commit with no parent history and no branch association is enough to get a valid publish token. 
That’s a one-line configuration fix.”</p><h2><b>Three vulnerabilities chained into one provenance-attested worm</b></h2><p><a href="https://tanstack.com/blog/npm-supply-chain-compromise-postmortem">TanStack’s postmortem</a> lays out the kill chain. On May 10, the attacker forked TanStack/router under the name zblgg/configuration, chosen to avoid fork-list searches per <a href="https://snyk.io/blog/tanstack-npm-packages-compromised/">Snyk’s analysis</a>. A pull request triggered a pull_request_target workflow that checked out fork code and ran a build, giving the attacker code execution on TanStack’s runner. The attacker poisoned the GitHub Actions cache. When a legitimate maintainer merged to main, the release workflow restored the poisoned cache. Attacker binaries read /proc/pid/mem, extracted the OIDC token, and POSTed directly to registry.npmjs.org. Tests failed. Publish was skipped. 84 signed packages still reached the registry.</p><p>“Each vulnerability bridges the trust boundary the others assumed,” <a href="https://tanstack.com/blog/npm-supply-chain-compromise-postmortem">the postmortem states</a>. Published tradecraft from the March 2025 tj-actions/changed-files compromise, recombined in a new context.</p><h2><b>The worm crossed from npm into PyPI within hours</b></h2><p><a href="https://x.com/MsftSecIntel/status/2054041471280423424">Microsoft Threat Intelligence confirmed</a> the mistralai PyPI package v2.4.6 executes on import (not on install), downloading a payload disguised as Hugging Face Transformers. npm mitigations (lockfile enforcement, --ignore-scripts) do not cover Python import-time execution.</p><p>Mistral AI published a <a href="https://docs.mistral.ai/resources/security-advisories">security advisory</a> confirming the impact. Compromised npm packages were available between May 11 at 22:45 UTC and May 12 at 01:53 UTC (roughly three hours). The PyPI release mistralai==2.4.6 is quarantined. Mistral stated an affected developer device was involved but no Mistral infrastructure was compromised. <a href="https://safedep.io/mass-npm-supply-chain-attack-tanstack-mistral/">SafeDep confirmed</a> Mistral never released v2.4.6; no commits landed May 11 and no tag exists.</p><p><a href="https://www.wiz.io/blog/mini-shai-hulud-strikes-again-tanstack-more-npm-packages-compromised">Wiz documented</a> the full blast radius: 65 UiPath packages, Mistral AI SDKs, OpenSearch, Guardrails AI, 20 Squawk packages. <a href="https://www.stepsecurity.io/blog/mini-shai-hulud-is-back-a-self-spreading-supply-chain-attack-hits-the-npm-ecosystem">StepSecurity attributes</a> the campaign to TeamPCP, based on toolchain overlap with prior Shai-Hulud waves and the Bitwarden CLI/Trivy compromises. The worm <a href="https://www.bleepingcomputer.com/news/security/shai-hulud-attack-ships-signed-malicious-tanstack-mistral-npm-packages/">runs under Bun rather than Node.js</a> to evade Node.js security monitoring.</p><h2><b>The attacker treated AI coding agents as part of the trusted execution environment</b></h2><p><a href="https://socket.dev/blog/tanstack-npm-packages-compromised-mini-shai-hulud-supply-chain-attack">Socket’s technical analysis</a> of the 2.3 MB router_init.js payload identifies ten credential-collection classes running in parallel. The worm writes persistence into .claude/ and .vscode/ directories, hooking Claude Code’s SessionStart config and VS Code’s folder-open task runner. 
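</p><p>For defenders, these published indicators translate into a quick triage script. Below is a minimal, read-only sketch: the file paths are the reported persistence locations, but the substring heuristics are assumptions rather than vendor detection signatures, so every hit needs manual review (legitimate projects can define these hooks, too).</p><pre><code>import sys
from pathlib import Path

# Project-tree persistence locations reported for this worm; home-directory
# MCP configs (~/.claude.json, ~/.kiro/settings/mcp.json) should be
# reviewed separately. Substring matches are heuristics, not signatures.
INDICATORS = {
    '.claude/settings.json': ['SessionStart'],  # Claude Code hook
    '.vscode/tasks.json': ['folderOpen'],       # task re-runs on project open
}

def scan(root):
    for pattern, needles in INDICATORS.items():
        for path in root.rglob(pattern):
            try:
                text = path.read_text(errors='ignore')
            except OSError:
                continue
            if any(n in text for n in needles):
                print(f'REVIEW: {path}')

if __name__ == '__main__':
    scan(Path(sys.argv[1]) if len(sys.argv) == 2 else Path('.'))
</code></pre>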
<a href="https://www.stepsecurity.io/blog/mini-shai-hulud-is-back-a-self-spreading-supply-chain-attack-hits-the-npm-ecosystem">StepSecurity’s deobfuscation</a> confirmed the worm also harvests Claude and Kiro MCP server configurations (~/.claude.json, ~/.claude/mcp.json, ~/.kiro/settings/mcp.json), which store API keys and auth tokens for external services. This is an early but confirmed instance of supply-chain malware treating AI agent configurations as high-value credential targets. The npm token description the worm sets reads: “IfYouRevokeThisTokenItWillWipeTheComputerOfTheOwner.” It is not a bluff.</p><p>“What stood out to me about this payload is where it planted itself after running,” Kennedy told VentureBeat. “It wrote persistence hooks into Claude Code’s SessionStart config and VS Code’s folder-open task runner so it would re-execute every time a developer opened a project, even after the npm package was removed. The attacker treated the AI coding agent as part of the trusted execution environment, which it is. These tools read your repo, run shell commands, and have access to the same secrets a developer does. Securing a development environment now means thinking about the agents, not just the packages.”</p><h2><b>CI/CD Trust-Chain Audit Grid</b></h2><p><i>Six gaps Mini Shai-Hulud exploited. What your CI/CD does today. The control that closes each one.</i></p><table><tbody><tr><td><p><b>Audit question</b></p></td><td><p><b>What your CI/CD does today</b></p></td><td><p><b>The gap</b></p></td></tr><tr><td><p>1. Pin OIDC trusted publishing to a specific workflow file on a specific protected branch. Constrain id-token: write to only the publish job. Ensure that job runs from a clean workspace with no restored untrusted cache</p></td><td><p>Most orgs grant OIDC trust at the repository level. Any workflow run in the repo can request a publish token. id-token: write is often set at the workflow level, not scoped to the publish job.</p></td><td><p>The worm achieved code execution inside the legitimate release workflow via cache poisoning, then extracted the OIDC token from runner process memory. Branch/workflow pinning alone would not have stopped this attack because the malicious code was already running inside the pinned workflow. The complete fix requires pinning PLUS constraining id-token: write to only the publish job PLUS ensuring that job uses a clean, unshared cache.</p></td></tr><tr><td><p>2. <!-- -->Treat SLSA provenance as necessary but not sufficient. Add behavioral analysis at install time</p></td><td><p>Teams treat a valid Sigstore provenance badge as proof a package is safe. npm audit signatures passes. The badge is green. Procurement and compliance workflows accept provenance as a gate.</p></td><td><p>All 84 malicious TanStack versions carry valid SLSA Build Level 3 provenance attestations. First widely reported npm worm with validly-attested packages. Provenance attests where a package was built, not whether the build was authorized. Socket’s AI scanner flagged all 84 artifacts within six minutes of publication. Provenance flagged zero.</p></td></tr><tr><td><p>3. Isolate GitHub Actions cache per trust boundary. Invalidate caches after suspicious PRs. Never check out and execute fork code in pull_request_target workflows</p></td><td><p>Fork-triggered workflows and release workflows share the same cache namespace. Closing or reverting a malicious PR is treated as restoring clean state. 
pull_request_target is widely used for benchmarking and bundle-size analysis with fork PR checkout.</p></td><td><p>Attacker poisoned pnpm store via fork-triggered pull_request_target that checked out and executed fork code on the base runner. Cache survived PR closure. The next legitimate release workflow restored the poisoned cache on merge. actions/cache@v5 uses a runner-internal token for cache saves, not the workflow’s GITHUB_TOKEN, so permissions: contents: read does not prevent mutation. Kennedy: &#x27;Branch protection rules don’t apply to commits that aren’t on any branch, so that whole layer of hardening didn’t help.&#x27;</p></td></tr><tr><td><p>4. Audit optionalDependencies in lockfiles and dependency graphs. Block github: refs pointing to non-release commits</p></td><td><p>Static analysis and lockfile enforcement focus on dependencies and devDependencies. optionalDependencies with github: commit refs are not flagged by most tools.</p></td><td><p>The worm injected optionalDependencies pointing to a github: orphan commit in the attacker’s fork. When npm resolves a github: dependency, it clones the referenced commit and runs lifecycle hooks (including prepare) automatically. The payload executed before the main package’s own install step completed. SafeDep confirmed Mistral never released v2.4.6; no commits landed and no tag exists.</p></td></tr><tr><td><p>5. Audit Python dependency imports separately from npm controls. Cover AI/ML pipelines consuming guardrails-ai, mistralai, or any compromised PyPI package</p></td><td><p>npm mitigations (lockfile enforcement, --ignore-scripts) are applied to the JavaScript stack. Python packages are assumed safe if pip install completes. AI/ML CI pipelines are treated as internal testing infrastructure, not as supply-chain attack targets.</p></td><td><p>Microsoft Threat Intelligence confirmed mistralai PyPI v2.4.6 executes on import, not install. Injected code in __init__.py downloads a payload disguised as Hugging Face Transformers. --ignore-scripts is irrelevant for Python import-time execution. guardrails-ai@0.10.1 also executes on import. Any agentic repo with GitHub Actions id-token: write is exposed to the same OIDC extraction technique. LLM API keys, vector DB credentials, and external service tokens all in the blast radius.</p></td></tr><tr><td><p>6. Isolate and image affected machines before revoking stolen tokens. Do not revoke npm tokens until the host is forensically preserved</p></td><td><p>Standard incident response: revoke compromised tokens first, then investigate. npm token list and immediate revocation is the instinctive first step.</p></td><td><p>The worm installs a persistent daemon (macOS LaunchAgent / Linux systemd) that polls GitHub every 60 seconds. On detecting token revocation (40X error), it triggers rm -rf ~/, wiping the home directory. The npm token description reads: &#x27;IfYouRevokeThisTokenItWillWipeTheComputerOfTheOwner.&#x27; Microsoft reported geofenced destructive behavior: a 1-in-6 chance of rm -rf / on systems appearing to be in Israel or Iran. Kennedy: &#x27;Even after the package is gone, the payload may still be sitting in .claude/ with a SessionStart hook pointing at it. rm -rf node_modules doesn’t remove it.&#x27;</p></td></tr></tbody></table><p><i>Sources: TanStack postmortem, StepSecurity, Socket, Snyk, Wiz, Microsoft Threat Intelligence, Mend, Endor Labs. May 12, 2026.</i></p><h2><b>Security director action plan</b></h2><ul><li><p><b>Today: </b>“The fastest check is find . 
-name &#x27;router_init.js&#x27; -size +1M and grep -r &#x27;79ac49eedf774dd4b0cfa308722bc463cfe5885c&#x27; package-lock.json,” Kennedy said. If either returns a hit, isolate and image the machine immediately. Do not revoke tokens until the host is forensically preserved. The worm’s destructive daemon triggers on revocation. Once the machine is isolated, rotate credentials in this order: npm tokens first, then GitHub PATs, then cloud keys. Hunt for .claude/settings.json and .vscode/tasks.json persistence artifacts across every project that was open on the affected machine.</p></li><li><p><b>This week: </b>Rotate every credential accessible from affected hosts: npm tokens, GitHub PATs, AWS keys, Vault tokens, K8s service accounts, SSH keys. Check your packages for unexpected versions after May 11 with commits by claude@users.noreply.github.com. Block filev2.getsession[.]org and git-tanstack[.]com.</p></li><li><p><b>This month: </b>Audit every GitHub Actions workflow against the six gaps above. Pin OIDC publishing to specific workflows on protected branches. Isolate cache keys per trust boundary. Run npm config set min-release-age=7d. For AI/ML teams: check guardrails-ai and mistralai against compromised versions, audit CI pipelines for id-token: write exposure, and rotate every LLM API key and vector DB credential accessible from CI.</p></li><li><p><b>This quarter (board-level): </b>Fund behavioral analysis at the package registry layer. Provenance verification alone is no longer a sufficient procurement criterion for supply-chain security tooling. Require CI/CD security audits as part of vendor risk assessments for any tool with publish access to your registries. Establish a policy that no workflow with id-token: write runs from a shared cache. Treat AI coding agent configurations (.claude/, .kiro/, .vscode/) as credential stores subject to the same access controls as cloud key vaults.</p></li></ul><h2><b>The worm is iterating. Defenders must as well</b></h2><p>This is the fifth Shai-Hulud wave in eight months. Four SAP packages became 84 TanStack packages in two weeks. <a href="https://www.stepsecurity.io/blog/mini-shai-hulud-is-back-a-self-spreading-supply-chain-attack-hits-the-npm-ecosystem">intercom-client@7.0.4 fell 29 hours later</a>, confirming active propagation through stolen CI/CD infrastructure. Late on May 12, malware research collective <a href="https://x.com/vxunderground/status/2054238093734015419">vx-underground reported</a> that the fully weaponized Shai-Hulud worm code has been open-sourced. If confirmed, this means the attack is no longer limited to TeamPCP. Any threat actor can now deploy the same cache-poisoning, OIDC-extraction, and provenance-attested publishing chain against any npm or PyPI package with a misconfigured CI/CD pipeline.</p><p>“We’ve been tracking this campaign family since September 2025,” Kennedy said. “Each wave has picked a higher-download target and introduced a more technically interesting access vector. The orphaned commit technique here is genuinely novel. Branch protection rules don’t apply to commits that aren’t on any branch. The supply chain security space has spent a lot of energy on provenance and trusted publishing over the last two years. This attack walked straight through both of those controls because the gap wasn’t in the signing. It was in the scope.”</p><p>Provenance tells you where a package was built. It does not tell you whether the build was authorized. That is the gap this audit is designed to close.</p>]]></description>
            <author>louiswcolumbus@gmail.com (Louis Columbus)</author>
            <category>Security</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/3Fpkq8k1bNbp4FzvFx6VNs/d5acd06709dd0292e387c70978080a44/worm.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Perceptron Mk1 shocks with highly performant video analysis AI model 80-90% cheaper than Anthropic, OpenAI & Google]]></title>
            <link>https://venturebeat.com/technology/perceptron-mk1-shocks-with-highly-performant-video-analysis-ai-model-80-90-cheaper-than-anthropic-openai-and-google</link>
            <guid isPermaLink="false">1WGzLcJhg1qGXiRGzwynpv</guid>
            <pubDate>Tue, 12 May 2026 18:45:17 GMT</pubDate>
            <description><![CDATA[<p>AI that can see and understand what&#x27;s happening in a video — especially a live feed — is understandably an attractive product to lots of enterprises and organizations. Beyond acting as a security &quot;watchdog&quot; over sites and facilities, such an AI model could also be used to clip out the most exciting parts of marketing videos and repurpose them for social, identify inconsistencies and gaffes in videos and flag them for removal, and identify body language and actions of participants in controlled studies or candidates applying for new roles. </p><p>While there are some AI models that offer this type of functionality today, it&#x27;s far from a mainstream capability. The two-year-old startup Perceptron Inc. is seeking to change all that, however. Today, it announced the release of its <a href="https://www.perceptron.inc/blog/introducing-perceptron-mk1">flagship proprietary video analysis reasoning model, Mk1</a> (short for &quot;Mark One&quot;) at a cost — $0.15 per million input tokens / $1.50 per million output tokens through its application programming interface (API) — that comes in about 80-90% below other leading proprietary rivals, namely, Anthropic&#x27;s Claude Sonnet 4.5, OpenAI&#x27;s GPT-5, and Google&#x27;s Gemini 3.1 Pro. </p><p>Led by Co-founder and CEO Armen Aghajanyan, formerly of Meta FAIR and Microsoft, the company spent 16 months developing a &quot;multi-modal recipe&quot; from the ground up to address the complexities of the physical world. </p><p>This launch signals a new era where models are expected to understand cause-and-effect, object dynamics, and the laws of physics with the same fluency they once applied to grammar.</p><p>Interested users and potential enterprise customers can try it out for themselves on a <a href="https://www.perceptron.inc/demo">public demo site from Perceptron here.</a></p><h2><b>Performance across spatial and video benchmarks</b></h2><p>The model&#x27;s performance is backed by a suite of industry-standard benchmarks focused on grounded understanding. </p><p>In spatial reasoning (embodied reasoning, or ER, benchmarks), Mk1 achieved a score of 85.1 on EmbSpatialBench, surpassing Google’s Robotics-ER 1.5 (78.4) and Alibaba’s Q3.5-27B (approx. 84.5). </p><p>In the specialized RefSpatialBench, Mk1&#x27;s score of 72.4 represents a massive leap over competitors like GPT-5m (9.0) and Sonnet 4.5 (2.2), highlighting a significant advantage in referring expression comprehension. </p><p>Video benchmarks show similar dominance; on the EgoSchema &quot;Hard Subset&quot;—where first-and-last-frame inference is insufficient—Mk1 scored 41.4, matching Alibaba’s Q3.5-27B and significantly beating Gemini 3.1 Flash-Lite (25.0). </p><p>On the VSI-Bench, Mk1 reached 88.5, the highest recorded score among the compared models, further validating its ability to handle genuine temporal reasoning tasks.</p><h2><b>Market positioning and the efficiency frontier</b></h2><p>Perceptron has explicitly targeted the &quot;Efficiency Frontier,&quot; a metric that plots mean scores across video and embodied reasoning benchmarks against the blended cost per million tokens. </p><p>Benchmarking data reveals that Mk1 occupies a unique position: it matches or exceeds the performance of &quot;frontier&quot; models like GPT-5 and Gemini 3.1 Pro while maintaining a cost profile closer to &quot;Lite&quot; or &quot;Flash&quot; versions.</p><p>Specifically, Perceptron Mk1 is priced at $0.15 per million input tokens and $1.50 per million output tokens. 
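</p><p>The &quot;blended&quot; figure is a weighted average of input and output prices. Below is a quick back-of-envelope check; the 9:1 input-to-output token mix is an assumption chosen because it roughly reproduces the $0.30 blended figure cited for Mk1 (video analysis workloads tend to be input-heavy), not a number from Perceptron.</p><pre><code>def blended(input_price, output_price, input_share=0.9):
    # Weighted-average cost per million tokens at a given traffic mix.
    return input_price * input_share + output_price * (1 - input_share)

print(round(blended(0.15, 1.50), 3))  # 0.285, roughly the $0.30 mark
</code></pre><p>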
In comparison, the &quot;Efficiency Frontier&quot; chart shows GPT-5 at a significantly higher blended cost (near $2.00) and Gemini 3.1 Pro at approximately $3.00, while Mk1 sits at the $0.30 blended cost mark with superior reasoning scores. </p><p>This aggressive pricing strategy is intended to make high-end physical AI accessible for large-scale industrial use rather than just experimental research.</p><h2><b>Architecture and temporal continuity</b></h2><p>The technical core of Perceptron Mk1 is its ability to process native video at up to 2 frames per second (FPS) across a 32K-token context window. </p><p>Unlike traditional vision-language models (VLMs) that often treat video as a disjointed sequence of still images, Mk1 is designed for temporal continuity. </p><p>This architecture allows the model to &quot;watch&quot; extended streams and maintain object identity even through occlusions, a critical requirement for robotics and surveillance applications. </p><p>Developers can query the model for specific moments in a long stream and receive structured time codes in return, streamlining the process of video clipping and event detection. </p><h2><b>Reasoning with the laws of physics</b></h2><p>A primary differentiator for Mk1 is its &quot;Physical Reasoning&quot; capability. Perceptron defines this as a high-precision spatial awareness that allows the model to understand object dynamics and physical interactions in real-world settings. </p><p>For example, the model can analyze a scene to determine if a basketball shot was taken before or after a buzzer by jointly reasoning over the ball&#x27;s position in the air and the readout on a shot clock. </p><p>This requires more than just pattern recognition; it requires an understanding of how objects move through space and time. </p><p>The model is capable of &quot;pixel-precise&quot; pointing and counting into the hundreds within dense, complex scenes. It can also read analog gauges and clocks, which have historically been difficult for purely digital vision systems to interpret with high reliability.</p><p>It also seems to have strong general world and historical knowledge. In my brief test, I uploaded a vintage public domain<a href="https://www.loc.gov/item/00694391"> film of skyscraper construction in New York City dated 1906</a> from the U.S. Library of Congress, and Mk1 not only correctly described the contents of the footage — including odd, atypical sights such as workers suspended by ropes — but did so rapidly, and it even identified the rough date (early 1900s) from the look of the footage alone.</p><h2><b>A developer platform for physical AI</b></h2><p>Accompanying the model release is an expanded developer platform designed to turn these high-level perception capabilities into functional applications with minimal code. </p><p>The Perceptron SDK, available via Python, introduces several specialized functions such as &quot;Focus,&quot; &quot;Counting,&quot; and &quot;In-Context Learning&quot;. </p><p>The Focus feature allows users to zoom and crop into specific regions of a frame automatically based on a natural language prompt, such as detecting and localizing personal protective equipment (PPE) on a construction site. The Counting function is optimized for dense scenes, such as identifying and pointing to every puppy in a group or individual items of produce. 
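</p><p>Since the article describes these SDK capabilities only in prose, here is a purely hypothetical sketch of what calling them might look like; the package name, client class, and method signatures below are invented for illustration and are not Perceptron&#x27;s documented API.</p><pre><code># Hypothetical sketch only: every identifier below is an assumption.
from perceptron import Client  # assumed package and entry point

client = Client(api_key='...')

# 'Focus': zoom and crop to regions matching a natural-language prompt
ppe = client.focus(video='site_feed.mp4',
                   prompt='hard hats and safety vests on workers')

# 'Counting': point to every instance in a dense scene
points = client.count(image='crate.jpg', prompt='individual apples')
</code></pre>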
<p>Furthermore, the platform supports in-context learning, allowing developers to adapt Mk1 to specific tasks by providing just a few examples, such as showing an image of an apple and instructing the model to label every instance of Category 1 in a new scene.</p><h2><b>Licensing strategies and the Isaac series</b></h2><p>Perceptron is employing a dual-track strategy for its model weights and licensing. The flagship Perceptron Mk1 is a closed-source model accessed via API, designed for enterprise-grade performance and security. </p><p>However, the company is also maintaining its &quot;Isaac&quot; series, which kicked off with the <a href="https://www.perceptron.inc/blog/introducing-isaac-0-1">launch of Isaac 0.1 in September 2025</a>, as an open-weights alternative.<a href="https://www.perceptron.inc/blog/introducing-isaac-0-2"> Isaac 0.2-2b-preview</a>, released in December 2025, is a 2-billion parameter vision-language model with reasoning capabilities that is available for edge and low-latency deployments. </p><p>While the weights for the Isaac models are open on the popular AI code sharing community <a href="https://huggingface.co/PerceptronAI">Hugging Face</a>, Perceptron offers commercial licenses for companies that require maximum control or on-premise deployment of the weights. </p><p>This approach allows the company to support both the open-source community and specialized industrial partners who need proprietary flexibility. The documentation notes that Isaac 0.2 models are specifically optimized for sub-200ms time-to-first-token, making them ideal for real-time edge devices.</p><h2><b>Background on Perceptron founding and focus</b></h2><p>Perceptron AI is a Bellevue, Washington-based physical AI startup founded by Aghajanyan and Akshat Shrivastava, both former research scientists at Meta’s Facebook AI Research (FAIR) lab. </p><p>The company’s public materials date its founding to November 2024, while a Washington corporate filing record for Perceptron.ai Inc. shows an<a href="https://www.bizprofile.net/wa/carnation/perceptron-ai-inc?utm_source=chatgpt.com"> earlier foreign registration filing on October 9, 2024</a>, listing Shrivastava and Aghajanyan as governors. </p><p>In founder launch posts from late 2024, <a href="https://www.linkedin.com/posts/armenag_after-nearly-6-years-at-meta-im-excited-share-7265412761990369280-Aoyw/?utm_source=chatgpt.com">Aghajanyan</a> said he had left Meta after nearly six years and “joined forces” with Shrivastava to build AI for the physical world, while Shrivastava said the company grew out of his work on efficiency, multimodality and new model architectures.</p><p>The founding appears to have followed directly from the pair’s work on multimodal foundation models at Meta. In May 2024, <a href="https://www.researchgate.net/publication/380635519_Chameleon_Mixed-Modal_Early-Fusion_Foundation_Models?utm_source=chatgpt.com">Meta researchers published Chameleon</a>, a family of early-fusion models designed to understand and generate mixed sequences of text and images, work that Perceptron later described as part of the lineage behind its own models. </p><p>A July 2024 follow-on paper, <a href="https://arxiv.org/abs/2407.21770">MoMa</a>, explored more efficient early-fusion training for mixed-modal models and listed both Shrivastava and Aghajanyan among the authors. 
Perceptron’s stated thesis extends that research direction into “physical AI”: models that can process real-world video and other sensory streams for use cases such as robotics, manufacturing, geospatial analysis, security and content moderation.</p><h2><b>Partner ecosystems and future outlook</b></h2><p>The real-world impact of Mk1 is already being demonstrated through Perceptron&#x27;s partner network. Early adopters are using the model for diverse applications, such as auto-clipping highlights from live sports, which leverages the model&#x27;s temporal understanding to identify key plays without human intervention. </p><p>In the robotics sector, partners are curating teleoperation episodes into training data, effectively automating the process of labeling and cleaning data for robotic arms and mobile units. </p><p>Other use cases include multimodal quality control agents on manufacturing lines, which can detect defects and verify assembly steps in real-time, and wearable assistants on smart glasses that provide context-aware help to users.</p><p>Aghajanyan stated that these releases are the culmination of research intended to make AI function best in the physical world, moving toward a future where &quot;physical AI&quot; is as ubiquitous as digital AI.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/792xpzV56beWnPgMn8j8Ku/2e0da2eadb098389cada35ebac209693/ChatGPT_Image_May_12__2026__02_26_29_PM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Running Claude Code or Claude in Chrome? Here's the audit matrix for every blind spot your security stack misses]]></title>
            <link>https://venturebeat.com/security/claude-confused-deputy-audit-matrix-security-blind-spots</link>
            <guid isPermaLink="false">4R2BK4LB0KNVlh4heNkQK</guid>
            <pubDate>Tue, 12 May 2026 15:59:19 GMT</pubDate>
<description><![CDATA[<p>Between May 6 and 7, four security research teams published findings about Anthropic’s Claude that most outlets covered as separate stories. One involved a water utility in Mexico, another targeted a Chrome extension, a third hijacked OAuth tokens through Claude Code, and a fourth showed that a single trust dialog could authorize arbitrary code execution. In one case, Claude identified a water utility’s SCADA gateway without being told to look for one.</p><p>These are not four unrelated bugs. They are one architectural question playing out on four surfaces. No single patch released so far addresses all of them.</p><p>The common thread is the <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html">confused deputy</a>, a trust-boundary failure where a program with legitimate authority executes actions on behalf of the wrong principal. In each case, Claude held real capabilities on every surface and handed them to whoever showed up. An attacker probing a water utility&#x27;s network. A Chrome extension with zero permissions. A malicious npm package rewriting a config file. A cloned repository carrying a poisoned project config.</p><p>Carter Rees, VP of Artificial Intelligence at <a href="https://reputation.com/">Reputation</a>, identified the structural reason this class of failure is so dangerous. The flat authorization plane of an LLM fails to respect user permissions, Rees told VentureBeat in an exclusive interview. An agent operating on that flat plane does not need to escalate privileges; it already has them.</p><p>Kayne McGladrey, an <a href="https://www.ieee.org/">IEEE</a> senior member who advises enterprises on identity risk, described the same dynamic independently in an interview with VentureBeat. Enterprises are cloning human permission sets onto agentic systems, McGladrey said. The agent does whatever it needs to do to get its job done, and sometimes that means using far more permissions than a human would.</p><h2>Dragos found Claude targeting a water utility’s SCADA gateway without being told to look for one</h2><p>Dragos <a href="https://www.dragos.com/blog/ai-assisted-ics-attack-water-utility">published its analysis</a> on May 6. Between December 2025 and February 2026, an unidentified adversary compromised <a href="https://venturebeat.com/security/claude-mexico-breach-four-blind-domains-security-stack">multiple Mexican government organizations</a>. In January 2026, the campaign reached Servicios de Agua y Drenaje de Monterrey, the municipal water and drainage utility serving the Monterrey metropolitan area.</p><p>Dragos analyzed more than 350 artifacts. The adversary used Claude as the primary technical executor and OpenAI’s GPT models for data processing. Claude wrote a 17,000-line Python framework containing 49 modules for network discovery, credential harvesting, privilege escalation, and lateral movement. Claude compressed what would traditionally take days or weeks of tooling development into hours, according to the Dragos analysis.</p><p>Without any prior ICS/OT context, Claude identified a server running a vNode SCADA/IIoT management interface, classified the platform as high-value, generated credential lists, and launched an automated password spray. The attack failed, and no OT breach occurred, but Claude did the targeting. Dragos noted that this was not a product vulnerability in the traditional sense because Claude performed exactly as designed. The architectural gap, as the firm described it, is that the model cannot distinguish an authorized developer from an adversary using the same interface.</p><p>Jay Deen, associate principal adversary hunter at Dragos, wrote that the investigation showed how commercial AI tools have made OT more visible to adversaries already operating within IT.</p><p>CrowdStrike CTO Elia Zaitsev told VentureBeat why this class of incident evades detection. Nothing bad has happened until the agent acts, Zaitsev said. It is almost always at the action layer. The Monterrey reconnaissance looked like a developer querying internal systems. The developer tool just had an adversary at the keyboard.</p><p><b>Stack blind spot: </b>OT monitoring does not flag AI-generated recon from IT-side developer tools. EDR sees the process but has no visibility into intent.</p>
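<p>The detection side of this surface is straightforward to prototype. The Python sketch below is a minimal illustration of the kind of log review the audit matrix later in this piece recommends: it assumes you can export Claude API prompt logs as JSON Lines with &quot;timestamp&quot; and &quot;prompt&quot; fields, which are hypothetical field names rather than a documented Anthropic schema, and it treats every hit as a review signal, not proof of compromise.</p><pre><code># scan_claude_logs.py -- a minimal sketch of the claude.ai / API row in the
# audit matrix below. Assumes a JSONL export with "timestamp" (ISO 8601) and
# "prompt" fields; both field names depend on your logging pipeline.
import json
import re
import sys
from datetime import datetime, timedelta

ICS_KEYWORDS = re.compile(r"\b(vNode|SCADA|HMI|PLC)\b", re.IGNORECASE)
PRIVATE_IP = re.compile(r"\b(10|192\.168|172\.(1[6-9]|2[0-9]|3[01]))(\.\d{1,3}){2,3}\b")
CRED_HINTS = re.compile(r"password list|credential|wordlist|password spray", re.IGNORECASE)
WINDOW = timedelta(minutes=60)
THRESHOLD = 5  # the matrix alert trigger: &gt;5 requests in 60 minutes

def scan(path):
    cred_times = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            ts = datetime.fromisoformat(event["timestamp"])
            prompt = event.get("prompt", "")
            # Any ICS/OT keyword is an immediate escalation to the OT team.
            if ICS_KEYWORDS.search(prompt):
                print(f"[escalate to OT] {ts} {prompt[:80]!r}")
            # Count credential-generation prompts aimed at internal ranges.
            if CRED_HINTS.search(prompt) and PRIVATE_IP.search(prompt):
                cred_times = [t for t in cred_times if ts - t &lt;= WINDOW] + [ts]
                if len(cred_times) &gt; THRESHOLD:
                    print(f"[ALERT] {len(cred_times)} credential prompts against internal services in the last hour")

if __name__ == "__main__":
    scan(sys.argv[1])
</code></pre><p>By design, anything a scan like this flags will look like ordinary developer work; the point is to force a human review of who was actually at the keyboard.</p>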
<h2>LayerX proved any Chrome extension can hijack Claude through a trust boundary Anthropic partially patched</h2><p>On May 7, LayerX researcher Aviad Gispan <a href="https://layerxsecurity.com/blog/a-flaw-in-claudes-browser-extension-allows-any-extension-to-hijack-it/">disclosed ClaudeBleed</a>. Claude in Chrome uses Chrome’s externally_connectable feature to allow communication with scripts on the claude.ai origin, but does not verify whether those scripts came from Anthropic or were injected by another extension. Any Chrome extension can inject commands into Claude’s messaging interface. Zero permissions required.</p><p>LayerX reported the flaw on April 27. Anthropic shipped version 1.0.70 on May 6. LayerX found that the patch did not remove the vulnerable handler. LayerX bypassed the new protections through the side-panel initialization flow and by switching Claude into &quot;Act without asking&quot; mode, which required no user notification. Anthropic&#x27;s patch survived less than a day.</p><p>Mike Riemer, SVP of Network Security Group and Field CISO at Ivanti, told VentureBeat that threat actors are now reverse-engineering patches within 72 hours using AI assistance. If a vendor releases a patch and the customer has not applied it within that window, the vulnerability is already being exploited, Riemer said. Anthropic&#x27;s ClaudeBleed patch did not survive even a third of that window.</p><p><b>Stack blind spot: </b>EDR watches files and processes but does not monitor extension-to-extension messaging within the browser. ClaudeBleed produces no file writes, no network anomalies, and no process spawns.</p><h2>Mitiga showed a config file rewrite steals OAuth tokens and survives rotation</h2><p>Also on May 7, <a href="https://www.mitiga.io/mitiga-labs">Mitiga Labs</a> researcher Idan Cohen <a href="https://www.mitiga.io/blog/claude-code-mcp-token-theft-mitm">published a man-in-the-middle attack chain</a> targeting Claude Code. Claude Code stores MCP configuration and OAuth tokens in ~/.claude.json, a single user-writable file. A malicious npm postinstall hook can rewrite the MCP server URL to route traffic through an attacker&#x27;s proxy, capturing OAuth tokens for Jira, Confluence, and GitHub. Because the postinstall hook fires on every Claude Code load, it reasserts the malicious endpoint even after token rotation — meaning the standard incident response step of rotating credentials does not break the attack chain unless the hook itself is removed first.</p><p>Mitiga reported the finding on April 10. On April 12, Anthropic classified it as out of scope, according to Mitiga’s published disclosure.</p>
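<p>The corresponding control fits in a few dozen lines. The sketch below implements the allowlist check recommended in the audit matrix further down: parse ~/.claude.json, collect every MCP server URL, and flag any endpoint not on an approved list. The allowlist contents are placeholders and the file layout is simplified, so treat this as a starting point rather than Anthropic&#x27;s documented schema.</p><pre><code># check_claude_mcp.py -- a sketch of the allowlist check from the Claude Code
# MCP row of the audit matrix below. The approved hosts are placeholders, and
# the config layout is simplified; real files may nest servers per project.
import json
from pathlib import Path
from urllib.parse import urlparse

APPROVED_HOSTS = {"mcp.internal.example.com"}  # your organizational allowlist

def audit(config_path=Path.home() / ".claude.json"):
    findings = []

    def walk(node):
        # Collect every mcpServers block, wherever it nests in the file.
        if isinstance(node, dict):
            for name, server in node.get("mcpServers", {}).items():
                url = server.get("url", "") if isinstance(server, dict) else ""
                if url and urlparse(url).hostname not in APPROVED_HOSTS:
                    findings.append((name, url))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    walk(json.loads(config_path.read_text(encoding="utf-8")))
    for name, url in findings:
        print(f"[ALERT] MCP server {name!r} points at unapproved endpoint {url}")
    return findings

if __name__ == "__main__":
    audit()
</code></pre><p>Pairing the check with file-integrity monitoring means a rewrite triggers review immediately. And per the disclosure, rotating tokens without removing the postinstall hook leaves the attacker&#x27;s proxy in place.</p>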
<p>Riemer described the principle this chain violates. I do not know you until I validate you, Riemer told VentureBeat. Until I know what it is and I know who is on the other side of the keyboard, I am not going to communicate with it. The ~/.claude.json rewrite substitutes the attacker’s endpoint for the legitimate one. Claude Code never re-validates.</p><p>Riemer has spent 21 years architecting the product he now leads and holds five patents on its security infrastructure. He applies the same defensive logic he built into his own platform. If a threat actor gets in, drop all connections. That is a fail-safe design. Anthropic&#x27;s architecture does the opposite. It fails open.</p><p><b>Stack blind spot: </b>Web application firewalls never see local config rewrites. EDR treats JSON file writes as normal developer behavior. Rotating tokens does not break the chain unless responders also confirm the hook is removed.</p><h2>Anthropic’s response pattern treats the user’s trust decision as the security boundary</h2><p>Anthropic classified Mitiga&#x27;s MCP token theft as out of scope on April 12. The company called OX Security&#x27;s STDIO vulnerability affecting an estimated 200,000 MCP servers <a href="https://venturebeat.com/security/mcp-stdio-flaw-200000-ai-agent-servers-exposed-ox-security-audit">&quot;expected&quot; and by design</a>. Anthropic dismissed <a href="https://adversa.ai/blog/trustfall-coding-agent-security-flaw-rce-claude-cursor-gemini-cli-copilot/">Adversa AI&#x27;s TrustFall</a> as outside its threat model, according to Adversa&#x27;s published disclosure. ClaudeBleed was partially patched. Across all four disclosures, the researchers say the underlying trust model remains exploitable.</p><p>Alex Polyakov, co-founder of Adversa AI, told The Register that each vulnerability gets patched in isolation, but the <a href="https://www.theregister.com/security/2026/05/07/claude-code-trust-prompt-can-trigger-one-click-rce/5235319">underlying class has not been fixed</a>.</p><p>Zaitsev offered a frame for why consent alone cannot serve as the trust boundary. If you think you can always understand intent, Zaitsev told VentureBeat, then you would also think it is possible to write a program that reads a text transcript and figures out if someone is lying. That is intuitively an impossible problem to solve.</p><h2>Adversa AI showed that a cloned repo can auto-execute arbitrary code the moment a developer clicks trust</h2><p>Adversa AI researcher Alex Polyakov published <a href="https://adversa.ai/blog/trustfall-coding-agent-security-flaw-rce-claude-cursor-gemini-cli-copilot/">TrustFall</a>, demonstrating that project-scoped Claude configuration files in a cloned repository can silently authorize MCP servers to run as native OS processes with full user privileges. The moment a developer clicks the generic “Yes, I trust this folder” dialog, any MCP server defined in the project config launches. The dialog does not show what it authorizes.</p><p>In automated build pipelines where Claude Code runs without a screen, the trust dialog never appears. The attack executes with zero human interaction. <a href="https://adversa.ai/">Adversa</a> confirmed the pattern is not unique to Claude Code. All four major coding agents (Claude Code, Cursor, Gemini CLI, and GitHub Copilot) can auto-execute project-defined MCP servers the moment a developer accepts that dialog.</p><p><b>Stack blind spot:</b> No current security tooling can tell the difference between a legitimate project config and a malicious one. The trust dialog is the only thing standing between the developer and arbitrary code execution, and it does not show what it is about to authorize.</p>
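<p>A pre-clone gate of the kind the matrix below recommends can be equally small. This sketch walks a freshly cloned repository for the project-scoped config files named in the disclosure and fails the check if any of them defines an MCP server outside an approved organizational list; the approved-server names are hypothetical placeholders.</p><pre><code># scan_repo_trust.py -- a pre-clone gate per the Claude Code project settings
# row of the audit matrix below. Config file names come from the disclosure;
# the approved-server names are hypothetical placeholders.
import json
import sys
from pathlib import Path

CONFIG_NAMES = {".claude", ".claude.json", ".mcp.json", "CLAUDE.md"}
APPROVED_SERVERS = {"internal-docs"}  # hypothetical organizational list

def scan(repo_root):
    flagged = False
    for path in Path(repo_root).rglob("*"):
        if path.name not in CONFIG_NAMES:
            continue
        print(f"[review] agent config found: {path}")
        if path.suffix == ".json" and path.is_file():
            try:
                data = json.loads(path.read_text(encoding="utf-8"))
            except (json.JSONDecodeError, OSError):
                continue
            servers = data.get("mcpServers", {}) if isinstance(data, dict) else {}
            for name in servers:
                if name not in APPROVED_SERVERS:
                    flagged = True
                    print(f"[BLOCK] unapproved MCP server {name!r} in {path}")
    return flagged

if __name__ == "__main__":
    sys.exit(1 if scan(sys.argv[1]) else 0)
</code></pre><p>Wired into CI before any coding agent touches the checkout, a gate like this also covers the headless path where the trust dialog never appears.</p>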
<p><i>The matrix below maps each surface that Claude wrongly trusted, the stack blind spot, the detection signal, and the recommended action.</i></p><h2>Claude Confused Deputy Audit Matrix</h2><table><tbody><tr><td><p><b>Surface</b></p></td><td><p><b>Who Claude Trusted</b></p></td><td><p><b>Why Your Stack Misses It</b></p></td><td><p><b>Detection Signal</b></p></td><td><p><b>Recommended Action</b></p></td></tr><tr><td><p><b>claude.ai / API</b></p><p>Dragos, May 6</p><p><i>350+ artifacts analyzed</i></p></td><td><p>Attacker posing as an authorized user via Claude’s prompt interface.</p><p>Claude cannot distinguish a developer mapping internal systems from an adversary doing the same thing through the same interface.</p></td><td><p>OT monitoring watches ICS protocols and anomalous traffic patterns.</p><p>AI-generated recon originates from an IT-side developer tool, not from the OT network. The queries look identical to legitimate developer activity because they <i>are</i> legitimate developer activity with an adversary at the keyboard.</p></td><td><p><b>Query:</b></p><p>Claude API logs for requests referencing internal hostnames, IP ranges, or SCADA/ICS keywords.</p><p><b>Alert trigger:</b></p><p>&gt;5 credential generation requests against internal services in 60 minutes.</p><p><b>Escalation:</b></p><p>OT team notified on any AI-originated query touching vNode, SCADA, HMI, or PLC keywords.</p></td><td><p>Segment AI-assisted sessions from OT-adjacent network segments.</p><p>Log all Claude API calls referencing internal hostnames or IP ranges.</p><p>Alert on automated credential generation targeting internal authentication interfaces.</p><p>Require explicit OT authorization for any AI tool with internal network access.</p></td></tr><tr><td><p><b>Claude in Chrome</b></p><p>LayerX, May 7</p><p><i>v1.0.70 patch bypassed &lt;24hrs</i></p></td><td><p>Any script running in the claude.ai browser context, including scripts injected by zero-permission extensions.</p><p>The externally_connectable manifest trusts the origin (claude.ai), not the execution context. Any extension can inject into that origin.</p></td><td><p>EDR monitors file system activity, process execution, and network connections.</p><p>Extension-to-extension messaging happens entirely within the browser runtime. No file writes. No network anomalies. No process spawns. 
EDR has zero visibility into Chrome’s internal messaging API.</p></td><td><p><b>Query:</b></p><p>Chrome extension inventory for any extension with content scripts targeting claude.ai in the manifest.</p><p><b>Alert trigger:</b></p><p>New extension installed with claude.ai in permissions or content script targets.</p><p><b>Escalation:</b></p><p>Browser security team reviews any extension communicating with Claude’s messaging interface.</p></td><td><p>Audit Chrome extensions across the fleet for claude.ai content script access.</p><p>Disable “Act without asking” mode in Claude in Chrome enterprise-wide.</p><p>Deploy browser security tooling that inspects extension messaging channels.</p><p>Monitor for extensions injecting content scripts into claude.ai domain.</p></td></tr><tr><td><p><b>Claude Code MCP</b></p><p>Mitiga, May 7</p><p><i>Anthropic: “out of scope” April 12</i></p></td><td><p>Rewritten ~/.claude.json routing MCP traffic through attacker-controlled proxy.</p><p>Claude Code reads the MCP server URL from the config file on every load. It never re-validates that the URL matches the endpoint the user originally authorized.</p></td><td><p>WAF inspects HTTP traffic between clients and servers. It never sees a local config file rewrite.</p><p>EDR treats JSON file writes in the user’s home directory as normal developer behavior. Token rotation does not break the chain, because the npm postinstall hook reasserts the malicious URL on every Claude Code load.</p></td><td><p><b>Query:</b></p><p>File integrity monitor on ~/.claude.json for MCP server URL changes.</p><p><b>Alert trigger:</b></p><p>MCP server URL changed to endpoint not on approved allowlist.</p><p><b>Escalation:</b></p><p>IR team confirms postinstall hook removal before closing ticket. Token rotation alone is insufficient.</p></td><td><p>Monitor ~/.claude.json for unexpected MCP endpoint changes against an allowlist.</p><p>Block or alert on npm postinstall hooks that modify files outside the package directory.</p><p>Maintain a centralized MCP server URL allowlist.</p><p>Do not assume token rotation breaks the chain without confirming the malicious hook is removed first.</p></td></tr><tr><td><p><b>Claude Code project settings</b></p><p>Adversa AI, May 7</p><p><i>Affects Claude, Cursor, Gemini CLI, Copilot</i></p></td><td><p>Project-scoped .claude configuration file in a cloned repository.</p><p>Clicking the generic “Yes, I trust this folder” dialog silently authorizes any MCP server defined in the project config. The dialog does not show what it authorizes.</p></td><td><p>No current security tooling can tell the difference between a legitimate project config and a malicious one.</p><p>In automated build pipelines, Claude Code runs without a screen. The attack executes with zero human interaction against pull-request branches.</p></td><td><p><b>Query:</b></p><p>Pre-clone scan for .claude, .claude.json, .mcp.json, CLAUDE.md files in repository root.</p><p><b>Alert trigger:</b></p><p>Repo contains MCP server definition not on approved organizational list.</p><p><b>Escalation:</b></p><p>DevSecOps reviews before any developer opens the repo in Claude Code or any coding agent.</p></td><td><p>Scan cloned repositories for .claude configuration files before opening in any AI coding agent.</p><p>Require explicit per-server MCP approval rather than blanket folder trust.</p><p>Flag repos that define custom MCP servers in project configuration.</p><p>Audit CI/CD pipelines running Claude Code headless where trust dialogs are skipped entirely.</p></td></tr></tbody></table>
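<p>The Claude in Chrome row above can be partially automated with nothing more than a manifest walk. The sketch below checks a default Linux Chrome profile for extensions that declare content scripts or externally_connectable matches touching claude.ai; the profile path and the blunt string match are assumptions to adapt to your fleet tooling, and a hit means review, not verdict.</p><pre><code># audit_extensions.py -- a rough fleet-audit sketch for the Claude in Chrome
# row above. The profile path is the Linux default and varies by OS; matching
# manifest strings is deliberately blunt, so tune for your fleet tooling.
import json
from pathlib import Path

EXT_ROOT = Path.home() / ".config/google-chrome/Default/Extensions"

def audit():
    # Layout on disk: Extensions/&lt;extension id&gt;/&lt;version&gt;/manifest.json
    for manifest in EXT_ROOT.glob("*/*/manifest.json"):
        data = json.loads(manifest.read_text(encoding="utf-8", errors="replace"))
        targets = []
        for script in data.get("content_scripts", []):
            targets += script.get("matches", [])
        targets += data.get("externally_connectable", {}).get("matches", [])
        hits = [t for t in targets if "claude.ai" in t]
        if hits:
            ext_id = manifest.parent.parent.name
            print(f"[review] extension {ext_id} ({data.get('name', '?')}) declares claude.ai access: {hits}")

if __name__ == "__main__":
    audit()
</code></pre>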
<h2>The deputy changed</h2><p>Norm Hardy described the confused deputy in 1988. The deputy he had in mind was a compiler. This one writes <a href="https://www.dragos.com/blog/ai-assisted-ics-attack-water-utility">17,000-line exploitation frameworks</a>, identifies SCADA gateways on its own, and holds <a href="https://www.mitiga.io/blog/claude-code-mcp-token-theft-mitm">OAuth tokens to Jira, Confluence, and GitHub</a>. Four research teams found the same failure class on four surfaces in the same week. Anthropic&#x27;s response to each one was some version of &quot;the user consented.&quot; The matrix above is the audit Anthropic has not built. If your team runs Claude Code or Claude in Chrome, start there.</p>]]></description>
            <author>louiswcolumbus@gmail.com (Louis Columbus)</author>
            <category>Security</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/4SkR8jNCpfSRyi8zx9Bpt4/90a858841864009a2a8062003a9baa4e/hero.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Is your enterprise adaptive to AI?]]></title>
            <link>https://venturebeat.com/orchestration/is-your-enterprise-adaptive-to-ai</link>
            <guid isPermaLink="false">2JCFyEp20FbokATB0hEiEh</guid>
            <pubDate>Tue, 12 May 2026 07:00:00 GMT</pubDate>
<description><![CDATA[<p>Presented by <i>EdgeVerve</i></p><hr/><p>For most enterprises, AI adoption began with a straightforward ambition: automate work faster, cheaper, and at scale. Chatbots replaced basic service requests, machine‑learning models optimized forecasts, and analytics dashboards promised sharper insights. Yet many organizations are now discovering that deploying individual AI solutions does not automatically translate into enterprise‑level impact. Pilots proliferate, but value plateaus.</p><p>The next phase of AI maturity is no longer about deploying more models. It is about adapting AI continuously to changing business objectives, regulatory expectations, operating conditions, and customer contexts. This shift is particularly critical for complex, globally distributed organizations such as Global Business Services (GBS), where outcomes depend on orchestrating work across functions, regions, systems, and stakeholders.</p><h2>From automation to adaptation</h2><p>AI can no longer be treated as a standalone tool to accelerate discrete tasks. To remain competitive, enterprises must move from isolated, single‑purpose models toward systems that can sense context, coordinate actions, and evolve over time.</p><p>This is where adaptive AI ecosystems come into play. An adaptive AI ecosystem is a network of interoperable AI agents, models, data sources, and decision services that work together dynamically. These ecosystems integrate capabilities such as natural language processing, computer vision, predictive analytics, and autonomous decision‑making, while remaining grounded in human oversight and enterprise governance.</p><p>For GBS organizations, the relevance is clear. GBS operates at the intersection of scale, standardization, and variation, managing high‑volume processes across markets that differ in regulation, customer behavior, and operational constraints. Static automation struggles in such environments. Adaptive AI, by contrast, allows GBS teams to orchestrate end‑to‑end processes, intelligently route work, and continuously improve outcomes based on real‑time signals.</p><h2>Why enterprise AI deployments stall</h2><p>Despite strong intent, scaling AI remains a challenge. Research consistently shows that while many organizations invest in generative and agentic AI initiatives, far fewer succeed in operationalizing them across workflows and business units. The issue is rarely ambition; it is fragmentation.</p><p>SSON Research highlights several persistent barriers to generative AI adoption in GBS, including poor data quality, lack of specialized skills, data privacy concerns, unclear ROI, and budget constraints. Beneath these symptoms lies a common root cause: siloed environments. Data is fragmented, ownership is unclear, and AI initiatives are driven locally rather than through a shared enterprise strategy.</p><p>As a result, enterprises accumulate AI solutions that cannot easily work together. Models lack shared context, decisions are hard to explain, and governance becomes an afterthought rather than a design principle.</p><h2>Adaptive AI ecosystems and platforms: Clarifying the relationship</h2><p>An adaptive AI ecosystem describes the enterprise‑wide outcome: how AI capabilities collaborate across the organization. 
An adaptive AI platform is the foundation that makes this possible.</p><p>The platform provides common services and guardrails that allow AI agents and models to:</p><ul><li><p>access harmonized, trusted data</p></li><li><p>orchestrate end‑to‑end processes</p></li><li><p>enable intelligent agent handoffs between systems and humans</p></li><li><p>interoperate with both agentic and legacy applications through out‑of‑the‑box connectors</p></li><li><p>operate within defined security, compliance, and ethical boundaries</p></li></ul><p>Without this platform layer, adaptive ecosystems remain theoretical. With it, AI becomes composable, governable, and scalable.</p><h2>What an adaptive AI platform must enable</h2><p>To meet the demands of modern enterprises, and especially GBS organizations, an adaptive AI platform must deliver a set of core capabilities.</p><p>Real‑time data harmonization is foundational. Adaptive decisions require access to both structured and unstructured data across functions and regions. Platforms must provide a unified data foundation, with observability built in, so AI systems understand not just the data itself but its quality, lineage, and relevance. Edge‑to‑cloud architectures play a role here, ensuring insights are available where decisions occur, whether at the point of interaction or within a centralized decision engine.</p><p>Adaptive process orchestration is equally critical. GBS organizations increasingly rely on AI platforms that can orchestrate workflows dynamically across business units and systems. This includes coordinating multiple AI agents, enabling seamless agent‑to‑agent and human‑in‑the‑loop handoffs, and adjusting process paths in response to real‑time conditions.</p><p>Cognitive automation with governance moves beyond rule‑based automation. AI systems must be able to make context‑aware decisions with minimal human intervention, while still providing explainability, confidence indicators, and ethical constraints. The goal is not to remove humans from the loop, but to elevate their role from manual execution to oversight and judgment.</p><p>Decision governance and observability tie these capabilities together. Enterprises must be able to trace how decisions are made, understand which models contributed, and audit outcomes across markets. As regulatory expectations around AI risk management, data protection, and accountability increase globally, embedding governance into the platform becomes essential rather than optional.</p><h2>Establishing trust at scale</h2><p>Trust is the foundation of scalable AI. Enterprises that lack confidence in their AI systems across data integrity, model behavior, and regulatory compliance will struggle to move beyond experimentation into sustained adoption.</p><p>Building this trust requires deliberate investment. Organizations must ensure explainable AI, so decision logic is transparent to business and risk stakeholders, alongside privacy‑ and security‑by‑design principles that protect sensitive data from the outset. Continuous bias detection, model reliability, performance management, and clearly defined responsible AI guardrails are critical to maintaining consistent and ethical outcomes.</p><p>Equally important is a clear Target Operating Model. This model defines ownership across the AI lifecycle, clarifies roles and escalation paths, and aligns accountability from frontline teams to executive leadership. 
In GBS environments where AI‑driven decisions often span functions, geographies, and regulatory regimes, these trust mechanisms are not optional. They are essential.</p><h2>The road ahead</h2><p>Enterprises that continue to rely on fragmented AI deployments and siloed operating models will find it increasingly difficult to keep pace. The future belongs to organizations that adopt a platform‑based approach — one that enables them to move from incremental efficiency gains to transformational, enterprise‑wide impact.</p><p>Success will not be defined by a single model or use case. It will be defined by adaptive AI ecosystems built on strong agent architectures, interoperable connectors across agentic and legacy landscapes, and shared foundations for data, orchestration, and governance. For GBS organizations in particular, this approach provides a clear path to scale AI responsibly, delivering agility, trust, and sustained value in an increasingly complex world. In an era where change is constant and scrutiny is rising, the real question is no longer whether enterprises use AI but whether they are truly adaptive to it.</p><p><i>N. Shashidar is SVP &amp; Global Head, Product Management at EdgeVerve.</i></p><hr/><p><i>Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact </i><a href="mailto:sales@venturebeat.com"><i><u>sales@venturebeat.com</u></i></a><i>.</i></p>]]></description>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/6hhS5iS8ST3UVr5B6cayaI/845f1c716d09d18e542f36aec4438930/AdobeStock_930514867.jpeg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Turning AI cost spikes into strategic growth opportunities]]></title>
            <link>https://venturebeat.com/orchestration/turning-ai-cost-spikes-into-strategic-growth-opportunities</link>
            <guid isPermaLink="false">6ZeoBK4XeqdWzEl7APABLn</guid>
            <pubDate>Tue, 12 May 2026 07:00:00 GMT</pubDate>
<description><![CDATA[<p><i>Presented by Apptio, an IBM company</i></p><hr/><p>AI spending is surging, but the full impact often remains an open question. Closing the gap requires clear answers to how AI is governed, measured, and tied to business outcomes.</p><p>ROI uncertainty isn’t unique to AI: In the <a href="https://www.apptio.com/resources/research-reports/2026-technology-investment-management-report/">Apptio 2026 Technology Investment Management Report</a>, 90% of technology leaders surveyed said that ROI uncertainty has a moderate or major impact on overall tech investment decisions, a 5-percentage-point year-over-year increase. In other words, tech leaders are increasing their reliance on ROI – even if they don’t fully know how to measure it. And AI economics involves new and unpredictable costs, further complicating ROI calculations. Faced with increasing uncertainty and increasing budgets, technology leaders need a clear, reliable framework for evaluating AI ROI. </p><p>Organizations increasingly expect scaled AI to pay its own way, at least partially. According to Apptio’s technology investment management report, 45% of organizations surveyed intend to fund innovation by reinvesting savings from AI-driven efficiencies. That model assumes that such savings are both achievable and quantifiable. Meanwhile, the two-thirds of organizations planning to reallocate existing budget capital to AI will need clarity on the trade-offs involved. </p><p>Much like the early days of public cloud, AI costs and returns are difficult to predict. Pricing varies widely across providers and continues to evolve, while consumption is unpredictable. The pressure to adopt quickly is also formidable as organizations navigate the threat of disruption by more agile competitors. </p><h2>The new math of AI ROI</h2><p>Considering the many variables, tech leaders should view AI ROI as a matter of optimization. At a high level, the implementation of AI initiatives is inevitable. The question is how to achieve the greatest possible returns — both financial and organizational. </p><p><b>Start with the business problem.</b> There are many ways AI can deliver positive impact, but organizational resources and focus may be limited. Make sure you’re prioritizing the right initiatives by basing your AI investment strategy on quantifiable goals tied to real business outcomes. Are you trying to improve decision-making speed? Increase throughput or capacity? Or are you chasing cool edge cases with high potential returns but minimal strategic relevance?</p><p><b>Determine what success looks like.</b> AI can introduce a new capability or augment an existing one. For new capabilities, articulate the possibilities you’d like to unlock, such as new revenue opportunities, workflows, or decision-making processes. For augmentations, establish baseline performance and the expected lift you aim to achieve with AI. </p><p>Consider how finances will influence your evaluation. Some use cases may show minimal results in the near term but drive significant value in the long term. What’s your timeframe for return? On the other hand, more successful rollouts with rapid adoption can generate unexpectedly high inference bills. Would that mean pulling the plug — or leaning in further? What should your cost and return curve look like over the years? 
As you map your timeline, establish clear thresholds to determine whether you’ll proceed, pause, stop, or accelerate your investment.</p><p><b>Identify the right KPIs.</b> The returns on an AI investment can be even more difficult to evaluate than the costs. Usage, efficiency, and financial impact all matter. But AI success metrics won’t always be straightforward. There may be new usage patterns you don’t yet have a way to measure. Your technology environment may experience follow-on shifts that call for further evaluation. Will you be able to lessen your reliance on other tools, such as reducing seats in your data analytics platform? How will you factor in cross-tool pricing comparisons for multiple AI providers with shifting rates? </p><p>To gain full context and insight, you must also take into account the alignment of the initiative with your broader strategy and consider the opportunity cost of the investments you might otherwise have made. Remember that you’re not evaluating AI business value in isolation; you’re deciding whether it&#x27;s the best use of finite capital across all your investments. </p><p>These decisions will call for a level of insight far exceeding what was needed to justify traditional purchases like network infrastructure or enterprise software. Tech leaders navigating the complexities of AI economics should consider a new framework for data-driven decision-making. </p><h2>Making AI investment sustainable with TBM</h2><p>Technology business management (TBM) helps make ROI more concrete and measurable, so it can be relevant to the business. By bringing together IT Financial Management (ITFM), AI FinOps (cloud financial management for AI workloads), and Strategic Portfolio Management (SPM), a TBM framework connects financial, operational, and business data across the enterprise. This makes it possible to account for AI value and cost across a wide array of dimensions — and translate hypothetical innovation into board presentations and budget justifications that hold up under scrutiny. </p><p>TBM can help leaders build a trustworthy cost foundation that captures AI spend across labor, infrastructure, inference, storage, and applications. As AI workloads shift dynamically, TBM provides visibility into how that spend is distributed across on-premises systems and cloud environments — both of which require different capacity planning for specialized skill sets. The framework also connects investments to business outcomes, aligning AI initiatives with strategic priorities and measurable results. With increased visibility, you’re able to identify issues and make decisions fast, such as catching cost spikes early. Early detection can help determine whether a usage shift merits a funding shift. This unified view of financial and operational data helps leaders scale what’s working and reassess what isn’t as adoption increases. TBM provides essential visibility and context across the entire AI spend management conversation. Even as pricing evolves, tooling changes, and workflows shift, you can apply the same analytical approach to understand what’s actually working and demonstrate ROI. 
Leaders who operationalize AI within a TBM framework can: </p><ul><li><p>Evaluate ROI at both project and portfolio levels</p></li><li><p>Spot unexpected cost spikes</p></li><li><p>Compare multiple AI tools </p></li><li><p>Understand ripple effects across run-the-business systems </p></li><li><p>Defend investment decisions with confidence</p></li><li><p>Understand and manage total costs and usage across the AI investment lifecycle</p></li></ul><h2>From theory to practice</h2><p>Organizations are moving beyond AI experiments, and we’re past the point where these investments can be funded on optimism alone. Amid heightened uncertainty and cost sensitivity, boards are asking more strategic questions and finance wants trustworthy data. </p><p>Enterprise leaders who treat AI as a managed investment, rather than a bet on innovation, are those who will scale it successfully. To fund AI responsibly, leaders must establish clarity around scope, outcomes, cost drivers, and readiness. A TBM-driven approach provides the data foundation, visibility, and accountability to make those decisions. </p><p>Learn more here about how <a href="https://www.apptio.com/solutions/ai-investment-management/">Apptio TBM transforms IT spend management in the AI era</a>.</p><hr/><p><i>Ajay Patel is General Manager at Apptio, an IBM Company.</i></p><hr/><p><i>Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact </i><a href="mailto:sales@venturebeat.com"><i><u>sales@venturebeat.com</u></i></a><i>.</i></p>]]></description>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/4O9CPzsYyNtOhUGTGohRjT/481f64ffd72a999a2e25e53cb2a6719f/AdobeStock_1502029205.jpeg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
    </channel>
</rss>