<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>VentureBeat</title>
        <link>https://venturebeat.com/</link>
        <description>Transformative tech coverage that matters</description>
        <lastBuildDate>Thu, 23 Apr 2026 14:00:31 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright 2026, VentureBeat</copyright>
        <item>
            <title><![CDATA[OpenAI unveils Workspace Agents, a successor to custom GPTs for enterprises that can plug directly into Slack, Salesforce and more]]></title>
            <link>https://venturebeat.com/orchestration/openai-unveils-workspace-agents-a-successor-to-custom-gpts-for-enterprises-that-can-plug-directly-into-slack-salesforce-and-more</link>
            <guid isPermaLink="false">57EiqUX3d4x1WsB6szFvEO</guid>
            <pubDate>Wed, 22 Apr 2026 23:53:00 GMT</pubDate>
            <description><![CDATA[<p>OpenAI introduced a new paradigm and product today that is likely to have huge implications for enterprises seeking to adopt and control fleets of AI agent workers.</p><p>Called &quot;<a href="https://openai.com/index/introducing-workspace-agents-in-chatgpt/">Workspace Agents</a>,&quot; OpenAI&#x27;s new offering essentially allows users on its ChatGPT Business ($20 per user per month) and variably priced Enterprise, Edu and Teachers subscription plans to design or select from pre-existing agent templates that can take on work tasks across third-party apps and data sources including Slack, Google Drive, Microsoft apps, Salesforce, Notion, Atlassian Rovo, and other popular enterprise applications.</p><div></div><p>Put simply: these agents can be created and accessed from ChatGPT, but users can also add them to third-party apps like Slack, communicate with them across disparate channels, ask them to use information from the channel they&#x27;re in and other third-party tools and apps, and the agents will go off and do work like drafting emails to the entire team, selected members, or pull data and make presentations.</p><p>Human users can trust that the agent will manage all this complexity and complete the task as requested, even if the user who requested it leaves.</p><p>It&#x27;s the end of &quot;babysitting&quot; agents and the start of letting them go off and get shit done for your business — according to your defined business processes and permissions, of course. </p><p>The product experience appears centered on the Agents tab in the ChatGPT sidebar, where teams can discover and manage shared agents. </p><p>This functions as a kind of team directory: a place where agents built by coworkers can be reused across a workspace. The broader idea is that AI becomes less of an individual productivity trick and more of a shared organizational resource.</p><p>In this sense, OpenAI is targeting one of office work’s oldest pain points: the handoff between people, systems, and steps in a process.</p><p>OpenAI says workspace agents will be free for the next two weeks, until May 6, 2026, after which credit-based pricing will begin. The company also says more capabilities are on the way, including new triggers to start work automatically, better dashboards, more ways for agents to take action across business tools, and support for workspace agents in its AI code generation app, Codex.</p><p>For more information on how to get started building and using them, OpenAI recommends heading over to its <a href="https://openai.com/academy/workspace-agents/">online academy page on them here</a> and <a href="https://help.openai.com/en/articles/20001143-chatgpt-workspace-agents-for-enterprise-and-business">its help desk documentation here</a>.</p><h2><b>The Codex backbone</b></h2><p>The most significant shift in this announcement is the move away from purely session-based interaction. Workspace agents are powered by Codex — the cloud-based, partially open-source AI coding harness that OpenAI has been aggressively expanding in 2026 — which gives them access to a workspace for files, code, tools, and memory.</p><p>OpenAI says the agents can do far more than answer a prompt. They can write or run code, use connected apps, remember what they have learned, and continue work across multiple steps. 
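</p><p>OpenAI has not published the internal loop behind these agents, and the sketch below is not the Workspace Agents API. It is a minimal, hypothetical Python illustration of the pattern the company describes: a loop that calls a model for the next step, runs a connected tool, and persists what it learned for the next run. The <code>call_model</code>, <code>run_tool</code> and memory helpers are stand-ins.</p><pre><code># Illustrative only: a bare-bones multi-step agent loop with tool use and
# persisted memory. None of this is the OpenAI Workspace Agents API.
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # survives across runs

def load_memory() -> dict:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}

def save_memory(memory: dict) -> None:
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def call_model(task: str, memory: dict, history: list) -> dict:
    """Stand-in for a model call that returns the next step as structured JSON."""
    # A real agent would send the task, memory and history to a model here.
    if not history:
        return {"action": "use_tool", "tool": "fetch_metrics", "args": {"week": "latest"}}
    return {"action": "finish", "summary": f"Report drafted from {history[-1]['result']}"}

def run_tool(name: str, args: dict) -> str:
    """Stand-in for a connected app (Slack, Drive, Salesforce, ...)."""
    return f"{name}({args}) returned 42 rows"

def run_agent(task: str, max_steps: int = 5) -> str:
    memory, history = load_memory(), []
    for _ in range(max_steps):  # keep working across multiple steps
        step = call_model(task, memory, history)
        if step["action"] == "finish":
            memory["last_run"] = step["summary"]  # remember what was learned
            save_memory(memory)
            return step["summary"]
        result = run_tool(step["tool"], step["args"])
        history.append({"step": step, "result": result})
    return "stopped: step limit reached"

print(run_agent("Draft the Friday metrics report"))
</code></pre><p>The sketch is deliberately generic; OpenAI&#x27;s own description of what these agents can do is considerably broader.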
</p><p>That description lines up closely with the <a href="https://venturebeat.com/technology/openai-drastically-updates-codex-desktop-app-to-use-all-other-apps-on-your-computer-generate-images-preview-webpages">capabilities OpenAI shipped into Codex just six days ago</a>, including background computer use, more than 90 new plugins spanning tools like Atlassian Rovo, CircleCI, GitLab, Microsoft Suite, Neon by Databricks, and Render, plus image generation, persistent memory, and the ability to schedule future work and wake up on its own to continue across days or weeks.</p><p>Workspace agents inherit that plumbing. When one pulls a Friday metrics report, it is effectively spinning up a Codex cloud session with the right tools attached, running code to fetch and transform data, rendering charts, writing the narrative, and persisting what it learned for next week. </p><p>When that same agent is deployed to a Slack channel, it is a Codex instance listening for mentions and threading its work back in.</p><p>This is the technical decision enterprise buyers should focus on. Building an agent on a code-execution substrate rather than a pure LLM-call-and-response loop is what gives workspace agents the ability to do real work — transforming a CSV, reconciling two systems of record, generating a chart that is actually correct — rather than describing what the work would look like.</p><h2><b>Persistence and scheduling</b></h2><p>In earlier AI assistant models, progress paused when the user stopped interacting. Workspace agents change that by running in the cloud and supporting long-running workflows. Teams can also set them to run on a schedule.</p><p>That means a recurring reporting agent can pull data on a set cadence, generate charts and summaries, and share the results with a team without anyone manually kicking off the process. </p><p>Here at VentureBeat, we analyze story traffic and user return rate on a weekly basis — exactly the kind of recurring, multi-step, multi-source task that could theoretically be automated with a single workspace agent. Any enterprise with a weekly reporting rhythm pulling from dynamic data sources is likely to find a use for these agents.</p><p>Agents also retain memory across runs. OpenAI says they can be guided and corrected in conversation, so they improve the more a team uses them. </p><p>Over time they start to reflect how a team actually works — its processes, its standards, its preferred ways of handling recurring jobs — which is a meaningfully different proposition from the static instruction-set GPTs that preceded them.</p><h2><b>The integrated ecosystem</b></h2><p>OpenAI&#x27;s claim is that agents should gather information and take action where work already happens, rather than forcing teams into a separate interface. That point becomes clearest in the Slack examples. OpenAI&#x27;s launch materials show a product-feedback agent operating inside a channel named #user-insights, answering a question about recent mobile-app feedback with a themed summary pulled from multiple sources.</p><p>The company&#x27;s demo lineup walks through a sample team directory of agents: Spark for lead qualification and follow-up, Slate for software-request review, Tally for metrics reporting, Scout for product feedback routing, Trove for third-party vendor risk, and Angle for marketing and web content. 
</p><p>OpenAI also shared more functional examples its own teams use internally — a Software Reviewer that checks employee requests against approved-tools policy and files IT tickets; an accounting agent that prepares parts of month-end close including journal entries, balance-sheet reconciliations, and variance analysis, with workpapers containing underlying inputs and control totals for review; and a Slack agent used by the product team that answers employee questions, links relevant documentation, and files tickets when it surfaces a new issue.</p><p>In a sense, it is a continuation of the philosophy OpenAI espoused for individuals with last week&#x27;s Codex desktop release: the agent joins the workflow where work is already happening, draws in context from the surrounding apps, takes action where permitted, and keeps moving.</p><h2><b>From GPTs to a broader agent push</b></h2><p>Workspace agents are not a standalone launch. They sit inside a roughly 12-month arc in which OpenAI has been systematically rebuilding ChatGPT, the API, and the developer platform around agents.</p><p>Workspace agents are explicitly positioned by OpenAI as an evolution of its <a href="https://venturebeat.com/ai/openai-announces-customizable-gpts-for-businesses-and-consumers">custom GPTs, introduced in late 2023</a>, which gave users a way to create customized versions of ChatGPT for particular roles and use cases.</p><p>However, now OpenAI says it is deprecating the custom GPT standard for organizations in a yet-to-be determined future date, and will require Business, Enterprise, Edu and Teachers users to update their GPTs to be new workspace agents. </p><p>Individuals who have made custom GPTs can continue using them for the foreseeable future, according to our sources at the company.</p><p>In October 2025, <a href="https://venturebeat.com/ai/openai-unveils-agentkit-that-lets-developers-drag-and-drop-to-build-ai">OpenAI introduced AgentKit</a>, a developer-focused suite that includes Agent Builder, a Connector Registry, and ChatKit for building, deploying, and optimizing agents. </p><p>In February 2026,<a href="https://venturebeat.com/orchestration/openai-launches-centralized-agent-platform-as-enterprises-push-for-multi"> it introduced Frontier</a>, an enterprise platform focused on helping organizations manage AI coworkers with shared business context, execution environments, evaluation, and permissions. </p><p>Workspace agents arrive as the no-code, in-product entry point that sits on top of that stack — even if OpenAI does not explicitly describe the architectural relationship in its materials.</p><p>The subtext across all three launches is the same: OpenAI has decided that the future of ChatGPT-for-work is fleets of permissioned agents, not single chat windows — and that GPTs, its first attempt at letting businesses customize ChatGPT, were not enough.</p><h2><b>Governance and enterprise safeguards</b></h2><p>Because workspace agents can act across business systems, OpenAI puts heavy emphasis on governance. Admins can control who is allowed to build, run, and publish agents, and which tools, apps, and actions those agents can reach. </p><p>The role-based controls are more granular than the ones most custom-GPT rollouts ever had: admins can toggle, per role, whether members can browse and run agents, whether they can build them, whether they can publish to the workspace directory, and — separately — whether they can publish agents that authenticate using personal credentials. 
</p><p>That last setting is the risky case, and OpenAI explicitly recommends keeping it narrowly scoped.</p><p>Authentication itself comes in two flavors, and the choice has real consequences. In end-user account mode, each person who runs the agent authenticates with their own credentials, so the agent only ever sees what that individual is allowed to see. </p><p>In agent-owned account mode, the agent uses a single shared connection so users don&#x27;t have to authenticate at run time. OpenAI&#x27;s documentation strongly recommends service accounts rather than personal accounts for the shared case, and flags the data-exfiltration risk of publishing an agent that authenticates as its creator.</p><p>Write actions — sending email, editing a spreadsheet, posting a message, filing a ticket — default to Always ask, requiring human approval before the agent executes. </p><p>Builders can relax specific actions to &quot;Never ask&quot; or configure a custom approval policy, but the default posture is human-in-the-loop.</p><p>OpenAI also claims built-in safeguards against prompt-injection attacks, where malicious content in a document or web page tries to hijack an agent. The claim is welcome but not yet proven in the wild.</p><p>For organizations that want deeper visibility, <a href="https://community.openai.com/t/compliance-api-documentation-and-sandbox/1115444">OpenAI says its Compliance API</a> surfaces every agent&#x27;s configuration, updates, and run history. </p><p>Admins can suspend agents on the fly, and OpenAI says an admin-console view of every agent built across the organization, with usage patterns and connected data sources, is coming soon. </p><p>Two caveats worth flagging for security-sensitive buyers: workspace agents are off by default at launch for ChatGPT Enterprise workspaces pending admin enablement, and they are not available at all to Enterprise customers using Enterprise Key Management (EKM).</p><h2><b>Analytics and early customer signal</b></h2><p>OpenAI also ships an analytics dashboard aimed at helping teams understand how their agents are being used. Screenshots in the launch materials show measures like total runs, unique users, and an activity feed of recent runs, including one by a user named Ethan Rowe completing a run in a #b2b-sales channel. </p><p>The mockup detail supports OpenAI&#x27;s broader point: the company wants organizations to measure not just whether agents exist, but whether they are being used.</p><p>The clearest early-adopter signal in the launch itself comes from Rippling. Ankur Bhatt, who leads AI Engineering at the HR platform, says workspace agents shortened the traditional development cycle enough that a sales consultant was able to build a sales agent without an engineering team. &quot;It researches accounts, summarizes Gong calls, and posts deal briefs directly into the team&#x27;s Slack room,&quot; Bhatt says. &quot;What used to take reps 5–6 hours a week now runs automatically in the background on every deal.&quot; </p><p>OpenAI&#x27;s announcement names SoftBank Corp., Better Mortgage, BBVA, and Hibob as additional early testers.</p><h2><b>The era of the digital coworker</b></h2><p>Workspace agents do not land in a vacuum. They land in the middle of a broader OpenAI push — through AgentKit, through Frontier, through the Codex overhaul — to make agents more persistent, more connected, and more useful inside real organizational workflows. 
</p><p>They also land in a deeply crowded field: <a href="https://adoption.microsoft.com/en-us/ai-agents/copilot-studio/">Microsoft Copilot Studio</a> is wired into the Microsoft 365 base, Google is pushing <a href="https://cloud.google.com/blog/products/ai-machine-learning/google-agentspace-enables-the-agent-driven-enterprise">Agentspace</a>, Salesforce has rebuilt itself as agent infrastructure with <a href="https://venturebeat.com/orchestration/salesforces-agentforce-vibes-2-0-targets-a-hidden-failure-context-overload-in-ai-agents">Agentforce</a>, and Anthropic recently introduced <a href="https://venturebeat.com/orchestration/anthropics-claude-managed-agents-gives-enterprises-a-new-one-stop-shop-but">Claude Managed Agents</a>, all different flavors of similar ideas — agents that cut across your apps and tools, take actions on schedules repeatedly as desired, and retain some degree of memory, context, and permissions and policies.  </p><p>But this launch matters because it turns OpenAI&#x27;s strategy into something concrete for the teams already paying for ChatGPT, and because it quietly retires the product those teams were most recently told to standardize on. </p><p>If workspace agents live up to the pitch — shared, reusable, scheduled, permissioned coworkers that follow approved processes and keep work moving when their human is offline — it would mark a meaningful change in what workplace software does. Less passive software waiting for input, more active systems helping teams coordinate, execute, and move faster together.</p><p>The era of the digital coworker has begun. And, on OpenAI&#x27;s plans at least, the era of the custom GPT is ending.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/4Xdu5CqjmyRwz1NBaDX14Z/b43e969be10254ca838bf1ad60a187a6/ChatGPT_Image_Apr_22__2026__07_40_47_PM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Google and AWS split the AI agent stack between control and execution]]></title>
            <link>https://venturebeat.com/orchestration/google-and-aws-split-the-ai-agent-stack-between-control-and-execution</link>
            <guid isPermaLink="false">2iCiwdkGm9enJMkLHOmgl6</guid>
            <pubDate>Wed, 22 Apr 2026 21:37:00 GMT</pubDate>
            <description><![CDATA[<p>The era of enterprises stitching together prompt chains and shadow agents is nearing its end as more options for orchestrating complex multi-agent systems emerge. As organizations move AI agents into production, the question remains: &quot;How will we manage them?&quot;</p><p>Google and Amazon Web Services offer fundamentally different answers, illustrating a split in the AI stack. Google’s approach is to run agentic management on the system layer, while AWS’s harness method operates at the execution layer. </p><p>The debate over how to manage and control agents gained new energy this past month as competing companies released or updated their agent builder platforms—Anthropic with <a href="https://venturebeat.com/orchestration/anthropics-claude-managed-agents-gives-enterprises-a-new-one-stop-shop-but">the new Claude Managed Agents</a> and OpenAI with enhancements to <a href="https://openai.com/index/the-next-evolution-of-the-agents-sdk/">the Agents SDK</a>—giving developer teams options for managing agents. </p><p>AWS with new capabilities added to <a href="https://aws.amazon.com/bedrock/agentcore/">Bedrock AgentCore</a> is optimizing for velocity—relying on harnesses to bring agents to production faster—while still offering identity and tool management. </p><p>Meanwhile, <a href="https://cloud.google.com/blog/products/ai-machine-learning/the-new-gemini-enterprise-one-platform-for-agent-development">Google’s Gemini Enterprise</a> adopts a governance-focused approach using a Kubernetes-style control plane. Each method offers a glimpse into how agents move from short-burst task helpers to longer-running entities within a workflow. </p><h2><b>Upgrades and umbrellas</b></h2><p>To understand where each company stands, here’s what’s actually new. </p><p>Google released a new version of Gemini Enterprise, bringing its enterprise AI agent offerings—Gemini Enterprise Platform and Gemini Enterprise Application—under one umbrella. </p><p>The company has rebranded<a href="https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform"> Vertex AI as Gemini Enterprise Platform</a>, though it insists that, aside from the name change and new features, it’s still fundamentally the same interface. </p><p>“We want to provide a platform and a front door for companies to have access to all the AI systems and tools that Google provides,” Maryam Gholami, senior director, product management for Gemini Enterprise, told VentureBeat in an interview. “The way you can think about it is that the Gemini Enterprise Application is built on top of the Gemini Enterprise Agent Platform, and the security and governance tools are all provided for free as part of Gemini Enterprise Application subscription.”</p><p>On the other hand, <a href="https://aws.amazon.com/blogs/machine-learning/get-to-your-first-working-agent-in-minutes-announcing-new-features-in-amazon-bedrock-agentcore/">AWS added a new managed agent harness to Bedrock AgentCore</a>. The company said in a press release shared with VentureBeat that the harness “replaces upfront build with a config-based starting point powered by Strands Agents, AWS’s open source agent framework.” </p><p>Users define what the agent does, the model it uses and the tools it calls, and AgentCore does the work to stitch all of that together to run the agent. </p><h2><b>Agents are now becoming systems</b></h2><p>The shift toward stateful, long-running autonomous agents has forced a rethink of how AI systems behave. 
As agents move from short-lived tasks to long-running workflows, a new class of failure is emerging: state drift.</p><p>As agents continue operating, they accumulate state—memory, tool responses and evolving context. Over time, that state becomes outdated. Data sources change, or tools can return conflicting responses. As that happens, the agent becomes more vulnerable to inconsistencies and less truthful. </p><p>Agent reliability becomes a systems problem, and managing that drift may need more than faster execution; it may require visibility and control. </p><p>It’s this failure point that platforms like Gemini Enterprise and AgentCore try to prevent. </p><p>Though this shift is already happening, Gholami admitted that customers will dictate how they want to run and control any long-running agent. </p><p>“We are going to learn a lot from customers where they would be using long-running agents, where they just assign a task to these autonomous agents to just go ahead and do,” Gholami said. “Of course, there are tricks and balances to get right and the agent may come back and ask for more input.”</p><h2><b>The new AI stack</b></h2><p>What’s becoming increasingly clear is that the AI stack is separating into distinct layers, solving different problems. </p><p>AWS and, to a certain extent, Anthropic and OpenAI, optimize for faster deployment. Claude Managed Agents abstracts much of the backend work for standing up an agent, while the Agents SDK now includes support for sandboxes and a ready-made harness. These approaches aim to lower the barrier to getting agents up and running.</p><p>Google offers a centralized control plane to manage identity, enforce policies and monitor long-running behaviors. </p><p>Enterprises likely need both. </p><p>As some practitioners see it, their businesses have to have a serious conversation on how much risk they are willing to take. </p><p>“The main takeaway for enterprise technology leaders considering these technologies at the moment may be formulated this way: while the agent harness vs. runtime question is often perceived as build vs. buy, this is primarily a matter of risk management. If you can afford to run your agents through a third-party runtime because they do not affect your revenue streams, that is okay. On the contrary, in the context of more critical processes, the latter option will be the only one to consider from a business perspective,” Rafael Sarim Oezdemir, head of growth at EZContacts, told VentureBeat in an email.</p><p>Iterating quickly lets teams experiment and discover what agents can do, while centralized control adds a layer of trust. What enterprises need is to ensure they are not locked into systems designed purely for a single way of executing agents. </p>]]></description>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/1Vp25PZDnQQMBBb7WLhth0/f565b23d4dfbc56def7a55fae6405769/crimedy7_illustration_of_a_schism_but_related_to_artificial_i_2d035a47-60c4-4cab-9029-d5ee39e3dbdc_0.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Are you paying an AI ‘swarm tax’? Why single agents often beat complex systems]]></title>
            <link>https://venturebeat.com/orchestration/are-you-paying-an-ai-swarm-tax-why-single-agents-often-beat-complex-systems</link>
            <guid isPermaLink="false">27cHFi06Z203YGxQaliSDo</guid>
            <pubDate>Wed, 22 Apr 2026 21:24:49 GMT</pubDate>
            <description><![CDATA[<p>Enterprise teams building multi-agent AI systems may be paying a compute premium for gains that don&#x27;t hold up under equal-budget conditions. New Stanford University research finds that single-agent systems match or outperform multi-agent architectures on complex reasoning tasks when both are given the same thinking token budget.</p><p>However, multi-agent systems come with the added baggage of computational overhead. Because they typically use longer reasoning traces and multiple interactions, it is often unclear whether their reported gains stem from architectural advantages or simply from consuming more resources.</p><p>To isolate the true driver of performance, researchers at Stanford University <a href="https://arxiv.org/abs/2604.02460">compared single-agent systems against multi-agent architectures</a> on complex multi-hop reasoning tasks under equal &quot;thinking token&quot; budgets.</p><p>Their experiments show that in most cases, single-agent systems match or outperform multi-agent systems when compute is equal. Multi-agent systems gain a competitive edge when a single agent&#x27;s context becomes too long or corrupted.</p><p>In practice, this means that a single-agent model with an adequate thinking budget can deliver more efficient, reliable, and cost-effective multi-hop reasoning. Engineering teams should reserve multi-agent systems for scenarios where single agents hit a performance ceiling.</p><h2>Understanding the single versus multi-agent divide</h2><p>Multi-agent frameworks, such as planner agents, role-playing systems, or debate swarms, break down a problem by having multiple models operate on partial contexts. These components communicate with each other by passing their answers around.</p><p>While multi-agent solutions show strong empirical performance, comparing them to single-agent baselines is often an imprecise measurement. Comparisons are heavily confounded by differences in test-time computation. Multi-agent setups require multiple agent interactions and generate longer reasoning traces, meaning they consume significantly more tokens.</p><p>Consequently, when a multi-agent system reports higher accuracy, it is difficult to determine if the gains stem from better architecture design or from spending extra compute.</p><p><a href="https://venturebeat.com/orchestration/research-shows-more-agents-isnt-a-reliable-path-to-better-enterprise-ai">Recent studies</a> show that when the compute budget is fixed, elaborate multi-agent strategies frequently underperform compared to strong single-agent baselines. However, these are mostly very broad comparisons that don’t account for nuances such as different multi-agent architectures or the difference between prompt and reasoning tokens.</p><p>“A central point of our paper is that many comparisons between single-agent systems (SAS) and multi-agent systems (MAS) are not apples-to-apples,” paper authors Dat Tran and Douwe Kiela told VentureBeat. “MAS often get more effective test-time computation through extra calls, longer traces, or more coordination steps.”</p><h2>Revisiting the multi-agent challenge under strict budgets</h2><p>To create a fair comparison, the Stanford researchers set a strict “thinking token” budget. 
This metric controls the total number of tokens used exclusively for intermediate reasoning, excluding the initial prompt and the final output.</p><p>The study evaluated single- and multi-agent systems on multi-hop reasoning tasks, meaning questions that require connecting multiple pieces of disparate information to reach an answer.</p><p>During their experiments, the researchers noticed that single-agent setups sometimes stop their internal reasoning prematurely, leaving available compute budget unspent. To counter this, they introduced a technique called SAS-L (single-agent system with longer thinking).</p><p>Rather than jumping to multi-agent orchestration when a model gives up early, the researchers suggest a simple prompt-and-budgeting change.</p><p>&quot;The engineering idea is simple,&quot; Tran and Kiela said. &quot;First, restructure the single-agent prompt so the model is explicitly encouraged to spend its available reasoning budget on pre-answer analysis.&quot;</p><p>By instructing the model to explicitly identify ambiguities, list candidate interpretations, and test alternatives before committing to a final answer, developers can recover the benefits of collaboration inside a single-agent setup. </p><p>The results of their experiments confirm that a single agent is the strongest default architecture for multi-hop reasoning tasks. It produces the highest accuracy answers while consuming fewer reasoning tokens. When paired with specific models like Google&#x27;s Gemini 2.5, the longer-thinking variant produces even better aggregate performance.</p><p>The researchers rely on a concept called “Data Processing Inequality” to explain why a single agent outperforms a swarm. Multi-agent frameworks introduce inherent communication bottlenecks. Every time information is summarized and handed off between different agents, there is a risk of data loss.</p><p>In contrast, a single agent reasoning within one continuous context avoids this fragmentation. It retains access to the richest available representation of the task and is thus more information-efficient under a fixed budget.</p><p>The authors also note that enterprises often overlook the secondary costs of multi-agent systems.</p><p>&quot;What enterprises often underestimate is that orchestration is not free,&quot; they said. &quot;Every additional agent introduces communication overhead, more intermediate text, more opportunities for lossy summarization, and more places for errors to compound.&quot;</p><p>On the other hand, they discovered that multi-agent orchestration is superior when a single agent&#x27;s environment gets messy. If an enterprise application must handle highly degraded contexts, such as noisy data, long inputs filled with distractors, or corrupted information, a single agent struggles. In these scenarios, the structured filtering, decomposition, and verification of a multi-agent system can recover relevant information more reliably.</p><p>The study also warns about hidden evaluation traps that falsely inflate multi-agent performance. Relying purely on API-reported token counts heavily distorts how much computation an architecture is actually spending. The researchers found these accounting artifacts when testing models like Gemini 2.5, proving this is an active issue for enterprise applications today.</p><p>&quot;For API models, the situation is trickier because budget accounting can be opaque,&quot; the authors said. 
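</p><p>One way teams work around that opacity is to keep their own ledger. The sketch below is a hypothetical illustration, not code from the paper: it records prompt, visible-reasoning, provider-reported-reasoning and output tokens per call, so a single-agent run and a multi-agent run can be held to the same thinking-token budget. The field names and numbers are made up.</p><pre><code># Illustrative only: per-call token accounting so single-agent (SAS) and
# multi-agent (MAS) runs can be compared under the same thinking-token budget.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CallRecord:
    agent: str
    prompt_tokens: int
    visible_reasoning_tokens: int             # counted from the trace we can see
    reported_reasoning_tokens: Optional[int]  # provider-reported, may be missing
    output_tokens: int

    def thinking_tokens(self) -> int:
        # Prefer the provider number when exposed, but keep both for auditing.
        if self.reported_reasoning_tokens is not None:
            return self.reported_reasoning_tokens
        return self.visible_reasoning_tokens

@dataclass
class RunLedger:
    budget: int                               # shared thinking-token budget
    calls: list = field(default_factory=list)

    def add(self, record: CallRecord) -> None:
        self.calls.append(record)

    def thinking_spend(self) -> int:
        return sum(c.thinking_tokens() for c in self.calls)

    def within_budget(self) -> bool:
        return self.budget - self.thinking_spend() >= 0

# Compare a single agent against a planner/worker pair on the same budget.
sas = RunLedger(budget=8000)
sas.add(CallRecord("solo", 1200, 7400, 7650, 300))

mas = RunLedger(budget=8000)
mas.add(CallRecord("planner", 900, 2100, None, 250))
mas.add(CallRecord("worker", 1500, 6800, None, 400))

for name, run in (("SAS", sas), ("MAS", mas)):
    print(name, run.thinking_spend(), "thinking tokens; within budget:", run.within_budget())
</code></pre><p>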
To evaluate architectures reliably, they advise developers to &quot;log everything, measure the visible reasoning traces where available, use provider-reported reasoning-token counts when exposed, and treat those numbers cautiously.&quot;</p><h2>What it means for developers</h2><p>If a single-agent system matches the performance of multiple agents under equal reasoning budgets, it wins on total cost of ownership by offering fewer model calls, lower latency, and simpler debugging. Tran and Kiela warn that without this baseline, &quot;some enterprises may be paying a large &#x27;swarm tax&#x27; for architectures whose apparent advantage is really coming from spending more computation rather than reasoning more effectively.&quot;</p><p>Another way to look at the decision boundary is not how complex the overall task is, but rather where the exact bottleneck lies.</p><p>&quot;If it is mainly reasoning depth, SAS is often enough. If it is context fragmentation or degradation, MAS becomes more defensible,&quot; Tran said.</p><p>Engineering teams should stay with a single agent when a task can be handled within one coherent context window. Multi-agent systems become necessary when an application handles highly degraded contexts. </p><p>Looking ahead, multi-agent frameworks will not disappear, but their role will evolve as frontier models improve their internal reasoning capabilities.</p><p>&quot;The main takeaway from our paper is that multi-agent structure should be treated as a targeted engineering choice for specific bottlenecks, not as a default assumption that more agents automatically means better intelligence,&quot; Tran said.</p>]]></description>
            <author>bendee983@gmail.com (Ben Dickson)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/50hUpeJCVrEi4aq4xSiteQ/7567801aa13be503c7ff3264c7fa5f54/single_vs_multi-agent_systems.jpg?w=300&amp;q=30" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[OpenAI launches Privacy Filter, an open source, on-device data sanitization model that removes personal information from enterprise datasets]]></title>
            <link>https://venturebeat.com/data/openai-launches-privacy-filter-an-open-source-on-device-data-sanitization-model-that-removes-personal-information-from-enterprise-datasets</link>
            <guid isPermaLink="false">37uJPz4z7bnTXZaWO5SOP8</guid>
            <pubDate>Wed, 22 Apr 2026 18:01:00 GMT</pubDate>
            <description><![CDATA[<p>In a significant shift toward local-first privacy infrastructure, OpenAI has released <b>Privacy Filter</b>, a specialized open-source model designed to detect and redact personally identifiable information (PII) before it ever reaches a cloud-based server. </p><p>Launched today on AI code sharing community <a href="https://huggingface.co/openai/privacy-filter">Hugging Face</a> under a permissive <b>Apache 2.0 license</b>, the tool addresses a growing industry bottleneck: the risk of sensitive data &quot;leaking&quot; into training sets or being exposed during high-throughput inference.</p><p>By providing a 1.5-billion-parameter model that can run on a standard laptop or directly in a web browser, the company is effectively handing developers a &quot;privacy-by-design&quot; toolkit that functions as a sophisticated, context-aware digital shredder.</p><p>Though OpenAI was founded with a focus on open source models such as this, the company shifted during the ChatGPT era to providing more proprietary (&quot;closed source&quot;) models available only through its website, apps, and API — only to return to open source in a big way last year with the launch of the<a href="https://venturebeat.com/ai/openai-returns-to-open-source-roots-with-new-models-gpt-oss-120b-and-gpt-oss-20b"> gpt-oss family of language models</a>.</p><p>In that light, and combined with<a href="https://github.com/openai/symphony"> OpenAI&#x27;s recent open sourcing of agentic orchestration</a> tools and frameworks, it&#x27;s safe to say that the generative AI giant is clearly still heavily invested in fostering this less immediately lucrative part of the AI ecosystem. </p><h2><b>Technology: a gpt-oss variant with bidirectional token classifier that reads from both directions</b></h2><p>Architecturally, Privacy Filter is a derivative of OpenAI’s <b>gpt-oss</b> family, a series of open-weight reasoning models released earlier this year. </p><p>However, while standard large language models (LLMs) are typically autoregressive—predicting the next token in a sequence—Privacy Filter is a <b>bidirectional token classifier</b>.</p><p>This distinction is critical for accuracy. By looking at a sentence from both directions simultaneously, the model gains a deeper understanding of context that a forward-only model might miss. </p><p>For instance, it can better distinguish whether &quot;Alice&quot; refers to a private individual or a public literary character based on the words that follow the name, not just those that precede it.</p><p>The model utilizes a Sparse Mixture-of-Experts (MoE) framework. Although it contains 1.5 billion total parameters, only 50 million parameters are active during any single forward pass. </p><p>This sparse activation allows for high throughput without the massive computational overhead typically associated with LLMs. Furthermore, it features a massive <b>128,000-token context window</b>, enabling it to process entire legal documents or long email threads in a single pass without the need for fragmenting text—a process that often causes traditional PII filters to lose track of entities across page breaks.</p><p>To ensure the redacted output remains coherent, OpenAI implemented a constrained Viterbi decoder. Rather than making an independent decision for every single word, the decoder evaluates the entire sequence to enforce logical transitions. 
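</p><p>OpenAI has not published the decoder itself, and the snippet below is not from the released model. It is a toy Python illustration of transition-constrained Viterbi decoding with hand-written scores and a hard-coded rule set; the B/I/O/E/S tag names it uses are spelled out in the paragraph that follows.</p><pre><code># Illustrative only: Viterbi decoding over BIOES-style tags with hard
# transition constraints, so predicted spans come out internally consistent.
import math

TAGS = ["O", "B-NAME", "I-NAME", "E-NAME", "S-NAME"]

def allowed(prev: str, curr: str) -> bool:
    """Block impossible transitions, e.g. a span interior with no beginning."""
    if curr in ("I-NAME", "E-NAME"):
        return prev in ("B-NAME", "I-NAME")   # interiors and ends must follow B or I
    if prev in ("B-NAME", "I-NAME"):
        return curr in ("I-NAME", "E-NAME")   # an open span must continue or end
    return True                               # O, B and S may follow closed positions

def viterbi(scores: list) -> list:
    """scores[t][tag] is the model confidence (log-space) for tag at token t."""
    best = [{tag: (scores[0][tag], [tag]) for tag in TAGS}]
    for t in range(1, len(scores)):
        layer = {}
        for tag in TAGS:
            candidates = [
                (prev_score + scores[t][tag], path + [tag])
                for prev, (prev_score, path) in best[t - 1].items()
                if allowed(prev, tag)
            ]
            layer[tag] = max(candidates) if candidates else (-math.inf, [])
        best.append(layer)
    return max(best[-1].values())[1]

# "John Smith emailed us", with made-up per-token tag scores.
token_scores = [
    {"O": -3.0, "B-NAME": -0.2, "I-NAME": -4.0, "E-NAME": -4.0, "S-NAME": -1.0},
    {"O": -2.5, "B-NAME": -3.0, "I-NAME": -2.0, "E-NAME": -0.3, "S-NAME": -2.0},
    {"O": -0.1, "B-NAME": -5.0, "I-NAME": -5.0, "E-NAME": -5.0, "S-NAME": -5.0},
    {"O": -0.1, "B-NAME": -5.0, "I-NAME": -5.0, "E-NAME": -5.0, "S-NAME": -5.0},
]
print(viterbi(token_scores))  # ['B-NAME', 'E-NAME', 'O', 'O']
</code></pre><p>In the toy version, the hard transition rule is what keeps a span from starting in the middle of an entity or trailing off without an end.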
</p><p>The decoder uses a &quot;BIOES&quot; (Begin, Inside, Outside, End, Single) labeling scheme, which ensures that if the model identifies &quot;John&quot; as the start of a name, it is statistically inclined to label &quot;Smith&quot; as the continuation or end of that same name, rather than a separate entity.</p><h2><b>On-device data sanitization</b></h2><p>Privacy Filter is designed for high-throughput workflows where data residency is a non-negotiable requirement. It currently supports the detection of eight primary PII types, grouped into four categories:</p><ul><li><p><b>Private Names:</b> Individual persons.</p></li><li><p><b>Contact Info:</b> Physical addresses, email addresses, and phone numbers.</p></li><li><p><b>Digital Identifiers:</b> URLs, account numbers, and dates.</p></li><li><p><b>Secrets:</b> A specialized category for credentials, API keys, and passwords.</p></li></ul><p>In practice, this allows enterprises to deploy the model on-premises or within their own private clouds. By masking data locally before sending it to a more powerful reasoning model (like GPT-5 or gpt-oss-120b), companies can maintain compliance with strict GDPR or HIPAA standards while still leveraging the latest AI capabilities.</p><p>For developers, the model is available via Hugging Face, with native support for <code>transformers.js</code>, allowing it to run entirely within a user&#x27;s browser using WebGPU.</p><h2><b>Fully open source, commercially viable Apache 2.0 license</b></h2><p>Perhaps the most significant aspect of the announcement for the developer community is the <b>Apache 2.0 license</b>. Unlike &quot;available-weight&quot; licenses that often restrict commercial use or require &quot;copyleft&quot; sharing of derivative works, Apache 2.0 is one of the most permissive licenses in the software world. For startups and dev-tool makers, this means:</p><ol><li><p><b>Commercial Freedom:</b> Companies can integrate Privacy Filter into their proprietary products and sell them without paying royalties to OpenAI.</p></li><li><p><b>Customization:</b> Teams can fine-tune the model on their specific datasets (such as medical jargon or proprietary log formats) to improve accuracy for niche industries.</p></li><li><p><b>No Viral Obligations:</b> Unlike the GPL license, builders do not have to open-source their entire codebase if they use Privacy Filter as a component.</p></li></ol><p>By choosing this licensing path, OpenAI is positioning Privacy Filter as a standard utility for the AI era—essentially the &quot;SSL for text&quot;.</p><h2><b>Community reactions</b></h2><p>The tech community reacted quickly to the release, with many noting the impressive technical targets OpenAI managed to hit. </p><p>Elie Bakouch (<a href="https://x.com/eliebakouch/status/2046979020890198503">@eliebakouch</a>), a research engineer at agentic model training platform startup Prime Intellect, <a href="https://x.com/eliebakouch/status/2046979020890198503">praised the efficiency of Privacy Filter&#x27;s architecture on X:</a></p><blockquote><p>&quot;Very nice release by @OpenAI! A 50M active, 1.5B total gpt-oss arch MoE, to filter private information from trillion scale data cheaply. keeping 128k context with such a small model is quite impressive too&quot;.</p></blockquote><p>The sentiment reflects a broader industry trend toward &quot;small but mighty&quot; models. 
While the world has focused on massive, 100-trillion parameter giants, the practical reality of enterprise AI often requires small, fast models that can perform one task—like privacy filtering—exceptionally well and at a low cost.</p><p>However, OpenAI included a &quot;High-Risk Deployment Caution&quot; in its documentation. The company warned that the tool should be viewed as a &quot;redaction aid&quot; rather than a &quot;safety guarantee,&quot; noting that over-reliance on a single model could lead to &quot;missed spans&quot; in highly sensitive medical or legal workflows. </p><p>OpenAI’s Privacy Filter is clearly an effort by the company to make the AI pipeline fundamentally safer. </p><p>By combining the efficiency of a Mixture-of-Experts architecture with the openness of an Apache 2.0 license,  OpenAI is providing a way for many enterprises to more easily, cheaply and safely redact PII data.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/6XCBm5srH1Bxz7O3CpDLSp/a22758d387829873af7990a15e208306/ChatGPT_Image_Apr_22__2026__01_50_30_PM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Google doesn't pay the Nvidia tax. Its new TPUs explain why.]]></title>
            <link>https://venturebeat.com/orchestration/google-doesnt-pay-the-nvidia-tax-its-new-tpus-explain-why</link>
            <guid isPermaLink="false">oT6H02ArYKj4yYDQPhZGG</guid>
            <pubDate>Wed, 22 Apr 2026 17:04:17 GMT</pubDate>
            <description><![CDATA[<p>Every frontier AI lab right now is rationing two things: electricity and compute. Most of them buy their compute for model training from the same supplier, at the steep gross margins that have turned Nvidia into one of the most valuable companies in the world. Google does not.</p><p>On Tuesday night, inside a private gathering at F1 Plaza in Las Vegas, Google previewed its eighth-generation Tensor Processing Units. The pitch: two custom silicon designs shipping later this year, each purpose-built for a different half of the modern AI workload. TPU 8t targets training for frontier models, and TPU 8i targets the low-latency, memory-hungry world of agentic inference and real-time sampling.</p><p>Amin Vahdat, Google&#x27;s SVP and chief technologist for AI and infrastructure (pictured above left), used his time onstage to make a point that matters more to enterprise buyers than any individual spec: Google designs every layer of its AI stack end-to-end, and that vertical integration is starting to show up in cost-per-token economics that Google says its rivals cannot match.</p><h2>&quot;One chip a year wasn&#x27;t enough&quot;: Inside Google&#x27;s 2024 bet on a two-chip roadmap</h2><p>The more interesting story behind v8t and v8i is when the decision to split the roadmap was made. The call came in 2024, according to Vahdat — a year before the industry at large pivoted to reasoning models, agents and reinforcement learning as the dominant frontier workload.</p><p>At the time, it was a contrarian read. &quot;We realized two years ago that one chip a year wouldn&#x27;t be enough,&quot; Vahdat said during the fireside. &quot;This is our first shot at actually going with two super high-powered specialized chips.&quot;</p><p>For enterprise buyers, the implication is concrete. Customers running fine-tuning or large-scale training on Google Cloud and customers serving production agents on <a href="https://cloud.google.com/vertex-ai"><u>Vertex AI</u></a> have been renting the same accelerators and eating the inefficiency. V8 is the first generation where the silicon itself treats those as different problems with two sets of chips.</p><h2>TPU 8t: A training fabric that scales to a million chips</h2><p>On paper, TPU 8t is an aggressive generational step. According to Google, 8t delivers 2.8x the FP4 EFlops per pod (121 vs 42.5) against Ironwood, the seventh-generation TPU that shipped in 2025, doubles bidirectional scale-up bandwidth to 19.2 Tb/s per chip, and quadruples scale-out networking to 400 Gb/s per chip. Pod size grows modestly from 9,216 to 9,600 chips, held together by Google&#x27;s 3D Torus topology.</p><p>The number that matters most to IT leaders evaluating where to run frontier-scale training: 8t clusters (Superpods) can scale beyond 1 million TPU chips in a single training job via a new interconnect Google is calling Virgo networking. </p><p>8t also introduces TPU Direct Storage, which moves data from Google&#x27;s managed storage tier directly into HBM without the usual CPU-mediated hops. For long training runs where wall-clock time is the cost driver, collapsing that data path reduces the number of pod-hours needed to finish each epoch.</p><h2>TPU 8i and Boardfly: Re-engineering the network for agents</h2><p>If 8t is an evolutionary step, TPU 8i is the more architecturally interesting chip. 
It is also where the story for IT buyers gets most compelling.</p><p>The year-over-year spec jumps are, as Vahdat put it, “stunning.” According to Google, 8i delivers 9.8x the FP8 EFlops per pod (11.6 vs 1.2), 6.8x the HBM capacity per pod (331.8 TB vs 49.2), and a pod size that grows 4.5x from 256 to 1,152 chips.</p><p>What drove those numbers is a rethink of the network itself. Vahdat explained the insight directly: Google&#x27;s default way of connecting chips together supported bandwidth over latency — good for moving large amounts of data through, not built for the minimum time it takes a response to get back. That profile works for training. For agents, it does not. In partnership with Google DeepMind, the TPU team built what Google calls Boardfly topology specifically to reduce the network diameter — shrinking the number of hops between any two chips in a pod. Paired with a Collective Acceleration Engine and what Google describes as very large on-chip SRAM, 8i delivers a claimed 5x improvement in latency for real-time LLM sampling and reinforcement learning.</p><h2>The vertical-integration moat: Why Google doesn&#x27;t pay the &quot;Nvidia tax&quot;</h2><p>The subtext across Vahdat&#x27;s presentation was a six-layer diagram Google calls its AI stack: energy at the foundation, then data center land and enclosures, AI infrastructure hardware, AI infrastructure software, models (Gemini 3), and services on top. Vahdat noted that designing each layer in isolation forces you to the least common denominator for each layer. Google designs them together.</p><p>This is where the competitive story for IT buyers and analysts crystallizes. OpenAI, Anthropic, xAI and Meta all depend heavily on Nvidia silicon to train their frontier models. Every H200 and Blackwell GPU they buy carries Nvidia’s data-center gross margin — the informal &quot;Nvidia tax&quot; that industry analysts have flagged for two years running as a structural cost disadvantage for anyone renting rather than designing. Google pays fab, packaging and engineering costs on its TPUs. It does not pay that margin. </p><h2>What v8 means for the compute race: A new evaluation checklist for IT leaders</h2><p>For procurement and infrastructure teams, TPUv8 reframes the 2026–2027 cloud evaluation in concrete ways.</p><p>Teams training large proprietary models should look at 8t availability windows, Virgo networking access, and goodput SLAs — not just headline EFlops. Teams serving agents or reasoning workloads should evaluate 8i availability on Vertex AI, independent latency benchmarks as they emerge, and whether HBM-per-pod sizing fits their context windows. Teams consuming Gemini through Gemini Enterprise should inherit the 8i lift and should expect the ceiling on what they can deploy in production to rise meaningfully through 2026.</p><p>The caveats are real. General availability is still &quot;later in 2026.&quot; The v8 is a roadmap signal, not a procurement decision today. Google&#x27;s benchmarks are self-reported; undoubtedly independent numbers will come from early cloud customers and third-party evaluators over the next two quarters. And portability between JAX/XLA and the CUDA/PyTorch ecosystem remains a friction cost worth thinking about when negotiating any multi-year commitment.</p><p>Looking further out, Vahdat made two predictions worth noting. First, general-purpose CPUs will see a resurgence inside AI systems — not as accelerators, but as orchestration compute for agent sandboxes, virtual machines and tool execution. 
Second, framed explicitly as an industry prediction rather than a Google roadmap preview, specialization will keep gaining ground. As general-purpose CPU performance gains plateau at a few percent a year, workloads that matter will demand purpose-built silicon. &quot;Two chips might become more,&quot; Vahdat said — without specifying whether the &quot;more&quot; would mean future TPU variants or other classes of specialized accelerators.</p><p>The frontier compute race used to be a question of who could buy the most H100s. It is now a question of who controls the stack. The shortlist of companies that genuinely do is, for the moment, two: Google and Nvidia.</p>]]></description>
            <author>sam.witteveen@venturebeat.com (Sam Witteveen)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/7Lv60ZibdUZoJ7qlsTTtBS/28670e6d74be8e010be8906057eb9bae/PXL_20260422_025515735.jpg?w=300&amp;q=30" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Salesforce’s Agentforce Vibes 2.0 targets a hidden failure: context overload in AI agents]]></title>
            <link>https://venturebeat.com/orchestration/salesforces-agentforce-vibes-2-0-targets-a-hidden-failure-context-overload-in-ai-agents</link>
            <guid isPermaLink="false">2SKosQpj57xObnRTlhMng9</guid>
            <pubDate>Wed, 22 Apr 2026 15:54:00 GMT</pubDate>
            <description><![CDATA[<p>When startup fundraising platform VentureCrowd began deploying AI coding agents, they saw the same gains as other enterprises: they cut the front-end development cycle by 90% in some projects.</p><p>However, it didn’t come easily, or without a lot of trial and error. </p><p>VentureCrowd’s first challenge revolved around data and context quality, since Diego Mogollon, chief product officer at VentureCrowd, told VentureBeat that “agents reason against whatever data they can access at runtime” and would then be confidently “wrong” because they’re only basing their knowledge on the context given to them.</p><p>Their other roadblock, one shared by many other companies, was messy data and unclear processes. Similar to context, Mogollon said coding agents would amplify bad data, so the company had to build a well-structured codebase first. </p><p>“The challenges are rarely about the coding agents themselves; they are about everything around them,” said Mogollon. “It’s a context problem disguised as an AI problem, and it is the number one failure mode I see across agentic implementations.”</p><p>Mogollon said VentureCrowd encountered several roadblocks in overhauling its software development. </p><p>VentureCrowd&#x27;s experience illustrates a broader issue in AI agent development. The models are not failing the agents; rather, they become overwhelmed by too much context and too many tools at once. </p><h2><b>Too much context</b></h2><p>This comes from a phenomenon <a href="https://medium.com/@999daza/why-your-ai-agents-fail-the-context-overload-problem-and-the-slice-framework-that-fixes-it-873653ce0db6">called context bloat</a>: as workflows grow more complex, AI systems accumulate more and more data, tools and instructions. </p><p>The problem arises because agents need context to work better, but too much of it creates noise. And the more context an agent has to sift through, the more tokens it uses, the slower the work gets and the higher the costs climb. </p><p>One way to curb context bloat is through context engineering. <a href="https://venturebeat.com/ai/how-context-engineering-can-save-your-company-from-ai-vibe-code-overload">Context engineering</a> helps agents understand code changes or pull requests and align them with their tasks. </p><p>However, context engineering often becomes an external task rather than a capability built into the coding platforms enterprises use to build their agents. </p><h2><b>How coding agent providers respond</b></h2><p>VentureCrowd relied on one solution in particular to help it overcome the issues with context bloat plaguing its enterprise AI agent deployment: Salesforce’s Agentforce Vibes, a coding platform that lives within Salesforce and is <a href="https://www.salesforce.com/agentforce/pricing/?d=afx">available for all plans starting with the free one</a>. </p><p>Salesforce recently updated Agentforce Vibes to version 2.0, expanding support for third-party frameworks like ReAct. Most important for companies like VentureCrowd, Agentforce Vibes added Abilities and Skills, which they can use to direct agent behavior. </p><p>“For context, our entire platform, frontend and backend, runs on the Salesforce ecosystem. So when Agentforce Vibes launched, it slotted naturally into an environment we already knew well,” Mogollon said.</p><p>Salesforce’s approach doesn’t minimize the context agents use; rather, it helps enterprises ensure that context stays within their data models or codebases. 
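</p><p>The underlying discipline is easy to sketch, even though every vendor implements it differently. What follows is a generic, hypothetical context budgeter, not Salesforce’s mechanism: score each candidate piece of context against the task, then keep only what fits a token budget.</p><pre><code># Illustrative only: a generic context budgeter. Score each candidate chunk
# against the task, then keep the most relevant ones that fit the token budget.
from dataclasses import dataclass

@dataclass
class Chunk:
    source: str   # e.g. a CRM record, a doc section, a file
    text: str

def estimate_tokens(text: str) -> int:
    # Rough heuristic; a real system would use the model tokenizer.
    return max(1, len(text) // 4)

def relevance(task: str, chunk: Chunk) -> float:
    # Stand-in scorer based on word overlap; swap in embeddings in practice.
    task_words = set(task.lower().split())
    chunk_words = set(chunk.text.lower().split())
    return len(task_words.intersection(chunk_words)) / max(1, len(task_words))

def build_context(task: str, candidates: list, budget_tokens: int) -> list:
    ranked = sorted(candidates, key=lambda c: relevance(task, c), reverse=True)
    kept, spent = [], 0
    for chunk in ranked:
        cost = estimate_tokens(chunk.text)
        if spent + cost > budget_tokens:
            continue            # leave it out: the budget is the point
        kept.append(chunk)
        spent += cost
    return kept

candidates = [
    Chunk("crm_record", "Renewal date, plan tier and billing contact for the account"),
    Chunk("changelog", "Full commit history for the last six months"),
    Chunk("runbook", "Steps to update the billing contact on an account"),
]
task = "Update the billing contact on the Acme account"
for chunk in build_context(task, candidates, budget_tokens=30):
    print(chunk.source)   # keeps the runbook and CRM record, drops the changelog
</code></pre><p>Salesforce’s bet is that this kind of selection works best when it happens inside the platform that already holds the data. 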
Agentforce Vibes adds an additional layer of execution control through the new Skills and Abilities features. Abilities define what agents want to accomplish, and Skills are the tools they will use to get there.</p><p>Other coding agent platforms manage context differently. For example, Claude Code and OpenAI’s Codex focus on autonomous execution, continuously reading files, running commands and, as tasks evolve, expanding context. Claude Code has a <a href="https://docs.anthropic.com/en/docs/claude-code/sub-agents?utm_source=chatgpt.com">context indicator</a> that compacts context when it becomes too large.</p><p>Across these different approaches, the consistent pattern is that most systems manage growing context for agents rather than limiting it. Context keeps growing, especially as workflows become more complex, making it more difficult for enterprises to control costs, latency and reliability. </p><p>Mogollon said his company chose Agentforce Vibes not only because a large portion of their data already lives on Salesforce, making it easier to integrate, but also because it would allow them to control more of the context they feed their agents. </p><h2><b>What builders should know</b></h2><p>There’s no single way to address context bloat, but the pattern is now clear: more context doesn&#x27;t always mean better results.</p><p>Along with investing in context engineering, enterprises have to experiment with the context constraint approach they are most comfortable with. For enterprises, that means the challenge isn’t just giving agents more information—it’s deciding what to leave out.</p>]]></description>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/6VdAsRWYUYHxLrBm6wNunN/59f5d901453d586c1258054428db24c6/crimedy7_illustration_of_a_robot_that_is_so_overwhelmed_with__e33daa08-90c2-46bf-8da1-914393d67905_1.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Google’s Gemini can now run on a single air-gapped server — and vanish when you pull the plug]]></title>
            <link>https://venturebeat.com/technology/googles-gemini-can-now-run-on-a-single-air-gapped-server-and-vanish-when-you-pull-the-plug</link>
            <guid isPermaLink="false">30ELGiGLE8WLLrxSK7SE3L</guid>
            <pubDate>Wed, 22 Apr 2026 12:00:00 GMT</pubDate>
            <description><![CDATA[<p><a href="https://www.cirrascale.com/">Cirrascale Cloud Services</a> today announced it has expanded its partnership with <a href="https://cloud.google.com/?hl=en">Google Cloud </a>to deliver the <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/">Gemini model</a> <a href="https://cloud.google.com/blog/products/ai-machine-learning/run-gemini-and-ai-on-prem-with-google-distributed-cloud">on-premises</a> through <a href="https://cloud.google.com/distributed-cloud?hl=en">Google Distributed Cloud</a>, making it the first neocloud provider to offer Google&#x27;s most advanced AI model as a fully private, disconnected appliance. The announcement, timed to coincide with <a href="https://www.googlecloudevents.com/next-vegas">Google Cloud Next 2026</a> in Las Vegas, addresses a stubborn problem that has plagued regulated industries since the generative AI boom began: how to access frontier-class AI models without surrendering control of your data.</p><p>The offering packages Gemini into a <a href="https://www.cirrascale.com/">Dell-manufactured, Google-certified hardware appliance</a> equipped with eight Nvidia GPUs and wrapped in confidential computing protections. Enterprises and government agencies can deploy the system inside Cirrascale&#x27;s data centers or their own facilities, fully disconnected from the internet and from Google&#x27;s cloud infrastructure. The product enters preview immediately, with general availability expected in June or July.</p><p>In an exclusive interview with VentureBeat ahead of the announcement, Dave Driggers, CEO of Cirrascale Cloud Services, described the deployment as &quot;the next step of the partnership” and “being able to offer their most important model they have, which is Gemini.&quot; He was emphatic about what customers would be getting: &quot;It is full blown Gemini. It&#x27;s not pulled,” he told VentureBeat. “Nothing&#x27;s missing from it, and it&#x27;ll be available in a private scenario, so that we can guarantee them that their data is secure, their inputs are secure, their outputs are secure.&quot;</p><p>The move signals a deepening shift in the enterprise AI market, where the most capable models are migrating out of hyperscaler data centers and into customers&#x27; own racks — a reversal of the cloud computing orthodoxy that defined the past decade.</p><h2><b>The impossible tradeoff that kept banks and governments on the AI sidelines</b></h2><p>For years, organizations in financial services, healthcare, defense and government faced a binary choice: access the most powerful AI models through public cloud APIs, exposing sensitive data to third-party infrastructure, or settle for less capable open-source models they could host themselves. Cirrascale&#x27;s new offering attempts to eliminate that tradeoff entirely.</p><p>Driggers described how the trust problem escalated in stages. First, companies worried about handing their proprietary data to hyperscalers. Then came a deeper realization. &quot;They started realizing, holy crap, when my users type stuff in, they&#x27;re giving private information away — and the output is private too,&quot; Driggers told VentureBeat. &quot;And then the hyperscalers said, &#x27;Your prompts and the responses? That&#x27;s our stuff. 
We need that in order to answer your question.&#x27;&quot; That was the moment, he argued, when the demand for fully private AI became impossible to ignore.</p><p>Unlike <a href="https://cloud.google.com/distributed-cloud?hl=en">Google Distributed Cloud</a>, which Google already offers as its own on-premises cloud extension, the Cirrascale deployment places the actual model — weights and all — outside of Google&#x27;s infrastructure entirely. &quot;Google doesn&#x27;t own this hardware. We own the hardware, or the customer owns the hardware,&quot; Driggers said. &quot;It is completely outside of Google.&quot;</p><p>Driggers drew a sharp distinction between this offering and what competitors provide. When asked about Microsoft Azure&#x27;s <a href="https://azure.microsoft.com/en-us/products/deployment-environments">on-premises deployments with OpenAI models</a> and <a href="https://aws.amazon.com/outposts/">AWS Outposts</a>, he was blunt: &quot;Those are a lot different. This is the actual model being deployed on prem outside of their cloud. It&#x27;s not a cut down version. It&#x27;s the actual model.&quot; </p><h2><b>Pull the plug and the model vanishes: how confidential computing guards Google&#x27;s crown jewel</b></h2><p>The technical underpinnings of the deployment reveal how seriously both Google and Cirrascale are treating the security question. The Gemini model resides entirely in volatile memory — not on persistent storage. &quot;As soon as the power is off, the model is gone,&quot; Driggers explained. User sessions operate through caches that clear automatically when a session ends. &quot;A company&#x27;s user inputs, once that session&#x27;s over, they&#x27;re gone. They can be saved, but by default, they&#x27;re gone,&quot; he said.</p><p>Perhaps the most striking security feature is what happens when someone attempts to tamper with the appliance. Driggers described a mechanism that effectively renders the machine inoperable: &quot;You do anything that is against confidential compute, and it&#x27;s gone. Not only does the machine turn off, and therefore the model is gone, it actually puts in a marker that says, &#x27;You violated the confidential compute.&#x27; That machine has to come back to us, or back to Dell or back to Google.&quot; He characterized the appliance as something that &quot;does time bomb itself if something goes wrong.&quot;</p><p>This level of protection reflects Google&#x27;s own anxiety about releasing its flagship model&#x27;s weights into environments it doesn&#x27;t control. The appliance is effectively a vault: the model runs inside it, but nobody — not even the customer — can extract or inspect the weights. The confidential computing envelope ensures that even physical possession of the hardware doesn&#x27;t grant access to the model&#x27;s intellectual property.</p><p>When Google releases a new version of Gemini, the appliance needs to reconnect — but only briefly, and through a private channel. &quot;It does have to get connected back to Google to load the new model. But that can go via a private connection,&quot; Driggers said. 
For the most security-sensitive customers who can never allow their machine to connect to an outside network, Cirrascale offers a physical swap: &quot;The server will be unplugged, purged, all the data gone, guaranteed it&#x27;s gone, a new server will show up with a new version of the model.&quot;</p><h2><b>From Wall Street to drug labs, the rush for air-gapped AI is accelerating</b></h2><p>Driggers identified three primary drivers of demand: trust, security and guaranteed performance. Financial services institutions top the list. &quot;They&#x27;ve got regulatory issues where they can&#x27;t have something out of their control. They&#x27;ve got to be the one who determines where everything is. It&#x27;s got to be air gap,&quot; Driggers said. The minimum deployment footprint — a single eight-GPU server — makes the product accessible in a way that Google&#x27;s own private offerings do not. Running Gemini on Google&#x27;s TPU-based infrastructure, Driggers noted, requires a much larger commitment. &quot;If you want a private [instance] from Google, they require a much bigger bite, because to build something private for you, Google requires a gigantic footprint. Here we can do it down to a single machine.&quot;</p><p>Beyond finance, Driggers pointed to drug discovery, medical data, public-sector research, and any business handling personal information. He also flagged an increasingly critical use case: data sovereignty. &quot;How about your business that&#x27;s doing business outside of the United States, and now you&#x27;ve got data sovereignty laws in places where GCP is not? We can provide private Gemini in these smaller countries where the data can&#x27;t leave.&quot;</p><p>The public sector is another major target. Cirrascale launched a dedicated <a href="https://www.cirrascale.com/blogs/partnering-with-google">Government Services division</a> in March as part of its earlier partnership with Google Public Sector around the GPAR (Google Public Sector Program for Accelerated Research) initiative. That program provides higher education and research institutions with access to AI tools including <a href="https://deepmind.google/science/alphafold/">AlphaFold</a>, AI Co-Scientist, and Gemini Enterprise for Education. Today&#x27;s announcement extends that relationship from the research tooling layer to the model itself.</p><p>The performance guarantee is the third pillar. Driggers noted that frontier models accessed through public APIs deliver inconsistent response times — a problem for mission-critical business applications. The private deployment eliminates that variability. Cirrascale layers management software on top of the Gemini appliance, allowing administrators to prioritize users, allocate tokens by role, adjust context window sizes, and load-balance across multiple appliances and regions. 
&quot;Your primary data scientists or your programmers may need to have really large context windows and get priority, especially maybe nine to five,&quot; Driggers explained, &quot;but yet, the rest of the time, they want to share the Gemini experience over a wider group of people.&quot; He also noted that agentic AI workloads, which can run around the clock, benefit from the ability to consume unused capacity during off-peak hours — a scheduling flexibility that public cloud deployments don&#x27;t easily support.</p><h2><b>Seat licenses, token billing and all-you-can-eat pricing: a model built for enterprise flexibility</b></h2><p>The pricing model reflects Cirrascale&#x27;s broader philosophy of meeting customers where they are. Driggers described several consumption options: seat-based licensing (with both enterprise and standard tiers), per-token billing, and flat &quot;all-you-can-eat&quot; pricing per appliance. The minimum commitment is a single dedicated server — the appliances are not shared between customers in any configuration. &quot;We&#x27;ll meet the customer, what they&#x27;re used to,&quot; Driggers said. &quot;If they&#x27;re currently taking a seat license, we&#x27;ll create a seat license for them.&quot;</p><p>Customers can also choose to purchase the hardware outright while still consuming Gemini as a managed service, an arrangement Cirrascale has offered since its earliest days in the AI wave. Driggers said OpenAI has been a customer since 2016 or 2017, and in that engagement, OpenAI purchased its own GPUs while Cirrascale &quot;took those GPUs, incorporated them into our servers and storage and networking, and then presented it back as a cloud service to them so they didn&#x27;t have to manage anything.&quot;</p><p>That flexible ownership model is particularly relevant for universities and government-funded research institutions, where mandates often require a specific mix of capital expenditure, operating expenditure, and personnel investment. &quot;A lot of government funding requires a mixture of CapEx, OPEX and employment development,&quot; Driggers said. &quot;So we allow that as well.&quot;</p><h2><b>Inside the neocloud that built the world&#x27;s first eight-GPU server — and just landed Google&#x27;s biggest AI model</b></h2><p>Cirrascale&#x27;s announcement arrives during a period of explosive growth for the <a href="https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/the-evolution-of-neoclouds-and-their-next-moves">neocloud sector </a>— the tier of specialized AI cloud providers that sit between the hyperscalers and traditional hosting companies. The neocloud market is projected to be <a href="https://www.mordorintelligence.com/industry-reports/neocloud-market">worth $35.22 billion</a> in 2026 and is growing at a compound annual growth rate of 46.37%, according to Mordor Intelligence. Leading neocloud providers include <a href="https://www.coreweave.com/">CoreWeave</a>, <a href="https://www.crusoe.ai/cloud">Crusoe Cloud</a>, <a href="https://lambda.ai/">Lambda</a>, <a href="https://nebius.com/">Nebius</a> and <a href="https://www.vultr.com/">Vultr</a>, and these companies specialize in GPU-as-a-Service for AI and high-performance computing workloads.</p><p>But Cirrascale occupies a different niche within this booming category. 
While companies like CoreWeave have focused primarily on providing raw GPU compute at scale — CoreWeave boasts a $55.6 billion backlog — Cirrascale has positioned itself around private AI, managed services and longer-term engagements rather than on-demand elastic compute. Driggers described the company as &quot;not an on-demand place&quot; but rather a provider focused on &quot;longer-term workloads where we&#x27;re really competing against somebody doing it back on prem.&quot;</p><p>The company&#x27;s history supports that claim. Cirrascale traces its roots to a hardware company that &quot;designed the world&#x27;s first eight GPU server in 2012 before anybody thought you&#x27;d ever need eight GPUs in a box,&quot; as Driggers put it. It pivoted to pure cloud services roughly eight years ago and has since built a client roster that includes the <a href="https://allenai.org/">Allen Institute for AI</a>, which in August 2025 tapped Cirrascale as the managed services provider for a $152 million open AI initiative funded by the National Science Foundation and Nvidia. Earlier this month, Cirrascale announced a three-way alliance with Rafay Systems and Cisco to deliver end-to-end enterprise AI solutions combining Cirrascale&#x27;s inference platform, Rafay&#x27;s GPU orchestration, and Cisco&#x27;s networking and compute hardware.</p><h2><b>The private AI era is arriving faster than anyone expected</b></h2><p>The Gemini partnership is the highest-profile move yet — and it taps into a broader industry current. The push to move frontier AI out of the public cloud and into private infrastructure is no longer a niche demand. Industry analysts predict that by 2027, 40% of AI model training and inference will occur outside public cloud environments. That projection helps explain why Google is willing to let its crown-jewel model run on hardware it doesn&#x27;t own, in data centers it doesn&#x27;t operate, managed by a company in San Diego. The alternative — watching regulated enterprises default to open-source models or to Microsoft&#x27;s Azure OpenAI Service — is apparently a worse outcome.</p><p>The announcement also carries major implications for Google&#x27;s competitive positioning. Microsoft has built its enterprise AI strategy around the <a href="https://azure.microsoft.com/">Azure OpenAI Service</a> and its deep partnership with OpenAI, while AWS has invested in <a href="https://aws.amazon.com/bedrock/">Amazon Bedrock</a> and its own on-premises solutions through Outposts. Google Cloud Platform still trails both rivals in market share, though Q4 cloud revenue rose 48% year-over-year. Enabling Gemini to run on third-party infrastructure via partners like Cirrascale broadens its distribution surface in exactly the segments — government, finance, healthcare — where Microsoft and Amazon have historically held advantages. For Cirrascale, the partnership represents a chance to differentiate sharply in a market where most neoclouds are competing on GPU availability and price.</p><p>Driggers expects rapid uptake in the second half of 2026. &quot;It&#x27;s going to be crazy towards the end of this year,&quot; he said. &quot;Major banks will finally do stuff like this, because they can secure it. They can do it globally. Big research institutions who have labs all over the world will do these types of things.&quot; He predicted other frontier model providers will follow with similar offerings soon, and he doesn&#x27;t see Gemini as the end of the story. 
&quot;We really think that the enterprise have been waiting for private AI, not just Gemini, but all sorts of private AI,&quot; Driggers said.</p><p>That may be the most telling line of all. For three years, the AI revolution has been defined by a simple bargain: send your data to the cloud and get intelligence back. Cirrascale&#x27;s bet — and increasingly, Google&#x27;s — is that the biggest customers in the world are done accepting those terms. The most powerful AI on the planet is now available on a single locked box that can sit in a bank vault, a university basement, or a government facility in a country where Google has no data center. The cloud, it turns out, is finally ready to come back down to earth.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Infrastructure</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/3DxCqVFkKH7Ri00GX6fSRv/fcd7955f0a8df921b46d1d2b7ca09b3a/nuneybits_Vector_art_of_an_unplugged_enterprise_server_glowing__a74cc515-0dde-4112-8f6a-38e454be24ef.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
    </channel>
</rss>