<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>VentureBeat</title>
        <link>https://venturebeat.com/feed/</link>
        <description>Transformative tech coverage that matters</description>
        <lastBuildDate>Wed, 10 Jun 2026 17:47:10 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright 2026, VentureBeat</copyright>
        <item>
            <title><![CDATA[MassMutual's AI strategy: 12-month contracts, 30% productivity gains, zero lock-in]]></title>
            <link>https://venturebeat.com/orchestration/massmutuals-ai-strategy-12-month-contracts-30-productivity-gains-zero-lock-in</link>
            <guid isPermaLink="false">6fFu5zdKcVe5mafzLpphJh</guid>
            <pubDate>Wed, 10 Jun 2026 17:31:51 GMT</pubDate>
            <description><![CDATA[<p>Enterprise AI teams face a dilemma: The best models today might not be the best models a year from now. MassMutual&#x27;s answer is to stop making long-term bets — and build infrastructure that can swap models as the market shifts.</p><p>“The world of AI today is extremely dynamic,” Sears Merritt, MassMutual CIO, <a href="https://www.youtube.com/watch?v=7IcU0QBrYng">explained in a new VB Beyond the Pilot podcast</a>. “We wanted to make sure we were positioned to ride that wave of dynamism.”</p><p>The strategy appears to be paying off in a big way. MassMutual has measured a roughly 30% increase in developer productivity, while AI-powered contact center workflows have reduced resolution times from 10 minutes to one and cut costs from dollars to cents. </p><p>But the broader lesson for IT leaders may be less about the results and more about how the company is thoughtfully building its AI infrastructure and keeping users at the center. </p><div></div><h2><b>Maintaining optionality for the possibilities of tomorrow</b></h2><p>MassMutual works with vendors at the leading edge, but keeps those relationships on a clock. “Those relationships are capped so that we maintain optionality for best-of-breed tools as things mature in this space, and at some point, settle down and stabilize,” Merritt said. </p><p>That philosophy extends to open-source models. Merritt says his team is “100%” looking at open-source tools, and sees the technology playing a big role in how MassMutual (and similar companies) use AI. </p><p>“We&#x27;re certainly going to need frontier models and leading edge capabilities to do what today is impossible, and tomorrow will be possible,” he said. </p><h2><b>Measuring outcomes from the start</b></h2><p>MassMutual&#x27;s AI efforts fall into two broad categories.</p><p>The first focuses on enablement: Putting productivity-enhancing tools such as Copilot and virtual assistants into the hands of all employees. 
The second involves what Merritt describes as “deepen and focus” initiatives, where teams target a specific workflow or business process that will have a strong impact on advisors, policyholders, or employees.</p><p>Rather than focusing on adoption metrics, these projects begin with predefined success criteria. “Everything we do is measured,” Merritt said. “There&#x27;s always a success metric that we define upfront to determine whether or not we&#x27;re going to scale up some of these things.”</p><p>The company is also deliberately encouraging experimentation, giving employees access to a range of best-in-class models, “token-consumptive workflows” and other possible capabilities so they can weigh the benefits relative to “simpler, lower cost” large language models (LLMs). </p><p>At the same time, MassMutual is collecting increasingly detailed analytics around usage patterns, developer workflows, model performance, and costs. The goal is to reduce spending while also building operational intelligence to eventually route workloads to the right model based on cost, response quality, and user experience.</p><p>Those insights will eventually drive optimization decisions around model routing, prompt selection, response times, and infrastructure design.</p><p>“We&#x27;re gaining access to analytics that let us, in a very granular way, look at usage patterns, developer workflows, and begin to make sense of who&#x27;s using what, when, and for what types of tasks,” Merritt said.</p><h2><b>Why MassMutual sometimes chooses the more expensive model</b></h2><p>Another interesting aspect of MassMutual&#x27;s approach is how it evaluates AI quality. Rather than focusing exclusively on benchmarks or token costs, the company uses what Merritt calls a “trust score” framework.</p><p>The process combines user feedback with operational metrics to understand how employees perceive AI-generated responses and whether those responses actually improve outcomes. 
</p><p>The contact center rebuild put that framework to the test. During development, employees were given access to two different LLMs. One generated responses in near real time, but with noisier quality. The other, more expensive option took several additional seconds to respond but consistently delivered higher-quality answers.</p><p>Conventional wisdom and the speed of business might suggest users would prefer the former, but they overwhelmingly chose quality. Merritt’s team asked users about the quality of response, their preferred model, and their overall thoughts on the experience. </p><p>Most of the time, users said: “We want the more expensive one. We&#x27;re willing to wait, but the quality difference is so high that the two extra seconds actually is worth it to us.” </p><p>That feedback ultimately determined which model MassMutual deployed.</p><p>“We factored that experience piece into the decision-making, and that led us to say, on a relative basis, the costs were immaterial, so we&#x27;re going to use the more complex model,” Merritt said. </p><p>Listen to the full podcast to hear more about: </p><ul><li><p>Why Mythos “completely changed” the cybersecurity landscape: not the type of threats, but the rate at which those threats appear; </p></li><li><p>How a team of AI engineers modernized MassMutual’s mainframe in 7 days (a process that previously would have taken 3 months); </p></li><li><p>Why MassMutual specifically avoided tokenmaxxing to rein in AI use and spending, and has been going “unlimited” to shield itself from cost blowups; </p></li><li><p>How a “multi-harness type of environment” will support agentic AI. 
</p></li></ul><p><b>You can also listen and subscribe to </b><a href="https://beyondthepilot.ubpages.com/"><b>Beyond the Pilot</b></a><b> on </b><a href="https://open.spotify.com/show/4Zti73yb4hmiTNa7pEYls4"><b>Spotify</b></a><b>, </b><a href="https://podcasts.apple.com/us/podcast/beyond-the-pilot-enterprise-ai-in-action/id1839285239"><b>Apple</b></a><b> or wherever you get your podcasts.</b></p>]]></description>
            <author>taryn.plumb@venturebeat.com (Taryn Plumb)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/6EsRZ1hgZ3BImNLPflymgq/cc85e35c325fbcd5962ee0773f53c6d4/u7277289442_Vector_art_of_an_AI_agent_sitting_at_a_desk._The__9330606f-5ac0-4911-a5f1-680f3f3f2d11_0.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Apple’s new Siri AI is more than just a smarter assistant — it's a new enterprise app layer]]></title>
            <link>https://venturebeat.com/technology/apples-new-siri-ai-is-more-than-just-a-smarter-assistant-its-a-new-enterprise-app-layer</link>
            <guid isPermaLink="false">6hWJWZJDatea5Ms8dcB5Uf</guid>
            <pubDate>Tue, 09 Jun 2026 21:49:00 GMT</pubDate>
            <description><![CDATA[<p>Apple’s new Siri AI, <a href="https://www.apple.com/newsroom/2026/06/apple-introduces-siri-ai-a-profoundly-more-capable-and-personal-assistant/">unveiled yesterday</a> at Apple&#x27;s annual Worldwide Developers Conference (WWDC 2026), may look like a consumer product story on the surface. </p><p>But for enterprise developers and IT leaders, the bigger news from WWDC26 is that Apple is turning Siri into a systemwide AI interface for apps, data and workplace actions across iPhone, iPad, Mac, Apple Watch and Vision Pro, as revealed in the <a href="https://developer.apple.com/wwdc26/guides/apple-intelligence/">WWDC26 Apple Intelligence developer guide</a>.</p><p>In other words, if your company offers an application on Apple devices, whether it&#x27;s served on iOS mobile device or Mac, the new Siri AI may force you to change how that application is discovered, served, and its contents and workflows made available to end users. </p><p>Enterprise developers can expose app content through App Entities, make it available to Apple’s Spotlight semantic index, define actions through App Intents and App Schemas, and map onscreen user interface elements to app objects through View Annotations.</p><p>That makes Siri AI much more than a voice assistant. Apple is positioning it as an AI-powered app action and content-discovery layer built into its operating systems.</p><h2><b>Siri becomes an app action layer</b></h2><p>For enterprise developers, the shift could be significant. </p><p>A business app that properly adopts Apple’s new frameworks could let users ask Siri to find, summarize, update or act on app content without the developer having to build a separate chatbot interface. 
</p><p>Apple says <a href="https://developer.apple.com/documentation/appintents">App Intents</a>, its existing framework for exposing app actions to system features like Siri and Shortcuts, is the path for connecting apps to Apple Intelligence and Siri AI, while schemas make app content and actions usable through natural language.</p><p>In practical terms, that could apply to customer records in a CRM, open tickets in an IT service desk, project tasks, invoices, calendar events, documents, expenses, notes, messages or field-service records. </p><p>Instead of opening an app, searching manually and clicking through menus, an employee could ask Siri to act on the specific object they are viewing or retrieve a related item from another app.</p><h2><b>Spotlight becomes the enterprise search hook</b></h2><p>Apple says in its <a href="https://developer.apple.com/wwdc26/guides/apple-intelligence/">WWDC26 Apple Intelligence guide</a> that entity schemas contribute app content to the Spotlight semantic index, while intent schemas let users take action on that indexed content without developers defining a rigid list of command phrases. </p><p>Apple also says the new View Annotations API lets developers map views to entities so users can refer to what is onscreen conversationally — for example, “summarize this customer thread,” “add this invoice to my expenses,” or “follow up on this task tomorrow.”</p><p>That is an important distinction from earlier voice-assistant integrations, which often required narrow command structures and explicit invocation phrases. 
</p><p>Apple is instead giving developers a way to describe an app’s data and capabilities so Siri, Spotlight and Shortcuts can use them through the system.</p><h2><b>Developers get testing tools for Siri and app actions</b></h2><p>Apple is also adding <a href="https://developer.apple.com/videos/play/wwdc2026/295/">AppIntentsTesting</a>, a framework that validates App Intents through the same infrastructure used by Siri, Shortcuts and Spotlight without requiring UI automation. </p><p>That matters for enterprise software teams because natural-language app actions need to be testable, repeatable and reliable before they are trusted in production workflows. </p><p>It also gives developers a path to include Siri and Spotlight behavior in ordinary testing pipelines instead of treating assistant integration as a manual demo feature.</p><p>The result is a clearer developer mandate: if an app wants to show up well inside Siri AI, it will likely need to expose its data, actions and onscreen context through Apple’s system frameworks. </p><p>For enterprise SaaS vendors, that could become an important part of Apple-platform competitiveness, especially in categories such as productivity, collaboration, CRM, project management, finance, design, knowledge management, healthcare, logistics and field operations.</p><h2><b>Apple expands its model stack for developers</b></h2><p>Apple is also using WWDC26 to expand its AI developer stack beyond Siri. </p><p>The updated <a href="https://developer.apple.com/documentation/foundationmodels">Foundation Models framework</a> gives Swift developers access to <a href="https://venturebeat.com/technology/on-device-ai-agents-hit-a-hard-memory-limit-apples-new-architecture-routes-around-it">Apple’s on-device models</a>, Apple models running through Private Cloud Compute and third-party model providers that conform to Apple’s Language Model protocol. That gives developers more flexibility than a single Apple-only model path. 
</p><p>Apple says in its <a href="https://developer.apple.com/wwdc26/guides/apple-intelligence/">Apple Intelligence developer guide</a> that the framework now supports multimodal prompts, Vision tools, dynamic model profiles and evaluations. </p><p>In theory, an enterprise app could use an Apple on-device model for private or lightweight tasks, call Apple’s <a href="https://developer.apple.com/videos/play/wwdc2026/319/">Private Cloud Compute</a> for heavier reasoning, or plug in an outside provider such as Claude, Gemini, an open-source model or a company-controlled model through Apple’s model-provider interface.</p><h2><b>Core AI brings custom models onto Apple silicon</b></h2><p>Apple is also introducing <a href="https://developer.apple.com/wwdc26/guides/ipados/">Core AI</a>, an operating system-level framework for running developers’ own models on Apple silicon. </p><p>For enterprises that do not want sensitive data sent to a cloud model at all, local inference remains one of Apple’s most important advantages. </p><p>Core AI gives developers a first-party way to deploy custom models with Swift APIs, memory controls and optimized execution on Apple hardware.</p><h2><b>Evaluations signal a more mature enterprise AI posture</b></h2><p>The company’s new <a href="https://developer.apple.com/videos/play/wwdc2026/298/">Evaluations framework</a> also points at a more mature enterprise AI posture. AI features are difficult to test with conventional unit tests because model outputs can vary. Apple says the framework helps developers define metrics, automatically grade outputs and aggregate statistics. </p><p>For enterprise buyers, that matters because AI features need measurable reliability, not just impressive demos.</p><p>Apple is also explicitly addressing the security risks of app agents. 
WWDC26 developer materials include a session on how developers can <a href="https://developer.apple.com/videos/play/wwdc2026/347/">mitigate risks to agentic features</a>, covering indirect prompt injection, data exfiltration, unintended actions, threat modeling, user confirmations, authentication and safeguards for App Intents and Foundation Models. </p><p>That is a notable acknowledgement that AI assistants able to read context and take action across apps create new attack surfaces.</p><h2><b>Enterprise IT gets new Apple Intelligence controls</b></h2><p>For enterprise IT, Apple also answered some of the governance questions raised by Siri AI’s initial announcement.</p><p>Its <a href="https://support.apple.com/guide/deployment/device-management-updates-depd638aa061/1/web/1.0">WWDC26 device management documentation</a> describes new management controls for Apple Intelligence, Siri and external intelligence integrations. </p><p>Supervised devices can use Apple’s intelligence settings configuration to allow or deny features such as Genmoji, Image Playground, Writing Tools, Image Wand, app-specific intelligence in Mail, Notes and Safari, Apple Intelligence Report, Visual Intelligence Summary and on-device-only processing for dictation and translation.</p><p>Apple says additional management for Siri AI and Visual Intelligence will arrive in later beta releases. That means enterprise controls are not complete yet, but Apple is clearly building Siri AI into its managed-device architecture rather than treating it as an unmanaged consumer feature.</p><h2><b>Apple also adds controls for outside AI services</b></h2><p>Apple is also adding controls for external intelligence services. 
Its <a href="https://support.apple.com/guide/deployment/device-management-updates-depd638aa061/1/web/1.0">deployment docs</a> describe a configuration for managing external intelligence integrations, including whether users can access outside AI services and whether they can sign in to those services. That will matter for organizations trying to control when employees use Apple’s own models, Apple’s private cloud architecture or third-party AI systems.</p><p>Those controls could help Apple compete with Microsoft and Google in enterprise AI, but with a different pitch. Microsoft Copilot and Google Gemini are tied deeply to their respective productivity clouds. </p><p>Apple’s strategy is more device- and OS-centered: make AI available where the user already works, expose app actions through system frameworks and emphasize on-device processing and Private Cloud Compute as privacy advantages.</p><h2><b>Apple’s privacy pitch remains central</b></h2><p>Apple’s privacy architecture remains central to that pitch. Siri AI uses Apple Foundation Models on device and through Private Cloud Compute. </p><p>Apple says in its <a href="https://www.apple.com/newsroom/2026/06/apple-introduces-siri-ai-a-profoundly-more-capable-and-personal-assistant/">Siri AI announcement</a> that requests handled by Private Cloud Compute do not store personal data or make it accessible to Apple. For industries such as healthcare, financial services, legal, education and government, that claim may be more important than any single assistant feature.</p><p>But enterprises will still need more detail before treating Siri AI as a fully governed workplace assistant. Apple’s WWDC26 materials show progress on management controls, external AI restrictions and app-level governance, but the full picture is still emerging. 
</p><p>Key questions remain around auditability, retention, work-versus-personal data boundaries, role-based access, compliance certifications, and how much control IT departments will have over Siri’s ability to act inside specific business apps.</p><h2><b>Availability limits could complicate rollout</b></h2><p>Availability also complicates enterprise rollout. Siri AI is in developer testing now for iOS 27, iPadOS 27, macOS 27 and visionOS 27, with watchOS support coming in a later beta. Apple says the user-facing beta arrives later this year. The feature requires Apple Intelligence-capable hardware, which means many older corporate devices will not support it. Apple also says Siri AI will not initially be available on iPhone and iPad in the European Union, and that Siri AI and other new Apple Intelligence features are not available in China while the company works through regulatory requirements.</p><p>That means global enterprises may face fragmented deployment, with different feature availability by hardware, operating system, language and region.</p><h2><b>App Store changes give business software vendors another opening</b></h2><p>Apple also introduced enterprise-adjacent <a href="https://www.apple.com/newsroom/2026/06/apple-expands-app-store-capabilities-to-help-developers-grow-and-reach-new-users/">App Store changes</a> that could matter for business software vendors. StoreKit 2 will support subscriptions for groups and organizations, including volume purchasing through Apple Business and Apple School Manager. </p><p>IT teams will be able to buy and assign App Store subscriptions through device management workflows, while developers will be able to manage subscription availability for organizations. 
That gives Apple a more business-friendly path for selling app subscriptions into managed environments.</p><p>The company is also unifying Apple Business Manager, Apple Business Essentials and Apple Business Connect under <a href="https://support.apple.com/guide/deployment/apple-services-updates-dep5a7629d2f/web">Apple Business</a>, which Apple describes as a broader platform for Managed Apple Accounts, device management, volume licensing, Admin APIs, Apple Maps locations, Tap to Pay on iPhone, Branded Mail and multi-seat subscriptions.</p><h2><b>Apple’s enterprise AI strategy comes into focus</b></h2><p>Taken together, the WWDC26 enterprise story is bigger than Siri alone. Apple is building an AI stack that spans user-facing assistant features, developer integration frameworks, local and private-cloud model infrastructure, AI testing, App Store business subscriptions and device-management controls.</p><p>The strategic question is whether Apple can make this more than another Siri reset. Developers will need to adopt Apple’s app-intelligence frameworks. Enterprises will need stronger governance assurances. Users will need the assistant to work reliably across real workflows, not just Apple’s own apps.</p><p>But the direction is now much clearer. Apple is not trying to compete in enterprise AI by launching a standalone chatbot. It is embedding AI into the operating system, making apps addressable through Siri and Spotlight, giving developers model and testing tools, and giving IT teams at least the beginnings of policy controls.</p><p>For enterprise developers, that means App Intents, App Schemas, App Entities, Spotlight indexing and View Annotations may become core parts of building competitive Apple-platform apps. 
For enterprise technology leaders, it means Apple’s devices could soon include a native AI assistant that can act across business workflows — if Apple can prove that the privacy, security and management model is strong enough for production use.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/6uthaZc1i2Q9MgfkivyFpf/7625484349f502786a87695e198930dc/ChatGPT_Image_Jun_9__2026__05_32_55_PM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Cohere open-sources a coding agent that runs on a single H100]]></title>
            <link>https://venturebeat.com/technology/cohere-open-sources-a-coding-agent-that-runs-on-a-single-h100</link>
            <guid isPermaLink="false">54dMMkEGVnAKLMkXjIhn3J</guid>
            <pubDate>Tue, 09 Jun 2026 21:41:04 GMT</pubDate>
            <description><![CDATA[<p>Engineering teams building agentic coding pipelines now have a concrete open-source alternative to managed models like <a href="https://venturebeat.com/technology/anthropic-brings-mythos-to-the-masses-with-claude-fable-5-its-most-powerful-generally-available-model-ever">Claude Fable 5</a>, one that runs on a single H100. The tradeoff: Cohere&#x27;s North Mini Code, which launched Tuesday, generated three times the output tokens of comparable models in independent testing, a verbosity cost that compounds in high-volume production workloads.</p><p>The new open-source model is a 30-billion-parameter mixture-of-experts (MoE) model with 3 billion parameters active per token, built for agentic software engineering including sub-agent orchestration, architecture mapping, code review and terminal work. The model supports a 256,000-token context window with a 64,000-token maximum generation length, and is available on <a href="https://huggingface.co/CohereLabs/North-Mini-Code-1.0">Hugging Face</a> under an Apache 2.0 license.</p><h2>What North Mini Code can do</h2><p>North Mini Code targets the full agentic coding stack. Here is what the model does and what it runs on.</p><p><b>Software engineering.</b> Cohere built North Mini Code specifically for agentic software engineering rather than adapting it from a general-purpose base. It has integrated tool-use capabilities and supports interleaved thinking, which Cohere says improves performance across multi-step agentic work.</p><p><b>Architecture mapping and code review.</b> North Mini Code can analyze and map systems architecture, surface dependencies and perform code review across large codebases. With a 256,000-token context window, it can hold substantial multi-file projects in a single context pass.</p><p><b>Terminal-based agentic tasks.</b> The model is trained for terminal environments, handling shell interactions, package scripts and command-line tooling. 
Cohere benchmarked it on Terminal-Bench v2, which tests agents in real terminal environments rather than synthetic code generation tasks.</p><h2>How it was built</h2><p>North Mini Code is a sparse mixture-of-experts model with 128 experts, of which 8 activate per token. The compute requirement at inference time is closer to that of a 3-billion-parameter model despite 30 billion total parameters. Nick Frosst, co-founder of Cohere, <a href="https://x.com/cohere/status/2064378058329526556">demoed it running on a Mac Studio</a> via MLX, using around 20 gigabytes of RAM, the same machine he uses for his own local coding work.</p><p>Cohere trained the model through two stages of supervised fine-tuning followed by reinforcement learning with verifiable rewards across more than 70,000 verifiable tasks spanning approximately 5,000 repositories, deduplicated against SWE-Bench. </p><p>Rather than optimizing against a single agent scaffold, Cohere trained across three. SWE-Agent uses a rich CLI with specialized commands. Mini-SWE-Agent uses a single bash tool with raw shell output. OpenCode uses individually typed tools returning structured JSON. Cohere reports a 10-percentage-point gain on OpenCode evaluation from the multi-harness approach while maintaining SWE-Agent performance.</p><h2>Where it fits</h2><p>North Mini Code enters a market that now includes Mistral Devstral Small 2, GitHub Copilot, Cursor, and Claude Fable 5, each with distinct cost and deployment tradeoffs.</p><p>Cohere&#x27;s primary benchmark comparison is against <a href="https://venturebeat.com/ai/mistral-launches-powerful-devstral-2-coding-model-including-open-source">Mistral Devstral Small 2</a>, a 24-billion-parameter dense model. In vendor-reported internal tests under identical hardware configurations, Cohere claims 2.8x higher output throughput and a 30% inter-token latency advantage over Devstral Small 2. 
Cohere also claims, in its<a href="https://huggingface.co/blog/CohereLabs/introducing-north-mini-code"> Hugging Face technical post</a>, that North Mini Code outperforms open-source models up to four times its parameter count on its reported benchmarks, including models at 120 billion parameters. </p><p><a href="https://artificialanalysis.ai/models/north-mini-code">Artificial Analysis</a> independently ranks it eighth of 127 comparable open-weight models on output speed at 210 tokens per second, with a time to first token of 0.25 second against a class median of 1.95 seconds. It places 18th of 127 on the Artificial Analysis Intelligence Index. One flag from the same data: the model generated 75 million output tokens to complete the Intelligence Index against a class median of 25 million. In high-volume agentic pipelines, that verbosity compounds into inference cost and latency.</p><p>&quot;Suddenly people are thinking like hey, am I getting enough economic value out of the tokens from a model?&quot; Frosst said during the launch video. &quot;Local deployment is one way of empowering people and making AI really something that works for them.&quot;</p><p>GitHub Copilot, Cursor and Claude Code operate on per-usage or subscription pricing with no on-premises option. Anthropic&#x27;s Claude Fable 5, now the most capable publicly available managed coding model, runs at $50 per million output tokens. For Frosst, the model is the polar opposite of Fable.</p><p>&quot;Its small, cost effective, apache 2.0, and locally deployable. This is the way LLMs should go. 
small, open source, transparent and sovereign, vs large, expensive, proprietary and hegemonic,&quot; Frosst wrote in a<a href="https://x.com/nickfrosst/status/2064396337404096809?s=20"> post on X</a>.</p><div></div><h2>What this means for enterprises</h2><p>For teams building production agentic coding pipelines, North Mini Code&#x27;s release clarifies a set of decisions that have been forming for months.</p><p><b>Purpose-built agentic training is now a baseline to evaluate against.</b> The distinction between models fine-tuned for code and models trained specifically for agentic workflows, with verified tool calls and multi-harness robustness, is now a material factor in pipeline decisions. Any model vendor claiming agentic coding capability should be able to answer whether its training used verifiable agentic tasks or was adapted from a general-purpose base.</p><p><b>Verbosity is a hidden pipeline cost that benchmarks do not surface.</b> Artificial Analysis measured North Mini Code generating three times the output tokens of comparable models. That verbosity compounds across inference cost and latency in high-volume pipelines. Throughput testing against actual workload volume is the evaluation step the benchmark rankings skip.</p><p><b>The frontier pricing split is now a real architectural decision.</b> Fable 5 at $50 per million output tokens and North Mini Code on a single H100 represent a genuine tradeoff between cost control and data residency on one side, and managed infrastructure overhead on the other. Teams running high-volume agentic coding pipelines should model both cost paths against their actual workload before committing to either.</p>]]></description>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/2hmLU69mEDG09CllK8wDeJ/43818a9c508618ce7dd7fc6ff006a0e5/cohere-north-smk1.jpg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.]]></title>
            <link>https://venturebeat.com/technology/on-device-ai-agents-hit-a-hard-memory-limit-apples-new-architecture-routes-around-it</link>
            <guid isPermaLink="false">7rnF3fZ3tDuU9urU81FaE6</guid>
            <pubDate>Tue, 09 Jun 2026 17:49:06 GMT</pubDate>
            <description><![CDATA[<p>On-device AI models have stayed small because the entire weight set has to live in DRAM, capping practical parameter counts well below what server-side deployments use. Enterprise architects evaluating agentic workloads have had to choose between capable cloud-dependent models and limited on-device ones. Apple&#x27;s third-generation foundation models, announced at WWDC26, break that constraint by moving the weight set off DRAM entirely.</p><p>The AFM 3 family was developed in collaboration with Google and spans five models: two on-device and three server-based, all running within Apple&#x27;s Private Cloud Compute boundary. The server-side models, including AFM 3 Cloud Pro for agentic tool use and complex reasoning, run on Nvidia GPUs in Google Cloud. The on-device architecture is Apple&#x27;s own. AFM 3 Core Advanced is a 20-billion-parameter model that stores weights in NAND flash rather than DRAM.</p><p>&quot;Instead of forcing the entire model into DRAM, the full model is stored in flash memory,&quot; <a href="https://machinelearning.apple.com/research/introducing-third-generation-of-apple-foundation-models">Apple&#x27;s research team wrote</a>. &quot;Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard MoE models require, AFM 3 Core Advanced makes routing decisions per prompt.&quot;</p><h2>How the architecture actually works</h2><p>The memory wall Apple is working around is one every local AI developer runs into.

&quot;You can&#x27;t put 20B parameters in RAM at any reasonable precision,&quot; <a href="https://www.linkedin.com/in/awni-hannun-36382a17/">Awni Hannun</a>, a researcher at Anthropic and former Apple research scientist, <a href="https://x.com/awnihannun/status/2064202168618422396">posted on X</a>. &quot;To make it work they are using pretty exotic architecture by today&#x27;s standards. A small model predicts from the query (or prompt) which experts to load from NAND into RAM.&quot;</p><p>That prediction-and-load mechanism has three distinct components, each driven by the hardware constraints of consumer silicon.</p><p><b>The full 20B weight set lives in flash, not DRAM.</b> AFM 3 Core Advanced stores its entire parameter set in NAND flash rather than active memory. Standard on-device deployments require the full model to fit in DRAM, which is what caps their parameter counts. Apple&#x27;s approach, which the company developed with its own researchers and calls Instruction-Following Pruning (IFP), treats flash as the model&#x27;s permanent home and DRAM as a working buffer for whichever experts a given prompt requires.</p><p><b>Expert routing happens once per prompt, not per token.</b> In a conventional Mixture of Experts model, a router selects different experts for every token generated, which would require continuous weight movement between flash and DRAM at inference speed. NAND-to-DRAM bandwidth cannot support that. AFM 3 Core Advanced routes once at prompt time, selects a fixed expert set, loads it into DRAM alongside always-active shared experts, and generates all tokens from that same configuration.

&quot;The key distinction from a typical MoE is that you do this once per query and then generate all the tokens with the same experts,&quot; Hannun wrote.</p><p><b>Active parameter count scales from 1B to 4B depending on task complexity.</b> Rather than running a fixed model size for every request, AFM 3 Core Advanced adjusts how many parameters it activates based on what the task requires: 1 billion for simpler operations, up to 4 billion for harder ones, all drawn from the 20-billion-parameter pool in flash.</p><h2>What Apple has and hasn&#x27;t disclosed</h2><p>The architecture paper is detailed on the memory design and sparse activation mechanism. It is less forthcoming on practical deployment constraints.</p><p>Apple&#x27;s profiling tools expose timing but not the metrics that decide production viability. &quot;Energy, memory bandwidth, thermal? Not in the docs,&quot; Marco Abis, who is building Ziraph, a profiler for local AI on Apple silicon, <a href="https://x.com/capotribu/status/2064267804476383427">posted on X</a>. &quot;A notable gap, given those decide most of on-device performance.&quot;</p><p>Abis also found no statement in Apple&#x27;s documentation (across the Core AI docs, the Foundation Models docs or the Private Cloud Compute security post) of when an on-device request transparently offloads, or whether that routing is visible to the developer or the user. For enterprises that need to document where inference runs, that is a direct compliance problem.</p><p>Apple has indicated that a full technical report with benchmarks is coming later this summer.</p><h2>What this means for enterprise architects</h2><p>Regulated industries evaluating agentic AI deployments now have a concrete architectural decision to make.</p><ul><li><p><b>The DRAM wall for on-device agents just moved. 
</b>Enterprises evaluating agents that need to run without a cloud round-trip now have a 20-billion-parameter local option to evaluate. The constraint shifts from model capability to device hardware.</p></li><li><p><b>The private/cloud boundary is now an architectural decision, not a default. </b>Simpler requests stay on-device; complex agentic tasks route to AFM 3 Cloud Pro on Private Cloud Compute. Apple has not publicly specified when a request offloads or whether that routing is visible to the developer — a gap that complicates policy decisions for organizations that need to document where inference runs.</p></li><li><p><b>The agentic server tier depends on Google Cloud. </b>AFM 3 Cloud Pro runs on Nvidia GPUs in Google Cloud. The Private Cloud Compute guarantee covers data privacy. It does not eliminate the Google Cloud dependency for server-side inference.</p></li></ul><p>AFM 3 Core Advanced gives enterprises a 20-billion-parameter on-device option that did not exist before WWDC26. Whether it is deployable at scale depends on answers Apple has not yet published. Those details are due in the summer technical report.</p>]]></description>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/3hpTQydwOHQ99Vln5T3uLo/14ac8c1a0953035ee0b9fe021f39e232/AI-in-phone-smk1.jpg?w=300&amp;q=30" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Anthropic brings Mythos to the masses with Claude Fable 5, its most powerful generally available model ever]]></title>
            <link>https://venturebeat.com/technology/anthropic-brings-mythos-to-the-masses-with-claude-fable-5-its-most-powerful-generally-available-model-ever</link>
            <guid isPermaLink="false">33CQ4xodRQ1MrnZU9IYMRL</guid>
            <pubDate>Tue, 09 Jun 2026 17:19:29 GMT</pubDate>
            <description><![CDATA[<p>Anthropic today <a href="https://www.anthropic.com/news/claude-fable-5-mythos-5">launched two new AI models</a>, Claude Fable 5 and Claude Mythos 5, marking the company’s first broad release of the powerful “Mythos-class” AI capabilities it previously made available only to participating organizations in its restricted cybersecurity program, <a href="https://venturebeat.com/technology/anthropic-says-its-most-powerful-ai-cyber-model-is-too-dangerous-to-release">Project Glasswing</a>, which it announced two months ago.</p><p>The company says Fable 5, which is the version most users and developers will get starting today, exceeds every Claude model it has previously made generally available, with stronger performance across software engineering, knowledge work, vision, scientific research and long-running tasks.</p><p>It smashes the existing benchmarks and comes out on top on nearly all of them, though the prior Claude Mythos Preview version of the model still takes the top spots on computer use and multidisciplinary reasoning (see benchmark chart below and <a href="https://www.anthropic.com/news/claude-fable-5-mythos-5">here</a>).</p><p>The new Claude Mythos 5, by contrast, is less restricted in its capabilities, but more restricted in its availability. It is an upgraded version of the prior, similarly capable but limited-release Mythos Preview model. As such, it has certain safeguards lifted, but it’s only officially accessible to Anthropic-approved users, including Anthropic&#x27;s cybersecurity partners in its Project Glasswing effort, and select biology researchers.</p><p>The key difference is that the general-purpose Fable 5 wraps the same underlying Mythos-class capability in new safeguards. 
Anthropic says requests involving certain high-risk areas (including cybersecurity, biology and chemistry, and model distillation) are automatically routed instead to <a href="https://venturebeat.com/technology/anthropics-claude-opus-4-8-is-here-with-3x-cheaper-fast-mode-and-near-mythos-level-alignment">Claude Opus 4.8</a>, Anthropic&#x27;s previous flagship general model, with users notified when that happens. That is not the case on Mythos 5.</p><p>The company says more than 95% of Fable 5 sessions run entirely on Fable 5’s own responses, with no fallback, and that internal and external red-teaming efforts found no “universal jailbreaks” after more than 1,000 hours of testing.</p><p>Anthropic says Fable 5 is available to the general public today through its website, apps, and <a href="https://platform.claude.com/docs/en/about-claude/models/overview">API</a>, but that Mythos 5 will initially only be made available to users who already have access to the older Claude Mythos Preview.</p><h2><b>Pricing, access and a tricky rollout</b></h2><p>Anthropic is pricing both Fable 5 and Mythos 5 at $10 per million input tokens and $50 per million output tokens. The company says that is less than half the price of Claude Mythos Preview, but the two models still rank as the most expensive of the major AI models available globally. 
</p><h1><b>VentureBeat Frontier AI Model API Pricing Snapshot</b></h1><table><tbody><tr><td><p><b>Model</b></p></td><td><p><b>Input</b></p></td><td><p><b>Output</b></p></td><td><p><b>Total Cost</b></p></td><td><p><b>Source</b></p></td></tr><tr><td><p>MiMo-V2.5 Flash</p></td><td><p>$0.10</p></td><td><p>$0.30</p></td><td><p>$0.40</p></td><td><p><a href="https://platform.xiaomimimo.com/docs/en-US/pricing">Xiaomi MiMo</a></p></td></tr><tr><td><p>deepseek-v4-flash</p></td><td><p>$0.14</p></td><td><p>$0.28</p></td><td><p>$0.42</p></td><td><p><a href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek</a></p></td></tr><tr><td><p>deepseek-v4-pro</p></td><td><p>$0.435</p></td><td><p>$0.87</p></td><td><p>$1.305</p></td><td><p><a href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek</a></p></td></tr><tr><td><p>MiniMax-M3</p></td><td><p>$0.30</p></td><td><p>$1.20</p></td><td><p>$1.50</p></td><td><p><a href="https://platform.minimax.io/subscribe/token-plan?tab=api-enterprise">MiniMax</a></p></td></tr><tr><td><p>Gemini 3.1 Flash-Lite</p></td><td><p>$0.25</p></td><td><p>$1.50</p></td><td><p>$1.75</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>Qwen3.7-Plus</p></td><td><p>$0.40</p></td><td><p>$1.60</p></td><td><p>$2.00</p></td><td><p><a href="https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&amp;url=2840914_2&amp;modelId=qwen3.7-plus&amp;serviceSite=international">Alibaba Cloud</a></p></td></tr><tr><td><p>MiMo-V2.5</p></td><td><p>$0.40</p></td><td><p>$2.00</p></td><td><p>$2.40</p></td><td><p><a href="https://platform.xiaomimimo.com/docs/en-US/pricing">Xiaomi MiMo</a></p></td></tr><tr><td><p>Grok 4.3 (low context)</p></td><td><p>$1.25</p></td><td><p>$2.50</p></td><td><p>$3.75</p></td><td><p><a href="https://docs.x.ai/developers/models/grok-4.3">xAI</a></p></td></tr><tr><td><p>GLM-5</p></td><td><p>$1.00</p></td><td><p>$3.20</p></td><td><p>$4.20</p></td><td><p><a 
href="https://docs.z.ai/guides/overview/pricing">Z.ai</a></p></td></tr><tr><td><p>Kimi-K2.6</p></td><td><p>$0.95</p></td><td><p>$4.00</p></td><td><p>$4.95</p></td><td><p><a href="https://platform.kimi.ai/docs/pricing/chat-k26">Moonshot/Kimi</a></p></td></tr><tr><td><p>GLM-5.1</p></td><td><p>$1.40</p></td><td><p>$4.40</p></td><td><p>$5.80</p></td><td><p><a href="https://docs.z.ai/guides/overview/pricing">Z.ai</a></p></td></tr><tr><td><p>Grok 4.3 (high context)</p></td><td><p>$2.50</p></td><td><p>$5.00</p></td><td><p>$7.50</p></td><td><p><a href="https://docs.x.ai/developers/models/grok-4.3">xAI</a></p></td></tr><tr><td><p>Qwen3.7-Max</p></td><td><p>$2.50</p></td><td><p>$7.50</p></td><td><p>$10.00</p></td><td><p><a href="https://modelstudio.console.alibabacloud.com/ap-southeast-1?spm=a2ty_o05.31384571.0.0.52649f6b7G0D55&amp;tab=doc#/doc/?type=model&amp;url=2840914_2&amp;modelId=qwen3.7-max&amp;serviceSite=international">Alibaba Cloud</a></p></td></tr><tr><td><p>Gemini 3.5 Flash</p></td><td><p>$1.50</p></td><td><p>$9.00</p></td><td><p>$10.50</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>Gemini 3.1 Pro Preview (≤200K)</p></td><td><p>$2.00</p></td><td><p>$12.00</p></td><td><p>$14.00</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>GPT-5.4</p></td><td><p>$2.50</p></td><td><p>$15.00</p></td><td><p>$17.50</p></td><td><p><a href="https://openai.com/api/pricing/">OpenAI</a></p></td></tr><tr><td><p>Gemini 3.1 Pro Preview (&gt;200K)</p></td><td><p>$4.00</p></td><td><p>$18.00</p></td><td><p>$22.00</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>Claude Opus 4.8</p></td><td><p>$5.00</p></td><td><p>$25.00</p></td><td><p>$30.00</p></td><td><p><a 
href="https://platform.claude.com/docs/en/about-claude/pricing">Anthropic</a></p></td></tr><tr><td><p>GPT-5.5</p></td><td><p>$5.00</p></td><td><p>$30.00</p></td><td><p>$35.00</p></td><td><p><a href="https://openai.com/api/pricing/">OpenAI</a></p></td></tr><tr><td><p><b>Claude Fable 5 / Claude Mythos 5</b></p></td><td><p><b>$10.00</b></p></td><td><p><b>$50.00</b></p></td><td><p><b>$60.00</b></p></td><td><p><a href="https://platform.claude.com/docs/en/about-claude/models/overview"><b>Anthropic</b></a></p></td></tr></tbody></table><p>For developers, Fable 5 is available through the Claude API as <code>claude-fable-5</code>. Anthropic says Fable 5 is fully available today on the Claude API and on consumption-based Enterprise plans.</p><p>For subscription users, the rollout is more complicated. Anthropic says Fable 5 will be included on Pro, Max, Team and seat-based Enterprise plans at no extra cost from today through June 22.</p><p>On June 23, the company plans to remove Fable 5 from those plans, after which using it will require usage credits. Anthropic says it aims to restore Fable 5 as a standard part of subscription plans as quickly as possible.</p><h2><b>The difference between Fable 5 and Mythos 5</b></h2><p>Anthropic is not presenting Fable 5 and Mythos 5 as two separate models in the usual “small versus large” sense. Instead, they appear to share the same base capability level. The difference is access <i>control</i>: how easy it will be for users to get their hands on the models, and the guardrails embedded in each.</p><p>As previously mentioned, Fable 5 includes a new safeguard layer that detects certain high-risk requests (including cybersecurity, biology and chemistry, and attempts to distill the model’s capabilities into other systems) and routes those requests to Claude Opus 4.8. 
</p><p>Mythos 5 lifts some of those restrictions for trusted users working in approved domains.</p><p>In practical terms, Mythos 5 is more powerful for sensitive cyber and biology work because it can answer in areas where Fable 5 falls back. </p><p>For most ordinary enterprise and developer tasks, however, Anthropic says Fable 5 performs effectively the same as Mythos 5.</p><p>The launch also signals how Anthropic plans to bring frontier models with dangerous dual-use capabilities into the market: not by releasing all capabilities to everyone, and not by simply refusing risky questions, but by routing some requests to a less capable model while keeping the stronger model available for the majority of everyday work.</p><h2><b>A major improvement in autonomous coding</b></h2><p>For enterprise buyers, the most immediate use case is likely software engineering. Anthropic says Fable 5 can work unattended for longer and with more independence than previous Claude models, which is exactly the capability enterprises need if they want AI agents to do more than autocomplete code or answer developer questions.</p><p>On <b>SWE-bench Pro, which measures a model&#x27;s ability to complete difficult software engineering tasks, Anthropic says Fable 5 and Mythos 5 reach 80.3%</b>, vastly outperforming OpenAI&#x27;s latest and greatest general model GPT-5.5, which scored 58.6%. </p><p>On Cognition’s FrontierCode Diamond benchmark, which tests high-quality, maintainable agentic coding, the models score 29.3%, compared with 13.4% for Claude Opus 4.8 and 5.7% for GPT-5.5, according to the benchmark table included in Anthropic’s materials. </p><p>Anthropic also says Fable 5 scores highest among frontier models on FrontierCode even at medium reasoning effort, suggesting the model may deliver stronger coding results without always needing maximum compute.</p><p>The most striking customer example comes from Stripe. 
Anthropic says Stripe tested Fable 5 in a 50-million-line Ruby codebase and found that the model completed a codebase-wide migration in one day that otherwise would have taken a team more than two months by hand. Stripe said, “Fable 5 compresses months of engineering into days. In our 50-million-line Ruby codebase, it did in a day what would&#x27;ve taken us more than two months by hand.”</p><p>Other early users describe the model as especially useful for long-horizon development tasks. Cursor said, “Fable 5 is the state of the art model on CursorBench. It&#x27;s opened up a class of long-horizon problems that were out of reach for earlier models.” Replit said Fable 5 is the highest-performing model it has tested on ViBench, its end-to-end “vibe-coding” benchmark, and that it builds apps in less time with fewer tokens. Figma said Fable 5 is “a clear step forward on agentic coding and prototyping.”</p><p>This is the enterprise shift Anthropic is trying to sell: AI coding systems that can take on larger units of work, not just individual tickets. That could include codebase migrations, app prototyping, pull request review, test generation, debugging across unfamiliar tools, user interface design and multi-step internal software projects.</p><p>Base44 said, “Fable 5 is much deeper and better at one-shotting full apps, and its tool calling is excellent.” Genspark said, “Fable 5 came out #1 on our evals, winning head-to-head against every model we tested. It was significantly stronger on the hardest tasks in the set — UI design and game coding.” Rakuten said, “At the highest effort, Fable 5 reflects on and validates its own work. 
For us, that&#x27;s what makes highly autonomous operations possible — the extra thinking pays for itself.”</p><p>For CTOs and engineering leaders, that suggests the model’s value may come less from raw code generation and more from sustained execution: understanding an intent, planning steps, calling tools, checking its own work and continuing through a task without constant human steering.</p><h2><b>Knowledge work, finance, legal and operations</b></h2><p>Anthropic is also positioning Fable 5 as a stronger model for enterprise knowledge work. On GDPval-AA, Anthropic reports a score of 1932 for Fable 5 and Mythos 5, compared with 1890 for Claude Opus 4.8, 1769 for GPT-5.5 and 1314 for Gemini 3.1 Pro. </p><p>On GDPpdf, a benchmark focused on visual document reasoning, Fable 5 and Mythos 5 score 29.8% without tools, compared with 22.5% for Opus 4.8, 24.9% for GPT-5.5 and 16.7% for Gemini 3.1 Pro.</p><p>That matters for enterprises because much of corporate work still lives in messy documents: PDFs, spreadsheets, charts, reports, contracts, filings, slide decks and screenshots. Anthropic says Fable 5 shows gains in document-based reasoning, chart and table interpretation and complex problem solving.</p><p>Hex said, “Fable 5 is the first to break 90% on our core analytics benchmark of complex, long-running analytical tasks — a 10-point jump over Opus. On the hardest questions, it shows strong judgment and attention to nuance.” Hebbia said Fable 5 was the highest-scoring model on its Finance Benchmark for senior-level reasoning, with double-digit gains in document reasoning, chart and table interpretation, and problem solving.</p><p>The finance examples are notable because they point to AI agents moving beyond summarization into higher-stakes analytical workflows. 
</p><p>IMC said Fable 5 “aced our trading-analysis evaluations nearly across the board: factual lookup, conceptual reasoning, root-cause analysis, expected-value analysis.” Optiver said the model was stronger than Opus 4.8 on its trading benchmark and “remarkably consistent,” scoring identically across repeated runs. Balyasny Asset Management said Fable 5 was the strongest finance-first model it had tested.</p><p>Legal and operations teams may also see immediate impact. Crosby Legal said, “Fable 5 feels materially different. In blind review, our lawyers found its redlines matched or beat our current model every time.” Notion said the model can take work “you&#x27;d chip away at all afternoon” and turn messy notes into a functioning project plan. Zapier said Fable 5 is the new leader on AutomationBench and is more autonomous than Opus 4.8: “Where Opus stops to ask, Fable 5 keeps looking.”</p><p>For enterprise software vendors, that points toward more capable embedded agents in workflow products: agents that can review a contract, update a project plan, assemble a spreadsheet, inspect a chart, file a ticket, run a query, call an internal API and keep going until the work is complete.</p><h2><b>Vision and interface understanding</b></h2><p>Anthropic says Fable 5 is also its strongest vision model. In its launch materials, the company says the model can extract precise numbers from detailed scientific figures and complete vision-based tasks such as rebuilding a web app’s source code from screenshots alone.</p><p>That has immediate implications for enterprise automation. Many business processes still depend on visual interfaces that are not cleanly exposed through APIs: dashboards, PDFs, forms, legacy apps, screenshots, scans and image-heavy reports. A stronger vision model could help agents operate across those environments with less custom integration work.</p><p>Anthropic also says Fable 5 needs less scaffolding than previous Claude models. 
As an example, the company says earlier Claude models struggled to play Pokémon FireRed even with extra tools, while <a href="https://youtu.be/CIQBP1w4B1M?si=QCoJ9amBEVMqoTUl">Fable 5 impressively beat the game using a minimal vision-only harness</a>. Anthropic posted a fast-forwarded video of its playthrough to YouTube and included it in its blog post.</p><p>The point is not gaming itself, but the broader agentic skill: reading a visual environment, remembering progress, deciding what to do next and executing over a long horizon.</p><p>In another internal test, Anthropic says it had the model play the deck-building game Slay the Spire with access to persistent file-based memory. The company says persistent memory improved Fable 5’s performance three times more than it improved Opus 4.8’s, and that Fable 5 reached the game’s final act three times more often. For enterprise users, this suggests Fable 5 may make better use of notes, logs and stored context during multi-step work.</p><p>That could matter for internal agents that operate over days or weeks: sales operations agents that track account research, engineering agents that manage migrations, finance agents that update models, or support agents that remember what they tried across many turns.</p><h2><b>From restricted cyber model to general-purpose enterprise AI</b></h2><p>The announcement follows Anthropic’s April 2026 rollout of Claude Mythos Preview through <a href="https://venturebeat.com/technology/anthropic-says-its-most-powerful-ai-cyber-model-is-too-dangerous-to-release">Project Glasswing</a>, a restricted program for cyber defenders, critical infrastructure providers and major software maintainers. 
Anthropic created Glasswing after internal evaluations showed Mythos-class models could find and exploit software vulnerabilities at a level that raised meaningful misuse concerns.</p><p>Following the debut of Glasswing and Mythos, <a href="https://www.nextgov.com/cybersecurity/2026/04/anthropics-glasswing-initiative-raises-questions-us-cyber-operations/412721/">U.S. officials and intelligence agencies began weighing</a> how such models could reshape both cyber defense and offensive operations, while Sen. Mark Warner warned that AI-assisted vulnerability discovery should force industry to “accelerate and reprioritize patching.” Financial regulators also took notice: <a href="https://www.theguardian.com/technology/2026/apr/22/what-is-anthropic-mythos-ai-threat-global-cybersecurity">The Guardian reported</a> that Mythos entered discussions among senior banking officials and regulators in the U.S. and U.K. because of fears that AI-accelerated cyberattacks could threaten payment systems and broader financial stability.</p><p>The reaction has not been limited to alarm. Governments also want access: <a href="https://www.reuters.com/legal/litigation/south-korea-secures-access-anthropics-mythos-ai-model-science-ministry-says-2026-06-03/">Reuters reported</a> that South Korea’s national internet security agency had secured Mythos access through Project Glasswing, reflecting a broader geopolitical race to use frontier AI for national cyber defense. At the same time, Anthropic has faced scrutiny over whether it can safely gate the very capabilities it says are too risky for general release. <a href="https://www.theverge.com/ai-artificial-intelligence/917644/anthropic-claude-mythos-breach-humiliation">The Verge reported</a> that unauthorized users accessed Mythos after its limited rollout, calling the incident damaging for a company that has built its brand around responsible AI. 
</p><p>Critics have also questioned whether Anthropic’s warning-heavy framing risks becoming a form of market positioning, since it casts the company as both the source of the new capability and the gatekeeper deciding which governments, companies and researchers get to use it.</p><p>With Fable 5, Anthropic is leaning into its gatekeeper role, attempting to separate the general enterprise value of a Mythos-class model from the riskiest parts of its capability profile. The company says Fable 5 can handle software engineering, research, visual reasoning, document analysis and long-running agentic workflows, while classifiers block or reroute requests that could provide what Anthropic calls “uplift” to malicious actors.</p><p>Those classifiers cover three main areas. </p><ol><li><p>Cybersecurity, where Anthropic says Mythos-class models can discover and exploit vulnerabilities and perform broader “agentic hacking” tasks such as reconnaissance, discovery and lateral movement. </p></li><li><p>Biology and chemistry, where the company says the same reasoning that can help researchers design therapies could also help well-resourced malicious actors pursue dangerous biological work. </p></li><li><p>Model distillation, where Anthropic says users may try to extract Claude’s capabilities to train competing models, including models that could be released without similar safeguards.</p></li></ol><p>When Fable 5’s classifiers detect one of those categories, the response is automatically handled by Claude Opus 4.8. Anthropic says users will be told when this happens. That is a notable product decision: rather than declining those requests outright, Anthropic is trying to keep the user experience functional while reducing access to the most capable version of the model in sensitive areas.</p><p>Anthropic says it red-teamed the new classifier system internally and externally. 
The company says an external bug bounty produced no universal jailbreaks after more than 1,000 hours of testing, and external red-teaming organizations also failed to find a universal jailbreak. One external partner found that Fable 5 complied with zero harmful single-turn cyber requests related to planning cyberattacks, exploit development or defense evasion, even when prompts used any of 30 public jailbreak techniques, according to Anthropic.</p><p>The company is still acknowledging tradeoffs. Anthropic says the safeguards are deliberately cautious and may sometimes trigger on benign requests. That could frustrate security professionals, biology researchers and advanced enterprise users whose legitimate work overlaps with the blocked categories. The company says it plans to reduce false positives over time.</p><h2><b>Mythos 5 and the restricted frontier</b></h2><p>While Fable 5 is the broad commercial launch, Mythos 5 is the model to watch for enterprises operating in security, critical infrastructure and life sciences.</p><p>The company says all users with Claude Mythos Preview access can upgrade to Mythos 5 beginning today. It plans to expand access through a trusted access program, in collaboration with the U.S. government.</p><p>The distinction is important for sectors where the blocked capabilities are not edge cases but core workflows. A security team may need to reproduce vulnerabilities, test exploitability, analyze lateral movement or simulate attacker behavior in a controlled environment. A biology research team may need to reason through molecular design workflows that would trigger general-use safeguards. Fable 5 is not designed to give every user unrestricted access to those capabilities; Mythos 5 is designed for vetted users who need them.</p><p>Anthropic says Mythos 5 has the strongest cybersecurity capabilities of any model in the world. 
In the company’s benchmark table, the model family scores 78.0% on ExploitBench, compared with 69.0% for Claude Mythos Preview, 40.0% for Opus 4.8 and 34.0% for GPT-5.5. On CyberGym, Anthropic’s chart shows Mythos 5 at 83.8%, slightly ahead of Mythos Preview at 83.1% and far above Opus 4.8 with default safeguards.</p><p>The company is making a similar argument in biology. Anthropic says Mythos-class models outperform dedicated protein language models on a task involving adeno-associated viruses, a delivery mechanism used in gene therapies. The company frames that as both promising and risky: the same capability that could help gene therapy research could also be misused in dangerous biological work.</p><p>Anthropic says its internal protein design experts used Mythos 5 to accelerate parts of the drug design process by about tenfold. In one example, the company says Mythos 5, using protein design and bioinformatics tools without human assistance, matched or beat skilled human operators by choosing binding sites, selecting and running tools, and recovering from failures. Anthropic says nine of 14 protein targets in the study produced strong candidates for drug design that it is now investigating.</p><p>The company also says Mythos 5 produced novel molecular biology hypotheses that Anthropic scientists preferred over Opus-class model hypotheses about 80% of the time in blinded comparisons. Anthropic says several of those ideas have advanced to experimental evaluation, and one hypothesis involving an E. coli protein was later corroborated by an independent lab working on the same problem.</p><p>Those claims are potentially significant, but they should be treated carefully until more details are published. Anthropic says it intends to publish additional results in the coming months. 
For now, the strongest enterprise implication is directional: the company believes its highest-end models can already perform parts of scientific research workflows with less human intervention than prior systems.</p><h2><b>New, longer data retention requirement</b></h2><p>The company also introduced a new data-retention policy for Mythos-class models. Anthropic says it will require 30-day retention for all traffic on Fable 5, Mythos 5 and future models with similar or higher capability levels, across both first-party and third-party surfaces. The company says it will not use that data to train new Claude models or for non-safety purposes, and says it has added privacy protections including logging human access and deleting the data after 30 days in almost all cases.</p><p>That policy may become one of the most important enterprise buying questions around Fable 5. Many businesses want frontier AI capability but also want strict control over data retention, especially in regulated sectors. Anthropic’s position is that stronger monitoring is necessary for models with this level of capability. Enterprise customers will have to decide whether the capability gain justifies the retention requirement.</p><h2><b>Enterprise implications</b></h2><p>The broader enterprise significance of Fable 5 is that Anthropic is trying to commercialize a more autonomous class of AI model without exposing all of its capabilities to every user. That could become a template for how frontier labs release increasingly powerful systems: one model family, multiple access tiers, and domain-specific restrictions depending on user trust and risk.</p><p>If Fable 5 performs as Anthropic and early customers describe, developers may hand off larger tasks: code migrations, refactors, UI builds, test writing, bug fixing, documentation, internal tooling and multi-step app creation. 
</p><p>For knowledge-work-heavy enterprises, Fable 5 could make AI more useful in workflows where earlier models were too brittle: finance research, spreadsheet analysis, legal redlines, procurement review, board materials, market research, sales operations and project planning. The main gain is not just better answers; it is fewer turns, fewer corrections and more ability to keep working through ambiguity.</p><p>For security teams, the launch is more complicated. Most organizations will get Fable 5, not unrestricted Mythos 5. That means they may see stronger general coding and analysis, but not full access to the cyber capabilities Anthropic considers risky. Trusted defenders inside Project Glasswing will get Mythos 5, giving them a more direct way to use the model for vulnerability discovery and defensive testing.</p><p>For life sciences companies, the pattern is similar. Fable 5 may help with general research, literature analysis, data interpretation and scientific reasoning, but the more sensitive biological capabilities will be restricted. Anthropic is effectively creating a separate access path for vetted researchers whose work requires capabilities that could be dangerous in the wrong hands.</p><p>The launch also raises competitive pressure across the AI industry. Anthropic is claiming state-of-the-art results across agentic coding, knowledge work, vision, cybersecurity, legal reasoning, spatial reasoning and health benchmarks. But the more strategically important claim may be that it has found a workable release mechanism for models above its Opus class. If Fable 5’s safeguards hold up under real-world use, Anthropic will argue it can bring more powerful models to market sooner without fully opening the riskiest capabilities.</p><p>That is still a large “if.” The enterprise market will test not only Fable 5’s benchmark performance, but also its reliability, false-positive rate, data-retention tradeoffs and cost at scale. 
A model that can complete more work autonomously can also burn more tokens, trigger more governance questions and create new review burdens for teams that must verify its output.</p><p>Still, today’s launch marks a clear shift in the Claude lineup. Opus is no longer Anthropic’s top commercial capability tier. Mythos-class models now sit above it. Fable 5 is the first version of that tier for general users; Mythos 5 is the restricted version for trusted high-risk work. Together, they show how Anthropic plans to push frontier AI deeper into enterprise workflows while trying to keep the most dangerous capabilities gated.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/5Rlcmqte5CXctov1ExzTWN/a26021b5f21bc357ff8cbac73dffbc6f/ChatGPT_Image_Jun_9__2026__01_10_24_PM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Every World Cup fan deserves a seat. Norton Neo says its free browser is the ticket]]></title>
            <link>https://venturebeat.com/technology/every-world-cup-fan-deserves-a-seat-norton-neo-says-its-free-browser-is-the-ticket</link>
            <guid isPermaLink="false">7jMd3G5Cz5JdeYlpTaOgqN</guid>
            <pubDate>Tue, 09 Jun 2026 14:00:15 GMT</pubDate>
            <description><![CDATA[<p><i>Presented by Norton </i></p><hr/><p>For 39 days this summer, the planet will be doing roughly the same thing at the same time. The 2026 World Cup spans 104 matches across 16 cities in the United States, Canada, and Mexico, with billions of people likely to watch over the course of the tournament. It could very well be one of the largest shared events the internet has ever been asked to carry. </p><p>What’s changed since the last tournament isn’t the scale, it’s the screen. For a growing share of that audience, the match won’t come through television. It’ll come through a browser tab. The problem is the browser you have today simply does not give you a frictionless and reliable way to watch the World Cup for free.</p><p>In the U.S., a majority of viewers now expect to stream the tournament digitally rather than watch on cable or satellite. That only works when you have a paid subscription. And it creates a challenge: fans coming from Europe, for example, want to watch the game in the U.S. just as they are able to watch it for free in Europe. </p><h2>Watching the World Cup has been harder than it should be</h2><p>Ask anyone who has tried to watch a tournament online and the answers are remarkably consistent across every cycle. Streams stutter and buffer when it matters most. The “free stream” someone forwarded turns out to be a chain of lookalike sites and dead-end links. And the legitimate platforms want a credit card, and maybe a generous helping of personal data too, before they’ll play a single minute.</p><h2>One company’s bet: Neo </h2><p>Norton’s answer is <a href="https://neobrowser.ai/tournament">Neo</a>, a browser built on the premise that protection and access can live in the browser software itself rather than in a stack of add-ons the user has to go find, install, and pay for. It&#x27;s less about adding features than removing friction. 
Remove the steps between a person and the thing they came to do.</p><p>“The tournament is the kind of moment the modern web was supposed to be great at: everyone, everywhere, on the same thing in real time,” says Howie Xu, Chief AI and Innovation Officer at Gen and its family of brands including Norton. “It sometimes takes a PhD to figure out how to watch matches the right way. Our view is that the browser should have done more of the heavy lifting. That’s why we reinvented the browser to enable a true frictionless, safe, and fast access to the contents they deserve.”</p><p>It&#x27;s a notable position coming from a security brand that historically sold protection as a separate thing you bought and remembered to run. But Neo removes this separation. Now, they are reinventing the model so the browser is the whole solution for safe, frictionless, fast streaming.</p><h2>The scams arrive before the match does</h2><p>Before official ticket sales opened, fraudsters were already working the tournament: fake listings, cloned resale sites, phishing messages built to harvest money and personal details. The methods are well-worn. Counterfeit “official” portals copy real branding behind lookalike URLs; phishing emails impersonate organizers and dangle exclusive access; social ads promise guaranteed seats at suspiciously low prices and deliver a doctored PDF, or nothing. The same logic follows fans to streaming, where the cheapest, most convenient link is often the most dangerous one.</p><p>This is where Norton’s back catalogue shows up inside everyday browsing. Anti-phishing, scam-site detection, and malicious-page blocking run in the background, flagging dangerous links as they appear rather than after a card number has already been entered. Whether that’s enough to change fan behavior is the open question. People are remarkably willing to click past a warning when a match is about to kick off. 
But moving the safeguard into the browser, instead of a separate app, puts it where the risk actually is.</p><h2>Access, without the setup wizard</h2><p>There’s also the matter of simply reaching a legitimate stream. Finding officially licensed providers by country, throttled connections at peak hours, and varying restrictions across platforms all get in the way. The usual remedy is configuring a separate VPN with its own account and billing, which is its own kind of friction. Neo folds Norton’s award-winning VPN technology into the browser itself, and it can easily be turned on or off. That matters most in the situations the tournament actually creates: connecting through an unfamiliar hotel network, an airport layover, a bar’s public Wi-Fi where a sizable share of fans say they’ll watch.</p><p>Neo also builds the search for a legitimate stream directly into the browser. It has a dedicated widget with live game schedules, match reminders, and direct streaming links for every game, surfacing the right licensed source for your market without a separate search.</p><p>“Most people don’t want to manage their security or legitimacy of a link, they want to watch the game,” Xu says. “So we shifted the burden away from the person. The protection is on, the connection is private, and you never had to set anything up.”</p><h2>Calm by design</h2><p>Underneath the tournament use case is the idea Neo keeps coming back to: calm by design, with privacy and security working together inside a clean interface rather than buried in a settings menu. Because the browser can anticipate rather than wait to be asked, it surfaces what a fan likely wants next. A reminder about an upcoming match, a quick summary of the day’s results, a nudge to resume where they left off. Personal data stays on the device unless the person decides otherwise.</p><p>Whether this approach wins a meaningful share of a market Chrome still dominates is far from settled. 
But Norton Neo says they reinvented a browser to make 5.8 billion potential viewers’ lives easier.</p><p>Fans can explore available streams for their market at <a href="https://lp.neobrowser.ai/tournament_stream">lp.neobrowser.ai/tournament_stream</a>.</p><hr/><p><i>Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact </i><a href="mailto:sales@venturebeat.com"><i><u>sales@venturebeat.com</u></i></a><i>.</i></p>]]></description>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/2cbOtN8GAEH0kdpo2VWc6c/b9c41bf6228453fdaece3a3930db754d/world_cup_bill.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[AI is about to replace the interface. Business leaders aren’t ready]]></title>
            <link>https://venturebeat.com/orchestration/ai-is-about-to-replace-the-interface-business-leaders-arent-ready</link>
            <guid isPermaLink="false">7f6PLd0elDACJkvTu3ngAr</guid>
            <pubDate>Tue, 09 Jun 2026 07:00:00 GMT</pubDate>
            <description><![CDATA[<p><i>Presented by Snowflake </i></p><hr/><p>As AI agents become capable of reasoning across systems and taking action, software is evolving from something employees operate into something that understands intent. Instead of navigating disparate applications and dashboards, a single system will increasingly ask: What are you trying to accomplish?</p><p>That sounds like a user experience breakthrough. It is. But the more important implication is organizational. When software no longer relies on humans to provide context, companies can no longer assume that knowledge lives in employees&#x27; heads or is buried inside disconnected applications. The company itself has to become machine-readable.</p><p>The winners in the AI era won&#x27;t simply deploy more intelligent models. They&#x27;ll build the data foundations, semantic context, and governance frameworks that allow machines to understand how the business works and act on that understanding with confidence.</p><h2>Context is becoming infrastructure </h2><p>For years, companies treated context as a human layer on top of data. The data platform held the records, then the BI tool visualized them, and the analyst interpreted them. And finally, the business leader made the judgment call. Agents collapse those layers.</p><p>When an executive asks, “Why is customer churn rising in our enterprise segment?” an effective agent needs to know far more than where the customer data lives. It needs to understand how the company defines churn, which accounts count as enterprise, whether product usage data is more reliable than survey data, which renewal events matter, what the sales team has logged, what support tickets suggest, and whether the answer differs by geography or product line.</p><p>This is why semantics — the definitions, relationships, rules, and assumptions that give data meaning — are moving from a technical concern to a boardroom issue. 
A semantic layer used to sound like plumbing for data teams. In an agentic enterprise, it becomes the shared language between humans and machines.</p><p>If every department teaches its own agent a different version of the business, companies will get inaccuracy at scale. The organizations that pull ahead will be the ones that create a common business knowledge base: consistent definitions, governed access, documented workflows, clear lineage, and enough flexibility to evolve as the business changes. In that world, context is treated as infrastructure, rather than just a nice-to-have.</p><h2>From dashboards to decisions </h2><p>The first wave of enterprise AI largely gave us assistants and copilots that answer questions. Useful, but still limited. You ask a question, get a response, and then return to the work of stitching systems together yourself.</p><p>The next era of AI will be different. Agents will move beyond coordinating answers, and start getting actual work done. A sales leader starting the day will not need to open a CRM, a forecasting tool, a support dashboard, and a Slack thread to understand what changed overnight. They will simply ask an agent what needs attention. The agent will identify which accounts are at risk, explain why, summarize recent customer interactions, draft follow-up actions, and perhaps initiate the next workflow.</p><p>The dashboard does not disappear because charts become useless. It disappears because static reporting becomes too slow for how businesses need to operate. The center of gravity shifts from “show me what happened” to “help me decide what to do next.”</p><h2>The new governance problem: agents that act </h2><p>As long as AI is mostly answering questions, governance is about controlling what it can access. That is already difficult. Employees have different permissions, sensitive data needs protection, and answers must be traceable to trusted sources. 
As agents begin taking action, governance becomes even more consequential.</p><p>It’s one thing for an agent to summarize a customer complaint. It’s another for it to issue a refund, reorder inventory, or send an email to a customer. This is where many companies will be tempted to choose between two imperfect paths.</p><p>One path is to tightly constrain agents from the start: define the data sources, tools, workflows, and actions they can access. This is easier to manage and measure. It also risks limiting the creativity of employees who understand their workflows best.</p><p>The other path is to let teams experiment freely: connect agents to the tools and data they use every day, and allow new use cases to emerge organically. This can produce faster adoption and unexpected innovation. It can also create real risk: stale data, inappropriate access, duplicated workflows, runaway costs, or automated actions no one fully understands.</p><p>The right answer is not maximum control or maximum freedom. It’s to prioritize governed flexibility. Companies need architectures where governance is embedded from the beginning. An agent should know not only what it can read, but what it can do, when it needs approval, how its reasoning is inspected, and how its performance is evaluated over time. In other words, governance cannot be a review meeting after the pilot. It has to be part of the system design.</p><h2>The boundary between builder and user is collapsing </h2><p>One of the least appreciated consequences of agentic AI is that it will blur the line between people who use software and people who create it. When employees can describe a workflow in natural language and have an agent help build it, software development becomes less confined to engineering teams. A marketer can create a campaign analysis workflow. A finance manager can automate variance explanations. An HR leader can build a policy assistant. 
A support manager can design a triage process.</p><p>These employees are not becoming software engineers in the traditional sense, but they are becoming builders. That changes the talent model. Technical fluency will matter more because employees need to understand what’s possible, what’s risky, and how to evaluate an AI-generated result. Judgment becomes the most important skill.</p><p>The winners will be the people who know how to ask better questions, inspect evidence, refine workflows, and combine domain expertise with enough technical understanding to move from idea to execution.</p><p>For business leaders, this means AI adoption extends beyond an IT rollout, and is actually an organizational redesign. The distance between insight and action will shrink, and companies will need to rethink who is empowered to build, approve, and operate the workflows that run the business.</p><h2>Software economics will change too </h2><p>The shift from interfaces to agents will also challenge how companies buy and measure software, and change how software is priced. Per-seat licensing is giving way to consumption models, where costs reflect actual usage. For most organizations this is a better deal. You pay for value delivered, not licenses that may sit idle.</p><p>But it also changes the accountability calculus. When costs are fixed per seat, budget conversations happen once a year. When costs scale with usage, they require continuous oversight. Without visibility into how agents are used and what they produce, costs can rise quickly.</p><p>The answer is to build measurement in from the start, connecting AI usage to business outcomes, whether that is deals closed, tickets resolved, or cycle times reduced. The companies that succeed will treat AI cost management as part of operational excellence, not procurement cleanup. 
The question should not be, “How many tokens did we use?” It should be, “What business outcome did that intelligence produce?”</p><h2>Your customers may stop using your interface </h2><p>While the internal implications of agents are significant, the external ones may be even larger. Today, companies obsess over the customer experience inside their applications: the homepage, the navigation, the checkout flow, the dashboard, the mobile screen. Those things will still matter. But increasingly, customers may interact with businesses through their own agents rather than directly through a company’s app or website.</p><p>If a procurement agent compares suppliers, a travel agent books a trip, or a financial agent evaluates products, the customer may never see the interface a company spent years perfecting. The agent will care less about visual design and more about whether the company’s data, policies, pricing, inventory, documentation, and transaction systems are accessible, structured, trustworthy, and machine-readable.</p><p>That means the competitive surface area changes. A company’s brand may still be emotional, but its operational interface will increasingly be data. Businesses that expose confusing, inconsistent, or poorly governed information will be harder for agents to work with. Businesses with clean semantics, reliable APIs, governed data, and clear policies will become easier to choose, easier to transact with, and easier to trust.</p><p>The interface does not vanish only inside the enterprise. It may vanish between enterprises, too.</p><h2>The real AI readiness test </h2><p>Most executives know they need an AI strategy, but fewer have internalized what that really requires. AI readiness is not the number of pilots launched, the number of models tested, or the number of employees with access to a chatbot. 
It is whether the organization’s knowledge, data, permissions, workflows, and decision logic are ready for machines to reason over them safely.</p><p>For decades, enterprise software forced humans to become translators between business intent and machine logic. AI is reversing that relationship. Machines are beginning to adapt to human intent. But they can only do that if the enterprise has done the work to make its own context legible.</p><p>The future of software is not another screen. It is a system that understands the business well enough to help run it. And that means the next great interface will not look like an interface at all.</p><p><i>Baris Gultekin is VP of AI at Snowflake.</i></p><hr/><p><i>Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact </i><a href="mailto:sales@venturebeat.com"><i><u>sales@venturebeat.com</u></i></a><i>.</i></p>]]></description>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/22QMeppvsh3UoFwN43WV7v/d42d715a1edb0dfc157ca94558a75bcf/Gemini_Generated_Image_5n8coh5n8coh5n8c.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
    </channel>
</rss>