<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>VentureBeat</title>
        <link>https://venturebeat.com/feed/</link>
        <description>Transformative tech coverage that matters</description>
        <lastBuildDate>Wed, 24 Jun 2026 01:41:28 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright 2026, VentureBeat</copyright>
        <item>
            <title><![CDATA[Enterprise-grade AI image generation in 2 seconds is here: Krea 2 Raw and Turbo available as open weights under custom license]]></title>
            <link>https://venturebeat.com/technology/enterprise-grade-ai-image-generation-in-2-seconds-is-here-krea-2-raw-and-turbo-available-as-open-weights-under-custom-license</link>
            <guid isPermaLink="false">5eEttAoMBw6Qbnri3f3M1i</guid>
            <pubDate>Tue, 23 Jun 2026 18:53:03 GMT</pubDate>
            <description><![CDATA[<p>While many enterprises have already begun integrating AI-generated images, visuals, graphics and videos into their production workflows, there is also a <a href="https://gizmodo.com/ai-image-generators-default-to-the-same-12-photo-styles-study-finds-2000702012">growing pool of data</a> and subjective commentary indicating AI imagery ultimately looks non-distinct, monotonous, and too unoriginal to ensure a brand and its assets stand out from the pack. That it&#x27;s &quot;AI slop,&quot; in other words. </p><p>AI creative tools startup Krea is hoping to change that trend by <a href="https://x.com/krea_ai/status/2069435590995812396">opening up the weights</a> to its new frontier AI image model Krea 2 as two versions, &quot;<a href="https://huggingface.co/krea/Krea-2-Raw">Krea 2 Raw</a>&quot; and &quot;<a href="https://huggingface.co/krea/Krea-2-Turbo">Krea 2 Turbo</a>,&quot; under a <a href="https://huggingface.co/krea/Krea-2-Raw/blob/main/LICENSE.pdf">custom license</a> that requires firms with more than 50 seats to pay for Enterprise usage, and mandates that all users of any size implement technical safeguards to prevent the generation of illegal materials, non-consensual intimate imagery (NCII), child sexual abuse material (CSAM), or defamatory assets.</p><p>Both models are available for public download on <a href="https://huggingface.co/krea">Hugging Face</a>. The company says the models provide more visual variety than typical AI generators, while maintaining high prompt accuracy, fidelity, and quality. Importantly, they also offer enterprises and users the ability to customize the generative outputs much more than typical proprietary or even other open source models. 
</p><p>And, for those seeking to generate imagery at high-throughput, <a href="https://www.krea.ai/blog/krea-2-turbo">Krea 2 Turbo&#x27;s generation speed is only 2 seconds</a>, making it among the fastest now available across open and proprietary AI image generation models.</p><h2><b>AI Image Generator API Speed &amp; Licensing Benchmarks (Mid-2026)</b></h2><table><tbody><tr><td><p><b>Model / Generator</b></p></td><td><p><b>Developer / Platform</b></p></td><td><p><b>Avg. Generation Time</b></p></td><td><p><b>Licensing &amp; Commercial Use</b></p></td><td><p><b>Key Characteristics</b></p></td></tr><tr><td><p>FLUX.1 [schnell] (fast)</p></td><td><p>Prodia</p></td><td><p>0.5 seconds</p></td><td><p>Open Weights (Apache 2.0).</p><p> Fully permissive for free commercial use.</p></td><td><p>Highly optimized endpoint utilizing step distillation to deliver sub-second generation times, representing the absolute floor for current API latency.</p></td></tr><tr><td><p>Z-Image Turbo</p></td><td><p>Replicate / fal.ai</p></td><td><p>1.8 seconds</p></td><td><p>Proprietary.</p><p> Commercial rights require active API usage contracts.</p></td><td><p>Designed for instantaneous inference bursts. Both Replicate and fal.ai achieve identical 1.8-second median times on this model.</p></td></tr><tr><td><p><b>Krea 2 Turbo</b></p></td><td><p><b>Krea</b></p></td><td><p><b>2.0 seconds</b></p></td><td><p><b>Open Weights / Proprietary Hybrid.</b></p><p><b> Available via platform trial or API.</b></p></td><td><p><b>Maintains the base model&#x27;s compatibility with style references and LoRAs while utilizing Trajectory Distribution Matching (TDM) to accelerate the creative ideation loop.</b></p></td></tr><tr><td><p>Midjourney v8.1 (Turbo Mode)</p></td><td><p>Midjourney</p></td><td><p>3 – 6 seconds </p></td><td><p>Proprietary. Commercial use requires an active Standard, Pro, or Mega tier subscription. 
</p></td><td><p>Delivers generation speeds &quot;three times faster than v8&quot; while maintaining the model&#x27;s signature &quot;painterly realism with sophisticated lighting,&quot; though it requires a &quot;higher credit cost&quot;. </p></td></tr><tr><td><p>FLUX.2 [klein] 4B</p></td><td><p>Black Forest Labs</p></td><td><p>3.9 seconds</p></td><td><p>Open Weights.</p><p> Permissive commercial use.</p></td><td><p>The lightweight 4-billion parameter variant of the FLUX.2 architecture, balancing prompt adherence with high-speed generation.</p></td></tr><tr><td><p>FLUX.2 [klein] 9B</p></td><td><p>Black Forest Labs</p></td><td><p>4.6 seconds</p></td><td><p>Open Weights.</p><p> Permissive commercial use.</p></td><td><p>The medium-weight 9-billion parameter open model. It scales up compositional intelligence while keeping generation firmly under the 5-second barrier.</p></td></tr><tr><td><p>MAI Image 2 Efficient</p></td><td><p>Microsoft</p></td><td><p>4 – 7 seconds </p></td><td><p>Proprietary. Commercial use requires consumption-based API billing via Azure AI Foundry. </p></td><td><p>A throughput-optimized variant explicitly designed to &quot;out-pace Google’s Imagen Flash&quot;. It makes a slight trade-off in detail for &quot;substantially lower latency&quot; that suits &quot;automated pipelines&quot; perfectly. </p></td></tr><tr><td><p>Midjourney v8.1 (Fast Mode)</p></td><td><p>Midjourney</p></td><td><p>5 – 9 seconds </p></td><td><p>Proprietary. Commercial use requires an active Standard, Pro, or Mega tier subscription. </p></td><td><p>The standard operational mode for v8.1. Average wait times &quot;consistently lands below 10 seconds for most prompts&quot; while offering &quot;excellent handling of complex multi-element scenes&quot;. 
</p></td></tr><tr><td><p>FLUX.2 [dev]</p></td><td><p>fal.ai / DeepInfra</p></td><td><p>6.1 – 6.4 seconds</p></td><td><p>Open Weights (Non-Commercial).</p><p> Strictly for research and non-commercial development.</p></td><td><p>The developer-focused research model. API endpoint optimizations cause slight variance, with fal.ai operating at 6.1 seconds and DeepInfra at 6.4 seconds.</p></td></tr><tr><td><p>Midjourney v8.1 (Relax Mode)</p></td><td><p>Midjourney</p></td><td><p>8 – 14 seconds </p></td><td><p>Proprietary. Commercial use requires an active Standard, Pro, or Mega tier subscription. </p></td><td><p>Processes standard 1024x1024 resolution images without consuming fast GPU hours. The model retains &quot;strong compositional instincts&quot; and &quot;consistent color grading and mood&quot;. </p></td></tr><tr><td><p>FLUX.2 [pro]</p></td><td><p>Black Forest Labs</p></td><td><p>11.1 seconds</p></td><td><p>Proprietary.</p><p> Commercial rights require paid API consumption.</p></td><td><p>The closed, professional-grade tier. It drops extreme step-distillation to prioritize high-fidelity commercial rendering and strict spatial alignments.</p></td></tr><tr><td><p>Seedream 4.0</p></td><td><p>BytePlus</p></td><td><p>11.6 seconds</p></td><td><p>Proprietary.</p><p> Commercial use via BytePlus enterprise contracts.</p></td><td><p>The base commercial generation model for the Seedream architecture, focused on reliable, standard-resolution outputs.</p></td></tr><tr><td><p>MAI Image 2 Standard</p></td><td><p>Microsoft</p></td><td><p>12 – 20 seconds </p></td><td><p>Proprietary. Commercial use requires consumption-based API billing via Azure AI Foundry. </p></td><td><p>Operates as a &quot;full-quality output optimized for photorealism&quot;. It acts as a literal renderer, delivering &quot;high-fidelity skin tones and material textures&quot; and &quot;strong literal prompt adherence&quot;. 
</p></td></tr><tr><td><p>Nano Banana Pro (Gemini 3 Pro Image)</p></td><td><p>Google DeepMind</p></td><td><p>17.7 seconds</p></td><td><p>Proprietary.</p><p> Commercial rights granted via Gemini API terms.</p></td><td><p>Prioritizes exact semantic accuracy and prompt adherence through an extended reasoning phase, trading raw speed for complex contextual execution.</p></td></tr><tr><td><p>Seedream 4.5</p></td><td><p>BytePlus</p></td><td><p>18.2 seconds</p></td><td><p>Proprietary.</p><p> Commercial use via BytePlus enterprise contracts.</p></td><td><p>The upgraded high-fidelity variant, requiring an additional 6.6 seconds of compute time over the 4.0 version to refine complex textures and text rendering.</p></td></tr><tr><td><p>Krea 2 Large</p></td><td><p>Krea</p></td><td><p>23.7 seconds</p></td><td><p>Proprietary / Open Weights.</p><p> Commercial rights depend on deployment.</p></td><td><p>The un-distilled foundation model. It ignores the speed-focused Trajectory Distribution Matching of the Turbo variant to maximize aesthetic polish and structural stability.</p></td></tr><tr><td><p>FLUX.2 [max]</p></td><td><p>Black Forest Labs</p></td><td><p>25.6 seconds</p></td><td><p>Proprietary.</p><p> Closed enterprise API.</p></td><td><p>The heaviest parameter model in the FLUX lineup. It operates exclusively as a deep reasoning renderer for complex commercial assets.</p></td></tr><tr><td><p>GPT-Image-2</p></td><td><p>OpenAI</p></td><td><p>200.8 seconds</p></td><td><p>Proprietary.</p><p> Full commercial usage under standard OpenAI terms.</p></td><td><p>A massive outlier in the latency landscape. 
It dedicates over three minutes to complex, multi-step semantic reasoning, likely utilizing an expansive chain-of-thought process prior to finalizing pixel outputs.</p></td></tr></tbody></table><p><i>Sources: </i><a href="https://artificialanalysis.ai/image/models"><i>Artificial Analysis</i></a><i>, </i><a href="https://www.krea.ai/blog/krea-2-turbo"><i>Krea</i></a><i>, </i><a href="https://www.mindstudio.ai/blog/midjourney-v8-1-vs-microsoft-mai-image-2"><i>MindStudio.AI</i></a><i></i></p><h2><b>Architectural bifurcation and the 12B parameter Transformer</b></h2><p>At the <a href="https://www.krea.ai/blog/krea-2-technical-report">technical core</a> of the release sits an architectural framework built entirely from scratch: a Diffusion Transformer scaled to 12 billion parameters. </p><p>Rather than deploying a single, heavily fine-tuned model for all downstream tasks, Krea open-sources two highly differentiated checkpoints captured at distinct milestones of the model&#x27;s training lifecycle.</p><p>Departing from multi-stream configurations for structural clarity, the core engine standardizes on a single-stream transformer block architecture wherein attention and MLP layers are shared natively between text and image tokens. </p><p>To maximize computational efficiency, Krea incorporates a SwiGLU MLP layer operating at a 4x expansion factor alongside Grouped-Query Attention (GQA) combined with gated sigmoid attention layers to stabilize training dynamics. </p><p>Timestep conditioning is heavily optimized; the network replaces traditional per-block MLP modules with a lightweight, per-block tunable bias term, successfully cutting total block modulation parameters by 20% to 30% and reallocating that parameter budget directly into core layers. 
</p><p>Positional encoding is managed via a 3D Axial Rotary Position Embedding (RoPE) scheme mapping across individual frame, height, and width coordinates.</p><p><b>Krea 2 Raw </b>represents an undistilled base release checkpoint taken directly from the mid-training stage of the larger Krea 2 Medium development cycle. </p><p>Because it lacks post-training alignment, reinforcement learning from human feedback (RLHF), or final aesthetic distillation, Krea 2 Raw functions as a blank canvas. </p><p>It retains a vast, uncurated latent space that makes it poorly suited for immediate out-of-the-box prompting, but highly optimized for structural training. </p><p>Operating this model via the Hugging Face `diffusers` library requires a heavy compute footprint, executing via `Krea2Pipeline` in `torch.bfloat16` precision across 52 inference steps with a guidance scale of 3.5.</p><p>To accelerate early-stage architectural convergence during the first epoch of this 256px baseline training phase, Krea applied internal Representation Alignment (iREPA) techniques before decoupling them to let the underlying model develop independent structural representations.</p><p>The second checkpoint, <b>Krea 2 Turbo,</b> represents the opposite end of the optimization spectrum. </p><p>It is a distilled, post-trained variant derived from Krea 2 Medium. Through knowledge distillation, the network&#x27;s complex multi-step generation sequence is compressed into an incredibly lean operational profile. 
</p><p>Krea 2 Turbo slashes the required generation cycle down to just 8 inference steps with a guidance scale of 0.0, enabling it to render native 2k resolution imagery on standard consumer-grade hardware in <b>approximately 2 seconds.</b></p><p>The underlying latent representations for both models are optimized through the integration of the Qwen Image VAE and the FLUX 2 VAE to guarantee rapid convergence while maintaining high reconstruction fidelity.</p><h2><b>Data and training</b></h2><p>The underlying dataset strategy for the Krea 2 family relies on a hybrid blend of publicly harvested data, third-party licensed image repositories, and highly curated synthetic datasets built via proprietary generation methods. </p><p>Prior to final training, Krea processed these collections through rigorous algorithmic filters designed to strip out duplicative frames, low-resolution media, and explicit or harmful material, ensuring high fidelity and strong prompt compliance across both models.</p><p>Krea enforces a <i>zero-synthetic data policy</i> within its primary pretraining mix. </p><p>To prevent the upper-bound quality limitations and output biases induced by AI-generated data, the engineering team deployed custom in-house filtering classifiers built on top of DINOv3 and SigLIP-2 architectures to completely purge synthetic images at scale. </p><p>Furthermore, rather than using traditional model-based aesthetic filters that inadvertently strip away artistic intents like motion blur, Krea preserves wide stylistic boundaries. </p><p>The team trained a Sparse Autoencoder (SAE) on SigLIP-2 embeddings to isolate and filter out genuine visual artifacts using an unsupervised tagging framework. </p><h2><b>Krea 2 Raw vs. 
Krea 2 Turbo: Distinctions and use cases</b></h2><p>The release establishes a highly deliberate operational paradigm for professional studios and independent creators: &quot;train on Raw, generate with Turbo.&quot; This workflow leverages the unique architectural properties of both open-weight files to optimize both training accuracy and rendering speed.</p><p>In creative production pipelines, engineers can use Krea 2 Raw to train custom Low-Rank Adaptations (LoRAs) or domain-specific fine-tunes. </p><p>Because the Raw checkpoint contains no baked-in stylistic opinions or aggressive post-training constraints, it absorbs unique aesthetic directions—such as architectural drafting styles, specific brand assets, or complex lighting designs—with high fidelity and zero stylistic interference. </p><p>Once the training phase is complete, creators can port those exact LoRAs directly over to Krea 2 Turbo.</p><p>This methodology is reflected in Krea&#x27;s own development ecosystem, which hosts an in-house collection of custom LoRAs trained entirely on the Raw foundation model but optimized for execution within Turbo workflows. </p><p>On the user-facing application layer, Krea integrates this dual-engine setup with a powerful style transfer system. Rather than relying on erratic text descriptions to achieve an artistic look, users can feed multiple style reference images directly into the system. </p><p>Krea 2 maps these references across its latent space, allowing creators to isolate individual aesthetic components, combine distinct moodboards, adjust style strength via generative sliders, and fine-tune batch variation levels to maintain visual cohesion across large-scale design iterations.</p><p>To address the gap between raw textual training captions and brief user inputs, Krea paired this suite with an advanced LLM Prompt Expander. 
Refined via Generalized Deep Q-Network Preference Optimization (GDPO) and trained on synthetic thinking traces to preserve intent reconstruction, the expander applies a photographic-medium bias to photorealistic requests and integrates an active DINOv3 embedding diversity score across rollout groups to prevent automated prompting routines from collapsing into a singular house style.</p><p>While Krea 2 Medium and Krea 2 Large remain the company&#x27;s flagship models for high-fidelity composition and absolute stylistic adherence, Turbo fills the critical role of rapid visual ideation. </p><p>It serves as an interactive scratchpad for early concept creation, quick prompt experimentation, and iterative art direction where near-instantaneous feedback loops are required to maintain creative momentum.</p><h2><b>The custom license and its particulars</b></h2><p>The open-weight assets are released under the <a href="https://huggingface.co/krea/Krea-2-Raw/blob/main/LICENSE.pdf">Krea 2 Community License Agreement</a>, which operates alongside an official Acceptable Use Policy. </p><p>At a macro level, this legal framework mirrors recent industry trends toward commercial-use permissions that target small businesses while restricting large enterprise exploitation. </p><p>The license explicitly permits individuals, independent creators, and <i>small</i> commercial companies to build applications, monetize generated imagery, and integrate the open weights directly into commercial software products without royalty obligations. </p><p>Furthermore, Krea states that it &quot;does not claim copyright or other intellectual property rights over content generated by users of this model,&quot; leaving output ownership entirely in the hands of the operator.</p><p>For organizations scaling beyond this baseline, the ecosystem shifts into a paid, custom-tier structure. 
</p><p>While Krea&#x27;s official documentation lacks a rigid revenue threshold defining a &quot;large enterprise,&quot; the company structurally demarcates the boundary based on organizational footprint: standard commercial usage caps at a &quot;Business&quot; tier accommodating up to 50 seats. </p><p>Therefore, any entity requiring more than 50 seats, Single Sign-On (SSO) integrations, guaranteed Service Level Agreements (SLAs), or custom Data Processing Agreements (DPAs) qualifies as an Enterprise. </p><p>These larger entities fall outside the free Community License scope and must pay for a custom commercial license—operating under &quot;Custom Terms of Service&quot;—negotiated directly with Krea&#x27;s sales team. </p><p>Additionally, developer access to Krea&#x27;s official API remains entirely decoupled from the open-weights release; API usage operates as a distinct, paid service billed dynamically on a per-generation basis (measured in microdollars) and requires a prepaid USD balance independent of standard monthly compute subscriptions.</p><p>However, a close examination reveals a significant structural shift regarding legal and behavioral compliance for all self-hosted deployments. </p><p>Unlike traditional open-source permissions like the MIT or Apache 2.0 licenses—which grant unconditional usage rights and completely waive liability—the Krea 2 Community License implements strict downstream behavioral guardrails.</p><p>Because Krea relinquishes centralized control over the downstream deployment of its open weights, the contract legally binds deployers to enforce content moderation protocols at the infrastructure layer. </p><p>Under the terms of the agreement, any developer or platform hosting Krea 2 models must implement active input/output classifiers or equivalent content filtering mechanisms to actively prevent the generation of illegal materials, non-consensual intimate imagery (NCII), child sexual abuse material (CSAM), or defamatory assets. 
</p><p>Developers who fail to deploy these defensive safety layers stand in immediate breach of contract, giving Krea the explicit right to update model weights or revoke access to the model family entirely.</p><h2><b>Background on Krea</b></h2><p>Founded in 2022 by audiovisual systems engineering dropouts Víctor Perez and Diego Rodriguez Prado, San Francisco-based Krea initially captured market traction as a highly fluid user interface layer built to orchestrate disparate, third-party AI generative engines. </p><p>The startup&#x27;s rapid scaling via product-led adoption culminated in an aggregate<a href="https://techcrunch.com/2025/04/07/kreas-founders-snubbed-postgrad-grants-from-the-king-of-spain-to-build-their-ai-startup-now-its-valued-at-500m/"> $83 million </a>in disclosed venture capital funding from major VCs including Andreessen Horowitz and Bain Capital Ventures, as well as early-stage institutional backers including Pebblebed, Abstract Ventures, and Gradient Ventures.</p><p>The company&#x27;s user base surpassed <a href="https://www.krea.ai/">30 million individuals across 191 countries as of June 2026</a>, according to its website. </p><p>The open-weights launch of the Krea 2 model family represents the culmination of Krea’s deliberate evolution from a multi-model SaaS aggregator into a self-sustaining media research lab. </p><p>Early in its lifecycle, Krea focused on building workflow tools, editing systems, and a node-based automation pipeline that allowed digital artists to unify models from competitors like Runway, Midjourney, and Adobe under a single subscription. </p><p>However, to insulate itself against upstream platform dependencies and supplier margin pressures, the company aggressively shifted toward developing proprietary architectures. 
This transition began taking public shape in July 2025 with the open-weights release of the custom-curated FLUX.1 Krea checkpoint, followed in October 2025 by Krea Realtime 14B, an autoregressive video model distilled from Wan 2.1 capable of rendering 11 frames per second on localized enterprise hardware.</p><p>This underlying technical maturation parallels Krea&#x27;s accelerating push into high-end enterprise workflows. Large-scale creative production operations have shifted toward treating Krea as core creative infrastructure; for example, the digital creative services platform <a href="https://www.youtube.com/watch?v=OLNbn4L2fUM">Superside reported migrating workflows</a> from fragmented open-source setups to route roughly 80 percent of its total AI generative production through Krea. </p><p>Furthermore, Krea established a strategic co-development partnership with Copenhagen-headquartered architecture firm <a href="https://henninglarsen.com/news/we-re-partnering-with-krea">Henning Larsen</a> to build highly restricted, domain-specific design tools tuned to meet the compliance frameworks mandated by the EU AI Act. 
</p><p>By releasing Krea 2 Raw and Turbo as open weights, Krea is continuing its expansion from an AI tools provider to being a model provider in its own right.</p><h2><b>An alternative to typical rigid AI imagery APIs?</b></h2><p>Creators are focusing heavily on the structural freedom offered by the unaligned Raw checkpoint, viewing it as an important alternative to the locked-down APIs provided by closed-source models.</p><p>Through the<a href="https://x.com/krea_ai/status/2069435590995812396"> official announcement on X,</a> Krea emphasized the foundational shift this launch represents for open AI workflows.</p><p>Developers note that by treating AI as an &quot;actual creative medium&quot; that feels &quot;raw, flexible, unopinionated, and unconstrained,&quot; Krea is intentionally providing an infrastructure that creators can &quot;break if [they] want to,&quot; moving far away from the rigid safety guardrails that frequently limit the visual range of competing enterprise tools.</p><p>As independent model builders begin compiling the Hugging Face repositories, the practical value of the release will be determined by how effectively the open-source community can scale customized LoRAs using Krea 2 Raw.</p><p>By providing clear commercial terms and lowering hardware entry barriers via Turbo&#x27;s 8-step inference pipeline, Krea has introduced a highly competitive alternative to the open-weights market, challenging dominant models by prioritizing artistic control over centralized corporate alignment.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/40PtOuCZ5IxnrcPzLnsEXH/921bf5750ef6de7995d4167ad8544135/6pAhBmfGUeBiqqzdBdyyh_28cdd3c23c2347a18026d5763e56fc64.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Anthropic launches Claude Tag, replacing its Slack app with a persistent AI teammate that learns, monitors and works autonomously]]></title>
            <link>https://venturebeat.com/technology/anthropic-launches-claude-tag-replacing-its-slack-app-with-a-persistent-ai-teammate-that-learns-monitors-and-works-autonomously</link>
            <guid isPermaLink="false">12j1eg9aKvebXKXvPqaEfB</guid>
            <pubDate>Tue, 23 Jun 2026 17:00:00 GMT</pubDate>
            <description><![CDATA[<p><a href="https://www.anthropic.com/">Anthropic</a> on Tuesday launched <a href="https://www.anthropic.com/news/introducing-claude-tag"><u>Claude Tag</u></a>, a new product that embeds its most advanced AI model directly inside Slack as a persistent, shared teammate that anyone on a team can delegate work to by simply typing @Claude.</p><p>The product, available today in beta for <a href="https://support.claude.com/en/articles/9797531-what-is-the-enterprise-plan">Claude Enterprise</a> and <a href="https://support.claude.com/en/articles/9266767-what-is-the-team-plan">Team</a> customers, replaces Anthropic&#x27;s existing Claude in Slack app and represents the company&#x27;s most aggressive move yet to colonize the enterprise collaboration layer: the place where decisions get made, work gets assigned, and institutional knowledge accumulates in real time.</p><p>For enterprise technology leaders who have spent the past two years evaluating where AI fits into their operational stack, <a href="https://www.anthropic.com/news/introducing-claude-tag">Claude Tag</a> reframes the question entirely. This is not a chatbot, a coding assistant, or a search tool bolted onto a messaging platform. It is an AI agent designed to function as a standing member of a team, one that builds memory, takes initiative, works asynchronously, and interacts with every person in a channel rather than serving a single user. The implications for enterprise workflow, governance, and vendor strategy are significant.</p><p>Anthropic says 65% of its own product team&#x27;s code is now created by its internal version of Claude Tag, and the company runs internal support and data insight channels through the same system. 
The claim is striking: Anthropic is asserting that the majority of its own product engineering output already flows through the tool it just put in customers&#x27; hands.</p><h2><b>How Claude Tag works inside enterprise Slack channels</b></h2><p>At its core, <a href="https://www.anthropic.com/news/introducing-claude-tag">Claude Tag</a> works like this: an administrator pairs it with a Slack workspace, grants it access to specific tools and data sources, sets spending limits, and defines which channels it can operate in. From that point on, any team member in those channels can tag @Claude with a request (write a pull request, pull sales numbers, run a data analysis), and Claude will break the task into stages, execute them using the tools it has access to, and respond in a Slack thread with the result. The product runs on <a href="https://www.anthropic.com/news/claude-opus-4-8">Claude Opus 4.8</a>, the model Anthropic released less than a month ago.</p><p>Four capabilities differentiate <a href="https://www.anthropic.com/news/introducing-claude-tag">Claude Tag</a> from its predecessors and from competing integrations. First, it is multiplayer. Within a given Slack channel, there is one Claude that interacts with everyone, not a separate instance per user. Anyone can see what it is working on, and anyone can pick up the conversation where the last person left off. This is a direct contrast to most existing AI integrations in Slack, which tend to operate as single-player tools.</p><p>Second, it learns over time. As Claude follows along with its channel, it accumulates context about the work happening there. Users do not need to re-explain projects from scratch. If granted permission, Claude can also pull context from other Slack channels and data sources, though Anthropic says it will not report from private channels. Third, it takes initiative. 
With ambient behavior enabled, Claude will proactively surface relevant information from across the channels it monitors and the tools it is connected to, and will follow up on threads or tasks that have gone quiet without resolution. This is a notable expansion of agency: Claude is not just responding to requests but monitoring the information environment and deciding what its human teammates need to know. Fourth, it works asynchronously, pursuing projects autonomously over hours or days. Anthropic says its own teams &quot;now spend much more of our time delegating tasks to many Claudes in parallel.&quot;</p><h2><b>Enterprise security controls and administrative governance get a central role</b></h2><p><a href="https://www.anthropic.com/">Anthropic</a> has designed the system with enterprise-grade isolation at its center. System administrators define separate Claude identities for different uses, scoped to specific channels with specific tools and data access. Everything, including Claude&#x27;s accumulated memories, stays within those boundaries. A Claude configured for sales work will not share memories or data access with one configured for engineering.</p><p>Administrators can set token-spend limits at both the organizational and channel level, and can review a complete log of every action Claude has taken and which user requested each task. For organizations managing compliance, audit, or regulatory requirements, this logging and scoping architecture is table stakes — and its absence has been a dealbreaker for many enterprises evaluating AI collaboration tools over the past year.</p><p>Migration from the existing <a href="https://slack.com/marketplace/A08SF47R6P4-claude">Claude in Slack app</a> requires an administrator opt-in within 30 days, and Anthropic says it is issuing introductory launch credits to eligible Enterprise and Team organizations. 
The four-step setup process (pair with Slack, connect tools, set spend limits, test in a private channel) is designed to reduce friction for IT teams already managing sprawling SaaS portfolios.</p><h2><b>The Slack battleground is now the most contested real estate in enterprise AI</b></h2><p><a href="https://www.anthropic.com/news/introducing-claude-tag">Claude Tag</a> arrives in the middle of what has become the most fiercely contested territory in enterprise AI: the Slack channel. Slack itself has been aggressively positioning the platform as an &quot;agentic operating system,&quot; and the major AI players have responded by racing to plant their flags.</p><p>Salesforce, which <a href="https://slack.com/blog/news/salesforce-completes-acquisition-of-slack">acquired Slack for $27.7 billion in 2021</a>, announced more than <a href="https://venturebeat.com/orchestration/slack-adds-30-ai-features-to-slackbot-its-most-ambitious-update-since-the">30 new capabilities for Slackbot</a> in March, the most sweeping overhaul of the platform since the acquisition, transforming it from a simple conversational assistant into a full-spectrum enterprise agent. OpenAI introduced &quot;<a href="https://openai.com/index/introducing-workspace-agents-in-chatgpt/">Workspace Agents</a>&quot; in April, allowing enterprise subscribers to design agents that take on work tasks across third-party apps including Slack, Google Drive, Microsoft apps, Salesforce, and Notion. <a href="https://venturebeat.com/technology/perplexity-takes-its-computer-ai-agent-into-the-enterprise-taking-aim-at">Perplexity launched its enterprise &quot;Computer&quot; agent</a> with direct Slack integration, letting employees query @computer directly inside Slack channels. Cognition&#x27;s <a href="https://devin.ai/">Devin</a>, the autonomous AI software engineer, has been built around Slack as a primary interface since its early days. 
Even Microsoft has brought GitHub Copilot into Teams.</p><p>The logic driving this convergence is straightforward: the average enterprise juggles over 1,000 applications, and employees waste countless hours on context switching, draining productivity by up to 40%. Whichever AI system becomes the default presence in the communication layer where work is coordinated gains an enormous distribution advantage — and, critically, an enormous data advantage. The AI that lives in the channel where work happens absorbs the institutional context that makes it increasingly difficult to replace.</p><h2><b>Anthropic built Claude Tag on a foundation two years in the making</b></h2><p>To understand Claude Tag&#x27;s strategic significance, it helps to trace the product arc that led to it. Anthropic first integrated <a href="https://techcrunch.com/2025/08/20/anthropic-bundles-claude-code-into-enterprise-plans/">Claude with Slack </a>in October 2025, offering two-way connectivity: users could invoke Claude from within Slack or connect Slack as a data source for Claude&#x27;s chatbot. The initial integration was focused on individual productivity — direct messages, AI assistant panels, and thread participation. In January 2026, Anthropic expanded Claude&#x27;s Slack presence when it launched interactive Claude apps, which included workplace tools like Slack, Canva, Figma, Box, and Clay.</p><p>In parallel, Anthropic was building out its enterprise infrastructure stack. 
In August 2025, the company bundled <a href="https://techcrunch.com/2025/08/20/anthropic-bundles-claude-code-into-enterprise-plans/">Claude Code into enterprise plans</a>, a move its product lead Scott White called &quot;the most requested feature from our business team and enterprise customers.&quot; In April 2026, Anthropic launched <a href="https://venturebeat.com/orchestration/anthropics-claude-managed-agents-gives-enterprises-a-new-one-stop-shop-but">Claude Managed Agents</a>, a suite of composable APIs for building and deploying cloud-hosted AI agents at scale, with early adopters including Notion, Rakuten, Asana, and Sentry. </p><p>Then came <a href="https://www.anthropic.com/news/claude-opus-4-8">Claude Opus 4.8 </a>in late May, which Anthropic described as &quot;a more effective collaborator&quot; with &quot;sharper judgement, more honesty about its progress, and the ability to work independently for longer than its predecessors.&quot; Benchmark improvements included <a href="https://9to5mac.com/2026/05/28/anthropic-upgrades-claude-with-new-opus-4-8-model-heres-whats-new/">a jump in agentic coding scores</a> from 64.3% to 69.2% and a knowledge work score increase from 1753 to 1890. Claude Tag is the synthesis of all of these threads — combining the Slack channel presence, the enterprise security architecture, the Managed Agents infrastructure, and the Opus 4.8 model&#x27;s improved agentic capabilities into a single product that Anthropic frames as &quot;the beginning of an evolution of Claude Code.&quot;</p><h2><b>Anthropic&#x27;s explosive growth explains why it is betting big on the collaboration layer</b></h2><p>The financial stakes behind this launch are enormous. Anthropic <a href="https://www.anthropic.com/news/series-h">raised $65 billion in Series H funding</a> in late May at a $965 billion post-money valuation, and its run-rate revenue crossed $47 billion earlier this month. 
Claude Code&#x27;s run-rate revenue alone has grown to over $2.5 billion, more than doubling since the beginning of 2026, and enterprise use has grown to represent over half of all Claude Code revenue.</p><p>Those numbers explain why Anthropic is investing so heavily in channel-level presence. Every enterprise customer who grants Claude persistent access to a Slack channel — with connected tools, accumulated context, and ambient monitoring enabled — represents a dramatically deeper integration than a chatbot conversation or an API call. The usage patterns become stickier, the token consumption grows, and the switching costs rise. Deloitte&#x27;s deployment of Claude across more than 470,000 employees in 150 countries — reportedly its largest-ever enterprise AI deployment — illustrates the scale at which these dynamics play out.</p><p>The broader market trajectory reinforces the bet. Fortune Business Insights projects the global agentic AI market will grow <a href="https://www.fortunebusinessinsights.com/agentic-ai-market-114233">from $9.14 billion in 2026 to $139 billion by 2034</a>, and Gartner forecasts that <a href="https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025">40% of enterprise applications</a> will feature task-specific AI agents by 2026, up from less than 5% in 2025. Anthropic is not alone in seeing this future, but with Claude Tag it is making one of the most direct plays yet to own the enterprise agent layer.</p><h2><b>The risks enterprise buyers need to weigh before granting Claude a permanent seat at the table</b></h2><p><a href="https://www.anthropic.com/news/introducing-claude-tag">Claude Tag</a> raises several questions that enterprise buyers will need to evaluate carefully. The first is vendor dependency. 
As VentureBeat reported when analyzing <a href="https://venturebeat.com/orchestration/anthropics-claude-managed-agents-gives-enterprises-a-new-one-stop-shop-but">Claude Managed Agents</a> earlier this year, once an organization&#x27;s agents, operational configurations, and monitoring run on Anthropic&#x27;s managed infrastructure, switching costs increase significantly. Claude Tag deepens this dynamic: a Claude that has accumulated months of channel context and institutional memory becomes very difficult to replace. Enterprise procurement teams accustomed to negotiating multi-cloud flexibility will need to think hard about what it means to give a single vendor&#x27;s AI persistent access to the communication layer where institutional knowledge lives.</p><p>The second is governance around ambient monitoring. The proactive behavior mode — in which Claude monitors channels and surfaces information it decides is relevant — represents a meaningful expansion of what enterprise AI systems do. Organizations will need to develop clear frameworks for an AI agent that is not just responding to requests but actively surveilling information flows and making editorial judgments about what humans need to know. For regulated industries, this raises questions that existing AI governance policies may not yet address.</p><p>The third is pricing. Anthropic has not published detailed pricing for <a href="https://www.anthropic.com/news/introducing-claude-tag">Claude Tag</a> beyond noting that it runs on token-based spending with administrative controls. For an agent that monitors channels continuously, builds memory, and works asynchronously over hours or days, the token consumption profile could look very different from traditional AI usage. 
And the fourth is reliability: Anthropic has been candid in recent months about infrastructure strain caused by surging demand, and for a product positioned as an always-on team member, downtime carries a different kind of cost than it does for a tool invoked on demand.</p><h2><b>What Claude Tag signals about the future of enterprise work</b></h2><p>Anthropic says its goal is to expand <a href="https://www.anthropic.com/news/introducing-claude-tag">Claude Tag</a> beyond Slack &quot;so that teams can tag @Claude in the many other places they work.&quot; The company is clearly eyeing the full collaboration surface — Microsoft Teams, email, project management tools, and beyond. If Claude Tag succeeds, it will validate a model of enterprise AI that looks less like a tool and more like a new category of worker: one that never sleeps, never forgets what was discussed in the channel last Tuesday, and never needs to be onboarded twice.</p><p>But the deeper significance of this launch may be what it reveals about the competitive dynamics reshaping enterprise software. For decades, the most valuable real estate in business technology was the system of record — the database, the CRM, the ERP. The current AI arms race suggests that the next era of enterprise value will be captured not by the system that stores the data, but by the agent that sits in the room where the work happens and understands what to do with it. Anthropic just gave that agent a name, a permanent seat in the channel, and permission to speak up when it thinks it has something to say. The question for every enterprise technology leader is no longer whether that agent will arrive. It is whether they are ready to manage it when it does.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Business</category>
            <category>Infrastructure</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/6CFsd7sGf60QrI8jTqYfuy/788282d94743cf3a42c6c4bb28553ad6/Nuneybits_Vector_art_of_entirely_burnt_orange_chat_panel_mergin_c649b311-f246-4715-9849-8358622f4bee.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[A proof of concept forgives a fragile data path. Operational AI does not.]]></title>
            <link>https://venturebeat.com/orchestration/a-proof-of-concept-forgives-a-fragile-data-path-operational-ai-does-not</link>
            <guid isPermaLink="false">1cFP0WWYEA4seoXO1SsvYv</guid>
            <pubDate>Tue, 23 Jun 2026 07:00:00 GMT</pubDate>
            <description><![CDATA[<p><i>Presented by F5</i></p><hr/><p>When enterprises move AI workloads from pilot to production, data delivery often becomes the factor that determines whether those systems can scale reliably. Point-to-point architectures connecting storage directly to compute hold up under demonstration conditions, but they often break down under sustained, concurrent production traffic. The result is stalled inference pipelines, delayed RAG systems, underutilized GPUs, and SLA violations, all of which carry direct business consequences. </p><p>&quot;Organizations successfully operationalize AI when their infrastructure is built to handle real-world failures, not just controlled conditions,&quot; says Hunter Smit, senior manager of product marketing at F5. </p><h2>Production traffic exposes architectural weaknesses</h2><p>In a pilot, a stalled transfer is an inconvenience, while in production, that same stall is an outage someone now owns. The underlying architecture is often identical in both cases: when a client is wired directly to storage, the system becomes increasingly fragile under sustained, concurrent production traffic because that direct connection has no answer when a node fails or traffic spikes. From there, retries and timeouts cascade, and the entire pipeline backs up right at the moment the business is depending on the output.</p><p>&quot;Point-to-point architectures, where the S3 client connects directly to S3 storage, are not resilient,&quot; says Paul Pindell, principal solutions architect for technology alliances at F5. &quot;If a single storage node fails, all traffic to that cluster degrades, and in some cases the cluster can fail entirely.&quot;</p><p>The problem is that AI workflows, including RAG-based inference and agentic AI, increasingly treat S3 storage as a first-class citizen in the AI cluster. 
However, the network connectivity between that storage and the cluster was never designed for the high-throughput, uninterrupted data movement that&#x27;s needed to keep GPUs running optimally.</p><h2>The real cost of stalled pipelines and underutilized GPUs</h2><p>&quot;Enterprise leaders tend to frame AI infrastructure around GPU utilization, but what makes AI different from traditional deterministic workloads is that infrastructure continuously influences those outcomes at every interaction,&quot; says Tanu Mutreja, senior director of product management at F5. &quot;In AI environments, infrastructure is no longer just a back-end concern. It shapes customer experience, quality, resilience, and cost with every transaction.&quot;</p><p>There can be significant business consequences. For instance, when inference pipelines stall, it becomes an SLA and customer experience issue. When RAG systems are delayed, models lose access to timely, relevant context, which results in inaccurate, outdated, or hallucinated responses, all of which create operational, compliance, and reputational risks. At the same time, the infrastructure issues that create those problems can also drive up costs by leaving expensive GPU resources idle or underutilized.</p><p>&quot;When GPUs are underutilized, it signals infrastructure inefficiencies that inflate costs while limiting scalability and responsiveness,&quot; Mutreja says. &quot;The leadership question is whether the end-to-end AI infrastructure consistently delivers reliable, secure, high-quality, and governed AI experiences at sustainable unit economics.&quot;</p><h2>Building a production-ready data delivery layer</h2><p>F5 treats data delivery as a first-class infrastructure layer rather than assuming the network path will simply work. Where application delivery optimized the flow of requests between users and applications, data delivery optimizes the flow of data between storage, networks, and compute, including AI compute. 
</p><p>Making data delivery a first-class layer means building three properties into it:</p><p>Observability provides real-time visibility into latency, throughput, and flow health.</p><p>Programmability enables policy-driven control over how data moves, through dynamic routing, traffic optimization, rate management, and automated failover. </p><p>Failure-awareness builds resilience for degraded networks, storage throttling, and service disruptions.</p><p>In the <a href="https://www.f5.com/resources/deployment-guides/f5-big-ip-ltm-dell-objectscale-s3-storage">architecture F5 has developed for Dell ObjectScale</a>, F5 BIG-IP sits between ObjectScale and AI compute as a programmable control point at the storage edge. </p><p>&quot;We have seen cases where a misconfiguration in the AI compute layer effectively DDoS&#x27;d the S3 storage infrastructure,&quot; Pindell says. &quot;Not in a malicious way, more of an &#x27;Oh no, what did I do?&#x27; moment, but it still took storage down for the entire organization.&quot; </p><p>Placing BIG-IP as the application delivery controller between the storage and compute layers protects storage with QoS, rate limits, and connection limits, keeping it resilient and operational under that kind of load. <a href="https://www.f5.com/go/report/validated-ai-data-delivery-resiliency-f5-big-ip-dell-objectscale">SecureIQLab-validated testing</a> confirmed that this protection does not come at the cost of throughput, which matters architecturally, Pindell says. </p><p>&quot;Preserving, and even improving, throughput is a must-have,&quot; he explains. &quot;It&#x27;s what lets you layer on the higher-level functionality, resilience and enhanced security, without giving up performance to get there.&quot;</p><h2>The added complexity of hybrid and multicloud AI</h2><p>AI deployments in hybrid multicloud environments have an even greater data delivery challenge because of the heterogeneity involved. 
In other words, data traversing these environments must contend with inconsistent policies, security controls, identity systems, governance requirements, fragmented visibility, and distinct failure boundaries.</p><p>Programmable traffic management and observability address this complexity together. Observability provides a unified view of application, network, and infrastructure health across otherwise disconnected environments. Programmable traffic management uses those insights to intelligently route, balance, and fail over traffic in real time. Together, they create a closed-loop feedback system that enforces consistent policies, improves resilience across failure domains, and ensures reliable, high-performance <a href="https://www.f5.com/solutions/use-cases/ai-data-delivery">AI data delivery</a> regardless of where applications, data, or users reside.</p><h2>What separates production AI from perpetual pilots</h2><p>The organizations that move beyond perpetual pilots share a specific engineering discipline, Smit says. </p><p>&quot;They&#x27;re the ones that reach for production design with failure as the normal state, not the exception,&quot; he explains. &quot;They will assume latency, congestion, and partial outages will happen. And they build a data path observable and failure-aware enough to absorb them, with explicit mitigation for every degraded condition rather than a hope that the network will hold.&quot;</p><p>Organizations stuck in perpetual pilots are still optimizing for the perfect lab result and discovering the real-world gap only when a workload goes live. The issue is not model quality or GPU count, but whether the data delivery layer was engineered with the same rigor as the compute.</p><p>&quot;Teams need to understand that a real-world network behaves very differently from an optimized lab network,&quot; Pindell says. 
&quot;They need a mitigation plan for the failure states and performance bottlenecks they will hit in production.&quot;</p><hr/><p><i>Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact </i><a href="mailto:sales@venturebeat.com"><i><u>sales@venturebeat.com</u></i></a><i>.</i></p>]]></description>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/65ymd1QIkohqORWOIhzJxk/24536b35ec26c9b4e0c4606f30fc9311/AdobeStock_635481072.jpeg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Alibaba's AI video model rises to No. 2 in global rankings, as OpenAI's Sora and ByteDance's Seedance fall away]]></title>
            <link>https://venturebeat.com/technology/alibabas-ai-video-model-rises-to-no-2-in-global-rankings-as-openais-sora-and-bytedances-seedance-fall-away</link>
            <guid isPermaLink="false">Ff198lZmN6ZRGaii3Qog8</guid>
            <pubDate>Mon, 22 Jun 2026 20:22:56 GMT</pubDate>
            <description><![CDATA[<p><a href="https://www.alibabacloud.com/en?_p_lc=1">Alibaba Cloud</a> on Sunday released <a href="https://www.happyhorse.com/">HappyHorse 1.1</a>, a major upgrade to its AI video generation model that the company says delivers production-ready video synthesis across core content creation scenarios. The model is now live on <a href="https://modelstudio.alibabacloud.com/">Alibaba Cloud Model Studio</a> with full API access for enterprise customers and developers, accompanied by a 40% sitewide launch discount for the first two weeks.</p><p>The release arrives at a moment of remarkable upheaval in the AI video generation market — and Alibaba appears keenly aware of the timing. OpenAI <a href="https://help.openai.com/en/articles/20001152-what-to-know-about-the-sora-discontinuation">discontinued Sora</a> after it proved financially unsustainable. ByteDance <a href="https://www.cnbc.com/2026/03/17/bytedance-seedance-shut-down-tiktok-marsha-blackburn-peter-welch.html">indefinitely shelved</a> the international rollout of Seedance 2.0 following a barrage of copyright complaints from Hollywood studios. For enterprise procurement teams that had been evaluating or integrating those tools into marketing, advertising, and content production workflows, the competitive landscape has contracted sharply in a matter of months.</p><p>That contraction creates both an opportunity and a test for Alibaba. HappyHorse 1.1 is not a research demo or a consumer toy — it is an API-first product built for integration into enterprise software stacks, priced for volume, and backed by a $52.7 billion global infrastructure buildout. 
Whether it can convert technical capability into enterprise adoption, particularly in Western markets navigating intensifying U.S.-China tech tensions, will determine whether Alibaba can establish itself as a serious player in the generative video market that analysts expect to reach tens of billions of dollars by the end of the decade.</p><h2><b>How HappyHorse climbed from anonymous benchmark entry to top-ranked video model</b></h2><p><a href="https://www.happyhorse.com/">HappyHorse</a> first appeared in early April as an anonymous submission on the <a href="https://x.com/arena/status/2044977389185482998">Artificial Analysis Video Arena</a>, an independent benchmarking platform where real users compare model outputs in blind, side-by-side evaluations. The model immediately claimed the top position in both text-to-video and image-to-video rankings. Alibaba was subsequently confirmed as the creator, revealing it was built by the company&#x27;s ATH (Alibaba Token Hub) AI Innovation Unit — a team previously part of the Future Life Lab under the Taobao and Tmall Group before a strategic organizational restructuring.</p><p>According to <a href="http://arena.ai">Arena.ai</a>, HappyHorse 1.0 now holds the No. 2 position across all three Video Arena leaderboards. The platform noted the model scores 1,444 in both text-to-video and image-to-video categories, leading Google&#x27;s Veo-3.1 (with audio) by 69 points in text-to-video and xAI&#x27;s Grok-Imagine-Video by 23 points in image-to-video. In Elo-based ranking systems like Arena&#x27;s, models gain or lose points based on whether users prefer their outputs in head-to-head comparisons, meaning persistent double-digit leads reflect a consistent quality gap as perceived by human evaluators — not a statistical fluke.</p><p>The model&#x27;s architecture helps explain why. 
According to community-compiled technical documentation, HappyHorse is built around a 15-billion-parameter unified self-attention Transformer that processes text, image, video, and audio tokens within a single token sequence. Unlike many competitors that stitch together separate models for video and audio, HappyHorse operates as a unified system that handles all modalities in a single generation pass, eliminating the need for third-party dubbing or post-processing audio tools. For enterprise buyers evaluating total cost of ownership, that architectural simplicity translates directly into fewer integration points, fewer vendor dependencies, and faster time to production.</p><h2><b>What the 1.1 upgrade fixes — and why it matters for commercial video production</b></h2><p>The 1.1 upgrade targets a set of pain points that enterprise video production teams know intimately. <a href="https://www.alibabacloud.com/en?_p_lc=1">Alibaba Cloud</a> described the release as &quot;systematically optimized across core content generation scenarios,&quot; and the specific improvements reveal a model that has been tuned for commercial deployment rather than viral social media demos.</p><p>The most consequential upgrade is multi-image reference capability, which Alibaba calls R2V (Reference-to-Video). The feature allows users to upload multiple character reference images and maintain consistent identity across generated video — directly addressing one of the hardest problems in AI video production, where subjects tend to drift in appearance between frames or shots. For brands producing advertising campaigns, product videos, or serialized marketing content, identity consistency is not a nice-to-have; it is a requirement that has historically forced teams back to traditional production methods.</p><p>Motion quality receives a significant overhaul, with what Alibaba describes as &quot;strengthened motion modeling&quot; that addresses prior limitations in speed and fluidity. 
The company also made targeted improvements to visual texture, specifically calling out the elimination of &quot;facial oiliness,&quot; &quot;over-sharpening,&quot; and &quot;unnatural textures&quot; — artifacts that have plagued commercial AI video since the technology emerged and that immediately signal to viewers that content is machine-generated.</p><p>Two additional upgrades round out the release. <a href="https://www.happyhorse.com/">HappyHorse 1.1</a> improves audio-visual synchronization, including what Alibaba claims is &quot;zero-drift lip sync&quot; for dialogue scenes and context-aware speech pacing — building on the 1.0 version&#x27;s already notable ability to generate up to 15 seconds of 1080p video with synchronized audio output. The model also improves instruction-following for long and complex prompts, a critical differentiator for enterprise users who need to specify precise camera movements, lighting conditions, and narrative beats in a single generation pass rather than iterating through dozens of attempts.</p><h2><b>Sora&#x27;s collapse and Seedance&#x27;s freeze leave enterprise buyers with fewer choices than ever</b></h2><p>The competitive context surrounding this launch is unusually favorable for Alibaba, and it is worth understanding why.</p><p>OpenAI&#x27;s Sora web and app experiences were <a href="https://help.openai.com/en/articles/20001152-what-to-know-about-the-sora-discontinuation">discontinued on April 26</a>, with the Sora API set to follow on September 24. The shutdown came after the product proved financially untenable: Sora cost roughly $1 million per day to operate but generated only about $2.1 million in total revenue, while active users dropped from a peak near 1 million to under 500,000. 
For enterprise teams that had integrated Sora into production pipelines, the abrupt withdrawal underscored the risks of depending on AI products that lack a sustainable business model — a cautionary tale that procurement officers are unlikely to forget quickly.</p><p>ByteDance&#x27;s <a href="https://seed.bytedance.com/en/seedance2_0">Seedance 2.0</a>, which many considered Sora&#x27;s most formidable successor, ran into a different kind of wall. Netflix, Warner Bros., Disney, Paramount, and Sony sent legal threats to ByteDance over allegations of systematic copyright infringement after users generated viral clips featuring Hollywood intellectual property. <a href="https://techcrunch.com/2026/03/15/bytedance-reportedly-pauses-global-launch-of-its-seedance-2-0-video-generator/">ByteDance indefinitely postponed</a> the international launch, and the global rollout remains suspended.</p><p>That leaves <a href="https://blog.google/innovation-and-ai/technology/ai/veo-3-1-lite/">Google&#x27;s Veo 3.1</a> as the primary Western competitor in the enterprise video generation space. But Alibaba&#x27;s Arena rankings suggest HappyHorse is outperforming Veo on user-perceived quality, and the 40% launch discount on Alibaba Cloud Model Studio could make HappyHorse significantly cheaper at scale. At the 1.0 level, pricing through third-party API platforms ran roughly $1.82 per 10-second clip at 720p and $3.12 at 1080p. With the promotional pricing, HappyHorse 1.1 could bring production-quality AI video generation within reach of mid-market companies and agencies that previously considered the technology too expensive for anything beyond experimentation.</p><h2><b>Alibaba&#x27;s $52.7 billion infrastructure bet gives HappyHorse a distribution advantage rivals can&#x27;t match</b></h2><p><a href="https://www.happyhorse.com/">HappyHorse 1.1</a> does not exist in isolation. 
It sits atop a global infrastructure offensive that distinguishes Alibaba from pure-play AI model companies that build impressive technology but lack the physical and commercial machinery to serve regulated enterprise customers at scale.</p><p>Just five days before the HappyHorse 1.1 launch, <a href="https://www.alibabacloud.com/en?_p_lc=1">Alibaba Cloud</a> opened its first data centers in France, establishing its third European hub after Germany and the United Kingdom. The Paris region features two availability zones, bringing the company&#x27;s global footprint to 105 availability zones across 32 regions. &quot;The expansion of our cloud infrastructure into France reinforces our ongoing commitment to empowering European businesses with sovereign, secure, and intelligent solutions,&quot; said Dr. Feifei Li, Alibaba Cloud&#x27;s CTO and president of international business, in the company&#x27;s announcement. In Japan, the company opened its fifth data center in Tokyo on June 19.</p><p>As reported by <a href="https://www.datacenterdynamics.com/en/news/alibaba-cloud-launches-france-region/">Data Center Dynamics</a>, CEO Eddie Wu has committed to investing $52.7 billion in building a &quot;unified global cloud network,&quot; with the company later considering increasing this to $69 billion. This year alone, Alibaba has launched new regions in Mexico, Thailand, Malaysia&#x27;s Johor, and France. 
The France deployment is also part of Alibaba Cloud&#x27;s plan to roll out enterprise-grade agentic AI services across Europe in the second half of the year, including <a href="https://help.aliyun.com/en/functioncompute/fc/what-is-agentrun">AgentRun</a> (a development platform for AI agents), <a href="https://help.aliyun.com/en/starops/product-overview/introduction-of-starops">STAROps</a> (an intelligent operations platform), and <a href="https://www.alibabacloud.com/blog/one-click-openclaw-deployment-building-enterprise-grade-ai-agent-applications-with-acs-agent-sandbox_602980">ACS Agent Sandbox</a> (which provides hardware-level security isolation for agent workloads).</p><p>The infrastructure buildout serves a dual purpose for a product like <a href="https://www.happyhorse.com/">HappyHorse</a>. Running a 15-billion-parameter video generation model with integrated audio is extraordinarily compute-intensive, and having local infrastructure reduces latency for enterprise API calls while keeping customer data within regulatory boundaries. For European buyers operating under the European Commission&#x27;s new tech sovereignty framework — published June 3 with the explicit goal of protecting the bloc&#x27;s &quot;digital independence&quot; — the ability to run AI video generation workloads on locally hosted infrastructure is not a luxury. 
It is increasingly a compliance requirement.</p><h2><b>The Pentagon listing and geopolitical risk loom over Alibaba&#x27;s Western ambitions</b></h2><p>Alibaba&#x27;s global push is unfolding under significant geopolitical headwinds that enterprise buyers cannot afford to ignore. The <a href="https://www.cnbc.com/2026/06/09/alibaba-baidu-byd-named-on-pentagons-china-military-list-.html">Pentagon added Alibaba</a>, along with BYD and Baidu, to its list of Chinese military companies on June 8, preventing them from securing U.S. defense contracts. Alibaba rejected the designation, saying it is &quot;not a Chinese military company nor part of any military-civil fusion strategy.&quot;</p><p>The listing does not automatically trigger sanctions, and it does not directly restrict commercial transactions between private U.S. companies and Alibaba. But it adds a layer of reputational and regulatory complexity to procurement decisions, particularly for companies with U.S. government exposure, defense supply chain connections, or transatlantic operations. Enterprise technology purchases are rarely evaluated on technical merit alone — vendor risk assessments, board-level compliance reviews, and geopolitical scenario planning all factor into buying decisions for cloud infrastructure and AI tooling.</p><p>For European customers specifically, the calculus is layered in a different way. The continent&#x27;s growing emphasis on digital sovereignty cuts in two directions simultaneously: it creates demand for alternatives to the dominant U.S. hyperscalers (<a href="https://aws.amazon.com/">Amazon Web Services</a>, <a href="https://azure.microsoft.com/en-us">Microsoft Azure</a>, and <a href="https://cloud.google.com/">Google Cloud</a> control roughly 70 percent of European cloud infrastructure revenue, according to Synergy Research Group), but it also raises questions about whether a Chinese provider represents a meaningful improvement in strategic autonomy. 
Alibaba&#x27;s strategy of building sovereignty-compliant infrastructure in-market is a direct attempt to answer that question — but the Pentagon listing ensures it will be asked repeatedly.</p><h2><b>What enterprise teams should watch as the AI video market consolidates</b></h2><p>The practical implications of <a href="https://www.happyhorse.com/">HappyHorse 1.1</a> for enterprise teams are substantial. HappyHorse supports four modes of generation — text-to-video, image-to-video, subject-to-video, and the newly added video editing — covering the full spectrum of commercial video needs from ideation through production to post-production, all with integrated audio at no additional cost. That breadth of capability, delivered through a single API endpoint, simplifies what has historically been a fragmented and expensive production pipeline.</p><p>The question going forward is whether Alibaba can convert benchmark dominance and competitive timing into durable enterprise relationships. The company plans to release HappyHorse through Alibaba Cloud Model Studio with full enterprise SLAs, security certifications, and regional compliance — the table stakes that separate research breakthroughs from production-grade services. Watch for customer disclosures, usage metrics, and whether third-party platforms like fal.ai and Atlas Cloud (which already host HappyHorse 1.0) update to the 1.1 version quickly, which would signal genuine developer demand beyond Alibaba&#x27;s own ecosystem.</p><p>The AI video generation market entered 2026 with three credible enterprise contenders. One is dead. One is frozen. And the one still standing is a Chinese company backed by $52.7 billion in infrastructure spending, ranked No. 2 across every major independent benchmark, and offering a 40% discount to anyone willing to place the bet. In enterprise technology, the best product does not always win — but it rarely loses when the competition has already left the field.</p><p>
</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Business</category>
            <category>Infrastructure</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/5KFpqkXqsJ1UadPksN3wpB/437fe886256a70c820f5e152f0512430/Nuneybits_Vector_art_of_cheerful_horse_trotting_across_computer_f02e9dc0-d6b4-4a8c-b0f8-de134058b9c8.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[No Claude Fable 5? No problem: Sakana achieves frontier performance with new Fugu multi-model, auto synthesis system]]></title>
            <link>https://venturebeat.com/orchestration/no-claude-fable-5-no-problem-sakana-achieves-frontier-performance-with-new-fugu-multi-model-auto-synthesis-system</link>
            <guid isPermaLink="false">5CzhsFGdeqZF7g6AWNwWC9</guid>
            <pubDate>Mon, 22 Jun 2026 16:13:00 GMT</pubDate>
            <description><![CDATA[<p>Last night, the increasingly enterprise-focused AI startup <a href="https://sakana.ai/fugu/">Sakana launched Fugu</a>, a multi-agent orchestration system that delivers frontier-level AI performance through a single, OpenAI-compatible API. </p><p>Designed for developers, enterprises, and nations seeking resilience against vendor lock-in and geopolitical export controls, Fugu (Japanese for &quot;pufferfish&quot;) bypasses the traditional monolithic model structure by dynamically routing queries to a swappable pool of specialized AI agents. </p><p>Sakana CEO and co-founder David Ha, formerly of Google Brain, positioned Fugu as a more reliable option for enterprise workflows than any single AI model provider in the wake of <a href="https://venturebeat.com/technology/anthropic-blocks-all-public-access-to-claude-fable-5-mythos-5-following-us-government-order-what-enterprises-should-do">Anthropic&#x27;s move on June 12 to revoke public access</a> to its most powerful models, Claude Mythos 5 and Claude Fable 5, following a U.S. government export control order. As <a href="https://x.com/hardmaru/status/2068884466056225025">Ha wrote in a post today on X:</a></p><blockquote><p>&quot;Fugu dynamically orchestrates the world’s best models to tackle complex tasks. We are proving that a well-orchestrated pool of swappable agents can match restricted frontier models like Fable and Mythos.

But Fugu is about more than just performance. I believe that Orchestration Models are the next frontier, beyond bigger models.

Relying on a single company’s model for national infrastructure is a massive risk. As recent export controls have shown, access to top models can disappear overnight.

Collective intelligence is the practical hedge against this concentration of power. Fugu simply routes around vendor restrictions by relying on an entirely swappable agent pool.&quot;</p></blockquote><p>Sakana AI explicitly states that the specific models Fugu selects and how it coordinates them are proprietary, meaning this routing information is hidden from the user by design. The documentation only refers generally to a &quot;diverse pool of powerful models,&quot; &quot;multiple LLMs,&quot; or &quot;specialized models&quot; without providing a specific count.</p><p>By acting as a sophisticated coordinator rather than a standalone foundation model, Fugu matches the output quality of top-tier models like Fable and Mythos on third-party benchmarks of agentic tasks, while fundamentally altering how developers deploy critical AI infrastructure.</p><h2><b>How Sakana Fugu works and where it beats Anthropic&#x27;s Claude Fable 5</b></h2><p>At its core, Sakana Fugu operates like a master general contractor. When presented with a complex request, Fugu does not attempt to execute every step itself. </p><p>Instead, it breaks the problem down, delegates sub-tasks to a pool of expert foundation models, verifies their work, and synthesizes the final output.</p><p>&quot;Fugu is itself an LLM, trained to call various LLMs in an agent pool, including instances of itself recursively,&quot; the Sakana AI team noted in their technical release. </p><p>Grounded in two of Sakana&#x27;s 2026 research papers, <a href="https://sakana.ai/trinity/">TRINITY</a> and the <a href="https://sakana.ai/learning-to-orchestrate/">Conductor</a>, the system autonomously manages the entire lifecycle of model selection and verification using learned coordination strategies rather than hand-designed workflows. 
To the end user, this multi-agent swarm is entirely abstracted behind a standard API endpoint.</p><p>Sakana AI is offering two variants of the system to cater to different operational workloads:</p><ul><li><p><b>Fugu:</b> A high-speed, low-latency model optimized for everyday tasks. It is designed to act as the default engine for interactive chatbots and integrates directly into coding environments like Codex.</p></li><li><p><b>Fugu Ultra:</b> The flagship tier engineered for complex, high-stakes tasks such as AI research, cybersecurity analysis, and multi-step patent investigations. According to Sakana, Fugu Ultra coordinates a deeper pool of experts and matches industry-leading monolithic models across rigorous scientific and reasoning benchmarks.</p></li></ul><p>Additionally, on the pay-as-you-go plan, standard Fugu charges a dynamic rate based on the specific underlying models activated, whereas Fugu Ultra utilizes a fixed pricing structure starting at $5 per million input tokens and $30 per million output tokens.</p><p>As indicated by benchmark charts shared by Sakana, Fugu exceeds the performance of Anthropic&#x27;s Claude Fable 5 on <a href="https://huggingface.co/blog/leaderboard-livecodebench">LiveCodeBench</a>, an open source benchmark testing coding performance on regularly refreshed software problem-solving tasks (Fugu Ultra: 93.2, Fugu: 92.9, Fable: 89.8), and beats the prior Claude Mythos Preview model on <a href="https://epoch.ai/benchmarks/gpqa-diamond">GPQA-D (Diamond)</a>, a test of 198 graduate-level multiple-choice questions in biology, physics, and chemistry (Fugu Ultra: 95.5, Fugu: 95.5, Mythos Preview: 94.6).</p><p>By orchestrating multiple models from different providers, Fugu essentially builds native redundancy into the AI stack. 
If one provider suffers an outage or faces sudden regulatory restrictions, Fugu routes around the disruption to maintain uptime.</p><h2><b>Licensing and availability</b></h2><p>Fugu is offered as a commercial, proprietary API service, not an open-source framework. </p><p>Because Sakana’s core intellectual property lies in its non-obvious collaboration patterns, the specific routing information—meaning exactly which underlying models Fugu selects for a given query—remains proprietary and is intentionally hidden from the user.</p><p>However, Sakana offers critical controls for enterprise data compliance. Developers can explicitly opt specific models or providers out of their Fugu routing pool to maintain strict corporate privacy standards. </p><p>Additionally, users can opt out of having their prompts used for future training data. Geographically, Fugu is restricted from operating within the European Union (EU) and European Economic Area (EEA) while Sakana works to align its black-box data routing architecture with GDPR regulations.</p><h2><b>Pricing is fairly steep</b></h2><p>Fugu is available immediately in most regions—with the temporary exception of the EU and EEA—at subscription tiers and pay-as-you-go pricing.</p><p>Teams can opt for monthly <a href="https://sakana.ai/fugu/">subscription allowances </a>designed for individual or hands-on use: a Standard tier at $20/month for lightweight workflows, a Pro tier at $100/month providing 10x standard usage, and a Max tier at $200/month offering 20x usage for continuous, long-running tasks. I wasn&#x27;t able to find the actual amount of tokens covered under these plans, but I&#x27;ve reached out to Ha on X for more information.</p><p>As part of the initial rollout, Sakana is offering a free second month for users who subscribe to any tier by July 31, 2026.</p><p>For enterprise scaling and production deployments, Sakana offers an elastic pay-as-you-go plan. 
Crucially for high-stakes environments, requests made under this consumption-based model are served at a higher priority than those from monthly subscription plans. </p><p>Under this framework, the standard Fugu engine charges the single rate of the highest-tier underlying model involved in a query, without ever stacking multi-agent fees. The flagship Fugu Ultra tier (fugu-ultra-20260615) utilizes a fixed pricing structure per one million tokens: $5 for input, $30 for output, and $0.50 for cached input. These rates increase to $10, $45, and $1.00 respectively for extreme workloads utilizing context windows above 272K tokens. That puts it among the more expensive options compared to single AI models via provider APIs:</p><h1><b>VentureBeat Frontier AI Model API Pricing Snapshot</b></h1><table><tbody><tr><td><p><b>Model</b></p></td><td><p><b>Input</b></p></td><td><p><b>Output</b></p></td><td><p><b>Total Cost</b></p></td><td><p><b>Source</b></p></td></tr><tr><td><p>MiMo-V2.5 Flash</p></td><td><p>$0.10</p></td><td><p>$0.30</p></td><td><p>$0.40</p></td><td><p>Xiaomi MiMo</p></td></tr><tr><td><p>deepseek-v4-flash</p></td><td><p>$0.14</p></td><td><p>$0.28</p></td><td><p>$0.42</p></td><td><p>DeepSeek</p></td></tr><tr><td><p>deepseek-v4-pro</p></td><td><p>$0.435</p></td><td><p>$0.87</p></td><td><p>$1.305</p></td><td><p>DeepSeek</p></td></tr><tr><td><p>MiniMax-M3</p></td><td><p>$0.30</p></td><td><p>$1.20</p></td><td><p>$1.50</p></td><td><p>MiniMax</p></td></tr><tr><td><p>Gemini 3.1 Flash-Lite</p></td><td><p>$0.25</p></td><td><p>$1.50</p></td><td><p>$1.75</p></td><td><p>Google</p></td></tr><tr><td><p>Qwen3.7-Plus</p></td><td><p>$0.40</p></td><td><p>$1.60</p></td><td><p>$2.00</p></td><td><p>Alibaba Cloud</p></td></tr><tr><td><p>MiMo-V2.5</p></td><td><p>$0.40</p></td><td><p>$2.00</p></td><td><p>$2.40</p></td><td><p>Xiaomi MiMo</p></td></tr><tr><td><p>Grok 4.3 (low 
context)</p></td><td><p>$1.25</p></td><td><p>$2.50</p></td><td><p>$3.75</p></td><td><p>xAI</p></td></tr><tr><td><p>MiMo-V2.5 Pro (≤256K)</p></td><td><p>$1.00</p></td><td><p>$3.00</p></td><td><p>$4.00</p></td><td><p>Xiaomi MiMo</p></td></tr><tr><td><p>Kimi-K2.6</p></td><td><p>$0.95</p></td><td><p>$4.00</p></td><td><p>$4.95</p></td><td><p>Moonshot</p></td></tr><tr><td><p>GLM-5.2</p></td><td><p>$1.40</p></td><td><p>$4.40</p></td><td><p>$5.80</p></td><td><p>Z.ai</p></td></tr><tr><td><p>Grok 4.3 (high context)</p></td><td><p>$2.50</p></td><td><p>$5.00</p></td><td><p>$7.50</p></td><td><p>xAI</p></td></tr><tr><td><p>MiMo-V2.5 Pro (&gt;256K)</p></td><td><p>$2.00</p></td><td><p>$6.00</p></td><td><p>$8.00</p></td><td><p>Xiaomi MiMo</p></td></tr><tr><td><p>Qwen3.7-Max</p></td><td><p>$2.50</p></td><td><p>$7.50</p></td><td><p>$10.00</p></td><td><p>Alibaba Cloud</p></td></tr><tr><td><p>Gemini 3.5 Flash</p></td><td><p>$1.50</p></td><td><p>$9.00</p></td><td><p>$10.50</p></td><td><p>Google</p></td></tr><tr><td><p>Gemini 3.1 Pro Preview (≤200K)</p></td><td><p>$2.00</p></td><td><p>$12.00</p></td><td><p>$14.00</p></td><td><p>Google</p></td></tr><tr><td><p>GPT-5.4</p></td><td><p>$2.50</p></td><td><p>$15.00</p></td><td><p>$17.50</p></td><td><p>OpenAI</p></td></tr><tr><td><p>Gemini 3.1 Pro Preview (&gt;200K)</p></td><td><p>$4.00</p></td><td><p>$18.00</p></td><td><p>$22.00</p></td><td><p>Google</p></td></tr><tr><td><p>Claude Opus 4.8</p></td><td><p>$5.00</p></td><td><p>$25.00</p></td><td><p>$30.00</p></td><td><p>Anthropic</p></td></tr><tr><td><p>GPT-5.5</p></td><td><p>$5.00</p></td><td><p>$30.00</p></td><td><p>$35.00</p></td><td><p>OpenAI</p></td></tr><tr><td><p><b>Sakana Fugu Ultra</b></p></td><td><p><b>$5.00</b></p></td><td><p><b>$30.00</b></p></td><td><p><b>$35.00</b></p></td><td><p><b>Sakana AI</b></p></td></tr><tr><td><p>Claude Fable 5 / Claude Mythos 
5</p></td><td><p>$10.00</p></td><td><p>$50.00</p></td><td><p>$60.00</p></td><td><p>Anthropic</p></td></tr></tbody></table><p>Developers modeling operational costs should also note a significant architectural caveat in how Fugu bills for its multi-agent capabilities. According to the developer documentation, Fugu Ultra’s API responses include detailed usage fields that separate user-visible token generation from internal orchestration work. The background tokens consumed and generated when Fugu delegates sub-tasks, verifies code, or routes between underlying agents are not absorbed by the provider; they represent real token usage and are counted toward the final price of the request at standard rates.</p><h2><b>The Orchestration landscape: Fugu vs. The Field and notable benchmark performance</b></h2><p>To understand Fugu’s position in the mid-2026 AI ecosystem, it is critical to distinguish between <i>model routing</i> and <i>multi-agent orchestration</i>. </p><p>Over the past year, enterprise adoption of standard routing platforms—such as Not Diamond, Martian, and the open-source RouteLLM framework—has skyrocketed. These systems act as intelligent air traffic controllers; using semantic classifiers or meta-models, they analyze an incoming prompt and predict which single foundation model will yield the highest quality or most cost-effective response, dispatching the query accordingly.</p><p>Fugu operates on a fundamentally different paradigm. Rather than making a one-shot routing decision, Fugu aligns more closely with complex multi-round systems like Router-R1 (a framework introduced at NeurIPS 2025). 
It breaks a query down, interleaves reasoning with delegation, and dynamically assigns sub-tasks to multiple models in parallel or sequence before synthesizing a final output.</p><p>While frameworks like LangGraph, CrewAI, and Microsoft AutoGen offer developers the tools to build similar multi-agent systems, they require immense manual configuration—defining roles, setting up conditional edges, and managing state across long-running loops. </p><p>Fugu abstracts this operational overhead entirely. It is essentially a LangGraph-style workflow packaged as a single, black-box API endpoint.</p><p>An orchestration system is ultimately bounded by the raw capabilities of the underlying models in its pool, a reality reflected in Sakana’s own benchmark testing against standalone frontier models.</p><p>On rigorous coding and agentic tasks, collective intelligence shows a distinct advantage over standard models. Fugu Ultra posted a <b>73.7 on SWE-Bench Pro</b>, significantly outperforming Anthropic&#x27;s Claude Opus 4.8 (69.2) and OpenAI&#x27;s GPT-5.5 (58.6). </p><p>However, Fugu is not a silver bullet, and its performance is not a clean sweep across the board. When compared to highly specialized or restricted-access monolithic models, Fugu occasionally trails:</p><ul><li><p><b>SWE-Bench Pro:</b> While Fugu Ultra (73.7) beat most accessible models, it was comfortably eclipsed by Anthropic’s limited-access Fable 5 (80.0), which is currently absent from Fugu&#x27;s swappable pool due to the U.S. government&#x27;s export control order and Anthropic&#x27;s subsequent response to remove the model entirely from global usage. 
</p></li><li><p><b>Humanity&#x27;s Last Exam:</b> Fugu Ultra (50.0) narrowly edged out Opus 4.8 (49.8), but again fell short of Fable 5 (53.3).</p></li><li><p><b>Long-Context and Security:</b> On the MRCRv2 long-context-recall test, OpenAI&#x27;s GPT-5.5 maintained the lead (94.8 vs Fugu Ultra&#x27;s 93.6), and Opus 4.8 remained the top performer on the CTI-REALM cybersecurity benchmark (69.6 vs Fugu Ultra&#x27;s 69.4).</p></li></ul><p>The quantitative data points to a clear conclusion: Fugu is highly effective at boosting performance on messy, multi-step tasks (like writing a complex HTML5 game from scratch) by leaning on the combined strengths of multiple mid-tier and high-tier models. </p><p>However, for sheer brute-force reasoning within a single, highly constrained domain, the industry&#x27;s largest standalone models still hold the edge—provided an enterprise can maintain uninterrupted access to them.</p><h2><b>Background on Sakana&#x27;s formation and noteworthy achievements to date</b></h2><p><a href="https://venturebeat.com/ai/what-you-need-to-know-about-sakana-ai-the-new-startup-from-a-transformer-paper-co-author">Sakana AI was formed in Tokyo in 2023 </a>by Llion Jones, a co-author of Google’s foundational 2017 &quot;Attention Is All You Need&quot; paper, and David Ha, the former head of research at Stability AI. </p><p>Disillusioned by large tech company bureaucracy and the industry&#x27;s hyper-fixation on scaling single, massive foundational models, the founders built Sakana around principles of biomimicry and evolutionary computing.</p><p>The company&#x27;s name, derived from the Japanese word for fish, reflects its core technical thesis: utilizing collective &quot;swarm&quot; intelligence rather than brute-force compute. 
Following a $2.6 billion Series B valuation in late 2025 and <a href="https://venturebeat.com/technology/when-deep-research-isnt-enough-for-your-business-sakana-ai-launches-ultra-deep-research-agent-for-100-page-reports-in-8-hours">the recent June 2026 launch of Marlin</a>—an autonomous, eight-hour research agent for the B2B sector—Fugu represents the commercialization of Sakana&#x27;s multi-agent routing technology for everyday developers.</p><h2><b>A mixed reception among the broader AI community online</b></h2><p>The developer community has responded to Fugu by rigorously testing its practical tradeoffs, weighing its routing efficiencies against the sheer power of monolithic foundation models.</p><p>AI observer, developer and influencer <a href="https://x.com/ChrissGPT/status/2068904825685787083?s=20">Chris (@ChrissGPT on X)</a> highlighted the specific utility of Fugu over raw foundational AI. </p><p>&quot;For a single clean prompt, you probably would [use Fable 5, Mythos, or GPT-5.5 directly],&quot; he noted, but argued that Fugu&#x27;s true value emerges in messy, multi-step environments. &quot;...whether it involves delegation, verification, synthesis, code review, research loops, security analysis... the more it would make sense to use this,&quot; he wrote.</p><p>Chris also pointed out the strategic geopolitical advantage of Fugu&#x27;s architecture, noting that if frontier AI access is abruptly revoked due to regulation or export controls, an orchestrator can dynamically swap models to prevent a total system failure.</p><p>Creative agency owner <a href="https://x.com/markksantos/status/2068962823007285628?s=20">Mark Santos (@markksantos) </a>of Mark Studios provided a direct, real-world comparison by tasking both Fugu Ultra and Claude Opus 4.8 with building a &quot;Crossy Road&quot; game clone using Three.js. 
The results underscored the operational differences between an orchestrator and a monolithic giant:</p><ul><li><p><b>Sakana Fugu Ultra:</b> Completed the task in 22 minutes using ~89,000 tokens for roughly $7.32. However, the final game suffered from minor logic errors, such as inverted directional turns and wonky camera angles.</p></li><li><p><b>Claude Opus 4.8:</b> Took 79 minutes, burned ~940,000 tokens for nearly $37.85, and got stuck in a retry loop requiring human intervention. Despite the inefficiency, it ultimately produced superior application design and functionality.</p></li></ul><p>Santos concluded the experiment by stating, &quot;In terms of application functionality, quality, and design, Opus won. In terms of model speed and performance, Fugu... won&quot;.</p><p>Elie Bakouch, a research engineer at cloud-based, open AI infrastructure and systems provider <a href="https://www.primeintellect.ai/">Prime Intellect</a>, <a href="https://x.com/eliebakouch/status/2068939729811468503">pointed out on X</a> that &quot;to be clear, this is a closed source orchestrator on top of closed source models. if before you didn&#x27;t control the models, now you don&#x27;t even control which ones are used or how much. 
this is not &#x27;AI sovereignty&#x27;...&quot;</p><p>These early tests and reactions mirror the sentiment summarized by <a href="https://www.reddit.com/r/LLMDevs/comments/1uca8e3/comment/ot2k0kx/?utm_source=share&amp;utm_medium=web3x&amp;utm_name=web3xcss&amp;utm_term=1&amp;utm_content=share_button">Reddit user GreedyWorking1499</a> in initial platform discussions: &quot;<i>Until proven otherwise, this is just a highly advanced router/wrapper, not a fundamental leap in intelligence like Mythos/Fable was.</i>&quot;</p><p>Yet, as enterprises increasingly demand fail-safes against single-vendor reliance, Sakana is proving that packaging collective intelligence into a single API endpoint is a highly viable commercial path.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/4kHEA49GjoZBqBlllf4GIv/0fd14ca57187e6d633ab33ceded01f69/ChatGPT_Image_Jun_22__2026__11_26_12_AM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Why agentic enterprises need to become learning systems]]></title>
            <link>https://venturebeat.com/orchestration/why-agentic-enterprises-need-to-become-learning-systems</link>
            <guid isPermaLink="false">6rPcvxuYi8OaBqUrNifhS7</guid>
            <pubDate>Mon, 22 Jun 2026 15:00:00 GMT</pubDate>
            <description><![CDATA[<p><i>Presented by Splunk</i></p><hr/><p>Every day, organizations learn things their AI systems never get to use.</p><p>A security analyst corrects an AI-generated investigation. A network engineer identifies the root cause of a recurring outage. An observability team discovers that a pattern of latency, logs and infrastructure changes predicts service degradation. A customer operations team learns which signals indicate an escalation is likely.</p><p>Each moment contains valuable organizational knowledge. But in most enterprises, that knowledge disappears into tickets, dashboards, chat threads, post-incident reviews and the minds of individual experts. It may help solve the immediate problem, but it rarely becomes part of a reusable system that improves future AI-driven decisions.</p><p>That is the next challenge for the agentic enterprise.</p><p>The future will not be defined simply by who has the most capable model or the most autonomous agents. Many organizations will have access to similar frontier models. Many will deploy agents across security, IT, engineering, customer service, and business operations.</p><p>The real differentiator will be whether those agents can learn from the organization around them.</p><p>Not by constantly retraining the underlying model, but by capturing operational experience, converting it into institutional knowledge and making that knowledge available to future agents, workflows, and decisions.</p><p>The agentic enterprise is not just an enterprise that uses AI. It is an enterprise that learns through AI.</p><h2>Agentic enterprises allow AI systems to learn from them</h2><p>The AI conversation has been dominated by model capability: larger context windows, better reasoning, faster inference, stronger tool use, and more sophisticated agentic behavior.</p><p>Those advances matter. 
But in the enterprise, a model is only one part of the system.</p><p>A model does not automatically know how a specific organization operates. It does not inherently know which remediation step solved last month’s outage, which analyst correction improved a threat investigation, which network signal preceded a service disruption, or which internal policy should override an otherwise plausible recommendation.</p><p>That knowledge belongs to the enterprise.</p><p>For agentic systems to improve, organizations need a way to capture that knowledge and make it reusable. In many cases, that does not require changing the model itself. It requires changing the ecosystem around the model: the knowledge base, retrieval layer, prompts, policies, guardrails, routing logic and workflows that shape how agents behave.</p><p>The model may remain the same. The learning system around it becomes smarter.</p><h2>Feedback loops turn every outcome into a teachable moment for agents</h2><p>Every agentic workflow creates signals.</p><p>An agent receives a request. It retrieves context, reasons through possible actions, calls tools, and generates an answer. A human accepts, rejects, or modifies that answer. Downstream systems reveal whether the action worked.</p><p>That entire chain is valuable.</p><p>AI observability gives organizations visibility into what happened: the prompt, response, reasoning path, tool calls, data sources, intermediate steps, failure modes and outcomes. Without that visibility, organizations cannot understand why an agent behaved the way it did, let alone improve it.</p><p>But observability alone is not enough.</p><p>The larger opportunity is to turn observed behavior into institutional knowledge. A trace should not only help developers and operators debug an agent. 
It should help the enterprise understand what the agent learned, what the human corrected, what outcome followed, and what should change before the next similar event.</p><p>That is the shift from monitoring AI to teaching AI.</p><p>In the agentic enterprise, feedback loops connect action to outcome, outcome to knowledge and knowledge back to future action.</p><h2>A learning system in practice across security, observability and the network</h2><p>Consider a service experiencing intermittent degradation.</p><p>An observability agent detects unusual latency and error rates. A network agent identifies packet loss across a specific path. A security agent notices that the same time window includes suspicious authentication behavior and unusual traffic from a previously unseen source.</p><p>Individually, each agent has only a partial view. Together, they create a richer operational picture.</p><p>The first time this incident occurs, human experts may need to intervene. A network engineer confirms that packet loss was caused by a misconfigured routing change. A security analyst determines that the suspicious traffic was not an attack, but a side effect of a misrouted internal service. An SRE connects the network event to the application degradation.</p><p>That resolution contains knowledge the organization should not have to relearn.</p><p>A mature agentic learning system would capture the traces, human corrections, topology context, security findings, observability signals and final remediation steps. It would preserve the relationship between those signals: latency pattern, network path, identity behavior, routing change and remediation.</p><p>The next time a similar pattern appears, agents would not start from zero. 
They could retrieve the prior case, compare current conditions, recommend the proven diagnostic path and escalate with better context.</p><p>The underlying frontier model did not need to be retrained.</p><p>The enterprise learned.</p><h2>The architecture of the learning agentic enterprise</h2><p>A learning-oriented agentic enterprise needs more than a model or chatbot. It needs an architecture that can capture experience, turn it into usable knowledge, connect that knowledge to operational context, and govern how it changes future agent behavior.</p><p><b>Memory </b>preserves what happened: what the agent saw, what it did, where humans intervened, and what outcomes followed.</p><p><b>Knowledge bases</b> turn that experience into reusable guidance, including playbooks, examples, policies, procedures, and evidence.</p><p>A <b>data fabric </b>connects the operational environment. The signals agents need live across logs, metrics, traces, tickets, identity systems, security tools, network telemetry, collaboration platforms, and business applications. A data fabric makes those signals discoverable, correlated, governed, and usable in context.</p><p><b>AI observability </b>explains how agents behave by capturing prompts, tool calls, intermediate steps, responses, feedback, and outcomes. That visibility helps organizations understand where agents succeed, where they fail, and what should improve.</p><p>The <b>control plane</b> governs how learning becomes change: what knowledge is promoted, which prompts or policies are updated, which agents can use new information, what approvals are required, and how changes are audited.</p><p>Together, these capabilities allow AI systems to improve over time in a controlled, trustworthy way that allows the enterprise to learn from its own operations.</p><h2>The organizations that learn fastest will win </h2><p>The next era of AI will not be won by models alone. 
It will be won by organizations that can capture what they learn from every workflow, expert correction, incident, investigation, and outcome.</p><p>The most advanced agentic enterprises will not simply deploy more agents. They will build systems that allow every agent to benefit from the collective knowledge of the organization.</p><p>That means connecting operational data through a data fabric. It means observing agent behavior deeply enough to understand it. It means preserving experience in memory and institutionalizing it in knowledge bases. It means using a control plane to govern how learning changes agent behavior.</p><p>The future of AI is not a single autonomous agent acting alone. It is an ecosystem of agents, humans, data and controls that learns over time.</p><p>The organizations that build that ecosystem will create AI systems that get better with every interaction. Not because the model is constantly changing, but because the enterprise itself is becoming more intelligent.</p><p><i>Learn more about how </i><a href="https://www.splunk.com/ciscodatafabric"><i>Cisco Data Fabric powered by the Splunk Platform</i></a><i> is accelerating agentic operations.</i></p><p><i>Hao Yang is Vice President AI at Splunk, a Cisco Company.</i></p><hr/><p><i>Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact </i><a href="mailto:sales@venturebeat.com"><i><u>sales@venturebeat.com</u></i></a><i>.</i></p>]]></description>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/79J1LCPmW0K5edwrClgk73/f504c7a9a7e6de04e5654e27f17e30a1/Image.jpeg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Researchers introduce Self-Harness, a framework that lets AI agents rewrite their own rules, boosting performance up to 60%]]></title>
            <link>https://venturebeat.com/orchestration/researchers-introduce-self-harness-a-framework-that-lets-ai-agents-rewrite-their-own-rules-boosting-performance-up-to-60</link>
            <guid isPermaLink="false">34BZWRl0WoShIvXTDpjvrF</guid>
            <pubDate>Mon, 22 Jun 2026 14:23:00 GMT</pubDate>
            <description><![CDATA[<p>Not every company can or should build its own frontier AI language model. However, the <i>harness</i> controlling the model is something that most enterprises can and <i>should</i> customize for their specific purposes.</p><p>Of course, this is easier said than done. Agent harnesses are still largely tuned through manual, ad hoc debugging, a process that relies heavily on intuition rather than systematic feedback loops, making it difficult to keep pace with rapidly evolving LLMs.</p><p>To solve this challenge, researchers at the Shanghai Artificial Intelligence Laboratory have introduced “<a href="https://arxiv.org/abs/2606.09498">Self-Harness</a>,” a new paradigm in which an LLM-based agent systematically improves its own operating rules. By deriving edits from its own execution traces, the system trades manual guesswork for empirical evidence.</p><p>Self-improving harnesses can enable development teams to deploy robust custom agents that continually adapt their own execution protocols to overcome model-specific weaknesses.</p><h2><b>The challenge of harness engineering</b></h2><p>An LLM-based agent&#x27;s performance is not determined solely by its underlying base model, but also by its harness: the surrounding system that provides context and enables the model to interact with the environment. A harness includes components like system prompts, tools, memory, verification rules, runtime policies, orchestration logic, and failure-recovery procedures.</p><p>This layer is crucial because many common agent failures stem from the harness rather than the model. For example, an agent may report success without verifying the model’s output (e.g., by running the code to see whether it passes the tests), or it might retry a failed action repeatedly. 
The harness is also responsible for preventing <a href="https://venturebeat.com/ai/mits-new-recursive-framework-lets-llms-process-10-million-tokens-without">context rot or overload</a> when the agent’s interaction history grows very large. Examples of popular harnesses include SWE-agent, Claude Code, Codex, and OpenHands.</p><p>Harness engineering remains a significant challenge, but the bottleneck isn&#x27;t necessarily that humans are too slow or incapable. </p><p>In fact, Hangfan Zhang, lead author of the Self-Harness paper, told VentureBeat that &quot;in many cases, an experienced engineer with deep domain knowledge can still propose better changes than an LLM can today.&quot;</p><p>Instead, the true bottleneck of manual engineering is that it relies heavily on ad hoc debugging rather than a verifiable, empirical feedback loop. &quot;The deeper issue is that the current harness-engineering paradigm often lacks a systematic feedback loop,&quot; Zhang explained. &quot;Many edits are made based on intuition, a few observed failures, or ad hoc debugging.&quot;</p><p>With new models being released at a rapid pace, depending on human intuition to manually tune model-specific harnesses becomes increasingly costly and untenable. While some approaches use stronger models to improve the harnesses of weaker target agents, this dependence on external guidance has its own challenges, as these models may be costly, unavailable for frontier models, or mismatched to the target model&#x27;s failure modes.</p><h2><b>How Self-Harness works</b></h2><p>The Self-Harness paradigm enables an LLM-based agent to improve its own harness without relying on human engineers or stronger external models.</p><p>This continuous self-evolution is driven by a three-stage iterative loop that turns behavioral evidence into harness updates:</p><ul><li><p><b>Weakness mining:</b> Starting from an initial harness, the agent runs a set of tasks, producing execution traces with verifiable outcomes. 
The agent categorizes failed traces and tries to detect model-specific failure patterns.</p></li><li><p><b>Harness proposal:</b> Based on these failure patterns, the agent uses a “proposer” role to generate a set of diverse yet minimal harness modifications, each tied to a specific failure mechanism to avoid overly general corrections.</p></li><li><p><b>Proposal validation:</b> The system evaluates candidate modifications through regression tests. An edit is promoted only if it improves performance without causing measurable degradation on held-out tasks. If multiple candidate modifications pass the regression tests, they are merged into the next version of the harness, which then serves as the starting point for the next iteration.</p></li></ul><p>To visualize why an enterprise would need this, imagine an automated issue-fixing agent that reads internal documentation, writes patches, and opens pull requests. If the company updates its documentation style, the agent might suddenly fail, pulling the wrong context or writing bad patches. </p><p>On the surface, the agent simply looks broken. But Self-Harness turns this ambiguous failure into a solvable problem. &quot;The failure traces expose where the agent is misusing the new documentation format; the proposer can generate a targeted harness edit... and the evaluator can decide whether that edit improves the failing cases without regressing other cases,&quot; Zhang said.</p><h2><b>Self-Harness in action</b></h2><p>The researchers evaluated Self-Harness on <a href="https://www.tbench.ai/">Terminal-Bench-2.0</a>, a benchmark that tests general tool-based execution, including artifact management, command use, verification behavior, and recovery from execution errors. 
They applied Self-Harness with MiniMax M2.5, Qwen3.5-35B-A3B, and GLM-5.</p><p>To isolate the impact of the self-evolving harness, they started with a minimal harness built upon the DeepAgent SDK, containing only the benchmark-facing system prompt and the default filesystem and shell tools. The model backend, tool set, benchmark environment, and evaluator were kept unchanged while only the harness was allowed to vary.</p><p>The quantitative results show that <b>agents improved their performance through automated harness edits. </b>On held-out tasks, <b>performance improved across the board, with relative gains of 33 to 60 percent </b>depending on the model.</p><p>Importantly, an explicit acceptance rule promotes only those edits that improve performance without introducing unacceptable regressions. What makes Self-Harness powerful for enterprise applications is that it doesn’t simply make the prompt longer or add generic instructions. Instead, it introduces targeted changes that reflect the recurring problems each model encounters during execution.</p><p>For example, under the baseline harness, MiniMax M2.5 would get stuck endlessly exploring dataset configurations until the execution environment timed out, failing to produce any deliverables. Through Self-Harness, the system identified this specific flaw and wrote a &quot;loop breaker&quot; into its runtime policy, forcing the agent to stop and redirect its approach after 50 tool calls. It also added a rule to create an initial version of required artifacts as early as possible.</p><p>Qwen3.5-35B-A3B, by contrast, had a habit of hitting a file overwrite error and then blindly retrying the same command repeatedly, eventually deleting necessary files out of confusion before stopping. 
Self-Harness fixed this by introducing a strict command-retry discipline (forbidding exact duplicate commands) and a mechanism that forced the agent to immediately recreate any missing artifacts if a file error occurred.</p><p>GLM-5 struggled to preserve environment changes across different commands, and would often waste time on massive downloads or finalize tasks even when sanity checks were failing. Its self-generated harness introduced rules instructing the agent to persist PATH variables across shell sessions, limit external compute, and repair any failed sanity checks before concluding its run.</p><h2><b>The hidden costs of automated harnesses</b></h2><p>While Self-Harness automates the tedious work of tracking down idiosyncratic model failures, decision-makers must be realistic about the trade-offs. Replacing human engineering with automated trial-and-error incurs significant computational overhead.</p><p>&quot;Self-Harness replaces part of the human engineering burden with repeated proposal generation, parallel candidate evaluation, and regression testing,&quot; Zhang said. &quot;That can mean more API tokens, more latency during optimization, and more infrastructure for running evaluation tasks.&quot;</p><p>The system also depends on the accuracy of its evaluation pipeline. During their experiments on Terminal-Bench-2.0, the researchers relied on strict, deterministic verifiers to ensure the agent&#x27;s edits were actually helpful. Without this rigorous ground truth, an automated system risks promoting bad updates. &quot;[The] evaluation system is not an optional component; it is what lets us trade human intuition for empirical evidence,&quot; Zhang said.</p><p>This reliance on strict verifiers also dictates where Self-Harness should be deployed. 
&quot;The best deployment targets today are environments where failures can be measured and where trial-and-error is relatively safe,&quot; Zhang said, pointing to coding, internal workflow automation, and DevOps data pipelines as ideal use cases.</p><p>Conversely, enterprises should avoid fully automating harnesses in high-stakes or subjective fields. &quot;The clearest red flags are domains where evaluation is subjective, delayed, non-deterministic, or costly to get wrong, such as medical decision-making, safety-critical infrastructure, or legal decisions.&quot;</p><h2><b>From prompt tweakers to feedback architects</b></h2><p>The introduction of self-improving agents does not mean coding or enterprise workflows will suddenly become human-free. The quality of collaboration between the human engineer and the AI is still paramount and difficult to capture with automated benchmarks. </p><p>Instead, the engineering profession is moving up a layer of abstraction. &quot;The role of enterprise engineers will shift from manually patching individual prompts or tool calls toward designing the feedback systems that make agent improvement possible,&quot; Zhang predicted. Moving forward, &quot;the engineer becomes less of a prompt tweaker and more of a feedback architect.&quot;</p><p>As foundation models grow more capable, they will naturally absorb many capabilities that currently require manual harness engineering. &quot;But once that happens, the harness will not disappear; its scope will move outward to connect the model to richer external environments,&quot; Zhang said. &quot;Until that boundary moves beyond what humans can evaluate, humans will remain critical providers of feedback.&quot;</p>]]></description>
            <author>bendee983@gmail.com (Ben Dickson)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/7evVer22ufMtmOwoZ1Cfuq/6222b39130f1015e2fc42f84df11c42c/self-improving_harness.jpg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
    </channel>
</rss>