<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>VentureBeat</title>
        <link>https://venturebeat.com/feed/</link>
        <description>Transformative tech coverage that matters</description>
        <lastBuildDate>Mon, 22 Jun 2026 13:03:32 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright 2026, VentureBeat</copyright>
        <item>
            <title><![CDATA[7,000 Langflow servers are under attack. LangGraph and LangChain have the same holes]]></title>
            <link>https://venturebeat.com/security/7000-langflow-servers-under-attack-langgraph-langchain-same-holes</link>
            <guid isPermaLink="false">4v4X3T0BLpo0EzJ33GND2V</guid>
            <pubDate>Fri, 19 Jun 2026 21:14:19 GMT</pubDate>
            <description><![CDATA[<p>Your AI agent did exactly what it was designed to do. The framework underneath it just handed an attacker a shell on the box that holds your OpenAI key, your database credentials, and your CRM tokens.</p><p>That is not a hypothetical. In a few months, three of the most widely deployed AI agent frameworks each turned a known, ordinary bug class into a way through. <a href="https://research.checkpoint.com/2026/from-sqli-to-rce-exploiting-langgraphs-checkpointer/">Check Point Research</a> chained a SQL injection in LangGraph’s SQLite checkpointer to full remote code execution. Tenable and VulnCheck tracked a path traversal in Langflow’s file upload endpoint to active, in-the-wild RCE. <a href="https://www.cyera.com/research/langdrained-3-paths-to-your-data-through-the-worlds-most-popular-ai-framework">Cyera</a> documented a path traversal in LangChain-core’s prompt loader that reads your secrets off disk. Two paths to a shell, one to your keys. They are the same bug, wearing three frameworks.</p><p>These frameworks became production infrastructure faster than anyone secured them. They store agent state, take file uploads, load prompt configs, and hold the credentials to databases, CRMs, and internal APIs. The edge tools watch traffic. The endpoint tools watch processes. Neither was built to treat an imported framework as a boundary worth guarding, and that blind spot is exactly where all three chains live, widening every week as these frameworks ship to production.</p><h2><b>The LangGraph chain, SQL injection to a Python shell</b></h2><p>Start with the one most teams pulled into production this quarter. LangGraph gives AI agents memory through checkpointers, the persistence layer that stores execution state. It has cleared over 50 million downloads a month. Yarden Porat of Check Point Research took that layer apart and found three vulnerabilities. 
Two of them chain to RCE.</p><p><a href="https://advisories.gitlab.com/pypi/langgraph-checkpoint-sqlite/CVE-2025-67644/">CVE-2025-67644</a>, rated CVSS 7.3, is a SQL injection in the SQLite checkpointer. The function that builds the WHERE clause for checkpoint lookups drops user-controlled filter keys straight into the query with no parameterization and no escaping. This does not hit everyone, but where it hits, it is serious. A deployment is exposed when it self-hosts LangGraph on the SQLite or Redis checkpointer and lets untrusted input reach get_state_history() or a similar history endpoint. Meet those conditions, and an attacker who controls the filter writes a fabricated row straight into the checkpoint table. Run LangChain’s managed LangSmith platform on PostgreSQL, and the exposure is gone.</p><p>Then <a href="https://advisories.gitlab.com/pypi/langgraph/CVE-2026-28277/">CVE-2026-28277</a>, CVSS 6.8, finishes the job. LangGraph’s msgpack checkpoint decoder rebuilds Python objects from the stored data, which lets it import a module and call a named function with attacker-supplied arguments. That step needs write access to the checkpoint store; the SQL injection is what grants it remotely. LangGraph loads the forged row as a legitimate checkpoint, the decoder runs the specified function, including os.system, and code executes under the identity of the agent server. A third issue, CVE-2026-27022, CVSS 6.5, reaches the same place through the Redis checkpointer.</p><p>There has been no confirmed exploitation in the wild yet. A working proof-of-concept is public in Check Point’s disclosure. The fixes are version bumps: langgraph-checkpoint-sqlite to 3.0.1, langgraph to 1.0.10, and langgraph-checkpoint-redis to 1.0.2.</p><h2><b>The Langflow chain, one unauthenticated request to RCE</b></h2><p>Langflow is the one already under attack. 
CVE-2026-5027, CVSS 8.8, is a path traversal in the POST /api/v2/files endpoint, which takes the filename straight from the form data and writes it to disk unsanitized. An attacker packs that filename with traversal sequences and drops a file anywhere, such as a cron job in /etc/cron.d/. Because Langflow ships with auto-login enabled in its default configuration, an exposed instance needs no credentials at all. A single unauthenticated request reaches the endpoint, and the next cron run hands over a shell.</p><p>VulnCheck’s Caitlin Condon confirmed exploitation on June 9: “Our Canaries observed exploitation of CVE-2026-5027 that successfully leveraged the path traversal to write what appear to be test files on victim systems.” Censys put roughly 7,000 exposed instances on the internet, most in North America. This is the third Langflow flaw to draw active exploitation this year, after <a href="https://www.probablypwned.com/article/langflow-cve-2025-34291-muddywater-account-takeover-rce">CVE-2025-34291</a>, which the Iranian state-sponsored group MuddyWater weaponized and which CISA added to its <a href="https://thehackernews.com/2026/05/cisa-adds-exploited-langflow-and-trend.html">Known Exploited Vulnerabilities catalog</a> in May. CVE-2026-5027 itself was patched in version 1.9.0, released April 15.</p><p>The timeline is what sets the clock. The patch shipped April 15. Attacks started in June, and <a href="https://www.thestack.technology/langflow-instances-are-getting-exploited-again/">VulnCheck added CVE-2026-5027 to its exploited-vulnerabilities list June 8</a> once its sensors caught the first in-the-wild hits. Every instance left unpatched between those two dates has been sitting in the open for almost two months. 
The lesson for security teams is to start the patch clock at disclosure, not at a federal catalog entry.</p><h2><b>The LangChain-core gap, arbitrary file reads through the prompt loader</b></h2><p>LangChain-core, the foundation under both, disclosed <a href="https://thehackernews.com/2026/03/langchain-langgraph-flaws-expose-files.html">CVE-2026-34070</a>, CVSS 7.5, a path traversal in its legacy prompt-loading API. The load_prompt() functions read a file path out of a config dict with no check against traversal sequences or absolute paths, so an attacker who influences that path reads arbitrary files the process can reach, including the .env file holding OPENAI_API_KEY and ANTHROPIC_API_KEY. Cyera paired it with CVE-2025-68664, CVSS 9.3, a deserialization flaw that resolves environment secrets through a crafted object. The fix versions differ, which matters when you patch: CVE-2026-34070 lands in <a href="https://security.snyk.io/vuln/SNYK-PYTHON-LANGCHAINCORE-15809257">langchain-core 1.2.22 and 0.3.86</a>; CVE-2025-68664 lands earlier in <a href="https://nvd.nist.gov/vuln/detail/CVE-2025-68664">1.2.5 and 0.3.81</a>. Clear both, or the higher-severity flaw stays live behind a patched one.</p><p>Three frameworks, three classic AppSec bugs. Path traversal. SQL injection. Unsafe deserialization. Nothing exotic, nothing AI-specific, just old vulnerabilities living inside new infrastructure. None of this is a frontier-model problem. It is plumbing, sitting in the layer where AI meets the enterprise.</p><h2><b>Why the scanner cannot see it</b></h2><p>Merritt Baer, CSO at <a href="https://www.enkryptai.com/">Enkrypt AI</a> and former deputy CISO at AWS, has named what makes this kind of failure hard to see coming. It does not announce itself as an AI problem. 
&quot;CISOs will experience MCP insecurity not in the abstract, but when an employee pastes sensitive data into a tool, or when an attacker finds an unauthenticated MCP server in your cloud,&quot; Baer told VentureBeat. &quot;It won&#x27;t feel like &#x27;AI risk.&#x27; It will feel like your traditional security program failing.&quot; The framework chains here are the same shape. An exposed Langflow instance is an unauthenticated server in your cloud, and the alert, if one fires, reads like an ordinary incident.</p><p>That is the gap in one sentence. The exploit lives in the framework your code imports. The WAF never sees a msgpack decoder running three layers down. The EDR watches the agent server make the same process calls it makes a thousand times a day and waves it through. Both tools are doing their job. Nobody scoped the framework itself as the thing that could turn on you. </p><p>The root cause is older than AI, and Baer names it. “MCP is shipping with the same mistake we’ve seen in every major protocol rollout: insecure defaults,” she told VentureBeat. “If we don’t build authentication and least privilege in from day one, we’ll be cleaning up breaches for the next decade.” Langflow’s auto-login is that mistake shipped. LangChain-core’s unguarded prompt loader is that mistake shipped. The convenient default is the vulnerability. And the moment an agent connects to anything, that risk compounds. “You’re not just trusting your own security, you’re inheriting the hygiene of every tool, every credential, every developer in that chain,” Baer said. “That’s a supply chain risk in real time.”</p><p>There is a governance failure layered on top of the technical one, and it is the same miscategorization Assaf Keren, chief security officer at Qualtrics and former CISO at PayPal, has flagged in adjacent tooling. 
“Most security teams still classify experience management platforms as ‘survey tools,’ which sit in the same risk tier as a project management app,” Keren told VentureBeat. “This is a massive miscategorization.” Swap in AI agent frameworks, and it still holds. Teams file LangGraph, Langflow, and LangChain under developer convenience, then wire them into databases, CRMs, and provider keys. “Security has to be an enabler,” Keren said, “or teams route around it.” These frameworks are what routing around it looks like.</p><p>Follow the money and it points at the same layer. On its <a href="https://www.fool.com/earnings/call-transcripts/2026/06/03/crowdstrike-crwd-q1-2027-earnings-transcript/">Q1 fiscal 2027 earnings call</a>, CrowdStrike reported its AI detection and response line up more than 250% sequentially, and on June 17 it <a href="https://www.crowdstrike.com/en-us/press-releases/crowdstrike-advances-ai-and-cloud-security-operations-on-aws/">extended that runtime coverage</a> to agent, LLM, and MCP traffic on AWS. George Kurtz, the company’s co-founder and CEO, named the reason in plain terms: “Agents run on the endpoint. They make tool calls, access files, invoke APIs, and move data at the process level.” That is the exact plumbing these chains abuse, and real money is now moving to the layer your AppSec scan skips.</p><h2><b>What to put in front of the board</b></h2><p>The board does not need the CVE numbers. It needs the consequence, and Keren draws the line the board cares about. Most teams have mapped the technical blast radius. “But not the business blast radius,” Keren told VentureBeat. “When an AI engine triggers a compensation adjustment based on poisoned data, the damage is not a security incident. It is a wrong business decision executed at machine speed.” A framework RCE is the same problem one layer earlier. 
The agent does not just leak a credential; it acts on production systems with it, and the business sees an outcome no one can explain.</p><p>So frame it the way a board frames it: we run AI agent frameworks in production that can be turned into remote shells through bugs our scanners are not built to find, all three are patched, one is under active attack, and here is the date every instance is verified and closed. None of this required custom malware or a zero-day.</p><h2><b>The six-question checklist</b></h2><p>Six trust boundaries, one per row, each with the question, the proof point, the command, the fix, and the board line. Run it tonight.</p><table><tbody><tr><td><p><b>Trust-Boundary Question</b></p></td><td><p><b>Proof Point</b></p></td><td><p><b>What Broke</b></p></td><td><p><b>Verify Before You Install</b></p></td><td><p><b>The Fix</b></p></td><td><p><b>Board Language</b></p></td></tr><tr><td><p><b>1. Can the agent&#x27;s state store be poisoned with code?</b></p></td><td><p>LangGraph SQLi-to-RCE chain. CVE-2025-67644 (CVSS 7.3) chains into CVE-2026-28277 (CVSS 6.8). PoC public, no in-the-wild use yet.</p></td><td><p>Filter keys interpolated into SQL with an f-string. Forged checkpoint row hits the msgpack decoder, which imports and runs an attacker-named callable.</p></td><td><p>pip show langgraph-checkpoint-sqlite. Below 3.0.1 = vulnerable. Confirm get_state_history() is not exposed to network input.</p></td><td><p>Upgrade langgraph-checkpoint-sqlite to 3.0.1, langgraph to 1.0.10, langgraph-checkpoint-redis to 1.0.2.</p></td><td><p>“Our agent memory layer can be tricked into running attacker code. Vendor has patched it. We are upgrading and confirming the endpoint is not exposed.”</p></td></tr><tr><td><p><b>2. Can an unauthenticated request write a file to our agent server?</b></p></td><td><p>Langflow CVE-2026-5027 (CVSS 8.8). On VulnCheck KEV (June 8). Active exploitation confirmed June 9. 
~7,000 exposed instances (Censys).</p></td><td><p>Path traversal in POST /api/v2/files. Filename unsanitized. Auto-login on by default. Two HTTP calls drop a cron job and earn a shell.</p></td><td><p>Query Censys or Shodan for your Langflow, Flowise, n8n, and Dify instances on the perimeter. Check whether auto-login is enabled.</p></td><td><p>Upgrade Langflow to 1.9.0+. Disable auto-login. Pull AI dev tools behind VPN or zero-trust. Isolate port 7860.</p></td><td><p>“Our AI dev tools are reachable from the internet with login off. This exact flaw is under active attack now. We are pulling them behind access controls today.”</p></td></tr><tr><td><p><b>3. Can our prompt loader read files it should never touch?</b></p></td><td><p>LangChain-core CVE-2026-34070 (CVSS 7.5), path traversal in the prompt-loading API. Paired with deserialization CVE-2025-68664 (CVSS 9.3).</p></td><td><p>load_prompt() reads a config-supplied path with no traversal check, returning files such as the .env holding OPENAI_API_KEY and ANTHROPIC_API_KEY.</p></td><td><p>pip show langchain-core. Below 1.2.22 (1.x) or 0.3.86 (0.x) = vulnerable. Audit any code passing user-influenced paths to load_prompt().</p></td><td><p>Upgrade langchain-core past both fixes: 1.2.22 / 0.3.86 (CVE-2026-34070) and 1.2.5 / 0.3.81 (CVE-2025-68664). Replace load_prompt() with an allowlisted directory. Run as non-root.</p></td><td><p>“Our prompt system could be steered to read our API keys off disk. We are patching and removing the legacy loader.”</p></td></tr><tr><td><p><b>4. Does a compromised framework hand over every credential at once?</b></p></td><td><p>These frameworks are often deployed with provider keys, database credentials, and integration tokens available to the process environment. Cyera documents the credential-exfiltration path.</p></td><td><p>One RCE on the agent server exposes every secret the process can read. 
Blast radius is the full credential set, not one app.</p></td><td><p>Inventory which secrets each framework process can reach. Confirm keys come from a secrets manager, not static .env files.</p></td><td><p>Move provider keys to ephemeral injection. Rotate any key a vulnerable instance could have read. Scope each key to least privilege.</p></td><td><p>“A single break in one AI framework exposes the keys to every model and data store it touches. We are rotating and scoping them now.”</p></td></tr><tr><td><p><b>5. Are these frameworks running outside security governance?</b></p></td><td><p>A prior Langflow flaw, CVE-2025-34291, was weaponized by Iranian-linked MuddyWater and added to CISA KEV in May. Shadow AI is the new shadow IT.</p></td><td><p>Teams stand frameworks up for speed, give them credentials, and never bring them under review. The security team cannot see what it does not know exists.</p></td><td><p>Run a discovery sweep for AI frameworks outside change management. Map each to an owner and an approval record.</p></td><td><p>Assign every framework a documented owner and a place in the approval process. Offer a sanctioned alternative so teams do not route around you.</p></td><td><p>“We have AI frameworks in production that no one formally approved. We are bringing them under governance, not banning them.”</p></td></tr><tr><td><p><b>6. Can our scanners even see inside the framework at runtime?</b></p></td><td><p>Runtime detection is forming around this layer: CrowdStrike Falcon AIDR expanded to AWS June 17 (Bedrock, Kiro, Strands); its <a href="https://www.crowdstrike.com/en-us/press-releases/crowdstrike-expands-project-quiltworks-with-aws-hardening-the-cloud-attack-surface-against-frontier-ai-risk/">QuiltWorks coalition</a> now covers cloud workloads.</p></td><td><p>WAF reads HTTP at the edge. EDR watches the endpoint. 
By default, neither reliably models a msgpack decoder or a prompt loader three layers down in an imported framework as a separate trust boundary.</p></td><td><p>Test whether your AppSec scan covers third-party framework internals. Track CVEs by dependency, not just by what your edge tools can parse.</p></td><td><p>Add framework dependencies to vuln management. Treat agent output and stored state as untrusted. Patch on disclosure, not on KEV listing.</p></td><td><p>“Our scanners check our code, not the frameworks our code imports. We are closing that blind spot and patching on disclosure, not waiting for the federal catalog.”</p></td></tr></tbody></table><p><i>How to read this table: each row is one trust boundary, left to right, from the question to ask to the line to read your board.</i></p><h2><b>Give the board the deadline, not the technology</b></h2><p>The fixes are not a re-architecture. They are version bumps and config changes you can land this week. The exposure is the gap between the day the patch shipped and the day your team runs the checks, and right now that gap is measured in months. The frameworks did exactly what they were built to do. </p>]]></description>
            <author>louiswcolumbus@gmail.com (Louis Columbus)</author>
            <category>Security</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/5CFo8mBoW1WjItcZvYyHpg/3172659c88b4856fe7137de54672ab16/hero.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Fine-tuning forgets. RAG leaks context. Hypernetworks build the model your agent needs on demand.]]></title>
            <link>https://venturebeat.com/orchestration/fine-tuning-forgets-rag-leaks-context-hypernetworks-build-the-model-your-agent-needs-on-demand</link>
            <guid isPermaLink="false">6O4fBfCpDsG1PifGNWI4Vg</guid>
            <pubDate>Fri, 19 Jun 2026 16:30:50 GMT</pubDate>
            <description><![CDATA[<p>Enterprise teams keep watching the same thing happen. An AI agent demos beautifully, goes to production, and stalls: it runs for a short stretch, then needs a human to top up its context and check its output, and the promised efficiency drains into supervision. The agent did the work; you did the watching. It’s one reason so many agent pilots never turn into production systems.</p><p>The pitch on the other side of that wall is the one every team wants to believe: an agent that runs a long job on its own, overnight if it has to, and leaves a person to validate only the last 10%. Whether that is achievable turns on a problem the orchestration conversation mostly skips. When AI firm Chroma tested 18 leading models, <a href="https://www.morphllm.com/context-rot">every one lost accuracy as its input grew</a>, a property of how attention works, not a gap a stronger model closes. An agent fed more and more of your business as it runs does not get steadier. It gets shakier.</p><p>This is the layer beneath the orchestration race. Routing, durable execution and observability all assume each agent is already competent enough to coordinate in the first place. The deeper question is how long an agent can run before a human has to step in, and that comes down to where your company&#x27;s knowledge lives relative to the model. Both standard fixes leave a human in the loop.</p><h2>Why teaching a model your business keeps you in the loop</h2><p>Frontier models keep getting more capable, and the gap does not close, because it is not a capability problem. It is about where your knowledge sits relative to the model, and enterprises have had <a href="https://venturebeat.com/ai/fine-tuning-vs-in-context-learning-new-research-guides-better-llm-customization-for-real-world-tasks">two ways</a> to place it there. </p><p>The first is fine-tuning, which bakes knowledge into the weights. 
It remains subject to catastrophic forgetting, a problem identified in the 1980s and <a href="https://www.emergentmind.com/topics/catastrophic-forgetting-in-language-models">still unresolved in 2026</a>: teaching a model something new tends to erode what it already knew. Teams work around it by isolating each task in its own fine-tuned model or adapter, which produces a sprawling estate of models that <a href="https://www.infoworld.com/article/4131242/researchers-propose-a-self-distillation-fix-for-catastrophic-forgetting-in-llms.html">raises cost and governance overhead</a>. And a fine-tuned model is a snapshot, stale the day a policy changes, when the expensive, slow retraining cycle starts over.</p><p>The second is in-context learning, which skips retraining by placing the relevant policies in the prompt at run time. This is where context rot bites. Retrieval narrows what goes into the prompt, but an answer built on a retrieval miss looks just as confident as a correct one, and both cost and latency climb with every token added.</p><p>The two failures rhyme. With fine-tuning, the model can be confidently working from last quarter&#x27;s policy. With in-context learning, it can be confidently working without a detail it lost in the middle of a long prompt. Either way the output looks equally assured, so you cannot tell which parts are wrong without checking all of them. That is why the human never gets to leave. Some teams run both at once, fine-tuning the stable knowledge and retrieving the rest. That softens each failure but removes neither: on any given output you still cannot be sure the model is both current and working from the right context, so you still check it.</p><h2>A third path: generate the specialist model on demand</h2><p>A third approach is moving from research into early product. Instead of retraining one model or stuffing its prompt, a generator builds a small, task-specific model on demand from your policies, at inference time. 
The generator is a hypernetwork: a network whose output is the weights of another network. </p><p>The idea was <a href="https://arxiv.org/abs/1609.09106">named in 2016</a>; applying it to produce specialist language models from text or documents is recent and active. Sakana AI&#x27;s <a href="https://arxiv.org/abs/2506.06105">Text-to-LoRA</a>, presented at ICML 2025, generates a model adapter from a plain-language description in a single pass, and a 2026 system called SHINE calls hypernetwork adaptation <a href="https://arxiv.org/pdf/2602.06358">a promising new frontier</a>, precisely because it sidesteps both the retraining cost of fine-tuning and the context limits of prompting.</p><p>The point of generating adapters rather than training and storing them is to collapse a sprawling library of per-task LoRAs into one network that can produce them on demand, including for tasks it has not seen.</p><p>The elegant part is how this closes the loop on the problem above: the per-task adapter teams hand-build to dodge catastrophic forgetting is the same object a hypernetwork produces automatically. The model zoo stops being a governance headache and becomes a generated output.</p><p>The case for going small underneath all this was put most directly in a 2025 paper by <a href="https://arxiv.org/abs/2506.02153">Nvidia researchers</a>: for the narrow, repetitive tasks that fill agent workflows, small models are capable enough and 10 to 30 times cheaper to run than frontier generalists. Nace.AI, a Palo Alto company that raised a <a href="https://www.businesswire.com/news/home/20260505315897/en/">$21.5 million seed round in May</a>, is the clearest commercial instance. Its core technology, a generator it calls a MetaModel, <a href="https://nace.ai/research/enterprise-policy-injection-with-metamodels">produces parameter adaptations for a model at inference time</a> from a company&#x27;s policies, pointed at regulated work: audit, compliance, risk assessment. 
The company says its agents handle the bulk of a workflow while human experts validate the result, a split it markets as 90/10.</p><h2><b>How the three approaches compare</b></h2><table><tbody><tr><td><p>
</p></td><td><p><b>Fine-tuning</b></p></td><td><p><b>In-context / RAG</b></p></td><td><p><b>Hypernetwork-generated model</b></p></td></tr><tr><td><p><b>Where business knowledge lives</b></p></td><td><p>In the model&#x27;s weights</p></td><td><p>In the prompt, re-supplied each run</p></td><td><p>In on-demand generated weights</p></td></tr><tr><td><p><b>Cost to update on a policy change</b></p></td><td><p>High: retrain</p></td><td><p>Low: edit the source</p></td><td><p>Low: regenerate</p></td></tr><tr><td><p><b>Staleness</b></p></td><td><p>High: a snapshot</p></td><td><p>Low</p></td><td><p>Low: regenerated from current policy</p></td></tr><tr><td><p><b>Per-call cost and latency</b></p></td><td><p>Low</p></td><td><p>High, grows with context</p></td><td><p>Low at run time</p></td></tr><tr><td><p><b>Dominant failure mode</b></p></td><td><p>Forgetting; model-zoo sprawl</p></td><td><p>Context rot; silent retrieval misses</p></td><td><p>Generator quality; calibration</p></td></tr><tr><td><p><b>Who owns the improving asset</b></p></td><td><p>Whoever trains the model</p></td><td><p>Whoever holds the data store</p></td><td><p>Depends where generator and feedback live</p></td></tr></tbody></table><h2>Why a hypernetwork-built model raises the autonomy ceiling</h2><p>A model that is narrow, current and small has a smaller surface on which to be wrong. Fewer errors, confined to a known domain, mean fewer outputs an agent has to escalate to a person, which is the real basis for any high-autonomy claim. It is also where a number like 90/10 comes from: not a dial set in advance, but an outcome of how little the system needs to hand back. Reported autonomy shares are best read as measurements of an architecture, not as settings.</p><p>Two design choices decide whether that autonomy is trustworthy or merely fast. The first is grounding: tying every output to its source so a reviewer can verify rather than redo. 
Research models built for exactly this, such as <a href="https://arxiv.org/pdf/2510.00880">HalluGuard</a>, label each claim as supported or not and cite the passage they relied on. Nace ships its agents with grounding models and reasoning traces for the same reason. A 10% review only means something if the human can confirm provenance in seconds.</p><p>The second is the feedback loop, and it forces a question every buyer should ask: when your experts validate the output, whose model improves, and where does it live? That decides whether the compounding asset belongs to the vendor or to you. Arrangements differ. Nace, for instance, uses an external network of certified experts for some engagements and, for direct enterprise deployments, the customer&#x27;s own staff, with the resulting model kept inside the customer&#x27;s cloud. Each choice routes the learning, and the ownership, somewhere different.</p><h2>Where the third path breaks</h2><p>The approach is still early, and a few questions will decide how far it goes. Calibration is the linchpin: the value rests on the model knowing when it is unsure. And it is genuinely unsettled: recent work generating these adapters found they do not automatically improve calibration over ordinary fine-tuning, with gains appearing only under specific constraints. </p><p>The quality of the generated model also depends heavily on the policy data it is built from, which puts a premium on data curation. And scale is the open research frontier: the hypernetworks shown in published work so far have been small. This is where Nace&#x27;s own work gets interesting: in our interview, the company said it has scaled its generator well beyond those published sizes and derived a scaling law for how performance grows, results it has begun to share publicly and is now putting through peer review. 
If it holds up, it would help answer one of the central open questions in the field, and it is the paper worth watching.</p><p>Whichever approach wins, the work still ends at a human, and that handoff is its own design problem. When Deloitte Australia delivered a roughly A$440,000 government report, the report <a href="https://www.theregister.com/2025/10/06/deloitte_ai_report_australia/">shipped with fabricated citations and an invented court quote</a> after passing senior review, because the reviewers checked the conclusions, which were sound, and not the provenance, which was not. Controlled research suggests the pattern is general: experts <a href="https://academic.oup.com/pnasnexus/article/5/6/pgag146/8703788">corrected an identical flawed recommendation less often when it was labeled AI-generated</a>. </p><p>The EU AI Act&#x27;s <a href="https://artificialintelligenceact.eu/article/14/">Article 14</a> now names this automation bias. The lesson is not about any one vendor: a high autonomy share concentrates human attention into a thin, late slice of the work, so the value of that review depends entirely on whether the human can check provenance fast, which loops back to grounding.</p><h2>What to build, and what to ask before you buy</h2><p>The honest takeaway: what holds your agents back is usually not orchestration or model size, but whether the model knows your business well enough to be left alone, and the right fix depends on the job. To automate a long, repetitive, high-volume process end to end (say, running most of your internal audit overnight and having your own experts check the final slice), a hypernetwork-generated model is the approach most likely to do it cheaply and run long enough to matter. 
For a short task that finishes in a few steps and never needed to run unattended, the gap between this and a well-prompted frontier model shrinks to almost nothing, and is not worth the integration cost.</p><p>When a vendor pitches autonomous or specialist agents, four questions cut through it. </p><ol><li><p>Where does the business knowledge live: in the weights, the prompt, or generated on demand?</p></li><li><p>What does each output come with, so a reviewer can verify it instead of redoing it? </p></li><li><p>What decides which work gets escalated to a human? </p></li><li><p>And whose model improves from that feedback, and where does it run? </p></li></ol><p>The answers, not the headline ratio, tell you what you are buying.</p><p>The hypernetwork approach is the most credible attempt yet at making a small model know a specific business without forgetting it and without re-explaining it on every run. It is also the least proven, and the parts that matter most, calibration and scale, are still in peer review. For the right job, pilot it now. For the wrong one, the integration cost buys you little that a well-prompted frontier model wouldn&#x27;t.</p>]]></description>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/5i664fldp1fQIBiyfnykGq/ac7e785b49b229db8798fd5a95b3f895/Gemini_Generated_Image_rwbmyvrwbmyvrwbm.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Anthropic's Claude Code Artifacts update brings live, shared dashboards and interactive workspaces to enterprises]]></title>
            <link>https://venturebeat.com/data/anthropics-claude-code-artifacts-update-brings-live-shared-dashboards-and-interactive-workspaces-to-enterprises</link>
            <guid isPermaLink="false">6Lq2CNIADedZTWB9Vn7sUh</guid>
            <pubDate>Thu, 18 Jun 2026 23:18:59 GMT</pubDate>
            <description><![CDATA[<p>Anthropic announced a potentially game-changing new feature for users of Claude Code on the Claude Team and Enterprise subscription plans: <a href="https://claude.com/blog/artifacts-in-claude-code">Artifacts</a>. </p><p>This update turns a Claude Code session&#x27;s work into a live, interactive, shareable custom HTML webpage, allowing a Claude Code user to plug in live code and multiple data sources and surface the result at an interactive URL that they can send to teammates, be it a dashboard, an app design, or some other product meant for internal usage. </p><div></div><p>These teammates and the original user can watch the webpage update in real time as Claude Code goes about its work autonomously or under the user&#x27;s guidance, and as the connected data sources and codebases change. </p><p>While Anthropic first introduced Artifacts to its consumer web chatbot in the summer of 2024—where it evolved from a manual toggle feature to a generally available tool for publishing code snippets and games to the web—integrating this capability directly into the Claude Code command-line interface (CLI) and desktop app bridges the gap between deep, back-end engineering and the non-technical stakeholders who need to understand it.</p><h2><b>Product and Technology: The End of the Status Update</b></h2><p>At its core, Claude Code Artifacts acts as a dynamic translation layer. Built directly from the unbroken context of a user’s session, the agent uses the local repository codebase, connected monitoring tools, and conversational reasoning to spin up specialized web pages. </p><p>Engineers no longer need to wire up external data sources or stand up temporary infrastructure; the AI builds the UI from what already exists.</p><p>Crucially, these web pages are not static exports. As the AI works through a terminal session, the open webpage refreshes in place, updating charts and text instantly at the exact same URL. 
Every update publishes a new version history, allowing teammates to roll back or track the agent&#x27;s progress securely on desktop or mobile.</p><h2><b>The Battle of Live, Interactive, Shared AI Work Surfaces: Anthropic&#x27;s Claude Code Artifacts vs. OpenAI&#x27;s Codex Sites</b></h2><p>Anthropic&#x27;s update comes more than <a href="https://venturebeat.com/orchestration/openais-codex-update-lets-agents-build-interactive-enterprise-workspaces-via-sites-and-role-specific-plugins">two weeks after OpenAI released a massive update to its own Codex platform</a>, which introduced a strikingly similar enterprise hosting feature called &quot;Sites&quot;. </p><p>This tit-for-tat product cadence highlights a rapidly escalating battle over the enterprise workspace across functions and beyond developers themselves, though there are some important technical and philosophical distinctions worth pointing out for enterprises considering either.  </p><p>As revealed in their respective developer documentation webpages, <a href="https://developers.openai.com/codex/sites">OpenAI</a> is building a platform-as-a-service; <a href="https://code.claude.com/docs/en/artifacts#share-session-output-as-artifacts">Anthropic</a> is building a stateless canvas.</p><p>OpenAI’s Sites is designed to generate durable, full-stack web applications. According to the platform&#x27;s documentation, Codex Sites hosts projects that output as Cloudflare Worker-compatible ES modules. </p><p>Crucially, Sites supports persistent backend infrastructure: agents can automatically wire up &quot;D1&quot; relational databases for structured data (like user progress or saved records) and &quot;R2&quot; object storage for file uploads. An OpenAI Site can support public sign-ins, integrate with external identity providers, and allow highly specific access controls tailored to particular workspace groups. 
</p><p>It utilizes a two-stage publishing process—saving a reviewable candidate linked to a Git commit before officially deploying to production. In short, it is a production environment designed to replace functional internal SaaS tools.</p><p>Anthropic’s Claude Code Artifacts, by contrast, deliberately avoids the backend. The newly released documentation is blunt about its limitations: &quot;An artifact is a capture of work, not an application&quot;. </p><p>Each Artifact is a single, self-contained HTML page capped at a rendered size of 16 MiB. To guarantee organizational security, Claude wraps the published file in a strict Content Security Policy (CSP) that blocks all external network requests. </p><p>This means the page cannot load external scripts, fonts, or stylesheets, and <code>fetch</code>, XHR, and WebSocket calls are completely blocked. All CSS and JavaScript must be inlined, and images must be embedded as data URIs. Artifacts cannot store form input, call an API at view time, or serve multiple routes.</p><p>This technical limitation is actually Anthropic&#x27;s deliberate philosophical position: While OpenAI wants to spin up persistent software portals for the whole company, Anthropic is keeping Claude Code firmly anchored in ephemeral, highly secure technical workflows. Claude Artifacts are <i>not</i> meant to be software; they are meant to replace whiteboard diagrams, manual bug walkthroughs, and status reports with secure, self-updating visual tools that never leak live data outside the corporate boundary.</p><h2><b>Licensing and Enterprise Security: Keeping the Codebase Private</b></h2><p>Because these agents sit at the nexus of proprietary company data and live codebases, licensing and access controls are a primary concern. </p><p>Both Anthropic and OpenAI have opted for closed, proprietary licensing models for these new visual workspaces. For end users and developers, the distinction is critical. 
Unlike permissive open-source software (such as MIT or Apache 2.0) or strict copyleft licenses (like GPL)—which grant developers the legal freedom to inspect, modify, and self-host the underlying code—neither Claude Code Artifacts nor Codex Sites can be independently forked or hosted. </p><p>Enterprise clients do not maintain code-level ownership over Anthropic&#x27;s rendering engine or Codex’s integration nodes; both operate strictly within their <i>respective creators&#x27; managed infrastructures.</i></p><p>To make this vendor-managed approach palatable to enterprise compliance teams, both companies have heavily prioritized organizational security. Anthropic ensures every artifact is private to its author by default and strictly cannot be made public to the broader internet. When an engineer chooses to share a link, it is viewable exclusively by authenticated members of their specific organization. System administrators retain ultimate authority, managing access through org-level toggles, role-based scoping, and explicit retention policies, while maintaining oversight through a centralized compliance API.</p><p>OpenAI takes a similarly gated approach with Codex Sites, rolling the feature out primarily for ChatGPT Business and Enterprise workspaces. Like Anthropic, OpenAI relies on system administrators to manage deployment through centralized workspace settings, requiring an admin to explicitly enable Sites via role-based access control (RBAC) for Enterprise tiers.</p><p>However, because Codex Sites functions more like a hosted web application, its access controls are slightly more granular. When an engineer prepares to share a deployed URL, they can apply specific access modes: restricting the site to just themselves and workspace admins, opening it to all active users in the workspace, or limiting access to custom user groups. 
</p><p>Furthermore, to prevent sensitive data leaks, OpenAI provides a dedicated Sites panel to manage runtime environment variables and secrets securely, ensuring those keys do not have to be committed to local source files.</p><h2><b>Reactions and Reflections</b></h2><p>The introduction of visual, self-updating UI layers to command-line agents is fundamentally altering how developers view their own workflows. As AI handles the raw syntax and automates the reporting, the friction of communicating technical work to stakeholders is vanishing.</p><p>Boris Cherny, the Lead and creator of Claude Code, highlighted the sheer utility of the update in a <a href="https://x.com/bcherny/status/2067700226669060207?s=20">post on X earlier today</a>: </p><p>&quot;I&#x27;ve been using Artifacts in Claude Code for everything: visual explanations of tricky code, system diagrams, quick previews of a few animation options, data analyses and dashboards I share with the team,&quot; Cherny wrote. &quot;They are a game changer for how I work with Claude. Can&#x27;t wait to hear what you think!&quot;</p><p>This sentiment is practically demonstrated in Anthropic’s launch materials. In one scenario, an engineer prompts Claude Code to investigate user drop-offs since a previous software release. </p><p>In a matter of seconds, the agent executes an SQL read, builds an interactive drop-off funnel dashboard, and diagnoses that &quot;Pro accounts stall at the export sheet&quot;. The AI then proposes UI fixes, updates the live charts as the code is refactored, and generates a secure link that a manager can instantly open via mobile.</p><p>By turning the terminal into a live, collaborative canvas, Anthropic is proving that the most valuable output of an AI coding assistant isn&#x27;t just the code itself—it is the context, the reasoning, and the ability to share that work instantly.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/2rGZZLXFe3kRoiPC50Ycwv/a32853493a91cece13baf19f06f38fbb/ChatGPT_Image_Jun_18__2026__07_12_17_PM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[New AI optimization framework beats Claude Code and Codex by 2.5x on the same compute budget]]></title>
            <link>https://venturebeat.com/orchestration/new-ai-optimization-framework-beats-claude-code-and-codex-by-2-5x-on-the-same-compute-budget</link>
            <guid isPermaLink="false">3lOjpmJB6qSEZAFnXyZLg1</guid>
            <pubDate>Thu, 18 Jun 2026 18:13:54 GMT</pubDate>
            <description><![CDATA[<p>Imagine your engineering team just deployed an AI agent to search through internal company documents and answer employee questions. It works perfectly in development, but in production, it consistently hallucinates or misses key constraints. Fixing this is rarely a simple patch. It requires a tedious, trial-and-error process of tweaking chunking strategies, retrieval methods, and system prompts simultaneously. Because these adjustments are entangled, it becomes nearly impossible to tell which specific tweak actually solved the problem. </p><p>To address this challenge, researchers at Renmin University of China and Microsoft Research introduced <a href="https://arxiv.org/abs/2606.11926">Arbor</a>, a framework that upgrades AI-driven research and optimization from a sequence of trial-and-error guesses into a cumulative learning process. Arbor organizes hypotheses, experiments, and insights into a tree that helps the system learn from prior failures to make smarter, verified improvements over time.</p><p>In practical tests, Arbor delivered more than 2.5 times the verifiable performance gains of standard AI coding agents across real-world engineering tasks while operating under the same resource budget. </p><p>For enterprise AI, this technique directly translates to automating the continuous improvement of complex, real-world engineering systems.</p><h2>Understanding the bottleneck in autonomous optimization</h2><p>As large language models and AI systems become more capable, they are expected to carry out more complex operations, including autonomous optimization (AO) of software systems such as agent harnesses or model training algorithms. </p><p>AO captures the fundamental loop of autonomous research. An AI agent starts with an initial mutable artifact, such as a machine learning codebase or data pipeline, and a specific objective. 
The agent&#x27;s goal is to iteratively improve this artifact through experimental feedback without step-by-step human supervision.</p><p>The main challenge of AO is often misunderstood. Many engineering teams find that simply giving a coding agent more time or compute to optimize a codebase doesn&#x27;t lead to better results. &quot;Automation can keep an AI working for a very long time — but a loop is not the same as progress,&quot; Jiajie Jin, co-author of the paper, told VentureBeat. &quot;If the goal is vague, or the metric is easy to hack, long-running automation often just produces &#x27;improvements&#x27; faster that nobody actually wants.&quot;</p><p>Jin explains that complex tasks take many attempts to get right, and standard agent architectures are missing the critical data structure to maintain state. &quot;How do you make sure the insight and experience from each attempt actually accumulate, instead of getting lost in a scrollback buffer?&quot; he said. Without this structure, agents simply repeat the same mistakes.</p><p>Current agent systems can run experiments for many hours against well-specified goals: editing code, invoking tools, running tests autonomously. But they treat each attempt in isolation, missing the structural mechanisms that would let them accumulate and act on what they&#x27;ve learned.</p><p>They lack the capacity to simultaneously maintain and compare multiple competing research directions. Without this, they cannot interpret both successes and failures to reshape their future exploration, which is the core mechanism that makes human research cumulative.</p><p>General coding agents typically rely on conversation transcripts for their memory. Because AO tasks span hundreds of turns and easily exceed context window limits, these agents struggle to preserve and reuse factual evidence over long histories. 
As a result, they lose the overarching structure of the research process and are prone to stalling on early failures or chasing noisy evaluation swings. The system needs a structured, durable memory that records what directions have been tried, what factual evidence was produced, and how each result changes the space of future hypotheses.</p><p>Existing frameworks are also prone to reward hacking and overfitting to development metrics. This makes them create the illusion of progress without producing improvements that transfer to real-world performance.</p><p>Finally, general-purpose coding agents typically chain their tool calls on a single shared working tree. This architectural limitation prevents them from testing parallel hypotheses in isolated environments without corrupting the main codebase or obscuring which hypothesis caused a specific outcome.</p><h2>The Arbor framework</h2><p>Arbor solves the challenges of AO with a framework that automates the long-horizon loop of exploration, experimentation, and abstraction that characterizes human research. Arbor separates the strategic direction of research from the ground-level coding tasks with two key components:</p><p><b>The coordinator:</b> A long-lived AI agent that acts like a principal investigator. It never directly edits the target codebase. Instead, it owns the general state of the optimization research, observes accumulated evidence, comes up with new hypotheses and directions to explore, and decides what to do with the results of experiments.</p><p><b>Executors:</b> Short-lived, highly focused AI agents. When the coordinator wants to test an idea, it spins up an executor and places it in an isolated environment, essentially a fresh git worktree. Each executor is handed one hypothesis. 
It implements the assigned idea, runs evaluations, debugs errors, and reports back to the coordinator with the results and created artifacts.</p><p>These two components collaborate through a mechanism that the researchers call “Hypothesis Tree Refinement” (HTR). HTR represents the entire research process as a persistent, branching tree where every node binds together four things: a hypothesis, the executable artifact, the factual evidence produced, and a distilled insight. This means the coordinator can explore multiple competing directions at the same time without losing its place.</p><p>The coordinator builds the tree by placing broad ideas near the root, while concrete refinements branch out as leaves. This allows Arbor to safely explore multiple competing hypotheses simultaneously. If an executor&#x27;s experiment fails, the tree records why it failed as a negative constraint, ensuring the system doesn&#x27;t endlessly repeat the same mistake.</p><p>To understand why Arbor&#x27;s isolation matters, consider a common enterprise scenario: optimizing a <a href="https://venturebeat.com/orchestration/architectural-patterns-for-graph-enhanced-rag-moving-beyond-vector-search-in-production">Retrieval-Augmented Generation</a> (RAG) pipeline for an internal AI assistant. &quot;When you ask a single agent like Claude Code or Codex to &#x27;improve accuracy,&#x27; it will typically change a bunch of things in one pass — chunking, the prompt, the retrieval method,&quot; Jin said. This entangles the changes, making it impossible to attribute which one actually helped. It also directly mutates the repository without isolation. </p><p>Arbor solves this by treating each lever as a separate hypothesis. Chunking becomes one branch, retrieval another, and the prompt another — each implemented and evaluated in its own isolated git worktree. 
&quot;So you get clean attribution: &#x27;constraint decomposition on the retrieval side gave +X; breadth-first search actually hurt,&#x27;&quot; Jin said.</p><p>When an executor returns a report, the coordinator writes the evidence to the tree and backpropagates the insight upward to parent nodes. This means a local observation becomes a generalized constraint that shapes the coordinator&#x27;s future idea generation.</p><p>To prevent reward hacking or overfitting to the development data, HTR enforces a strict “merge gate.” Even if an executor reports a fantastic development score, the coordinator will spin up an isolated worktree to test the candidate against a held-out test evaluator. The artifact is only merged into the current best trunk if it demonstrably improves the test score, verifying that the progress is real.</p><p>Arbor generally falls under the concept of &quot;<a href="https://addyosmani.com/blog/loop-engineering/">loop engineering</a>,&quot; popularized by industry figures like OpenClaw creator Peter Steinberger and Claude Code lead Boris Cherny. The idea is to move beyond single prompts to design iterative cycles (observe, reason, act, verify) that drive autonomous agents. However, as Jin points out, &quot;A loop can fill up with messy, untraceable attempts, and you end up with nothing to show and no way to reconstruct what changed.&quot; </p><h2>Arbor in action</h2><p>The researchers evaluated Arbor on an autonomous optimization task suite built from real-world research settings and the MLE-Bench Lite machine learning engineering benchmark. The AO suite featured tasks from different areas of AI development, including model training, harness engineering, and data synthesis.</p><p>The researchers used different backbone models for the coordinator and executor agents, including Claude Opus 4.6, GPT-5.5, and Gemini-3-Flash. They tested Arbor against the strongest coding agents, Codex and Claude Code. 
Arbor and the baselines were given the same resources. For the MLE-Bench Lite tasks, Arbor was also compared against top-tier agentic research systems like AI-Scientist, ML-Master, and AIDE.</p><p>Arbor consistently outperformed the baselines. It achieved the best held-out test result on all tasks, attaining more than 2.5 times the average relative gain of Codex and Claude Code. On the BrowseComp task, which involves optimizing a search agent, Arbor improved the system&#x27;s held-out accuracy from a baseline of 45.33% to 67.67%. Meanwhile, Codex and Claude Code stalled at 50% and 53.33%, respectively. On MLE-Bench Lite, when equipped with GPT-5.5, Arbor achieved the strongest result among all benchmarked systems.</p><p>Arbor proved to be resilient against overfitting. For example, during the Terminal-Bench 2.0 task experiments, Claude Code achieved a high development score of 75 but its score dropped to 71 on the held-out data. Arbor had a lower development score of 72.22 but achieved the highest held-out score of 77.36, ensuring its results transfer to real-world applications.</p><p>Arbor also showed generalization in a cross-task transfer experiment. After Arbor finished optimizing the search harness for the BrowseComp task, researchers took the optimized codebase and tested it on two unrelated search-agent tasks, HLE and DeepSearchQA. Arbor&#x27;s optimized codebase significantly improved performance on those unseen tasks as well.</p><h2>Deploying Arbor: Sweet spots and hidden costs</h2><p>For engineering leads looking to drop Arbor into their existing tech stack, the framework is designed to sit on top of existing Git workflows rather than replacing them. &quot;Its output is an ordinary git branch that your existing code review, CI, and human review can inspect directly,&quot; Jin said. 
Only verified gains are merged into a per-run trunk, leaving the main repository untouched until a developer manually chooses to promote the code.</p><p>However, deploying Arbor comes with specific tradeoffs. Jin points out that the biggest catch is token cost, as maintaining a long-lived coordinator that continuously manages the tree and dispatches executors is the dominant expense. Running multiple isolated worktrees concurrently also requires genuine compute and disk resources to process real experiments.</p><p>So where is Arbor&#x27;s sweet spot? According to Jin, it excels at tasks with a clear, trustworthy metric, tolerance for a long time horizon, and a real search space with several plausible directions, such as pipeline optimization, data-synthesis quality, and model-training recipe tuning. </p><p>Conversely, teams should explicitly avoid using Arbor for real-time latency tasks, obvious one-line fixes, or when the underlying evaluation metric is flawed. The quality ceiling of the entire run is strictly bounded by the quality of the evaluator. &quot;If the metric isn&#x27;t trustworthy, Arbor will just optimize toward an untrustworthy result faster,&quot; Jin said.</p><p>Jin sees the next evolution going beyond single scalar metrics. &quot;A natural evolution is to have each node&#x27;s artifact carry a vector — accuracy, latency, cost — instead of a single score,&quot; Jin said. &quot;Going from a single scalar to a multi-objective Pareto search is a very natural extension of the framework.&quot;</p>]]></description>
            <author>bendee983@gmail.com (Ben Dickson)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/4cekVJ1BS9ADYlnh0s4ezc/d00dd7900cb08ca4bd77a3060898fb33/arbor.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Copilot searched your mailbox. LiteLLM handed out admin keys. Run this 5-check audit before your stack is next]]></title>
            <link>https://venturebeat.com/security/copilot-searched-your-mailbox-litellm-handed-out-admin</link>
            <guid isPermaLink="false">1a7xSZvdjuJg9QOkTPSMyW</guid>
            <pubDate>Thu, 18 Jun 2026 17:42:49 GMT</pubDate>
            <description><![CDATA[<p>Two AI tools broke in the same way in the same two weeks, and four research teams proved it. The pattern underneath every disclosure is one sentence: enterprise AI accepts external input with no trust boundary. </p><p>On June 15, Varonis disclosed <a href="https://www.varonis.com/blog/searchleak">SearchLeak (CVE-2026-42824)</a>, a proof-of-concept exfiltration chain in Microsoft 365 Copilot Enterprise Search. A victim clicks a crafted microsoft.com URL, Copilot searches their mailbox, and the data leaves through a Bing SSRF. No plugins, no second click, no visible indicator. Four days earlier, Obsidian Security published a <a href="https://www.obsidiansecurity.com/blog/litellm-privilege-escalation-rce">three-CVE chain against LiteLLM</a> that carried a default low-privilege user all the way to admin and remote code execution. Two tools. Two teams. One broken boundary.</p><p>The five-check audit at the end of this article maps each gap to a CVE or a market signal from June, a command you can run before lunch, and a sentence a CISO can read to the board.</p><h2>Copilot turned a trusted URL into an exfiltration engine</h2><p>SearchLeak chained three weaknesses into a silent data-theft chain. The URL q parameter fed attacker instructions straight to Copilot’s LLM. A rendering race condition fired an image tag before the output sanitizer ran. Bing’s image-search endpoint, allowlisted in the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP">Content Security Policy</a>, routed the stolen data out. Microsoft rated the flaw critical and patched it on the back end, according to Varonis. <a href="https://nvd.nist.gov/vuln/detail/CVE-2026-42824">NVD has not yet scored it</a>; a third-party tracker lists it at 6.5 medium. The severity is contested, but the mechanism is not.</p><p>The escalation is the real story. 
This is the third Varonis Copilot exfiltration chain in twelve months, after <a href="https://arstechnica.com/security/2026/01/a-single-click-mounted-a-covert-multistage-attack-against-copilot/">Reprompt</a> in January and <a href="https://www.bleepingcomputer.com/news/security/new-attack-turned-microsoft-365-copilot-into-1-click-data-theft-tool/">EchoLeak</a> in 2025. Reprompt hit Copilot Personal. SearchLeak hit Enterprise Search. Enterprise inherits the user’s full organizational permissions, so the blast radius is everything that a user can reach.</p><h2>LiteLLM handed a default account to every provider key</h2><p>The LiteLLM gateway holds the keys for OpenAI, Anthropic, Azure, and Bedrock behind a single proxy. The Obsidian chain runs in three moves. <a href="https://cvefeed.io/vuln/detail/CVE-2026-47101">CVE-2026-47101</a>, an authorization bypass, lets a non-admin mint a wildcard API key. CVE-2026-47102 promotes that caller to proxy admin through an unguarded /user/update endpoint. CVE-2026-40217 escapes the code sandbox through exec() with full builtins. Obsidian then demonstrated a reverse shell by injecting a forged tool-call response through LiteLLM’s callback mechanism. Obsidian assessed the combined chain at CVSS 9.9. The developer typed one word. The attacker popped a shell.</p><p>A separate LiteLLM flaw made the urgency immediate. <a href="https://thehackernews.com/2026/06/litellm-flaw-cve-2026-42271-exploited.html">CVE-2026-42271</a>, a command-injection bug in the MCP test endpoints, landed on the <a href="https://www.cisa.gov/known-exploited-vulnerabilities-catalog">CISA KEV list</a> on June 8 with a June 22 remediation deadline. That KEV entry is not the Obsidian chain. The two are distinct disclosures four days apart, fixed in different releases, pointed at the same gateway. LiteLLM carries more than 40,000 GitHub stars and sits in thousands of enterprise deployments. This is not the first scare, either. 
A <a href="https://thehackernews.com/2026/06/litellm-vulnerability-chain-lets-low.html">supply-chain compromise backdoored LiteLLM versions 1.82.7 and 1.82.8 on PyPI in March</a>. A compromised gateway exposes every provider credential the organization holds.</p><h2>Langflow and Mini Shai-Hulud proved the pattern scales</h2><p>The same boundary broke in two more tools in the same fortnight. <a href="https://thehackernews.com/2026/06/unpatched-langflow-flaw-cve-2026-5027.html">Langflow CVE-2026-5027</a> became the third Langflow remote-code-execution flaw to hit active exploitation this year. A path traversal in file upload lets an attacker write files anywhere on disk, and because Langflow ships with auto-login enabled by default, a single unauthenticated request reaches RCE. <a href="https://www.vulncheck.com/">VulnCheck</a> confirmed exploitation on June 9. Censys counted roughly 7,000 exposed instances, the heaviest concentration in North America, with <a href="https://attack.mitre.org/groups/G0069/">MuddyWater</a> attribution.</p><p>The <a href="https://www.securityweek.com/over-100-npm-pypi-packages-hit-in-new-shai-hulud-supply-chain-attacks/">Mini Shai-Hulud campaign</a> hit a different pressure point. After the worm’s source code went public on May 12, copycat variants <a href="https://socket.dev/blog/mini-shai-hulud-campaign-hits-red-hat-cloud-services-npm-packages">compromised 32 Red Hat Cloud Services npm packages</a> on June 1, packages pulled 80,000 times a week. The worm harvests more than 20 credential types and self-propagates under the compromised maintainer’s identity.</p><p>Four teams, four tools, one operating failure. The bug classes differ. SearchLeak is a prompt injection. LiteLLM is privilege escalation. Langflow is path traversal. Mini Shai-Hulud is supply-chain poisoning. 
The boundary that broke is the same in all four.</p><h2>The market already repriced the risk</h2><p>CrowdStrike’s <a href="https://www.fool.com/earnings/call-transcripts/2026/06/03/crowdstrike-crwd-q1-2027-earnings-transcript/">Q1 FY27 earnings call</a> put a number on the gap. <a href="https://www.crowdstrike.com/en-us/platform/falcon-aidr-ai-detection-and-response/">AIDR</a>, the company’s AI detection and response line, grew ending ARR more than 250% sequentially, with a Q2 pipeline above $50 million (<a href="https://www.sec.gov/Archives/edgar/data/0001535527/000153552726000022/crwd-20260603xex991.htm">SEC-filed 8-K</a>). Total company ARR reached $5.51 billion, and CrowdStrike’s fleet telemetry shows more than 1,800 agentic applications running across enterprise endpoints. </p><p>On June 17, the company <a href="https://www.crowdstrike.com/en-us/press-releases/crowdstrike-advances-ai-and-cloud-security-operations-on-aws/">extended AIDR to AWS</a>, adding real-time evaluation of agent, LLM, and MCP communications across Amazon Bedrock, Kiro, and Strands Agents, building on its work with <a href="https://www.anthropic.com/glasswing">Anthropic’s Project Glasswing</a>. Daniel Bernard, CrowdStrike’s chief business officer, said the AI attack surface now spans development, runtime, identities, and cloud infrastructure, and that teams treating those as separate domains leave the gaps between them open.</p><h2>Practitioners name the same gap in plainer terms</h2><p>David Levin, CISO at American Express Global Business Travel, <a href="https://venturebeat.com/security/amex-ciso-fights-threats-at-machine-speed-with-ai/">told VentureBeat</a> the pattern does not surprise him. “We kind of have this shadow AI, which is just the new version of shadow IT,” Levin said. </p><p>Both Langflow and LiteLLM fit the description. Teams stood them up for convenience, gave them credentials, and never brought them under governance. Levin puts the fix before deployment. 
“We didn’t go into this with just saying we’re going to go do this without the right fundamentals,” he said. “We leverage NIST controls. NIST has released their CSF along with their AI framework. OWASP released their top 10. You need the right fundamentals before you deploy.”</p><p>Merritt Baer, CSO at Enkrypt AI and former AWS Deputy CISO, named the structural version of the failure in a separate <a href="https://venturebeat.com/security/most-enterprises-cant-stop-stage-three-ai-agent-threats-venturebeat-survey-finds">VentureBeat interview</a>. “Enterprises believe they’ve ‘approved’ AI vendors, but what they’ve actually approved is an interface, not the underlying system,” Baer said. “The real dependencies are one or two layers deeper, and those are the ones that fail under stress.” She has tied that directly to how systems fall. “Raw zero-days aren’t how most systems get compromised. Composability is,” Baer <a href="https://venturebeat.com/security/adversaries-hijacked-ai-security-tools-at-90-organizations-the-next-wave-has-write-access-to-the-firewall">told VentureBeat</a>. “It’s the glue between the model and your data where the risk lives. If you give an agent bash and a root token, you’ve already done most of the attacker’s work for them.” That is what rows 2 and 4 of the audit test: the gateway that holds every key, and the agent identity no one governs.</p><p>Levin had a sharper frame for the boardroom. “You need to talk more in terms of risk versus compliance to your boards and your executives,” he said. “It’s not about the size of the engineering team anymore. It’s the size of your imagination. It’s all written in plain English. It’s not hard for anyone.” Neither SearchLeak nor LiteLLM needed custom malware or a zero-day to work.</p><p>Adam Meyers, CrowdStrike’s SVP of Intelligence, put the operational squeeze in numbers in an exclusive VentureBeat interview. “The problem is not zero-day. The problem is patching. 
If you 10x that problem, they’re gonna be completely underwater,” Meyers said. He pointed to identity as the second front. “Some of these AI have their own identities, or people give their identity to the AI to take action on their behalf, and that makes it a very complex problem.”</p><h2>The five-check trust-boundary audit</h2><p>Each row maps a gap to its proof point, a verification command for Monday morning, the fix, and the sentence to read to the board.</p><table><tbody><tr><td><p><b>Trust-Boundary Gap</b></p></td><td><p><b>Proof Point</b></p></td><td><p><b>What Broke</b></p></td><td><p><b>Verify Monday</b></p></td><td><p><b>Fix Monday</b></p></td><td><p><b>Board Language</b></p></td></tr><tr><td><p><b>1. Prompt-to-Data</b></p></td><td><p>SearchLeak CVE-2026-42824. P2P injection + HTML race + Bing SSRF. One-click mailbox exfiltration via microsoft.com URL. PoC demonstrated; Microsoft rated it critical, NVD not yet scored.</p></td><td><p>URL q-parameter passed to LLM as instructions. Sanitizer ran after render. Bing acted as exfiltration proxy via CSP allowlist.</p></td><td><p>Audit CSP allowlists for domains performing server-side fetches. Monitor Copilot Search URLs for encoded payloads. Review Copilot audit logs.</p></td><td><p>Confirm server-side patch applied. Enable sensitivity labels restricting Copilot. Treat AI streaming output as untrusted.</p></td><td><p>“Our AI assistant could search employee email and send results to an attacker through a trusted Microsoft URL. Vendor patched it. We must verify configuration.”</p></td></tr><tr><td><p><b>2. Gateway Credential Exposure</b></p></td><td><p>LiteLLM three-CVE chain (-47101, -47102, -40217). CVSS 9.9. Separate CVE-2026-42271 on CISA KEV (fixed in v1.83.7; full chain fixed in v1.83.14-stable). June 22 deadline.</p></td><td><p>No role validation on key endpoints. Self-promotion to admin via /user/update. exec() sandbox escape. One gateway exposes all provider keys.</p></td><td><p>Run pip show litellm. 
Below 1.83.14-stable = vulnerable. Check /mcp-rest/test/ exposure. Audit proxy_admin accounts.</p></td><td><p>Upgrade to v1.83.14-stable+. Rotate all provider API keys. Block /mcp-rest/test/* at proxy. Review Custom Code Guardrails.</p></td><td><p>“Our AI gateway held keys for every provider. A default account could promote itself to admin and steal them all. Rotating and patching now.”</p></td></tr><tr><td><p><b>3. AI Tooling Sprawl</b></p></td><td><p>Langflow CVE-2026-5027 (CVSS 8.8). Third RCE of 2026. ~7,000 exposed instances. MuddyWater. Active exploitation June 9.</p></td><td><p>Path traversal in file upload. Auto-login enabled by default. Single unauthenticated request to RCE.</p></td><td><p>Query Censys/Shodan for Langflow, Flowise, n8n, Dify on your perimeter. Check auto-login. Inventory AI tools outside change management.</p></td><td><p>Pull AI platforms behind VPN/zero-trust. Enable auth everywhere. Upgrade Langflow to v1.9.0+ (current release 1.10.0). Fingerprint surface continuously.</p></td><td><p>“AI dev tools are exposed to the internet with login disabled. A nation-state group is exploiting this flaw now. Pulling behind access controls today.”</p></td></tr><tr><td><p><b>4. Non-Human Identity Governance</b></p></td><td><p>AIDR ARR up 250% (Q1 FY27, SEC 8-K). Q2 pipeline &gt;$50M. 1,800+ agentic apps across enterprise endpoints.</p></td><td><p>Agents hold identities and act on behalf of humans. Some exceed their intended scope to reach a goal. No standard governs agent credential lifecycle.</p></td><td><p>Inventory all non-human identities used by agents and MCP servers. Map agent-to-data-store access. Flag agents with write access to security policy.</p></td><td><p>Least-privilege every agent identity. Set privilege boundaries via identity protection. Runtime detection for policy-exceeding actions. Human-in-the-loop for policy changes.</p></td><td><p>“AI agents hold credentials and act autonomously. 
We do not govern their identity lifecycle like human access. The 250% market growth tells us this gap is systemic.”</p></td></tr><tr><td><p><b>5. Runtime Agentic Detection</b></p></td><td><p>Falcon AIDR expanded to AWS (June 17). Covers Bedrock, Kiro, Strands Agents. MCP integration. Real-time agent/LLM/MCP evaluation.</p></td><td><p>Traditional tools monitor human-speed actions. Agents run at machine speed, thousands of actions per minute, and route around controls to reach goals.</p></td><td><p>Test if EDR/XDR links agent actions to originating identity. Verify SIEM ingests MCP communications. Confirm you can distinguish human from agent on endpoint.</p></td><td><p>Deploy AIDR or equivalent runtime detection. Shadow-AI discovery for all agentic apps, models, MCP servers, identities. Real-time policy enforcement on agent actions.</p></td><td><p>“We cannot distinguish a human employee from an AI agent acting on their behalf. We need runtime detection at machine speed that can stop damage before it starts.”</p></td></tr></tbody></table><h2>The fix is plumbing, not policy</h2><p>The <a href="https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/">June 2 executive order</a> creates an AI Cybersecurity Clearinghouse with a July 2 deadline. The five gaps above are not frontier-model problems. They are plumbing problems in the gateways, orchestration platforms, identity layers, and runtime environments where AI meets the enterprise. </p><p>The audit is five rows. Every row maps to a June disclosure or market signal, a command a team can run before lunch, and a sentence a CISO can read to the board. The question is not whether your vendor will patch. It&#x27;s whether you find the gap first — or whether an attacker finds it the way they found Copilot and LiteLLM.</p>]]></description>
            <author>louiswcolumbus@gmail.com (Louis Columbus)</author>
            <category>Security</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/599hDEEWXHzpIDiNVQFFsc/069254d665cc4a88ccee32f955648c72/hero.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Adobe embeds agentic AI workflows across Creative Cloud, shifting from media generation to production orchestration]]></title>
            <link>https://venturebeat.com/orchestration/adobe-embeds-agentic-ai-workflows-across-creative-cloud-shifting-from-media-generation-to-production-orchestration</link>
            <guid isPermaLink="false">2zp4nMWlfrCETzjiYTxTOp</guid>
            <pubDate>Thu, 18 Jun 2026 14:33:04 GMT</pubDate>
            <description><![CDATA[<p><a href="https://blog.adobe.com/en/publish/2026/06/18/adobe-firefly-introduces-new-agentic-capabilities-and-an-upgraded-creative-ai-studio-built-for-the-way-you-work">Adobe has announced</a> a major expansion of its &quot;creative agent&quot; across its flagship Creative Cloud suite and upgraded Firefly AI studio. </p><p>Available in public beta starting today across Premiere Pro, Photoshop, Illustrator, InDesign, and Frame.io, the agent is designed to serve everyone from individual creators to enterprise marketing teams. </p><p>Unlike first-generation generative AI tools that simply output flat media from a chat interface, Adobe’s embedded assistant acts as an orchestration layer. </p><p>It interprets natural language prompts and directly accesses the underlying software&#x27;s APIs to execute complex, multi-step production workflows—from batch-renaming video sequences to dynamically updating brand assets across print layouts—while leaving the final aesthetic decisions entirely in the hands of the human designer. </p><h3><b>Technology: Contextual Memory and DOM Manipulation</b></h3><p>At the core of this release is a significant technical upgrade to how Adobe&#x27;s AI handles persistent memory and context window management. In its upgraded Firefly creative AI studio—currently in private beta—Adobe has introduced two foundational architectural components: &quot;Elements&quot; and &quot;Projects&quot;. </p><ul><li><p><b>Elements</b> functions as a visual variables library, allowing users to save and reuse specific characters, locations, and objects across multiple generations to ensure strict visual consistency as campaigns scale. </p></li><li><p><b>Projects</b> acts as the contextual memory layer, storing assets, generations, and session history in a unified space so users can pick up where they left off without rebuilding their prompt context. 
</p></li></ul><p>Beyond pixel generation, the system&#x27;s most critical technological leap is its ability to operate seamlessly within the complex document structures of desktop applications. &quot;Our Adobe Creative Agent can leverage the decades of powerful features, workflows, APIs that we&#x27;ve brought into our application and exposed through tooling that can now be invoked through a creative agent,&quot; an Adobe representative explained. </p><h3><b>Product: Automating the Tedious, Expanding the Canvas</b></h3><p>The practical application of this technology fundamentally alters standard production workflows. Adobe is positioning the human user as a &quot;creative director&quot; capable of delegating repetitive, labor-intensive tasks to the AI. The rollout introduces highly specific specialist agents tailored to the logic of each application: </p><ul><li><p><b>Premiere Pro:</b> The agent handles tedious project setup, analyzing and sorting source media into bins, batch renaming clips, identifying interview questions, and assembling a rough working starting point. </p></li><li><p><b>Illustrator:</b> The assistant automates mathematical and multi-step design tasks, such as generating 50 versioned files from a spreadsheet or running pre-flight checks to flag color mode errors before printing. It can even programmatically duplicate a vector shape 100 times, randomize its position, and change its size based on its z-depth and transparency. </p></li><li><p><b>Photoshop &amp; InDesign:</b> The agent executes batch background removals, dynamic layer organization, and applies brand updates across multi-page layouts. </p></li></ul><p>Furthermore, Adobe is actively integrating its creative agent into major third-party enterprise platforms, including OpenAI&#x27;s ChatGPT, Anthropic&#x27;s Claude, Microsoft 365 Copilot, and soon, Google Gemini and Slack. 
</p><h3><b>Licensing: Commercial SaaS and Enterprise Implications</b></h3><p>Unlike open-source orchestration frameworks or models released under MIT or Apache licenses, Adobe&#x27;s creative agent operates strictly within a proprietary, commercial SaaS ecosystem. For enterprise decision-makers, this carries specific implications. Because the agent relies on Adobe&#x27;s proprietary APIs to manipulate project files, it requires an active Creative Cloud commercial license. Additionally, because Adobe is bringing the &quot;Adobe for creativity connector&quot; to platforms like Slack and Microsoft Copilot, enterprise IT and systems architects must consider how internal chat tools will interface with Adobe&#x27;s cloud processing environments to support enterprise creative and marketing teams securely. </p><h3><b>The Enterprise Unknowns: APIs, Governance, and Architecture</b></h3><p>While Adobe’s announcements highlight a powerful user interface and deep integration within its own flagship applications, several critical questions remain for enterprise technical decision-makers tasked with building bespoke AI systems. VentureBeat has reached out to Adobe for clarification on these infrastructure-level details and will update this coverage as we learn more.</p><p>For AI system architects, the value of a creative agent lies not just in a native application UI, but in its extensibility. It remains unclear if Adobe plans to expose these new agentic capabilities via API, or if the company will support the Model Context Protocol (MCP). Without MCP support or direct API access, enterprise teams will face friction integrating Adobe&#x27;s tools into their own custom task-routing frameworks and internal LLM pipelines.</p><p>Adobe’s new &quot;Elements&quot; feature promises to solve the generative AI consistency problem by anchoring characters and objects across generations. </p><p>However, the backend architecture driving this persistent memory is not yet detailed. 
Whether Adobe is leveraging on-the-fly Low-Rank Adaptation (LoRA) based on user uploads or utilizing a form of visual Retrieval-Augmented Generation (RAG) is a critical distinction for technology leaders managing compute costs, model evaluations, and enterprise-grade inference pipelines.</p><p>As organizations build out &quot;Projects&quot; and define brand-specific &quot;Elements&quot;, security and data decision-makers require strict guarantees regarding data provenance and storage. It is currently unknown exactly where this contextual workflow and vector data lives—specifically, whether it remains strictly sandboxed within the customer&#x27;s enterprise Creative Cloud instance on Adobe servers, and how role-based permissions apply to these new agentic workflows.</p><p>Finally, as lightning-fast, developer-first, multi-model AI creative platforms like <a href="https://www.linkedin.com/posts/toddj0_running-out-of-new-ways-to-describe-just-share-7356718780363796481-zREK/">fal.ai gain significant traction</a> among enterprises and developers, Adobe’s position in the broader developer ecosystem remains a point of interest. </p><p>Whether Adobe views these infrastructure-level API providers as direct competitors to its Firefly AI studio or as potential integration points for bespoke enterprise environments has yet to be seen.</p><h3><b>Community Reactions: The Tension Between Automation and Craft</b></h3><p>The integration of agentic AI touches on the tension between eliminating drudgery and surrendering creative control. According to Adobe&#x27;s recent Creators&#x27; Toolkit Report, which surveyed over 16,000 creators globally, the market is highly receptive to AI as an operational assistant rather than an autonomous creator. </p><ul><li><p>75 percent of surveyed creators describe creative AI as integrated or essential to their current workflows. </p></li><li><p>85 percent emphasized that the final creative decision must always remain in human hands. 
</p></li></ul><p>This sentiment is central to Adobe&#x27;s messaging. By focusing the agent&#x27;s capabilities on file organization, layer management, and brand compliance, Adobe aims to automate what a spokesperson called the &quot;tedious parts of their workflow&quot;. The goal, according to Adobe executive David Wadhwani, is to let creatives focus on the craft so they can &quot;apply their taste and make the calls that only they can&quot;. </p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/CRblDVbL1kQ1uEVW35sEq/ad7b269e284b5c73782b5d4934b93c2c/ChatGPT_Image_Jun_18__2026__10_30_55_AM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[AWS enters the context layer race with a graph that learns from agents, not manual curation]]></title>
            <link>https://venturebeat.com/data/aws-enters-the-context-layer-race-with-a-graph-that-learns-from-agents-not-manual-curation</link>
            <guid isPermaLink="false">3abnQNrcZpXGEx3mLt9fH7</guid>
            <pubDate>Wed, 17 Jun 2026 22:10:38 GMT</pubDate>
            <description><![CDATA[<p>Building a context layer between enterprise data stores and AI agents is bespoke work, with no standard service to automate or maintain the graphs over time. Amazon is making a direct play to change that.</p><p>Amazon on Wednesday entered the space, announcing a series of three products it&#x27;s positioning as a context intelligence stack for AI agents. The centerpiece is AWS Context, a new knowledge graph service that gets smarter through agent usage over time. AWS also announced the general availability of Amazon S3 Annotations and a preview of skill assets in AWS Glue Data Catalog.</p><p>The context layer is now a contested architectural category with no shortage of options from different vendors. AWS is entering that market with a different architectural premise: that the graph should learn from how agents use it automatically, without human re-curation.</p><p>&quot;Your agents now get smarter without you having to rebuild anything from scratch,&quot; said Swami Sivasubramanian, vice president of Agentic AI at AWS, during his AWS Summit NYC keynote. </p><p>&quot;This service automatically builds a knowledge graph from all your existing data,&quot; he said. &quot;This service infers relationships across your data sets, business rules, and domain knowledge, and makes all of it available to your agents and your organization at runtime.&quot;  </p><h2>AWS Context builds a self-learning knowledge graph from existing data</h2><p>It&#x27;s a problem AWS says it has seen repeatedly in customer deployments. </p><p>AWS Context maps relationships across existing data automatically: what tables exist, what columns mean, how sources relate and which sources are authoritative. 
It combines semantic search with graph-level reasoning and infers relationships across datasets, business rules and domain knowledge, making all of it available to agents at runtime.</p><p>&quot;The knowledge graph improves itself over time as it learns which sources produce correct results and which parts get used,&quot; Sivasubramanian said. </p><p>Data stewards manage the graph through the AWS Management Console, reviewing inferred relationships, promoting them to production and attaching business definitions and usage rules. Every query inherits the calling user&#x27;s IAM and Lake Formation permissions, making agent data access auditable by identity through controls enterprises already rely on.</p><p>All metadata is published in Apache Iceberg format to Amazon S3 Tables, queryable via Athena, Redshift, Spark or any Iceberg-compatible engine, with no proprietary APIs. Third-party catalog connections are supported, so context from systems outside AWS can be pulled into the same graph. Agents query through agentic search APIs and MCP tools across Bedrock AgentCore, EKS or any MCP-compatible framework.</p><h2>Context is more than just a single service</h2><p>Context is a complicated space and AWS is layering multiple services to help enterprises build context across the data stack.</p><p><b>Amazon S3 Annotations.</b> This service enables users to attach rich business context at the storage layer, directly to individual S3 objects. </p><p><b>AWS Glue Data Catalog skill assets</b>. Glue skill assets attach domain knowledge at the catalog layer, linking runbooks, query patterns and usage rules to data assets across the estate. </p><p>AWS Context then synthesizes both into the knowledge graph that agents query at runtime, combining semantic search with graph-level reasoning across structured and unstructured sources. 
Each layer feeds the next.</p><h2>AWS is entering a highly competitive context space</h2><p><a href="https://venturebeat.com/data/ai-agents-keep-giving-confident-wrong-answers-the-context-layer-is-enterprise-ais-next-production-problem">Snowflake announced</a> its context approach earlier this month with its Horizon Context and Cortex Sense services. Microsoft supplies context through its <a href="https://venturebeat.com/data/enterprise-ai-agents-keep-operating-from-different-versions-of-reality">Fabric IQ platform</a>, which provides a semantic ontology for data. Redis has developed a <a href="https://venturebeat.com/data/context-architecture-is-replacing-rag-as-agentic-ai-pushes-enterprise-retrieval-to-its-limits">context platform</a> that optimizes data for retrieval. Vector database vendor Pinecone has its <a href="https://venturebeat.com/data/the-rag-era-is-ending-for-agentic-ai-a-new-compilation-stage-knowledge-layer-is-what-comes-next">Nexus context offering</a>, which compiles enterprise data into task-specific artifacts before agents ever query them.</p><p>AWS&#x27;s structural argument is straightforward: for enterprises already running S3, Glue and Lake Formation, AWS Context extends an existing identity model with no data movement required. 
The pitch is zero-integration friction, not just cost consolidation.</p><p>&quot;Context makes agents more powerful, and as the whole world is building agents, every agentic platform vendor needs a context capability,&quot; Holger Mueller, VP and principal analyst at Constellation Research, told VentureBeat.</p><p>Mueller noted that AWS is no exception. &quot;The concern, as with all context offerings, is going to be performance, especially for transactional data, we will see,&quot; he said.</p>]]></description>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/3io1fRBUqBtb0b4g3PWP1u/23d64118a63ed9f4388e51d688f32ab8/context-smk1.jpg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
    </channel>
</rss>