<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Arm Newsroom</title>
	<atom:link href="https://newsroom.arm.com/feed" rel="self" type="application/rss+xml" />
	<link>https://newsroom.arm.com/</link>
	<description>Sparking The World&#039;s Potential</description>
	<lastBuildDate>Wed, 22 Apr 2026 17:00:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://newsroom.arm.com/wp-content/uploads/2021/08/cropped-arm_icon-32x32.png</url>
	<title>Arm Newsroom</title>
	<link>https://newsroom.arm.com/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Arm and Google Cloud redefine agentic AI infrastructure with Axion processors</title>
		<link>https://newsroom.arm.com/blog/arm-and-google-cloud-redefine-agentic-ai-infrastructure</link>
		
		<dc:creator><![CDATA[Yan Fisher]]></dc:creator>
		<pubDate>Wed, 22 Apr 2026 17:00:00 +0000</pubDate>
				<guid isPermaLink="false">https://newsroom.arm.com/?p=20336</guid>

					<description><![CDATA[<p>Purpose-built Arm CPUs for Google Cloud underpin the latest generation of Google TPUs and enable cost-efficient, high-throughput agent orchestration with GKE Agentic Sandbox</p>
<p>The post <a href="https://newsroom.arm.com/blog/arm-and-google-cloud-redefine-agentic-ai-infrastructure">Arm and Google Cloud redefine agentic AI infrastructure with Axion processors</a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p><a href="https://www.arm.com/markets/cloud-ai/cloud-computing/google-cloud">Google Cloud</a> is taking a major step toward operationalizing agentic AI at scale with multiple updates, including new TPU 8t and TPU 8i systems as well as the introduction of its <a href="https://cloud.google.com/blog/products/containers-kubernetes/whats-new-in-gke-at-next26">Agent Sandbox on Google Kubernetes Engine (GKE)</a>, a purpose-built deployment framework designed to run complex, multi-step AI systems efficiently and securely. This new agentic infrastructure is built on <a href="https://newsroom.arm.com/news/google-cloud-custom-silicon-on-arm">Axion, Google’s Arm Neoverse-based CPU</a>, which underscores a crucial shift toward purpose-built CPU architectures for next-generation AI workloads.</p>



<p>As agentic AI moves from experimentation to production, the infrastructure requirements are changing. Unlike traditional inference, which relies on single model calls, agentic systems orchestrate continuous chains of reasoning, tool use and real-time data retrieval. This dramatically increases concurrency, latency sensitivity and overall compute demand, placing the CPU firmly on the critical path to success.</p>



<p>This is where Arm-based infrastructure stands apart. Built for high-throughput, energy-efficient compute, <a href="https://www.arm.com/products/silicon-ip-cpu/neoverse/">Arm Neoverse platforms</a> – and in this case specifically Google Axion – have emerged as the foundation for scalable agentic AI deployments.</p>



<h2 class="wp-block-heading" id="h-agentic-ai-at-scale-nbsp-axion-nbsp-leads-the-pack">Agentic AI at Scale:&nbsp;Axion&nbsp;leads the pack</h2>



<p>Google Cloud’s announcement of eighth-generation TPU systems builds on its strong legacy of custom silicon design. This generation diverges into dedicated training and inference variants, the TPU 8t and TPU 8i, and, for the first time, integrates the Google Axion CPU as the host processor. That reduces data-preparation latency, ensuring the TPU engines stay fully utilized and never stall. </p>



<p>TPU isn’t the end of the story though; Google Cloud is pursuing a co-designed ‘AI Hypercomputer’ vision, and equally important is the introduction of the GKE Agent Sandbox, which offers scalable and low-latency infrastructure designed for agents to safely execute untrusted code and tool calls without sacrificing performance. With Google Axion, you can build agents on leading infrastructure without compromising on cost or choice. </p>



<p>GKE Agent Sandbox, running on Google Axion processors and built on gVisor with Kata Containers support, delivers up to:</p>



<ul class="wp-block-list">
<li><strong>300 sandboxes per second per cluster&nbsp;</strong></li>



<li><strong>&lt;1s time-to-first-instruction latency&nbsp;</strong></li>
</ul>



<p>Maintaining this level of sandbox throughput and low-latency execution puts continuous pressure on the underlying infrastructure. As agentic AI becomes a default deployment pattern, the infrastructure beneath it must keep pace, delivering the throughput, responsiveness, and efficiency required to run agentic workloads reliably at scale. Axion is designed to meet that demand.&nbsp;</p>
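<p>As a rough illustration, the quoted figures translate into simple back-of-envelope capacity arithmetic. The constants and the <code>clusters_needed</code> helper below are illustrative assumptions based only on the numbers above, not published Google Cloud sizing guidance:</p>

```python
import math

# Figures quoted in the announcement above (illustrative, not measured here).
SANDBOXES_PER_SEC_PER_CLUSTER = 300   # sandbox creations per second per cluster
TTFI_BUDGET_S = 1.0                   # quoted time-to-first-instruction bound

def clusters_needed(target_sandboxes_per_sec: float) -> int:
    """Minimum number of clusters to sustain a target sandbox-creation rate."""
    return math.ceil(target_sandboxes_per_sec / SANDBOXES_PER_SEC_PER_CLUSTER)

print(clusters_needed(1000))  # 4 clusters to sustain 1,000 creations per second
```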



<p>And as these agentic systems expand, the efficiency of inference becomes critical as well. Without efficient inference, agents cannot function; without agentic orchestration, inference remains underutilized. By landing both critical tasks on CPU-based infrastructure, organizations can scale intelligent systems with strong performance while keeping costs under control.</p>



<h2 class="wp-block-heading" id="h-ai-inference-on-axion-performance-that-changes-the-economics">AI inference on Axion: Performance that changes the economics</h2>



<p><a href="https://newsroom.arm.com/blog/arm-propels-cloud-to-car-development-in-new-google-axion-vms">C4A VMs, powered by Arm Neoverse V2-based Axion CPUs</a>, are optimized to complement pure-play accelerators in handling these parallel, latency-sensitive workloads efficiently by enabling high-throughput AI inference on general-purpose compute.</p>



<p>These capabilities are already becoming evident in production environments. loveholidays, the European online travel platform, runs large-scale embedding and inference workloads across petabytes of data, where accelerator-based approaches can be cost-prohibitive at scale.</p>



<p>&#8220;As a business, we are growing our token appetite faster than our budget allows,&#8221; said Dimitri Lerko, Head of Engineering, loveholidays. &#8220;Running large-scale embedding and inference workloads on GPUs is cost-prohibitive at our data volumes, so maximizing CPU efficiency is critical. Leveraging Axion family of C4A and N4A VMs gives us the price-performance headroom to build real-time AI decision-making pipelines with bespoke and open-source model inference using CPUs — something that simply wasn&#8217;t viable before.&#8221; </p>



<p>In our testing, C4A consistently outperforms current-generation x86 instances across a range of AI inference workloads:</p>



<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1200" height="735" src="https://newsroom.arm.com/wp-content/uploads/2026/04/Screenshot-2026-04-21-at-6.47.00-PM-1200x735.png" alt="" class="wp-image-20353" srcset="https://newsroom.arm.com/wp-content/uploads/2026/04/Screenshot-2026-04-21-at-6.47.00-PM-1200x735.png 1200w, https://newsroom.arm.com/wp-content/uploads/2026/04/Screenshot-2026-04-21-at-6.47.00-PM-640x392.png 640w, https://newsroom.arm.com/wp-content/uploads/2026/04/Screenshot-2026-04-21-at-6.47.00-PM-768x470.png 768w, https://newsroom.arm.com/wp-content/uploads/2026/04/Screenshot-2026-04-21-at-6.47.00-PM-1536x941.png 1536w, https://newsroom.arm.com/wp-content/uploads/2026/04/Screenshot-2026-04-21-at-6.47.00-PM-576x353.png 576w, https://newsroom.arm.com/wp-content/uploads/2026/04/Screenshot-2026-04-21-at-6.47.00-PM-1024x627.png 1024w, https://newsroom.arm.com/wp-content/uploads/2026/04/Screenshot-2026-04-21-at-6.47.00-PM-1280x784.png 1280w, https://newsroom.arm.com/wp-content/uploads/2026/04/Screenshot-2026-04-21-at-6.47.00-PM-1400x858.png 1400w, https://newsroom.arm.com/wp-content/uploads/2026/04/Screenshot-2026-04-21-at-6.47.00-PM.png 1812w" sizes="(max-width: 1200px) 100vw, 1200px" /></figure>



<h2 class="wp-block-heading" id="h-extending-the-axion-portfolio">Extending the Axion portfolio</h2>



<p>For workloads that require greater control, the Axion family extends to C4A Metal, a native bare-metal instance (in preview) that brings the same Arm architecture from cloud to edge. It enables consistent development, validation, and deployment across environments, with direct hardware access and no hypervisor overhead for deterministic performance. This is ideal for demanding use cases like automotive vHIL, native Android CI/CD, and specialized enterprise infrastructure requiring greater control, performance, and architectural consistency.</p>



<p>&#8220;At Panasonic, we’re building next-generation in-vehicle experiences across the cloud and the car,&#8221; said Andrew Poliak, Chief Technology Officer, Panasonic Automotive Systems America, LLC. &#8220;During the preview of C4A Metal instances, we used a bare-metal Arm environment that matches our edge architecture, enabling teams to develop, test and validate automotive applications on a single, consistent platform. This allows us to move from cloud to vehicle with bit parity – running the same binaries in both environments &#8211; without architectural compromise.&#8221; </p>



<p>Alongside this, N4A, the most recent addition to the Axion family, provides a cost-efficient foundation for scale-out workloads such as web services, APIs, and data pipelines.</p>



<p>Together, C4A, C4A Metal, and N4A form a unified, workload-optimized compute portfolio that spans AI inference to scale-out applications, and cloud to edge, enabling teams to optimize for both performance and cost on Arm.</p>
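<p>The division of labor across the three families can be sketched as a simple selection helper. The workload labels and mapping below are hypothetical illustrations of the descriptions in this post, not official Google Cloud recommendations:</p>

```python
# Hypothetical mapping of workload classes to Axion instance families,
# mirroring the portfolio described above.
AXION_FAMILIES = {
    "ai-inference": "C4A",      # latency-sensitive, high-throughput inference
    "bare-metal": "C4A Metal",  # direct hardware access, no hypervisor overhead
    "scale-out": "N4A",         # cost-efficient web services, APIs, data pipelines
}

def pick_family(workload: str) -> str:
    """Return the Axion family suggested for an illustrative workload class."""
    try:
        return AXION_FAMILIES[workload]
    except KeyError:
        raise ValueError(f"unknown workload class: {workload!r}")

print(pick_family("scale-out"))  # N4A
```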



<h2 class="wp-block-heading" id="h-a-leading-ecosystem-for-arm-first-deployment">A leading ecosystem for Arm-first deployment</h2>



<p>Arm now underpins one of the industry’s largest and fastest-growing software ecosystems, driving the shift to Arm-first computing across cloud and edge.&nbsp;Google is already running production services such as YouTube, Gmail, BigQuery, Spanner, Bigtable, Google Earth Engine, Google Compute Engine, GKE, Dataflow, and Cloud Batch on Axion processors, and has&nbsp;<a href="https://cloud.google.com/blog/topics/systems/using-ai-and-automation-to-migrate-between-instruction-sets">migrated more than 30,000 internal applications</a>&nbsp;across its production environment.&nbsp;&nbsp;</p>



<p>For organizations beginning their migration, Arm&#8217;s Cloud Migration Resource Hub provides&nbsp;<a href="https://learn.arm.com/learning-paths/servers-and-cloud-computing/?cloud-service-providers=google-cloud">100+ Learning Paths</a>&nbsp;covering common workload patterns on Google Axion.&nbsp;Across the Neoverse ecosystem, the&nbsp;<a href="https://developer.arm.com/ecosystem-dashboard/linux">Arm Software Ecosystem Dashboard</a>&nbsp;tracks validated software and recommended versions, while adherence to SystemReady VE standards ensures seamless software interoperability from day one. Leading ISVs including Elastic, MongoDB, Palo Alto Networks, Redis Labs, and Couchbase are fully validated on Axion-based infrastructure.</p>



<h2 class="wp-block-heading" id="h-get-started-with-google-axion">Get started with Google Axion</h2>



<p>Whether deploying agentic workloads with GKE Agent Sandbox, optimizing inference on C4A, or scaling general-purpose compute with N4A, Axion provides a consistent, Arm-based foundation for modern AI infrastructure.</p>



<p><a href="https://cloud.google.com/products/axion">Get Started with Google Axion on Google Cloud</a></p>



<p>Additional Resources:</p>



<ul class="wp-block-list">
<li><a href="https://www.arm.com/markets/computing-infrastructure/arm-cloud-migration">Arm Cloud Migration Resource Hub</a></li>



<li><a href="https://learn.arm.com/learning-paths/servers-and-cloud-computing/?cloud-service-providers=google-cloud">Arm Learning Paths for Google Cloud</a></li>



<li><a href="https://www.arm.com/developer-hub/ecosystem-dashboard">Arm Software Ecosystem Dashboard</a></li>



<li>C4A-metal&nbsp;<a href="https://docs.cloud.google.com/compute/docs/instances/bare-metal-instances#c4a-metal">documentation</a>&nbsp;and&nbsp;<a href="https://forms.gle/BSFxmwM5hjN7b6Vi8">Preview sign up form</a></li>
</ul>


<p>The post <a href="https://newsroom.arm.com/blog/arm-and-google-cloud-redefine-agentic-ai-infrastructure">Arm and Google Cloud redefine agentic AI infrastructure with Axion processors</a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Arm’s global platform evolution to meet growing computing and ecosystem demands</title>
		<link>https://newsroom.arm.com/blog/arm-global-platform-evolution</link>
		
		<dc:creator><![CDATA[Vince Jesaitis]]></dc:creator>
		<pubDate>Mon, 20 Apr 2026 15:00:00 +0000</pubDate>
				<guid isPermaLink="false">https://newsroom.arm.com/?p=20328</guid>

					<description><![CDATA[<p>AI’s next phase demands new approaches to computing infrastructure focused on efficiency, flexibility and scale</p>
<p>The post <a href="https://newsroom.arm.com/blog/arm-global-platform-evolution">Arm’s global platform evolution to meet growing computing and ecosystem demands</a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>AI is entering a new phase, shifting from experimentation to continuous, large-scale deployment of systems that can reason, plan, and act.</p>



<p>The rise of agentic AI systems is accelerating this shift in computing, increasing the scale, complexity, and persistence of AI workloads, and placing new demands on the infrastructure that supports them.</p>



<p>Across regions, the constraints are clear. Power availability limits how much AI infrastructure can be deployed. Physical space is restricting expansion in existing data centers. As AI systems scale, the complexity of coordinating compute across CPUs, accelerators, and memory is increasing significantly.</p>



<p>This represents an inflection point – and it is reshaping what the ecosystem needs from compute platforms.</p>



<h2 class="wp-block-heading" id="h-a-platform-evolution-that-meets-a-changing-market">A platform evolution that meets a changing market</h2>



<p><a href="https://newsroom.arm.com/blog/arm-official-history">For more than 35 years</a>, Arm has enabled the global compute ecosystem by providing the platform that powers infrastructure and billions of devices across markets. Our model has focused on designing the architecture, licensing it to partners, and enabling them to build custom solutions optimized for their needs. The flexibility Arm provides through its IP licensing business has allowed companies to make the customizations they need in computing power, performance, and size.</p>



<p>As AI infrastructure evolves, Arm’s model is expanding.</p>



<p>With the introduction of the <a href="https://newsroom.arm.com/news/arm-agi-cpu-launch">Arm AGI CPU</a> – our first Arm-designed data center CPU – we are extending Arm’s offerings into production silicon for the first time. This builds on our longstanding IP and <a href="https://newsroom.arm.com/blog/arm-compute-subsystems-css-explainer">Compute Subsystems (CSS)</a> offerings to provide partners with the broadest set of options for deploying Arm-based solutions. <a href="https://about.fb.com/news/2026/03/meta-partners-with-arm-to-develop-new-class-of-data-center-silicon/">Meta serves as lead partner and co-developer</a>, working alongside Arm to optimize the Arm AGI CPU for large-scale AI infrastructure and deploying it alongside its own custom silicon.</p>



<p>Arm’s evolution is driven by clear demand from across the ecosystem. Partners are not looking for a single approach to compute. They are asking for flexibility: multiple pathways they can use in combination depending on their workloads, timelines, and scale. With the introduction of the Arm AGI CPU, Arm is assuming more of the significant engineering work necessary to develop leading edge systems for AI infrastructure and providing another pathway for partners to develop customized AI solutions. This production silicon will allow partners to focus their limited engineering resources on complementary AI chips and systems, accelerating development and innovation.</p>



<p>For Arm, this means we now offer:</p>



<ul class="wp-block-list">
<li>Custom IP for differentiated innovation</li>



<li>Pre-integrated CSS to accelerate development</li>



<li>Production silicon that can be deployed directly</li>
</ul>



<p>This is not a shift from Arm’s partner model. It is an expansion of it. These pathways are designed to be interoperable, enabling partners to build on Arm in the way that best fits their needs and demands.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img decoding="async" width="671" height="376" src="https://newsroom.arm.com/wp-content/uploads/2026/04/image.png" alt="" class="wp-image-20329" style="aspect-ratio:1.7845985201425048;width:712px;height:auto" srcset="https://newsroom.arm.com/wp-content/uploads/2026/04/image.png 671w, https://newsroom.arm.com/wp-content/uploads/2026/04/image-640x359.png 640w, https://newsroom.arm.com/wp-content/uploads/2026/04/image-576x323.png 576w" sizes="(max-width: 671px) 100vw, 671px" /><figcaption class="wp-element-caption">Arm now delivers IP, CSS and production silicon</figcaption></figure>
</div>


<h2 class="wp-block-heading" id="h-the-role-of-cpus">The role of CPUs </h2>



<p>As AI systems evolve, so does the role of the CPU.</p>



<p>While accelerators remain essential for training and executing AI models, <a href="https://newsroom.arm.com/blog/ai-datacenter-cpu-orchestration-arm">CPUs play a critical role in enabling these systems to operate at scale</a> – coordinating workloads, managing data and ensuring systems run efficiently.</p>



<p>As AI becomes more distributed and continuously running, coordination demands are increasing significantly. Agentic systems generate more interactions, greater data movement, and require sustained performance over time.</p>



<p>Scaling AI is no longer just about adding accelerators; it also depends on CPUs for orchestration, control, and system-level operations. This shift is driving demand for processors designed to deliver performance and efficiency at scale.</p>



<h2 class="wp-block-heading" id="h-efficiency-is-becoming-the-limiting-factor">Efficiency is becoming the limiting factor</h2>



<p>However, the ability to scale AI is increasingly constrained by infrastructure realities.</p>



<p>In many regions, power is emerging as a primary bottleneck, with grid capacity and new facilities taking years to come online. This makes efficiency a strategic requirement.</p>



<p>Improving performance within existing power and space constraints allows organizations to deploy more compute without waiting for new infrastructure. It can accelerate timelines, reduce overall costs, reduce pressure on energy systems, and expand access to AI across a broader range of markets.</p>



<p>Similarly, improving performance per rack allows more capability to be delivered within existing facilities, reducing the need for additional physical expansion.</p>



<p>As AI systems become more complex, coordination between CPUs and accelerators becomes essential to overall efficiency.</p>



<p>The Arm AGI CPU is designed with these constraints in mind – delivering the performance, scale, and efficiency required for AI infrastructure – enabling more than 2x performance per rack compared to x86 CPU-based racks. The same power delivery but twice the performance.</p>



<h2 class="wp-block-heading" id="h-a-more-flexible-and-resilient-ecosystem">A more flexible and resilient ecosystem</h2>



<p>These challenges are global. So is the response.</p>



<p>Across markets, policymakers and industry leaders are working to balance AI innovation with the realities of energy systems, infrastructure constraints, and long-term economic growth.</p>



<p>Flexibility and resilience across the compute ecosystem are increasingly important.</p>



<p>Providing multiple pathways to building and deploying infrastructure can:</p>



<ul class="wp-block-list">
<li>Support innovation and a diverse ecosystem</li>



<li>Lower barriers to participation</li>



<li>Reduce concentration risk across technology approaches and providers</li>



<li>Strengthen resilience across supply chains</li>
</ul>



<p>By expanding the Arm compute platform, from IP to CSS to silicon, we are contributing to a more adaptable and diverse ecosystem for AI infrastructure.</p>



<p>Arm’s expansion is <a href="https://www.arm.com/products/cloud-datacenter/arm-agi-cpu/ecosystem">supported</a> by a broad cross-section of the global ecosystem – from hyperscalers and cloud providers to semiconductor and infrastructure partners – reflecting a shared recognition that AI infrastructure requires more flexible and diverse approaches to compute.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe title="Arm AGI CPU: Reactions from global technology leaders" width="500" height="281" src="https://www.youtube.com/embed/d7q4mYBUGxE?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div><figcaption class="wp-element-caption">Reactions from global technology leaders to the Arm AGI CPU</figcaption></figure>



<h2 class="wp-block-heading" id="h-building-nbsp-for-the-next-era">Building&nbsp;for the next era</h2>



<p>The next phase of AI will be defined by how effectively it can be deployed.</p>



<p>The infrastructure decisions made today will shape how quickly AI can scale and how broadly it can be accessed.</p>



<p>By expanding our platform, we&#8217;re broadening the ways we support our partners and the global ecosystem in the agentic AI era. This is how we’ll enable the next phase of AI – efficiently, flexibly and at a global scale.</p>


<p>The post <a href="https://newsroom.arm.com/blog/arm-global-platform-evolution">Arm’s global platform evolution to meet growing computing and ecosystem demands</a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The evolution of physical AI: From controlled environments to the real world  </title>
		<link>https://newsroom.arm.com/blog/the-evolution-of-physical-ai-from-controlled-environments-to-the-real-world</link>
		
		<dc:creator><![CDATA[Arm Editorial Team]]></dc:creator>
		<pubDate>Wed, 15 Apr 2026 15:00:00 +0000</pubDate>
				<guid isPermaLink="false">https://newsroom.arm.com/?p=20305</guid>

					<description><![CDATA[<p>Arm explains how advances in AI compute are making robotics and other autonomous machines more intelligent, adaptive, and capable of operating in real-world environments.</p>
<p>The post <a href="https://newsroom.arm.com/blog/the-evolution-of-physical-ai-from-controlled-environments-to-the-real-world">The evolution of physical AI: From controlled environments to the real world  </a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p><a href="https://newsroom.arm.com/blog/what-is-physical-ai" target="_blank" rel="noreferrer noopener">Physical AI</a> is moving machines beyond predictable, controlled environments and into the complexity of the real world. Where robots were once designed for precision and repetition on factory floors, they are now being built to sense, reason, interpret, and respond to dynamic surroundings. This shift is also reflected at a macro level, with AI-driven productivity gains projected to increase global GDP by around <a href="https://cepr.org/voxeu/columns/global-impact-ai-mind-gap" target="_blank" rel="noreferrer noopener">4% over the next decade</a>.&nbsp;</p>



<p>Advances in AI are enabling physical AI systems to understand what they see, grasp context, and adjust their behavior within milliseconds. Whether navigating in warehouses, assisting in hospitals or moving on the road, autonomous machines are making decisions based on real-time conditions instead of fixed sequences.&nbsp;</p>



<p>Arm has supported the development of physical AI systems for years, starting with fixed machines on the factory floor. That same foundation is now enabling the next generation of intelligent robots and autonomous machines that can operate in real-world environments and respond in real time. &nbsp;</p>



<h2 class="wp-block-heading" id="h-physical-ai-in-the-real-world-nbsp">Physical AI in the real world&nbsp;</h2>



<p>The evolution of physical AI is clear when looking at the machines being built today. Across different form factors, robots and other autonomous machines are beginning to operate with greater awareness, adaptability, and independence.&nbsp;</p>



<h3 class="wp-block-heading" id="h-next-generation-humanoid-robotics-nbsp">Next-generation humanoid robotics&nbsp;</h3>



<p>Advances in physical AI are being seen in more complex, human-like systems. At Arm’s Cambridge headquarters, <a href="https://www.arm.com/company/success-library/agibot-intelligent-robots">AGIBOT</a> highlighted how far humanoid robotics has progressed. Robots demonstrated dexterous control and navigated complex environments with fluid motion, combining perception, reasoning, and control in real time. </p>



<figure class="wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter"><div class="wp-block-embed__wrapper">
<blockquote class="twitter-tweet" data-width="500" data-dnt="true"><p lang="en" dir="ltr">When robots visit the office <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f916.png" alt="🤖" class="wp-smiley" style="height: 1em; max-height: 1em;" /><br><br>AGIBOT joined us in Cambridge with its humanoids and a quadruped showcasing a variety of tasks from painting and dancing to human interactions and dynamic movement.<br><br>Behind every action is the Arm compute platform, spanning sensor-level processing… <a href="https://t.co/XHE6YTIVfr">pic.twitter.com/XHE6YTIVfr</a></p>&mdash; Arm (@Arm) <a href="https://twitter.com/Arm/status/2035071187852673440?ref_src=twsrc%5Etfw">March 20, 2026</a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</div></figure>



<p>These physical AI systems are designed to operate in human environments. They must understand space, interpret intent, and execute actions with precision, whilst ensuring the safety of people around them. This places significant demands on compute, as multiple workloads such as vision processing, motion planning, and AI inference must run simultaneously within tight power and thermal limits.&nbsp;</p>



<p>The Arm compute platform supports these requirements by enabling efficient processing across these workloads, allowing humanoid systems to operate responsively and safely in real-world settings.&nbsp;</p>



<h3 class="wp-block-heading" id="h-quadrupeds-and-industrial-robotics-nbsp">Quadrupeds and industrial robotics&nbsp;</h3>



<p>Quadruped robots represent another important category of physical AI, particularly in environments where terrain is unpredictable and, at times, unsafe.&nbsp;</p>



<p>Robots developed by companies such as <a href="https://newsroom.arm.com/blog/deep-robotics-and-arm-power-the-future-of-autonomous-mobility" target="_blank" rel="noreferrer noopener">Deep Robotics</a> are designed for inspection, exploration, and emergency rescue. They can navigate uneven ground, climb obstacles, and maintain stability in changing conditions. These capabilities rely on continuous perception and real-time control, supported by efficient compute.&nbsp;</p>



<p>Similarly, platforms like the PUDU D5 extend autonomous mobility into industrial environments. Designed for inspection, patrol, and logistics support, the D5 Series operates across large sites and uneven terrain using LiDAR and camera-based vision. This is particularly valuable in environments that are hazardous, remote, or difficult for people to access, where robots can support operations while improving safety and reducing risk. &nbsp;</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="PUDU D5 Series on Arm" width="500" height="281" src="https://www.youtube.com/embed/zRNv6t38BHc?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p>To support this, the system uses a heterogeneous compute architecture that distributes workloads across perception, planning, and control. Sensor data is processed continuously, allowing the robot to interpret its environment and respond with low latency.&nbsp;</p>



<p>Processors built on the Arm compute platform support these core functions, working alongside AI accelerators to deliver efficient performance at the edge. This enables robots to operate independently in environments where reliability, safety and energy efficiency are critical.&nbsp;</p>



<p>The same shift is also visible in industrial automation. Collaborative robots on factory floors are becoming more responsive to changing workflows, working alongside people and adapting to new tasks without requiring fully fixed configurations.&nbsp;</p>



<figure class="wp-block-embed aligncenter is-type-rich is-provider-soundcloud wp-block-embed-soundcloud"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Arm Viewpoints: How AI and Advanced Robotics Are Redefining Automation by The Arm Podcast" width="500" height="400" scrolling="no" frameborder="no" src="https://w.soundcloud.com/player/?visual=true&#038;url=https%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F2193957523&#038;show_artwork=true&#038;maxheight=750&#038;maxwidth=500"></iframe>
</div></figure>



<p><a href="https://bostondynamics.com/blog/doing-more-with-spot/" target="_blank" rel="noreferrer noopener">Boston Dynamics’ Spot</a> is another example of how mobile robots are being deployed in industrial settings for inspection and remote operation, where real-time perception and control are essential.&nbsp;</p>



<h3 class="wp-block-heading" id="h-autonomous-vehicles-and-mobility-platforms-nbsp">Autonomous vehicles and mobility platforms&nbsp;</h3>



<p>Physical AI is also transforming autonomous mobility, where systems must operate and navigate safely in complex, real-world conditions.&nbsp;</p>



<p>Autonomous robotaxis, such as those developed through the collaboration between <a href="https://newsroom.arm.com/blog/how-lenovo-scales-level-4-autonomous-robotaxis-on-arm" target="_blank" rel="noreferrer noopener">Lenovo and WeRide</a>, demonstrate how scalable compute platforms built on Arm are enabling Level 4 autonomy. These systems process large volumes of sensor data, including cameras and LiDAR, to make real-time driving decisions.&nbsp;</p>



<p>At the same time, the <a href="https://newsroom.arm.com/news/arm-and-tensor-partnership" target="_blank" rel="noreferrer noopener">Arm and Tensor</a> partnership highlights how next-generation compute platforms are being designed to support AI-driven mobility. These combine high-performance compute with energy efficiency, enabling real-time perception, planning, and control in autonomous systems.&nbsp;</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="How Tensor and Arm are powering the world’s first AI-agentic personal Robocar" width="500" height="281" src="https://www.youtube.com/embed/9ZJz8y8RwPk?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p><a href="https://newsroom.arm.com/news/arm-rivian-autonomy-platform" target="_blank" rel="noreferrer noopener">Arm’s work with Rivian</a> also shows how custom autonomy platforms are enabling vehicles to interpret their environment and make real-time driving decisions at scale.&nbsp;In these environments, reliability and latency are critical. Decisions must be made instantly, and systems must operate consistently over long periods. Efficient, scalable compute plays a central role in making this possible.&nbsp;</p>



<h2 class="wp-block-heading" id="h-the-building-blocks-of-intelligent-systems-nbsp">The building blocks of intelligent systems&nbsp;</h2>



<p>At the core of physical AI is a continuous loop between sensing, decision-making, and action. Systems must process inputs from sensors, interpret that data, and trigger responses within milliseconds. In many cases, this latency between perception and action becomes a defining requirement, particularly in environments where safety and timing are critical.&nbsp;</p>



<p>The evolution of physical AI is rooted in how modern robotic and autonomous systems are being engineered. They bring together several core capabilities that operate as a coordinated system. Each layer contributes to how the machine understands and interacts with the world.&nbsp;</p>



<p>Perception provides environmental awareness. Cameras, LiDAR, and sensor arrays generate a continuous stream of data, allowing the system to detect objects, estimate distance, and map its surroundings.&nbsp;</p>



<p>Edge AI inference processes this data locally. By running AI models on-device, robots can respond instantly without waiting for cloud input. This is critical in environments where latency affects safety or performance.&nbsp;</p>



<p>Multimodal reasoning combines inputs such as vision and language. Robots and other autonomous machines can interpret a scene, understand a command, and decide on the appropriate action. This brings interaction closer to how humans communicate and operate.&nbsp;</p>



<p>Real-time control, safety, and security ensure that decisions are executed reliably. Deterministic compute allows systems to respond within predictable time frames, while safety and security mechanisms help manage risk in complex environments.&nbsp;</p>



<p>These capabilities are already being applied across industries. <a href="https://newsroom.arm.com/blog/predictive-maintenance-smart-factories" target="_blank" rel="noreferrer noopener">In smart factories, for instance</a>, predictive maintenance systems analyze equipment data continuously to detect early signs of failure. Meanwhile, in physical AI deployments, systems are designed to process real-world inputs and act with minimal delay.&nbsp;</p>



<h2 class="wp-block-heading" id="h-enabling-physical-ai-with-efficient-scalable-compute-nbsp">Enabling physical AI with efficient, scalable compute&nbsp;</h2>



<p>Today’s physical AI systems must process high volumes of sensor data, run AI models, and control movement, all within tight power and thermal limits. Many systems operate on batteries with power limits, which places a strong emphasis on efficiency. At the same time, performance must remain consistent, especially in environments where timing and accuracy are critical.&nbsp;</p>



<p>To support this, today’s compute platforms must deliver:&nbsp;</p>



<ul class="wp-block-list">
<li>Deterministic performance for real-time decision-making;&nbsp;&nbsp;</li>



<li>Energy efficiency for sustained operation in constrained environments;&nbsp;&nbsp;</li>



<li>On-device AI inference to reduce latency;&nbsp;&nbsp;</li>



<li>Heterogeneous compute to manage diverse workloads; and&nbsp;&nbsp;</li>



<li>The ability to scale across different types of physical AI systems.&nbsp;&nbsp;</li>
</ul>



<p>The Arm compute platform is designed around these principles. Because of this, it’s used across the full spectrum of compute within physical AI systems, from low-power microcontrollers that process sensor data to high-performance central compute (essentially the “brain” of physical AI systems) that handles complex AI workloads. This allows engineers and developers to build systems where each component is optimized for its role, while still operating on a consistent architecture.&nbsp;</p>



<p>For example, in robotics, this approach enables a balance between performance and efficiency. A robot can process sensor inputs, run AI inference efficiently, and execute control logic without exceeding power or thermal limits. This is essential for systems that need to operate continuously in real-world conditions.&nbsp;</p>



<p>Arm’s ecosystem also plays a role in accelerating development. By working with partners across hardware and software, and supported by a global ecosystem of over 22 million developers, Arm enables a wide range of physical AI platforms, from machines on the factory floor in fixed environments to humanoids and quadrupeds operating in real-world environments.&nbsp;</p>



<h2 class="wp-block-heading" id="h-shaping-the-future-of-intelligent-machines-nbsp">Shaping the future of intelligent machines&nbsp;</h2>



<p>Physical AI is shaping how machines interact with the world. Robots and other autonomous machines are being designed with a deeper understanding of their environment, supported by advanced AI that allows them to interpret, decide, and act. This is expanding their role across industries, from interactive systems to industrial automation and autonomous mobility.&nbsp;</p>



<p>As these systems continue to evolve, the balance between performance and efficiency will determine how widely they can be deployed. Arm has supported the development of physical AI systems for years, starting with fixed machines on the factory floor. That same foundation is now enabling the next generation of intelligent robots and autonomous machines, where AI operates directly in the physical world and systems are built to respond in real time. &nbsp;</p>



<p>As physical AI continues to scale, it will increasingly be built on Arm.&nbsp;</p>


<div class="c-row u-justify-content-center CTA_box">
  <div class="c-col c-col-12 u-margin-top-2 u-margin-bottom-2">
    <ads-card has-open-border="true" class="u-text-center ab-cta-card">
      <ads-card-content slot="content">
        <h4 class="c-heading-3 u-margin-top-1 u-margin-bottom-1/2 u-text-bold">Powering intelligent physical AI on Arm</h4>
        <p>Arm enables safe, real-time, energy-efficient AI systems that sense, decide, and act reliably in the physical world.</p>
                <div class="c-col u-flex u-align-items-center u-justify-content-center u-padding-top-2 u-padding-bottom-1">
          <a class="c-cta-button is-primary" href="https://www.arm.com/markets/physical-ai" target="_blank"><span>Learn how</span></a>
        </div>
              </ads-card-content>
    </ads-card>
  </div>
</div>

<p>The post <a href="https://newsroom.arm.com/blog/the-evolution-of-physical-ai-from-controlled-environments-to-the-real-world">The evolution of physical AI: From controlled environments to the real world  </a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Gemma 4 on Arm: Accessible, immediate, optimized on-device AI to accelerate the mobile app experience</title>
		<link>https://newsroom.arm.com/blog/gemma-4-on-arm-optimized-on-device-ai</link>
		
		<dc:creator><![CDATA[Alex Spinelli]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 16:26:45 +0000</pubDate>
				<guid isPermaLink="false">https://newsroom.arm.com/?p=20148</guid>

					<description><![CDATA[<p>Gemma 4 on Arm brings fast, privacy-preserving, power-efficient AI directly onto Android devices, helping developers deliver richer real-time app experiences to billions of users without relying on the cloud.</p>
<p>The post <a href="https://newsroom.arm.com/blog/gemma-4-on-arm-optimized-on-device-ai">Gemma 4 on Arm: Accessible, immediate, optimized on-device AI to accelerate the mobile app experience</a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Real-time assistance, seamless communication, and greater personalization are now baseline expectations for billions of smartphone users worldwide. Highly capable on-device AI that operates in the power envelope of modern smartphones is essential to delivering instant, intelligent experiences at scale, while unlocking AI’s future potential. &nbsp;</p>



<p>Google’s launch of Gemma 4 accelerates the ongoing shift to on-device AI, enabling developers to seamlessly access optimized performance and bring increasingly capable AI experiences directly into the apps people use every day. Unlocking these benefits at global smartphone scale depends on the underlying compute foundation, with one constant across the entire Android ecosystem: Arm.&nbsp;</p>



<h2 class="wp-block-heading" id="h-what-s-new-for-gemma-4">What’s new for Gemma 4 </h2>



<p>Gemma 4 further advances on-device AI by delivering improved performance and efficiency, while expanding support for the kinds of&nbsp;multimodal&nbsp;experiences&nbsp;that matter most on Arm-based devices, including reasoning, agentic workflows, and vision-and-audio enabled use cases.&nbsp;With enhanced capabilities across text, audio*, and image,&nbsp;broader language support,&nbsp;and a foundation for real-time assistive experiences, it enables more responsive, context-aware interactions directly on-device without increasing memory footprint.&nbsp;</p>



<h2 class="wp-block-heading" id="h-exploring-gemma-4-performance-on-arm-nbsp-cpus">Exploring Gemma 4 performance on Arm&nbsp;CPUs</h2>



<p>In&nbsp;early&nbsp;Arm&nbsp;engineering tests, SME2 shows promising performance gains for running Gemma&nbsp;4 workloads. Initial tests on the Gemma 4 E2B (Effective 2 Billion) model&nbsp;demonstrate&nbsp;an average of 5.5x speedup in prefill (processing user input) and up to&nbsp;1.6x faster decode&nbsp;(generating responses), highlighting the potential of Armv9 CPU innovations for on-device AI workloads. These engineering tests include upcoming patches to&nbsp;Google&nbsp;XNNPACK&nbsp;and&nbsp;Arm&nbsp;<a href="https://www.arm.com/markets/artificial-intelligence/software/kleidi" target="_blank" rel="noreferrer noopener">KleidiAI</a>.&nbsp;</p>



<p>As an early example of what is possible with these improvements, Envision, an accessibility-focused app for blind and low-vision users, evaluated an on-device approach for delivering more of its experience locally. Historically, Envision&#8217;s&nbsp;scene interpretation relied on cloud connectivity. In this prototype, Gemma 4 was evaluated running locally on Arm CPUs with SME2 capabilities, enabling users to capture a photo and receive a detailed scene description directly on-device without requiring a network connection or sending sensitive data off-device.&nbsp;</p>



<p>These&nbsp;explorations on Arm CPUs highlight the broader flexibility of the Arm compute platform and the potential for continued innovation across CPU and heterogeneous compute pathways.&nbsp;</p>



<p>The result is lower latency, stronger privacy, and more consistent user experiences regardless of connectivity conditions. This shift from cloud dependency to local inference is critical for mobile applications. It&nbsp;has the potential to&nbsp;reduce infrastructure costs for developers, improve reliability for users, and&nbsp;unlock new&nbsp;categories of real-time applications. &nbsp;</p>



<p><em>“Envision is excited to work with Arm and Google to bring powerful accessibility experiences directly onto smartphones. Running visual understanding models like Gemma 4 on-device on SME2-enabled Arm CPUs opens the door to reliable, low-latency scene description and visual Q&amp;A for blind and low-vision users. For our community, the ability to access these capabilities offline is incredibly meaningful because it ensures the technology works wherever they are, while also improving privacy by keeping more processing on the device itself.”</em> &nbsp;– <strong>Karthik Mahadevan, CEO, Envision</strong> &nbsp;</p>



<p>Envision is an early example of what’s possible when Gemma 4 meets the Arm compute platform at mobile scale. As more developers integrate Gemma 4, on-device AI will increasingly become the default architecture rather than the exception. &nbsp;</p>



<h2 class="wp-block-heading" id="h-why-arm-matters-for-on-device-ai-at-android-scale">Why Arm matters for on-device AI at Android scale</h2>



<p>The <a href="https://www.arm.com/architecture/cpu/a-profile/armv9" target="_blank" rel="noreferrer noopener">Armv9 architecture</a> is the most secure, pervasive, and advanced ISA ever. <a href="https://www.arm.com/technologies/sme2" target="_blank" rel="noreferrer noopener">Arm Scalable Matrix Extension 2 (SME2)</a> – a set of advanced CPU instructions in the Armv9 architecture – is a key technology, as it accelerates matrix-heavy AI workloads within the power envelope of smartphones. Already built into <a href="https://newsroom.arm.com/blog/arm-c1-cpu-cluster-on-device-ai-performance">Arm C1 CPUs</a> that are integrated into <a href="https://newsroom.arm.com/blog/arm-lumex-platform-ai-powered-smartphones-and-apps" target="_blank" rel="noreferrer noopener">the latest Android smartphone devices</a>, SME2 unlocks higher sustained performance and improved efficiency.</p>



<p>Through Arm KleidiAI – Arm’s software acceleration layer integrated into leading runtime libraries, like Google’s XNNPACK, and frameworks, like Google LiteRT and MediaPipe – the benefits of SME2 are readily accessible to mobile developers with no changes required to existing code, models, or deployment pipelines. As a result, developers automatically access out-of-the-box performance optimizations simply by targeting Arm-based Android devices built on SME2.&nbsp;</p>



<p>In practice, these software-level gains translate directly into better on-device experiences. Users benefit from faster responses, smoother sustained interactions, and more reliable on-device AI, all while maintaining battery life and thermal stability, even as models grow more capable.&nbsp;<br><br><em>“Delivering Gemma 4 efficiently across the Android ecosystem requires deep collaboration across hardware and software. Our work with Arm reflects a shared commitment to advancing on-device AI, combining the benefits of the Armv9 architecture and built-in acceleration technologies, like SME2, with the Android operating system to unlock greater performance and efficiency at scale. Together, we’re making it easier for developers to bring fast, responsive, and privacy-preserving AI experiences to our users, without needing to modify their existing applications.”</em> – <strong>Sandeep Patil, Engineering Director, Android</strong>&nbsp;</p>



<h2 class="wp-block-heading" id="h-arm-and-google-building-the-future-of-on-device-ai-together">Arm and Google: Building the future of on-device AI together </h2>



<p>As more applications move AI on-device, Arm and Google are committed to supporting developers with accessible performance optimizations and clear guidance that help Gemma 4 accelerate application experiences across all Arm-based mobile devices. &nbsp;</p>



<p>The future of mobile AI will not be defined solely by larger models, but by how efficiently, securely, and pervasively they run at scale across the Android ecosystem. Through&nbsp;this collaboration, the benefits of on-device AI will be felt by billions of Android smartphone users worldwide.  &nbsp;</p>



<p><sub>*only&nbsp;for E2B (Effective 2&nbsp;Billion) and E4B (Effective 4&nbsp;Billion)</sub>&nbsp;</p>


<p>The post <a href="https://newsroom.arm.com/blog/gemma-4-on-arm-optimized-on-device-ai">Gemma 4 on Arm: Accessible, immediate, optimized on-device AI to accelerate the mobile app experience</a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Top 12 Arm-based innovations from March 2026</title>
		<link>https://newsroom.arm.com/blog/top-arm-based-innovations-from-march-2026</link>
		
		<dc:creator><![CDATA[Arm Editorial Team]]></dc:creator>
		<pubDate>Wed, 01 Apr 2026 15:06:43 +0000</pubDate>
				<guid isPermaLink="false">https://newsroom.arm.com/?p=20125</guid>

					<description><![CDATA[<p>A monthly roundup of how Arm‑powered innovation and the ecosystem are progressing across cloud, edge, and device environments.</p>
<p>The post <a href="https://newsroom.arm.com/blog/top-arm-based-innovations-from-march-2026">Top 12 Arm-based innovations from March 2026</a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>From training AI models in the data center to running real-time intelligence on devices, the way compute is being built and used is changing fast. Across the Arm ecosystem, partners and developers are solving real problems: building more efficient AI infrastructure, modernizing embedded workflows, enabling smarter cloud analytics, and bringing advanced experiences to mobile, gaming, and robotics platforms.</p>



<p>This month’s roundup highlights how these innovations are coming to life in practice. From Arm’s first silicon for AI infrastructure to on-device generative AI, neural graphics, and scalable autonomous systems, each story offers a closer look at how Arm-based technologies are helping teams move faster, build more efficiently, and deploy intelligence where it matters most.</p>



<h2 class="wp-block-heading" id="h-introducing-the-arm-agi-cpu">Introducing the Arm AGI CPU </h2>



<p>Arm has taken a major step forward with the introduction of its first Arm-designed silicon, <a href="https://newsroom.arm.com/news/arm-agi-cpu-launch" type="link" id="https://newsroom.arm.com/news/arm-agi-cpu-launch">the Arm AGI CPU</a>, extending the Neoverse platform beyond IP and Compute Subsystems (CSS) to give customers greater choice, from building custom silicon to integrating full platform solutions or deploying Arm-designed processors.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="A look inside the Arm AGI CPU: Core features and benefits" width="500" height="281" src="https://www.youtube.com/embed/sqJancqeVbk?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p>Building on this milestone, Shivangi Agrawal, Product Manager,&nbsp;<a href="https://developer.arm.com/community/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/introducing-the-arm-agi-cpu-1ou-dual-node-reference-server" type="link" id="https://developer.arm.com/community/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/introducing-the-arm-agi-cpu-1ou-dual-node-reference-server">explores what this means in practice</a> with a ready-to-deploy 1U dual-node reference server designed for high-density AI and cloud workloads. The platform gives partners a production-representative environment to evaluate performance, optimize software stacks, and design for rack-level efficiency across power, cooling, and space. For cloud providers and system builders, it offers a clearer, faster path to deploying Arm-based AI infrastructure at scale.</p>



<h2 class="wp-block-heading" id="h-bringing-real-world-robotics-to-life-with-arm">Bringing real-world robotics to life with Arm</h2>



<p>Robots stepped into Arm’s Cambridge office as Shanghai-based robotics company AGIBOT, known for its humanoid and embodied AI systems, showcased its latest humanoids and quadruped systems in action, demonstrating everything from precise task execution to dynamic, human-like movement.</p>



<figure class="wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter"><div class="wp-block-embed__wrapper">
<blockquote class="twitter-tweet" data-width="500" data-dnt="true"><p lang="en" dir="ltr">When robots visit the office <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f916.png" alt="🤖" class="wp-smiley" style="height: 1em; max-height: 1em;" /><br><br>AGIBOT joined us in Cambridge with its humanoids and a quadruped showcasing a variety of tasks from painting and dancing to human interactions and dynamic movement.<br><br>Behind every action is the Arm compute platform, spanning sensor-level processing… <a href="https://t.co/XHE6YTIVfr">pic.twitter.com/XHE6YTIVfr</a></p>&mdash; Arm (@Arm) <a href="https://twitter.com/Arm/status/2035071187852673440?ref_src=twsrc%5Etfw">March 20, 2026</a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</div></figure>



<p>Behind each interaction is the Arm compute platform, enabling intelligence from sensor-level processing through to high-performance AI. The demonstration highlights how Arm-based systems are supporting real-world robotics, where efficiency, responsiveness, and scalable compute are critical to moving from demonstration to deployment.</p>



<h2 class="wp-block-heading" id="h-aligning-embedded-development-with-modern-llvm-workflows">Aligning embedded development with modern LLVM workflows</h2>



<p>Arm is taking another step toward a more unified and modern software ecosystem by beginning the transition of Arm Toolchain for Embedded from Picolibc to LLVM-libc. Paul Black, Director of Product Management, outlined <a href="https://developer.arm.com/community/arm-community-blogs/b/tools-software-ides-blog/posts/arm-toolchain-for-embedded-migration-from-picolibc-to-llvm-libc" type="link" id="https://developer.arm.com/community/arm-community-blogs/b/tools-software-ides-blog/posts/arm-toolchain-for-embedded-migration-from-picolibc-to-llvm-libc">how this shift is being introduced gradually</a>, starting as an optional overlay before becoming the default C library. The move is designed to better align embedded development with the broader LLVM ecosystem, improving integration across tools and enabling more consistent, portable workflows.</p>



<p>For developers, the transition offers advantages such as simplified licensing and closer integration with LLVM-based runtimes, while also requiring updates to elements like linker scripts, semihosting, and startup code. </p>



<h2 class="wp-block-heading" id="h-turning-smartphone-cameras-into-real-time-ai-creation-tools">Turning smartphone cameras into real-time AI creation tools</h2>



<p>Smartphone cameras are rapidly evolving from capture devices into real-time generative AI tools, enabling users to preview and create content directly on their devices.</p>



<figure class="wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter"><div class="wp-block-embed__wrapper">
<blockquote class="twitter-tweet" data-width="500" data-dnt="true"><p lang="en" dir="ltr">Your smartphone camera is about to become a generative AI studio. <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4f8.png" alt="📸" class="wp-smiley" style="height: 1em; max-height: 1em;" /><br><br>We&#39;ve partnered with <a href="https://twitter.com/tecnomobile?ref_src=twsrc%5Etfw">@tecnomobile</a> to bring real-time, fully on-device AI-generated content previews to smartphones — running at 30fps with zero cloud reliance.<br><br>Built on Armv9 CPUs and accelerated by Arm… <a href="https://t.co/IJXKZXCfEE">pic.twitter.com/IJXKZXCfEE</a></p>&mdash; Arm (@Arm) <a href="https://twitter.com/Arm/status/2029502757191733340?ref_src=twsrc%5Etfw">March 5, 2026</a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</div></figure>



<p>Through its partnership with TECNO, Arm is helping bring this shift to life with fully on-device AI-generated content previews running at 30 frames per second, without relying on the cloud. Built on Armv9 CPUs and accelerated by Arm KleidiAI, the solution delivers faster AI processing while maintaining responsiveness and energy efficiency.</p>



<h2 class="wp-block-heading" id="h-how-arm-developer-labs-helps-bring-industry-challenges-into-the-classroom">How Arm Developer Labs helps bring industry challenges into the classroom</h2>



<p>Arm Developer Labs gives students and educators a practical way to work on real computing problems using professional tools, workflows, and guidance from Arm engineers, helping bridge the gap between classroom learning and industry needs. <a href="https://developer.arm.com/community/education-hub/b/arm-education/posts/arm-developer-labs-shaping-computing-curricula-with-industry-software-challenges">In this Arm Community blog</a>, Kieran Hejmadi, Software &amp; Academic Ecosystem Development Manager, explores how Arm Developer Labs is helping universities modernize computing curricula through hands-on, industry-driven software challenges built on Arm technologies. </p>



<p>In this example, students at Anglia Ruskin University Cambridge built a web-scraping and data-visualization project that tracked research trends over time, while also extending the work with AI-driven analysis. The project shows why industry-aligned learning matters now, giving students more relevant technical experience and helping universities prepare graduates for fast-changing software and AI careers.</p>



<h2 class="wp-block-heading" id="h-advancing-the-future-of-gaming-on-arm-at-gdc">Advancing the future of gaming on Arm at GDC</h2>



<p>At GDC 2026, Arm brought together developers and partners to explore the next phase of gaming innovation, from neural graphics and AI-driven upscaling to real-world performance optimization.</p>



<figure class="wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter"><div class="wp-block-embed__wrapper">
<blockquote class="twitter-tweet" data-width="500" data-dnt="true"><p lang="en" dir="ltr">From neural graphics and upscaling to real-world performance tuning , one thing is clear: the future of gaming is built on Arm.<br> <br>We had such a great time exploring the future with our partners and all of you at the Arm Developer Summit here at <a href="https://twitter.com/hashtag/GDC2026?src=hash&amp;ref_src=twsrc%5Etfw">#GDC2026</a>! <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <a href="https://t.co/etB2RRUXmP">pic.twitter.com/etB2RRUXmP</a></p>&mdash; Arm Software Developers (@ArmSoftwareDev) <a href="https://twitter.com/ArmSoftwareDev/status/2031514639864049954?ref_src=twsrc%5Etfw">March 10, 2026</a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</div></figure>



<p><a href="https://newsroom.arm.com/blog/takeaways-from-gdc-festival-of-gaming-2026" type="link" id="https://newsroom.arm.com/blog/takeaways-from-gdc-festival-of-gaming-2026">Across sessions at the Arm Developer Summit</a>, a clear theme emerged: how to push visual quality further on mobile without exceeding power and thermal limits. Developers explored practical techniques such as Neural Frame Rate Upscaling and AI-assisted rendering, alongside tools that help identify bottlenecks and optimize performance in production workflows. Additionally, Annie Tallund, Solutions Engineer, explored <a href="https://developer.arm.com/community/arm-community-blogs/b/mobile-graphics-and-gaming-blog/posts/new-neural-technologies-set-to-join-the-neural-graphics-development-kit" type="link" id="https://developer.arm.com/community/arm-community-blogs/b/mobile-graphics-and-gaming-blog/posts/new-neural-technologies-set-to-join-the-neural-graphics-development-kit">how the Arm Neural Graphics Development Kit is expanding</a> with new neural technologies to help developers bring AI-enhanced graphics and rendering workflows to mobile platforms.</p>



<h2 class="wp-block-heading" id="h-the-next-generation-of-ai-data-centers-with-arm">The next generation of AI data centers with Arm</h2>



<p>As AI data centers take on more always-on workloads, the challenge is no longer just delivering peak performance. It is also about keeping complex systems running efficiently at scale. This update explores why CPUs have become a critical control layer for orchestrating accelerators, managing workloads, and improving overall infrastructure efficiency, and highlights Arm’s work with Meta to help advance more power-efficient AI data center design across the ecosystem.</p>



<figure class="wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter"><div class="wp-block-embed__wrapper">
<blockquote class="twitter-tweet" data-width="500" data-dnt="true"><p lang="en" dir="ltr">In the AI data center, power sets the pace. <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/26a1.png" alt="⚡" class="wp-smiley" style="height: 1em; max-height: 1em;" /><br><br>Always-on AI shifts the constraint from peak performance to intelligent orchestration. CPUs are the control layer that keeps accelerators productive and systems operating efficiently at scale.<br><br>Our work with <a href="https://twitter.com/Meta?ref_src=twsrc%5Etfw">@meta</a> reflects Arm’s… <a href="https://t.co/nkU2UtJc26">pic.twitter.com/nkU2UtJc26</a></p>&mdash; Arm (@Arm) <a href="https://twitter.com/Arm/status/2029673883880640615?ref_src=twsrc%5Etfw">March 5, 2026</a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</div></figure>



<h2 class="wp-block-heading" id="h-new-kiro-powers-help-streamline-agentic-ai-development-on-arm">New Kiro Powers help streamline agentic AI development on Arm</h2>



<p>Agentic AI is reshaping how developers build and move software across cloud and edge environments, especially when teams need faster workflows with more architectural guidance. Arm has introduced new Kiro Powers to accelerate agentic AI development and simplify cloud-to-edge workflows on Arm-based platforms, as explained by Zach Lasiuk, Principal Solutions Designer, <a href="https://developer.arm.com/community/arm-community-blogs/b/ai-blog/posts/from-cloud-to-edge-expanding-agentic-development-on-arm-with-new-kiro-powers">in this Arm Community blog</a>. </p>



<p>These new capabilities are helping cloud teams plan migrations to AWS Graviton and helping embedded developers manage transitions between Arm SoCs with more structure, visibility, and confidence. For developers working across modern AI, embedded, automotive, and edge systems, Zach also explains how Arm is helping make complex platform changes easier to manage.</p>



<h2 class="wp-block-heading" id="h-accelerating-real-time-event-analytics-in-the-cloud-with-arm">Accelerating real-time event analytics in the cloud with Arm</h2>



<p>Cloud-based event analytics is helping platforms turn live audience activity into immediate business decisions, from dynamic ticket pricing to targeted VIP experiences, while demand is still building.</p>



<p>Pascal Mudimba, Arm Ambassador, <a href="https://developer.arm.com/community/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/empowering-creators-how-stdio-x-labs-cut-event-analytics-latency-by-40-using-arm-accelerated-bigquery-ml-on-google-cloud">shows how stdio x Labs reduced analytics latency by 40%</a> by combining Arm-based compute with optimized BigQuery ML pipelines on Google Cloud. By moving key stages such as stream processing and model preparation onto Arm, the team improved throughput, lowered infrastructure costs, and delivered faster insights for always-on data workloads.</p>



<p>The result is a practical example of how developers and digital platforms can use Arm in the cloud to make real-time analytics more responsive, scalable, and efficient, especially in environments where timing directly impacts revenue and user experience.</p>



<h2 class="wp-block-heading" id="h-customizing-ai-models-for-real-world-use-cases-on-arm">Customizing AI models for real-world use cases on Arm</h2>



<p>In this video, Michael Hall, Principal Software Engineer and Developer Evangelist, explores how to turn a general-purpose large language model into a domain-specific AI system that can deliver faster, more accurate responses.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Customize your AI with model fine tuning on Nvidia DGX Spark" width="500" height="281" src="https://www.youtube.com/embed/59pX_UBSzNw?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p>Using an NVIDIA DGX Spark desktop powered by Arm CPUs and NVIDIA GPUs, the session walks through fine-tuning a Llama 3.2 model with a custom dataset of Raspberry Pi device specifications. It demonstrates how widely used tools like PyTorch and Hugging Face Transformers can be combined to load models, prepare data, train efficiently, and deploy tailored AI solutions.</p>
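<p>As a rough illustration of the data-preparation step described above, the sketch below formats structured device-spec records into the instruction/response pairs that supervised fine-tuning pipelines typically expect. The field names and records here are invented for illustration, not taken from the session's actual dataset.</p>

```python
# Minimal sketch: turning structured device-spec records into
# instruction/response pairs for supervised fine-tuning.
# The records and field names are hypothetical examples.

def to_training_example(spec: dict) -> dict:
    """Format one spec record as an instruction-tuning pair."""
    prompt = f"What are the specifications of the {spec['model']}?"
    response = (
        f"The {spec['model']} has a {spec['cpu']} CPU, "
        f"{spec['ram']} of RAM, and {spec['connectivity']}."
    )
    return {"prompt": prompt, "response": response}

specs = [
    {"model": "Raspberry Pi 5", "cpu": "quad-core Arm Cortex-A76",
     "ram": "8 GB", "connectivity": "Wi-Fi 5 and Gigabit Ethernet"},
    {"model": "Raspberry Pi Zero 2 W", "cpu": "quad-core Arm Cortex-A53",
     "ram": "512 MB", "connectivity": "Wi-Fi 4 and Bluetooth 4.2"},
]

dataset = [to_training_example(s) for s in specs]
```

<p>A dataset in this shape can then be tokenized and fed to a standard Hugging Face training loop, which is the pattern the video walks through at full scale.</p>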



<h2 class="wp-block-heading" id="h-training-humanoid-robots-faster-with-nvidia-dgx-spark">Training humanoid robots faster with NVIDIA DGX Spark</h2>



<p>Reinforcement learning is becoming a key tool for robotics, helping humanoid systems learn how to move, balance, and respond to real-world conditions through simulation before deployment. </p>



<p><a href="https://developer.arm.com/community/arm-community-blogs/b/ai-blog/posts/rethinking-robotics-reinforcement-learning-a-practical-humanoid-training-workflow-on-dgx-spark">In this Arm Community blog</a>, Odin Shen, Principal Solutions Architect, offers a practical end-to-end workflow for training humanoid robots using NVIDIA DGX Spark, combining high-fidelity simulation, parallelized training, and scalable infrastructure to accelerate real-world robot policy development. Using Isaac Sim and Isaac Lab on a single Arm-based DGX Spark system, developers can build, train, and evaluate humanoid locomotion policies more efficiently and move faster toward reproducible physical AI workflows.</p>



<h2 class="wp-block-heading" id="h-a-stronger-foundation-for-vulkan-frame-analysis">A stronger foundation for Vulkan frame analysis</h2>



<p>Frame Advisor is part of Arm Performance Studio, a free toolset that helps graphics developers capture and analyze Vulkan frames so they can spot GPU bottlenecks faster and optimize with more confidence.</p>



<p><a href="https://developer.arm.com/community/arm-community-blogs/b/mobile-graphics-and-gaming-blog/posts/frame-advisor-a-major-upgrade-to-vulkan-capture">In this Arm Community blog</a>, Daniel Baines, Staff Software Engineer, explains how the latest Frame Advisor upgrade introduces a new Vulkan capture pipeline built on GFXReconstruct, improving reliability, correctness, and scalability for today’s increasingly complex graphics workloads. The result is a stronger foundation for developers working on performance-critical applications, especially when accurate frame analysis is essential to diagnosing issues and validating optimizations. Modern tooling can make Vulkan development more dependable today while opening the door to future capabilities.</p>


<p>The post <a href="https://newsroom.arm.com/blog/top-arm-based-innovations-from-march-2026">Top 12 Arm-based innovations from March 2026</a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Announcing Arm AGI CPU: The silicon foundation for the agentic AI cloud era</title>
		<link>https://newsroom.arm.com/blog/introducing-arm-agi-cpu</link>
		
		<dc:creator><![CDATA[Mohamed Awad]]></dc:creator>
		<pubDate>Tue, 24 Mar 2026 16:55:00 +0000</pubDate>
				<guid isPermaLink="false">https://newsroom.arm.com/?p=19973</guid>

					<description><![CDATA[<p>Breakthrough rack-level performance, scale and efficiency for the next generation of AI infrastructure. </p>
<p>The post <a href="https://newsroom.arm.com/blog/introducing-arm-agi-cpu">Announcing Arm AGI CPU: The silicon foundation for the agentic AI cloud era</a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Today, <a href="https://newsroom.arm.com/news/arm-agi-cpu-launch">Arm is announcing the&nbsp;Arm AGI CPU</a>, a new class of production-ready silicon built on the <a href="https://www.arm.com/products/silicon-ip-cpu/neoverse/">Arm Neoverse</a> platform and designed to power the next generation of AI infrastructure.</p>



<p>For the first time in our <a href="https://newsroom.arm.com/blog/arm-official-history">more than 35-year history</a>, Arm is delivering its own silicon products – extending the Arm Neoverse platform beyond IP and Arm Compute Subsystems (CSS) to give customers greater choice in how they deploy Arm compute – from building custom silicon to integrating platform-level solutions or deploying Arm-designed processors. It reflects both the rapid evolution of AI infrastructure and growing demand from the ecosystem for production-ready Arm platforms that can be deployed at pace and scale.</p>



<h2 class="wp-block-heading" id="h-the-rise-of-the-agentic-ai-infrastructure">The rise of the agentic AI infrastructure</h2>



<p>AI systems are increasingly operating continuously at global scale. Historically, the human was the bottleneck in computing – the pace at which people could interact with systems defined how quickly work could move through them. In the era of agentic AI, that constraint disappears as software agents coordinate tasks, interact with multiple models and make decisions in real time.</p>



<p>As AI systems run continuously and workloads grow in complexity, the <a href="https://newsroom.arm.com/blog/ai-datacenter-cpu-orchestration-arm">CPU becomes the pacing element of modern infrastructure</a> – responsible for keeping distributed AI systems operating efficiently at scale. In a modern-day AI data center, the CPU manages thousands of distributed tasks – orchestrating accelerators, managing memory and storage, scheduling workloads and moving data across systems – and now, with agentic AI, coordinating fan-out across large numbers of agents.</p>



<p>This shift places new demands on the CPU, and meeting them requires an evolution of the processor itself.</p>



<p>Arm Neoverse already underpins many of today’s leading hyperscale and AI platforms, including <a href="https://www.arm.com/markets/computing-infrastructure/cloud-computing/aws">AWS Graviton</a>, <a href="https://www.arm.com/markets/computing-infrastructure/cloud-computing/google-cloud">Google Axion</a>, <a href="https://www.arm.com/markets/computing-infrastructure/cloud-computing/microsoft">Microsoft Azure Cobalt</a> and <a href="https://newsroom.arm.com/blog/arm-rubin-converged-ai-datacenter">NVIDIA Vera</a>. As AI infrastructure scales globally, partners across the ecosystem are asking Arm to do more. The Arm AGI CPU was created to address this shift.</p>



<h2 class="wp-block-heading" id="h-arm-agi-cpu-built-for-rack-scale-agentic-efficiency">Arm AGI CPU: Built for rack-scale agentic efficiency</h2>



<p>Agentic AI workloads demand sustained performance at massive scale. The <a href="https://www.arm.com/products/cloud-datacenter/arm-agi-cpu">Arm AGI CPU</a> is designed to deliver high per-task performance at sustained load across thousands of cores in parallel – all within the power and cooling limits of modern data centers.</p>



<p>Every element of the Arm AGI CPU – from operating frequency to memory and I/O architecture – has been designed to support massively parallel, high-performance agentic workloads in a densely populated rack deployment.</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1200" height="675" src="https://newsroom.arm.com/wp-content/uploads/2026/03/image-6-1200x675.png" alt="" class="wp-image-20062" srcset="https://newsroom.arm.com/wp-content/uploads/2026/03/image-6-1200x675.png 1200w, https://newsroom.arm.com/wp-content/uploads/2026/03/image-6-640x360.png 640w, https://newsroom.arm.com/wp-content/uploads/2026/03/image-6-768x432.png 768w, https://newsroom.arm.com/wp-content/uploads/2026/03/image-6-1536x864.png 1536w, https://newsroom.arm.com/wp-content/uploads/2026/03/image-6-576x324.png 576w, https://newsroom.arm.com/wp-content/uploads/2026/03/image-6-1024x576.png 1024w, https://newsroom.arm.com/wp-content/uploads/2026/03/image-6-1280x720.png 1280w, https://newsroom.arm.com/wp-content/uploads/2026/03/image-6-1400x788.png 1400w, https://newsroom.arm.com/wp-content/uploads/2026/03/image-6.png 1600w" sizes="(max-width: 1200px) 100vw, 1200px" /></figure>



<p>Arm’s reference server configuration is a 1OU, 2-node design – packing in two chips with dedicated memory and I/O for a total of 272 cores per blade. These blades are designed to fully populate a standard air-cooled 36kW rack – 30 blades delivering a total of 8,160 cores. Arm has additionally partnered with Supermicro on a liquid-cooled 200kW design capable of housing 336 Arm AGI CPUs for over 45,000 cores.</p>
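<p>The rack-level arithmetic above is straightforward to check. This is a back-of-the-envelope sketch using only the figures quoted in this post; the per-chip core count is inferred from the 272-cores-per-blade, two-chip-per-blade figures.</p>

```python
# Back-of-the-envelope check of the rack configurations described above,
# using only figures quoted in the post.

CORES_PER_CPU = 136                  # inferred: 272 cores per 2-chip blade
cores_per_blade = 2 * CORES_PER_CPU  # 272

# Air-cooled 36 kW rack: 30 blades
air_cooled_rack_cores = 30 * cores_per_blade

# Liquid-cooled 200 kW Supermicro design: 336 CPUs
liquid_cooled_rack_cores = 336 * CORES_PER_CPU

print(air_cooled_rack_cores)     # 8160
print(liquid_cooled_rack_cores)  # 45696, i.e. "over 45,000 cores"
```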



<p>In this configuration, the Arm AGI CPU is capable of delivering more than 2x the performance per rack compared to the latest x86 systems*, achieved through the fundamental advantages of the Arm architecture and careful matching of system resources to compute:</p>



<ul class="wp-block-list">
<li>Arm AGI CPU’s class-leading memory bandwidth means more effective threads of execution per rack; x86 CPUs degrade as cores contend under sustained load.</li>



<li>High performance, efficient, single-threaded <a href="https://www.arm.com/products/silicon-ip-cpu/neoverse/neoverse-v3">Arm Neoverse V3 CPU</a> cores outperform legacy architectures; every Arm thread does more work.</li>



<li>More usable threads and more work per thread compound into massive performance gains per rack.</li>
</ul>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="A look inside the Arm AGI CPU: Core features and benefits" width="500" height="281" src="https://www.youtube.com/embed/sqJancqeVbk?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<h2 class="wp-block-heading" id="h-early-momentum-across-the-ai-ecosystem">Early momentum across the AI ecosystem</h2>



<p>The Arm AGI CPU is already seeing strong commercial momentum with partners at the forefront of scaling agentic AI infrastructure. Planned deployments span accelerator management, agentic orchestration and the densification of services, applications and tools needed for agentic task scale-out – as well as increased networking and data plane compute to support the AI data center.</p>



<p><a href="https://about.fb.com/news/2026/03/meta-partners-with-arm-to-develop-new-class-of-data-center-silicon">Meta&nbsp;is our lead partner and customer</a>, co-developing the Arm AGI CPU to optimize gigawatt-scale infrastructure for its Meta family of apps and to work alongside Meta’s own custom MTIA accelerators. Other launch partners include&nbsp;Cerebras, Cloudflare, F5, OpenAI, Positron, Rebellions, SAP, and SK Telecom – each working with Arm on the deployment of the Arm AGI CPU to accelerate AI-driven services across cloud, networking and enterprise environments. Commercial systems are now available for order from ASRockRack, Lenovo and Supermicro.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Arm AGI CPU: Reactions from Arm&#039;s deployment partners" width="500" height="281" src="https://www.youtube.com/embed/uYmzRoxe1nY?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p>To accelerate adoption further, Arm is introducing&nbsp;the <a href="https://developer.arm.com/community/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/introducing-the-arm-agi-cpu-1ou-dual-node-reference-server">Arm AGI CPU 1OU Dual Node Reference Server</a>, an Open Compute Project (OCP) DC-MHS standard form factor server. Arm plans to contribute this reference server design and supporting firmware, along with further contributions including system architecture specifications, debug frameworks and diagnostic and verification tooling applicable to all Arm-based systems. Further details will come at the upcoming <a href="https://www.opencompute.org/summit/emea-summit">OCP EMEA Summit</a>.</p>



<h2 class="wp-block-heading" id="h-a-new-chapter-for-arm-infrastructure">A new chapter for Arm infrastructure</h2>



<p>The launch of Arm AGI CPU represents a new chapter in Arm’s data center journey and continued leadership in computing innovation. As AI reshapes the industry, Arm remains committed to enabling progress across the ecosystem – meeting customers where they are, from hyperscale cloud providers to AI startups.</p>



<p>The Arm AGI CPU is the first offering of Arm’s new data center silicon product line and is available to order now. Follow-on products are committed, targeting best-in-class performance, scale and efficiency. This continues in parallel with the <a href="https://www.arm.com/products/neoverse-compute-subsystems">Arm Neoverse CSS</a> product roadmap so that all Arm data center customers move forward together on platform architecture and software compatibility.</p>



<p>Entering this new chapter, our mission remains unchanged: to provide the compute foundation that enables innovation across industries. And the ecosystem is fully behind us: <a href="https://www.arm.com/products/cloud-datacenter/arm-agi-cpu/ecosystem">More than 50 leading companies</a> across hyperscale, cloud, silicon, memory, networking, software, system design and manufacturing are supporting the expansion of the Arm compute platform into silicon. With Arm AGI CPU, we are not only defining the architecture of the AI-native data center, we are building it.</p>



<p>Hear more from our Arm AGI CPU deployment partners:</p>



<h3 class="wp-block-heading" id="h-cerebras">Cerebras</h3>



<p>“At Cerebras we build AI infrastructure designed for ultra-fast, large-scale inference, and as this becomes the dominant workload in AI, composable, high-performance systems matter more than ever – these systems need purpose-built AI acceleration alongside efficient, scalable CPUs orchestrating data movement, networking, and coordination at scale. Extending the Arm compute platform into AGI-class infrastructure is a positive step for the ecosystem and for customers deploying AI at global scale.” &#8211; <strong>Andrew Feldman, CEO, Cerebras</strong></p>



<h3 class="wp-block-heading" id="h-cloudflare">Cloudflare</h3>



<p>&#8220;To continue our mission of helping build a better Internet, Cloudflare needs infrastructure that scales efficiently across our global network. The Arm AGI CPU provides high-performance, energy-efficient compute designed for the next generation of workloads.&#8221; &#8211; <strong>Stephanie Cohen, Chief Strategy Officer, Cloudflare</strong></p>



<h3 class="wp-block-heading" id="h-meta">Meta</h3>



<p>“Delivering AI experiences at global scale demands a robust and adaptable portfolio of custom silicon solutions, purpose-built to accelerate AI workloads and optimize performance across Meta’s platforms. We worked alongside Arm to develop the Arm AGI CPU to deploy an efficient compute platform that significantly improves our data center performance density and supports a multi-generation roadmap for our evolving AI systems.” &#8211; <strong>Santosh Janardhan, Head of Infrastructure, Meta</strong></p>



<h3 class="wp-block-heading" id="h-openai">OpenAI</h3>



<p>“OpenAI runs AI systems at massive scale. Hundreds of millions use ChatGPT every day, businesses build on our API, and developers rely on tools like Codex. The Arm AGI CPU will play an important role in our infrastructure as we scale, strengthening the orchestration layer that coordinates large scale AI workloads and improving efficiency, performance, and bandwidth across the system.” &#8211; <strong>Sachin Katti, Head of Industrial Compute at OpenAI</strong></p>



<h3 class="wp-block-heading" id="h-positron">Positron</h3>



<p>“At Positron, we are focused on purpose-built inference accelerators that deliver breakthrough token generation efficiency using commodity memory. Arm has consistently delivered the industry’s most power-efficient compute platforms, which makes the Arm AGI CPU a natural foundation for next-generation AI infrastructure. By combining Positron’s inference acceleration technology with the energy-efficient Arm AGI CPU platform, we see a powerful opportunity to help data center operators deploy frontier AI models at scale with greater performance per watt and per dollar.” &#8211; <strong>Mitesh Agrawal, CEO, Positron AI</strong></p>



<h3 class="wp-block-heading" id="h-rebellions">Rebellions</h3>



<p>“High-performance AI systems require tight coordination between general-purpose compute and accelerator architectures. By combining the Arm AGI CPU with Rebellions’ NPUs in new high-density server configurations, we’re delivering a scalable, energy-efficient platform that is optimized for AI inference workloads at scale.” &#8211; <strong>Marshall Choy, Chief Business Officer, Rebellions</strong></p>



<h3 class="wp-block-heading" id="h-sap">SAP</h3>



<p>“SAP’s successful deployment of SAP HANA on Arm-based AWS Graviton underscores the maturity and performance of the Arm ecosystem for enterprise workloads. The Arm AGI CPU extends that opportunity, providing scalable, efficient compute designed to support the next generation of AI-powered business solutions.” &#8211; <strong>Stefan Bäuerle, Senior Vice President, Head of HANA &amp; Persistency, SAP</strong></p>



<h3 class="wp-block-heading" id="h-sk-telecom">SK Telecom</h3>



<p>“SK Telecom is expanding into large-scale, full-stack AI inference data center infrastructure, which includes the Arm AGI CPU and the Rebellions AI accelerator chip. By bringing together our sovereign A.X foundation model with inference-optimized AI servers, we are ready to deliver it to the world while elevating our AIDC competitiveness.&#8221; &#8211; <strong>Suk-geun (SG) Chung, CTO and Head of AI CIC, SK Telecom</strong></p>



<h2 class="wp-block-heading" id="h-forward-looking-statements">Forward-looking statements</h2>



<p>This blog post contains forward-looking statements regarding Arm&#8217;s product roadmap, future performance, planned contributions and partner deployments. These statements are based on current expectations and are subject to risks and uncertainties that could cause actual results to differ materially. For a discussion of factors that could affect Arm&#8217;s results, please refer to Arm&#8217;s filings with the U.S. Securities and Exchange Commission.</p>



<p>Performance claims are based on Arm internal estimates comparing a fully populated rack of Arm AGI CPU-based servers against comparable x86-based server configurations using industry-standard workloads. Actual results may vary based on system configuration, workload, and other factors.</p>



<p>All product and company names are trademarks or registered trademarks of their respective holders.</p>



<p><strong><em>*</em></strong><em>Based on estimates</em></p>





<p>The post <a href="https://newsroom.arm.com/blog/introducing-arm-agi-cpu">Announcing Arm AGI CPU: The silicon foundation for the agentic AI cloud era</a> appeared first on <a href="https://newsroom.arm.com">Arm Newsroom</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
