<!doctype html><html dir=ltr lang=en data-theme class="html theme--light"><head><meta charset=utf-8><title>Frank Denneman | Architecting AI Infrastructure</title><meta name=generator content="Hugo 0.155.2"><meta name=viewport content="width=device-width,initial-scale=1,viewport-fit=cover"><meta name=author content="Frank Denneman"><meta name=description content="Distinguished Engineer and Chief Technologist for AI at VMware Cloud Foundation. Writing about AI infrastructure, GPU platform design, and NUMA-aware architecture."><link rel=stylesheet href=https://frankdenneman.ai/css/anatole.min.d1a2db4db62630dc690a91ef034ed2fbecab93af4a610b9b8966d9f96705289c.css integrity="sha256-0aLbTbYmMNxpCpHvA07S++yrk69KYQubiWbZ+WcFKJw=" crossorigin=anonymous><link rel=stylesheet href=https://frankdenneman.ai/css/markupHighlight.min.73ccfdf28df555e11009c13c20ced067af3cb021504cba43644c705930428b00.css integrity="sha256-c8z98o31VeEQCcE8IM7QZ688sCFQTLpDZExwWTBCiwA=" crossorigin=anonymous type=text/css><link rel=stylesheet href=https://frankdenneman.ai/css/custom.min.b024dade060c56c631739bfb9359831e029263545ddec87c68a0e3e216518fc4.css integrity="sha256-sCTa3gYMVsYxc5v7k1mDHgKSY1Rd3sh8aKDj4hZRj8Q=" crossorigin=anonymous media=screen><link rel=stylesheet href=https://frankdenneman.ai/fontawesome/css/fontawesome.min.137b1cf3cea9a8adb7884343a9a5ddddf4280f59153f74dc782fb7f7bf0d0519.css integrity="sha256-E3sc886pqK23iENDqaXd3fQoD1kVP3TceC+3978NBRk=" crossorigin=anonymous type=text/css><link rel=stylesheet href=https://frankdenneman.ai/fontawesome/css/solid.min.e65dc5b48fb5f39b142360c57c3a215744c94e56c755c929cc3e88fe12aab4d3.css integrity="sha256-5l3FtI+185sUI2DFfDohV0TJTlbHVckpzD6I/hKqtNM=" crossorigin=anonymous type=text/css><link rel=stylesheet href=https://frankdenneman.ai/fontawesome/css/regular.min.6f4f16d58da1c82c0c3a3436e021a3d39b4742f741192c546e73e947eacfd92f.css integrity="sha256-b08W1Y2hyCwMOjQ24CGj05tHQvdBGSxUbnPpR+rP2S8=" crossorigin=anonymous type=text/css><link rel=stylesheet 
href=https://frankdenneman.ai/fontawesome/css/brands.min.e10425ad768bc98ff1fb272a0ac8420f9d1ba22f0612c08ff1010c95080ffe7e.css integrity="sha256-4QQlrXaLyY/x+ycqCshCD50boi8GEsCP8QEMlQgP/n4=" crossorigin=anonymous type=text/css><link rel="shortcut icon" href=https://frankdenneman.ai/favicons/favicon.ico type=image/x-icon><link rel=apple-touch-icon sizes=180x180 href=https://frankdenneman.ai/favicons/apple-touch-icon.png><link rel=icon type=image/png sizes=32x32 href=https://frankdenneman.ai/favicons/favicon-32x32.png><link rel=icon type=image/png sizes=16x16 href=https://frankdenneman.ai/favicons/favicon-16x16.png><link rel=canonical href=https://frankdenneman.ai/><link rel=alternate type=application/rss+xml href=https://frankdenneman.ai/index.xml title="Frank Denneman | Architecting AI Infrastructure"><script type=text/javascript src=https://frankdenneman.ai/js/anatole-header.min.f9132794301a01ff16550ed66763482bd848f62243d278f5e550229a158bfd32.js integrity="sha256-+RMnlDAaAf8WVQ7WZ2NIK9hI9iJD0nj15VAimhWL/TI=" crossorigin=anonymous></script><script async src=https://plausible.io/js/pa-V0qkOgRPBQTCYaqMrtS5Q.js></script><script>window.plausible=window.plausible||function(){(plausible.q=plausible.q||[]).push(arguments)},plausible.init=plausible.init||function(e){plausible.o=e||{}},plausible.init()</script><meta name=twitter:card content="summary"><meta name=twitter:title content="Frank Denneman"><meta name=twitter:description content="Distinguished Engineer and Chief Technologist for AI at VMware Cloud Foundation. Writing about AI infrastructure, GPU platform design, and NUMA-aware architecture."><meta property="og:url" content="https://frankdenneman.ai/"><meta property="og:site_name" content="Frank Denneman | Architecting AI Infrastructure"><meta property="og:title" content="Frank Denneman"><meta property="og:description" content="Distinguished Engineer and Chief Technologist for AI at VMware Cloud Foundation. 
Writing about AI infrastructure, GPU platform design, and NUMA-aware architecture."><meta property="og:locale" content="en_us"><meta property="og:type" content="website"><script type=application/ld+json>{"@context":"https://schema.org","@type":"WebSite","@id":"https://frankdenneman.ai/#website","name":"Frank Denneman | Architecting AI Infrastructure","url":"https://frankdenneman.ai/","description":"Distinguished Engineer and Chief Technologist for AI at VMware Cloud Foundation. Writing about AI infrastructure, GPU platform design, and NUMA-aware architecture.","creator":{"@id":"https://frankdenneman.ai/#person"},"publisher":{"@type":"Person","@id":"https://frankdenneman.ai/#person","name":"Frank Denneman","url":"https://frankdenneman.ai/"}}</script><script type=application/ld+json>{"@context":"https://schema.org","@type":"Person","@id":"https://frankdenneman.ai/#person","name":"Frank Denneman","url":"https://frankdenneman.ai/","jobTitle":"Distinguished Engineer | Chief Technologist for AI","worksFor":{"@type":"Organization","name":"VMware Cloud Foundation (Broadcom)"},"knowsAbout":["AI Infrastructure","GPU resource management","LLM runtime memory","vGPU placement","NUMA-aware infrastructure"],"sameAs":["https://www.linkedin.com/in/frankdenneman/","https://github.com/frankdenneman","https://www.instagram.com/roadtoimmersion","https://x.com/FrankDenneman"]}</script></head><body class=body><div class=wrapper><aside class=wrapper__sidebar><div class="sidebar
animated fadeInDown"><div class=sidebar__content><div class=sidebar__introduction><img class=sidebar__introduction-profileimage src=https://frankdenneman.ai/images/Headshot-FD.jpeg alt="profile picture"><div class=sidebar__introduction-title><h1><a href=https://frankdenneman.ai/>Frank Denneman</a></h1></div><div class=sidebar__introduction-description><p>Distinguished Engineer and Chief Technologist for AI at VMware Cloud Foundation. Writing about AI infrastructure, GPU platform design, and NUMA-aware architecture.</p></div></div><ul class=sidebar__list><li class=sidebar__list-item><a href=https://www.linkedin.com/in/frankdenneman/ target=_blank rel="noopener me" aria-label=LinkedIn title=LinkedIn><i class="fab fa-linkedin fa-2x" aria-hidden=true></i></a></li><li class=sidebar__list-item><a href=https://github.com/frankdenneman target=_blank rel="noopener me" aria-label=GitHub title=GitHub><i class="fab fa-github fa-2x" aria-hidden=true></i></a></li><li class=sidebar__list-item><a href=https://www.instagram.com/roadtoimmersion target=_blank rel="noopener me" aria-label=Instagram title=Instagram><i class="fab fa-instagram fa-2x" aria-hidden=true></i></a></li><li class=sidebar__list-item><a href=https://x.com/FrankDenneman target=_blank rel="noopener me" aria-label=X title=X><i class="fab fa-twitter fa-2x" aria-hidden=true></i></a></li></ul></div><footer class="footer footer__sidebar"><ul class=footer__list><li class=footer__item>&copy;
Frank Denneman
2026</li></ul></footer><script type=text/javascript src=https://frankdenneman.ai/js/medium-zoom.min.9531b4a2217a70c5a1fce89c1f81c9ebbdd586708fcd4130b417320c7230c8d6.js integrity="sha256-lTG0oiF6cMWh/OicH4HJ673VhnCPzUEwtBcyDHIwyNY=" crossorigin=anonymous></script><script async src="https://www.googletagmanager.com/gtag/js?id=G-XSZJ05LPXJ"></script><script>window.dataLayer=window.dataLayer||[];function gtag(){dataLayer.push(arguments)}gtag("js",new Date),gtag("config","G-XSZJ05LPXJ")</script></div></aside><main class=wrapper__main><header class=header><div class="animated fadeInDown"><a role=button class=navbar-burger data-target=navMenu aria-label=menu aria-expanded=false><span aria-hidden=true class=navbar-burger__line></span>
<span aria-hidden=true class=navbar-burger__line></span>
<span aria-hidden=true class=navbar-burger__line></span></a><nav class=nav><ul class=nav__list id=navMenu><li class=nav__list-item><a class=nav__link--active href=https://frankdenneman.ai/ title>Home</a></li><li class=nav__list-item><a href=https://frankdenneman.ai/categories/ai/ title>AI</a></li><li class=nav__list-item><a href=https://frankdenneman.ai/categories/numa/ title>NUMA</a></li><li class=nav__list-item><a href=https://frankdenneman.ai/ai-infrastructure/ title>AI Infra Series</a></li><li class=nav__list-item><a href=https://frankdenneman.ai/understanding-ai-memory/ title>AI Memory Series</a></li><li class=nav__list-item><a href=https://frankdenneman.ai/tools/ title>Tools</a></li><li class=nav__list-item><a href=https://frankdenneman.ai/search/ title>Search</a></li><li class=nav__list-item><a href=https://frankdenneman.ai/about/ title>About</a></li></ul><ul class="nav__list nav__list--end"></ul></nav></div></header><div class="post
animated fadeInDown"><div class=post__content></div></div><div class="post
animated fadeInDown"><div class=post__content><h3><a href=https://frankdenneman.ai/2026-03-31-Topology-Aware-Multi-GPU-VM-Placement/>TOPOLOGY-AWARE MULTI-GPU VM PLACEMENT</a></h3><p>Architecting AI Infrastructure Series - Part 11. A multi-GPU VM isn&rsquo;t just asking for multiple devices. It&rsquo;s asking for a specific communication geometry. This distinction matters. When a platform team provisions a VM for LLM inference or fine-tuning, they&rsquo;re not simply allocating two units of compute. They&rsquo;re allocating two GPUs that can communicate at hundreds of gigabytes per second over NVLink. Two GPUs on the same server that must communicate via PCIe won&rsquo;t deliver the same result.</p></div><div class=post__footer><em class="fas fa-calendar-day"></em>
<span class=post__footer-date>Tue, Mar 31, 2026
</span><span><a class=category href=https://frankdenneman.ai/categories/ai/>ai</a></span>
<span><a class="category series-infra" href=https://frankdenneman.ai/ai-infrastructure/>AI Infrastructure</a></span></div></div><div class="post
animated fadeInDown"><div class=post__content><h3><a href=https://frankdenneman.ai/2026-03-27-Understanding-Multi-GPU-Topologies-Within-a-Single-Host/>UNDERSTANDING MULTI-GPU TOPOLOGIES WITHIN A SINGLE HOST</a></h3><p>Architecting AI Infrastructure Series - Part 10. Part 9 covered why it&rsquo;s important to understand the topology when using multiple GPUs. When a model runs across several GPUs, communication between them becomes part of the process. Not all GPUs in a server communicate at the same speed, and these differences can impact performance.
Many AI teams prefer to run their workloads on a single server. This helps reduce network complexity and simplify deployment. Still, there are several ways to set up multiple GPUs in a single server.</p></div><div class=post__footer><em class="fas fa-calendar-day"></em>
<span class=post__footer-date>Fri, Mar 27, 2026
</span><span><a class=category href=https://frankdenneman.ai/categories/ai/>ai</a></span>
<span><a class="category series-infra" href=https://frankdenneman.ai/ai-infrastructure/>AI Infrastructure</a></span></div></div><div class="post
animated fadeInDown"><div class=post__content><h3><a href=https://frankdenneman.ai/posts/2026-03-23-understanding-unified-memory-dgx-spark-nemoclaw-nemotron/>UNDERSTANDING UNIFIED MEMORY ON DGX SPARK RUNNING NEMOCLAW AND NEMOTRON</a></h3><p>NemoClaw became the talk of GTC 2026 within hours of its announcement. It wraps OpenClaw in NVIDIA’s OpenShell runtime, adds guardrails, and gives you an always-on AI agent with a single install. Jensen Huang called OpenClaw the operating system for personal AI. NemoClaw is what makes that usable.
This is part 4 of the AI Memory series and focuses on how memory behaves on real systems.
I installed NemoClaw on a DGX Spark and ran Nemotron models locally to understand what actually happens in memory. The most important takeaway is simple. Unified memory breaks the usual GPU mental model.</p></div><div class=post__footer><em class="fas fa-calendar-day"></em>
<span class=post__footer-date>Mon, Mar 23, 2026
</span><span><a class=category href=https://frankdenneman.ai/categories/ai/>ai</a><a class=category href=https://frankdenneman.ai/categories/dgx-spark/>dgx-spark</a></span>
<span><a class="category series-memory" href=https://frankdenneman.ai/understanding-ai-memory/>AI Memory</a></span></div></div><div class="post
animated fadeInDown"><div class=post__content><h3><a href=https://frankdenneman.ai/2026-03-16-why-multi-gpu-requires-topology-awareness/>WHY MULTI GPU REQUIRES TOPOLOGY AWARENESS</a></h3><p>Architecting AI Infrastructure Series - Part 9. The AI Memory series has shown how AI workloads use GPU memory in different ways. The Dynamic World of LLM Runtime Memory explains how the KV cache grows with each new token and becomes a major consumer of GPU resources. Understanding Activation Memory in Mixture of Experts Models looks at the hardware pressure that builds when activation memory spikes during the prefill phase. The series also covers how agentic systems keep memory active to stay on track during complex tasks, as discussed in Durable Agentic AI Sessions in GPU Memory.</p></div><div class=post__footer><em class="fas fa-calendar-day"></em>
<span class=post__footer-date>Mon, Mar 16, 2026
</span><span><a class=category href=https://frankdenneman.ai/categories/ai/>ai</a></span>
<span><a class="category series-infra" href=https://frankdenneman.ai/ai-infrastructure/>AI Infrastructure</a></span></div></div><div class="post
animated fadeInDown"><div class=post__content><h3><a href=https://frankdenneman.ai/2026-03-12-durable-agentic-ai-sessions-in-gpu-memory/>DURABLE AGENTIC AI SESSIONS IN GPU MEMORY</a></h3><p>The durable memory of agentic systems. When a user asks a question in a chat interface and the model responds, the interaction is a single prompt completion. A prompt goes in, tokens come out. From an infrastructure perspective, this is a predictable transaction. As described in The Dynamic World of LLM Runtime Memory, the KV cache grows with the prompt, peaks during generation, and is released when the session ends. The memory footprint is bounded and relatively easy to plan for.</p></div><div class=post__footer><em class="fas fa-calendar-day"></em>
<span class=post__footer-date>Thu, Mar 12, 2026
</span><span><a class=category href=https://frankdenneman.ai/categories/ai/>ai</a></span>
<span><a class="category series-memory" href=https://frankdenneman.ai/understanding-ai-memory/>AI Memory</a></span></div></div><div class="post
animated fadeInDown"><div class=post__content><h3><a href=https://frankdenneman.ai/2026-03-06-mig-partitioning-placement-geometry-and-stranded-capacity/>MIG PARTITIONING, PLACEMENT GEOMETRY, AND STRANDED CAPACITY</a></h3><p>Architecting AI Infrastructure - Part 8. Previous articles in this series explained how time-sliced GPU sharing works in both same-size and mixed-size environments. They showed that choices such as profile selection and the order in which workloads start can directly affect GPU utilization and whether workloads are placed successfully. In this part, we look at MIG and the design choices that affect placement success and overall resource utilization.
MIG takes a different approach to GPU sharing. Instead of multiplexing compute resources between workloads, MIG splits the GPU into hardware instances. Each instance gets its own dedicated compute and memory slices.</p></div><div class=post__footer><em class="fas fa-calendar-day"></em>
<span class=post__footer-date>Fri, Mar 6, 2026
</span><span><a class=category href=https://frankdenneman.ai/categories/ai/>ai</a></span>
<span><a class="category series-infra" href=https://frankdenneman.ai/ai-infrastructure/>AI Infrastructure</a></span></div></div><div class="post
animated fadeInDown"><div class=post__content><h3><a href=https://frankdenneman.ai/2026-03-01-same-size-vs-mixed-size-placement/>SAME SIZE VS MIXED SIZE PLACEMENT AT CLUSTER SCALE</a></h3><p>Architecting AI Infrastructure - Part 7. The Silo Capacity Visualizer from Part 6 shows how profile selection and placement-ID alignment affect memory layout inside a single GPU. While that&rsquo;s helpful for understanding the basics, real capacity planning happens at the cluster level. This article introduces the Same-size vs Mixed-size Placement simulator, the second tool in the Cluster Profile Strategy Toolset. It lets you simulate vGPU placement across an entire cluster using both same-size and mixed-size policies simultaneously, with the same workload sequence for both. This way, you can directly compare their results.</p></div><div class=post__footer><em class="fas fa-calendar-day"></em>
<span class=post__footer-date>Sun, Mar 1, 2026
</span><span><a class=category href=https://frankdenneman.ai/categories/ai/>ai</a></span>
<span><a class="category series-infra" href=https://frankdenneman.ai/ai-infrastructure/>AI Infrastructure</a></span></div></div><div class="post
animated fadeInDown"><div class=post__content><h3><a href=https://frankdenneman.ai/2026-02-24-mixed-size-vgpu-mode-in-practice/>MIXED SIZE VGPU MODE IN PRACTICE</a></h3><p>Architecting AI Infrastructure - Part 6. Last time, I looked at how Same Size vGPU mode works with different assignment policies and how right-sizing profiles can make placement more flexible. The main point was that both profile variety and assignment choices have a big impact on how much GPU capacity you can actually use over time.
Understanding Placement IDs and Siloed Capacity. This article focuses on Mixed Size mode. Instead of locking a GPU to one profile after the first placement, Mixed Size lets you use different profile sizes on the same device. This might seem like an easy fix for fragmentation, but it brings a new challenge: placement IDs. These are fixed memory positions on the GPU where a profile can begin, so even if memory appears free, you can&rsquo;t use it unless it aligns with a valid placement ID. For more details on how placement IDs work, see Part 4.</p></div><div class=post__footer><em class="fas fa-calendar-day"></em>
<span class=post__footer-date>Tue, Feb 24, 2026
</span><span><a class=category href=https://frankdenneman.ai/categories/ai/>ai</a></span>
<span><a class="category series-infra" href=https://frankdenneman.ai/ai-infrastructure/>AI Infrastructure</a>
</span><span><a class=tag href=https://frankdenneman.ai/tags/gpu-placement/>GPU Placement</a><a class=tag href=https://frankdenneman.ai/tags/ai-platform/>AI Platform</a><a class=tag href=https://frankdenneman.ai/tags/vmware-private-ai-foundation/>VMware Private AI Foundation</a><a class=tag href=https://frankdenneman.ai/tags/kubernetes/>Kubernetes</a><a class=tag href=https://frankdenneman.ai/tags/vsphere/>vSphere</a><a class=tag href=https://frankdenneman.ai/tags/scheduling/>Scheduling</a></span></div></div><div class="post
animated fadeInDown"><div class=post__content><h3><a href=https://frankdenneman.ai/2026-02-19-How-Same-Size-vGPU-Mode-and-Right-sizing-Shape-GPU-Placement-Efficiency/>HOW SAME SIZE VGPU MODE AND RIGHT-SIZING SHAPE GPU PLACEMENT EFFICIENCY</a></h3><p>Architecting AI Infrastructure - Part 5. In the previous article, we looked at how GPU workloads are placed within an ESXi host and how GPU modes and assignment policies determine which physical GPU a workload uses. These decisions impact more than just the initial placement of workloads. They also shape how GPU capacity changes over time, affecting fragmentation, consolidation, and how easily new workloads can be scheduled. In this article, we will look at workloads that use fractional GPU profiles and how their sizing choices impact overall platform efficiency.</p></div><div class=post__footer><em class="fas fa-calendar-day"></em>
<span class=post__footer-date>Thu, Feb 19, 2026
</span><span><a class=category href=https://frankdenneman.ai/categories/ai/>ai</a></span>
<span><a class="category series-infra" href=https://frankdenneman.ai/ai-infrastructure/>AI Infrastructure</a>
</span><span><a class=tag href=https://frankdenneman.ai/tags/gpu-placement/>GPU Placement</a><a class=tag href=https://frankdenneman.ai/tags/ai-platform/>AI Platform</a><a class=tag href=https://frankdenneman.ai/tags/vmware-private-ai-foundation/>VMware Private AI Foundation</a><a class=tag href=https://frankdenneman.ai/tags/kubernetes/>Kubernetes</a><a class=tag href=https://frankdenneman.ai/tags/vsphere/>vSphere</a><a class=tag href=https://frankdenneman.ai/tags/scheduling/>Scheduling</a></span></div></div><div class="post
animated fadeInDown"><div class=post__content><h3><a href=https://frankdenneman.ai/2026-02-17-How-vSphere-GPU-Modes-and-Assignment-Policies-Determine-Host-Level-Placement/>HOW VSPHERE GPU MODES AND ASSIGNMENT POLICIES DETERMINE HOST LEVEL PLACEMENT</a></h3><p>Architecting AI Infrastructure - Part 4. In the last article, we tracked a GPU-backed VM from resource configuration to host selection. DRS evaluated the cluster, Assignable Hardware filtered hosts for GPU compatibility, and DRS ran its Goodness calculation and picked a destination host. Now, the host is selected. But the placement is not finished.
Inside the host, another set of decisions determines which physical GPU gets the workload and what types of workloads that GPU will handle from then on. These host-level choices are less visible than DRS decisions. They do not show up in dashboards or trigger alerts. However, their effects add up over time, and they play a key role in keeping a shared AI platform healthy or letting it decline.</p></div><div class=post__footer><em class="fas fa-calendar-day"></em>
<span class=post__footer-date>Tue, Feb 17, 2026
</span><span><a class=category href=https://frankdenneman.ai/categories/ai/>ai</a></span>
<span><a class="category series-infra" href=https://frankdenneman.ai/ai-infrastructure/>AI Infrastructure</a>
</span><span><a class=tag href=https://frankdenneman.ai/tags/gpu-placement/>GPU Placement</a><a class=tag href=https://frankdenneman.ai/tags/ai-platform/>AI Platform</a><a class=tag href=https://frankdenneman.ai/tags/vmware-private-ai-foundation/>VMware Private AI Foundation</a><a class=tag href=https://frankdenneman.ai/tags/kubernetes/>Kubernetes</a><a class=tag href=https://frankdenneman.ai/tags/vsphere/>vSphere</a><a class=tag href=https://frankdenneman.ai/tags/scheduling/>Scheduling</a></span></div></div><div class=pagination><ul class=pagination__list><li class=pagination__list-item><span class="page-link current">1</span></li><li class=pagination__list-item><a class=page-link href=https://frankdenneman.ai/page/2/>2</a></li><span class="page-link dots">&mldr;</span><li class=pagination__list-item><a class=page-link href=https://frankdenneman.ai/page/47/>47</a></li><li class=pagination__list-item><a class=page-link href=https://frankdenneman.ai/page/2/><i class="fa fa-angle-right" aria-label=Next></i></a></li></ul></div></main></div><footer class="footer footer__base"><ul class=footer__list><li class=footer__item>&copy;
Frank Denneman
2026</li></ul></footer><script type=text/javascript src=https://frankdenneman.ai/js/medium-zoom.min.9531b4a2217a70c5a1fce89c1f81c9ebbdd586708fcd4130b417320c7230c8d6.js integrity="sha256-lTG0oiF6cMWh/OicH4HJ673VhnCPzUEwtBcyDHIwyNY=" crossorigin=anonymous></script><script async src="https://www.googletagmanager.com/gtag/js?id=G-XSZJ05LPXJ"></script><script>window.dataLayer=window.dataLayer||[];function gtag(){dataLayer.push(arguments)}gtag("js",new Date),gtag("config","G-XSZJ05LPXJ")</script></body></html>