<?xml version="1.0" encoding="UTF-8" standalone="no"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" version="2.0">

<channel>
	<title>AWS Compute Blog</title>
	<atom:link href="https://aws.amazon.com/blogs/compute/feed/" rel="self" type="application/rss+xml"/>
	<link>https://aws.amazon.com/blogs/compute/</link>
	<description/>
	<lastBuildDate>Wed, 27 May 2026 15:04:10 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>Migrating your Java applications to AWS Graviton using AWS Transform custom</title>
		<link>https://aws.amazon.com/blogs/compute/migrating-your-java-applications-to-aws-graviton-using-aws-transform-custom/</link>
					
		
		<dc:creator><![CDATA[Hahnara Hyun]]></dc:creator>
		<pubDate>Wed, 27 May 2026 15:04:10 +0000</pubDate>
				<category><![CDATA[AWS Transform]]></category>
		<category><![CDATA[Graviton]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">b568276dc81e2e50ca3d0c5bba44dc920320349c</guid>

					<description>For Java applications, modern JVMs like Amazon Corretto and OpenJDK are highly optimized for Arm64 and modern applications that are pure Java often require zero changes to run on Graviton. In many cases, applications aren’t fully modernized or purely Java and have a range of dependencies. When you’re responsible for migrating workloads, it’s helpful to […]</description>
										<content:encoded>&lt;p&gt;For Java applications, modern JVMs like &lt;a href="https://aws.amazon.com/corretto/" target="_blank" rel="noopener"&gt;Amazon Corretto&lt;/a&gt; and &lt;a href="https://openjdk.org/" target="_blank" rel="noopener"&gt;OpenJDK&lt;/a&gt; are highly optimized for Arm64 and modern applications that are pure Java often require zero changes to run on &lt;a href="https://aws.amazon.com/ec2/graviton/" target="_blank" rel="noopener"&gt;Graviton&lt;/a&gt;. In many cases, applications aren’t fully modernized or purely Java and have a range of dependencies. When you’re responsible for migrating workloads, it’s helpful to use a systematic approach that surfaces issues, proposes solutions, and does the transformation work for you at scale.&lt;/p&gt; 
&lt;p&gt;That’s why we built the Java x86 to Graviton Migration transformation for &lt;a href="https://aws.amazon.com/transform/custom/" target="_blank" rel="noopener"&gt;AWS Transform custom (ATX)&lt;/a&gt;. This is an AI-powered agent that analyzes your Java codebase, creates a migration plan, and executes the transformation, complete with version-controlled commits at every step. With ATX you can efficiently assess hundreds of Java applications simultaneously and quickly learn which applications require no changes and which ones need modifications. This streamlines estimating the scope of effort and gives you suggested code updates before you even start.&lt;/p&gt; 
&lt;p&gt;ATX is available as a &lt;a href="https://github.com/kirodotdev/powers/tree/main/aws-transform" target="_blank" rel="noopener"&gt;Kiro power&lt;/a&gt;, a &lt;a href="https://marketplace.visualstudio.com/items?itemName=AmazonWebServices.aws-transform-plugin" target="_blank" rel="noopener"&gt;VS Code extension&lt;/a&gt;, and an &lt;a href="https://github.com/aws/agent-toolkit-for-aws/tree/main/skills/specialized-skills/migration-and-modernization-skills/aws-transform" target="_blank" rel="noopener"&gt;Agent Skill&lt;/a&gt; if you’d like to use it directly within other AI assistants to reduce context switching. While we will be using ATX to highlight how you can rapidly accelerate a Graviton migration, we have also published an open source &lt;a href="https://github.com/aws/aws-graviton-getting-started/tree/main/tools/skills" target="_blank" rel="noopener"&gt;Graviton universal skill&lt;/a&gt; based on the Agent Skills open standard so that you have the flexibility to use the skill natively within Kiro, Claude Code, Codex, or the platform of your choice.&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;AWS Graviton processors, based on the Arm64 architecture, can provide up to 40% better price performance over comparable x86-based instances for a wide variety of workloads. Now customers can use AI tools to quickly migrate workloads to Graviton.&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;The Java x86 to Graviton migration transformation&lt;/h2&gt; 
&lt;p&gt;At a high level, we recommend that customers finish any major Java version upgrades before migrating to Graviton; a separate Java Version Upgrade transformation is available for this use case. The Java x86 to Graviton Migration transformation requires a minimum of Java 8 and won’t incorporate Java version updates into the code changes.&lt;/p&gt; 
&lt;p&gt;The Java x86 to Graviton Migration completes multiple steps with work divided across multiple AI agents within the AWS Transform service, covering things like:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Native library analysis&lt;/strong&gt; – Identifies Java Native Interface (JNI) dependencies and finds Arm64-compatible alternatives&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Dependency updates&lt;/strong&gt; – Updates libraries to versions with Arm64 support&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Build configuration&lt;/strong&gt; – Modifies Maven/Gradle configs for multi-architecture builds&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Architecture-specific code&lt;/strong&gt; – Refactors hard-coded x86 assumptions&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Unit tests&lt;/strong&gt; – Verifies compatibility at runtime, provided the project includes unit tests&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Documentation&lt;/strong&gt; – Creates migration notes and runbooks for your team&lt;/li&gt; 
&lt;/ul&gt; 
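&lt;p&gt;For a rough manual view of what the native library analysis looks for, you can list the native binaries bundled inside your JARs. The following is only an illustrative sketch and not how ATX performs its analysis. It assumes that the JDK’s &lt;code&gt;jar&lt;/code&gt; tool is on your PATH, and the function names &lt;code&gt;filter_native_libs&lt;/code&gt; and &lt;code&gt;list_native_libs&lt;/code&gt; are hypothetical.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Illustrative sketch only; this is not the analysis that ATX runs.

# Keep only archive entries that look like native libraries.
# Versioned names such as libfoo.so.1 are not matched by this simple pattern.
filter_native_libs() {
  grep -E '\.(so|dll|dylib|jnilib)$' || true
}

# List native library entries inside one JAR (assumes the JDK jar tool).
list_native_libs() {
  jar tf "$1" | filter_native_libs
}

# Scan every JAR passed on the command line; does nothing without arguments.
for jar_file in "$@"; do
  echo "== $jar_file"
  list_native_libs "$jar_file"
done
```

&lt;p&gt;Entries under an x86-specific path are candidates for a closer look, because the library might not ship an Arm64 binary in that version.&lt;/p&gt;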
&lt;p&gt;The agent automatically detects your Java version, manages runtime switching as needed during analysis, and handles much of the environment complexity for you such as multi-module project detection or Maven or Gradle auto-detection. Transformation completion times vary, but for many applications you can expect it to take roughly an hour (ATX works well with repos under 300K lines of code).&lt;/p&gt; 
&lt;p&gt;In this post, we:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Walk through the requirements for running the Java x86 to Graviton Migration transformation.&lt;/li&gt; 
 &lt;li&gt;Help you get familiar with ATX by transforming a single Java application in Interactive Mode.&lt;/li&gt; 
 &lt;li&gt;Outline how to assess Graviton compatibility across the Java applications that you want to migrate to Graviton in a single batch and summarize the results with Campaign Mode.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;By the end, you should have a good idea of how Java x86 to Graviton Migration transformation functions and have a summary of the expected code changes and dependency updates needed for each of your Java applications, along with version-controlled code updates.&lt;/p&gt; 
&lt;h2&gt;Graviton transformation requirements&lt;/h2&gt; 
&lt;p&gt;The Java x86 to Graviton migration transformation should run on an Arm64 machine.&lt;/p&gt; 
&lt;p&gt;The agent doesn’t just read your code; it builds the project, loads native libraries, and validates your application’s runtime behavior on Arm64. If you run the transformation on an x86 machine, the agent can identify compatibility issues but can’t execute build validation or run tests.&lt;/p&gt; 
&lt;p&gt;If you try to run on x86, you will see the following error message:&lt;/p&gt; 
&lt;pre&gt;&lt;code&gt;⚠  This transformation requires Arm64 architecture.
Detected: x86_64
Please run ATX on an Arm64 environment. See documentation for options.&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;To get started, you need a Graviton instance or an Apple silicon Mac with the ATX CLI, build tools, and the JDKs that your project requires. The project source code should also be available locally on the machine running the ATX CLI. Because Apple silicon is Arm64-based, it’s possible to build, load, and verify Arm64-based dependencies for a quick proof of concept. However, we recommend running the transformation in an environment that reflects what you plan to deploy in production to surface any potential OS-level incompatibilities.&lt;/p&gt; 
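&lt;p&gt;Before launching ATX, a quick pre-flight check can confirm that the host is Arm64. The following is a minimal sketch and not part of ATX itself. The function names &lt;code&gt;is_arm64&lt;/code&gt; and &lt;code&gt;check_host_arch&lt;/code&gt; are hypothetical, and the sketch assumes that &lt;code&gt;uname -m&lt;/code&gt; reports &lt;code&gt;aarch64&lt;/code&gt; on Linux and &lt;code&gt;arm64&lt;/code&gt; on macOS.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Pre-flight sketch; the helper names are hypothetical, not part of ATX.

# Succeed (exit status 0) when the given machine name is 64-bit Arm.
# Assumption: Linux reports "aarch64" and macOS on Apple silicon reports "arm64".
is_arm64() {
  case "$1" in
    aarch64|arm64) return 0 ;;
    *) return 1 ;;
  esac
}

# Print whether this host can run the build and test validation.
check_host_arch() {
  local arch
  arch="$(uname -m)"
  if is_arm64 "$arch"; then
    echo "OK: $arch host, build and test validation can run here"
  else
    echo "WARNING: $arch host, the transformation needs an Arm64 environment"
  fi
}

check_host_arch
```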
&lt;p&gt;&lt;strong&gt;Requirements&lt;/strong&gt;&lt;/p&gt; 
&lt;table&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;th&gt;&lt;strong&gt;Requirement&lt;/strong&gt;&lt;/th&gt; 
   &lt;th&gt;&lt;strong&gt;Details&lt;/strong&gt;&lt;/th&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;AWS Transform custom permissions&lt;/td&gt; 
   &lt;td&gt;AWS Identity and Access Management (IAM) policies for the Transform service (see &lt;a href="https://docs.aws.amazon.com/transform/latest/userguide/custom-get-started.html#custom-authentication" target="_blank" rel="noopener"&gt;Authentication docs&lt;/a&gt;)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Arm64 execution environment&lt;/td&gt; 
   &lt;td&gt;&lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt; Graviton instance or Apple Silicon Mac. Running on x86 limits validation to static analysis only. Phase 3 (build/test) requires Arm64.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Node.js 20+&lt;/td&gt; 
   &lt;td&gt;Required by the AWS Transform CLI. Use the official installer at &lt;a href="https://nodejs.org/en/download" target="_blank" rel="noopener"&gt;nodejs.org/en/download&lt;/a&gt;. Package managers such as dnf or yum may install an older version.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Git&lt;/td&gt; 
   &lt;td&gt;AWS Transform custom uses local Git for version control during the transformation.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;AWS Transform CLI&lt;/td&gt; 
   &lt;td&gt;Installed using the setup script (see &lt;a href="https://docs.aws.amazon.com/transform/latest/userguide/custom-get-started.html#custom-installation" target="_blank" rel="noopener"&gt;Client Setup&lt;/a&gt; for the &lt;strong&gt;curl&lt;/strong&gt; command).&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Java build tooling&lt;/td&gt; 
   &lt;td&gt;A JDK (Arm64 build, e.g. Amazon Corretto or OpenJDK), Maven and/or Gradle as required by the target project. These are not optional for Java transformations. The agent needs them for dependency analysis, native library scanning, and build validation.&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
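&lt;p&gt;The requirements in the table can be spot-checked from the shell before you start. This is a hedged sketch rather than an official setup script: the function names are hypothetical, it only checks that each tool is on the PATH, and it assumes that &lt;code&gt;node --version&lt;/code&gt; prints a string such as &lt;code&gt;v20.11.1&lt;/code&gt;. It doesn’t verify IAM permissions or that the JDK is an Arm64 build.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Prerequisite spot check; helper names are hypothetical.

# Extract the major version from a string such as "v20.11.1".
node_major() {
  local v="${1#v}"
  echo "${v%%.*}"
}

# Succeed when the Node.js version string meets the 20+ requirement.
node_version_ok() {
  [ "$(node_major "$1")" -ge 20 ]
}

# Print one status line per tool; missing tools are reported, not fatal.
check_tools() {
  local tool
  for tool in node git java; do
    if command -v "$tool" >/dev/null; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
    fi
  done
  # The project needs Maven and/or Gradle, so either one is enough here.
  if command -v mvn >/dev/null || command -v gradle >/dev/null; then
    echo "found:   mvn or gradle"
  else
    echo "missing: mvn or gradle"
  fi
  if command -v node >/dev/null; then
    if node_version_ok "$(node --version)"; then
      echo "node version is 20 or newer"
    else
      echo "node is older than 20"
    fi
  fi
}

check_tools
```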
&lt;h2&gt;Running the Graviton transformation with Interactive Mode&lt;/h2&gt; 
&lt;p&gt;With your code in an Arm64 environment and all the prerequisites installed, you can begin the transformation.&lt;/p&gt; 
&lt;h3&gt;Step 1: Navigate to your project and create or clone a Git repo&lt;/h3&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;cd /home/developer/workspace # Docker 
# or 
cd ~/workspace # AMI
git init&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;We recommend not pointing to the main branch of the repository of your application. You can work in a local git environment or create a separate branch. ATX needs the ability to commit changes as it iteratively transforms your code. The final decision on which commits are pushed is up to the developer.&lt;/p&gt; 
&lt;h3&gt;Step 2: Launch ATX Interactive Mode&lt;/h3&gt; 
&lt;p&gt;Enter the following command to launch ATX interactive mode.&lt;/p&gt; 
&lt;pre&gt;&lt;code&gt;atx&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;ATX starts in interactive mode:&lt;/p&gt; 
&lt;figure&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/05/14/rId10.png"&gt;&lt;img loading="lazy" width="1429" height="604" class="alignnone size-full wp-image-26226" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/05/14/rId10.png" alt=""&gt;&lt;/a&gt;
&lt;/figure&gt; 
&lt;p&gt;To view available transformations, in a separate terminal enter:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;atx custom def list &amp;gt; custom_list.txt&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The AWS Managed transformations will be listed first, followed by User-created transformations that you’ve developed.&lt;/p&gt; 
&lt;h3&gt;Step 3: Select the Graviton transformation&lt;/h3&gt; 
&lt;p&gt;Enter the following at the &lt;a href="https://docs.aws.amazon.com/transform/latest/userguide/custom-command-reference.html" target="_blank" rel="noopener"&gt;ATX CLI&lt;/a&gt; prompt:&lt;/p&gt; 
&lt;pre&gt;&lt;code&gt;&amp;gt;AWS/early-access-java-x86-to-graviton&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;ATX will prompt you for next steps and your project details:&lt;/p&gt; 
&lt;pre&gt;&lt;code&gt;&amp;gt; Would you like to: (Ctrl + c to abort or provide feedback)
1. View the entire transformation definition
2. View specific sections of the transformation definition
3. Apply this transformation to your code
4. Modify this transformation
&amp;gt; 3
&amp;gt; What is the file system path to the code repository where you want to apply this transformation?
&amp;gt; .&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Note that because this is an AWS Managed Transformation, you can’t view the complete transformation definition or modify it. However, you can provide additional context customized to your use case. Keep in mind that the transformation doesn’t make permanent changes to your code: it records its changes as local Git commits, and you decide which ones to push.&lt;/p&gt; 
&lt;h3&gt;Step 4: Provide additional context&lt;/h3&gt; 
&lt;p&gt;ATX might ask clarifying questions to tailor the transformation:&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/05/14/rId12.jpeg"&gt;&lt;img loading="lazy" width="2108" height="1032" class="alignnone size-full wp-image-26227" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/05/14/rId12.jpeg" alt=""&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;Step 5: Review the analysis&lt;/h3&gt; 
&lt;p&gt;ATX analyzes your codebase:&lt;/p&gt; 
&lt;pre&gt;&lt;code&gt;Analyzing your codebase...[████████████████████████████████████████] 100%
✓ Parsed 847 Java files
✓ Analyzed 234 dependencies  
✓ Identified 12 native library usages
✓ Scanned build configuration
✓ Evaluated test coverage
Analysis complete. Preparing migration plan...&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Step 6: Review the migration plan&lt;/h3&gt; 
&lt;p&gt;ATX presents a detailed plan before making changes:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;═══════════════════════════════════════════════════════════════
                    MIGRATION PLAN
═══════════════════════════════════════════════════════════════

I've analyzed your project and here's what I'll do:

PHASE 1: Dependency Updates (12 changes)
────────────────────────────────────────
 • Update netty-transport-native-epoll 4.1.85 → 4.1.100
   Reason: Arm64 support added in 4.1.94
   
 • Update snappy-java 1.1.8.4 → 1.1.10.5  
   Reason: Current version lacks Arm64 native binaries
   
 • Update lz4-java 1.7.1 → 1.8.0
   Reason: Arm64 optimization improvements
   
 • ... (9 more dependency updates)

PHASE 2: Build Configuration (3 changes)
────────────────────────────────────────
 • Add Arm64 classifier to native dependency declarations
 • Configure multi-architecture Docker build
 • Add Graviton-optimized JVM flags to startup scripts

PHASE 3: Code Modifications (2 changes)
────────────────────────────────────────
 • Refactor ProcessorArchitecture.java
   Current: Hard-coded x86_64 assumption
   Change: Add Arm64 detection logic
   
 • Update NativeLibraryLoader.java
   Current: Loads x86-specific .so files
   Change: Architecture-aware library loading

PHASE 4: Testing &amp;amp; Validation
────────────────────────────────────────
 • Run build and tests on Arm64
 • Generate migration report and documentation

═══════════════════════════════════════════════════════════════

? How would you like to proceed?
  ❯ Execute the full plan
    Execute phase by phase (review each)
    Modify the plan
    Ask questions about the plan
    Export plan and exit
&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Step 7: Ask questions, request changes&lt;/h3&gt; 
&lt;p&gt;This is your chance to understand and adjust:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;? Ask questions about the plan:

You: Why are you updating netty to 4.1.100? 

ATX: Good question. Netty 4.1.94 was the first version with Arm64 
     native transport support, which is required for optimal 
     performance on Graviton. 
     
     However, I can adjust this:
     
     Option 1: Use 4.1.94 (minimum Arm64 support)
     Option 2: Use 4.1.97 (includes Arm64 bug fixes)
     Option 3: Keep 4.1.100 (latest stable with Arm64)
     
     Which would you prefer?

You: Let's go with Option 2 (4.1.97).
&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Step 8: Execute the Transformation&lt;/h3&gt; 
&lt;p&gt;When you’re ready:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;? How would you like to proceed?
  ❯ Execute the full plan

Starting transformation...

Phase 1: Dependency Updates
───────────────────────────
[1/12] Updating netty-transport-native-epoll...
       ✓ Modified pom.xml
       ✓ Committed: "chore(deps): update netty to 4.1.97 for Arm64 support"

[2/12] Updating snappy-java...
       ✓ Modified pom.xml  
       ✓ Committed: "chore(deps): update snappy-java to 1.1.10.5 for Arm64"

... (progress continues)

Phase 2: Build Configuration
───────────────────────────
[1/3] Adding Arm64 classifiers...
      ✓ Modified pom.xml
      ✓ Committed: "build: add Arm64 native classifiers"

... (progress continues)

Phase 3: Code Modifications
───────────────────────────
[1/2] Refactoring ProcessorArchitecture.java...
      ✓ Modified src/main/java/com/example/util/ProcessorArchitecture.java
      ✓ Committed: "feat: add Arm64 architecture detection"

... (progress continues)

Phase 4: Validation
───────────────────
Running build... ✓
Running tests... ✓ (847 passed, 0 failed)

═══════════════════════════════════════════════════════════════
                 TRANSFORMATION COMPLETE
═══════════════════════════════════════════════════════════════

Summary:
 • 17 files modified
 • 3 files created  
 • 14 commits made
 • All tests passing

Generated Artifacts:
 • GRAVITON_MIGRATION_REPORT.html  - Full migration report
 • MIGRATION_RUNBOOK.md            - Deployment guide for your team
 • commit-log.txt                  - All commits with descriptions

Your code is now Graviton-ready!
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;After the transformation is complete, you can now &lt;a href="https://catalog.workshops.aws/cost-effective-ec2-performance/en-US/1-lab-1/3-analysis" target="_blank" rel="noopener"&gt;performance test and load test&lt;/a&gt; on Graviton instances to configure your scaling policies or target thresholds to &lt;a href="https://www.youtube.com/watch?v=mSrDZuxWFtw" target="_blank" rel="noopener"&gt;maximize price/performance on Graviton&lt;/a&gt;. For more guidance on performance testing, see the &lt;a href="https://github.com/aws/aws-graviton-getting-started/blob/main/perfrunbook/README.md" target="_blank" rel="noopener"&gt;AWS Graviton Technical Guide&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;What you get after transformation&lt;/h2&gt; 
&lt;h3&gt;Version-controlled history&lt;/h3&gt; 
&lt;p&gt;Every logical change is a separate commit:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;$ git log --oneline -10

a3f2b1c (HEAD) docs: add Graviton migration runbook
b82d4e5 test: add Arm64 architecture verification tests
c9a1f3d feat: add Arm64 architecture detection
d4e7c2a build: configure multi-arch Docker build
e5f8d1b build: add Arm64 native classifiers
f6a9e2c chore(deps): update lz4-java to 1.8.0
07b0f3d chore(deps): update snappy-java to 1.1.10.5
18c1a4e chore(deps): update netty to 4.1.97 for Arm64 support
...
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Each commit is atomic and revertible. If something doesn’t work, you can &lt;code&gt;git revert&lt;/code&gt; specific changes.&lt;/p&gt; 
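&lt;p&gt;When commit subjects follow a consistent prefix, as in the sample log, you can pick out one group of changes before deciding what to revert. The following is a small sketch under that assumption; your run might produce different subjects. The helper name &lt;code&gt;commits_with_prefix&lt;/code&gt; is hypothetical, and the &lt;code&gt;chore(deps)&lt;/code&gt; prefix is taken from the example output.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch for selecting commits by subject prefix; helper name is hypothetical.

# Read "hash subject" lines on stdin and print the hashes whose subject
# starts with the given prefix.
commits_with_prefix() {
  awk -v prefix="$1" '{
    subject = substr($0, length($1) + 2)
    if (index(subject, prefix) == 1) print $1
  }'
}

# Inside a Git work tree, list the dependency-update commits.
# Each hash can then be reverted with: git revert --no-edit HASH
if git rev-parse --is-inside-work-tree >/dev/null 2>/dev/null; then
  git log --format='%h %s' | commits_with_prefix 'chore(deps)'
fi
```

&lt;p&gt;Listing the hashes first, rather than reverting in a loop, keeps the decision about each change with the developer.&lt;/p&gt;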
&lt;h3&gt;Migration report&lt;/h3&gt; 
&lt;p&gt;A comprehensive report covering:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;What was changed and why&lt;/li&gt; 
 &lt;li&gt;Dependencies that were updated&lt;/li&gt; 
 &lt;li&gt;Code modifications with before and after diffs&lt;/li&gt; 
 &lt;li&gt;Performance optimization recommendations&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Migration runbook&lt;/h3&gt; 
&lt;p&gt;A deployment guide for your team:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Pre-deployment checklist&lt;/li&gt; 
 &lt;li&gt;JVM flags designed for Graviton&lt;/li&gt; 
 &lt;li&gt;Monitoring and rollback procedures&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Additional resources on migrating to Graviton on an infrastructure level can be found in the &lt;a href="https://github.com/aws/aws-graviton-getting-started/blob/main/transition-guide.md" target="_blank" rel="noopener"&gt;Transition Guide&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Assessing Graviton compatibility for multiple Java applications with Campaign Mode&lt;/h2&gt; 
&lt;p&gt;When you’re ready to start migrating multiple applications, campaign mode offers an automated process that removes the manual back-and-forth with the transformation agent after each transformation step. The command shown below lets the ATX CLI run a full transformation unattended, so you can check back after it has completed. The trade-off is that you have less opportunity to provide additional customization and context to the agent.&lt;/p&gt; 
&lt;p&gt;As mentioned in the first step of running a Graviton transformation, the environment in which the code is transformed and the decision about which commits are pulled into the main repo are up to the developer. Running in campaign mode across several applications doesn’t require accepting and pushing code changes. Therefore, this automated method is most useful when you want a high-level overview of the effort required to migrate several or even hundreds of applications.&lt;/p&gt; 
&lt;pre&gt;&lt;code&gt;atx custom def exec \
--code-repository-path /path/to/myapp \
--non-interactive \
--trust-all-tools \
--campaign \
--repo-name myapp \
--add-repo&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;This command can be added into scripts, allowing further automations to be built into continuous integration and delivery (CI/CD) pipelines or &lt;a href="https://aws.amazon.com/blogs/devops/building-a-scalable-code-modernization-solution-with-aws-transform-custom/" target="_blank" rel="noopener"&gt;scaling transformation jobs&lt;/a&gt; across several repos without manually entering prompts as previously shown through interactive mode.&lt;/p&gt; 
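&lt;p&gt;As one way to script this, the following sketch wraps the campaign mode command in a loop over several repository directories. It uses only the flags shown in the preceding command. The function names, the &lt;code&gt;RUN&lt;/code&gt; switch, and the choice to derive the repo name from the directory name are assumptions made for illustration. By default it prints the commands instead of running them, so that you can review them first.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Campaign loop sketch; function names and the RUN switch are hypothetical.

# Print the campaign-mode command for one repository.
# Assumption: the repo name is taken from the directory name.
campaign_command() {
  local repo_path="$1"
  local repo_name
  repo_name="$(basename "$repo_path")"
  printf 'atx custom def exec --code-repository-path %s --non-interactive --trust-all-tools --campaign --repo-name %s --add-repo\n' \
    "$repo_path" "$repo_name"
}

# Dry run by default: print one command per repository directory.
# Set RUN=1 in the environment to execute each command instead.
run_campaign() {
  local repo
  for repo in "$@"; do
    if [ "${RUN:-0}" = "1" ]; then
      atx custom def exec \
        --code-repository-path "$repo" \
        --non-interactive \
        --trust-all-tools \
        --campaign \
        --repo-name "$(basename "$repo")" \
        --add-repo
    else
      campaign_command "$repo"
    fi
  done
}

run_campaign "$@"
```

&lt;p&gt;For example, saving the sketch as a script and passing it a list of application directories prints one command per application; running it again with &lt;code&gt;RUN=1&lt;/code&gt; executes them in sequence.&lt;/p&gt;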
&lt;p&gt;The status of transformations running with campaign mode will be displayed in the AWS Transform Web UI. &lt;a href="https://docs.aws.amazon.com/transform/latest/userguide/custom-get-started.html#custom-web-application" target="_blank" rel="noopener"&gt;Setting up the Web UI&lt;/a&gt; is a prerequisite to running a transformation in campaign mode.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/05/14/rId20.png"&gt;&lt;img loading="lazy" width="2852" height="1550" class="alignnone size-full wp-image-26228" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/05/14/rId20.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;In addition to this view, if you run the transformation across multiple applications, you can generate a consolidated dashboard with an agent of your choice. Gather the transformation results into a centralized directory, then use a prompt such as the following:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;Analyze all Java application Graviton transformation summaries in &amp;lt;directory&amp;gt;/&amp;lt;path&amp;gt;/ and create a comprehensive dashboard that includes: 
 
1. Executive summary with key metrics (total apps, compatibility rate, code changes required) 
2. Application summary table with columns: Application name, Type, Java version, Dependencies count, Code changes, Compatibility %, Status 
3. Code changes analysis - which apps needed changes and why 
4. Dependency transformation analysis - common dependencies and their ARM64 status, any upgrades required 
5. Native library analysis - which apps use native libs and their compatibility 
6. Performance expectations - JWT/crypto improvements, general performance gains, cost-performance ratios 
7. JVM optimization patterns - common flags used across applications 
8. Build system patterns - Maven/Gradle usage, Docker multi-arch support 
9. Test results summary - pass/fail rates, pre-existing vs ARM64 issues 
10. Common libraries requiring changes (or note if none) 
11. Deployment readiness assessment 
12. Risk assessment with mitigation strategies 
13. Migration recommendations with phased approach 
14. Documentation summary - total docs created and their coverage 
 
Read graviton-validation/00-summary.md from each application subdirectory. Consolidate findings into a single comprehensive markdown dashboard with tables, metrics, and actionable insights. 
 
Focus on: compatibility rates, code change requirements, dependency issues, performance expectations, and migration readiness. 
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Keep in mind that agents might output outcomes of the migration that aren’t sourced from the transformation summaries. As a result, we recommend that you use the summary as a high-level estimate of the technical effort required for migrating to Graviton.&lt;/p&gt; 
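&lt;p&gt;Gathering the transformation results into a centralized directory can also be scripted. The following sketch copies each application’s &lt;code&gt;graviton-validation/00-summary.md&lt;/code&gt; file, the path named in the prompt, into one output directory with a subdirectory per application. The function name &lt;code&gt;collect_summaries&lt;/code&gt; and the assumption that all applications sit side by side under a single root directory are hypothetical.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Summary collection sketch; the function name and layout are assumptions.

# Copy each application's transformation summary into one directory,
# keeping a subdirectory per application so files do not overwrite each other.
# Usage: collect_summaries APPS_ROOT OUTPUT_DIR
collect_summaries() {
  local apps_root="$1"
  local out_dir="$2"
  local summary
  local app
  mkdir -p "$out_dir"
  for summary in "$apps_root"/*/graviton-validation/00-summary.md; do
    # Skip the unexpanded pattern when no application has a summary yet.
    [ -f "$summary" ] || continue
    app="$(basename "$(dirname "$(dirname "$summary")")")"
    mkdir -p "$out_dir/$app/graviton-validation"
    cp "$summary" "$out_dir/$app/graviton-validation/00-summary.md"
    echo "collected: $app"
  done
}

# Run only when both arguments are supplied.
if [ "$#" -eq 2 ]; then
  collect_summaries "$1" "$2"
fi
```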
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/05/14/rId21.png"&gt;&lt;img loading="lazy" width="936" height="448" class="alignnone size-full wp-image-26229" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/05/14/rId21.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;The AWS Transform custom Java x86 to Graviton Migration transformation takes the guesswork out of Graviton migrations by using AI for dependency analysis, compatibility assessment, code refactoring, and runtime validation. Development teams can evaluate hundreds of Java applications simultaneously, with each transformation providing atomic version-controlled commits for straightforward rollback and clear change tracking. The tool offers two modes: &lt;strong&gt;1) &lt;/strong&gt;interactive mode for hands-on, application-by-application migration with developer review at each step, and &lt;strong&gt;2)&lt;/strong&gt; campaign mode for automated assessment across multiple applications. ATX turns an unknown Graviton migration effort into defined requirements through automated compilation and runtime testing. This provides a more efficient way to evaluate workload compatibility and migrate to Graviton.&lt;/p&gt; 
&lt;p&gt;The Java x86 to Graviton Migration transformation is one of a range of pre-built &lt;a href="https://docs.aws.amazon.com/transform/latest/userguide/transform-aws-customs.html" target="_blank" rel="noopener"&gt;AWS Managed Transformations&lt;/a&gt; but you can also create custom transformations unique to your own use case that can be scaled to drive migrations across your organization. Learn more on the AWS Transform custom &lt;a href="https://aws.amazon.com/transform/custom/" target="_blank" rel="noopener"&gt;website&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/transform/latest/userguide/custom.html" target="_blank" rel="noopener"&gt;documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Resources&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;ATX Documentation&lt;/strong&gt;: &lt;a href="https://docs.aws.amazon.com/transform/latest/userguide/custom.html" target="_blank" rel="noopener"&gt;https://docs.aws.amazon.com/transform/latest/userguide/custom.html&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;AWS-Managed Transformation Definitions&lt;/strong&gt;: &lt;a href="https://github.com/aws-samples/aws-transform-custom-samples/tree/main/aws-managed-definitions" target="_blank" rel="noopener"&gt;https://github.com/aws-samples/aws-transform-custom-samples/tree/main/aws-managed-definitions&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Graviton Getting Started&lt;/strong&gt;: &lt;a href="https://github.com/aws/aws-graviton-getting-started" target="_blank" rel="noopener"&gt;github.com/aws/aws-graviton-getting-started&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Agent Skills for Graviton migration&lt;/strong&gt;: &lt;a href="https://github.com/aws/aws-graviton-getting-started/tree/main/tools/skills" target="_blank" rel="noopener"&gt;https://github.com/aws/aws-graviton-getting-started/tree/main/tools/skills&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Streamline your infrastructure: Automating AMI creation with Kiro CLI and EC2 Image Builder</title>
		<link>https://aws.amazon.com/blogs/compute/streamline-your-infrastructure-automating-ami-creation-with-kiro-cli-and-ec2-image-builder/</link>
					
		
		<dc:creator><![CDATA[Malini Chatterjee]]></dc:creator>
		<pubDate>Fri, 22 May 2026 21:01:36 +0000</pubDate>
				<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Kiro]]></category>
		<category><![CDATA[EC2 Image Builder]]></category>
		<guid isPermaLink="false">26ac95a4d47c8e1192afcd8b5af6a01b1b7782bc</guid>

					<description>Managing infrastructure at scale requires robust automation tools that reduce manual effort while maintaining consistency and security. The combination of&amp;nbsp;Kiro CLI&amp;nbsp;and&amp;nbsp;AWS EC2 Image Builder&amp;nbsp;offers a powerful solution for automating the creation, testing, and deployment of Amazon Machine Images (AMIs). The challenge of manual image management Traditional approaches of creating and maintaining AMIs often involve manual […]</description>
										<content:encoded>&lt;p&gt;Managing infrastructure at scale requires robust automation tools that reduce manual effort while maintaining consistency and security. The combination of&amp;nbsp;&lt;a href="https://kiro.dev/cli/"&gt;Kiro CLI&amp;nbsp;&lt;/a&gt;and&amp;nbsp;&lt;a href="https://aws.amazon.com/image-builder/"&gt;AWS EC2 Image Builder&amp;nbsp;&lt;/a&gt;offers a powerful solution for automating the creation, testing, and deployment of Amazon Machine Images (AMIs).&lt;/p&gt; 
&lt;h1&gt;The challenge of manual image management&lt;/h1&gt; 
&lt;p&gt;Traditional approaches to creating and maintaining AMIs often involve manual processes that are time-consuming, error-prone, and difficult to scale. Teams struggle with:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Inconsistent configurations&lt;/strong&gt; across development, testing, and production environments&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Security vulnerabilities&lt;/strong&gt; from outdated base images and missing patches&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Compliance gaps&lt;/strong&gt; due to manual validation processes&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Slow deployment&lt;/strong&gt; cycles&amp;nbsp;caused by repetitive manual tasks&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;With EC2 Image Builder and Kiro CLI, teams can replace these manual workflows with automated and secure AMI pipelines. EC2 Image Builder provides the fully managed automation engine, while Kiro CLI brings AI-powered assistance to help you build, iterate, and troubleshoot those pipelines faster using natural language.&lt;/p&gt; 
&lt;h1&gt;EC2 Image Builder&lt;/h1&gt; 
&lt;p&gt;&lt;strong&gt;EC2 Image Builder&lt;/strong&gt;&amp;nbsp;is a fully managed AWS service that simplifies the creation, maintenance, and deployment of customized, secure, and up-to-date server images. The service provides the following key capabilities:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Automated build pipelines&lt;/strong&gt;: Define your image configuration once, then build images automatically on a schedule or in response to a trigger. Image Builder handles the full workflow for custom AMIs: creation, testing, distribution, and lifecycle management.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Built-in security&lt;/strong&gt;: Automatically apply the latest security patches and validate images against AWS security best practices. The update-linux and update-windows components patch OS vulnerabilities at build time, IMDSv2 can be enforced at the pipeline level, and Amazon Inspector validates CVE posture before image distribution, all without manual intervention.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Testing and validation&lt;/strong&gt;: Run automated tests to verify your images meet functional and security requirements before deployment. This ensures only validated images reach production environments.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Multi-region distribution&lt;/strong&gt;: Automatically distribute your AMIs across multiple AWS regions and share them with specific AWS accounts, streamlining deployment across complex organizational structures.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h1&gt;Kiro CLI: AI-powered infrastructure automation&lt;/h1&gt; 
&lt;p&gt;&lt;strong&gt;Kiro CLI&lt;/strong&gt;&amp;nbsp;brings generative AI capabilities directly to your terminal, enabling natural language interactions with AWS services. This AI-powered command-line interface transforms how developers and operators interact with infrastructure automation tools.&lt;/p&gt; 
&lt;h2&gt;What makes Kiro CLI powerful&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Natural language commands&lt;/strong&gt;: Instead of memorizing complex CLI syntax or hand-authoring CloudFormation templates, simply describe what you want to accomplish. Kiro CLI interprets your intent and generates Infrastructure as Code — such as CloudFormation or CDK — that you can review, version-control, and deploy through your existing CI/CD pipelines. For quick, non-destructive exploration (e.g., listing resources or describing configurations), Kiro can also execute AWS API calls directly.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Context-aware assistance:&lt;/strong&gt; Kiro understands your AWS environment and provides intelligent suggestions based on your current context, resources, and best practices. You can connect Kiro CLI to remote tools and systems via Model Context Protocol (MCP). For example, you can connect to AWS MCP servers for documentation and troubleshooting assistance.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Workflow automation:&lt;/strong&gt; Chain multiple operations together using conversational commands, reducing the cognitive load of managing complex infrastructure tasks.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Integration with AWS services:&lt;/strong&gt; Seamlessly interact with EC2 Image Builder, Systems Manager, and other AWS services without switching between different tools or interfaces.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h1&gt;The synergy: Kiro CLI + EC2 Image Builder, automated pipeline creation&lt;/h1&gt; 
&lt;p&gt;When combined, these tools create a streamlined workflow for infrastructure automation:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Faster onboarding:&lt;/strong&gt; Seamless AMI creation and faster maintenance with Kiro CLI. Rather than switching between the AWS Console and AWS CloudFormation documentation during initial exploration, Kiro CLI lets you describe your requirements conversationally — giving you a fast path to a working pipeline that you can then manage and refine through the Console or CloudFormation as your production needs mature.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Improved security posture:&lt;/strong&gt; Automated patching and compliance validation built into every image. Describe your patching requirements conversationally, and Kiro CLI includes the appropriate build components that apply OS-level patches, kernel updates, and CVE fixes directly into the AMI at build time.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Consistent deployments&lt;/strong&gt;: Version-controlled AMI pipelines that produce identical, pre-tested images promoted across dev, staging, and production without manual changes. EC2 Image Builder ensures every build follows the same recipe, components, and validation steps.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Reduced operational overhead&lt;/strong&gt;: Eliminates manual, repetitive tasks around image creation, distribution, and lifecycle management, which accelerates iteration cycles for pipeline builds.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Faster troubleshooting:&lt;/strong&gt; Kiro CLI parses error output and explains the root cause in plain language, cutting the time spent deciphering CloudFormation stack events and Image Builder build logs.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h1&gt;Getting started&lt;/h1&gt; 
&lt;p&gt;Before implementing this solution, make sure you have the following prerequisites:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Kiro CLI installed&lt;/strong&gt; (installation guide: for &lt;a href="https://kiro.dev/docs/cli/installation/#linux-appimage"&gt;Linux&lt;/a&gt;, &lt;a href="https://kiro.dev/docs/cli/installation/#macos"&gt;macOS&lt;/a&gt; or &lt;a href="https://kiro.dev/docs/cli/installation/#windows"&gt;Windows&lt;/a&gt;) and configured.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Configure the&lt;/strong&gt; &lt;a href="https://kiro.dev/docs/cli/mcp/"&gt;AWS Documentation MCP server&lt;/a&gt;. For detailed steps, refer to the &lt;a href="https://docs.aws.amazon.com/agent-toolkit/latest/userguide/getting-started-aws-mcp-server.html"&gt;AWS MCP server getting started guide&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;AWS account&lt;/strong&gt; with access permissions for the following services: 
  &lt;ul&gt; 
   &lt;li&gt;EC2 Image Builder&lt;/li&gt; 
   &lt;li&gt;IAM (for role creation and policy attachment)&lt;/li&gt; 
   &lt;li&gt;EC2 (for AMI management)&lt;/li&gt; 
   &lt;li&gt;Systems Manager&lt;/li&gt; 
   &lt;li&gt;VPC (for network configuration)&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;An existing VPC&lt;/strong&gt; with public/private subnets configured&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;To begin automating your infrastructure with Kiro CLI, you can use the following sample prompts as a baseline:&lt;/p&gt; 
&lt;h2&gt;Example 1: Amazon Linux for EKS nodes&lt;/h2&gt; 
&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Teams running Kubernetes on Amazon EKS need custom node AMIs that include the correct container runtime, kubelet version, and security hardening — and that stay current with weekly base image updates. This prompt automates that pipeline and keeps your EKS node groups up to date automatically.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-text"&gt;Create a production-ready EC2 Image Builder pipeline using direct API calls 
for custom EKS-optimized Amazon Linux 2023 AMIs with the following requirements:

- Weekly automated builds triggered by base AMI updates
- AWS managed components for container runtime, kubelet and CloudWatch agent
- Automatic launch template updates for EKS managed node groups&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;What Kiro CLI generates:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Kiro CLI produces the API calls and supporting configuration to set up:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An &lt;strong&gt;EC2 Image Builder pipeline&lt;/strong&gt; with a weekly schedule and base AMI change detection&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Image recipe&lt;/strong&gt; based on the EKS-optimized Amazon Linux 2023 AMI&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Component definitions&lt;/strong&gt; for container runtime (containerd), kubelet, and CloudWatch Agent&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Automation&lt;/strong&gt; to update EKS managed node group launch templates with the new AMI ID after each build&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;With a short prompt like this one, Kiro CLI picks default values for anything you do not specify, and you can edit those values afterward. To be more prescriptive, use a detailed prompt like the one in Example 2 below.&lt;/p&gt; 
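&lt;p&gt;To make the first item in the list above concrete, the following is a minimal sketch of the request that a weekly, update-aware pipeline might translate to. It is not output captured from Kiro CLI. The field names follow the Image Builder &lt;code&gt;CreateImagePipeline&lt;/code&gt; API as exposed by boto3, while the pipeline name, the ARNs, the account ID, and the cron expression are illustrative assumptions. The start condition shown here typically only detects new base images when the recipe references its parent image with a wildcard version.&lt;/p&gt;

```python
import json


def build_pipeline_request(name, recipe_arn, infra_arn, dist_arn):
    """Build keyword arguments for the Image Builder CreateImagePipeline call."""
    return {
        "name": name,
        "imageRecipeArn": recipe_arn,
        "infrastructureConfigurationArn": infra_arn,
        "distributionConfigurationArn": dist_arn,
        "schedule": {
            # Assumed expression: evaluate the schedule weekly, on Sunday at 00:00 UTC.
            "scheduleExpression": "cron(0 0 ? * sun *)",
            # Start a build only if the schedule matches AND a dependency
            # (base image or component) has a newer version available.
            "pipelineExecutionStartCondition": "EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE",
        },
        "imageTestsConfiguration": {"imageTestsEnabled": True, "timeoutMinutes": 90},
        "status": "ENABLED",
    }


# Hypothetical ARNs and names, for illustration only.
PREFIX = "arn:aws:imagebuilder:us-east-1:111122223333"
request = build_pipeline_request(
    "eks-al2023-nodes",
    PREFIX + ":image-recipe/eks-al2023/1.0.0",
    PREFIX + ":infrastructure-configuration/eks-al2023",
    PREFIX + ":distribution-configuration/eks-al2023",
)
print(json.dumps(request, indent=2))
```

&lt;p&gt;With credentials configured, the dictionary could be passed to &lt;code&gt;boto3.client("imagebuilder").create_image_pipeline(**request)&lt;/code&gt;. Reviewing the payload before sending it is a reasonable way to check what an AI-generated pipeline will actually do.&lt;/p&gt;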
&lt;h2&gt;Example 2: Windows server golden image&lt;/h2&gt; 
&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Enterprise teams running Windows-based workloads often need a standardized, hardened base image that meets compliance requirements (such as CIS benchmarks) and includes approved software. Manually maintaining this image is error-prone and time-consuming. This prompt automates the full pipeline — from build to distribution.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-text"&gt;Create a production-ready EC2 Image Builder pipeline for a Windows Server 2025 
golden image as a single CloudFormation template:

- Monthly automated builds via cron schedule
- Using latest public Windows Server 2025 AMI from AWS
- Components: AWS-managed CloudWatch Agent, AWS CLI, Windows Updates
- Apply AWS-managed STIG components (stig-build-windows) for build-time hardening 
  and corresponding stig-validate-windows for validation.
- For the EC2 instance profile role, use only these AWS-managed policies: 
  EC2InstanceProfileForImageBuilder, EC2InstanceProfileForImageBuilderECRContainerBuilds,
  and AmazonSSMManagedInstanceCore. Do NOT use any policy containing "FullAccess".
- Create a KMS multi-region primary key (MRK) in the pipeline region for AMI
  encryption, with a key policy granting cross-account access to
  [ACCOUNT_1, ACCOUNT_2, ACCOUNT_3] for kms:CreateGrant, kms:DescribeKey,
  and kms:Decrypt. Include a KMS alias. Output the key ARN for replica
  creation in target regions.
- Amazon Inspector vulnerability scanning
- Single pipeline deployed in one region. Use EC2 Image Builder
  DistributionConfiguration to share the output AMI to accounts
  [ACCOUNT_1, ACCOUNT_2, ACCOUNT_3] in regions us-east-1 and us-west-2.
  Do NOT create separate pipelines or stacks per region.
- In the DistributionConfiguration, use AmiDistributionConfiguration's
  built-in SsmParameterConfigurations to write the output AMI ID to
  /golden-image/windows-server-2025/latest in each distribution region.
  Do NOT use Lambda functions or custom resources for SSM parameter updates.
- Create an SNS topic for build notifications. Use the
  InfrastructureConfiguration's built-in SnsTopicArn property for pipeline
  status notifications. Do NOT create EventBridge rules for notifications.
- Lifecycle policy: Disable AMIs after 180 days, delete after 360 days
- Least-privilege IAM roles for Image Builder, EC2 instance profile,
  and lifecycle
- All resource names (KMS alias, IAM roles, SNS topics, Image Builder
  components, recipes, pipelines, infrastructure configs, distribution
  configs, lifecycle policies, EventBridge rules, and SSM parameter paths)
  must include !Sub "${AWS::StackName}" or a parameterized prefix to ensure
  uniqueness. This prevents conflicts if the template is deployed multiple
  times in the same account/region.
- Use AWS-managed components where available
- Parameterize account IDs and regions
- Do NOT create multiple stacks or deploy resources in multiple regions&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;What Kiro CLI generates:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Kiro CLI interprets this prompt and produces a complete CloudFormation template that includes:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An EC2 Image Builder pipeline with a monthly build schedule&lt;/li&gt; 
 &lt;li&gt;Image recipe referencing the latest Windows Server 2025 AMI from &lt;a href="https://docs.aws.amazon.com/systems-manager/latest/userguide/parameter-store-public-parameters.html"&gt;AWS Systems Manager public parameter&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;AWS-managed components for CloudWatch Agent, AWS CLI, and Windows Updates&lt;/li&gt; 
 &lt;li&gt;STIG hardening build component with corresponding validation component&lt;/li&gt; 
 &lt;li&gt;KMS key and encryption settings applied to the output AMI&lt;/li&gt; 
 &lt;li&gt;Amazon Inspector integration for CVE scanning before distribution&lt;/li&gt; 
 &lt;li&gt;Distribution configuration targeting 3 AWS accounts across 2 regions&lt;/li&gt; 
 &lt;li&gt;Built-in SsmParameterConfigurations writing the AMI ID to /golden-image/windows-server-2025/latest in each distribution region&lt;/li&gt; 
 &lt;li&gt;SNS topic and subscriptions for build success/failure notifications&lt;/li&gt; 
 &lt;li&gt;Lifecycle policy: disable AMIs after 180 days, delete after 360 days&lt;/li&gt; 
 &lt;li&gt;Least-privilege IAM roles for Image Builder service, EC2 instance profile, and lifecycle management&lt;/li&gt; 
&lt;/ul&gt; 
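&lt;p&gt;The distribution items in the list above can be pictured with a short sketch. It builds the &lt;code&gt;distributions&lt;/code&gt; list for the Image Builder &lt;code&gt;CreateDistributionConfiguration&lt;/code&gt; API as exposed by boto3; the generated CloudFormation template would express the same structure in YAML. The account IDs, key ARNs, and AMI name are hypothetical placeholders. The sketch assumes that the SSM parameter settings sit on each distribution entry and that each Region uses its own replica of the multi-region key.&lt;/p&gt;

```python
def build_distributions(kms_key_arns, account_ids, parameter_name):
    """Build one distribution entry per Region.

    kms_key_arns maps a Region name to the KMS key ARN used in that Region
    (the primary key or one of its replicas).
    """
    distributions = []
    for region in sorted(kms_key_arns):
        distributions.append({
            "region": region,
            "amiDistributionConfiguration": {
                # Image Builder substitutes the build date into the AMI name.
                "name": "windows-server-2025-golden-{{ imagebuilder:buildDate }}",
                "kmsKeyId": kms_key_arns[region],
                # Share launch permission with the target accounts.
                "launchPermission": {"userIds": list(account_ids)},
            },
            # Write the output AMI ID to Parameter Store in this Region.
            "ssmParameterConfigurations": [
                {"parameterName": parameter_name, "dataType": "aws:ec2:image"}
            ],
        })
    return distributions


# Hypothetical values, for illustration only.
KEY_ID = "mrk-1234abcd12ab34cd56ef1234567890ab"
keys = {
    "us-east-1": "arn:aws:kms:us-east-1:111122223333:key/" + KEY_ID,
    "us-west-2": "arn:aws:kms:us-west-2:111122223333:key/" + KEY_ID,
}
accounts = ["222233334444", "333344445555", "444455556666"]
distributions = build_distributions(
    keys, accounts, "/golden-image/windows-server-2025/latest"
)
for entry in distributions:
    print(entry["region"], entry["ssmParameterConfigurations"][0]["parameterName"])
```

&lt;p&gt;Keeping the per-Region key mapping explicit makes it easier to confirm that every distribution Region has a usable key before the first build runs.&lt;/p&gt;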
&lt;p&gt;Once the execution is complete, navigate to the&amp;nbsp;&lt;a href="https://us-east-2.signin.aws.amazon.com/oauth?client_id=arn%3Aaws%3Asignin%3A%3A%3Aconsole%2Fimagebuilder&amp;amp;code_challenge=taThqksbEXH_y7G-A7avZie1qCIcf8v2vpCdclxj5QI&amp;amp;code_challenge_method=SHA-256&amp;amp;response_type=code&amp;amp;redirect_uri=https%3A%2F%2Fconsole.aws.amazon.com%2Fimagebuilder%3Fca-oauth-flow-id%3D516f%26hashArgs%3D%2523%26isauthcode%3Dtrue%26oauthStart%3D1779130871328%26state%3DhashArgsFromTB_us-east-2_09caa1a5b5317abb"&gt;EC2 Image Builder&lt;/a&gt;&amp;nbsp;console, which opens on the&amp;nbsp;&lt;strong&gt;Image Pipelines&lt;/strong&gt; page. As the screenshot below shows, the new pipeline is now&amp;nbsp;&lt;strong&gt;Enabled&lt;/strong&gt;.&lt;/p&gt; 
&lt;p&gt;The name of the pipeline will vary based on your specific inputs. This image is only a sample of what an enabled pipeline looks like in the &lt;a href="https://us-east-2.signin.aws.amazon.com/oauth?client_id=arn%3Aaws%3Asignin%3A%3A%3Aconsole%2Fimagebuilder&amp;amp;code_challenge=SqezZ-gdpMd6lUf2sIBa6_vmdD20uZnJIUtvP0fLopc&amp;amp;code_challenge_method=SHA-256&amp;amp;response_type=code&amp;amp;redirect_uri=https%3A%2F%2Fconsole.aws.amazon.com%2Fimagebuilder%3Fca-oauth-flow-id%3Dbcd2%26hashArgs%3D%2523%26isauthcode%3Dtrue%26oauthStart%3D1779130937525%26state%3DhashArgsFromTB_us-east-2_2baaccfdfcb2617c"&gt;EC2 Image Builder&lt;/a&gt; console.&lt;/p&gt; 
&lt;div id="attachment_26254" style="width: 1440px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/05/18/Picture1111.png"&gt;&lt;img aria-describedby="caption-attachment-26254" loading="lazy" class="wp-image-26254 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/05/18/Picture1111.png" alt="Fig 1: Sample EC2 Image Builder console, after the pipeline is “enabled”" width="1430" height="271"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-26254" class="wp-caption-text"&gt;Fig 1: Sample EC2 Image Builder console, after the pipeline is “enabled”.&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;For more examples and scenarios, see the &lt;a href="https://catalog.us-east-1.prod.workshops.aws/workshops/09f5cf4e-8f93-4ebc-8777-1b872556e98b/en-US"&gt;Infrastructure Automation with Kiro CLI and EC2 Image Builder&lt;/a&gt; workshop.&lt;/p&gt; 
&lt;h1&gt;Cleanup&lt;/h1&gt; 
&lt;p&gt;To avoid ongoing charges, remove all resources created during this walkthrough. The cleanup steps depend on which example you followed.&lt;/p&gt; 
&lt;h2&gt;Example 1: Amazon Linux for EKS nodes cleanup&lt;/h2&gt; 
&lt;p&gt;If you created resources via direct API calls, delete them in the following order:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Disable and delete the Image Builder pipeline. This stops the weekly automated builds triggered by base AMI updates.&lt;/li&gt; 
 &lt;li&gt;Delete the image recipe based on the EKS-optimized Amazon Linux 2023 AMI.&lt;/li&gt; 
 &lt;li&gt;Delete the component definitions for container runtime (containerd), kubelet, and CloudWatch Agent.&lt;/li&gt; 
 &lt;li&gt;Delete the infrastructure configuration and distribution configuration.&lt;/li&gt; 
 &lt;li&gt;Revert your EKS managed node group launch templates to their previous AMI ID, or point them to a known-good image, before removing the custom AMIs.&lt;/li&gt; 
 &lt;li&gt;Deregister any AMIs produced by the pipeline and delete their associated EBS snapshots.&lt;/li&gt; 
 &lt;li&gt;Remove IAM roles and instance profiles created for Image Builder and the EC2 instance profile.&lt;/li&gt; 
&lt;/ol&gt; 
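&lt;p&gt;Because the deletion order matters, it can help to write the plan down before running anything. The sketch below turns a hypothetical inventory of resource identifiers into an ordered list of boto3 operations for the Image Builder and EC2 clients. The inventory keys and all identifiers are assumptions made for illustration. Reverting launch templates and removing IAM roles are left as manual steps because they depend on your environment.&lt;/p&gt;

```python
def build_cleanup_plan(resources):
    """Return (service, operation, kwargs) tuples in the order they should run."""
    plan = [
        ("imagebuilder", "delete_image_pipeline",
         {"imagePipelineArn": resources["pipeline_arn"]}),
        ("imagebuilder", "delete_image_recipe",
         {"imageRecipeArn": resources["recipe_arn"]}),
    ]
    # Components can only go once no recipe references them.
    for arn in resources["component_arns"]:
        plan.append(("imagebuilder", "delete_component",
                     {"componentBuildVersionArn": arn}))
    plan.append(("imagebuilder", "delete_infrastructure_configuration",
                 {"infrastructureConfigurationArn": resources["infra_arn"]}))
    plan.append(("imagebuilder", "delete_distribution_configuration",
                 {"distributionConfigurationArn": resources["dist_arn"]}))
    # Manual step expected here: revert EKS launch templates before this point.
    for ami_id in resources["ami_ids"]:
        plan.append(("ec2", "deregister_image", {"ImageId": ami_id}))
    for snapshot_id in resources["snapshot_ids"]:
        plan.append(("ec2", "delete_snapshot", {"SnapshotId": snapshot_id}))
    return plan


def run_plan(plan):
    """Execute the plan. Requires boto3 and AWS credentials; not called below."""
    import boto3  # imported lazily so the planning step runs without boto3
    for service, operation, kwargs in plan:
        getattr(boto3.client(service), operation)(**kwargs)


# Hypothetical inventory, for illustration only.
PREFIX = "arn:aws:imagebuilder:us-east-1:111122223333"
inventory = {
    "pipeline_arn": PREFIX + ":image-pipeline/eks-al2023-nodes",
    "recipe_arn": PREFIX + ":image-recipe/eks-al2023/1.0.0",
    "component_arns": [PREFIX + ":component/kubelet-setup/1.0.0/1"],
    "infra_arn": PREFIX + ":infrastructure-configuration/eks-al2023",
    "dist_arn": PREFIX + ":distribution-configuration/eks-al2023",
    "ami_ids": ["ami-0123456789abcdef0"],
    "snapshot_ids": ["snap-0123456789abcdef0"],
}
plan = build_cleanup_plan(inventory)
for service, operation, kwargs in plan:
    print(service, operation, kwargs)
```

&lt;p&gt;Printing the plan first gives you a chance to confirm that every identifier belongs to this walkthrough before anything is deleted.&lt;/p&gt;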
&lt;h2&gt;Example 2: Windows server golden image cleanup&lt;/h2&gt; 
&lt;p&gt;If you deployed the CloudFormation template, navigate to the &lt;a href="https://us-east-2.signin.aws.amazon.com/oauth?client_id=arn%3Aaws%3Asignin%3A%3A%3Aconsole%2Fcloudformation&amp;amp;code_challenge=BLQC5GwceappkCkU6tFihCiLfjaaE4r-xRpI5O4ir2Q&amp;amp;code_challenge_method=SHA-256&amp;amp;response_type=code&amp;amp;redirect_uri=https%3A%2F%2Fconsole.aws.amazon.com%2Fcloudformation%3Fca-oauth-flow-id%3Da222%26hashArgs%3D%2523%26isauthcode%3Dtrue%26oauthStart%3D1779220116463%26state%3DhashArgsFromTB_us-east-2_d8f28b318444a980"&gt;AWS CloudFormation console&lt;/a&gt;, select your stack, and choose &lt;strong&gt;Delete&lt;/strong&gt;. This removes the pipeline, recipe, components, IAM roles, KMS resources, SNS topic, and lifecycle policy in a single operation.&lt;/p&gt; 
&lt;p&gt;After the stack is deleted, manually clean up these resources that CloudFormation does not remove:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Deregister distributed AMIs — In each target account (ACCOUNT_1, ACCOUNT_2, ACCOUNT_3) and region (us-east-1, us-west-2), deregister the shared Windows Server 2025 AMIs and delete their associated &lt;strong&gt;EBS snapshots&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Delete SSM parameters — Remove &lt;code&gt;/golden-image/windows-server-2025/latest&lt;/code&gt; in each distribution region where it was written by the SsmParameterConfigurations.&lt;/li&gt; 
 &lt;li&gt;Schedule KMS key deletion — If the multi-region primary key (MRK) was replicated to other regions, delete the replica keys first, then schedule deletion of the primary key. Revoke any cross-account grants issued to ACCOUNT_1, ACCOUNT_2, and ACCOUNT_3.&lt;/li&gt; 
 &lt;li&gt;Remove Amazon Inspector associations — If Inspector was enabled solely for this pipeline, disable it to avoid ongoing scanning charges.&lt;/li&gt; 
 &lt;li&gt;Verify lifecycle policy cleanup — Confirm that the lifecycle policy (disable after 180 days, delete after 360 days) was removed with the stack. If any AMIs were already marked for lifecycle action, manually deregister and delete them.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Please note that AMI deregistration and snapshot deletion must be performed in every account and region where images were distributed. Make sure that receiving accounts also deregister their copies so they stop incurring storage costs.&lt;/p&gt; 
&lt;h1&gt;Conclusion&lt;/h1&gt; 
&lt;p&gt;The combination of AI-powered tools like Kiro CLI with robust automation services like EC2 Image Builder represents the future of infrastructure management. Whether you’re managing dozens or thousands of instances, automating your AMI creation pipeline is no longer optional—it’s essential for maintaining security, consistency, and agility in modern cloud environments.&lt;/p&gt; 
&lt;p&gt;In this post, we highlighted the benefits of AI-assisted infrastructure management using Kiro CLI. To get started, use the &lt;a href="https://catalog.us-east-1.prod.workshops.aws/workshops/09f5cf4e-8f93-4ebc-8777-1b872556e98b/en-US"&gt;Infrastructure Automation with Kiro CLI and EC2 Image Builder&lt;/a&gt; workshop, which provides detailed prompts for building a production-ready golden AMI pipeline with minimal manual coding.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Sharing Capacity Blocks for ML Across Your AWS Organization</title>
		<link>https://aws.amazon.com/blogs/compute/sharing-capacity-blocks-for-ml-across-your-aws-organization/</link>
					
		
		<dc:creator><![CDATA[Tyler Klimas]]></dc:creator>
		<pubDate>Mon, 18 May 2026 15:47:16 +0000</pubDate>
				<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[Capacity reservation]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Resource sharing]]></category>
		<guid isPermaLink="false">048c9ec44aa485a81c0e9a97c357734b6f681775</guid>

					<description>When your data science team reserves GPU instances for a two-week training job but completes it in four days, that capacity has the potential to sit unused while your computer vision team waits another week to start their project. Now you can eliminate this GPU waste and scheduling conflict by sharing Capacity Blocks for ML […]</description>
										<content:encoded>&lt;p&gt;When your data science team reserves GPU instances for a two-week training job but completes it in four days, that capacity has the potential to sit unused while your computer vision team waits another week to start their project. Now you can eliminate this GPU waste and scheduling conflict by sharing Capacity Blocks for ML across your &lt;a href="https://aws.amazon.com/organizations/" target="_blank" rel="noopener noreferrer"&gt;AWS Organization&lt;/a&gt;. This scheduling mismatch between teams creates bottlenecks that delay product launches, increase infrastructure costs, and slow your ability to deliver machine learning (ML) powered features to customers. With cross-account sharing for &lt;a href="https://aws.amazon.com/ec2/capacityblocks/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2) Capacity Blocks for ML&lt;/a&gt;, you can now distribute reserved graphics processing unit (GPU) capacity across teams based on actual demand rather than rigid scheduling predictions. This means your computer vision team can use the capacity as soon as the data science team is done.&lt;/p&gt; 
&lt;p&gt;In this post, we’ll show you how to configure cross-account sharing for Capacity Blocks for ML, set up monitoring for your shared resources, and optimize instance utilization through alerting. By increasing the utilization rates and reducing over-provisioning, you improve your resource efficiency and cost optimization for your organization.&lt;/p&gt; 
&lt;p&gt;You can reduce idle resources in your ML team’s account by sharing capacity with other teams waiting for GPUs. Additionally, you can maintain Capacity Blocks for ML centrally. This lets you control which teams have access to the capacity and helps you reduce waste and bottlenecks in your organization. Before starting the tutorial, let’s review how Capacity Blocks for ML and &lt;a href="https://aws.amazon.com/ram/" target="_blank" rel="noopener noreferrer"&gt;AWS RAM&lt;/a&gt; work together.&lt;/p&gt; 
&lt;h2&gt;Overview&lt;/h2&gt; 
&lt;p&gt;Capacity Blocks for ML let you reserve GPU-based accelerated compute instances ahead of time for short duration ML workloads. When you launch instances in Capacity Blocks for ML, &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2&lt;/a&gt; automatically places the instances in &lt;a href="https://aws.amazon.com/ec2/ultraclusters/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2 UltraClusters&lt;/a&gt;, giving you low-latency, petabit scale networking. UltraClusters provide the high performance networking your training workloads require.&lt;/p&gt; 
&lt;p&gt;You see exactly when GPU capacity is available and schedule your Capacity Blocks for ML to start when it makes sense for your project. You pay upfront for the entire reservation period. This makes Capacity Blocks for ML useful when you need GPUs for days to months. It provides predictable capacity without long-term commitments.&lt;/p&gt; 
&lt;p&gt;When you purchase Capacity Blocks for ML, you can share it with other accounts in your AWS Organization using &lt;a href="https://docs.aws.amazon.com/ram/latest/userguide/what-is.html" target="_blank" rel="noopener noreferrer"&gt;AWS Resource Access Manager&lt;/a&gt; (AWS RAM). With AWS RAM, you can share AWS resources across accounts within your organization. When you share with other accounts, those accounts become consumer accounts that can launch instances using your capacity. As the owner account, you pay the upfront reservation cost and retain ownership. If you’re launching instances from a consumer account, you are responsible for additional costs such as &lt;a href="https://aws.amazon.com/ec2/capacityblocks/pricing/" target="_blank" rel="noopener noreferrer"&gt;operating system licensing charges&lt;/a&gt;. Capacity Blocks can be shared to multiple accounts simultaneously, with the entire Capacity Block reservation being shared on a first come, first served basis.&lt;/p&gt; 
&lt;div id="attachment_25986" style="width: 726px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25581-1.png"&gt;&lt;img aria-describedby="caption-attachment-25986" loading="lazy" class="wp-image-25986 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25581-1.png" alt="Overview of AWS Organizations showing an owner account sharing to two consumer accounts using an AWS RAM resource share." width="716" height="531"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25986" class="wp-caption-text"&gt;Figure 1: Capacity Block sharing using Resource Access Manager.&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;With the share feature, you benefit from flexible GPU capacity management when your priorities shift, or teams finish work at different times. Now, when your data science team completes experimentation early, your other teams can use that capacity for production training. If priorities shift mid-quarter, you can move capacity where it’s needed most.&lt;/p&gt; 
&lt;p&gt;In this tutorial, you’ll share a Capacity Block for ML across accounts and then create an alarm that notifies you when utilization drops below a threshold. Before you start, complete the following prerequisites.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;To share Capacity Blocks for ML, you must first &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/capacity-blocks-purchase.html" target="_blank" rel="noopener noreferrer"&gt;find and purchase a Capacity Block.&lt;/a&gt; Only standard Capacity Blocks for ML can be shared using AWS RAM. UltraServer Capacity Blocks are not eligible for sharing.&lt;/p&gt; 
&lt;p&gt;You can share Capacity Blocks only within your AWS Organization. Verify that the owner account and all consumer accounts belong to the same organization. For guidance, see &lt;a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_tutorials_basic.html" target="_blank" rel="noopener noreferrer"&gt;Creating and configuring an organization&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Before sharing Capacity Blocks, you must configure resource sharing with AWS Organizations. Only the management account with the following required AWS &lt;a href="https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html" target="_blank" rel="noopener noreferrer"&gt;Identity and Access Management (IAM) permissions&lt;/a&gt; can enable resource sharing within an Organization:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;ram:EnableSharingWithAwsOrganization&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;iam:CreateServiceLinkedRole&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;organizations:EnableAWSServiceAccess&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;organizations:DescribeOrganization&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;Using the &lt;a href="https://console.aws.amazon.com/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt; of the management account:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://console.aws.amazon.com/ram/" target="_blank" rel="noopener noreferrer"&gt;AWS RAM console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Settings&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select &lt;strong&gt;Enable sharing with AWS Organizations.&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div id="attachment_25987" style="width: 1220px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25582.jpeg"&gt;&lt;img aria-describedby="caption-attachment-25987" loading="lazy" class="wp-image-25987 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25582.jpeg" alt="Enable sharing with AWS Organizations in Settings of Resource Access Manager." width="1210" height="496"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25987" class="wp-caption-text"&gt;Figure 2: Enable sharing with AWS Organizations in AWS RAM.&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Using the &lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (CLI)&lt;/a&gt;:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Run this command to give AWS RAM trusted access to your organization’s account structure: &lt;pre&gt;&lt;code class="lang-bash"&gt;    aws organizations enable-aws-service-access --service-principal ram.amazonaws.com&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Turn on resource sharing within your organization so accounts and OUs can access shared resources without manual acceptance: &lt;pre&gt;&lt;code class="lang-bash"&gt;    aws ram enable-sharing-with-aws-organization&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;After you turn on sharing in your organization, you need the &lt;a href="https://docs.aws.amazon.com/ram/latest/userguide/tshoot-access-denied.html" target="_blank" rel="noopener noreferrer"&gt;following IAM permissions&lt;/a&gt; to create resource shares:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;ram:CreateResourceShare&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;ram:AssociateResourceShare&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;ram:GetResourceShares&lt;/code&gt;&lt;/p&gt; 
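&lt;p&gt;As a sketch, the three permissions above could be grouped into a single identity-based policy. The statement ID and the wildcard resource are assumptions made for brevity; you may want to scope the resource more tightly in your own account.&lt;/p&gt;

```python
import json


def build_ram_share_policy():
    """Build an IAM policy document granting the resource-share permissions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageCapacityBlockShares",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "ram:CreateResourceShare",
                    "ram:AssociateResourceShare",
                    "ram:GetResourceShares",
                ],
                # Assumption: a wildcard keeps the example short.
                "Resource": "*",
            }
        ],
    }


policy = build_ram_share_policy()
print(json.dumps(policy, indent=2))
```

&lt;p&gt;The printed JSON can be attached to the role or user that creates resource shares in the Capacity Block owner account.&lt;/p&gt;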
&lt;p&gt;Now that you’ve completed the prerequisites, you’ll learn how to share Capacity Blocks for ML with other accounts in your organization.&lt;/p&gt; 
&lt;h2&gt;Tutorial&lt;/h2&gt; 
&lt;p&gt;You’ll complete this sharing process in four steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create a resource share.&lt;/li&gt; 
 &lt;li&gt;Attach Capacity Block to the resource share.&lt;/li&gt; 
 &lt;li&gt;Verify the share in your consumer account.&lt;/li&gt; 
 &lt;li&gt;Monitor the resource share.&lt;/li&gt; 
&lt;/ol&gt; 
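&lt;p&gt;Steps 1 and 2 can also be expressed as a single API request. The sketch below builds the parameters for the AWS RAM &lt;code&gt;CreateResourceShare&lt;/code&gt; API as exposed by boto3. The share name, the Capacity Reservation ARN, and the consumer account IDs are hypothetical placeholders, and the console walkthrough that follows remains the reference procedure.&lt;/p&gt;

```python
def build_resource_share_request(name, capacity_block_arn, principals):
    """Build keyword arguments for the AWS RAM CreateResourceShare call."""
    return {
        "name": name,
        # A Capacity Block is shared as a Capacity Reservation resource.
        "resourceArns": [capacity_block_arn],
        # Principals can be account IDs, OU ARNs, or the organization ARN.
        "principals": list(principals),
        # Sharing is limited to the organization, so block external principals.
        "allowExternalPrincipals": False,
    }


# Hypothetical identifiers, for illustration only.
share_request = build_resource_share_request(
    "ml-capacity-block-share",
    "arn:aws:ec2:us-east-1:111122223333:capacity-reservation/cr-0123456789abcdef0",
    ["222233334444", "333344445555"],
)
print(share_request)
```

&lt;p&gt;With credentials for the owner account, the dictionary could be passed to &lt;code&gt;boto3.client("ram").create_resource_share(**share_request)&lt;/code&gt;.&lt;/p&gt;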
&lt;h2&gt;Verify Capacity Reservation (console)&lt;/h2&gt; 
&lt;ol&gt; 
 &lt;li&gt;In your Capacity Block owner’s account, navigate to the &lt;a href="https://console.aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2 console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Capacity Reservations.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Confirm that your Capacity Block for ML is in the Active or Scheduled state.&lt;/li&gt; 
 &lt;li&gt;If you have a Resource share already configured, choose &lt;strong&gt;Actions&lt;/strong&gt;, &lt;strong&gt;Share &lt;/strong&gt;and select your Resource share.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25583.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25988 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25583.png" alt="" width="1209" height="142"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 3: EC2 Capacity Reservation&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;Share Capacity Blocks for ML (console)&lt;/h2&gt; 
&lt;p&gt;Next, create a resource share and associate your Capacity Block and principals with it.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://console.aws.amazon.com/ram/" target="_blank" rel="noopener noreferrer"&gt;AWS RAM console&lt;/a&gt; in your Capacity Block owner’s account.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Resource shares&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create resource share&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25584.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25989 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25584.png" alt="Create Resource share in AWS RAM Console" width="1211" height="587"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 4: Create Resource share in AWS RAM&lt;/em&gt;&lt;/p&gt; 
&lt;ol start="4"&gt; 
 &lt;li&gt;Enter a name for your resource share.&lt;/li&gt; 
 &lt;li&gt;Under Select resource type, choose &lt;strong&gt;Capacity Reservations&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select your Capacity Block from the list.&lt;/li&gt; 
 &lt;li&gt;Under Principals, specify the accounts, organizational units, or organization to share with.&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25585.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25990 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25585.png" alt="Select principals to share resources in AWS RAM." width="1211" height="342"&gt;&lt;/a&gt;&lt;em&gt;Figure 5: Select principals to share resources with&lt;/em&gt;&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create resource share&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Share Capacity Blocks for ML (AWS CLI)&lt;/h2&gt; 
&lt;p&gt;Replace the placeholder values in the following CLI commands with your actual values:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;arn:aws:ec2:us-east-2:123456789012:capacity-reservation/cr-1234abcd56EXAMPLE → Your Capacity Reservation ARN&lt;/li&gt; 
 &lt;li&gt;111122223333 → The AWS account ID of the principal you’re sharing with&lt;/li&gt; 
 &lt;li&gt;arn:aws:ram:us-east-2:123456789012:resource-share/7ab63972-b505-7e2a-420d-6f5d3EXAMPLE → Your RAM resource share ARN&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&amp;nbsp;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create resource share with Capacity Block and principals:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    aws ram create-resource-share \
         --name capacity-block-share \
         --resource-arns arn:aws:ec2:us-east-2:123456789012:capacity-reservation/cr-1234abcd56EXAMPLE \
         --principals 111122223333&lt;/code&gt;&lt;/pre&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;To add a Capacity Block to an existing resource share:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    aws ram associate-resource-share \
         --resource-share-arn arn:aws:ram:us-east-2:123456789012:resource-share/7ab63972-b505-7e2a-420d-6f5d3EXAMPLE \
         --resource-arns arn:aws:ec2:us-east-2:123456789012:capacity-reservation/cr-1234abcd56EXAMPLE&lt;/code&gt;&lt;/pre&gt; 
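&lt;p&gt;To confirm that the association completed, you could query the associations of the resource share from the owner account. The following is a minimal sketch rather than part of the original walkthrough: the function name &lt;code&gt;show_share_resources&lt;/code&gt; is hypothetical, and it assumes your AWS CLI credentials belong to the Capacity Block owner’s account.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    # Hypothetical helper: print each resource in a share with its association status.
    # $1 = RAM resource share ARN
    show_share_resources() {
        aws ram get-resource-share-associations \
            --association-type RESOURCE \
            --resource-share-arns "$1" \
            --query 'resourceShareAssociations[].[associatedEntity,status]' \
            --output text
    }

    # Example call with the placeholder ARN used in this section:
    # show_share_resources arn:aws:ram:us-east-2:123456789012:resource-share/7ab63972-b505-7e2a-420d-6f5d3EXAMPLE&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;A status of &lt;code&gt;ASSOCIATED&lt;/code&gt; next to the Capacity Reservation ARN indicates that the Capacity Block is part of the share. Passing &lt;code&gt;PRINCIPAL&lt;/code&gt; as the association type instead lists the accounts it is shared with.&lt;/p&gt; 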
&lt;h2&gt;Access and launch shared Capacity Blocks (console)&lt;/h2&gt; 
&lt;p&gt;After you add the Capacity Block to a resource share, your consumer accounts automatically gain access when you share the Capacity Block within the same AWS Organization.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://console.aws.amazon.com/ram/" target="_blank" rel="noopener noreferrer"&gt;AWS RAM console&lt;/a&gt; in your consumer account.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Shared with me&lt;/strong&gt;, &lt;strong&gt;Resource shares&lt;/strong&gt;. Verify the Resource share is Active.&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25586.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25991 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25586.png" alt="Within your consumer account, verify resource share." width="1210" height="589"&gt;&lt;/a&gt;&lt;em&gt;Figure 6: In your consumer account, verify the resource share&lt;/em&gt;&lt;/li&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://console.aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2 console&lt;/a&gt;. In the left navigation pane, choose &lt;strong&gt;Capacity Reservations.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Confirm the shared Capacity Block appears and is in Active or Scheduled state. Because sharing is asynchronous, the Capacity Block may take a few moments to appear even after the resource share shows Active.&lt;/li&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://console.aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2 console&lt;/a&gt; and choose &lt;strong&gt;Launch instance&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Configure your instance as required (AMI, instance type, key pair, etc.).&lt;/li&gt; 
 &lt;li&gt;Under Advanced details, for Purchasing option, choose Capacity Blocks.&lt;/li&gt; 
 &lt;li&gt;For Capacity reservation, choose Specify Capacity Reservation.&lt;/li&gt; 
 &lt;li&gt;For Capacity reservation targeted ID, select or enter your Capacity Block reservation ID.&lt;/li&gt; 
 &lt;li&gt;Launch the instance.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Access shared Capacity Blocks (AWS CLI)&lt;/h2&gt; 
&lt;p&gt;Replace the placeholder values in the following CLI commands with your actual values:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;ami-0abcdef1234567890 → Your AMI ID&lt;/li&gt; 
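 &lt;li&gt;cr-0c54f6734d944345a → Your Capacity Reservation ID&lt;/li&gt; 
 &lt;li&gt;my-key-pair → Your key pair name&lt;/li&gt; 
 &lt;li&gt;subnet-0abcdef1234567890 → Your subnet ID&lt;/li&gt; 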
&lt;/ul&gt; 
&lt;ol&gt; 
 &lt;li&gt;List resource shares in your consumer account:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    aws ram get-resource-shares --resource-owner OTHER-ACCOUNTS&lt;/code&gt;&lt;/pre&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Verify that the Capacity Reservation is available:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    aws ec2 describe-capacity-reservations&lt;/code&gt;&lt;/pre&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Launch an EC2 instance into the Capacity Block:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    aws ec2 run-instances \
         --image-id ami-0abcdef1234567890 \
         --count 1 \
         --instance-type p5.48xlarge \
         --key-name my-key-pair \
         --subnet-id subnet-0abcdef1234567890 \
         --instance-market-options MarketType='capacity-block' \
         --capacity-reservation-specification CapacityReservationTarget={CapacityReservationId=cr-0c54f6734d944345a}&lt;/code&gt;&lt;/pre&gt; 
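&lt;p&gt;The unfiltered &lt;code&gt;describe-capacity-reservations&lt;/code&gt; call in step 2 returns every reservation visible to the consumer account. As a hedged sketch, you might narrow it to the shared Capacity Block and print only the fields that matter before launching. The function name &lt;code&gt;describe_shared_block&lt;/code&gt; is hypothetical, and the selected fields are an illustrative choice.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    # Hypothetical helper: show ID, owner account, state, and free instance slots
    # for a single Capacity Reservation.
    # $1 = Capacity Reservation ID
    describe_shared_block() {
        aws ec2 describe-capacity-reservations \
            --capacity-reservation-ids "$1" \
            --query 'CapacityReservations[].[CapacityReservationId,OwnerId,State,AvailableInstanceCount]' \
            --output text
    }

    # Example call with the placeholder ID used in this section:
    # describe_shared_block cr-0c54f6734d944345a&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;If the owner ID in the output differs from the consumer account ID, the reservation is one that was shared with you. An available instance count of zero means that a launch into the Capacity Block cannot succeed until capacity frees up.&lt;/p&gt; 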
&lt;h2&gt;Monitor usage (console)&lt;/h2&gt; 
&lt;p&gt;You can create &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; alarms to proactively identify low utilization of your Capacity Block. This helps you to improve the usage of your capacity reservation. This section shows you how to create an &lt;a href="https://docs.aws.amazon.com/sns/latest/dg/sns-create-topic.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Notification Service (Amazon SNS)&lt;/a&gt; email notification when the number of running instances drops below a certain threshold.&lt;/p&gt; 
&lt;p&gt;In addition to monitoring usage, AWS CloudTrail logs capture API events related to your Capacity Block, including the CapacityReservationId. As the owner, you can see which accounts are consuming instances and when.&lt;/p&gt; 
&lt;p&gt;Step 1: Create an SNS Topic for Notifications&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the &lt;a href="https://console.aws.amazon.com/sns/" target="_blank" rel="noopener noreferrer"&gt;Amazon SNS console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Topics.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create topic.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;Type&lt;/strong&gt;, select &lt;strong&gt;Standard.&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25587.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25992 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25587.png" alt="Create SNS Topic for CloudWatch alarm." width="1209" height="362"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 7: Create SNS Topic&lt;/em&gt;&lt;/p&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;For &lt;strong&gt;Name&lt;/strong&gt;, enter &lt;code&gt;capacity-block-alerts&lt;/code&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create topic.&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Step 2: Create an SNS Subscription&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Create subscription.&lt;/strong&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25588.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25993 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25588.png" alt="Create SNS Subscription" width="1211" height="455"&gt;&lt;/a&gt;&lt;em&gt;Figure 8: Create SNS Subscription&lt;/em&gt;&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;Protocol&lt;/strong&gt;, choose &lt;strong&gt;Email.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;Endpoint&lt;/strong&gt;, enter your email address.&lt;/li&gt; 
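 &lt;li&gt;Choose &lt;strong&gt;Create subscription&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Open the confirmation email that Amazon SNS sends to your address and confirm the subscription. Notifications aren’t delivered until the subscription is confirmed.&lt;/li&gt; 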
&lt;/ol&gt; 
&lt;p&gt;Step 3: Create the CloudWatch Alarm&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://console.aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Alarms&lt;/strong&gt;, &lt;strong&gt;All alarms.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create alarm.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Select metric.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;EC2 Capacity Reservations.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;By Capacity Reservation.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Find your Capacity Block ID (e.g., cr-12345678abcdef).&lt;/li&gt; 
 &lt;li&gt;Select the checkbox next to &lt;strong&gt;InstanceUtilization.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Select metric.&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Step 4: Configure the Metric&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Under &lt;strong&gt;Metric&lt;/strong&gt;, for &lt;strong&gt;Statistic&lt;/strong&gt;, select &lt;strong&gt;Average&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;Period&lt;/strong&gt;, select &lt;strong&gt;5 minutes&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Under &lt;strong&gt;Conditions&lt;/strong&gt;, for &lt;strong&gt;Threshold type&lt;/strong&gt;, select &lt;strong&gt;Static&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;Whenever InstanceUtilization is…&lt;/strong&gt;, select &lt;strong&gt;Lower&lt;/strong&gt;. For &lt;strong&gt;than…&lt;/strong&gt;, enter &lt;strong&gt;20&lt;/strong&gt; (this metric is measured as a percentage).&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Next.&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Step 5: Configure Actions&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Under &lt;strong&gt;Notifications&lt;/strong&gt;, for &lt;strong&gt;Alarm state trigger&lt;/strong&gt;, select &lt;strong&gt;In alarm&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;Select an SNS topic&lt;/strong&gt;, choose &lt;strong&gt;Select an existing SNS topic&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Send a notification to&lt;/strong&gt;…: Select capacity-block-alerts.&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25589.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25994 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/CB-25589.png" alt="Configure CloudWatch Alarm" width="1210" height="606"&gt;&lt;/a&gt;&lt;em&gt;Figure 9: Configure CloudWatch Alarm&lt;/em&gt;&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Next.&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Step 6: Name and Create Alarm&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;For &lt;strong&gt;Alarm name&lt;/strong&gt;, enter &lt;code&gt;CapacityBlock-LowUtilization-cr-123456789abcdef&lt;/code&gt;.&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;Alarm description&lt;/strong&gt;, enter: Alert when Capacity Block utilization drops below 20%.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Next&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Review your configuration and choose &lt;strong&gt;Create alarm&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Monitor usage (AWS CLI)&lt;/h2&gt; 
&lt;p&gt;Replace the placeholder values in the following CLI commands with your actual values:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;123456789012 → Your 12-digit AWS account number&lt;/li&gt; 
 &lt;li&gt;cr-0c54f6734d944345a → Your Capacity Reservation ID&lt;/li&gt; 
 &lt;li&gt;your_email@example.com → Your email address for notifications&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&amp;nbsp;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create the SNS topic:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    aws sns create-topic \
        --name capacity-block-alerts&lt;/code&gt;&lt;/pre&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Using the TopicArn from the output, subscribe your email:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    aws sns subscribe \
        --topic-arn arn:aws:sns:us-east-2:123456789012:capacity-block-alerts \
        --protocol email \
        --notification-endpoint your_email@example.com&lt;/code&gt;&lt;/pre&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Create the CloudWatch alarm:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    aws cloudwatch put-metric-alarm \
        --alarm-name "CapacityBlock-LowUtilization-cr-1234EXAMPLE" \
        --alarm-description "Alert when Capacity Block utilization drops below 20%" \
        --namespace "AWS/EC2CapacityReservations" \
        --metric-name "InstanceUtilization" \
        --dimensions Name=CapacityReservationId,Value=cr-0c54f6734d944345a \
        --statistic Average \
        --period 300 \
        --evaluation-periods 1 \
        --threshold 20 \
        --comparison-operator LessThanThreshold \
        --alarm-actions arn:aws:sns:us-east-2:123456789012:capacity-block-alerts&lt;/code&gt;&lt;/pre&gt; 
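&lt;p&gt;After the alarm is created, you might want to check that it is evaluating the metric. The following sketch reads the current alarm state. The function name &lt;code&gt;alarm_state&lt;/code&gt; is hypothetical, and the example assumes the alarm name from the previous step.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    # Hypothetical helper: print the current state of one CloudWatch alarm
    # (OK, ALARM, or INSUFFICIENT_DATA).
    # $1 = alarm name
    alarm_state() {
        aws cloudwatch describe-alarms \
            --alarm-names "$1" \
            --query 'MetricAlarms[0].StateValue' \
            --output text
    }

    # Example call with the alarm name used in this section:
    # alarm_state "CapacityBlock-LowUtilization-cr-1234EXAMPLE"&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;A new alarm typically reports &lt;code&gt;INSUFFICIENT_DATA&lt;/code&gt; until enough data points arrive. Email notifications are delivered only after the recipient confirms the Amazon SNS subscription.&lt;/p&gt; 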
&lt;h2&gt;Clean up (console)&lt;/h2&gt; 
&lt;p&gt;As the owner of the Capacity Block, you retain the ability to modify the resource share. However, owners cannot modify instances that consumers launch into Capacity Blocks they have shared. This section outlines how to clean up your previous work.&lt;/p&gt; 
&lt;p&gt;Using the &lt;a href="https://console.aws.amazon.com/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt;:&lt;/p&gt; 
&lt;p&gt;Stop sharing the Capacity Block&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to &lt;a href="https://console.aws.amazon.com/ram/" target="_blank" rel="noopener noreferrer"&gt;AWS RAM console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Shared by me, Resource shares&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select your resource share.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Modify&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Remove the Capacity Block from the resource share or delete the entire resource share.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Delete the CloudWatch Alarm&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://console.aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Alarms, All alarms&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select the alarm you created.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Actions&lt;/strong&gt;, &lt;strong&gt;Delete&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Confirm deletion.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Delete the SNS Topic and Subscription&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://console.aws.amazon.com/sns/" target="_blank" rel="noopener noreferrer"&gt;Amazon SNS console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Subscriptions&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select the subscription and choose &lt;strong&gt;Delete&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Topics&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select capacity-block-alerts and choose &lt;strong&gt;Delete&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Confirm deletion.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Clean up (AWS CLI)&lt;/h2&gt; 
&lt;p&gt;Replace the placeholder values in the following CLI commands with your actual values:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;123456789012 → Your 12-digit AWS account number&lt;/li&gt; 
 &lt;li&gt;7ab63972-b505-7e2a-420d-6f5d3EXAMPLE → Your RAM resource share ID&lt;/li&gt; 
 &lt;li&gt;cr-0c54f6734d944345a → Your Capacity Reservation ID&lt;/li&gt; 
 &lt;li&gt;a1b2c3d4-5678-90ab-cdef-EXAMPLE → Your SNS subscription ID&lt;/li&gt; 
&lt;/ul&gt; 
&lt;ol&gt; 
 &lt;li&gt;Remove the Capacity Block from the resource share &lt;pre&gt;&lt;code class="lang-bash"&gt;    aws ram disassociate-resource-share \
        --resource-share-arn arn:aws:ram:us-east-2:123456789012:resource-share/7ab63972-b505-7e2a-420d-6f5d3EXAMPLE \
        --resource-arns arn:aws:ec2:us-east-2:123456789012:capacity-reservation/cr-0c54f6734d944345a &lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Delete the resource share &lt;pre&gt;&lt;code class="lang-bash"&gt;    aws ram delete-resource-share \
        --resource-share-arn arn:aws:ram:us-east-2:123456789012:resource-share/7ab63972-b505-7e2a-420d-6f5d3EXAMPLE &lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Delete the CloudWatch Alarm &lt;pre&gt;&lt;code class="lang-bash"&gt;    aws cloudwatch delete-alarms \
         --alarm-names "CapacityBlock-LowUtilization-cr-1234EXAMPLE" &lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Delete the SNS Topic and Subscription 
  &lt;ol&gt; 
   &lt;li&gt;List subscriptions to get the subscription ARN &lt;pre&gt;&lt;code class="lang-bash"&gt;    aws sns list-subscriptions-by-topic \
         --topic-arn arn:aws:sns:us-east-2:123456789012:capacity-block-alerts&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
   &lt;li&gt;Delete the subscription &lt;pre&gt;&lt;code class="lang-bash"&gt;    aws sns unsubscribe \
         --subscription-arn arn:aws:sns:us-east-2:123456789012:capacity-block-alerts:a1b2c3d4-5678-90ab-cdef-EXAMPLE&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
   &lt;li&gt;Delete the Topic &lt;pre&gt;&lt;code class="lang-bash"&gt;    aws sns delete-topic \
         --topic-arn arn:aws:sns:us-east-2:123456789012:capacity-block-alerts&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we showed you how to share Capacity Blocks for ML across your AWS Organization using AWS RAM. We covered configuring the AWS RAM integration with Organizations, creating resource shares, and accessing shared Capacity Blocks for ML from consumer accounts. Finally, we showed you how to monitor and alert on low instance utilization.&lt;/p&gt; 
&lt;p&gt;By sharing Capacity Blocks across your organization, you can reduce idle GPU capacity, eliminate scheduling bottlenecks between teams, and maximize the return on your reserved compute investment. To take this further, consider building dashboards in Amazon CloudWatch to track utilization trends across multiple Capacity Blocks.&lt;/p&gt; 
&lt;p&gt;You can get started by &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/capacity-blocks-purchase.html" target="_blank" rel="noopener noreferrer"&gt;purchasing Capacity Blocks for ML&lt;/a&gt; and sharing them across your organization today. For more details on other resources you can share with AWS RAM, see &lt;a href="https://docs.aws.amazon.com/ram/latest/userguide/shareable.html" target="_blank" rel="noopener noreferrer"&gt;Shareable AWS resources&lt;/a&gt; in the AWS RAM User Guide. If you have questions,&amp;nbsp;&lt;a href="https://aws.amazon.com/contact-us/" target="_blank" rel="noopener noreferrer"&gt;contact your AWS account team&lt;/a&gt; or leave a comment below.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Enhancing network observability with new AWS Outposts racks LAG metrics</title>
		<link>https://aws.amazon.com/blogs/compute/enhancing-network-observability-with-new-aws-outposts-racks-lag-metrics/</link>
					
		
		<dc:creator><![CDATA[Adam Duffield]]></dc:creator>
		<pubDate>Thu, 30 Apr 2026 19:14:52 +0000</pubDate>
				<category><![CDATA[AWS Outposts]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Launch]]></category>
		<guid isPermaLink="false">4f1658e82e875278e1fa128e3f99931ea274e563</guid>

					<description>When you deploy AWS Outposts racks, you can run AWS infrastructure and services in on-premises locations. Maintaining seamless connectivity, both to the AWS Region and your on-premises network, is fundamental to delivering consistent, uninterrupted service to your applications. Implementing an observability strategy that uses available network metrics is key to understanding the health of this […]</description>
										<content:encoded>&lt;p&gt;When you deploy &lt;a href="https://aws.amazon.com/outposts/rack/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts racks&lt;/a&gt;, you can run AWS infrastructure and services in on-premises locations. Maintaining seamless connectivity, both to the &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/" target="_blank" rel="noopener noreferrer"&gt;AWS Region&lt;/a&gt; and your on-premises network, is fundamental to delivering consistent, uninterrupted service to your applications. Implementing an observability strategy that uses available network metrics is key to understanding the health of this connectivity.&lt;/p&gt; 
&lt;p&gt;In &lt;a href="https://aws.amazon.com/blogs/compute/improving-network-observability-with-new-aws-outposts-racks-network-metrics/" target="_blank" rel="noopener noreferrer"&gt;August 2025&lt;/a&gt;, we launched two new &lt;a href="https://aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; metrics, &lt;code&gt;VifConnectionStatus&lt;/code&gt; and &lt;code&gt;VifBgpSessionState&lt;/code&gt;, that helped provide greater visibility into these Layer 3 networking constructs. However, insight into Layer 2 networking was still missing. AWS has released a new &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/outposts-cloudwatch-metrics.html#outposts-metrics" target="_blank" rel="noopener noreferrer"&gt;metric&lt;/a&gt;, &lt;code&gt;LagStatus&lt;/code&gt;, that provides greater visibility into the hybrid infrastructure connectivity for both &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/index.html" target="_blank" rel="noopener noreferrer"&gt;first-generation&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/index.html" target="_blank" rel="noopener noreferrer"&gt;second-generation&lt;/a&gt; Outposts racks.&lt;/p&gt; 
&lt;h2&gt;Link aggregation group overview&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/local-rack.html#link-aggregation" target="_blank" rel="noopener noreferrer"&gt;Link aggregation&lt;/a&gt; combines multiple physical Ethernet connections into one logical link, referred to as a link aggregation group (LAG). This consolidation delivers benefits such as increased aggregate bandwidth and built-in redundancy through fault-tolerant connections between network devices. AWS Outposts uses LAG connections between Outpost network devices (ONDs) and customer network devices (CNDs). The links from each Outpost network device are aggregated into an Ethernet LAG to represent a single network connection.&lt;/p&gt; 
&lt;div id="attachment_25960" style="width: 1218px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25960" loading="lazy" class="size-full wp-image-25960" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/02/compute-2553-image-1.png" alt="Figure 1: Second-Generation Outposts Rack network connections" width="1208" height="646"&gt;
 &lt;p id="caption-attachment-25960" class="wp-caption-text"&gt;Figure 1: Second-Generation Outposts Rack network connections&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Each LAG between an Outpost network device and a customer local network device is configured as an IEEE 802.1q Ethernet trunk. This enables the use of multiple VLANs for network segmentation between data paths. Each Outpost has the following VLANs to communicate with local network devices:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Service link VLAN – Enables communication between the Outpost and customer network devices to establish a service link path to the AWS Region.&lt;/li&gt; 
 &lt;li&gt;Local gateway VLAN(s) – If configured (for a single or multiple LGW routing domains), enables communication between the Outpost and the customer network devices to establish a local gateway path that connects your Outpost subnets to the local area network.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;div id="attachment_25961" style="width: 1220px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25961" loading="lazy" class="size-full wp-image-25961" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/02/compute-2553-image-2.png" alt="Figure 2: Second-Generation Outposts Rack VLAN layout" width="1210" height="579"&gt;
 &lt;p id="caption-attachment-25961" class="wp-caption-text"&gt;Figure 2: Second-Generation Outposts Rack VLAN layout&lt;/p&gt;
&lt;/div&gt; 
&lt;h2&gt;Using the LagStatus metric&lt;/h2&gt; 
&lt;p&gt;The new &lt;code&gt;LagStatus&lt;/code&gt; metric in CloudWatch provides visibility into the operational status of LAG connections between Outposts networking devices and on-premises infrastructure. The metric reports a binary status (1 when the LAG is up, 0 when the LAG is down) and includes the &lt;code&gt;OutpostId&lt;/code&gt; and &lt;code&gt;LagId&lt;/code&gt; as dimensions so that you can quickly identify non-operational resources.&lt;/p&gt; 
&lt;p&gt;You can view this metric on the CloudWatch console. In the navigation pane, choose&amp;nbsp;&lt;strong&gt;All metrics&lt;/strong&gt;, followed by&amp;nbsp;&lt;strong&gt;Outposts&lt;/strong&gt;&amp;nbsp;under the AWS namespaces section. The Outposts namespace can only be viewed by the Outposts owner account, unless CloudWatch&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Unified-Cross-Account.html" target="_blank" rel="noopener noreferrer"&gt;cross-account observability&lt;/a&gt;&amp;nbsp;is configured. As with all operational telemetry, restrict access to these metrics to authorized principals. Data points are published at 5-minute intervals, and as with all CloudWatch metrics, there might be a delay before the data appears.&lt;/p&gt; 
&lt;div id="attachment_25962" style="width: 1219px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25962" loading="lazy" class="size-full wp-image-25962" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/02/compute-2553-image-3.png" alt="Figure 3: CloudWatch Metrics view of the LagStatus metric" width="1209" height="468"&gt;
 &lt;p id="caption-attachment-25962" class="wp-caption-text"&gt;Figure 3: CloudWatch Metrics view of the LagStatus metric&lt;/p&gt;
&lt;/div&gt; 
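&lt;p&gt;If you prefer the AWS CLI to the console, a sketch like the following could list the &lt;code&gt;LagStatus&lt;/code&gt; metrics published for one Outpost, which is a quick way to discover the &lt;code&gt;LagId&lt;/code&gt; dimension values. The function name &lt;code&gt;list_lag_metrics&lt;/code&gt; and the Outpost ID in the example call are hypothetical. The namespace &lt;code&gt;AWS/Outposts&lt;/code&gt; is the standard CloudWatch namespace for Outposts, and the call must run in the Outposts owner account unless cross-account observability is configured.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;    # Hypothetical helper: list the dimension sets of every LagStatus metric for an Outpost.
    # $1 = Outpost ID
    list_lag_metrics() {
        aws cloudwatch list-metrics \
            --namespace AWS/Outposts \
            --metric-name LagStatus \
            --dimensions Name=OutpostId,Value="$1" \
            --query 'Metrics[].Dimensions' \
            --output json
    }

    # Example call with a placeholder Outpost ID:
    # list_lag_metrics op-1234567890abcdef0&lt;/code&gt;&lt;/pre&gt; 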
&lt;p&gt;While the &lt;code&gt;LagStatus&lt;/code&gt; metric alone provides insight into the Outposts network connectivity, combining it with &lt;code&gt;VifConnectionStatus&lt;/code&gt; and &lt;code&gt;VifBgpSessionState&lt;/code&gt; delivers more immediate, actionable insights that expedite troubleshooting. In addition, to improve the clarity of the existing metrics, the related &lt;code&gt;LagId&lt;/code&gt; is added as a new Outposts metric dimension. By observing the values of all three metrics, you can narrow down the potential cause of any issues. The following table gives some possible connectivity issue scenarios and how they can be identified using these metrics:&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;LagStatus&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;LGW BGP&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;ServiceLink BGP&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Potential issue&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;UP&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;UP&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;UP&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Recommended state – all components working&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;UP&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;UP&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DOWN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;ServiceLink BGP issue – configuration issue&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;UP&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DOWN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;UP&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;LGW BGP issue – configuration issue&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;UP&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DOWN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DOWN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Both BGP sessions down – configuration issue&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DOWN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DOWN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DOWN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;LAG configuration issue or physical failure&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;With these metrics, you can use &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/alarm-combining.html" target="_blank" rel="noopener noreferrer"&gt;CloudWatch Composite Alarms&lt;/a&gt; to alert operational teams when any of the components aren’t running as expected.&lt;/p&gt; 
&lt;p&gt;To create a composite alarm, alarms must first be defined for all three of the individual metrics. This can be done from the console, CLI, or AWS CloudFormation. Following the principle of least privilege, ensure that IAM permissions are restricted to the minimum actions required for CloudWatch alarm creation. For more information, see the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/permissions-reference-cw.html" target="_blank" rel="noopener noreferrer"&gt;CloudWatch documentation&lt;/a&gt;. If you prefer, you can configure these individual alarms without notification actions enabled to reduce potential notification noise. Each &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/vif-vif-groups.html" target="_blank" rel="noopener noreferrer"&gt;virtual interface (VIF)&lt;/a&gt; has its own set of metrics, so you would need to configure alarms for all VIFs used with your Outpost. The number of total VIFs will vary depending on the Outpost generation that’s deployed because of the different networking architectures.&lt;/p&gt; 
&lt;p&gt;First-generation Outposts racks use four VIFs per rack (two for Service Link, two for Local Gateway). Second-generation racks require a minimum of eight VIFs (four for Service Link, four for Local Gateway), because they support &lt;a href="https://aws.amazon.com/blogs/compute/simplify-network-segmentation-for-aws-outposts-racks-with-multiple-local-gateway-routing-domains/" target="_blank" rel="noopener noreferrer"&gt;multiple local gateway routing domains&lt;/a&gt;, each with its own VIFs.&lt;/p&gt; 
&lt;p&gt;An example alarm configuration for a single VIF, as seen in the console, is shown in Figure 4.&lt;/p&gt; 
&lt;div id="attachment_25963" style="width: 1220px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25963" loading="lazy" class="size-full wp-image-25963" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/02/compute-2553-image-4.png" alt="Figure : Individual CloudWatch alarms for VIF status" width="1210" height="395"&gt;
 &lt;p id="caption-attachment-25963" class="wp-caption-text"&gt;Figure : Individual CloudWatch alarms for VIF status&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;After these individual alarms are created, a composite alarm can be created that monitors for any of the component metrics going into an alarm status. In the following example, the AWS Command Line Interface (AWS CLI) is used to create the composite alarm called composite-alarm-lag1 and send a notification using an &lt;a href="https://aws.amazon.com/sns/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Notification Service (Amazon SNS)&lt;/a&gt; topic called outpost-network-alarms. As this topic carries infrastructure health data, it’s recommended to encrypt it using an &lt;a href="https://aws.amazon.com/kms/" target="_blank" rel="noopener noreferrer"&gt;AWS Key Management Service&lt;/a&gt; key and restrict the subscription policy to authorized principals.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws cloudwatch put-composite-alarm \
  --alarm-name "composite-alarm-lag1" \
  --alarm-rule "ALARM(VifBgpSessionState-lgw-vif-xxxxxxxxxxxx) OR ALARM(VifConnectionStatus-lgw-vif-xxxxxxxxxxxx) OR ALARM(VifBgpSessionState-sl-vif-xxxxxxxxxxxx) OR ALARM(VifConnectionStatus-sl-vif-xxxxxxxxxxxx) OR ALARM(LagStatus-op-lag-xxxxxxxxxxxx)" \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:outpost-network-alarms \
  --region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
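&lt;p&gt;Because each VIF needs its own pair of alarms, the alarm rule grows quickly on racks with many VIFs. The following is a minimal sketch of generating the rule string instead of writing it by hand. It assumes your individual alarms follow the naming pattern used in the command above (metric name, a hyphen, then the resource ID); the helper names &lt;code&gt;alarm_names_for_lag&lt;/code&gt; and &lt;code&gt;build_alarm_rule&lt;/code&gt; are hypothetical, and the sketch only builds a string without calling AWS.&lt;/p&gt; 

```python
def alarm_names_for_lag(lag_id, lgw_vif_ids, sl_vif_ids):
    """List the individual alarm names for one LAG and its VIFs.

    Assumes alarms are named "MetricName-resourceId", as in the CLI example.
    """
    names = []
    for vif_id in list(lgw_vif_ids) + list(sl_vif_ids):
        names.append(f"VifBgpSessionState-{vif_id}")
        names.append(f"VifConnectionStatus-{vif_id}")
    names.append(f"LagStatus-{lag_id}")
    return names


def build_alarm_rule(alarm_names):
    """Join alarm names into a composite rule that fires if any one is in ALARM."""
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return " OR ".join(f"ALARM({name})" for name in alarm_names)
```

&lt;p&gt;Passing the placeholder IDs from the example (one Local Gateway VIF, one ServiceLink VIF, and the LAG) reproduces the same rule string, which you can then supply to &lt;code&gt;--alarm-rule&lt;/code&gt;.&lt;/p&gt; 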
&lt;p&gt;You can use this granular monitoring to quickly identify and troubleshoot connectivity issues, particularly in scenarios where LAG status is up but VIF BGP status is down.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;This post provides details about the newly released &lt;code&gt;LagStatus&lt;/code&gt; CloudWatch metric, and how this metric can be used with existing metrics such as &lt;code&gt;VifConnectionStatus&lt;/code&gt; and &lt;code&gt;VifBgpSessionState&lt;/code&gt; to build a comprehensive network connectivity observability solution. The &lt;code&gt;LagStatus&lt;/code&gt; metric is now available in all commercial AWS Regions and the AWS GovCloud (US-East) and AWS GovCloud (US-West) Regions where Outposts racks are available, for both first-generation and second-generation racks at no additional cost.&lt;/p&gt; 
&lt;p&gt;For more information about Outposts rack networking patterns, see the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/whitepapers/latest/aws-outposts-high-availability-design/networking.html" target="_blank" rel="noopener noreferrer"&gt;Networking&lt;/a&gt;&amp;nbsp;section of the Outposts High Availability Design and Architecture Considerations whitepaper.&lt;/p&gt; 
&lt;p&gt;Reach out to your AWS account team, or fill out this&amp;nbsp;&lt;a href="https://pages.awscloud.com/GLOBAL_PM_LN_outposts-features_2020084_7010z000001Lpcl_01.LandingPage.html" target="_blank" rel="noopener noreferrer"&gt;form&lt;/a&gt;&amp;nbsp;to learn more about observability for Outposts.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Serverless ICYMI Q1 2026</title>
		<link>https://aws.amazon.com/blogs/compute/serverless-icymi-q1-2026/</link>
					
		
		<dc:creator><![CDATA[Julian Wood]]></dc:creator>
		<pubDate>Thu, 30 Apr 2026 15:58:24 +0000</pubDate>
				<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Bedrock AgentCore]]></category>
		<category><![CDATA[Amazon DynamoDB]]></category>
		<category><![CDATA[Amazon Elastic Container Service]]></category>
		<category><![CDATA[Amazon EventBridge]]></category>
		<category><![CDATA[Amazon Simple Queue Service (SQS)]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[AWS Serverless Application Model]]></category>
		<category><![CDATA[AWS Step Functions]]></category>
		<category><![CDATA[Kiro]]></category>
		<category><![CDATA[Serverless]]></category>
		<category><![CDATA[Strands Agents]]></category>
		<category><![CDATA[serverless]]></category>
		<category><![CDATA[Serverless ICYMI]]></category>
		<guid isPermaLink="false">640e5539dadf26b65d625708aaf5ea9e98adaddb</guid>

					<description>Stay current with the latest serverless innovations that can improve your applications. In this 32nd quarterly recap, discover the most impactful AWS serverless launches, features, and resources from Q1 2026 that you might have missed. In case you missed our last ICYMI, check out what happened in Q4 2025. 2026 Q1 calendar Serverless with Mama […]</description>
										<content:encoded>&lt;p&gt;Stay current with the latest serverless innovations that can improve your applications. In this 32nd quarterly recap, discover the most impactful AWS serverless launches, features, and resources from Q1 2026 that you might have missed.&lt;/p&gt; 
&lt;p&gt;In case you missed our last ICYMI, check out what happened in &lt;a href="https://aws.amazon.com/blogs/compute/serverless-icymi-q4-2025/" target="_blank" rel="noopener noreferrer"&gt;Q4 2025&lt;/a&gt;.&lt;/p&gt; 
&lt;div id="attachment_26177" style="width: 597px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/30/2026-Q1-calendar.png"&gt;&lt;img aria-describedby="caption-attachment-26177" loading="lazy" class="size-full wp-image-26177" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/30/2026-Q1-calendar.png" alt="2026 Q1 calendar" width="587" height="151"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-26177" class="wp-caption-text"&gt;2026 Q1 calendar&lt;/p&gt;
&lt;/div&gt; 
&lt;h2&gt;Serverless with Mama J&lt;/h2&gt; 
&lt;div style="text-align: center"&gt; 
 &lt;iframe loading="lazy" title="I Explained Serverless to My Mom (She Got It)" width="500" height="281" src="https://www.youtube-nocookie.com/embed/vg1Q1to4qoE?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen sandbox="allow-scripts allow-same-origin"&gt;&lt;/iframe&gt;
 &lt;br&gt; 
 &lt;i&gt;Serverless with Mama J&lt;/i&gt; 
&lt;/div&gt; 
&lt;p&gt;If you really want to know whether you understand something, try explaining it to your mom!&lt;/p&gt; 
&lt;p&gt;That’s exactly what Eric Johnson did. His mom, everyone calls her Mama J, wanted to know what serverless actually means and why it matters. So he walked her through it: what servers do, why they’re a headache to manage, and how &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; lets you skip all that by running code only when it’s needed, scaling automatically, and charging you nothing when nobody’s using it.&lt;/p&gt; 
&lt;p&gt;Watch the video on the &lt;a href="https://www.youtube.com/watch?v=vg1Q1to4qoE" target="_blank" rel="noopener noreferrer"&gt;AWS Developers YouTube channel&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Build serverless apps faster with AI&lt;/h2&gt; 
&lt;p&gt;AWS provides a growing set of AI-powered tools that bring serverless expertise directly into your coding assistants, from &lt;a href="https://github.com/awslabs/mcp" target="_blank" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt; servers and Anthropic Claude &lt;a href="https://claude.com/plugins/aws-serverless" target="_blank" rel="noopener noreferrer"&gt;plugins&lt;/a&gt; to &lt;a href="https://kiro.dev/powers/" target="_blank" rel="noopener noreferrer"&gt;Kiro Powers&lt;/a&gt;. These tools provide contextual guidance for architecture decisions, implementation patterns, and deployment automation across the full serverless development lifecycle.&lt;/p&gt; 
&lt;p&gt;For more information on the tools available, see the &lt;a href="https://serverlessland.com/explore/ai-dev-tools" target="_blank" rel="noopener noreferrer"&gt;resources page&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Serverless Patterns Collection&lt;/h2&gt; 
&lt;p&gt;The open source &lt;a href="https://serverlessland.com/patterns" target="_blank" rel="noopener noreferrer"&gt;Serverless Patterns Collection&lt;/a&gt; on &lt;a href="https://serverlessland.com/" target="_blank" rel="noopener noreferrer"&gt;Serverless Land&lt;/a&gt; now provides a direct link to download pattern .zip files. You can also clone the &lt;a href="https://github.com/aws-samples/serverless-patterns" target="_blank" rel="noopener noreferrer"&gt;whole repo&lt;/a&gt; and explore more patterns.&lt;/p&gt; 
&lt;div id="attachment_26180" style="width: 1034px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/30/Serverless-Patterns-.zip-download.png"&gt;&lt;img aria-describedby="caption-attachment-26180" loading="lazy" class="size-large wp-image-26180" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/30/Serverless-Patterns-.zip-download-1024x363.png" alt="Serverless Patterns .zip download" width="1024" height="363"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-26180" class="wp-caption-text"&gt;Serverless Patterns .zip download&lt;/p&gt;
&lt;/div&gt; 
&lt;h2&gt;AWS Lambda&lt;/h2&gt; 
&lt;p&gt;Build fault-tolerant, long-running applications with familiar programming patterns using &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;durable functions&lt;/a&gt;. You can use Lambda durable functions to write multi-step workflows in your preferred programming language, using built-in methods that automatically handle progress checkpointing and error recovery. This can simplify your architecture so that you can focus on your business logic, and it can optimize costs because you are charged only for active compute time.&lt;/p&gt; 
&lt;p&gt;You can build durable functions in Python and TypeScript and there is a &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/02/lambda-durable-execution-java-preview/" target="_blank" rel="noopener noreferrer"&gt;durable execution SDK for Java&lt;/a&gt; in preview with the &lt;a href="https://github.com/aws/aws-durable-execution-sdk-java/" target="_blank" rel="noopener noreferrer"&gt;code available on GitHub&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Eric Johnson has a &lt;a href="https://www.youtube.com/watch?v=M-R6JLS3I2k" target="_blank" rel="noopener noreferrer"&gt;new video deep dive&lt;/a&gt; showing how to upload videos and scan them with AI. Learn how to coordinate multiple AWS services like Amazon Rekognition and Amazon Transcribe, implement human-in-the-loop approval workflows, and create a live dashboard for real-time updates.&lt;/p&gt; 
&lt;p&gt;To find out how durable functions work, see the &lt;a href="https://aws.amazon.com/blogs/compute/building-fault-tolerant-long-running-application-with-aws-lambda-durable-functions/" target="_blank" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;, which also provides testing and best practices guidance. You can also watch the re:Invent Breakout Session video: &lt;a href="https://www.youtube.com/watch?v=XJ80NBOwsow" target="_blank" rel="noopener noreferrer"&gt;Deep Dive on AWS Lambda durable functions (CNS380)&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Lambda &lt;a href="https://aws.amazon.com/blogs/compute/net-10-runtime-now-available-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;now supports the .NET 10 runtime&lt;/a&gt;, including support for file-based apps. Developers can take advantage of the latest .NET 10 performance improvements, new language features, and improved startup times for Lambda functions.&lt;/p&gt; 
&lt;p&gt;You can now see &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/lambda-availability-zone-metadata/" target="_blank" rel="noopener noreferrer"&gt;Availability Zone (AZ) metadata&lt;/a&gt; in function execution environments. This allows you to determine&amp;nbsp;the AZ ID (e.g., use1-az1) of the AZ your function is running in. This helps build functions that can make AZ-aware routing decisions,&amp;nbsp;such as preferring same-AZ endpoints for downstream services to reduce cross-AZ latency.&amp;nbsp;Operators can also implement AZ-aware resilience patterns like&amp;nbsp;AZ-specific&amp;nbsp;fault injection testing.&lt;/p&gt; 
&lt;h2&gt;Payload size increase&lt;/h2&gt; 
&lt;p&gt;AWS has increased the maximum payload size from 256 KB to 1 MB for a number of services such as asynchronous Lambda invocations, &lt;a href="https://aws.amazon.com/sqs/pricing/" target="_blank" rel="noopener noreferrer"&gt;Amazon SQS&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/eventbridge" target="_blank" rel="noopener noreferrer"&gt;Amazon EventBridge&lt;/a&gt;. This gives you more room to build and maintain context-rich event-driven systems and reduce the need for complex workarounds such as data chunking or external large object storage.&lt;/p&gt; 
&lt;p&gt;This &lt;a href="https://aws.amazon.com/blogs/compute/more-room-to-build-serverless-services-now-support-payloads-up-to-1-mb/" target="_blank" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; explores a real-world example using rich event context in agentic event-driven architectures&lt;/p&gt; 
&lt;div id="attachment_26179" style="width: 1034px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/30/Payload-size-increase-workflow.png"&gt;&lt;img aria-describedby="caption-attachment-26179" loading="lazy" class="size-large wp-image-26179" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/30/Payload-size-increase-workflow-1024x475.png" alt="Payload size increase workflow" width="1024" height="475"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-26179" class="wp-caption-text"&gt;Payload size increase workflow&lt;/p&gt;
&lt;/div&gt; 
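&lt;p&gt;If you take advantage of the larger limit, it can still help to check the serialized size of an event before you publish it. The following is a minimal sketch under stated assumptions: it treats 1 MB as 1,048,576 bytes and measures only the JSON body, whereas each service may count message attributes or envelope fields differently, so check the service quotas before relying on it. The names &lt;code&gt;MAX_PAYLOAD_BYTES&lt;/code&gt;, &lt;code&gt;payload_size_bytes&lt;/code&gt;, and &lt;code&gt;fits_in_payload&lt;/code&gt; are hypothetical.&lt;/p&gt; 

```python
import json

# Assumption: 1 MB is treated as 1,048,576 bytes; confirm the exact quota
# and what counts toward it for the service you publish to.
MAX_PAYLOAD_BYTES = 1024 * 1024


def payload_size_bytes(event):
    """Return the size of the event as compact UTF-8 encoded JSON."""
    return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))


def fits_in_payload(event, limit=MAX_PAYLOAD_BYTES):
    """Return True when the serialized event is within the size limit."""
    return limit >= payload_size_bytes(event)
```

&lt;p&gt;Events that fail this check are candidates for the workarounds mentioned above, such as storing the large object externally and passing a reference.&lt;/p&gt; 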
&lt;h2&gt;Amazon Bedrock&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; expanded its model availability with a &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/02/amazon-bedrock-adds-support-six-open-weights-models/" target="_blank" rel="noopener noreferrer"&gt;new set of fully managed open-weight models&lt;/a&gt; spanning frontier reasoning and agentic coding. Other model releases include &lt;a href="https://www.aboutamazon.com/news/aws/anthropic-claude-4-opus-sonnet-amazon-bedrock" target="_blank" rel="noopener noreferrer"&gt;Anthropic Claude Opus 4.6 and Claude Sonnet 4.6&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/nvidia-nemotron-3-super-amazon-bedrock/" target="_blank" rel="noopener noreferrer"&gt;NVIDIA Nemotron 3 Super&lt;/a&gt;. You can invoke them through the unified Amazon Bedrock API without managing any underlying infrastructure, making it straightforward to experiment and swap models as your workload evolves.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/agentcore/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt; is the infrastructure layer for securely deploying and operating AI agents. It works with popular open source frameworks, including &lt;a href="https://strandsagents.com/" target="_blank" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt;, &lt;a href="https://www.langchain.com/langgraph" target="_blank" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; and &lt;a href="https://www.crewai.com/" target="_blank" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;, giving you the flexibility to build with your preferred tools without vendor lock-in.&lt;/p&gt; 
&lt;p&gt;AgentCore Gateway now includes &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-using-mcp-semantic-search.html" target="_blank" rel="noopener noreferrer"&gt;semantic tool search&lt;/a&gt;, so you can discover the right tool for a task using natural language queries instead of manually browsing a catalog. It also adds custom KMS encryption, debugging messages, and resource tagging to give you stronger governance over tool integrations.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/policy-amazon-bedrock-agentcore-generally-available/" target="_blank" rel="noopener noreferrer"&gt;Policy in Bedrock AgentCore&lt;/a&gt; allows you to define precise boundaries on agent actions and run continuous quality monitoring. This helps you maintain predictable, auditable agent behavior in production without embedding guardrail logic inside each individual agent.&lt;/p&gt; 
&lt;p&gt;AgentCore Runtime now &lt;a href="Introducing%20stateful%20MCP%20client%20capabilities%20on%20Amazon%20Bedrock%20AgentCore%20Runtime" target="_blank" rel="noopener noreferrer"&gt;supports stateful MCP server features&lt;/a&gt;, allowing agents to maintain session context across tool calls for richer, more coherent multi-step interactions.&lt;/p&gt; 
&lt;h2&gt;Strands Agents&lt;/h2&gt; 
&lt;div id="attachment_26182" style="width: 740px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/30/Strands-Agents-SDK.png"&gt;&lt;img aria-describedby="caption-attachment-26182" loading="lazy" class="size-full wp-image-26182" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/30/Strands-Agents-SDK.png" alt="Strands Agents SDK" width="730" height="217"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-26182" class="wp-caption-text"&gt;Strands Agents SDK&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;&lt;a href="https://strandsagents.com/" target="_blank" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; is an open source SDK for building and running AI agents in just a few lines of code, working with models available in Amazon Bedrock. &lt;a href="https://aws.amazon.com/blogs/opensource/introducing-strands-labs-get-hands-on-today-with-state-of-the-art-experimental-approaches-to-agentic-development/" target="_blank" rel="noopener noreferrer"&gt;Strand&lt;/a&gt;&lt;a id="_Hlt227341361" target="_blank" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;a id="_Hlt227341360" target="_blank" rel="noopener noreferrer"&gt;s Labs is a new dedicated GitHub organization for experimental agent projects, including robotics and code agents. This gives you early access to cutting-edge agentic techniques before they reach production frameworks. See the &lt;/a&gt;&lt;a href="https://aws.amazon.com/blogs/opensource/introducing-strands-labs-get-hands-on-today-with-state-of-the-art-experimental-approaches-to-agentic-development/" target="_blank" rel="noopener noreferrer"&gt;introduction blog post&lt;/a&gt; for more information.&lt;/p&gt; 
&lt;h2&gt;AWS Step Functions&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/step-functions/" target="_blank" rel="noopener noreferrer"&gt;AWS Step Functions&lt;/a&gt; introduces &lt;a href="https://aws.amazon.com/blogs/compute/testing-step-functions-workflows-a-guide-to-the-enhanced-teststate-api/" target="_blank" rel="noopener noreferrer"&gt;an enhanced TestState API&lt;/a&gt; that enables API-based testing for validating workflows before deployment. The new API supports testing individual states in isolation or complete workflows end-to-end, making it easier to verify state machine logic without incurring runtime costs.&lt;/p&gt; 
&lt;p&gt;By integrating TestState API testing into CI/CD pipelines, you can validate workflow logic before deployment, reducing the risk of production issues. Find complete code examples and a testing framework in the &lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Amazon EventBridge&lt;/h2&gt; 
&lt;p&gt;Amazon EventBridge Scheduler now provides &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/02/amazon-eventbridge-scheduler-resource-metrics/" target="_blank" rel="noopener noreferrer"&gt;resource count metrics to help you monitor quota usage&lt;/a&gt;. These new metrics make it easier to track the number of schedules and schedule groups in your account and proactively manage service quotas.&lt;/p&gt; 
&lt;h2&gt;Amazon DynamoDB&lt;/h2&gt; 
&lt;p&gt;You can replicate &lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt; table data &lt;a href="https://aws.amazon.com/blogs/database/amazon-dynamodb-global-tables-now-support-replication-across-aws-accounts/" target="_blank" rel="noopener noreferrer"&gt;across multiple AWS accounts and Regions&lt;/a&gt;. This enhances resiliency through account-level isolation and supports tailored security and data-perimeter controls. You can also align workloads by business unit or environment and simplify governance requirements.&lt;/p&gt; 
&lt;div id="attachment_26178" style="width: 1034px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/30/Amazon-DynamoDB-global-replication.png"&gt;&lt;img aria-describedby="caption-attachment-26178" loading="lazy" class="size-large wp-image-26178" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/30/Amazon-DynamoDB-global-replication-1024x582.png" alt="Amazon DynamoDB global replication" width="1024" height="582"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-26178" class="wp-caption-text"&gt;Amazon DynamoDB global replication&lt;/p&gt;
&lt;/div&gt; 
&lt;h2&gt;Amazon ECS&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/ecs/managed-instances/" target="_blank" rel="noopener noreferrer"&gt;Amazon ECS Managed Instances&lt;/a&gt; can now &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/02/ecs-mi-ec2-capacity-reservations/" target="_blank" rel="noopener noreferrer"&gt;integrate with Amazon EC2 Capacity Reservations&lt;/a&gt;. This allows you to make sure there is capacity availability for your container workloads while benefiting from the management automation of ECS Managed Instances.&lt;/p&gt; 
&lt;p&gt;ECS also now supports &lt;a href="https://aws.amazon.com/elasticloadbalancing/network-load-balancer/" target="_blank" rel="noopener noreferrer"&gt;Network Load Balancer (NLB)&lt;/a&gt; for linear and canary deployment strategies. This helps you perform gradual traffic shifting using NLBs, providing more flexibility in deployment pipelines for latency-sensitive applications.&lt;/p&gt; 
&lt;h2&gt;Serverless blog posts&lt;/h2&gt; 
&lt;h3&gt;January&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/net-10-runtime-now-available-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;.NET 10 runtime now available in AWS Lambda&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/serverless-icymi-q4-2025/" target="_blank" rel="noopener noreferrer"&gt;Serverless ICYMI Q4 2025&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/more-room-to-build-serverless-services-now-support-payloads-up-to-1-mb/" target="_blank" rel="noopener noreferrer"&gt;More room to build: serverless services now support payloads up to 1 MB&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;February&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/building-fault-tolerant-long-running-application-with-aws-lambda-durable-functions/" target="_blank" rel="noopener noreferrer"&gt;Building fault-tolerant applications with AWS Lambda durable functions&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/optimizing-compute-intensive-serverless-workloads-with-multi-threaded-rust-on-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Optimizing Compute-Intensive Serverless Workloads with Multi-threaded Rust on AWS Lambda&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;March&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/enabling-high-availability-of-amazon-ec2-instances-on-aws-outposts-servers-part-3/" target="_blank" rel="noopener noreferrer"&gt;Enabling high availability of Amazon EC2 instances on AWS Outposts servers (Part 3)&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/testing-step-functions-workflows-a-guide-to-the-enhanced-teststate-api/" target="_blank" rel="noopener noreferrer"&gt;Testing Step Functions workflows: a guide to the enhanced TestState API&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Serverless Office Hours&lt;/h2&gt; 
&lt;p&gt;Join our livestream every Tuesday at 11 AM PT for live discussions, Q&amp;amp;A sessions, and deep dives into serverless technologies. Watch episodes on-demand at &lt;a href="https://serverlessland.com/office-hours" target="_blank" rel="noopener noreferrer"&gt;serverlessland.com/office-hours&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;January&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Jan 7 – &lt;a href="https://www.youtube.com/watch?v=OOyPRuIuA5w" target="_blank" rel="noopener noreferrer"&gt;New: Amazon API Gateway response streaming&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Jan 14 – &lt;a href="https://www.youtube.com/watch?v=uwtOT_7I-fc" target="_blank" rel="noopener noreferrer"&gt;What’s New: AWS Lambda event source mappings&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Jan 21 – &lt;a href="https://www.youtube.com/watch?v=wicD8G0rn1Y" target="_blank" rel="noopener noreferrer"&gt;New: AWS Lambda tenant isolation&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Jan 28 – &lt;a href="https://www.youtube.com/watch?v=2mdvA3mrksw" target="_blank" rel="noopener noreferrer"&gt;AWS Step Functions Local Testing&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;February&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Feb 4 – &lt;a href="https://www.youtube.com/watch?v=j2gGDtZInBk" target="_blank" rel="noopener noreferrer"&gt;App Modernization with CDK Blueprints&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Feb 11 – &lt;a href="https://www.youtube.com/watch?v=BwhD0EoRE04" target="_blank" rel="noopener noreferrer"&gt;Observability for Distributed Systems&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Feb 18 – &lt;a href="https://www.youtube.com/watch?v=my2bQtHBUeY" target="_blank" rel="noopener noreferrer"&gt;AI &amp;amp; Java&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Feb 25 – &lt;a href="https://www.youtube.com/watch?v=l8VIMB1g9Zo" target="_blank" rel="noopener noreferrer"&gt;AI for content creators&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;March&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Mar 11 – &lt;a href="https://www.youtube.com/watch?v=dw2iHHau7Jw" target="_blank" rel="noopener noreferrer"&gt;Serverless resilience: A practitioner’s guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Mar 18 – &lt;a href="https://www.youtube.com/watch?v=1m8BwxmT7Zc" target="_blank" rel="noopener noreferrer"&gt;Analytics for Modern Data Lakes &amp;amp; AI&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Mar 24 – &lt;a href="https://www.youtube.com/watch?v=foYaB6_hd8w" target="_blank" rel="noopener noreferrer"&gt;AWS MCP server&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Still looking for more?&lt;/h2&gt; 
&lt;p&gt;The&amp;nbsp;&lt;a href="http://aws.amazon.com/serverless" target="_blank" rel="noopener noreferrer"&gt;Serverless landing page&lt;/a&gt;&amp;nbsp;has overall information about building serverless applications. The&amp;nbsp;&lt;a href="https://aws.amazon.com/lambda/resources/?aws-lambda-resources-blog.sort-by=item.additionalFields.createdDate&amp;amp;aws-lambda-resources-blog.sort-order=desc" target="_blank" rel="noopener noreferrer"&gt;Lambda resources page&lt;/a&gt;&amp;nbsp;contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.&lt;/p&gt; 
&lt;p&gt;You can also&amp;nbsp;follow the Developer Advocacy team to see the latest news, follow conversations, and interact with the team.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Julian Wood:&amp;nbsp;&lt;a href="https://twitter.com/julian_wood" target="_blank" rel="noopener noreferrer"&gt;@julian_wood&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/julianrwood/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/julianrwood/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Eric Johnson:&amp;nbsp;&lt;a href="https://twitter.com/edjgeek" target="_blank" rel="noopener noreferrer"&gt;@edjgeek&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/singledigit/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/singledigit/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Erik Hanchet: &lt;a href="https://x.com/ErikCH" target="_blank" rel="noopener noreferrer"&gt;@ErikCH&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/erikhanchett/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/erikhanchett/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Salih Gueler: &lt;a href="https://x.com/salihgueler" target="_blank" rel="noopener noreferrer"&gt;@salihgueler&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/salihgueler/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/salihgueler/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Marcia Villalba:&amp;nbsp;&lt;a href="https://twitter.com/mavi888uy/" target="_blank" rel="noopener noreferrer"&gt;@mavi888uy&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/marciavillalba" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/marciavillalba&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;And finally, visit &lt;a href="http://serverlessland.com/" target="_blank" rel="noopener noreferrer"&gt;Serverless Land&lt;/a&gt; for your serverless needs.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>AWS Outposts monitoring and reporting: A comprehensive Amazon EventBridge solution</title>
		<link>https://aws.amazon.com/blogs/compute/aws-outposts-monitoring-and-reporting-a-comprehensive-amazon-eventbridge-solution/</link>
					
		
		<dc:creator><![CDATA[Matt Price]]></dc:creator>
		<pubDate>Tue, 14 Apr 2026 16:18:12 +0000</pubDate>
				<category><![CDATA[Amazon DynamoDB]]></category>
		<category><![CDATA[Amazon Elastic Block Store (Amazon EBS)]]></category>
		<category><![CDATA[Amazon EventBridge]]></category>
		<category><![CDATA[Amazon RDS]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[AWS Organizations]]></category>
		<category><![CDATA[AWS Outposts]]></category>
		<category><![CDATA[AWS Outposts rack]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Resource Access Manager (RAM)]]></category>
		<guid isPermaLink="false">60eb57ed8879462a862a621ab1a93ec42341ab0d</guid>

					<description>Organizations using AWS Outposts racks commonly manage capacity from a single AWS account and share resources through AWS Resource Access Manager (AWS RAM) with other AWS accounts (consumer accounts) within AWS Organizations. In this post, we demonstrate one approach to create a multi-account serverless solution to surface costs in shared AWS Outposts environments using Amazon […]</description>
										<content:encoded>&lt;p&gt;Organizations using &lt;a href="https://aws.amazon.com/outposts/rack/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts racks&lt;/a&gt; commonly manage capacity from a single AWS account and share resources through &lt;a href="https://aws.amazon.com/ram/" target="_blank" rel="noopener noreferrer"&gt;AWS Resource Access Manager&lt;/a&gt; (AWS RAM) with other AWS accounts (consumer accounts) within &lt;a href="https://aws.amazon.com/organizations/" target="_blank" rel="noopener noreferrer"&gt;AWS Organizations&lt;/a&gt;. In this post, we demonstrate one approach to create a multi-account serverless solution to surface costs in shared AWS Outposts environments using &lt;a href="https://aws.amazon.com/eventbridge/" target="_blank" rel="noopener noreferrer"&gt;Amazon EventBridge&lt;/a&gt;, &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt;. This solution reports on instance runtime and allocated storage for &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt;, &lt;a href="https://aws.amazon.com/rds" target="_blank" rel="noopener noreferrer"&gt;Amazon Relational Database Service (Amazon RDS)&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/ebs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Block Store (Amazon EBS)&lt;/a&gt; services running on Outposts racks. In turn, teams can track the cost of infrastructure associated with their workloads across AWS accounts. This solution is a framework that can be customized to meet your organization’s specific business objectives.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;The following &lt;a href="https://developer.hashicorp.com/terraform" target="_blank" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;-based reference architecture represents the solution, which spans EventBridge, DynamoDB, and Lambda across a multi-account environment. EventBridge tracks relevant launch events and invokes Lambda functions, which log the events in DynamoDB tables (&lt;a href="https://github.com/aws-samples/sample-outposts-monitoring-and-reports" target="_blank" rel="noopener noreferrer"&gt;see sample code&lt;/a&gt;). You can then report on the captured event data through the &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/index.html" target="_blank" rel="noopener noreferrer"&gt;AWS SDK for Python (Boto3)&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-1.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25970" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-1.png" alt="AWS architecture diagram showing data collection and workload account integration with EventBridge, CloudTrail, and Outposts" width="1280" height="720"&gt;&lt;/a&gt;&lt;br&gt; &lt;em&gt;Figure 1: Reference architecture for reporting solution on AWS Outposts&lt;/em&gt;&lt;/p&gt; 
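&lt;p&gt;To make the event flow more concrete, the following is a minimal Python sketch of a Lambda handler that records an EC2 instance state-change event as a DynamoDB item. It is not the code from the sample repository. The attribute names (&lt;code&gt;resource_id&lt;/code&gt;, &lt;code&gt;event_time&lt;/code&gt;), the &lt;code&gt;TABLE_NAME&lt;/code&gt; environment variable, and the sample event values are assumptions made for illustration.&lt;/p&gt;

```python
import os


def build_item(event: dict) -> dict:
    """Flatten an EC2 state-change event into a DynamoDB-style item.

    The attribute names below are hypothetical, not the sample's schema.
    """
    detail = event["detail"]
    return {
        "resource_id": detail["instance-id"],  # assumed partition key
        "event_time": event["time"],           # assumed sort key
        "account_id": event["account"],
        "region": event["region"],
        "state": detail["state"],
    }


def handler(event, context, table=None):
    """Lambda entry point: store one item for each incoming event."""
    if table is None:
        # Imported lazily so build_item can be exercised without AWS access.
        import boto3
        table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    item = build_item(event)
    table.put_item(Item=item)
    return item


# Illustrative event shaped like an EC2 Instance State-change Notification.
sample_event = {
    "detail-type": "EC2 Instance State-change Notification",
    "account": "111122223333",
    "region": "us-west-2",
    "time": "2025-09-01T12:00:00Z",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}
```

&lt;p&gt;Keeping the mapping in a separate function from the DynamoDB write makes it possible to test the item shape locally before deploying anything.&lt;/p&gt;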
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;The following prerequisites are necessary to implement this solution:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;At least two active AWS accounts in the same &lt;a href="https://aws.amazon.com/organizations/" target="_blank" rel="noopener noreferrer"&gt;AWS Organization&lt;/a&gt; as the Outposts owner account. 
  &lt;ul&gt; 
   &lt;li&gt;One AWS account, which is the data collection account to store the event data (this doesn’t have to be the account that owns the Outposts).&lt;/li&gt; 
   &lt;li&gt;Workload accounts where resources are deployed on Outposts.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI)&lt;/a&gt; installed and configured on an administrative instance. For more information, see &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html" target="_blank" rel="noopener noreferrer"&gt;Installing, updating, and uninstalling the AWS CLI&lt;/a&gt; in the AWS CLI documentation.&lt;/li&gt; 
 &lt;li&gt;Terraform installed on the same administrative instance. For more information, see the &lt;a href="https://learn.hashicorp.com/tutorials/terraform/install-cli" target="_blank" rel="noopener noreferrer"&gt;Terraform documentation&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;The &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; permissions necessary to create the AWS resources using Terraform in all accounts.&lt;/li&gt; 
 &lt;li&gt;Prior experience with Terraform deployments on the AWS Cloud. To increase your familiarity, you can explore &lt;a href="https://learn.hashicorp.com/collections/terraform/aws-get-started" target="_blank" rel="noopener noreferrer"&gt;Get Started – AWS&lt;/a&gt; on the HashiCorp website.&lt;/li&gt; 
 &lt;li&gt;Access to clone the &lt;a href="https://github.com/aws-samples/sample-outposts-monitoring-and-reports" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts Monitoring and Reporting&lt;/a&gt; git repository.&lt;/li&gt; 
 &lt;li&gt;AWS SDK for Python (Boto3) installed and configured on a local machine.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Walkthrough&lt;/h2&gt; 
&lt;p&gt;The following sections walk you through how to deploy this solution.&lt;/p&gt; 
&lt;h3&gt;Deploying in data collection account&lt;/h3&gt; 
&lt;p&gt;Step 1: Create a bucket in-Region to hold the Terraform state file in the data collection account.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws s3 mb s3://state-bucket-name&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 2: Clone the repository. On your local machine, clone the repository that contains the sample by running the following command:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;git clone https://github.com/aws-samples/sample-outposts-monitoring-and-reports.git&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;Navigate to the data collection directory of the cloned repository by running the following command:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;cd sample-outposts-monitoring-and-reports/data_collection&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;Step 3: Edit the providers.tf to configure the AWS provider.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;provider "aws" {
&amp;nbsp;&amp;nbsp;region = ""
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 4: Edit the backend.tf to provide the Terraform state bucket and Outposts anchored &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/" target="_blank" rel="noopener noreferrer"&gt;AWS Region&lt;/a&gt;.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;terraform {
&amp;nbsp;&amp;nbsp;backend "s3" {
&amp;nbsp;&amp;nbsp; &amp;nbsp;bucket = ""
&amp;nbsp;&amp;nbsp; &amp;nbsp;key &amp;nbsp; &amp;nbsp;= "terraform.tfstate"
&amp;nbsp;&amp;nbsp; &amp;nbsp;region = ""
&amp;nbsp;&amp;nbsp;}
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 5: Modify the variables.tf. From the root directory of the cloned repository, modify the variables.tf file with the target Region and workload accounts as shown in the following example. The target Region is the collection destination.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;variable "aws_region" {
&amp;nbsp;&amp;nbsp;description = "AWS region for resources"
&amp;nbsp;&amp;nbsp;type &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;= string
&amp;nbsp;&amp;nbsp;default &amp;nbsp; &amp;nbsp; = ""
}

variable "allowed_account_id" {
&amp;nbsp;&amp;nbsp;description = "AWS account ID allowed to put events to the event bus"
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Initialize the configuration directory of the data collection account to download and install the providers defined in the configuration by running the following command:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform init&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;All resources are deployed with minimal permissions to serve as an example. We recommend reviewing all configurations to make sure that they meet your organizational security policies.&lt;/p&gt; 
&lt;p&gt;Step 6: Deploy infrastructure in the data collection account. Run terraform plan on the configuration and review which resources are created:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform plan&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;When you have reviewed the plan, run the following command and enter “yes” to accept the changes and deploy:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform apply&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Deployment should take less than 5 minutes. If you receive any errors, review the previously mentioned steps to ensure that you followed them in their entirety. If the errors persist, reach out to AWS Support for additional guidance.&lt;/p&gt; 
&lt;h3&gt;Deploying in workload account&lt;/h3&gt; 
&lt;p&gt;The data collection account receives events through EventBridge from each workload account, then analyzes and stores the AWS Outposts resource data.&lt;/p&gt; 
&lt;p&gt;Step 1: Navigate to the workload account directory by running the following command:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;cd ../workload_account&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 2: Edit variables.tf to set up the Region and event bus &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Resource Name (ARN)&lt;/a&gt;.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;variable "aws_region" {
&amp;nbsp;&amp;nbsp;description = "AWS region for resources"
&amp;nbsp;&amp;nbsp;type &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;= string
&amp;nbsp;&amp;nbsp;default &amp;nbsp; &amp;nbsp; = ""
}

variable "event_bus_arn" {
&amp;nbsp;&amp;nbsp;description = "target event bus arn"
&amp;nbsp;&amp;nbsp;type &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;= string
&amp;nbsp;&amp;nbsp;default &amp;nbsp; &amp;nbsp; = ""
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Edit the code to update the event bus name.&lt;/p&gt; 
&lt;p&gt;Step 3: Run the following command to create the backend.tf and create the Terraform state bucket for each workload account.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;./init-backend.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This operation is idempotent: it creates the backend.tf file from the template and, if it doesn’t already exist, a bucket with a fixed name that includes the account ID.&lt;/p&gt; 
&lt;p&gt;Step 4: Initialize the configuration directory of the workload account to download and install the providers defined in the configuration by running the following command:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform init&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 5: Deploy the infrastructure in the workload account. Run terraform plan on the configuration and review which resources are created:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform plan&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;After you have reviewed the plan, run the following command and enter “yes” to accept the changes and deploy:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform apply&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Deployment should take less than 5 minutes. If you receive any errors, follow the troubleshooting steps in the previous section.&lt;/p&gt; 
&lt;p&gt;At this point, events for any Amazon EC2 or Amazon RDS instances and Amazon EBS volumes in this workload account are logged to the DynamoDB tables in the data collection account. Repeat Steps 3–5 with the appropriate account credentials for each workload account that runs resources on AWS Outposts. If you’re deploying at scale and using &lt;a href="https://docs.aws.amazon.com/controltower/latest/userguide/what-is-control-tower.html" target="_blank" rel="noopener noreferrer"&gt;AWS Control Tower&lt;/a&gt;, consider using &lt;a href="https://docs.aws.amazon.com/controltower/latest/userguide/aft-overview.html" target="_blank" rel="noopener noreferrer"&gt;AWS Control Tower Account Factory for Terraform (AFT)&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Running monthly reports&lt;/h2&gt; 
&lt;p&gt;With this solution in place, reports can be generated on demand. You can customize these reports by modifying the Python example scripts to support your needs. Reports can be created from a local machine with credentials that have access to the DynamoDB tables in the data collection account. The examples were created from the source directory of the data collection account in the Git repository.&lt;/p&gt; 
&lt;p&gt;Run the following command to view the report for Amazon RDS usage in September 2025:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;./rds_runtime_calculator.py --year 2025 --month 9 --output rds_report.csv&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-2.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25971" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-2.png" alt="Spreadsheet showing RDS database instances with configuration details, storage allocation, and operational status in us-west-2 region" width="1519" height="155"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 2: Example of RDS runtime report&amp;nbsp;&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;Run the following command to view the report for Amazon EBS usage in September 2025:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;./ebs_volume_reporter.py --year 2025 --month 9 --output ebs_report.csv&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-4.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25973" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-4.png" alt="EBS volume tracking table showing volume configurations, lifecycle hours, and active/deleted status in us-west-2" width="1431" height="95"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 3: Example of EBS usage report&amp;nbsp;&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;Run the following command to view the report for Amazon EC2 usage in September 2025:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;./ec2_runtime_calculator.py --month 9 --year 2025 --output ec2_report.csv&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-6.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25975" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-6.png" alt="EC2 instance tracking table showing c5.large instances with runtime hours and running/stopped status on AWS Outposts" width="1431" height="139"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 4: Example of EC2 runtime report&amp;nbsp;&lt;/em&gt;&lt;/p&gt; 
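&lt;p&gt;The runtime figures in these reports come down to summing the time a resource spent running within the requested month. The following sketch shows one way that calculation could work for a list of state-change events. It is not the logic of the repository scripts. The function names and the event tuple format are assumptions, and an instance that is still running is assumed to run until the end of the month.&lt;/p&gt;

```python
from datetime import datetime, timezone


def month_bounds(year: int, month: int) -> tuple[datetime, datetime]:
    """Return the UTC start of the month and the start of the next month."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return start, end


def overlap_seconds(begin, end, window_start, window_end) -> float:
    """Seconds of the interval [begin, end] that fall inside the window."""
    clipped_begin = max(begin, window_start)
    clipped_end = min(end, window_end)
    return max((clipped_end - clipped_begin).total_seconds(), 0.0)


def runtime_hours(events, year: int, month: int) -> float:
    """Sum running time within a month from (ISO timestamp, state) pairs."""
    window_start, window_end = month_bounds(year, month)
    total_seconds = 0.0
    running_since = None
    for stamp, state in sorted(events):
        moment = datetime.fromisoformat(stamp)
        if state == "running" and running_since is None:
            running_since = moment
        elif state != "running" and running_since is not None:
            total_seconds += overlap_seconds(
                running_since, moment, window_start, window_end
            )
            running_since = None
    if running_since is not None:
        # Assumption: an instance with no stop event runs to the month end.
        total_seconds += overlap_seconds(
            running_since, window_end, window_start, window_end
        )
    return total_seconds / 3600


# Illustrative events for one instance (timestamps are made up).
events = [
    ("2025-08-31T22:00:00+00:00", "running"),
    ("2025-09-01T06:00:00+00:00", "stopped"),
    ("2025-09-10T00:00:00+00:00", "running"),
    ("2025-09-10T12:00:00+00:00", "stopped"),
]
print(runtime_hours(events, 2025, 9))
```

&lt;p&gt;Clipping each running interval to the month boundaries means an instance started in August contributes only its September hours to the September report.&lt;/p&gt;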
&lt;h2&gt;Cleaning up&lt;/h2&gt; 
&lt;p&gt;Complete the following steps to clean up the resources that were deployed by this solution. For each workload account, complete the following:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;cd sample-outposts-monitoring-and-reports/workload_account&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform destroy &lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Enter “yes” to proceed. You can then manually empty and remove the Terraform state S3 bucket for that account.&lt;/p&gt; 
&lt;p&gt;For the data collection account, complete the following:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-js"&gt;cd ../data_collection&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-js"&gt;terraform destroy&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Enter “yes” to proceed. You can then manually empty and remove the Terraform state S3 bucket for that account.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Customers with shared multi-account Outposts deployments can use this solution to create account-level reporting for Outposts resources. It combines real-time event capture and processing, state analysis and categorization, historical usage metrics, and a serverless architecture. Teams can use this to visualize and report on the costs of running their workloads on Outposts. The event-driven design supports accurate tracking while maintaining low operational overhead. The solution scales across multiple Outposts and accounts, providing a unified view of hybrid infrastructure. You can extend the functionality described here to meet your business objectives.&lt;/p&gt; 
&lt;p&gt;Deploy this solution today using the &lt;a href="https://github.com/aws-samples/sample-outposts-monitoring-and-reports" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; to gain financial insights to share with the tenants of your Outposts workload accounts. Reach out to your AWS account team, or fill out &lt;a href="https://pages.awscloud.com/GLOBAL_PM_LN_outposts-features_2020084_7010z000001Lpcl_01.LandingPage.html" target="_blank" rel="noopener noreferrer"&gt;this form&lt;/a&gt; to learn more about Outposts.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Building Memory-Intensive Apps with AWS Lambda Managed Instances</title>
		<link>https://aws.amazon.com/blogs/compute/building-memory-intensive-apps-with-aws-lambda-managed-instances/</link>
					
		
		<dc:creator><![CDATA[Guy Haddad]]></dc:creator>
		<pubDate>Fri, 10 Apr 2026 19:54:44 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Simple Storage Service (S3)]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Customer Solutions]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[Amazon S3]]></category>
		<category><![CDATA[AWS Compute]]></category>
		<guid isPermaLink="false">c4d2a0fd8a069c4ff4c99146159ea8e803cf7d0e</guid>

					<description>Building memory-intensive applications with AWS Lambda just got easier. AWS Lambda Managed Instances gives you up to 32 GB of memory—3x more than standard AWS Lambda—while maintaining the serverless experience you know. Modern applications increasingly require substantial memory resources to process large datasets, perform complex analytics, and deliver real-time insights for use cases such as […]</description>
										<content:encoded>&lt;p&gt;Building memory-intensive applications with &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; just got easier. &lt;a href="https://aws.amazon.com/lambda/lambda-managed-instances/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda Managed Instances&lt;/a&gt; gives you up to 32 GB of memory, roughly 3x the 10 GB limit of standard AWS Lambda, while maintaining the serverless experience you know. Modern applications increasingly require substantial memory resources to process large datasets, perform complex analytics, and deliver real-time insights for use cases such as in-memory analytics, machine learning (ML) model inference, and real-time semantic search. AWS Lambda Managed Instances combines a familiar serverless programming model with the flexibility to choose the underlying &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2&lt;/a&gt; instance types, giving developers access to large memory configurations.&lt;/p&gt; 
&lt;p&gt;In this post, you will see how AWS Lambda Managed Instances enables memory-intensive workloads that were previously challenging to run in serverless environments, using an AI-powered customer analytics application as a practical example. You’ll see cost savings of up to 33% compared to standard Lambda for predictable workloads, while eliminating the operational overhead of managing EC2 instances.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Understanding AWS Lambda Managed Instances&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;AWS Lambda Managed Instances runs your AWS Lambda functions on the Amazon EC2 instance types of your choice in your account, including &lt;a href="https://aws.amazon.com/ec2/graviton/" target="_blank" rel="noopener noreferrer"&gt;Graviton4&lt;/a&gt; and memory-optimized instance types. AWS handles underlying infrastructure lifecycle including provisioning, scaling, patching, and routing, while you benefit from Amazon EC2 pricing advantages like &lt;a href="https://aws.amazon.com/savingsplans/" target="_blank" rel="noopener noreferrer"&gt;Savings Plans&lt;/a&gt; and &lt;a href="https://aws.amazon.com/aws-cost-management/aws-cost-optimization/reserved-instances/" target="_blank" rel="noopener noreferrer"&gt;Reserved Instances&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Key benefits include:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Flexible instance selection:&lt;/strong&gt; Choose from compute-optimized (C), general-purpose (M), and memory-optimized (R) instance families&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Configurable memory-CPU ratios:&lt;/strong&gt; Optimize resource allocation for your workload&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Multi-concurrent invocations:&lt;/strong&gt; One execution environment handles multiple invocations simultaneously, improving utilization for I/O-heavy applications&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Dynamic scaling:&lt;/strong&gt; Instances scale based on CPU utilization without cold starts&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;AWS Lambda Managed Instances is best suited for high-volume, predictable workloads that benefit from sustained compute capacity and larger memory configurations.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Memory-Intensive Workloads Work Best with AWS Lambda Managed Instances&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;This post focuses on one of AWS Lambda Managed Instances’ most powerful capabilities: running memory-intensive workloads that exceed standard AWS Lambda’s 10 GB memory limit and 250 MB .zip deployment package limit. AWS Lambda Managed Instances helps with the following use cases:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;In-Memory Analytics&lt;/strong&gt; — Load gigabytes of structured data into memory at initialization and serve sub-millisecond analytical queries across thousands of invocations.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ML Model Inference&lt;/strong&gt; — Keep large model weights resident in memory across invocations for consistent, low-latency inference without a dedicated endpoint.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Real-Time Semantic Search&lt;/strong&gt; — Build vector similarity search over large embedding indexes held entirely in memory, enabling natural language queries over millions of records without an external vector database.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Graph Processing&lt;/strong&gt; — Hold large graph structures in memory for traversal algorithms that require the full graph to be accessible at once.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Scientific &amp;amp; Numerical Computing&lt;/strong&gt; — Run simulations, Monte Carlo methods, and large matrix operations that require substantial working memory and benefit from memory-optimized Amazon EC2 instance families.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Large-Scale Report Generation&lt;/strong&gt; — Aggregate and transform multi-gigabyte datasets in memory to generate complex reports or dashboards on demand, without staging data through intermediate storage.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;&lt;strong&gt;Use Case: AI-Powered Customer Analytics with AWS Lambda Managed Instances&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;To demonstrate the power of AWS Lambda Managed Instances for memory-intensive applications, we built an AI-Powered Customer Analytics application that combines in-memory data processing with ML-based semantic search. The application loads 1 million customer behavioral records (sessions, purchases, browsing patterns) from a Parquet file in Amazon S3 into an in-memory pandas DataFrame and an embeddings cache consuming 200 MB, then responds to three types of analytics queries:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Customer Analysis&lt;/strong&gt; — Deep-dive into individual customer behavior: engagement scores, conversion rates, purchase patterns, and AI-generated customer segments&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Semantic Search&lt;/strong&gt; — Natural language queries powered by FastEmbed (sentence-transformers/all-MiniLM-L6-v2) that find similar customers using vector similarity&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cohort Analysis&lt;/strong&gt; — Real-time segmentation by device, country, age group with aggregated metrics&lt;/li&gt; 
&lt;/ol&gt; 
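&lt;p&gt;As a rough illustration of the vector similarity behind the semantic search query type, the following sketch ranks customers by cosine similarity against an in-memory index. The application itself uses FastEmbed to produce the embeddings. Here the two-dimensional vectors, the customer IDs, and the function names are made-up stand-ins so that the example runs without any model.&lt;/p&gt;

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k vectors most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda pair: cosine(query, pair[1]),
        reverse=True,
    )
    return [customer_id for customer_id, _ in ranked[:k]]


# Toy embeddings cache held in memory (hypothetical values).
embeddings = {
    "cust-1": [1.0, 0.0],
    "cust-2": [0.0, 1.0],
    "cust-3": [0.9, 0.1],
}

print(top_k([1.0, 0.0], embeddings))
```

&lt;p&gt;A brute-force scan like this touches every vector on each query, which is only practical when the whole index is resident in memory, as it is in this application.&lt;/p&gt;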
&lt;h3&gt;&lt;strong&gt;Architecture Overview&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Our AI-powered customer analytics application demonstrates this in practice: 1 million records in memory (200MB), a compact sentence transformer model for semantic search, sub-second query performance, and zero infrastructure to manage. The solution uses a simple, serverless architecture:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Customer transaction data (Parquet format) is stored in Amazon S3&lt;/li&gt; 
 &lt;li&gt;Amazon Cognito User Pool authenticates users and issues JWT tokens for API access&lt;/li&gt; 
 &lt;li&gt;Amazon API Gateway routes requests with Cognito authorizer validation, rate limiting (5 requests/second, burst 10), X-Ray tracing, and access logging&lt;/li&gt; 
 &lt;li&gt;An AWS Lambda function running on AWS Lambda Managed Instances loads the entire dataset (200 MB) and the all-MiniLM-L6-v2 model (900 MB) into memory during initialization while also generating the embeddings cache in multiple threads. This step can consume about 14 GB of the allocated memory, exceeding standard AWS Lambda’s 10 GB limit&lt;/li&gt; 
 &lt;li&gt;Analytics queries execute against the in-memory data using the model&lt;/li&gt; 
 &lt;li&gt;Results are returned in milliseconds for interactive analysis&lt;/li&gt; 
&lt;/ul&gt; 
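&lt;p&gt;The load-once, query-many pattern in this architecture can be sketched as follows. This is not the application’s code: the tiny in-line dataset stands in for the Parquet file read from Amazon S3, and the function names, field names, and &lt;code&gt;LOAD_COUNT&lt;/code&gt; counter are assumptions added to show that the load runs once while later invocations reuse the data.&lt;/p&gt;

```python
LOAD_COUNT = 0  # illustrative counter showing how often the load runs


def load_dataset() -> list[dict]:
    """Stand-in for reading the Parquet file from Amazon S3 at init time."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return [
        {"customer": "cust-1", "device": "mobile", "purchases": 2},
        {"customer": "cust-2", "device": "desktop", "purchases": 1},
        {"customer": "cust-3", "device": "mobile", "purchases": 0},
    ]


# Module scope runs once per execution environment, during initialization.
RECORDS = load_dataset()


def handler(event: dict, context=None) -> dict:
    """Answer a cohort query from memory without reloading the data."""
    device = event["device"]
    matching = [row for row in RECORDS if row["device"] == device]
    return {
        "device": device,
        "customers": len(matching),
        "purchases": sum(row["purchases"] for row in matching),
    }


print(handler({"device": "mobile"}))
print(handler({"device": "desktop"}))
```

&lt;p&gt;Because the data lives at module scope, the cost of loading it is paid during initialization rather than on each request, which is what keeps per-query latency low.&lt;/p&gt;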
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-1.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26050" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-1.png" alt="Architecture diagram" width="1566" height="718"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Deploy the Application&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;The following steps walk you through deploying the application to AWS using the AWS Serverless Application Model (AWS SAM). The deployment process packages your Lambda function code, uploads artifacts to Amazon S3, and provisions all required AWS resources, including Lambda functions, IAM roles, and any configured VPC networking, through AWS CloudFormation.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Make sure you have the following tools installed locally:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS CLI&lt;/a&gt; configured with credentials&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/serverless/sam/" target="_blank" rel="noopener noreferrer"&gt;SAM CLI&lt;/a&gt; installed&lt;/li&gt; 
 &lt;li&gt;Python 3.13+ installed locally&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://www.docker.com/" target="_blank" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; or &lt;a href="https://runfinch.com/" target="_blank" rel="noopener noreferrer"&gt;Finch&lt;/a&gt; (required for container builds)&lt;/li&gt; 
 &lt;li&gt;AWS account with appropriate permissions&lt;/li&gt; 
 &lt;li&gt;A VPC with at least 2 subnets (across different Availability Zones) and a security group — required for the Lambda Managed Instances capacity provider&lt;/li&gt; 
 &lt;li&gt;Supported Regions: Check &lt;a href="https://builder.aws.com/capabilities/" target="_blank" rel="noopener noreferrer"&gt;AWS Capabilities by Region&lt;/a&gt; to confirm that Lambda Managed Instances is available in your Region&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;&lt;strong&gt;Getting Started&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;The complete source code for this application is available in our GitHub repository. To deploy it yourself, follow the steps below and refer to the full &lt;a href="https://github.com/aws-samples/sample-lambda-managed-instances-analytics"&gt;deployment instructions&lt;/a&gt; hosted on GitHub.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;1. Clone the repository&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;git clone &lt;a href="https://github.com/aws-samples/sample-lambda-managed-instances-analytics.git"&gt;https://github.com/aws-samples/sample-lambda-managed-instances-analytics.git&lt;/a&gt;&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;2. Navigate to the project folder&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;cd sample-lambda-managed-instances-analytics&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;chmod +x setup-data.sh deploy-lambda.sh&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;3. Generate sample data and upload to S3&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;./setup-data.sh&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;This script will create an S3 bucket (if needed), generate 1M rows of sample data, and upload the data to S3.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;4. Build and deploy the Lambda function&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;./deploy-lambda.sh&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;This script builds the container image with FastEmbed, pushes it to Amazon ECR, and deploys the Lambda function along with the capacity provider, an Amazon API Gateway API, and an Amazon Cognito user pool. After deployment, it automatically generates the UI authentication configuration and prompts you to create a test user.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-2.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26051" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-2.png" alt="SAM template" width="484" height="221"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-3.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26052" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-3.png" alt="Capacity provider configuration" width="1071" height="430"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Run the Application&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;1. Start the UI&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The application includes a simple HTML-based UI through which you can test the AWS Lambda function using Amazon API Gateway:&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;cd ui &amp;amp;&amp;amp; python3 -m http.server 8000&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;2. Open your browser at &lt;a href="http://localhost:8000" target="_blank" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt; and click ‘Sign In’ to authenticate through Amazon Cognito with the username and password that you created during deployment.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-4.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26053" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-4.png" alt="Starting the UI" width="2232" height="256"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;3. Enter your API endpoint URL, test the connection, and then click System Info.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-5-2.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26054" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-5-2.png" alt="Testing the connection" width="2230" height="1206"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Test the Application&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;a. Customer Analysis&lt;/strong&gt;: Enter one or more user IDs to view customer behavior, including engagement scores, conversion rates, purchase patterns, and AI-generated customer segments.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-6.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26055" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-6.png" alt="Running customer analysis" width="1240" height="798"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;b. Semantic Search&lt;/strong&gt;: Enter natural language queries like “list high value customers from USA” in Semantic Search and verify the results. Responses are fast because the analytics data and FastEmbed models are loaded into memory during the init phase.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-7-1.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26056" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-7-1.png" alt="Running semantic search" width="1240" height="798"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;c. Cohort Analysis&lt;/strong&gt;: Enter the query data to get real-time segmentation by device, country, and age group, with aggregated metrics.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-8-1.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26057" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-8-1.png" alt="Running cohort analysis" width="1227" height="833"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;AWS Lambda Managed Instances automatically publishes metrics to Amazon CloudWatch, giving you visibility into function performance and capacity utilization. Monitor &lt;strong&gt;InitDuration&lt;/strong&gt; to track dataset and model load time at startup, &lt;strong&gt;MaxMemoryUsed&lt;/strong&gt; to confirm your data fits within configured memory, and &lt;strong&gt;ProvisionedConcurrencySpilloverInvocations&lt;/strong&gt; to detect when AWS Lambda Managed Instances capacity is exhausted.&lt;/p&gt; 
&lt;p&gt;Enable &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights.html" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS Lambda Insights&lt;/strong&gt;&lt;/a&gt; for enhanced per-invocation metrics including CPU time and memory utilization over time. Use &lt;strong&gt;Amazon CloudWatch Logs Insights&lt;/strong&gt; to query INIT_START, INIT_END, and REPORT log entries for initialization and memory details per invocation.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-9-1.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26058" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-9-1.png" alt="AWS Lambda Insights" width="1660" height="735"&gt;&lt;/a&gt;&lt;/p&gt; 
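&lt;p&gt;As a rough illustration of what the REPORT entries can tell you, the following Python sketch extracts the initialization time and memory utilization from a single log line. The sample line is hypothetical and follows the standard Lambda REPORT layout; the exact fields emitted for Lambda Managed Instances may differ, so treat the field names as assumptions to verify against your own logs.&lt;/p&gt;

```python
import re

# Hypothetical sample in the standard Lambda REPORT layout; real values will differ.
SAMPLE_REPORT = (
    "REPORT RequestId: 11111111-2222-3333-4444-555555555555\t"
    "Duration: 142.31 ms\tBilled Duration: 143 ms\t"
    "Memory Size: 32768 MB\tMax Memory Used: 21504 MB\t"
    "Init Duration: 48211.07 ms"
)

# Longer field names come first so that "Duration" does not shadow them.
_FIELD = re.compile(
    r"(Billed Duration|Init Duration|Max Memory Used|Memory Size|Duration): "
    r"([\d.]+) (?:ms|MB)"
)


def parse_report(line):
    """Return the numeric fields of a REPORT line keyed by field name."""
    return {name: float(value) for name, value in _FIELD.findall(line)}


fields = parse_report(SAMPLE_REPORT)
memory_utilization = fields["Max Memory Used"] / fields["Memory Size"]
print(f"Init Duration: {fields['Init Duration']:.0f} ms")
print(f"Memory utilization: {memory_utilization:.1%}")
```

&lt;p&gt;A utilization that stays close to 100% suggests the dataset barely fits in the configured memory, while a very low value suggests room to reduce it.&lt;/p&gt;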
&lt;h2&gt;&lt;strong&gt;What Makes This Better with AWS Lambda Managed Instances&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Without AWS Lambda Managed Instances, building this same application would require one of these alternatives:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Option A: EC2 with auto-scaling.&lt;/strong&gt; You get full control but also full responsibility: patching, scaling policies, load balancing, and deployment pipelines are all yours to manage.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Option B: Redesign for standard Lambda.&lt;/strong&gt; You swap the in-memory data for an external database and replace the ML model with an &lt;a href="https://aws.amazon.com/sagemaker/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker&lt;/a&gt; endpoint, which adds latency, cost, and complexity.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;With AWS Lambda Managed Instances, you write a single AWS Lambda function, define a Capacity Provider, and deploy with SAM. AWS Lambda handles the Amazon EC2 instances, scaling, and lifecycle, giving you the memory you need with the operational simplicity you want. The in-memory approach eliminates network latency and disk I/O, delivering consistent sub-200ms response times for complex analytics.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Cost Considerations&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;AWS Lambda Managed Instances uses Amazon EC2-based pricing with a management fee. For predictable workloads, you can leverage Amazon EC2 Savings Plans or Reserved Instances to reduce costs significantly.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Example cost comparison&lt;/strong&gt; (us-east-1, 32 GB memory, 1M invocations/month):&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;AWS Lambda (standard):&lt;/strong&gt; ~$267/month (on-demand pricing)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;AWS Lambda Managed Instances:&lt;/strong&gt; ~$180/month (with 1-year Compute Savings Plan)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Savings:&lt;/strong&gt; 33% reduction&lt;/li&gt; 
&lt;/ul&gt; 
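&lt;p&gt;The savings figure follows directly from the two monthly estimates. A minimal Python sketch of the arithmetic, using only the numbers from the comparison above (the function name is illustrative):&lt;/p&gt;

```python
def percent_savings(baseline, alternative):
    """Relative reduction of `alternative` against `baseline`, as a percentage."""
    return (baseline - alternative) / baseline * 100


standard_lambda = 267.0      # USD per month, from the example above
managed_instances = 180.0    # USD per month, with a 1-year Compute Savings Plan

savings = percent_savings(standard_lambda, managed_instances)
print(f"Monthly difference: ${standard_lambda - managed_instances:.0f}")
print(f"Savings: {savings:.1f}%")
```

&lt;p&gt;Substituting your own estimates for the two inputs gives a quick first comparison before you build a detailed cost model.&lt;/p&gt;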
&lt;p&gt;The cost benefits increase with higher memory configurations and sustained workloads that can take advantage of Amazon EC2 pricing discounts.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Best Practices&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Based on experience building this solution, here are key recommendations:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Memory sizing:&lt;/strong&gt; Start with your dataset size plus 50% overhead for processing. Monitor Amazon CloudWatch metrics to optimize.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Initialization strategy:&lt;/strong&gt; Load large datasets during the init phase to amortize the cost across multiple invocations.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Concurrency configuration:&lt;/strong&gt; Set PerExecutionEnvironmentMaxConcurrency based on your workload’s I/O characteristics. Higher values work well for I/O-bound analytics.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data format:&lt;/strong&gt; Use columnar formats like Parquet for efficient memory usage and fast loading.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Monitoring:&lt;/strong&gt; Track initialization duration, memory utilization, and invocation latency in Amazon CloudWatch to identify optimization opportunities.&lt;/li&gt; 
&lt;/ul&gt; 
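&lt;p&gt;The initialization strategy above can be sketched in a few lines of Python. This is a simplified illustration, not the sample application's code: &lt;code&gt;load_dataset&lt;/code&gt; is a hypothetical stand-in for reading a large file from Amazon S3, and the counter exists only to show that the load runs once while the handler runs many times.&lt;/p&gt;

```python
LOAD_COUNT = 0  # Tracks how many times the expensive load actually runs


def load_dataset():
    """Hypothetical stand-in for reading a large Parquet file from S3."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {f"user-{i}": {"engagement": i % 100} for i in range(1000)}


# Module scope runs once per execution environment, during the init phase.
DATASET = load_dataset()


def handler(event, context=None):
    """Per-invocation work only reads the dataset that is already in memory."""
    user_id = event["user_id"]
    return {"user_id": user_id, "profile": DATASET.get(user_id)}


first = handler({"user_id": "user-7"})
second = handler({"user_id": "user-42"})
print(first, second, "loads:", LOAD_COUNT)
```

&lt;p&gt;Because the dataset lives at module scope, every invocation served by the same execution environment shares the one copy loaded at startup.&lt;/p&gt;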
&lt;h2&gt;&lt;strong&gt;Cleanup&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;When you’re done exploring the solution, remove all provisioned resources to avoid ongoing charges. For the full cleanup commands and exact steps, refer to the project’s README.md in the GitHub repository.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;AWS Lambda Managed Instances opens up a new class of serverless applications that support larger AWS Lambda layer packages and more memory. Memory-intensive workloads — in-memory analytics, ML inference, graph processing, scientific computing — can now run with the simplicity of AWS Lambda and the resources of Amazon EC2. The customer analytics example demonstrates how in-memory processing with AWS Lambda Managed Instances delivers performance improvements over traditional database queries while maintaining serverless benefits like automatic scaling and pay-per-use pricing.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Ready to get started?&lt;/strong&gt; Explore the &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda Managed Instances documentation&lt;/a&gt; and try building your own memory-intensive serverless application. You can find the complete code for &lt;a href="https://github.com/aws-samples/sample-lambda-managed-instances-analytics"&gt;this example on GitHub&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2</title>
		<link>https://aws.amazon.com/blogs/compute/accelerate-cpu-based-ai-inference-workloads-using-intel-amx-on-amazon-ec2/</link>
					
		
		<dc:creator><![CDATA[Santosh Kumar]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 16:43:10 +0000</pubDate>
				<category><![CDATA[*Post Types]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[PyTorch on AWS]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">21db657322c27b28f881000b3cc565d6157c04e7</guid>

					<description>This post shows you how to accelerate your AI inference workloads by up to 76% using Intel Advanced Matrix Extensions (AMX) – an accelerator that uses specialized hardware and instructions to perform matrix operations directly on processor cores – on Amazon Elastic Compute Cloud (Amazon EC2) 8th generation instances. You'll learn when CPU-based inference is cost-effective, how to enable AMX with minimal code changes, and which configurations deliver optimal performance for your models.</description>
										<content:encoded>&lt;p&gt;This post shows you how to accelerate your AI inference workloads by up to 76% using &lt;a href="https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/what-is-intel-amx.html" target="_blank" rel="noopener noreferrer"&gt;Intel Advanced Matrix Extensions (AMX)&lt;/a&gt; – an accelerator that uses specialized hardware and instructions to perform matrix operations directly on processor cores – on &lt;a href="https://aws.amazon.com/pm/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt; 8th generation instances. You’ll learn when CPU-based inference is cost-effective, how to enable AMX with minimal code changes, and which configurations deliver optimal performance for your models.&lt;/p&gt; 
&lt;p&gt;Many organizations find that CPU-based inference is more suitable for their production Artificial Intelligence/Machine Learning (AI/ML) workloads after evaluating factors like cost, operational complexity, and infrastructure compatibility. As more organizations deploy AI solutions, improving how models run on standard CPUs has become a critical cost control strategy for workloads where CPU inference provides the right balance of performance and economics.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://my.idc.com/getdoc.jsp?containerId=prUS52530724" target="_blank" rel="noopener noreferrer"&gt;IDC&lt;/a&gt;, a global market intelligence and advisory firm, projects that worldwide AI spending will reach $632 billion by 2028, growing at a 29% compound annual growth rate from 2024, with inference costs representing a significant portion of operational expenses. &lt;a href="https://www.deloitte.com/us/en/about/press-room/deloitte-2026-tmt-predictions.html" target="_blank" rel="noopener noreferrer"&gt;Deloitte&lt;/a&gt;, a leading professional services firm specializing in technology consulting and research, forecasts that inference – the running of AI models – will make up two-thirds of all AI compute by 2026, far exceeding initial training costs. This makes optimizing AI/ML inference on CPU crucial for controlling long-term AI/ML operational expenses.&lt;/p&gt; 
&lt;p&gt;At the core of AI inference workloads are matrix multiplication operations – the mathematical foundation of neural networks that drives computational demand. On CPUs without dedicated matrix hardware, these operations become the performance bottleneck for inference. That leaves organizations with three key challenges: balancing cost optimization with performance requirements, meeting real-time latency demands, and scaling efficiently with variable workload demands. Intel’s Advanced Matrix Extensions (AMX) technology addresses these challenges by accelerating matrix operations directly on CPU cores, making CPU-based inference competitive and cost-effective.&lt;/p&gt; 
&lt;h3&gt;AMX capabilities and architecture&lt;/h3&gt; 
&lt;p&gt;AMX supports multiple data formats: &lt;a href="https://www.intel.com/content/www/us/en/content-details/671279/bfloat16-hardware-numerics-definition.html" target="_blank" rel="noopener noreferrer"&gt;BF16&lt;/a&gt;, which preserves the range of 32-bit floating point in half the space; INT8, which maximizes throughput when accuracy can be slightly compromised; and FP16, which offers a balance between the two. This flexibility lets you match precision to your specific needs.&lt;/p&gt; 
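&lt;p&gt;To make the range claim concrete, the following Python sketch emulates BF16 by keeping only the upper 16 bits of a value's FP32 encoding. This is a simplification for illustration: it truncates, whereas real hardware typically rounds to nearest, and the helper names are hypothetical.&lt;/p&gt;

```python
import struct


def to_bf16_and_back(value):
    """Round-trip a float through BF16 by keeping the top 16 bits of its FP32 encoding."""
    (bits,) = struct.unpack("!I", struct.pack("!f", value))
    # Keep the sign, the 8 exponent bits, and 7 mantissa bits; zero the low 16 bits.
    truncated = (bits // 0x10000) * 0x10000
    (result,) = struct.unpack("!f", struct.pack("!I", truncated))
    return result


def fits_in_fp16(value):
    """True when the value can be encoded as IEEE half precision."""
    try:
        struct.pack("!e", value)
        return True
    except OverflowError:
        return False


large = 1.0e38
print(to_bf16_and_back(large))    # still finite and close to the input
print(fits_in_fp16(large))        # False: exceeds the FP16 maximum of 65504
print(to_bf16_and_back(3.14159))  # precision drops to a few significant digits
```

&lt;p&gt;BF16 keeps the 8-bit exponent of FP32, so very large and very small magnitudes survive, at the cost of a shorter mantissa. FP16 makes the opposite trade: more mantissa bits but a much narrower range.&lt;/p&gt;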
&lt;p&gt;Introduced in 2023 with 4th Generation &lt;a href="https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html" target="_blank" rel="noopener noreferrer"&gt;Intel Xeon Scalable processors&lt;/a&gt;, AMX consists of eight 1 KB tile registers (specialized on-chip memory for matrix data) and a Tile Matrix Multiply Unit (TMUL – dedicated hardware for matrix calculations) that enables each core to perform 2048 INT8 operations or 1024 BF16 operations per cycle. These tile registers provide efficient matrix storage, reducing memory access overhead and improving computational efficiency for matrix operations central to neural networks. For real-world customer workloads, this translates to significantly faster inference times for &lt;a href="https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/" target="_blank" rel="noopener noreferrer"&gt;transformer&lt;/a&gt; models, recommendation systems, and natural language processing tasks, while reducing the total cost of ownership through improved resource utilization and lower infrastructure requirements.&lt;/p&gt; 
&lt;div id="attachment_25812" style="width: 567px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/1-ComputeBlog-2473-AMX-Architecture.png"&gt;&lt;img aria-describedby="caption-attachment-25812" loading="lazy" class=" wp-image-25812" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/20/1-ComputeBlog-2473-AMX-Architecture.png" alt="Architecture diagram of Intel Advanced Matrix Extensions (AMX) showing the key components: Intel Xeon CPU with AMX support, tile architecture with 8 tiles of 1 KiB each as 2D registers, Tile Matrix Multiply Unit (TMUL) with data flow between them, supported data types (BF16, INT8, FP16), and AMX instruction categories (Configuration, Data Management, Operations)" width="557" height="453"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25812" class="wp-caption-text"&gt;Figure 1: AMX Architecture showing AMX tile registers, processing units, and data flow within CPU core&lt;/p&gt;
&lt;/div&gt; 
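&lt;p&gt;The per-cycle figures can be related to the tile geometry with some back-of-the-envelope arithmetic. The sketch below assumes a full-size tile of 16 rows by 64 bytes and, as a labeled assumption rather than a figure from this post, that one tile multiply instruction completes every 16 cycles. Multiplies and adds are counted as separate operations.&lt;/p&gt;

```python
TILE_ROWS = 16        # maximum rows per AMX tile
TILE_ROW_BYTES = 64   # maximum bytes per tile row
NUM_TILES = 8

# Assumption for illustration: one tile multiply instruction is issued every 16 cycles.
CYCLES_PER_TILE_MULTIPLY = 16


def ops_per_cycle(bytes_per_element):
    """Multiply and add operations per cycle for a full-size tile multiply."""
    inner = TILE_ROW_BYTES // bytes_per_element   # elements along the shared dimension
    macs = TILE_ROWS * TILE_ROWS * inner          # 16 x 16 outputs, `inner` MACs each
    return 2 * macs // CYCLES_PER_TILE_MULTIPLY   # count multiply and add separately


tile_bytes = TILE_ROWS * TILE_ROW_BYTES
print(f"Tile size: {tile_bytes} bytes, all tiles: {NUM_TILES * tile_bytes} bytes")
print(f"INT8: {ops_per_cycle(1)} ops/cycle")
print(f"BF16: {ops_per_cycle(2)} ops/cycle")
```

&lt;p&gt;Halving the element width doubles the number of elements per tile row, which is why INT8 delivers twice the per-cycle operations of BF16 in this model.&lt;/p&gt;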
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note: &lt;/em&gt;&lt;/strong&gt;&lt;em&gt;AMX operations, including tile setup and memory-to-tile data movement (both handled automatically by the system), introduce a small overhead. For smaller models or single-batch processing, there may be too few matrix operations to amortize that overhead, which makes batch size optimization critical for performance gains.&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;When to choose CPU inference with AMX&lt;/h2&gt; 
&lt;p&gt;CPU inference with AMX acceleration benefits workloads including:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Batch processing and traditional ML&lt;/strong&gt;: Content summarization, recommendation systems, and analytical workloads benefit from CPU’s cost efficiency and ability to handle sparse data structures and branching logic.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Small to medium-sized models&lt;/strong&gt;: Models under 7B parameters and batch sizes of 8-16 samples achieve excellent performance through optimized threading, making CPUs ideal for applications like fraud detection and chatbots.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Variable demand workloads&lt;/strong&gt;: E-commerce systems and applications with unpredictable traffic patterns can quickly scale CPU instances up or down based on demand, avoiding the fixed costs of dedicated accelerator hardware that sits idle during low-traffic periods.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Complex business logic&lt;/strong&gt;: Applications like financial risk assessment and content moderation that need to combine ML predictions with business rules and conditional logic work well on CPUs, which handle mixed workloads better than specialized accelerators.&lt;/p&gt; 
&lt;h2&gt;Implementation: AMX optimization with PyTorch&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://pytorch.org/" target="_blank" rel="noopener noreferrer"&gt;PyTorch&lt;/a&gt;, a popular open-source machine learning framework, includes built-in Intel optimizations through &lt;a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/onednn.html" target="_blank" rel="noopener noreferrer"&gt;oneDNN&lt;/a&gt; (Intel’s Deep Neural Network library) that automatically use AMX when available. Setup requires installing dependencies and configuring environment variables for optimal performance.&lt;/p&gt; 
&lt;h3&gt;Install dependencies&lt;/h3&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Install transformers and torch
pip install torch transformers&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Configure environment variables&lt;/h3&gt; 
&lt;p&gt;These environment variables tell the oneDNN library how to optimize your inference workload for AMX.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Set the highest instruction set that oneDNN may dispatch to AMX (lets oneDNN use AMX tiles for matrix operations): &lt;pre&gt;&lt;code class="lang-bash"&gt;export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Optimize thread affinity (binds threads to CPU cores for better cache performance): &lt;pre&gt;&lt;code class="lang-bash"&gt;export KMP_AFFINITY=granularity=fine,compact,1,0&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Use all available CPU cores for parallel processing: &lt;pre&gt;&lt;code class="lang-bash"&gt;export OMP_NUM_THREADS=$(nproc)&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Size the oneDNN primitive cache (avoids re-creating compiled kernels on repeated calls within the same process): &lt;pre&gt;&lt;code class="lang-bash"&gt;export ONEDNN_PRIMITIVE_CACHE_CAPACITY=4096&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Set default precision to BF16 (enables automatic AMX acceleration): &lt;pre&gt;&lt;code class="lang-bash"&gt;export ONEDNN_DEFAULT_FPMATH_MODE=bf16&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;(Optional) Enable verbose logging to verify AMX activation: &lt;pre&gt;&lt;code class="lang-bash"&gt;export ONEDNN_VERBOSE=1&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
&lt;/ol&gt; 
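&lt;p&gt;Before relying on these settings, it can help to confirm that the instance actually exposes AMX. On Linux, the kernel reports AMX support through the &lt;code&gt;amx_tile&lt;/code&gt;, &lt;code&gt;amx_bf16&lt;/code&gt;, and &lt;code&gt;amx_int8&lt;/code&gt; CPU flags. The following Python sketch checks for them; the helper names and the abbreviated sample flags line are illustrative.&lt;/p&gt;

```python
AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def detect_amx(cpuinfo_text):
    """Map each AMX flag name to whether it appears in a cpuinfo 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            present = set(line.split(":", 1)[1].split())
            return {flag: flag in present for flag in AMX_FLAGS}
    return {flag: False for flag in AMX_FLAGS}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the Linux CPU description; returns an empty string if unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


# Abbreviated, hypothetical flags line for illustration.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx512f amx_bf16 amx_tile amx_int8\n"
print(detect_amx(sample))
print(detect_amx(read_cpuinfo()))
```

&lt;p&gt;If any of the flags is missing on the target instance, oneDNN falls back to other instruction sets and the BF16 path will not use AMX tiles.&lt;/p&gt;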
&lt;h3&gt;BF16 optimization example&lt;/h3&gt; 
&lt;p&gt;With environment variables configured, implementing BF16 optimization requires minimal to no code changes. The following example demonstrates how PyTorch automatically leverages AMX tile registers for matrix operations when BF16 precision is used.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This is a simplified example for demonstration purposes; adapt the code to your specific use case and requirements.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import time

# Load model and tokenizer from HuggingFace
model_name = "google/gemma-3-1b-it"

model_revision = "dcc83ea841ab6100d6b47a070329e1ba4cf78752"
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    revision=model_revision
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    revision=model_revision
)
# Fix tokenizer padding issue for batch processing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Enable BF16 precision for automatic AMX acceleration
model = model.to(dtype=torch.bfloat16)
model.eval()  # Set to inference mode

# Inference function with BF16 autocast
def run_optimized_inference(prompts):
    inputs = tokenizer(prompts, padding=True, 
                      return_tensors="pt")  # Tokenize input
    
    with torch.no_grad():  # Disable gradients for inference
        with torch.amp.autocast('cpu',
                               dtype=torch.bfloat16):  # BF16 autocast
            outputs = model.generate(
                **inputs,
                max_length=100,     # Set maximum sequence length 
                do_sample=False     # Use greedy decoding
            )
    return outputs

# Example usage with performance measurement
prompts = ["What are the benefits of cloud computing?"]
start_time = time.time()
results = run_optimized_inference(prompts)  # Run BF16-optimized inference
elapsed_time = time.time() - start_time
tokens_generated = len(results[0]) - len(tokenizer.encode(
    prompts[0]))  # Count new tokens

# Display results and performance metrics
print(tokenizer.decode(results[0], skip_special_tokens=True))
print(f"Latency: {elapsed_time*1000:.1f}ms, "
      f"Throughput: {tokens_generated/elapsed_time:.1f} "
      f"tokens/sec")&lt;/code&gt;&lt;/pre&gt; 
&lt;h2&gt;Performance benchmarks&lt;/h2&gt; 
&lt;p&gt;To validate AMX performance benefits, we conducted benchmarks across multiple popular language models representing different use cases and model sizes.&lt;/p&gt; 
&lt;h3&gt;Benchmarking methodology and environment&lt;/h3&gt; 
&lt;p&gt;We tested two improvements: hardware generation advances (m8i vs m7i) and AMX optimization impact (FP32 vs BF16). This shows you both upgrade paths for your workloads.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Models tested&lt;/strong&gt;: BigBird-RoBERTa-large (355M), Microsoft DialoGPT-large (762M), Google Gemma-3-1b-it (1B), DeepSeek-R1-Distill-Qwen-1.5B (1.5B), Llama-3.2-3B-Instruct (3B), YOLOv5 (tested with 30 images at ~1200×800 resolution with 5 iterations for each image)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon EC2 instance types&lt;/strong&gt;: &lt;a href="https://aws.amazon.com/ec2/instance-types/m8i/" target="_blank" rel="noopener noreferrer"&gt;m8i.4xlarge&lt;/a&gt;, &lt;a href="https://aws.amazon.com/ec2/instance-types/m7i/" target="_blank" rel="noopener noreferrer"&gt;m7i.4xlarge&lt;/a&gt; (8&lt;sup&gt;th&lt;/sup&gt; &amp;amp; 7&lt;sup&gt;th&lt;/sup&gt; gen general-purpose Amazon EC2 instances with 16 vCPUs and 64 GiB memory, both AMX-capable)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch sizes&lt;/strong&gt;: 1, 8, 32 (number of input samples processed simultaneously in a single inference call)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Iterations&lt;/strong&gt;: 5 runs per configuration&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Comparison types&lt;/strong&gt;: 
  &lt;ul&gt; 
   &lt;li&gt;Instance generation comparison (m8i vs m7i performance)&lt;/li&gt; 
   &lt;li&gt;AMX optimization impact (32-bit floating-point (FP32) vs Brain Floating Point 16 (BF16) on same instance)&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Optimizations&lt;/strong&gt;: FP32 baseline vs BF16 AMX&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Framework&lt;/strong&gt;: PyTorch 2.8.0 (which has built-in Intel optimizations)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Region&lt;/strong&gt;: AWS us-west-2&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Measurement methodology&lt;/strong&gt;: In our benchmarks, ‘inference latency’ represents the complete model inference execution time including input tokenization and full sequence generation (for generative models) or complete forward pass (for non-generative models). Each measurement is the average of 5 iterations after warm-up iterations, excluding model loading time. We use this metric because AMX’s matrix multiplication acceleration improves performance throughout the complete forward pass.&lt;/li&gt; 
&lt;/ul&gt; 
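&lt;p&gt;The measurement approach described above (discard warm-up iterations, then average timed iterations) can be sketched as a small Python harness. This is an illustration rather than the harness used for these benchmarks: the warm-up count is an assumption, and &lt;code&gt;fake_inference&lt;/code&gt; is a hypothetical stand-in for a real model call.&lt;/p&gt;

```python
import statistics
import time


def benchmark(fn, warmup=2, iterations=5):
    """Average latency of fn() in milliseconds, ignoring the warm-up calls."""
    for _ in range(warmup):
        fn()  # warm caches and compiled kernels; not timed
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples), samples


def fake_inference():
    """Stand-in workload; replace with a real model call."""
    return sum(i * i for i in range(50_000))


mean_ms, samples = benchmark(fake_inference)
print(f"Mean latency over {len(samples)} runs: {mean_ms:.2f} ms")
```

&lt;p&gt;Excluding warm-up calls matters for AMX workloads in particular, because the first calls include one-time costs such as kernel generation that would otherwise inflate the average.&lt;/p&gt;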
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Throughout this blog, FP32 refers to the default 32-bit floating-point precision, while BF16 refers to Brain Floating Point 16-bit precision with AMX acceleration enabled.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: Performance results are based on internal testing and may vary depending on specific workloads, configurations, and environments.&lt;/p&gt; 
&lt;h3&gt;Detailed result: BigBird-RoBERTa-large&lt;/h3&gt; 
&lt;p&gt;This benchmark represents document classification, content summarization, and text analysis workloads. These are typical of batch processing, where high throughput is desirable, and of offline inference, where strict latency requirements are not critical.&lt;/p&gt; 
&lt;div id="attachment_25811" style="width: 1441px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/2-ComputeBlog-2473-latency-datatypeVsbatch-roberta.png"&gt;&lt;img aria-describedby="caption-attachment-25811" loading="lazy" class="wp-image-25811 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/20/2-ComputeBlog-2473-latency-datatypeVsbatch-roberta.png" alt="Bar chart comparing BigBird-RoBERTa-large inference latency between m7i and m8i instances with FP32 and BF16 precision across batch sizes 1, 8, and 32, showing 55-67% latency reduction with BF16 AMX." width="1431" height="728"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25811" class="wp-caption-text"&gt;Figure 2: m7i.4xlarge vs m8i.4xlarge inference latency comparison for model BigBird-RoBERTa-large (355M parameters)&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25828" style="width: 2507px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/3-ComputeBlog-2473-throughput-roberta.png"&gt;&lt;img aria-describedby="caption-attachment-25828" loading="lazy" class="wp-image-25828 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/20/3-ComputeBlog-2473-throughput-roberta.png" alt="Bar chart comparing throughput for the BigBird-RoBERTa-large model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32. m8i.4xlarge achieves 4–25% higher throughput, with the largest gain at FP32 batch size 1 (25%, from 1214.29 to 1512.03 tokens/sec). BF16(AMX) batch size 1 reaches the highest overall throughput at 3391.06 tokens/sec on m8i.4xlarge with a 14 % improvement over m7i.4xlarge. Throughput gains with BF16(AMX) are smaller at larger batch sizes (4–5%), as AMX overhead limits scaling for this smaller model." width="2497" height="1274"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25828" class="wp-caption-text"&gt;Figure 3: m7i.4xlarge vs m8i.4xlarge throughput comparison for BigBird-RoBERTa-large model across batch sizes 1, 8, and 32&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25829" style="width: 2122px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/4-ComputeBlog-2473-latency-instancetypeVsbatch-roberta.png"&gt;&lt;img aria-describedby="caption-attachment-25829" loading="lazy" class="wp-image-25829 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/4-ComputeBlog-2473-latency-instancetypeVsbatch-roberta.png" alt="Bar chart comparing inference latency for bigbird-roberta-large between FP32 and BF16(AMX) data types on m8i.4xlarge and m7i.4xlarge instances at batch sizes 1, 8, and 32, showing BF16(AMX) reduces latency by 55–69% compared to FP32 across all configurations" width="2112" height="1164"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25829" class="wp-caption-text"&gt;Figure 4: FP32 vs BF16 inference latency comparison for model BigBird-RoBERTa-large (355M parameters) on m7i.4xlarge and m8i.4xlarge instances across batch sizes&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;BigBird-RoBERTa-large model benchmarking demonstrates three key performance improvements. &lt;strong&gt;Figure 2&lt;/strong&gt; shows m8i hardware delivers 4-20% latency reduction across batch sizes compared to m7i for both FP32 and BF16 with AMX, providing immediate benefits without application changes. With AMX and BF16, performance gains decrease at higher batch sizes as AMX overhead exceeds benefits for smaller models like BigBird-RoBERTa-large. &lt;strong&gt;Figure 3&lt;/strong&gt; validates these improvements with corresponding 4-25% throughput gains, enabling better resource utilization for production applications. &lt;strong&gt;Figure 4&lt;/strong&gt; demonstrates that enabling AMX with BF16 optimization provides the most significant impact, reducing m8i latency by 55-67% compared to non-AMX FP32 baseline, enabling 2-3x higher processing capacity and reduced compute costs.&lt;/p&gt; 
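&lt;p&gt;The link between a 55-67% latency reduction and 2-3x processing capacity is simple arithmetic, assuming throughput is inversely proportional to latency at a fixed batch size on the same instance. A short Python sketch (the function name is illustrative):&lt;/p&gt;

```python
def capacity_multiplier(latency_reduction):
    """Throughput multiplier implied by a fractional latency reduction."""
    return 1.0 / (1.0 - latency_reduction)


for reduction in (0.55, 0.67):
    multiplier = capacity_multiplier(reduction)
    print(f"{reduction:.0%} lower latency: {multiplier:.2f}x capacity")
```

&lt;p&gt;A 55% reduction means each request takes 45% of the original time, so the same instance can serve about 2.2 times as many requests; at 67% the factor is about 3.&lt;/p&gt;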
&lt;p&gt;The analysis above demonstrates the methodology for interpreting benchmark results using BigBird-RoBERTa-large as a representative example. The remaining models (DialoGPT-large, Gemma-3-1b-it, DeepSeek-R1-Distill-Qwen-1.5B, and Llama-3.2-3B-Instruct) follow identical testing procedures and exhibit similar performance patterns, with variations primarily in the magnitude of improvements based on model size and architecture. The results for these models, along with their performance implications, are summarized in the following section.&lt;/p&gt; 
&lt;h3&gt;Benchmarking result for additional models&lt;/h3&gt; 
&lt;p&gt;To validate AMX’s effectiveness across diverse AI workloads, we benchmarked five additional models representing different use cases and model sizes. Each model follows the same testing methodology described above, with performance patterns showing how AMX benefits vary based on model architecture, parameter count, and batch size.&lt;/p&gt; 
&lt;h4&gt;DialoGPT-large (762M) – Conversational AI&lt;/h4&gt; 
&lt;p&gt;This benchmark represents conversational AI, chatbots, and real-time dialogue systems where low latency and consistent response times are critical for user experience.&lt;/p&gt; 
&lt;div id="attachment_25808" style="width: 1441px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/5-ComputeBlog-2473-latency-datatypeVsbatch-dialogpt.png"&gt;&lt;img aria-describedby="caption-attachment-25808" loading="lazy" class="size-full wp-image-25808" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/5-ComputeBlog-2473-latency-datatypeVsbatch-dialogpt.png" alt="Bar chart comparing inference latency for the DialoGPT-large model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 9–25% latency reduction, with the largest improvement at FP32 batch size 32 (25%)" width="1431" height="733"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25808" class="wp-caption-text"&gt;Figure 5: m7i.4xlarge vs m8i.4xlarge inference latency comparison for model DialoGPT-large (762M parameters)&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25830" style="width: 2507px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/6-ComputeBlog-2473-throughput-dialogpt.png"&gt;&lt;img aria-describedby="caption-attachment-25830" loading="lazy" class="size-full wp-image-25830" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/6-ComputeBlog-2473-throughput-dialogpt.png" alt="Bar chart comparing throughput for the DialoGPT-large model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 10–34% higher throughput, with the largest gain at FP32 batch size 32 (34%) and BF16(AMX) batch size 32 reaching the highest overall throughput at 355.9 tokens/sec" width="2497" height="1283"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25830" class="wp-caption-text"&gt;Figure 6: m7i.4xlarge vs m8i.4xlarge throughput comparison for DialoGPT-large model across batch sizes 1, 8, and 32&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25831" style="width: 2118px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/7-ComputeBlog-2473-latency-instancetypeVsbatch-dialogpt.png"&gt;&lt;img aria-describedby="caption-attachment-25831" loading="lazy" class="size-full wp-image-25831" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/7-ComputeBlog-2473-latency-instancetypeVsbatch-dialogpt.png" alt="Bar chart comparing inference latency for DialoGPT-large between FP32 and BF16(AMX) data types on m8i.4xlarge and m7i.4xlarge instances at batch sizes 1, 8, and 32, showing BF16(AMX) increases latency at batch size 1 (negative improvement of -44% and -45%) but reduces latency at larger batch sizes, with up to 43% reduction at m7i.4xlarge batch size 32" width="2108" height="1164"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25831" class="wp-caption-text"&gt;Figure 7: FP32 vs BF16 inference latency comparison for model DialoGPT-large (762M parameters) on m7i.4xlarge and m8i.4xlarge instances across batch sizes&lt;/p&gt;
&lt;/div&gt; 
&lt;h4&gt;Gemma-3-1b-it (1B) – General Purpose&lt;/h4&gt; 
&lt;p&gt;This benchmark represents general-purpose language understanding tasks, content generation, and smaller model deployments suitable for cost-sensitive applications and variable demand workloads.&lt;/p&gt; 
&lt;div id="attachment_25805" style="width: 1441px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/8-ComputeBlog-2473-latency-datatypeVsbatch-gemma.png"&gt;&lt;img aria-describedby="caption-attachment-25805" loading="lazy" class="size-full wp-image-25805" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/8-ComputeBlog-2473-latency-datatypeVsbatch-gemma.png" alt="Bar chart comparing inference latency for the Gemma-3-1b-it model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 7–17% latency reduction, with the largest improvement at BF16(AMX) batch size 1 (17%)" width="1431" height="730"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25805" class="wp-caption-text"&gt;Figure 8: m7i.4xlarge vs m8i.4xlarge inference latency comparison for model Gemma-3-1b-it (1B parameters)&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25832" style="width: 2507px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/9-ComputeBlog-2473-throughput-gemma-1.png"&gt;&lt;img aria-describedby="caption-attachment-25832" loading="lazy" class="size-full wp-image-25832" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/9-ComputeBlog-2473-throughput-gemma-1.png" alt="Bar chart comparing throughput for the Gemma-3-1b-it model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 7–20% higher throughput, with the largest gain at BF16(AMX) batch size 1 (20%) and BF16(AMX) batch size 32 reaching the highest overall throughput at 127.8 tokens/sec" width="2497" height="1278"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25832" class="wp-caption-text"&gt;Figure 9: m7i.4xlarge vs m8i.4xlarge latency and throughput comparison for Gemma-3-1b-it model across batch sizes 1, 8, and 32&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25833" style="width: 2118px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/10-ComputeBlog-2473-latency-instancetypeVsbatch-gemma-1.png"&gt;&lt;img aria-describedby="caption-attachment-25833" loading="lazy" class="size-full wp-image-25833" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/10-ComputeBlog-2473-latency-instancetypeVsbatch-gemma-1.png" alt="Bar chart comparing inference latency for Gemma-3-1b-it between FP32 and BF16(AMX) data types on m8i.4xlarge and m7i.4xlarge instances at batch sizes 1, 8, and 32, showing BF16(AMX) reduces latency by 24–42% at larger batch sizes but slightly increases latency at m7i.4xlarge batch size 1 (-4%), with the best improvement of 42% on m8i.4xlarge at batch size 8" width="2108" height="1164"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25833" class="wp-caption-text"&gt;Figure 10: FP32 vs BF16 inference latency comparison for model Gemma-3-1b-it (1B parameters) on m7i.4xlarge and m8i.4xlarge instances across batch sizes&lt;/p&gt;
&lt;/div&gt; 
&lt;h4&gt;DeepSeek-R1-Distill-Qwen-1.5B (1.5B) – Reasoning&lt;/h4&gt; 
&lt;p&gt;This benchmark represents reasoning and analytical workloads, including complex decision-making systems, financial analysis, and applications requiring sophisticated logic processing.&lt;/p&gt; 
&lt;div id="attachment_25802" style="width: 1441px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/11-ComputeBlog-2473-latency-instancetypeVsbatch-deepseek.png"&gt;&lt;img aria-describedby="caption-attachment-25802" loading="lazy" class="size-full wp-image-25802" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/11-ComputeBlog-2473-latency-instancetypeVsbatch-deepseek.png" alt="Bar chart comparing inference latency for the DeepSeek-R1-Distill-Qwen-1.5B model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 7–16% latency reduction, with the largest improvements at BF16(AMX) batch sizes 1 and 8 (both 16%)" width="1431" height="730"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25802" class="wp-caption-text"&gt;Figure 11: m7i.4xlarge vs m8i.4xlarge inference latency comparison for model DeepSeek-R1-Distill-Qwen-1.5B (1.5B parameters)&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25834" style="width: 2507px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/12-ComputeBlog-2473-throughput-deepseek.png"&gt;&lt;img aria-describedby="caption-attachment-25834" loading="lazy" class="size-full wp-image-25834" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/12-ComputeBlog-2473-throughput-deepseek.png" alt="Bar chart comparing throughput for the DeepSeek-R1-Distill-Qwen-1.5B model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 8–19% higher throughput, with the largest gains at BF16(AMX) batch sizes 1 and 8 (both 19%) and BF16(AMX) batch size 32 reaching the highest overall throughput at 415.1 tokens/sec" width="2497" height="1278"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25834" class="wp-caption-text"&gt;Figure 12: m7i.4xlarge vs m8i.4xlarge latency and throughput comparison for DeepSeek-R1-Distill-Qwen-1.5B model across batch sizes 1, 8, and 32&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25835" style="width: 2118px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/13-ComputeBlog-2473-latency-instancetypeVsbatch-deepseek-1.png"&gt;&lt;img aria-describedby="caption-attachment-25835" loading="lazy" class="size-full wp-image-25835" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/13-ComputeBlog-2473-latency-instancetypeVsbatch-deepseek-1.png" alt="Bar chart comparing inference latency for DeepSeek-R1-Distill-Qwen-1.5B between FP32 and BF16(AMX) data types on m8i.4xlarge and m7i.4xlarge instances at batch sizes 1, 8, and 32, showing BF16(AMX) reduces latency by 17–68% across all configurations, with the largest improvement of 68% on m8i.4xlarge at batch size 8 and consistently strong reductions of 59–66% at larger batch sizes" width="2108" height="1164"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25835" class="wp-caption-text"&gt;Figure 13: FP32 vs BF16 inference latency comparison for model DeepSeek-R1-Distill-Qwen-1.5B (1.5B parameters) on m7i.4xlarge and m8i.4xlarge instances across batch sizes&lt;/p&gt;
&lt;/div&gt; 
&lt;h4&gt;Llama-3.2-3B-Instruct (3B) – Large model&lt;/h4&gt; 
&lt;p&gt;This benchmark represents larger model deployments for complex instruction-following tasks, advanced content generation, and applications requiring higher model capacity while maintaining cost efficiency.&lt;/p&gt; 
&lt;div id="attachment_25799" style="width: 1441px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/14-ComputeBlog-2473-latency-instancetypeVsbatch-llama.png"&gt;&lt;img aria-describedby="caption-attachment-25799" loading="lazy" class="size-full wp-image-25799" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/14-ComputeBlog-2473-latency-instancetypeVsbatch-llama.png" alt="Bar chart comparing inference latency for the Llama-3.2-3B-Instruct model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 8–15% latency reduction, with the largest improvement at FP32 batch size 8 (15%) and consistent gains of 12–14% with BF16(AMX) at smaller batch sizes" width="1431" height="730"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25799" class="wp-caption-text"&gt;Figure 14: m7i.4xlarge vs m8i.4xlarge inference latency comparison for model Llama-3.2-3B-Instruct (3B parameters)&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25836" style="width: 2507px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/15-ComputeBlog-2473-throughput-llama.png"&gt;&lt;img aria-describedby="caption-attachment-25836" loading="lazy" class="size-full wp-image-25836" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/15-ComputeBlog-2473-throughput-llama.png" alt="Bar chart comparing throughput for the Llama-3.2-3B-Instruct model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 8–17% higher throughput, with the largest gains at FP32 batch size 8 and BF16(AMX) batch size 1 (both 17%) and BF16(AMX) batch size 32 reaching the highest overall throughput at 187.3 tokens/sec" width="2497" height="1278"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25836" class="wp-caption-text"&gt;Figure 15: m7i.4xlarge vs m8i.4xlarge latency and throughput comparison for Llama-3.2-3B-Instruct model across batch sizes 1, 8, and 32&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25837" style="width: 2118px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/16-ComputeBlog-2473-latency-instancetypeVsbatch-llama-1.png"&gt;&lt;img aria-describedby="caption-attachment-25837" loading="lazy" class="size-full wp-image-25837" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/16-ComputeBlog-2473-latency-instancetypeVsbatch-llama-1.png" alt="Bar chart comparing inference latency for Llama-3.2-3B-Instruct between FP32 and BF16(AMX) data types on m8i.4xlarge and m7i.4xlarge instances at batch sizes 1, 8, and 32, showing BF16(AMX) reduces latency by 24–72% across all configurations, with the largest improvements of 72% on both m8i.4xlarge batch size 8 and m7i.4xlarge batch size 8, and consistently strong reductions of 68–70% at batch size 32" width="2108" height="1164"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25837" class="wp-caption-text"&gt;Figure 16: FP32 vs BF16 inference latency comparison for model Llama-3.2-3B-Instruct (3B parameters) on m7i.4xlarge and m8i.4xlarge instances across batch sizes&lt;/p&gt;
&lt;/div&gt; 
&lt;h4&gt;YOLOv5 – Computer vision model&lt;/h4&gt; 
&lt;p&gt;This benchmark represents computer vision workloads including object detection, image classification, and real-time video processing applications where consistent throughput is important for production deployments.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Instance type&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="2"&gt;&lt;strong&gt;Inference latency in seconds &lt;/strong&gt;(processing time per image)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="2"&gt; &lt;p&gt;&lt;strong&gt;Throughput&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;(images processed per second)&lt;/p&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FP32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FP32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;m8i.4xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;0.034&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;0.029&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;29.23&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;34.63&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;m7i.4xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;0.038&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;0.031&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;26.39&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;32.28&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;m8i improvement&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;10.5%&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;6.5%&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;10.8%&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;7.3%&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;&lt;strong&gt;Key insights:&lt;/strong&gt; m8i instances deliver 7-11% better performance than m7i across both precision formats. Combining hardware upgrade with AMX optimization, m8i with BF16 delivers up to 24% lower latency and 31% higher throughput compared to m7i with FP32.&lt;/p&gt; 
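&lt;p&gt;The percentages in the table and in the key insights can be reproduced from the raw latency and throughput values. The following is a minimal sketch that uses only the numbers from the table above; the helper names are illustrative.&lt;/p&gt; 

```python
# Values copied from the YOLOv5 table above.
latency_s = {
    "m8i": {"FP32": 0.034, "BF16": 0.029},
    "m7i": {"FP32": 0.038, "BF16": 0.031},
}
images_per_s = {
    "m8i": {"FP32": 29.23, "BF16": 34.63},
    "m7i": {"FP32": 26.39, "BF16": 32.28},
}


def latency_reduction_pct(baseline, improved):
    """Percentage by which latency dropped relative to the baseline."""
    return (baseline - improved) / baseline * 100.0


def throughput_gain_pct(baseline, improved):
    """Percentage by which throughput rose relative to the baseline."""
    return (improved - baseline) / baseline * 100.0


# Same precision on both instance types: the hardware generation effect.
for dtype in ("FP32", "BF16"):
    lat = latency_reduction_pct(latency_s["m7i"][dtype], latency_s["m8i"][dtype])
    thr = throughput_gain_pct(images_per_s["m7i"][dtype], images_per_s["m8i"][dtype])
    print(f"{dtype}: {lat:.1f}% lower latency, {thr:.1f}% higher throughput")

# m7i with FP32 compared to m8i with BF16: hardware upgrade plus AMX.
combined_lat = latency_reduction_pct(latency_s["m7i"]["FP32"], latency_s["m8i"]["BF16"])
combined_thr = throughput_gain_pct(images_per_s["m7i"]["FP32"], images_per_s["m8i"]["BF16"])
print(f"Combined: {combined_lat:.0f}% lower latency, {combined_thr:.0f}% higher throughput")
```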
&lt;h2&gt;Benchmark result summary&lt;/h2&gt; 
&lt;p&gt;The detailed graphs above demonstrate consistent performance patterns across the &lt;strong&gt;tested&lt;/strong&gt; models. Key findings:&lt;/p&gt; 
&lt;h3&gt;M8i vs M7i instance performance&lt;/h3&gt; 
&lt;p&gt;m8i instances deliver 9-14% better performance than m7i on average, and up to 20% better, across the tested models. These gains come from hardware advances: up to 4.6x larger L3 cache, higher base frequencies, up to 2.5x higher &lt;a href="https://en.wikipedia.org/wiki/DDR5_SDRAM" target="_blank" rel="noopener noreferrer"&gt;DDR5&lt;/a&gt; bandwidth, and enhanced AMX execution with FP16 support.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;m8i average latency improvement*&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BigBird-RoBERTa-large (355M)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Document analysis&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DialoGPT-large (762M)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Conversational AI&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;14%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Gemma-3-1b-it (1B)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;General purpose&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DeepSeek-R1 (1.5B)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Reasoning tasks&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;11%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Llama-3.2-3B (3B)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Large model deployment&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;12%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;YOLOv5&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Computer vision&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;9%&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;* Average across all tested configurations (FP32 and BF16 at batch sizes 1, 8, and 32)&lt;/p&gt; 
&lt;h3&gt;AMX acceleration impact (FP32 vs BF16)&lt;/h3&gt; 
&lt;p&gt;BF16 precision with AMX delivers 21-72% latency improvements at batch sizes of 8 and above compared to the FP32 baseline on the same instance type. These results compare FP32 and BF16 performance on m8i.4xlarge; gains vary by model size and batch configuration, and batch sizes of 8 and above show greater AMX benefits than batch size 1.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;Model&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="3"&gt;Latency improvement (%)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Batch 1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Batch 8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Batch 32&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BigBird-RoBERTa-large&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;55&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;67&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;63&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DialoGPT-large&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;-44*&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;21&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;30&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Gemma-3-1b-it&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;6&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;42&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;24&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DeepSeek-R1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;24&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;68&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;59&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Llama-3.2-3B&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;27&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;72&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;68&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;* &lt;em&gt;At batch size 1, DialoGPT-large’s autoregressive decoding generates tokens sequentially, producing many small matrix operations where AMX tile setup overhead exceeds the acceleration benefit. At batch sizes 8 and above, multiple sequences are processed in parallel, creating larger matrix operations that amortize this overhead and deliver 21-30% improvement.&lt;/em&gt;&lt;/p&gt; 
&lt;h4&gt;Performance patterns by batch size&lt;/h4&gt; 
&lt;p&gt;Larger models (1B+ parameters) show consistently better AMX performance across the tested batch sizes:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Batch size 1&lt;/strong&gt;: Mixed results – larger models show 6-27% improvement, smaller models may experience AMX overhead&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch size 8&lt;/strong&gt;: Strong performance gains of 21-72% across the tested models, with larger models showing greater benefits&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch size 32&lt;/strong&gt;: Significant improvements of 24-68% for most models, demonstrating AMX’s batch processing strength&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h4&gt;Batch size optimization guidelines&lt;/h4&gt; 
&lt;p&gt;AMX performance scales with batch size, and the optimal range varies by model size. Performance saturates beyond batch 16 due to hardware limits including memory bandwidth and compute bottlenecks.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Model Size&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Performance Gain&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Recommended Batch Size&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&amp;lt;1B parameters&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;21-67%&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;8-32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Batch 1 results vary by architecture*&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1-2B parameters&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;42-68%&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;4-16&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;6-24% gains even at batch 1&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;3B+ parameters&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;27-72%&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1-8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Benefits across batch sizes&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;* Encoder models (BigBird) show 55% gains at batch 1; autoregressive models (DialoGPT) may experience overhead.&lt;/p&gt; 
&lt;h4&gt;Combined performance benefits&lt;/h4&gt; 
&lt;p&gt;When we combine AMX optimization with 8th generation instances (m8i), the performance improvements compound significantly. For example, Llama-3.2-3B-Instruct running with BF16 AMX on m8i instances can achieve up to 76% better performance compared to FP32 inference on m7i instances at optimal batch sizes (batch 8: m7i FP32 45.51s vs m8i BF16 10.93s = 76% improvement; batch 32: m7i FP32 62.60s vs m8i BF16 17.47s = 72% improvement).&lt;/p&gt; 
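&lt;p&gt;One way to see why the gains compound: successive latency reductions multiply the remaining latency rather than adding up. The sketch below is an approximation under that assumption. It chains the roughly 15% m8i-over-m7i reduction reported for Llama-3.2-3B-Instruct at FP32 batch size 8 with the 72% FP32-to-BF16 reduction on m8i at batch size 8, then compares the result with the measured end-to-end figure quoted above. Both stage percentages are rounded, so the estimate is only indicative, and the function name is illustrative.&lt;/p&gt; 

```python
def combined_reduction_pct(*stage_reductions_pct):
    """Chain successive latency reductions (each given in percent).

    Each stage leaves (1 - reduction) of the latency from the previous
    stage, so the remaining fractions multiply.
    """
    remaining = 1.0
    for pct in stage_reductions_pct:
        remaining *= 1.0 - pct / 100.0
    return (1.0 - remaining) * 100.0


# Stage 1: m7i FP32 to m8i FP32 at batch 8 (about 15%).
# Stage 2: m8i FP32 to m8i BF16 with AMX at batch 8 (72%).
estimate = combined_reduction_pct(15, 72)

# Measured end to end: m7i FP32 45.51s compared to m8i BF16 10.93s.
measured = (45.51 - 10.93) / 45.51 * 100.0

print(f"estimated {estimate:.1f}%, measured {measured:.1f}%")
```

&lt;p&gt;Both the chained estimate and the measured value round to 76%, which is consistent with the two effects stacking multiplicatively.&lt;/p&gt; 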
&lt;h3&gt;Throughput scaling&lt;/h3&gt; 
&lt;p&gt;Across the tested models, throughput (tokens/sec) rises as latency falls, as expected when the work per batch is fixed. This consistent relationship demonstrates that AMX optimizations translate directly to improved inference efficiency.&lt;/p&gt; 
&lt;h3&gt;Price-performance analysis: Gemma-3-1b-it model&lt;/h3&gt; 
&lt;p&gt;While m8i.4xlarge instances are priced slightly higher than m7i.4xlarge ($0.847 vs $0.806 per hour in us-west-2), they deliver superior price-performance. To illustrate the economic benefits, we analyzed cost per 1 million tokens using Gemma-3-1b-it as a representative example. M8i delivers up to 13% better price-performance over m7i through hardware generation advances, with both instances running BF16 AMX.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Batch Size&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Data Type&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="2"&gt;&lt;strong&gt;m7i.4xlarge&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="2"&gt;&lt;strong&gt;m8i.4xlarge&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Price-Performance improvement&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Throughput&lt;br&gt; &lt;/strong&gt;(tokens/sec)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$ per 1M tokens&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Throughput&lt;br&gt; &lt;/strong&gt;(tokens/sec)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$ per 1M tokens&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;14.3&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$15.66&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;17.2&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$13.67&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;13%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;71&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$3.16&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;82.3&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$2.86&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;9%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;119.1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$1.88&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;127.8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$1.84&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;2%&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
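&lt;p&gt;The cost figures in the table above follow from the hourly price and the sustained throughput. The sketch below shows one way to derive them, assuming the us-west-2 prices quoted earlier and full utilization for the whole hour. The function names are illustrative, and because the published throughput values are rounded, a result can differ from the table by about a cent.&lt;/p&gt; 

```python
SECONDS_PER_HOUR = 3600

# Hourly prices quoted in the text (us-west-2).
M7I_PRICE_PER_HOUR = 0.806
M8I_PRICE_PER_HOUR = 0.847


def cost_per_million_tokens(price_per_hour, tokens_per_sec):
    """Dollar cost to generate one million tokens at a sustained rate."""
    tokens_per_hour = tokens_per_sec * SECONDS_PER_HOUR
    return price_per_hour / tokens_per_hour * 1_000_000


def price_performance_gain_pct(baseline_cost, new_cost):
    """Percentage drop in cost per token relative to the baseline."""
    return (baseline_cost - new_cost) / baseline_cost * 100.0


# BF16 (AMX) throughput in tokens/sec from the table: batch size maps to (m7i, m8i).
bf16_throughput = {1: (14.3, 17.2), 8: (71.0, 82.3), 32: (119.1, 127.8)}

for batch, (m7i_tps, m8i_tps) in bf16_throughput.items():
    m7i_cost = cost_per_million_tokens(M7I_PRICE_PER_HOUR, m7i_tps)
    m8i_cost = cost_per_million_tokens(M8I_PRICE_PER_HOUR, m8i_tps)
    gain = price_performance_gain_pct(m7i_cost, m8i_cost)
    print(f"batch {batch}: m7i ${m7i_cost:.2f}, m8i ${m8i_cost:.2f}, gain {gain:.0f}%")
```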
&lt;p&gt;Combining the hardware upgrade with BF16 AMX optimization delivers up to 44% better price-performance compared to FP32 on m7i.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Batch Size&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="3"&gt;&lt;strong&gt;m8i.4xlarge&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="3"&gt;&lt;strong&gt;m7i.4xlarge&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Price-Performance improvement&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Data Type&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Throughput&lt;br&gt; &lt;/strong&gt;(tokens/sec)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$ per 1M token&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Data Type&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Throughput&lt;br&gt; &lt;/strong&gt;(tokens/sec)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$ per 1M token&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;17.2&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$13.67&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FP32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;14.9&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$15.03&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;9%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;82.3&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$2.86&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FP32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;44.1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$5.08&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;44%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;127.8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$1.84&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FP32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;89.2&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$2.51&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;27%&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h4&gt;Key findings from the price-performance analysis:&lt;/h4&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Combined optimization delivers up to 44% better price-performance&lt;/strong&gt;: At batch size 8, m8i with BF16 AMX costs $2.86 per 1M tokens versus $5.08 for m7i with FP32. This is consistent with our batch size optimization guidelines, where batch sizes of 4-16 deliver optimal results for 1B models like Gemma-3-1b-it, and it suits applications like chatbots and fraud detection.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Larger batches maximize cost efficiency&lt;/strong&gt;: Batch size 32 reduces costs further to $1.84 per 1M tokens, a 27% improvement over m7i FP32 – ideal for throughput-oriented workloads like content summarization and recommendation systems where latency requirements are flexible.&lt;/li&gt; 
&lt;/ul&gt; 
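&lt;p&gt;As a quick check, the improvement column in the table above can be reproduced from the cost columns. The sketch below is illustrative only: the function names are ours, and &lt;code&gt;cost_per_million_tokens&lt;/code&gt; takes an hourly instance price that you would supply yourself, since this section does not list one.&lt;/p&gt;

```python
def cost_per_million_tokens(tokens_per_sec, hourly_price_usd):
    """Dollars needed to generate one million tokens at a steady throughput."""
    hours_needed = 1_000_000 / tokens_per_sec / 3600
    return hours_needed * hourly_price_usd


def price_performance_improvement(baseline_cost, optimized_cost):
    """Percentage reduction in cost per 1M tokens relative to the baseline."""
    return (baseline_cost - optimized_cost) / baseline_cost * 100


# ($ per 1M tokens on m7i FP32, $ per 1M tokens on m8i BF16 AMX), keyed by batch size.
table = {1: (15.03, 13.67), 8: (5.08, 2.86), 32: (2.51, 1.84)}
for batch_size, (m7i_fp32, m8i_bf16) in table.items():
    gain = price_performance_improvement(m7i_fp32, m8i_bf16)
    print(f"batch {batch_size}: {gain:.0f}% better price-performance")
```

&lt;p&gt;Rounded to whole percentages, the results match the 9%, 44%, and 27% figures reported in the table.&lt;/p&gt;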
&lt;h3&gt;Production deployment recommendation&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;BF16 AMX&lt;/strong&gt;:&amp;nbsp;Delivers 21-72% performance improvements at recommended batch sizes while maintaining model accuracy, making it suitable for production workloads including fraud detection systems, content moderation, and real-time recommendation engines&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch processing&lt;/strong&gt;: Target batch sizes of 4-16 based on your use case – smaller batches (1-4) for latency-sensitive applications like chatbots, larger batches (8-16) for throughput-focused scenarios like document analysis and offline processing&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Instance selection&lt;/strong&gt;:&amp;nbsp;m8i instances provide consistent 9-14% performance improvements over m7i, delivering immediate ROI for existing CPU inference workloads without requiring application changes&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Model size consideration&lt;/strong&gt;:&amp;nbsp;Larger models (1B+ parameters) show better AMX utilization across batch sizes, making them ideal candidates for m8i deployment in complex reasoning and content generation applications&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Conclusion and next steps&lt;/h2&gt; 
&lt;p&gt;By using Intel AMX on Amazon EC2 8th generation instances, you can accelerate AI inference workloads. Our benchmarks show&amp;nbsp;up to 72% performance improvement across popular language models, which makes CPU inference more competitive for batch processing, real-time applications, recommender systems, and variable-demand workloads, and lowers cost per token through improved resource utilization.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Key takeaways:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;BF16 AMX optimization&lt;/strong&gt;&amp;nbsp;delivers up to 72% performance improvements across model sizes, with batch 8 showing 21-72% gains and batch 32 showing 24-68% gains&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch sizes of 4-8 &lt;/strong&gt;provide optimal performance for most models. At batch 8, DialoGPT achieves a 21% improvement in latency, while Llama-3.2-3B achieves a 72% improvement&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;8th generation instances&lt;/strong&gt;&amp;nbsp;deliver up to 14% performance improvements over m7i across the tested workloads&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Combined optimizations&lt;/strong&gt;&amp;nbsp;(m8i + BF16 AMX) can achieve compound performance improvements up to 76% in optimal configurations (vs m7i FP32), making CPU inference highly competitive for cost-sensitive applications&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;M8i instances deliver up to 13% better price-performance vs m7i&lt;/strong&gt; (lower cost per 1M tokens), based on our analysis of the Gemma-3-1b-it model&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Proper environment configuration&lt;/strong&gt; is critical for AMX activation&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;You can implement these optimizations immediately. &lt;/strong&gt;Enabling AMX hardware acceleration with PyTorch’s Intel-specific enhancements requires configuring environment variables, and it delivers substantial speed gains. Begin with BF16 optimization on your existing models, then explore INT8 quantization for additional gains.&lt;/p&gt; 
&lt;h3&gt;Next steps:&lt;/h3&gt; 
&lt;ol&gt; 
 &lt;li&gt;Launch an Intel-based&amp;nbsp;Amazon EC2 8th generation instance (m8i.4xlarge)&lt;/li&gt; 
 &lt;li&gt;Install PyTorch (includes built-in Intel optimizations)&lt;/li&gt; 
 &lt;li&gt;Configure AMX environment variables&lt;/li&gt; 
 &lt;li&gt;Measure performance improvements&lt;/li&gt; 
 &lt;li&gt;Scale your optimized inference workloads&lt;/li&gt; 
&lt;/ol&gt; 
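&lt;p&gt;Before configuring anything, it can help to confirm that the instance you launched actually exposes AMX. The following is a minimal sketch, assuming a Linux instance where CPU features are listed in &lt;code&gt;/proc/cpuinfo&lt;/code&gt;. The helper name &lt;code&gt;amx_flags_present&lt;/code&gt; is hypothetical, and the check only inspects CPU flags; it does not verify that PyTorch is using AMX.&lt;/p&gt;

```python
from pathlib import Path

# CPU feature flags that Linux reports for Intel AMX support.
AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def amx_flags_present(cpuinfo_text):
    """Return which AMX flags appear on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            reported = set(line.split(":", 1)[1].split())
            return {flag: flag in reported for flag in AMX_FLAGS}
    return {flag: False for flag in AMX_FLAGS}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(amx_flags_present(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo here; run this on the Linux instance itself.")
```

&lt;p&gt;If any flag comes back &lt;code&gt;False&lt;/code&gt;, the instance type or virtualization layer is probably not exposing AMX, and BF16 workloads would likely fall back to slower code paths.&lt;/p&gt;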
&lt;h2&gt;Additional resources&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/what-is-intel-amx.html" target="_blank" rel="noopener noreferrer"&gt;Intel AMX documentation&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/ec2/instance-types/m8i/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2 m8i instances&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html" target="_blank" rel="noopener noreferrer"&gt;PyTorch Intel optimizations guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://huggingface.co/models" target="_blank" rel="noopener noreferrer"&gt;HuggingFace model hub&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://github.com/oneapi-src/oneDNN" target="_blank" rel="noopener noreferrer"&gt;oneDNN library documentation&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Build high-performance apps with AWS Lambda Managed Instances</title>
		<link>https://aws.amazon.com/blogs/compute/build-high-performance-apps-with-aws-lambda-managed-instances/</link>
					
		
		<dc:creator><![CDATA[Debasis Rath]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 14:53:01 +0000</pubDate>
				<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">423c73bf0bbaf0cc504a6aca239ab3187bf33a14</guid>

					<description>In this post, you will learn how to configure AWS Lambda Managed Instances by creating a Capacity Provider that defines your compute infrastructure, associating your Lambda function with that provider, and publishing a function version to provision the execution environments. We will conclude with production best practices including scaling strategies, thread safety, and observability for reliable performance.</description>
										<content:encoded>&lt;p&gt;High-performance applications such as CPU-intensive processing, memory-heavy analytics, and steady-state data pipelines often require more predictable compute resources than standard &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/welcome.html" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; configurations provide. &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda Managed Instances (LMI)&lt;/a&gt; addresses this by letting you run Lambda functions on selected Amazon EC2 &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html" target="_blank" rel="noopener noreferrer"&gt;instance types&lt;/a&gt; while preserving the Lambda programming model. You can choose from over 400 Amazon Elastic Compute Cloud (Amazon EC2) instance types across the general purpose, compute optimized, and memory optimized instance families to match workload requirements. AWS Lambda continues to manage infrastructure operations such as instance lifecycle management, operating system patching, runtime updates, request routing, and automatic scaling. This approach gives your teams greater control over compute characteristics and access to the &lt;a href="https://aws.amazon.com/ec2/pricing/" target="_blank" rel="noopener noreferrer"&gt;EC2 pricing model&lt;/a&gt;, while reducing the operational overhead of managing servers or clusters.&lt;/p&gt; 
&lt;p&gt;In this post, you will learn how to configure AWS Lambda Managed Instances by creating a &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-capacity-providers.html" target="_blank" rel="noopener noreferrer"&gt;Capacity Provider&lt;/a&gt; that defines your compute infrastructure, associating your Lambda function with that provider, and publishing a function version to provision the execution environments. We will conclude with production best practices including scaling strategies, thread safety, and observability for reliable performance.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/create-lmi.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25941" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/create-lmi.png" alt="Figure 1. Creating Function on LMI" width="1358" height="467"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 1. Creating Function on LMI&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;Creating Capacity Providers&lt;/h2&gt; 
&lt;p&gt;A Capacity Provider defines the infrastructure blueprint for running LMI functions on Amazon EC2. It specifies instance types, network placement, and scaling behavior. To create a Capacity Provider, you need two parameters: an IAM role (Capacity Provider Operator Role) granting Lambda permissions to launch and manage instances and your VPC configuration with subnets and security groups. Create this role in your account with the &lt;code&gt;&lt;a href="https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSLambdaManagedEC2ResourceOperator.html" target="_blank" rel="noopener noreferrer"&gt;AWSLambdaManagedEC2ResourceOperator&lt;/a&gt;&lt;/code&gt; managed policy following the &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html" target="_blank" rel="noopener noreferrer"&gt;Principle of Least Privilege&lt;/a&gt; (granting only the minimum permissions necessary).&lt;/p&gt; 
&lt;p&gt;This &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/lambda/create-capacity-provider.html" target="_blank" rel="noopener noreferrer"&gt;command&lt;/a&gt; creates a Capacity Provider with instance types and scaling configuration:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-ruby"&gt;aws lambda create-capacity-provider \
  --capacity-provider-name my-lmi-capacity \
  --vpc-config SubnetIds=subnet-abc123,subnet-def456,SecurityGroupIds=sg-xyz789 \
  --permissions-config CapacityProviderOperatorRoleArn=arn:aws:iam::123456789012:role/LMIOperatorRole \
  --instance-requirements Architectures=x86_64,AllowedInstanceTypes=c5.2xlarge,r5.4xlarge \
  --capacity-provider-scaling-config MaxVCpuCount=50,ScalingMode=Auto \
  --region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This command returns a Capacity Provider ARN that you’ll use to create your LMI function. Your function’s behavior depends on four main configurations in the capacity provider:&lt;/p&gt; 
&lt;h3&gt;Instance selection&lt;/h3&gt; 
&lt;p&gt;Lambda currently supports three Amazon EC2 instance families (.large and up): C (&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/compute-optimized-instances.html" target="_blank" rel="noopener noreferrer"&gt;compute optimized&lt;/a&gt;) for CPU-heavy work, M (&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/general-purpose-instances.html" target="_blank" rel="noopener noreferrer"&gt;general purpose&lt;/a&gt;) for balanced workloads, and R (&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/memory-optimized-instances.html" target="_blank" rel="noopener noreferrer"&gt;memory optimized&lt;/a&gt;) for large datasets. Choose x86 (Intel/AMD) or ARM (Graviton) architectures. If you don’t specify instance types, Lambda defaults to appropriate instances based on your function’s memory and CPU configuration. This is the recommended starting point unless you have specific performance requirements. When you need more control, use &lt;code&gt;AllowedInstanceTypes&lt;/code&gt; to specify only the instance types that Lambda can use or use &lt;code&gt;ExcludedInstanceTypes&lt;/code&gt; to exclude specific types while allowing all other instance types. You can’t use both parameters together.&lt;/p&gt; 
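&lt;p&gt;The rule that &lt;code&gt;AllowedInstanceTypes&lt;/code&gt; and &lt;code&gt;ExcludedInstanceTypes&lt;/code&gt; can’t be combined is easy to check before you call the API. The sketch below is a hypothetical helper, not part of any AWS SDK. The dictionary shape is an assumption modelled on the CLI flags shown earlier, and only the mutual-exclusion rule is enforced.&lt;/p&gt;

```python
def build_instance_requirements(architecture, allowed=None, excluded=None):
    """Build an instance requirements mapping for a Capacity Provider.

    Only the mutual-exclusion rule from the text is enforced here.
    """
    if allowed and excluded:
        raise ValueError(
            "Use AllowedInstanceTypes or ExcludedInstanceTypes, not both."
        )
    # Assumed shape: key names mirror the CLI shorthand used in this post.
    requirements = {"Architectures": [architecture]}
    if allowed:
        requirements["AllowedInstanceTypes"] = list(allowed)
    if excluded:
        requirements["ExcludedInstanceTypes"] = list(excluded)
    return requirements


print(build_instance_requirements("x86_64", allowed=["c5.2xlarge", "r5.4xlarge"]))
```

&lt;p&gt;Leaving both lists empty returns only the architecture, which corresponds to letting Lambda pick instance types for you.&lt;/p&gt;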
&lt;h3&gt;VPC and networking&lt;/h3&gt; 
&lt;p&gt;Configure multiple subnets across Availability Zones. Lambda creates a minimum Amazon EC2 fleet of three instances distributed across your configured Availability Zones to maintain availability and resiliency. Egress traffic from functions, including Amazon CloudWatch Logs, transits through the Amazon EC2 instance’s network interface in your Amazon Virtual Private Cloud (Amazon VPC). As functions send logs and metrics to CloudWatch, you will need internet access through a NAT Gateway or &lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints.html" target="_blank" rel="noopener noreferrer"&gt;VPC endpoints&lt;/a&gt; with &lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/what-is-privatelink.html" target="_blank" rel="noopener noreferrer"&gt;AWS PrivateLink&lt;/a&gt; for Amazon CloudWatch. This only affects egress traffic; function invoke requests don’t flow through your VPC. Security groups attached to your instances should allow only the traffic your function code needs. With LMI, configure VPC once at the Capacity Provider level instead of per function, simplifying management for multiple LMI functions. Standard Lambda functions continue to use their own VPC configurations. This Capacity Provider VPC configuration applies only to LMI functions.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-2-11.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25946" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-2-11.png" alt="Figure 2. LMI Networking" width="1543" height="680"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 2. LMI Networking&lt;/em&gt;&lt;/p&gt; 
&lt;h3&gt;Scaling configuration&lt;/h3&gt; 
&lt;p&gt;Set &lt;strong&gt;MaxVCpuCount&lt;/strong&gt; to cap compute capacity and control costs. New invocations throttle when you reach this limit until capacity frees up. Lambda monitors CPU utilization and scales instances automatically. Choose automatic scaling mode where Lambda tunes thresholds based on load patterns, or manual mode where you set a target CPU utilization percentage. Multiple functions can share the same Capacity Provider to reduce costs through better resource utilization, though you might want separate providers for functions with different performance or isolation requirements.&lt;/p&gt; 
&lt;h3&gt;Security&lt;/h3&gt; 
&lt;p&gt;Lambda encrypts &lt;a href="https://docs.aws.amazon.com/ebs/latest/userguide/ebs-encryption.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Block Store (Amazon EBS)&lt;/a&gt; volumes attached to EC2 instances with a service-managed key by default. You can provide your own &lt;a href="https://docs.aws.amazon.com/kms/latest/developerguide/overview.html" target="_blank" rel="noopener noreferrer"&gt;AWS Key Management Service (AWS KMS) key&lt;/a&gt; for encryption. Place instances in private subnets with restrictive security groups for enhanced security.&lt;/p&gt; 
&lt;h2&gt;Creating Lambda Managed Instance Functions&lt;/h2&gt; 
&lt;p&gt;You create an LMI function much as you create a standard Lambda function: you package your code, set your runtime, assign an execution role, and configure memory. The difference is that you also specify a &lt;code&gt;CapacityProviderConfig&lt;/code&gt; during function creation, which tells Lambda which Capacity Provider to use (by ARN) and how to size each execution environment. Two settings control that sizing: &lt;code&gt;ExecutionEnvironmentMemoryGiBPerVCpu&lt;/code&gt; sets the memory-to-vCPU ratio (2:1, 4:1, or 8:1) based on your workload type, and &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt; defines how many concurrent requests share each execution environment. This table shows how memory and vCPU allocations map across the supported execution environment ratios.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px" colspan="2"&gt;2:1 Ratio(Compute optimized)&lt;/td&gt; 
   &lt;td style="padding: 10px" colspan="2"&gt;4:1 Ratio(General purpose)&lt;/td&gt; 
   &lt;td style="padding: 10px" colspan="2"&gt;8:1 Ratio(Memory optimized)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Memory (GB)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;vCPU(s)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;Memory (GB)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;vCPU(s)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;Memory (GB)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;vCPU(s)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;2&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;1&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;4&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;1&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;1&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;4&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;2&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;2&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;16&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;2&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;6&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;3&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;12&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;3&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;24&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;3&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;4&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;16&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;4&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;4&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;10&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;5&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;20&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;5&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;12&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;6&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;24&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;6&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;14&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;7&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;28&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;7&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;16&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;…&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;…&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;16&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h3&gt;Function Memory-to-CPU configuration&lt;/h3&gt; 
&lt;p&gt;Set the function’s memory size and the &lt;code&gt;ExecutionEnvironmentMemoryGiBPerVCpu&lt;/code&gt; ratio. The default ratio is 2:1. A 2:1 ratio maps to compute optimized instances for CPU-intensive tasks like video encoding, 4:1 maps to general purpose instances for balanced workloads, and 8:1 maps to memory optimized instances for large in-memory datasets or caching. You must set memory in multiples of the ratio. LMI requires a 2 GB minimum because execution environments need sufficient memory to handle multiple concurrent requests, and it supports up to 32 GB of memory per execution environment.&lt;/p&gt; 
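&lt;p&gt;The sizing rules above can be sketched as a small validator. This is an illustration under the constraints stated in this section (ratios of 2, 4, or 8, memory in multiples of the ratio, 2 GB minimum, 32 GB maximum). The function name &lt;code&gt;vcpus_for&lt;/code&gt; is hypothetical, and Lambda may enforce additional rules that are not modelled here.&lt;/p&gt;

```python
SUPPORTED_RATIOS = (2, 4, 8)  # GiB of memory per vCPU
MIN_MEMORY_GB = 2
MAX_MEMORY_GB = 32


def vcpus_for(memory_gb, ratio=2):
    """Return the vCPU count for a memory size, or raise if the pair is invalid."""
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"ratio must be one of {SUPPORTED_RATIOS}")
    if memory_gb not in range(MIN_MEMORY_GB, MAX_MEMORY_GB + 1):
        raise ValueError("memory must be a whole number of GB from 2 to 32")
    if memory_gb % ratio != 0:
        raise ValueError(f"memory must be a multiple of {ratio} GB")
    return memory_gb // ratio


# 4 GB at a 4:1 ratio gives one vCPU per execution environment.
print(vcpus_for(4, ratio=4))
```

&lt;p&gt;The same call with 6 GB at a 4:1 ratio raises an error, because 6 is not a multiple of 4.&lt;/p&gt;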
&lt;h3&gt;Multi-Concurrency settings&lt;/h3&gt; 
&lt;p&gt;LMI supports &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-runtimes.html" target="_blank" rel="noopener noreferrer"&gt;multiple concurrent invocations&lt;/a&gt; sharing the same execution environment, reducing cost per invocation by maximizing vCPU utilization. This is particularly effective for I/O-bound workloads, where invocations waiting on database queries or API calls yield vCPU usage to other invocations during idle periods. Lambda sets a default maximum concurrency per execution environment based on your runtime: Node.js (64 per vCPU), Java and .NET (32 per vCPU), and Python (16 per vCPU). Use &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt; to set a lower limit based on your workload’s resource needs. Decrease it if you’re experiencing memory pressure or CPU contention. When environments reach their configured max concurrency, new invocations throttle until capacity frees up at the execution environment level. This table captures the default maximum concurrency per vCPU for each supported runtime.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Language&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;Default Max Concurrency&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Node.js&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;64 per vCPU&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Java&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;32 per vCPU&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;.NET&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;32 per vCPU&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Python&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;16 per vCPU&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
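&lt;p&gt;To estimate the per-environment limit for a given configuration, you can multiply the per-vCPU default from the table above by the vCPU count. The sketch below makes two assumptions that this post does not confirm: that the default scales linearly with vCPUs, and that a configured &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt; never exceeds the default. The runtime keys and function name are hypothetical.&lt;/p&gt;

```python
# Default maximum concurrency per vCPU, from the table above.
DEFAULT_CONCURRENCY_PER_VCPU = {"nodejs": 64, "java": 32, "dotnet": 32, "python": 16}


def effective_max_concurrency(runtime_family, vcpus, configured=None):
    """Per-environment concurrency: the runtime default, or a lower configured limit."""
    default = DEFAULT_CONCURRENCY_PER_VCPU[runtime_family] * vcpus
    if configured is None:
        return default
    # Assumption: a configured value above the default has no effect.
    return min(configured, default)


# A Python function with 1 vCPU and a configured limit of 10.
print(effective_max_concurrency("python", vcpus=1, configured=10))
```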
&lt;p&gt;This &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/lambda/create-function.html" target="_blank" rel="noopener noreferrer"&gt;command&lt;/a&gt; creates a Lambda function and associates it with your Capacity Provider:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;aws lambda create-function \
  --function-name my-lmi-function \
  --runtime python3.13 \
  --role arn:aws:iam::123456789012:role/LambdaExecutionRole \
  --handler app.lambda_handler \
  --zip-file fileb://function.zip \
  --memory-size 4096 \
  --capacity-provider-config '{
    "LambdaManagedInstancesCapacityProviderConfig": {
      "CapacityProviderArn": "arn:aws:lambda:us-east-1:123456789012:capacity-provider:my-lmi-capacity",
      "ExecutionEnvironmentMemoryGiBPerVCpu": 4.0,
      "PerExecutionEnvironmentMaxConcurrency": 10
    }
  }' \
  --region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Publishing Lambda Managed Instance Functions&lt;/h2&gt; 
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt;&amp;nbsp;Publish a function version before invoking an LMI function. Publishing triggers Lambda to provision Amazon EC2 instances and initialize execution environments, so that the configured baseline capacity is ready before you start invoking. Expect a brief delay before your code goes live as Lambda provisions and launches Amazon EC2 instances. With LMI, execution environments pre-warm after publishing and remain invoke-ready, without cold starts for published versions. Standard Lambda environments initialize on first invoke (cold starts).&lt;/p&gt; 
&lt;p&gt;This &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/lambda/publish-version.html" target="_blank" rel="noopener noreferrer"&gt;command&lt;/a&gt; publishes a Lambda function version and provisions capacity:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws lambda publish-version --function-name my-lmi-function \
--region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;After publishing, the function works with standard invocation methods including direct invokes, &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventsourcemapping.html" target="_blank" rel="noopener noreferrer"&gt;event source mappings&lt;/a&gt;, and service integrations with Amazon API Gateway, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB Streams, and Amazon EventBridge.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-3-8.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25947" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-3-8.png" alt="Figure 3. LMI Invocation from event sources" width="1073" height="519"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 3. LMI Invocation from event sources&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;Scaling LMI Functions&lt;/h2&gt; 
&lt;p&gt;Lambda monitors &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-scaling.html" target="_blank" rel="noopener noreferrer"&gt;CPU utilization&lt;/a&gt; at the Capacity Provider level. When CPU utilization reaches the target threshold, Lambda automatically provisions additional EC2 instances and creates more execution environments on those instances, up to the &lt;code&gt;MaxVCpuCount&lt;/code&gt; limit you configured for your capacity provider. As demand decreases, Lambda consolidates workloads onto fewer EC2 instances. You can choose &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-scaling.html" target="_blank" rel="noopener noreferrer"&gt;automatic scaling mode&lt;/a&gt; (Lambda adjusts thresholds based on your patterns) or manual mode (you set a target CPU percentage). Automatic mode works for variable traffic patterns or when getting started. Manual mode fits when you have predictable patterns and want precise control over scaling thresholds for cost optimization.&lt;/p&gt; 
&lt;h3&gt;Min and max execution environments&lt;/h3&gt; 
&lt;p&gt;Control scaling at the function level with min and max execution environments. The default minimum is 3 execution environments to maintain &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html" target="_blank" rel="noopener noreferrer"&gt;high availability&lt;/a&gt; across Availability Zones. Your total function concurrency equals the number of execution environments multiplied by &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt;. For example, with min set to 3 and &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt; of 10, you have provisioned capacity for 30 concurrent invocations. With max set to 20, you can scale up to 200 concurrent invocations as traffic arrives, based on CPU utilization or concurrency saturation per execution environment. Set max to cap total concurrency and prevent noisy neighbor issues when multiple functions share a Capacity Provider. LMI maintains a minimum number of execution environments with a minimum Amazon EC2 fleet, while standard Lambda scales to zero when idle. Set both min and max to 0 to deactivate a function without deleting it.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-4-7.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25936" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-4-7.png" alt="Figure 4. LMI Scaling" width="1241" height="615"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 4. LMI Scaling&lt;/em&gt;&lt;/p&gt; 
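&lt;p&gt;The concurrency arithmetic described in this section is a single multiplication, sketched here with the example values from the text (min 3, max 20, and a per-environment limit of 10). The function name is hypothetical.&lt;/p&gt;

```python
def concurrency_range(min_envs, max_envs, per_env_max_concurrency):
    """Return (baseline, ceiling) concurrent invocations for a function."""
    return (min_envs * per_env_max_concurrency, max_envs * per_env_max_concurrency)


baseline, ceiling = concurrency_range(3, 20, 10)
print(baseline, ceiling)  # 30 200
```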
&lt;p&gt;This command updates the minimum and maximum execution environments for your function:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;aws lambda put-function-scaling-config \
  --function-name my-lmi-function \
  --qualifier '$LATEST' \
  --function-scaling-config MinExecutionEnvironments=5,MaxExecutionEnvironments=20 \
  --region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;We’ll cover scaling patterns and throughput optimization strategies in depth in a separate blog post.&lt;/p&gt; 
&lt;h2&gt;Best Practices and Production Considerations&lt;/h2&gt; 
&lt;h3&gt;Thread Safety&lt;/h3&gt; 
&lt;p&gt;Since LMI supports multiple invocations sharing execution environments, your code must be &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html" target="_blank" rel="noopener noreferrer"&gt;thread-safe&lt;/a&gt;. Code that isn’t thread-safe can cause data corruption, security issues, or unpredictable behavior under concurrent load.&lt;/p&gt; 
&lt;h4&gt;Thread safety essentials&lt;/h4&gt; 
&lt;p&gt;Avoid mutating shared objects or global variables. Use thread-local storage for request-specific data. Initialize shared clients (AWS SDK, database connections) outside the function handler and verify that configurations remain immutable during invocations. Write to &lt;code&gt;/tmp&lt;/code&gt; using request-specific file names to prevent concurrent writes.&lt;/p&gt; 
&lt;h4&gt;Runtime-specific guidance&lt;/h4&gt; 
&lt;p&gt;Java applications should use immutable objects, thread-safe collections, and proper synchronization. Node.js applications should use async context for request isolation. Python applications run separate processes per execution environment, so focus on interprocess coordination and file locking for &lt;code&gt;/tmp&lt;/code&gt; access.&lt;/p&gt; 
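&lt;p&gt;For Node.js, one way to get per-request isolation is &lt;code&gt;AsyncLocalStorage&lt;/code&gt; from the built-in &lt;code&gt;node:async_hooks&lt;/code&gt; module. The sketch below combines it with request-specific file names. The handler signature, the &lt;code&gt;requestId&lt;/code&gt; argument, and the &lt;code&gt;scratchFile&lt;/code&gt; helper are assumptions made for illustration, not the LMI runtime contract.&lt;/p&gt;

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");
const os = require("node:os");
const path = require("node:path");

// One store per invocation; concurrent invocations never see each other's data.
const requestContext = new AsyncLocalStorage();

// Hypothetical helper: builds a temp-directory path that is unique per request.
function scratchFile(name) {
  const { requestId } = requestContext.getStore();
  return path.join(os.tmpdir(), `${requestId}-${name}`);
}

// Hypothetical handler shape; requestId stands in for the invocation's request ID.
async function handler(event, requestId) {
  return requestContext.run({ requestId }, async () => {
    // Simulate I/O so that concurrent invocations interleave.
    await new Promise((resolve) => setTimeout(resolve, event.delayMs));
    return scratchFile("output.json");
  });
}

// Run two invocations concurrently; each resolves to its own file path.
const done = Promise.all([
  handler({ delayMs: 20 }, "req-a"),
  handler({ delayMs: 5 }, "req-b"),
]);

done.then((paths) => console.log(paths));
```

&lt;p&gt;Because the request ID lives in the async context rather than in a module-level variable, the second invocation finishing first does not change the path the first invocation computes.&lt;/p&gt;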
&lt;h3&gt;Workload Optimization&lt;/h3&gt; 
&lt;p&gt;I/O-bound workloads perform better with higher concurrency per environment. Use asynchronous patterns and non-blocking I/O to maximize efficiency. CPU-bound workloads get no benefit from concurrency greater than one per vCPU. Instead, configure more vCPUs per function for true parallelism for compute-heavy tasks like data transformation or image processing.&lt;/p&gt; 
&lt;h3&gt;Testing&lt;/h3&gt; 
&lt;p&gt;Validate your code under concurrent execution. Test with multiple simultaneous invocations to detect race conditions and shared state issues before production deployment. You can use LocalStack for local emulation of LMI. Learn more about LocalStack’s LMI support in their &lt;a href="https://blog.localstack.cloud/testing-locally-with-lambda-managed-instances/" target="_blank" rel="noopener noreferrer"&gt;announcement blog&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Compatibility&lt;/h3&gt; 
&lt;p&gt;Tools like &lt;a href="https://docs.aws.amazon.com/powertools/" target="_blank" rel="noopener noreferrer"&gt;Powertools&lt;/a&gt; for AWS work with LMI without code changes. However, if you’re reusing existing Lambda function code, layers, or packaged dependencies on LMI, test for thread safety and compatibility with the multi-concurrent execution model before production deployment.&lt;/p&gt; 
&lt;h3&gt;Observability&lt;/h3&gt; 
&lt;p&gt;LMI automatically publishes CloudWatch metrics at two levels: capacity provider (CPU, memory, network, and disk utilization across your Amazon EC2 fleet) and execution environment (concurrency, CPU, and memory per function). Monitor &lt;code&gt;CPUUtilization&lt;/code&gt; to understand scaling headroom and right-size your &lt;code&gt;MaxVCpuCount&lt;/code&gt;. Track &lt;code&gt;ExecutionEnvironmentConcurrency&lt;/code&gt; against &lt;code&gt;ExecutionEnvironmentConcurrencyLimit&lt;/code&gt; to catch throttling before it impacts users. Lambda publishes metrics at 5-minute intervals. Use CloudWatch alarms to stay ahead of capacity limits in production.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;AWS Lambda Managed Instances combines serverless simplicity with compute flexibility, helping you run high-performance workloads with reduced operational complexity. You keep the familiar Lambda programming model while gaining access to the diverse instance types of Amazon EC2 and predictable pricing, which makes LMI well-suited for data processing pipelines, compute-intensive operations, and cost-sensitive steady-state applications.&lt;/p&gt; 
&lt;p&gt;Ready to get started with LMI?&amp;nbsp;Deploy our&amp;nbsp;&lt;a href="https://github.com/aws-samples/sample-aws-lambda-managed-instances/tree/main/examples/fsi/sample-retirement-savings-simulator" target="_blank" rel="noopener noreferrer"&gt;Monte Carlo risk simulation example&amp;nbsp;&lt;/a&gt;from GitHub to see LMI in action with a real compute-intensive workload. The sample includes complete infrastructure code and walks you through capacity provider configuration, function setup, and performance optimization.&lt;/p&gt; 
&lt;p&gt;We want to hear from you. Share your feedback, questions, and use cases on &lt;a href="https://repost.aws/" target="_blank" rel="noopener noreferrer"&gt;re:Post&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Enhancing auto scaling resilience by tracking worker utilization metrics</title>
		<link>https://aws.amazon.com/blogs/compute/enhancing-auto-scaling-resilience-by-tracking-worker-utilization-metrics/</link>
					
		
		<dc:creator><![CDATA[Brian Moore]]></dc:creator>
		<pubDate>Tue, 24 Mar 2026 16:17:58 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Auto Scaling]]></category>
		<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Resilience]]></category>
		<guid isPermaLink="false">d9fc642874b341b8afa90f9f3c8c6eeed67691fb</guid>

					<description>A resilient auto scaling policy requires metrics that correlate with application utilization, which may not be tied to system resources. Traditionally, auto scaling policies track system resources such as CPU utilization. These metrics are easily available, but they only work when resource consumption correlates with worker capacity. Factors such as high variance in request processing time, mixed instance types, or natural changes in application behavior over time can break this assumption.</description>
										<content:encoded>&lt;p&gt;A resilient auto scaling policy requires metrics that correlate with application utilization, which may not be tied to system resources. Traditionally, auto scaling policies track system resources such as CPU utilization. These metrics are easily available, but they only work when resource consumption correlates with worker capacity. Factors such as high variance in request processing time, mixed instance types, or natural changes in application behavior over time can break this assumption.&lt;/p&gt; 
&lt;p&gt;Worker utilization tracking offers an alternative approach. Using a combination of total worker slots, work in flight, and work waiting in the backlog, a utilization value can be calculated for use in an auto scaling policy. This approach remains accurate across fleets with mixed instance types and applications with variable latencies, and it requires no changes as your application evolves.&lt;/p&gt; 
&lt;h2&gt;The limitations of resource-based scaling&lt;/h2&gt; 
&lt;p&gt;Traditional auto scaling policies track system resource metrics like CPU utilization, assuming a direct correlation between resource consumption and available application capacity. Consider an application that reads messages from &lt;a href="https://aws.amazon.com/sqs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Queue Service (SQS)&lt;/a&gt;, processes them, and writes results to &lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt;. If this application uses a fixed-size thread pool to process messages, such as 10 worker threads, the application reaches maximum capacity when all threads are busy, regardless of CPU utilization.&lt;/p&gt; 
&lt;p&gt;In our example, each worker spends most of its time waiting for DynamoDB responses rather than consuming CPU. All 10 threads become occupied handling requests, but CPU utilization stays low. From the perspective of the auto scaling policy, the fleet looks like it has enough capacity because plenty of CPU headroom remains. Meanwhile, new messages accumulate in the SQS queue because no workers are available to process them.&lt;/p&gt; 
&lt;p&gt;For queue-based workloads, &lt;a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html#scale-sqs-queue-custom-metric" target="_blank" rel="noopener noreferrer"&gt;AWS provides guidance&lt;/a&gt; to scale based on an acceptable backlog per worker. This is a calculated target based on your application’s average processing latency (queue delay). This works well when processing times are consistent, but breaks down if an application has variable latency characteristics.&lt;/p&gt; 
&lt;p&gt;Consider an image processing application that initially handles thumbnails taking 500 ms each. Using the traditional guidance with a target latency of 5 seconds you calculate an acceptable backlog of 10 messages per worker and deploy your scaling policy. Over time, the application evolves to also process 4K photos which take 2 seconds each. Eventually 4K photos are 50% of your traffic and total latency for queued messages has increased to 12.5 seconds, 2.5x more than your initial target.&lt;/p&gt; 
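&lt;p&gt;The numbers in this example can be reproduced with a short calculation. The function names below are illustrative; the figures come from the scenario above.&lt;/p&gt;

```javascript
// Acceptable backlog per worker = target latency / average processing time.
function acceptableBacklog(targetLatencySec, avgProcessingSec) {
  return targetLatencySec / avgProcessingSec;
}

// Weighted average processing time for a traffic mix of [share, seconds] pairs.
function averageProcessingSec(mix) {
  return mix.reduce((sum, [share, seconds]) => sum + share * seconds, 0);
}

const targetLatencySec = 5;
const backlogPerWorker = acceptableBacklog(targetLatencySec, 0.5); // thumbnails only
const mixedAvgSec = averageProcessingSec([[0.5, 0.5], [0.5, 2]]);  // 50% thumbnails, 50% 4K
const queuedLatencySec = backlogPerWorker * mixedAvgSec;

console.log(backlogPerWorker, mixedAvgSec, queuedLatencySec, queuedLatencySec / targetLatencySec);
// prints 10 1.25 12.5 2.5
```

&lt;p&gt;The backlog target of 10 messages stays fixed in the policy while the average processing time rises from 0.5 to 1.25 seconds, which is why the effective queue delay drifts to 12.5 seconds.&lt;/p&gt;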
&lt;p&gt;The scaling policy is no longer fit for its intended purpose because your original latency assumptions no longer reflect reality. To keep this type of scaling effective you must also remember to update your scaling policies as your application behavior evolves.&lt;/p&gt; 
&lt;p&gt;A shift to mixed instance types can add further complexity to traditional resource-based scaling policies. Different instance types may handle the same workload at different CPU levels, leading to an unbalanced average that misrepresents your actual application health. By changing your mental model to consider how much work your application can accept instead of how much of a system resource is available, you can improve your scaling rules and better model your application’s capacity.&lt;/p&gt; 
&lt;h2&gt;Understanding worker utilization&lt;/h2&gt; 
&lt;p&gt;Worker utilization measures the ratio of active work to available processing capacity. To calculate it, divide total work by total workers.&lt;/p&gt; 
&lt;p&gt;We use an SQS-based processing application as an example to demonstrate how worker utilization operates, but this approach can also be applied to other applications where work units and worker capacity are measurable. In our example application, total work consists of messages waiting to be processed plus messages currently being processed. &lt;a href="https://aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; provides these values through the &lt;code&gt;ApproximateNumberOfMessagesVisible&lt;/code&gt; metric (messages waiting in the queue) and the &lt;code&gt;ApproximateNumberOfMessagesNotVisible&lt;/code&gt; metric (messages currently being processed or in flight). Each host in your application should publish the number of available workers as a custom CloudWatch metric with a period of 1 minute or shorter. For Java thread pools or Python multiprocessing pools, this represents the pool or process count. The formula works regardless of the metric period. Using the shortest period possible allows more responsive target tracking and enables &lt;a href="https://aws.amazon.com/blogs/compute/faster-scaling-with-amazon-ec2-auto-scaling-target-tracking/" target="_blank" rel="noopener noreferrer"&gt;Fast Target Tracking&lt;/a&gt; if your application has sub-minute data points.&lt;/p&gt; 
&lt;p&gt;To derive the formula, we can use the following CloudWatch Metric Math expressions:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;totalWork&lt;/code&gt; = FILL(&lt;code&gt;backlog&lt;/code&gt;, REPEAT) + FILL(&lt;code&gt;inFlight&lt;/code&gt;, REPEAT)&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;utilizationRatio&lt;/code&gt; = &lt;code&gt;totalWork&lt;/code&gt; / &lt;code&gt;workers&lt;/code&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Where:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;backlog&lt;/code&gt; = &lt;code&gt;ApproximateNumberOfMessagesVisible&lt;/code&gt; with the &lt;code&gt;Maximum&lt;/code&gt; statistic.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;inFlight&lt;/code&gt; = &lt;code&gt;ApproximateNumberOfMessagesNotVisible&lt;/code&gt; with the &lt;code&gt;Maximum&lt;/code&gt; statistic.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;workers&lt;/code&gt; = Your custom &lt;code&gt;TotalWorkers&lt;/code&gt; metric with the &lt;code&gt;Sum&lt;/code&gt; statistic.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Putting the components together, the final expression for your target tracking scaling policy is:&lt;/p&gt; 
&lt;p&gt;IF(FILL(&lt;code&gt;workers&lt;/code&gt;, 0) &amp;gt; 0, &lt;code&gt;utilizationRatio&lt;/code&gt;, IF(&lt;code&gt;totalWork&lt;/code&gt; &amp;gt; 0, 1, 0))&lt;/p&gt; 
&lt;p&gt;The FILL function uses last known values if SQS metrics are delayed, and the IF statement handles the case where your fleet has scaled to zero instances. When no workers are reporting but work is waiting, the expression returns 1 to indicate that the workers are fully saturated. This prevents the application from getting stuck at zero capacity and not being able to respond to any requests. When there are no workers and no work, the expression returns 0.&lt;/p&gt; 
&lt;p&gt;In this formula, a value of 1 or higher represents full or over saturation, where all workers are busy with no spare capacity, like running at 100% CPU. Values below 1 indicate available capacity for your application to process more work.&lt;/p&gt; 
&lt;p&gt;For applications without a measurable backlog metric, you can track worker utilization using only the in-flight work. This approach works for APIs or other synchronous workloads where work arrives and is immediately assigned to workers rather than queuing. In these cases, the formula becomes:&lt;/p&gt; 
&lt;p&gt;IF(FILL(&lt;code&gt;workers&lt;/code&gt;, 0) &amp;gt; 0, &lt;code&gt;utilizationRatio&lt;/code&gt;, IF(FILL(&lt;code&gt;inFlight&lt;/code&gt;, 0) &amp;gt; 0, 1, 0))&lt;/p&gt; 
&lt;p&gt;In this scenario the utilization ratio is calculated as follows:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;utilizationRatio&lt;/code&gt; = FILL(&lt;code&gt;inFlight&lt;/code&gt;, REPEAT) / &lt;code&gt;workers&lt;/code&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The definitions of &lt;code&gt;workers&lt;/code&gt; and &lt;code&gt;inFlight&lt;/code&gt; remain the same for this formula. The primary difference is that the ratio compares only in-flight work against available workers and does not include a backlog term.&lt;/p&gt; 
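&lt;p&gt;Both expressions can be mirrored in plain JavaScript to sanity-check the values you expect before deploying a policy. This is a minimal sketch: the function name is illustrative, inputs are plain numbers, and CloudWatch’s FILL handling of missing data points is not modeled. Omitting &lt;code&gt;backlog&lt;/code&gt; gives the in-flight-only variant.&lt;/p&gt;

```javascript
// Mirrors IF(workers > 0, totalWork / workers, IF(totalWork > 0, 1, 0)).
// workerUtilization is a hypothetical helper, not a CloudWatch API.
function workerUtilization({ workers, inFlight, backlog = 0 }) {
  const totalWork = backlog + inFlight;
  if (workers > 0) {
    return totalWork / workers;
  }
  // No workers reporting: signal saturation only when there is work to do.
  return totalWork > 0 ? 1 : 0;
}

const samples = [
  workerUtilization({ workers: 10, inFlight: 7 }),              // spare capacity
  workerUtilization({ workers: 10, inFlight: 10, backlog: 5 }), // oversaturated
  workerUtilization({ workers: 0, inFlight: 0, backlog: 3 }),   // scaled to zero, work waiting
  workerUtilization({ workers: 0, inFlight: 0 }),               // scaled to zero, idle
];

console.log(samples); // prints [ 0.7, 1.5, 1, 0 ]
```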
&lt;h2&gt;How worker utilization prevents outages&lt;/h2&gt; 
&lt;p&gt;Worker utilization-based scaling works for any application that can define available workers and total work. When the ratio of total work to available workers exceeds your threshold, the system scales out. This approach measures whether workers are available to handle the workload and treats application bottlenecks consistently. Whether workers are waiting on network I/O, performing CPU-intensive calculations, or experiencing another bottleneck doesn’t matter; the only question is whether total work exceeds available worker capacity. Any situation causing messages to accumulate on the queue increases the utilization ratio and triggers scale-out.&lt;/p&gt; 
&lt;h2&gt;Implementing worker utilization scaling&lt;/h2&gt; 
&lt;p&gt;To set up worker utilization-based auto scaling, identify the metrics to use in the formula discussed earlier. First, identify a metric that tracks the amount of work in progress. For SQS-based processing, AWS provides this metric. Second, publish a custom metric from your application representing the total workers. Optionally, you can also identify a metric that tracks the backlog of waiting work.&lt;/p&gt; 
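&lt;p&gt;As a sketch of the second step, the snippet below builds the request input for the CloudWatch &lt;code&gt;PutMetricData&lt;/code&gt; API. The &lt;code&gt;YourApp&lt;/code&gt; namespace is a placeholder, &lt;code&gt;TotalWorkers&lt;/code&gt; is the custom metric name used in the formula, and &lt;code&gt;buildTotalWorkersMetric&lt;/code&gt; is a hypothetical helper. Sending the request is left as a comment so the example runs without AWS credentials.&lt;/p&gt;

```javascript
// Builds the input for CloudWatch PutMetricData.
// "YourApp" is a placeholder namespace; buildTotalWorkersMetric is hypothetical.
function buildTotalWorkersMetric(workerCount, now = new Date()) {
  return {
    Namespace: "YourApp",
    MetricData: [
      {
        MetricName: "TotalWorkers",
        Value: workerCount, // for example, the size of this host's thread pool
        Unit: "Count",
        Timestamp: now,
      },
    ],
  };
}

const params = buildTotalWorkersMetric(10);
console.log(JSON.stringify(params, null, 2));

// To publish, pass params to PutMetricDataCommand from @aws-sdk/client-cloudwatch
// on a fixed schedule from each host.
```

&lt;p&gt;Because the scaling expression reads this metric with the &lt;code&gt;Sum&lt;/code&gt; statistic, each host should publish one data point per metric period. Publishing more often would inflate the summed worker count.&lt;/p&gt;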
&lt;p&gt;Using CloudWatch metric math, you calculate the utilization metric and use it in a target tracking scaling policy. Here is an example &lt;a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt; snippet showing the metric math configuration for an &lt;a href="https://aws.amazon.com/pm/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2&lt;/a&gt; Auto Scaling group. This snippet shows only the scaling policy configuration and is only an example. Test it fully with your application before using it in production. Your complete template also needs IAM roles with appropriate permissions for SQS, DynamoDB, and CloudWatch access.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-yaml"&gt;ScalingPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref AutoScalingGroup
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      TargetValue: 0.7
      CustomizedMetricSpecification:
        Metrics:
          # Messages waiting in the queue
          - Id: backlog
            MetricStat:
              Metric:
                Namespace: AWS/SQS
                MetricName: ApproximateNumberOfMessagesVisible
                Dimensions:
                  - Name: QueueName
                    Value: !GetAtt ProcessingQueue.QueueName
              Stat: Maximum
            ReturnData: false
          # Messages currently being processed
          - Id: inFlight
            MetricStat:
              Metric:
                Namespace: AWS/SQS
                MetricName: ApproximateNumberOfMessagesNotVisible
                Dimensions:
                  - Name: QueueName
                    Value: !GetAtt ProcessingQueue.QueueName
              Stat: Maximum
            ReturnData: false
          # Custom metric published by each host
          - Id: workers
            MetricStat:
              Metric:
                Namespace: YourApp
                MetricName: TotalWorkers
              Stat: Sum
            ReturnData: false
          - Id: totalWork
            Expression: FILL(backlog, REPEAT) + FILL(inFlight, REPEAT)
            ReturnData: false
          - Id: utilizationRatio
            Expression: totalWork / workers
            ReturnData: false
          # Only the final expression returns data to the scaling policy
          - Id: utilization
            Expression: IF(FILL(workers, 0) &amp;gt; 0, utilizationRatio, IF(totalWork &amp;gt; 0, 1, 0))
            ReturnData: true&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;This approach also works for &lt;a href="https://aws.amazon.com/ecs/" target="_blank" rel="noopener noreferrer"&gt;Amazon ECS&lt;/a&gt; services using &lt;a href="https://aws.amazon.com/autoscaling/" target="_blank" rel="noopener noreferrer"&gt;AWS Application Auto Scaling&lt;/a&gt;. The metric math configuration remains the same, but you create an &lt;code&gt;AWS::ApplicationAutoScaling::ScalingPolicy&lt;/code&gt; resource instead, adapting the parameters accordingly.&lt;/p&gt; 
&lt;h2&gt;Choosing a target utilization&lt;/h2&gt; 
&lt;p&gt;Since the worker utilization metric directly tracks the available capacity of your application, the target utilization value you choose reflects your organization’s balance between cost efficiency and availability. Lower target values provide more headroom for traffic spikes and faster response to load changes but result in higher infrastructure costs due to lower utilization. Higher target values maximize cost efficiency by keeping workers busy but leave less headroom for sudden traffic increases.&lt;/p&gt; 
&lt;p&gt;When choosing a target, consider traffic patterns, acceptable latency during scale-out events, and cost sensitivity. Applications with unpredictable traffic spikes may benefit from lower targets, while applications with predictable load can safely use higher targets. Start with a moderate value like 0.7 and adjust based on observed behavior and your business requirements. If you previously tracked a resource utilization metric such as CPU, consider starting with the same target.&lt;/p&gt; 
&lt;h2&gt;Monitoring resource utilization for cost optimization&lt;/h2&gt; 
&lt;p&gt;While worker utilization drives scaling decisions, CPU and latency should be regularly evaluated to ensure cost-effective operations. Resource-based metrics can identify host resizing opportunities to better match your application requirements. If no scale-in happens when CPU utilization is consistently low, you are likely running instances that are too large for your workload. By using worker utilization in an auto scaling policy, you can switch to a different instance type without adjusting the auto scaling policy. The formula automatically adapts as you add different instance types or update the capacity per worker.&lt;/p&gt; 
&lt;p&gt;Conversely, if CPU utilization is consistently high while worker utilization remains at your target, your instances might be undersized. Upgrading to larger instance types can improve per-worker throughput, allowing each worker to process tasks faster. Changes to your auto scaling policy are not needed in this situation either. As messages are processed faster, they spend less time in the in-flight state, and the utilization ratio naturally adjusts.&lt;/p&gt; 
&lt;p&gt;This approach manages application availability independent of instance size, while resource utilization guides cost optimization. Each can be optimized independently without complex coordination.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Worker utilization-based auto scaling reduces the operational burden of continuously validating your scaling rules as application requirements and infrastructure change. By tracking the ratio of work to workers, your auto scaling policies automatically respond to capacity constraints based on available work. The approach works across workloads with discrete processing units and remains effective when you modify instance configurations or application worker pool sizes.&lt;/p&gt; 
&lt;p&gt;Implementation requires identifying a metric for available work, publishing a custom metric representing total workers, and using CloudWatch metric math in a target tracking scaling policy. This setup provides resilience that scaling based solely on resource metrics cannot achieve, while maintaining the flexibility to optimize costs and change your instance size without impacting system availability.&lt;/p&gt; 
&lt;p&gt;To get started:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Identify an application in your environment that uses a worker pool.&lt;/li&gt; 
 &lt;li&gt;Instrument the application to publish worker count metrics.&lt;/li&gt; 
 &lt;li&gt;Configure a scaling policy tracking worker utilization.&lt;/li&gt; 
 &lt;li&gt;Monitor how the system responds to traffic changes and capacity events.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Learn more&lt;/h2&gt; 
&lt;p&gt;To learn more about auto scaling and monitoring, see the following resources:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2 Auto Scaling target tracking scaling policies&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-autoscaling-targettracking.html" target="_blank" rel="noopener noreferrer"&gt;AWS Application Auto Scaling for Amazon ECS services&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html" target="_blank" rel="noopener noreferrer"&gt;Using Amazon CloudWatch metric math&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html" target="_blank" rel="noopener noreferrer"&gt;Publishing custom CloudWatch metrics&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Best practices for Lambda durable functions using a fraud detection example</title>
		<link>https://aws.amazon.com/blogs/compute/best-practices-for-lambda-durable-functions-using-a-fraud-detection-example/</link>
					
		
		<dc:creator><![CDATA[Debasis Rath]]></dc:creator>
		<pubDate>Mon, 23 Mar 2026 22:04:39 +0000</pubDate>
				<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<guid isPermaLink="false">8e5c3ce20aad30d0530d3aa36548678e22b7a636</guid>

					<description>This post walks through a fraud detection system built with durable functions. It also highlights the best practices that you can apply to your own production workflows, from approval processes to data pipelines to AI agent orchestration.</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda durable functions&lt;/a&gt;&amp;nbsp;extend the Lambda programming model to build fault-tolerant multi-step applications and AI workflows using familiar programming languages. They preserve progress despite interruptions and execution can suspend for up to one year, for human approvals, scheduled delays, or other external events, without incurring compute charges for on-demand functions.&lt;/p&gt; 
&lt;p&gt;This post walks through a fraud detection system built with durable functions. It also highlights the best practices that you can apply to your own production workflows, from approval processes to data pipelines to AI agent orchestration. You will learn how to handle concurrent notifications, wait for customer responses, and recover from failures without losing progress. If you are new to durable functions, check out the &lt;a href="https://aws.amazon.com/blogs/compute/building-fault-tolerant-long-running-application-with-aws-lambda-durable-functions/" target="_blank" rel="noopener noreferrer"&gt;Introduction to Durable Functions blog post&lt;/a&gt;&amp;nbsp;first.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Fraud detection with human-in-the-loop&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Consider a credit card fraud detection system, which uses an AI agent to analyze incoming transactions and assign risk scores. For ambiguous cases (medium-risk scores), the system needs human approval before authorizing a transaction. The workflow branches based on risk:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Low risk (score &amp;lt; 3)&lt;/strong&gt;: Authorize immediately&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;High risk (score ≥ 5)&lt;/strong&gt;: Send to the fraud department immediately&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Medium risk (score 3–4)&lt;/strong&gt;: Suspend transaction, send SMS and email to cardholder, wait up to 24 hours for confirmation (wait time is customizable)&lt;/li&gt; 
&lt;/ul&gt; 
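&lt;p&gt;The branching rules above can be sketched as a small routing function. This sketch assumes integer risk scores, and the returned labels are illustrative rather than taken from the sample repository.&lt;/p&gt;

```javascript
// Routes a transaction by risk score, using the thresholds listed above.
// Assumes integer scores; the returned labels are hypothetical.
function routeByRisk(score) {
  if (score >= 5) return "escalate-to-fraud-department";
  if (score >= 3) return "suspend-and-notify-cardholder";
  return "authorize";
}

const routes = [1, 3, 4, 5, 9].map(routeByRisk);
console.log(routes);
```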
&lt;div id="attachment_25907" style="width: 946px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25907" loading="lazy" class="wp-image-25907 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/23/compute-2476-arch-diag.png" alt="Figure 1. Agentic Fraud Detection with durable Lambda functions" width="936" height="508"&gt;
 &lt;p id="caption-attachment-25907" class="wp-caption-text"&gt;Figure 1. Agentic Fraud Detection with durable Lambda functions&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;With human-in-the-loop workflows, response times can vary from minutes to hours. These delays introduce the need to durably preserve the state without consuming compute resources while waiting. With financial systems, we must also implement idempotency to guard against duplicate messages (invocations) and recover from failures without reprocessing completed work. To address these requirements, developers implement polling patterns with external state stores like Amazon DynamoDB or Amazon Simple Storage Service (Amazon S3) to manage idempotency, pay for idle compute while waiting for callbacks, introduce external orchestration components, or build asynchronous message-driven systems to handle long-processing tasks.&lt;/p&gt; 
&lt;p&gt;Lambda durable functions provide a new alternative to address these challenges through durable execution, a pattern that uses checkpoints (saved state snapshots) to preserve progress and replays from saved state to recover from failures or resume after waiting. With checkpointing capabilities, you no longer need to pay Lambda compute charges while waiting, whether for callbacks, scheduled delays, or external events. Learn how to implement durable functions using the complete fraud detection implementation at this&amp;nbsp;&lt;a href="https://github.com/aws-samples/sample-lambda-durable-functions/tree/main/Industry%20Solutions/Financial%20Services%20%28FSI%29/FraudDetection" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. You can deploy it to your AWS account and experiment with the code as you read. The repository includes deployment instructions, sample data, and helper functions for testing.&lt;/p&gt; 
&lt;p&gt;As we walk through the code, we’ll focus on best practices for designing workflows with durable execution and how to apply these patterns correctly in production workflows.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Design steps to be idempotent&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Durable execution is designed to preserve progress through checkpoints and replay, but that reliability model means step logic can execute more than once. When steps retry, how do you prevent duplicate actions like charges to the credit card or repeated customer SMS or email notifications?&lt;/p&gt; 
&lt;p&gt;Durable functions use&amp;nbsp;&lt;strong&gt;&lt;em&gt;at-least-once execution&lt;/em&gt;&lt;/strong&gt;&amp;nbsp;by default, executing each step at least one time, potentially more if failures occur. When a step fails, it retries. There are two strategies to design idempotent steps that prevent duplicate side effects: using external API idempotency keys and using the at-most-once step semantics built into durable functions.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Strategy A&lt;/strong&gt;: External API Idempotency Keys&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Strategy A: Use external API idempotency keys
await context.step(`authorize-${tx.id}`, async () =&amp;gt; {
  return payment.charges.create({
    amount: tx.amount,
    currency: 'usd',
    idempotency_key: `tx-${tx.id}`, // Prevents duplicate charges
    description: `Transaction ${tx.id}`
  });
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice the configuration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;idempotency_key in API call&lt;/strong&gt;: If the step retries, the payment processor recognizes it’s a duplicate request and returns the original result&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Defense in depth&lt;/strong&gt;: Two layers of protection, Lambda checkpointing and external API idempotency&lt;/li&gt; 
&lt;/ul&gt; 
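&lt;p&gt;To see why the idempotency key matters, consider a minimal in-memory stand-in for a payment API. &lt;code&gt;FakePaymentProcessor&lt;/code&gt; and its &lt;code&gt;createCharge&lt;/code&gt; method are hypothetical and exist only for this illustration. Real processors implement the same idea on the server side.&lt;/p&gt;

```javascript
// Hypothetical in-memory stand-in for a payment API that honors idempotency keys.
class FakePaymentProcessor {
  constructor() {
    this.chargesByKey = new Map();
    this.chargeCount = 0;
  }

  createCharge({ amount, idempotencyKey }) {
    // A repeated key returns the stored result instead of charging again.
    if (this.chargesByKey.has(idempotencyKey)) {
      return this.chargesByKey.get(idempotencyKey);
    }
    this.chargeCount += 1;
    const charge = { id: `ch-${this.chargeCount}`, amount };
    this.chargesByKey.set(idempotencyKey, charge);
    return charge;
  }
}

const processor = new FakePaymentProcessor();
const tx = { id: "1234", amount: 6500 };

// Simulate a step that runs twice because of a retry.
const first = processor.createCharge({ amount: tx.amount, idempotencyKey: `tx-${tx.id}` });
const retry = processor.createCharge({ amount: tx.amount, idempotencyKey: `tx-${tx.id}` });

console.log(first === retry, processor.chargeCount); // prints true 1
```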
&lt;p&gt;Each layer provides independent protection. If Lambda’s checkpoint fails, the external API prevents duplicate charges.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Strategy B&lt;/strong&gt;: Use At-Most-Once Semantics&lt;/p&gt; 
&lt;p&gt;For legacy systems without idempotency support, use at-most-once execution, a delivery feature that executes each step zero or one time, never more:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Strategy B: At-most-once step semantics
await context.step("charge-legacy-system", async () =&amp;gt; {
  return await legacyPaymentSystem.charge(tx.amount);
}, {
  semantics: StepSemantics.AtMostOncePerRetry,
  retryStrategy: createRetryStrategy({ maxAttempts: 0 })
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This configuration checkpoints before step execution, preventing the step from re-executing on retries. The tradeoff is that if the step fails, you must decide whether to retry (risking duplicates) or fail the entire workflow.&lt;/p&gt; 
&lt;p&gt;Use idempotency for critical side effects like payment processing, database writes, external API calls, state transitions, and resource provisioning. Read more about idempotency&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-execution-idempotency.html" target="_blank" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Prevent duplicate executions with DurableExecutionName&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Idempotent steps prevent duplicate side effects within a single execution, but what about duplicate workflow executions running concurrently? For example, duplicate messages in the queue, users clicking “Submit” multiple times in the UI, or the same event arriving via multiple channels like webhook and API. Without protection, each invocation creates a separate durable execution, potentially running the fraud check multiple times, sending duplicate notifications, and creating confusion about which execution is authoritative. Durable functions provide &lt;code&gt;DurableExecutionName&lt;/code&gt; to help ensure only one concurrent execution per unique name.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Invoke fraud detection function with execution name
await lambda.invoke({
  FunctionName: 'fraud-detection',
  InvocationType: 'Event',
  DurableExecutionName: `tx-${transactionId}`,
  Payload: JSON.stringify({
    id: transactionId,
    amount: 6500,
    location: 'New York, NY',
    vendor: 'Amazon.com'
  })
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice the configuration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;DurableExecutionName: tx-${transactionId}&lt;/strong&gt;: Uses the transaction ID as a unique execution identifier&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;InvocationType: 'Event'&lt;/strong&gt;: Asynchronous invocation supports long-running workflows beyond 15 minutes&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;One execution per transaction&lt;/strong&gt;: If three invocations arrive with the same transaction ID, only the first creates an execution. Subsequent requests with the same execution name and payload receive an idempotent response returning the existing execution’s ARN, rather than creating a new execution.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Lambda durable functions work with Lambda event sources, including event source mappings (ESM) such as&amp;nbsp;&lt;a href="https://aws.amazon.com/sqs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Queue Service (Amazon SQS)&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/kinesis/" target="_blank" rel="noopener noreferrer"&gt;Amazon Kinesis&lt;/a&gt;, and DynamoDB Streams. ESMs invoke durable functions synchronously and inherit Lambda’s&amp;nbsp;&lt;a href="https://docs.amazonaws.cn/en_us/lambda/latest/dg/durable-invoking-esm.html" target="_blank" rel="noopener noreferrer"&gt;15-minute invocation limit&lt;/a&gt;. Therefore, like direct Request/Response invocations, durable function executions started through event source mappings cannot exceed 15 minutes.&lt;/p&gt; 
&lt;p&gt;For workflows exceeding 15 minutes, use an intermediary Lambda function between the event source mapping and durable function:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Intermediary function for SQS -&amp;gt; Durable function
export const handler = async (event) =&amp;gt; {
  for (const record of event.Records) {
    const transaction = JSON.parse(record.body);
    await lambda.invoke({
      FunctionName: process.env.FRAUD_DETECTION_FUNCTION,
      InvocationType: 'Event',
      DurableExecutionName: `tx-${transaction.id}`,
      Payload: JSON.stringify(transaction)
    });
  }
};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This removes the 15-minute limit, allows executions up to one year, and enables custom execution name parameters for idempotency. Use&amp;nbsp;&lt;a href="https://aws.amazon.com/powertools-for-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Powertools for AWS Lambda&lt;/a&gt; to prevent duplicate invocations of the durable function when the event source mapping retries the intermediary function. Additionally, configure failure handling for your event source to capture failed invocations for future redrive or replay, for example dead-letter queues for SQS or on-failure destinations for other event sources.&lt;/p&gt; 
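&lt;p&gt;Because SQS can deliver the same message more than once, the execution name should be derived only from data that is stable across deliveries. The following is a minimal sketch, not part of the sample repository: &lt;code&gt;executionNameFor&lt;/code&gt; is a hypothetical helper, and it assumes each message body is JSON with an &lt;code&gt;id&lt;/code&gt; field, as in the intermediary function above.&lt;/p&gt; 

```javascript
// Hypothetical helper: derive a stable execution name from an SQS record.
// Assumes the record body is JSON containing an `id` field.
function executionNameFor(record) {
  const transaction = JSON.parse(record.body);
  // Normalize the id so that "123" and "123 " map to the same name.
  const id = String(transaction.id ?? "").trim();
  if (id === "") {
    throw new Error("Transaction is missing an id; cannot derive an execution name");
  }
  return `tx-${id}`;
}

// A redelivered message produces the same name as the first delivery.
const body = JSON.stringify({ id: "123456789", amount: 6500 });
const first = executionNameFor({ body });
const redelivered = executionNameFor({ body });
console.log(first === redelivered); // true
```

&lt;p&gt;Failing fast on a missing ID keeps malformed messages from starting executions under a shared or empty name; the event source’s failure handling can then capture them.&lt;/p&gt; 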
&lt;h2&gt;&lt;strong&gt;Match timeouts to invocation type&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;One important configuration detail ties these patterns together: matching your timeout settings to your invocation type. Lambda synchronous invocations (&lt;code&gt;RequestResponse&lt;/code&gt;) have a hard 15-minute timeout limit. If you configure a durable execution to run for 24 hours but invoke it synchronously, the synchronous invocation fails immediately with an exception. Durable functions support workflows up to one year when invoked asynchronously.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Lambda function configuration
{
  FunctionName: 'fraud-detection',
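  Timeout: 300,               // Lambda function timeout in seconds (5 minutes)
  MemorySize: 512,
  DurableConfig: {
    ExecutionTimeout: 90000   // Durable execution timeout in seconds (25 hours)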
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;And invoke asynchronously:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Async invocation for long-running workflow
await lambda.invoke({
  FunctionName: 'fraud-detection',
  InvocationType: 'Event',
  DurableExecutionName: `tx-${transactionId}`,
  Payload: JSON.stringify(transaction)
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice the configuration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Timeout: 300&lt;/strong&gt;: Lambda function timeout (5 minutes in this example, up to a maximum of 15 minutes). This defines the maximum duration for each active execution phase, including the initial invocation and any subsequent replays. Set this to cover the longest expected active processing time in your workflow.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ExecutionTimeout: 90000&lt;/strong&gt;: Durable execution timeout in seconds (25 hours in this example). It covers the workflow’s expected total duration, including suspension periods. Set this slightly above the longest wait timeout to avoid edge cases.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;InvocationType: 'Event'&lt;/strong&gt;: Asynchronous invocation removes the 15-minute limit and enables executions up to one year.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The Lambda function timeout applies to active execution phases (AI calls, notification sending). During suspension (waiting for callbacks), the function isn’t running, so this timeout doesn’t apply. Setting the durable execution timeout to a meaningful boundary prevents workflows from running longer than expected. Without an explicit timeout, executions can run up to the maximum lifetime of one year.&lt;/p&gt; 
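&lt;p&gt;The arithmetic behind a value such as 90000 can be sketched as follows. This is an illustration only: &lt;code&gt;toSeconds&lt;/code&gt; and &lt;code&gt;executionTimeoutSeconds&lt;/code&gt; are hypothetical helpers, the one-hour margin is an arbitrary choice, and the sketch assumes that &lt;code&gt;ExecutionTimeout&lt;/code&gt; is expressed in seconds and that the waits run in parallel (sequential waits would add up instead).&lt;/p&gt; 

```javascript
// Hypothetical helper: convert a duration object such as { days: 1 } to seconds.
function toSeconds({ days = 0, hours = 0, minutes = 0, seconds = 0 }) {
  return ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
}

// Longest parallel wait plus a safety margin (one hour by default, an assumption).
function executionTimeoutSeconds(waitTimeouts, marginSeconds = 3600) {
  const longestWait = Math.max(0, ...waitTimeouts.map(toSeconds));
  return longestWait + marginSeconds;
}

// Two parallel waits of one day each: 86400 + 3600 seconds.
console.log(executionTimeoutSeconds([{ days: 1 }, { days: 1 }])); // 90000
```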
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Synchronous (RequestResponse)&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Asynchronous (Event)&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Total duration&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Under 15 minutes&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 1 year&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Caller needs result&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Yes&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;No&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Idempotency support&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Yes&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Yes&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Waits with suspension&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Yes&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Yes&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
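&lt;p&gt;The first two rows of the table can be expressed as a small decision function. This is a sketch rather than an SDK feature: &lt;code&gt;chooseInvocationType&lt;/code&gt; and its parameter names are hypothetical, and the only inputs it considers are the expected total duration and whether the caller needs the result.&lt;/p&gt; 

```javascript
const SYNC_LIMIT_SECONDS = 15 * 60; // Lambda's synchronous invocation limit

// Hypothetical helper: pick an InvocationType from the table's criteria.
function chooseInvocationType({ expectedTotalSeconds, callerNeedsResult }) {
  // Anything expected to run past 15 minutes must be invoked asynchronously.
  if (expectedTotalSeconds > SYNC_LIMIT_SECONDS) {
    return "Event";
  }
  // Short workflows can be synchronous when the caller waits for the result.
  return callerNeedsResult ? "RequestResponse" : "Event";
}

console.log(chooseInvocationType({ expectedTotalSeconds: 90000, callerNeedsResult: true })); // Event
console.log(chooseInvocationType({ expectedTotalSeconds: 30, callerNeedsResult: true }));    // RequestResponse
```

&lt;p&gt;When a long-running workflow is invoked asynchronously but the caller still needs the outcome, the result has to be delivered another way, for example through a notification step in the workflow itself.&lt;/p&gt; 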
&lt;h2&gt;&lt;strong&gt;Execute concurrent operations with context.parallel()&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;In the fraud detection workflow, the system notifies the cardholder through multiple channels such as SMS and email. Preserving business logic when executing parallel workflows introduces code complexities such as managing execution state across branches, handling synchronization, and coordinating branch completion. Durable functions simplify parallel workflow implementation using&amp;nbsp;&lt;code&gt;context.parallel()&lt;/code&gt;, which executes branches concurrently while maintaining durable checkpoints for each branch and provides configurable options to handle partial completions. By checkpointing and managing the state internally, durable functions help make sure that the state is preserved even if there are retries or failures. Note that&amp;nbsp;&lt;code&gt;context.parallel()&lt;/code&gt;&amp;nbsp;manages the internal execution state for each branch. If your branches interact with a shared external state (such as a database), you’re responsible for managing concurrent access to that external state.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Human-in-the-loop: verify via email AND SMS (first response wins)
let verified = await context.parallel("human-verification", [
  (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationEmail",
    async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'email', tx)
  ),
  (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationSMS",
    async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'sms', tx)
  )
], {
  maxConcurrency: 2,
  completionConfig: {
    minSuccessful: 1 // Continue after 1 success
  }
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice the configuration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;maxConcurrency: 2&lt;/strong&gt;: Both notifications are sent at the same time&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;minSuccessful: 1&lt;/strong&gt;: Only one channel needs to succeed; whichever responds first wins&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Each parallel branch waits for its callback independently, and the durable execution checkpoints each branch as part of the execution state. Using the&amp;nbsp;&lt;code&gt;minSuccessful&lt;/code&gt;&amp;nbsp;parameter, you control the minimum number of successful branch executions required for the parallel operation to complete. In this example, only one of the two branches needs to succeed. Verifications through SMS or email are both valid, and the workflow resumes as soon as either channel completes successfully. We call this the&amp;nbsp;&lt;strong&gt;first-response-wins&lt;/strong&gt;&amp;nbsp;pattern. This pattern works well when you only need a single successful result from any parallel branch and want the remaining branches to stop blocking progress.&lt;/p&gt; 
&lt;p&gt;But what happens if neither channel responds? Without timeouts, this workflow could remain suspended for up to the configured execution lifetime.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Always configure callback timeouts&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Let’s add timeout protection to the parallel verification from the previous section.&amp;nbsp;&lt;code&gt;context.waitForCallback()&lt;/code&gt;&amp;nbsp;accepts a&amp;nbsp;&lt;code&gt;timeout&lt;/code&gt;&amp;nbsp;option that bounds how long each branch waits before throwing an exception. By wrapping the parallel call in a try/catch, you can implement fallback logic when users don’t respond in time.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Enhanced: parallel verification with timeout and error handling
let verified;
try {
  verified = await context.parallel("human-verification", [
    (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationEmail",
      async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'email', tx),
      { timeout: { days: 1 } }  // Wait up to 1 day for email response
    ),
    (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationSMS",
      async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'sms', tx),
      { timeout: { days: 1 } }  // Wait up to 1 day for SMS response
    )
  ], {
    maxConcurrency: 2,
    completionConfig: {
      minSuccessful: 1
    }
  });
} catch (error) {
  const isTimeout = error.message?.includes("timeout");
  if (isTimeout) {
    context.logger.warn("Customer verification timeout", { error, txId: tx.id });
    // Fallback: escalate to fraud department
    return await context.step("sendToFraudDepartment", async () =&amp;gt;
      sendToFraudDepartment(tx, true)
    );
  }
  throw error; // Re-throw non-timeout errors
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice what changed from the previous section:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;timeout: { days: 1 }&lt;/strong&gt;: Each callback branch now has a maximum wait time of 1 day. If neither the email nor SMS callback arrives within that window, a timeout exception is thrown.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;try/catch with timeout detection&lt;/strong&gt;: The catch block distinguishes between timeout errors and other exceptions. When a timeout occurs, the workflow implements fallback logic by escalating the transaction to the fraud department, while non-timeout errors are re-thrown to be handled by the durable execution retry mechanism.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Without this error handling, the entire execution fails with an unhandled error. The timeout also works with the&amp;nbsp;&lt;code&gt;minSuccessful&lt;/code&gt;&amp;nbsp;configuration: if one branch times out but the other succeeds, the parallel operation still completes successfully since only one successful result is required.&lt;/p&gt; 
&lt;p&gt;For advanced use cases where the callback handler performs long-running work, you can also configure a&amp;nbsp;&lt;code&gt;heartbeatTimeout&lt;/code&gt;&amp;nbsp;to detect stalled callbacks before the main timeout expires. See the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;Lambda Developer Guide&lt;/a&gt;&amp;nbsp;for details.&lt;/p&gt; 
&lt;p&gt;Use callback timeouts for human approvals, external API callbacks, asynchronous processing, and third-party integrations.&lt;/p&gt; 
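&lt;p&gt;The catch block above detects timeouts with a case-sensitive substring match on &lt;code&gt;error.message&lt;/code&gt;, so a message such as “Timeout” or “timed out” would not match. The following sketch shows a slightly more tolerant check. &lt;code&gt;isTimeoutError&lt;/code&gt; is a hypothetical helper, and the exact error name and message produced by the SDK are not specified in this post, so treat the matched strings as assumptions. If the SDK exports a dedicated timeout error class, an &lt;code&gt;instanceof&lt;/code&gt; check is preferable.&lt;/p&gt; 

```javascript
// Hypothetical helper: case-insensitive timeout check on an error's name and message.
// The strings matched here are assumptions, not documented SDK values.
function isTimeoutError(error) {
  const text = `${error?.name ?? ""} ${error?.message ?? ""}`.toLowerCase();
  return text.includes("timeout") || text.includes("timed out");
}

console.log(isTimeoutError(new Error("Callback Timeout exceeded"))); // true
console.log(isTimeoutError(new Error("Access denied")));             // false
```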
&lt;h2&gt;&lt;strong&gt;Putting it all together: complete fraud detection implementation&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Now let’s see how all the best practices work together in the complete fraud detection workflow:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;import { withDurableExecution } from "@aws/durable-execution-sdk-js";
import { BedrockAgentCoreClient, InvokeAgentRuntimeCommand } from "@aws-sdk/client-bedrock-agentcore";

const agentRuntimeArn = process.env.AGENT_RUNTIME_ARN;
const agentRegion = process.env.AGENT_REGION || 'us-east-1';
const client = new BedrockAgentCoreClient({ region: agentRegion });

export const handler = withDurableExecution(async (event, context) =&amp;gt; {
  const tx = {
    id: event.id,
    amount: event.amount,
    location: event.location,
    vendor: event.vendor
  };

  // AI fraud assessment with error handling
  tx.score = await context.step("fraudCheck", async () =&amp;gt; {
    try {
      const payloadJson = JSON.stringify({ input: { amount: tx.amount } });
      const command = new InvokeAgentRuntimeCommand({
        agentRuntimeArn: agentRuntimeArn,
        qualifier: 'DEFAULT',
        payload: Buffer.from(payloadJson, 'utf-8'),
        contentType: 'application/json',
        accept: 'application/json'
      });
      const response = await client.send(command);
      const responseText = await response.response.transformToString();
      const result = JSON.parse(responseText);
      return result?.output?.risk_score ?? 5;  // Default to high-risk if score unavailable
    } catch (error) {
      context.logger.error("Fraud check failed", { error, txId: tx.id });
      return 5;
    }
  });

  // Route based on AI decision
  if (tx.score &amp;lt; 3) {
    // Best Practice: Idempotent authorization
    return await context.step(`authorize-${tx.id}`, async () =&amp;gt;
      authorizeTransaction(tx, { idempotency_key: `tx-${tx.id}` })
    );
  }

  if (tx.score &amp;gt;= 5) {
    return await context.step(`sendToFraudDepartment-${tx.id}`, async () =&amp;gt;
      sendToFraudDepartment(tx)
    );
  }

  // Medium risk: need human verification
  await context.step(`suspend-${tx.id}`, async () =&amp;gt; suspendTransaction(tx));

  // Best Practice: Concurrent operations with timeout configuration
  let verified;
  try {
    verified = await context.parallel("human-verification", [
      (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationEmail",
        async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'email', tx),
        { timeout: { days: 1 } }
      ),
      (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationSMS",
        async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'sms', tx),
        { timeout: { days: 1 } }
      )
    ], {
      maxConcurrency: 2,
      completionConfig: {
        minSuccessful: 1
      }
    });
  } catch (error) {
    const isTimeout = error.message?.includes("timeout");
    context.logger.warn(
      isTimeout ? "Customer verification timeout" : "Customer verification failed",
      { error, txId: tx.id }
    );
    return await context.step(`timeout-escalate-${tx.id}`, async () =&amp;gt;
      sendToFraudDepartment(tx, true)
    );
  }

  // Idempotent final step with idempotency key
  return await context.step(`finalize-${tx.id}`, async () =&amp;gt; {
    const action = !verified.hasFailure &amp;amp;&amp;amp; verified.successCount &amp;gt; 0
      ? "authorize"
      : "escalate";
    if (action === "authorize") {
      return authorizeTransaction(tx, true, { idempotency_key: `finalize-${tx.id}` });
    }
    return sendToFraudDepartment(tx, true);
  });
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice how the best practices work together:&amp;nbsp;&lt;code&gt;context.parallel()&lt;/code&gt;&amp;nbsp;sends SMS and email concurrently, resuming when either channel responds. Both callbacks configure 1-day timeouts with try/catch handling that escalates to the fraud department when verification times out or fails. The&amp;nbsp;&lt;code&gt;DurableExecutionName: tx-${transactionId}&lt;/code&gt;&amp;nbsp;parameter (specified at invocation time, shown in the following CLI example) provides execution-level deduplication, while idempotency keys in the authorization steps prevent duplicate charges at the application layer. Asynchronous invocation (&lt;code&gt;InvocationType: 'Event'&lt;/code&gt;) enables the 24-hour wait period.&lt;/p&gt; 
&lt;p&gt;Once deployed, invoke the function asynchronously with a sample transaction to see it in action:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;transactionId="123456789"
aws lambda invoke \
  --function-name 'fraud-detection:$LATEST' \
  --invocation-type Event \
  --durable-execution-name "tx-${transactionId}" \
  --cli-binary-format raw-in-base64-out \
  --payload "{\"id\": \"${transactionId}\", \"amount\": 6500, \"location\": \"New York, NY\", \"vendor\": \"Amazon.com\"}" \
  --region us-east-2 \
  response.json&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Upon successful invocation, you can view the execution state in the Lambda console’s durable operations view. The execution shows a suspended state, waiting for customer response:&lt;/p&gt; 
&lt;div id="attachment_25859" style="width: 911px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25859" loading="lazy" class="size-full wp-image-25859" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/17/compute-2476-image-2.png" alt="Figure 2: Suspended execution state" width="901" height="495"&gt;
 &lt;p id="caption-attachment-25859" class="wp-caption-text"&gt;Figure 2: Suspended execution state&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Notice the &lt;code&gt;fraudCheck&lt;/code&gt; and &lt;code&gt;suspendTransaction&lt;/code&gt; steps show as succeeded with checkpointed results. The human-verification parallel operation shows that both SMS and email branches started. The timeline shows the function in a suspended state. Simulate a customer response by sending a callback success through the console, AWS Command Line Interface (AWS CLI) or Lambda API:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;div class="hide-language"&gt; 
  &lt;pre&gt;&lt;code class="lang-code"&gt;aws lambda send-durable-execution-callback-success \
	--callback-id &amp;lt;CALLBACK_ID_FROM_EMAIL_OR_SMS&amp;gt; \
	--result '{"status":"approved","channel":"email"}' \
	--cli-binary-format raw-in-base64-out&lt;/code&gt;&lt;/pre&gt; 
 &lt;/div&gt; 
&lt;/div&gt; 
&lt;div id="attachment_25860" style="width: 911px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25860" loading="lazy" class="size-full wp-image-25860" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/17/compute-2476-image-3.png" alt="Figure 3: Completed execution with customer approval" width="901" height="597"&gt;
 &lt;p id="caption-attachment-25860" class="wp-caption-text"&gt;Figure 3: Completed execution with customer approval&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;After receiving the customer’s approval, the durable execution resumes from its checkpoint, authorizes the transaction, and completes. The execution spanned hours but consumed only seconds of compute time.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;With durable functions, Lambda extends beyond single-event processing to power core business processes and long-running workflows, while retaining the operational simplicity, reliability, and scale that define Lambda. You can build applications that run for days or months, survive failures, and resume where they left off, all within the familiar event-driven programming model.&lt;/p&gt; 
&lt;p&gt;Deploy the fraud detection workflow from our&amp;nbsp;&lt;a href="https://github.com/aws-samples/sample-lambda-durable-functions/tree/main" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&amp;nbsp;and experiment with human-in-the-loop patterns in your own account. For core concepts, see&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/compute/building-fault-tolerant-long-running-application-with-aws-lambda-durable-functions/" target="_blank" rel="noopener noreferrer"&gt;Introduction to AWS Lambda Durable Functions&lt;/a&gt;. For comprehensive documentation, see the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;Lambda Developer Guide&lt;/a&gt;. Browse&amp;nbsp;&lt;a href="https://serverlessland.com/search?search=Durable+function" target="_blank" rel="noopener noreferrer"&gt;Serverless Land&lt;/a&gt;&amp;nbsp;for reference architectures and discover where durable execution fits in your designs.&lt;/p&gt; 
&lt;p&gt;Share your feedback, questions, and use cases in the SDK repositories or on&amp;nbsp;&lt;a href="https://repost.aws/" target="_blank" rel="noopener noreferrer"&gt;re:Post&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Testing Step Functions workflows: a guide to the enhanced TestState API</title>
		<link>https://aws.amazon.com/blogs/compute/testing-step-functions-workflows-a-guide-to-the-enhanced-teststate-api/</link>
					
		
		<dc:creator><![CDATA[D Surya Sai]]></dc:creator>
		<pubDate>Sun, 22 Mar 2026 17:06:38 +0000</pubDate>
				<category><![CDATA[AWS Step Functions]]></category>
		<category><![CDATA[Compute]]></category>
		<guid isPermaLink="false">2757f33197f633fca8298a2313f813daf0bb5967</guid>

					<description>AWS Step Functions recently announced new enhancements to local testing capabilities for Step Functions, introducing API-based testing that developers can use to validate workflows before deploying to AWS. As detailed in our Announcement blog post, the TestState API transforms Step Functions development by enabling individual state testing in isolation or as complete workflows. This supports […]</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://aws.amazon.com/step-functions/" target="_blank" rel="noopener noreferrer"&gt;AWS Step Functions&lt;/a&gt; recently announced new enhancements to local testing capabilities for Step Functions, introducing API-based testing that developers can use to validate workflows before deploying to AWS. As detailed in our Announcement &lt;a href="https://aws.amazon.com/blogs/aws/accelerate-workflow-development-with-enhanced-local-testing-in-aws-step-functions/" target="_blank" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;, the TestState API transforms Step Functions development by enabling individual state testing in isolation or as complete workflows. This supports mocked responses and actual AWS service integrations, and provides advanced capabilities. These capabilities include Map/Parallel states, error simulation with retry mechanisms, context object validation, and detailed inspection metadata for comprehensive local testing of your serverless application.&lt;/p&gt; 
&lt;p&gt;The TestState API can be accessed through multiple interfaces such as &lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface&lt;/a&gt; (AWS CLI), &lt;a href="https://aws.amazon.com/what-is/sdk/" target="_blank" rel="noopener noreferrer"&gt;AWS SDK&lt;/a&gt;, and &lt;a href="https://www.localstack.cloud/" target="_blank" rel="noopener noreferrer"&gt;LocalStack&lt;/a&gt;. By default, the TestState API in the AWS CLI and SDK runs against the remote &lt;a href="https://docs.aws.amazon.com/general/latest/gr/step-functions.html#step-functions_region" target="_blank" rel="noopener noreferrer"&gt;AWS endpoint&lt;/a&gt;, providing validation against the actual Step Functions service infrastructure. We’ve partnered with LocalStack to offer an additional testing endpoint for the TestState API. Developers can use LocalStack for unit testing their workflows by changing the &lt;a href="https://aws.amazon.com/what-is/sdk/" target="_blank" rel="noopener noreferrer"&gt;AWS SDK&lt;/a&gt; client endpoint configuration to point to LocalStack at &lt;code&gt;&lt;em&gt;http://localhost.localstack.cloud:4566/&lt;/em&gt;&lt;/code&gt; instead of the &lt;a href="https://docs.aws.amazon.com/general/latest/gr/step-functions.html#step-functions_region" target="_blank" rel="noopener noreferrer"&gt;AWS endpoint&lt;/a&gt;. This approach provides complete network isolation when needed. For a streamlined development experience, you can also use the &lt;a href="https://docs.localstack.cloud/aws/tooling/vscode-extension/" target="_blank" rel="noopener noreferrer"&gt;LocalStack VSCode extension&lt;/a&gt; to automatically configure your environment to point to the LocalStack endpoint. This approach is detailed in the AWS &lt;a href="https://aws.amazon.com/blogs/compute/enhance-the-local-testing-experience-for-serverless-applications-with-localstack/" target="_blank" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;This blog post demonstrates building test suites to unit test your Step Functions workflows with the AWS SDK for Python and the &lt;a href="https://docs.pytest.org/en/stable/" target="_blank" rel="noopener noreferrer"&gt;pytest framework&lt;/a&gt;. The complete implementation is available in the &lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Building test cases using the TestState API&lt;/h2&gt; 
&lt;p&gt;This example workflow implements a real-world ecommerce order processing system using &lt;a href="https://jsonata.org/" target="_blank" rel="noopener noreferrer"&gt;JSONata&lt;/a&gt; for advanced data transformations. It incorporates complex Step Functions patterns including distributed Map states, Parallel execution, and waitForTaskToken callback mechanisms. The process validates orders through &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; functions, distributes order item processing with configurable failure tolerance, runs parallel payment and inventory updates, handles human approval workflows using task tokens, then persists orders in Amazon DynamoDB with notification delivery. This workflow demonstrates advanced error handling with multiple Catchers and Retriers, exponential backoff for Lambda throttling and DynamoDB limits, and sophisticated state transitions that were previously challenging to test locally. This makes it the recommended choice for demonstrating the use of enhanced TestState API’s local testing features.&lt;/p&gt; 
&lt;p&gt;The complete workflow is available in the &lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;, where you can examine the full state machine definition and see how JSONata expressions handle data transformation throughout the execution flow.&lt;/p&gt; 
&lt;div id="attachment_25870" style="width: 872px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25870" loading="lazy" class="size-full wp-image-25870" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/18/compute-2435-img.png" alt="Figure 1: State machine workflow that demonstrates a real-world ecommerce order processing system." width="862" height="1292"&gt;
 &lt;p id="caption-attachment-25870" class="wp-caption-text"&gt;Figure 1: State machine workflow that demonstrates a real-world ecommerce order processing system.&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Effective Step Functions testing requires a systematic approach to TestState API integration that provides state validation, error simulation, and assertion capabilities. The testing framework is built using Python’s pytest framework, using &lt;a href="https://docs.pytest.org/en/stable/explanation/fixtures.html" target="_blank" rel="noopener noreferrer"&gt;fixtures&lt;/a&gt; to automatically provide pre-configured runner instances that handle TestState API client initialization and state machine definition loading. This eliminates repetitive setup code and provides consistent test environments. The enhanced TestState API supports both mock integrations and actual integrations with AWS services, providing flexibility in testing strategies. For this demonstration, you use mock integrations to showcase how complete local testing can be achieved without deploying any resources to AWS accounts.&lt;/p&gt; 
&lt;p&gt;This framework is built for demonstration purposes, and you can similarly build your own testing frameworks using other programming languages such as &lt;a href="https://www.java.com/en/" target="_blank" rel="noopener noreferrer"&gt;Java&lt;/a&gt; or &lt;a href="https://nodejs.org/en" target="_blank" rel="noopener noreferrer"&gt;Node.js&lt;/a&gt;. The testing framework uses method chaining patterns to create readable test cases with comprehensive assertion methods, automatic output chaining between state executions, and error simulation for testing retry mechanisms, backoff intervals, and catch blocks across AWS service error conditions.&lt;/p&gt; 
&lt;p&gt;The following test implementations demonstrate the testing capabilities that are achievable with the enhanced TestState API in local development environments. The test cases are run against the preceding state machine.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Test Case 1: Lambda throttling and retry mechanism testing&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Services that a state machine integrates with, such as AWS Lambda or Amazon &lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;DynamoDB&lt;/a&gt;, can throttle requests depending on usage. A key capability of the enhanced TestState API is its ability to simulate retry mechanisms with control over retry counts and backoff intervals. This test demonstrates the enhanced TestState API’s retry testing capabilities through the &lt;code&gt;stateConfiguration.retrierRetryCount&lt;/code&gt;&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_TestState.html#StepFunctions-TestState-request-stateConfiguration" target="_blank" rel="noopener noreferrer"&gt;parameter&lt;/a&gt; and &lt;code&gt;inspectionData.errorDetails&lt;/code&gt; &lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_InspectionErrorDetails.html" target="_blank" rel="noopener noreferrer"&gt;response fields&lt;/a&gt;. These response fields provide &lt;code&gt;retryBackoffIntervalSeconds&lt;/code&gt; for validating exponential backoff calculations, &lt;code&gt;retryIndex&lt;/code&gt; for tracking retry attempt sequences, and &lt;code&gt;catchIndex&lt;/code&gt; for identifying which error handler processed the exception. These enhanced inspection capabilities enable validation of retry logic, &lt;a href="https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/" target="_blank" rel="noopener noreferrer"&gt;backoff strategies&lt;/a&gt;, and error propagation patterns across complex state machine workflows.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def test_lambda_throttling_retry_mechanism(self, runner):
"""Test retry mechanism for Lambda.TooManyRequestsException"""
throttling_error = {
"Error": "Lambda.TooManyRequestsException",
"Cause": "Request rate exceeded"
}

# Test first retry attempt
(runner
.with_input({"orderId": "order-retry-test"})
.with_mock_error(throttling_error)
.with_retrier_retry_count(0)
.execute("ValidateOrder")
.assert_retriable()
.assert_error("Lambda.TooManyRequestsException"))

# Verify exponential backoff calculation
response = runner.get_response()
error_details = response['inspectionData']['errorDetails']
assert error_details['retryBackoffIntervalSeconds'] == 2

# Test retry exhaustion
(runner
.with_retrier_retry_count(3)
.execute("ValidateOrder")
.assert_caught_error()
.assert_next_state("ValidationFailed"))&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
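&lt;p&gt;When you assert on &lt;code&gt;retryBackoffIntervalSeconds&lt;/code&gt;, it helps to compute the expected value rather than hard-code it. The following sketch assumes a retrier with &lt;code&gt;IntervalSeconds&lt;/code&gt; of 2 and &lt;code&gt;BackoffRate&lt;/code&gt; of 2.0, no jitter, and the usual interval-times-rate-to-the-power-of-retry-count schedule. The helper name &lt;code&gt;expected_backoff_seconds&lt;/code&gt; and these retrier values are illustrative and are not taken from the sample repository.&lt;/p&gt; 

```python
def expected_backoff_seconds(interval_seconds, backoff_rate, retry_count,
                             max_delay_seconds=None):
    """Delay before the next retry after `retry_count` retries have already run.

    Assumes no jitter: delay = interval * rate ** retry_count,
    optionally capped at max_delay_seconds.
    """
    delay = interval_seconds * (backoff_rate ** retry_count)
    if max_delay_seconds is not None:
        delay = min(delay, max_delay_seconds)
    return delay


# Assumed retrier configuration: IntervalSeconds=2, BackoffRate=2.0
schedule = [expected_backoff_seconds(2, 2.0, n) for n in range(3)]
print(schedule)
```

&lt;p&gt;Under these assumptions, a retry count of 0 yields a 2-second delay, which matches the assertion in the preceding test. If your retrier uses a jitter strategy, an exact equality check is no longer appropriate.&lt;/p&gt; 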
&lt;h3&gt;&lt;strong&gt;Test Case 2: Map state testing with tolerance thresholds&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/state-map.html" target="_blank" rel="noopener noreferrer"&gt;Distributed Map states&lt;/a&gt; present unique testing challenges due to their parallel processing nature and failure tolerance capabilities. The enhanced TestState API provides specialized configuration options for testing these complex scenarios.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def test_map_state_tolerated_failure_threshold(self, runner):
"""Test Map state with tolerated failure threshold"""
test_input = {
"orderId": "order-map-test",
"orderItems": [
{"itemId": "item-1"}, {"itemId": "item-2"}, 
{"itemId": "item-3"}, {"itemId": "item-4"}
]
}

# Test normal Map state execution
map_success_result = [
{"itemId": "item-1", "processed": True},
{"itemId": "item-2", "processed": True}
]

(runner
.with_input(test_input)
.with_mock_result(map_success_result)
.execute("ProcessOrderItems")
.assert_succeeded()
.assert_next_state("ParallelProcessing"))

# Test tolerance threshold exceeded scenario
tolerance_error = {
"Error": "States.ExceedToleratedFailureThreshold",
"Cause": "Map state exceeded tolerated failure threshold"
}

(runner
.with_input(test_input)
.with_mock_error(tolerance_error)
.execute("ProcessOrderItems")
.assert_caught_error()
.assert_next_state("ValidationFailed"))&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This test demonstrates the enhanced TestState API’s Map state testing capabilities through the &lt;code&gt;stateConfiguration.mapIterationFailureCount&lt;/code&gt; &lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_TestState.html#StepFunctions-TestState-request-stateConfiguration" target="_blank" rel="noopener noreferrer"&gt;parameter&lt;/a&gt; for simulating iteration failures. The API provides comprehensive &lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_TestState.html#API_TestState_ResponseSyntax" target="_blank" rel="noopener noreferrer"&gt;inspection data&lt;/a&gt;, including &lt;code&gt;inspectionData.afterItemSelector&lt;/code&gt; for validating &lt;code&gt;ItemSelector&lt;/code&gt; transformations, &lt;code&gt;inspectionData.afterItemBatcher&lt;/code&gt; for batch processing validation, and &lt;code&gt;inspectionData.toleratedFailureCount&lt;/code&gt; and &lt;code&gt;inspectionData.toleratedFailurePercentage&lt;/code&gt; for threshold verification. When the specified failure count exceeds the configured tolerance, the API returns &lt;code&gt;States.ExceedToleratedFailureThreshold&lt;/code&gt;, enabling testing of Map state resilience patterns.&lt;/p&gt; 
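&lt;p&gt;To choose failure counts on either side of the threshold, you can model the tolerance decision locally. The following sketch reflects one reading of the Map state tolerance settings: the run fails when the number of failed items exceeds the tolerated count, or when the failed percentage exceeds the tolerated percentage, and any failure is fatal when neither is set. The function name &lt;code&gt;exceeds_tolerance&lt;/code&gt; is hypothetical, and the logic is an approximation rather than the service implementation.&lt;/p&gt; 

```python
def exceeds_tolerance(failed, total, tolerated_count=None, tolerated_percentage=None):
    """Return True if a Map run with `failed` of `total` items would exceed tolerance.

    Assumption: an unset threshold is ignored, and with no thresholds at all
    a single failed item is enough to fail the run.
    """
    if tolerated_count is None and tolerated_percentage is None:
        return failed > 0
    if tolerated_count is not None and failed > tolerated_count:
        return True
    if tolerated_percentage is not None and total > 0:
        if (failed / total) * 100 > tolerated_percentage:
            return True
    return False


# Four order items with a tolerated failure percentage of 25
for failed_items in range(3):
    print(failed_items, exceeds_tolerance(failed_items, 4, tolerated_percentage=25))
```

&lt;p&gt;With four items and a 25 percent tolerance, one failure stays within the threshold and two failures exceed it, which gives you a natural pair of values to pass as the simulated iteration failure count.&lt;/p&gt; 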
&lt;h3&gt;&lt;strong&gt;Test Case 3: WaitForCallback pattern testing&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;The &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/connect-to-resource.html#connect-wait-token" target="_blank" rel="noopener noreferrer"&gt;waitForCallback&lt;/a&gt; integration requires context object construction to simulate realistic execution environments, particularly for human approval workflows.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def test_context_object_usage_in_jsonata_expressions(self, runner):
"""Test Context object usage in waitForTaskToken scenarios"""
test_input = {
"orderId": "order-context-test",
"amount": 125.0
}

context_data = {
"Task": {"Token": "ahbdgftgehbdcndsjnwjkhas327yr4hendc73yehdb723y"},
"Execution": {
"Id": "arn:aws:states:us-east-1:123456789012:execution:test:exec-123"
},
"State": {
"Name": "WaitForApproval",
"EnteredTime": "2025-01-15T10:45:00Z"
}
}

mock_result = {
"approved": True,
"taskToken": "ahbdgftgehbdcndsjnwjkhas327yr4hendc73yehdb723y"
}

(runner
.with_input(test_input)
.with_context(context_data)
.with_mock_result(mock_result)
.execute("WaitForApproval")
.assert_succeeded()
.assert_next_state("CheckApproval"))

# Verify JSONata expressions processed context correctly
response = runner.get_response()
after_args = json.loads(response['inspectionData']['afterArguments'])
assert after_args['Payload']['taskToken'] == context_data['Task']['Token']&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This test demonstrates the enhanced TestState API’s support for &lt;code&gt;waitForCallback&lt;/code&gt; integrations through the &lt;code&gt;context&lt;/code&gt; parameter for realistic Context object simulation. The API enables comprehensive testing of JSONata expressions that reference &lt;code&gt;$states.context.Task.Token&lt;/code&gt;, &lt;code&gt;$states.context.Execution.Id&lt;/code&gt;, and other context fields. The &lt;code&gt;inspectionData.afterArguments&lt;/code&gt; &lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_TestState.html#API_TestState_ResponseSyntax" target="_blank" rel="noopener noreferrer"&gt;response field&lt;/a&gt; validates that JSONata expressions correctly processed the context data, while the API automatically handles the complexity of task token embedding in service integration payloads for &lt;code&gt;waitForCallback&lt;/code&gt; testing scenarios.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Test Case 4: Happy path testing – complete workflow validation&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Happy path testing validates that workflows execute correctly under normal operating conditions. The enhanced TestState API allows you to chain state executions together, automatically passing outputs between states to simulate a complete workflow execution.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def test_complete_order_processing_workflow(self, runner):
"""Integration test: Complete happy path workflow using method chaining"""
test_input = {
"orderId": "order-12345",
"amount": 150.75,
"customerEmail": "customer@example.com",
"orderItems": [
{"itemId": "item-1", "quantity": 2, "price": 50.25}
]
}

# Test ValidateOrder state
(runner
.with_input(test_input)
.with_mock_result({"statusCode": 200, "isValid": True})
.execute("ValidateOrder")
.assert_succeeded()
.assert_next_state("CheckValidation"))

# Test CheckValidation choice state (no mock needed)
validation_output = runner.get_output()
(runner
.with_input(validation_output)
.clear_mocks()
.execute("CheckValidation")
.assert_succeeded()
.assert_next_state("ProcessOrderItems"))&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This test demonstrates how the TestState API maintains state context between executions, enabling realistic workflow simulation. The &lt;code&gt;get_output()&lt;/code&gt; method retrieves the processed output from one state to use as input for the next, mimicking actual Step Functions execution behavior.&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note&lt;/strong&gt;: The code snippet above shows only the first two states of the complete workflow test for brevity. The full test code with all states (&lt;code&gt;ProcessOrderItems&lt;/code&gt;, &lt;code&gt;ParallelProcessing&lt;/code&gt;, &lt;code&gt;WaitForApproval&lt;/code&gt;, &lt;code&gt;CheckApproval&lt;/code&gt;, &lt;code&gt;SaveOrderDetails&lt;/code&gt;, and &lt;code&gt;SendNotification&lt;/code&gt;) can be viewed in the complete &lt;/em&gt;&lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;&lt;em&gt;GitHub repository&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, demonstrating end-to-end workflow validation using the same method chaining pattern.&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Integration with modern CI/CD pipelines&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;This section shows how to integrate the preceding unit tests into a CI/CD pipeline so that workflow logic is validated before anything is deployed.&lt;/p&gt; 
&lt;p&gt;The sample &lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;repository&lt;/a&gt; includes a GitHub Actions workflow that demonstrates how TestState API testing integrates into continuous integration and continuous delivery (CI/CD) pipelines. The workflow (&lt;code&gt;.github/workflows/test-and-deploy.yml&lt;/code&gt;) provides a two-step process that validates the state machine before any AWS resources are deployed with the &lt;a href="https://aws.amazon.com/serverless/sam/" target="_blank" rel="noopener noreferrer"&gt;AWS Serverless Application Model&lt;/a&gt; (AWS SAM).&lt;/p&gt; 
&lt;p&gt;The CI/CD pipeline follows this pattern:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Unit Tests&lt;/strong&gt;: Executes the complete TestState API test suite using &lt;code&gt;pytest tests/unit_test.py -v&lt;/code&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;SAM Deploy&lt;/strong&gt;: Deploys AWS resources using &lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-cli-command-reference-sam-build.html" target="_blank" rel="noopener noreferrer"&gt;sam build&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-cli-command-reference-sam-deploy.html" target="_blank" rel="noopener noreferrer"&gt;sam deploy&lt;/a&gt;&lt;/li&gt; 
&lt;/ol&gt; 
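&lt;p&gt;The gating behavior of this two-step pattern can be sketched as a small script: each command runs only if the previous one exited successfully, so a failing test stops the deployment. This is an illustration of the sequencing, not the contents of the repository’s workflow file. The &lt;code&gt;run_pipeline&lt;/code&gt; helper and the choice of &lt;code&gt;sam deploy&lt;/code&gt; flags are assumptions.&lt;/p&gt; 

```python
import subprocess

# Commands in the order described above; the deploy flags are an assumption
# suited to non-interactive CI runs.
PIPELINE = [
    ["pytest", "tests/unit_test.py", "-v"],
    ["sam", "build"],
    ["sam", "deploy", "--no-confirm-changeset", "--no-fail-on-empty-changeset"],
]


def run_pipeline(steps, run=subprocess.run):
    """Run each command in order and stop at the first non-zero exit code."""
    for command in steps:
        result = run(command)
        if result.returncode != 0:
            return result.returncode  # later steps, including deploy, are skipped
    return 0


# Example entry point for a CI job:
#     raise SystemExit(run_pipeline(PIPELINE))
```

&lt;p&gt;The &lt;code&gt;run&lt;/code&gt; parameter is injectable so that the sequencing can itself be tested with a stub instead of invoking real commands.&lt;/p&gt; 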
&lt;p&gt;To enable the GitHub Actions workflow to deploy resources to your AWS account, store your AWS credentials as secrets in your GitHub repository settings. For detailed setup instructions, see the AWS &lt;a href="https://aws.amazon.com/blogs/compute/using-github-actions-to-deploy-serverless-applications/" target="_blank" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;The following secrets are required:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;AWS_REGION&lt;/code&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;In production environments, you typically extend this basic pipeline with additional stages. The extended pipeline often begins with a deployment to a development account, followed by integration testing against the deployed resources. The final stage promotes the change to production with approval gates, security scanning, and compliance checks.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;The enhanced TestState API enables testing Step Functions workflows locally without requiring AWS deployments, which accelerates development cycles and reduces testing time. This post demonstrates how to implement testing for state types including Map states with tolerance thresholds, retry mechanisms with exponential backoff, and &lt;code&gt;waitForTaskToken&lt;/code&gt; patterns with context object simulation using mock integrations for isolated testing.&lt;/p&gt; 
&lt;p&gt;By integrating TestState API testing into CI/CD pipelines, you can validate workflow logic before deployment, reducing the risk of production issues. The GitHub Actions workflow example demonstrates an implementation that runs tests and deploys resources in a controlled sequence. The complete code examples and testing framework are available in the &lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;, which you can use as a starting point for similar testing practices in your own Step Functions workflows.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Enabling high availability of Amazon EC2 instances on AWS Outposts servers (Part 3)</title>
		<link>https://aws.amazon.com/blogs/compute/enabling-high-availability-of-amazon-ec2-instances-on-aws-outposts-servers-part-3/</link>
					
		
		<dc:creator><![CDATA[Brianna Rosentrater]]></dc:creator>
		<pubDate>Fri, 06 Mar 2026 23:11:22 +0000</pubDate>
				<category><![CDATA[Amazon CloudWatch]]></category>
		<category><![CDATA[Amazon Simple Notification Service (SNS)]]></category>
		<category><![CDATA[AWS CloudFormation]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[AWS Outposts]]></category>
		<category><![CDATA[AWS Outposts servers]]></category>
		<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">704bf252a8b038a74199bfc881ff1b43524c00b1</guid>

					<description>This post is part 3 of the three-part series ‘Enabling high availability of Amazon EC2 instances on&amp;nbsp;AWS Outposts&amp;nbsp;servers’. We provide you with code samples and considerations for implementing custom logic to automate&amp;nbsp;Amazon Elastic Compute Cloud (EC2) relaunch on Outposts servers. This post focuses on guidance for using Outposts servers with third party storage for boot […]</description>
										<content:encoded>&lt;p&gt;This post is part 3 of the three-part series ‘Enabling high availability of Amazon EC2 instances on&amp;nbsp;&lt;a href="https://aws.amazon.com/outposts/servers/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts&lt;/a&gt;&amp;nbsp;servers’. We provide you with code samples and considerations for implementing custom logic to automate&amp;nbsp;&lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (EC2&lt;/a&gt;) relaunch on Outposts servers. This post focuses on guidance for using Outposts servers with third party storage for boot and data volumes, whereas &lt;a href="https://aws.amazon.com/blogs/compute/enabling-high-availability-of-amazon-ec2-instances-on-aws-outposts-servers-part-1/" target="_blank" rel="noopener noreferrer"&gt;part 1&lt;/a&gt; and&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/compute/enabling-high-availability-of-amazon-ec2-instances-on-aws-outposts-servers-part-2/" target="_blank" rel="noopener noreferrer"&gt;part 2&lt;/a&gt; focus on automating EC2 relaunch between standalone servers. Outposts servers support integration with&amp;nbsp;&lt;a href="https://www.dell.com/en-us/shop/storage-servers-and-networking-for-business/sf/power-store"&gt;Dell PowerStore&lt;/a&gt;,&amp;nbsp;&lt;a href="https://www.hpe.com/us/en/storage/alletra.html"&gt;HPE Alletra Storage MP B10000&amp;nbsp;systems&lt;/a&gt;, &lt;a href="https://www.netapp.com/data-management/ontap-data-management-software/"&gt;NetApp on-premises enterprise storage arrays&lt;/a&gt;, and &lt;a href="https://www.purestorage.com/products/nvme/flasharray-x.html"&gt;Pure Storage FlashArray&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Outposts servers provide compute and networking services that are designed for low-latency, local data processing needs for on-premises locations such as retail stores, branch offices, healthcare provider locations, or environments that are space-constrained. Outposts servers use &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html"&gt;EC2 instance store storage&lt;/a&gt; to provide non-durable block-level storage to the instances running stateless workloads. For applications that require persistent storage, you can create a three-tier architecture by connecting your Outposts servers to a third-party storage appliance. In this post, you will learn how to implement custom logic to provide high availability (HA) for your applications running on Outposts servers using two or more servers for N+1 fault tolerance. The code provided is meant to help you get started, and can be modified further for your unique workload needs.&lt;/p&gt; 
&lt;h2&gt;Overview&lt;/h2&gt; 
&lt;p&gt;The following sections show how custom logic can be used to automate EC2 instance relaunch between two or more Outposts servers using boot and data volumes on third-party storage. If your EC2 instance fails while using this solution, an &lt;a href="https://aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; alarm monitoring the EC2 StatusCheckFailed_Instance metric of your source EC2 instance is triggered, and you receive an &lt;a href="https://aws.amazon.com/pm/sns/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Notification Service&lt;/a&gt; (Amazon SNS) notification. An &lt;a href="https://aws.amazon.com/pm/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; function then relaunches your EC2 instance onto the destination Outposts server that you’ve set up for resiliency. This is done using a launch template created during setup, and the script connects your relaunched instance to the existing boot and data volumes on your third-party storage appliance. 
This storage device provides shared storage for your Outposts servers. If a single server fails, new instances can connect to existing volumes on the array. This allows for a zero data loss &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-of-on-premises-applications-to-aws/recovery-objectives.html" target="_blank" rel="noopener noreferrer"&gt;Recovery Point Objective (RPO)&lt;/a&gt; and a &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-of-on-premises-applications-to-aws/recovery-objectives.html" target="_blank" rel="noopener noreferrer"&gt;Recovery Time Objective (RTO)&lt;/a&gt; equaling the time it takes to launch your EC2 instance. Take advantage of the features on your storage appliance for configuring data durability and resiliency to hardware failures, and make sure that you are regularly backing up your SAN volumes.&lt;/p&gt; 
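&lt;p&gt;The alarm-to-relaunch flow described above can be sketched in a few helper functions. This is not the code from the solution’s repository. The helper names, the alarm name, and the alarm period and evaluation settings are illustrative assumptions. The helpers only build request arguments and parse the notification, so the sketch runs without AWS access.&lt;/p&gt; 

```python
import json


def build_alarm_kwargs(instance_id, sns_topic_arn):
    """Arguments for CloudWatch put_metric_alarm (threshold values are illustrative)."""
    return {
        "AlarmName": f"ec2-status-check-failed-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_Instance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def failed_instance_id(sns_event):
    """Extract the InstanceId dimension from an alarm notification delivered through SNS."""
    message = json.loads(sns_event["Records"][0]["Sns"]["Message"])
    for dimension in message["Trigger"]["Dimensions"]:
        if dimension["name"] == "InstanceId":
            return dimension["value"]
    return None


def build_relaunch_kwargs(launch_template_id):
    """Arguments for EC2 run_instances using the recovery launch template."""
    return {
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id, "Version": "$Latest"},
        "MinCount": 1,
        "MaxCount": 1,
    }


# With boto3 installed and credentials configured, the calls would look like:
#     boto3.client("cloudwatch").put_metric_alarm(**build_alarm_kwargs(instance_id, topic_arn))
#     boto3.client("ec2").run_instances(**build_relaunch_kwargs(template_id))
```

&lt;p&gt;Keeping the argument construction separate from the AWS calls makes the recovery logic straightforward to unit test before you deploy it.&lt;/p&gt; 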
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-1-2.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25778 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-1-2.png" alt="" width="1124" height="604"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;span style="font-size: 16px"&gt;Figure 1 – Solution Architecture for automated EC2 Relaunch&lt;/span&gt;&lt;/p&gt; 
&lt;h3&gt;Prerequisites&lt;/h3&gt; 
&lt;p&gt;The following prerequisites are required to complete the walkthrough:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Two Outposts servers that can be set up as an&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-i-strategies-for-recovery-in-the-cloud/" target="_blank" rel="noopener noreferrer"&gt;active-active or active-passive&lt;/a&gt; resilient pair.&lt;/li&gt; 
 &lt;li&gt;For workloads with a low threshold for downtime, ensure that your secondary Outpost server that’s used for recovery has a unique service link connection.&lt;/li&gt; 
 &lt;li&gt;Outposts servers must be colocated within the same Layer 2 (L2) network.&lt;/li&gt; 
 &lt;li&gt;Network latency between the Outposts servers must not exceed 5ms round trip time (RTT).&lt;/li&gt; 
 &lt;li&gt;A storage appliance that supports the iSCSI protocol. Credentials to manage the storage appliance initiator/target mappings. &lt;a href="https://aws.amazon.com/blogs/compute/new-simplifying-the-use-of-third-party-block-storage-with-aws-outposts/" target="_blank" rel="noopener noreferrer"&gt;See Simplifying the use of third-party block storage with AWS Outposts&lt;/a&gt; for more information.&lt;/li&gt; 
 &lt;li&gt;If you’re setting this up from an&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/sharing-outposts.html" target="_blank" rel="noopener noreferrer"&gt;Outposts consumer account&lt;/a&gt;, you must configure &lt;a href="https://aws.amazon.com/blogs/mt/monitoring-best-practices-for-aws-outposts/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch cross-account observability&lt;/a&gt;&amp;nbsp;between the consumer account and the Outposts owning account to view Outposts metrics in your consumer account.&lt;/li&gt; 
 &lt;li&gt;Create &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-launch-templates.html"&gt;launch templates&lt;/a&gt; for the EC2 instances that you want to protect. The launch wizard helps you create these.&lt;/li&gt; 
 &lt;li&gt;Credentials with permissions for &lt;a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt;, &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2&lt;/a&gt;, and (optional) &lt;a href="https://aws.amazon.com/secrets-manager/" target="_blank" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt; if authentication is required. IAM Permission Examples.md is provided in the repository.&lt;/li&gt; 
 &lt;li&gt;A Windows or Linux host that can access the storage appliance and your AWS account (management computer).&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://us-east-1.console.aws.amazon.com/marketplace/search/listing/prodview-ytzcqvandumqm" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts iPXE Amazon Machine Image&lt;/a&gt; (AMI) from the &lt;a href="https://www.google.com/url?sa=t&amp;amp;source=web&amp;amp;rct=j&amp;amp;opi=89978449&amp;amp;url=https://aws.amazon.com/marketplace&amp;amp;ved=2ahUKEwig5aGHmaGQAxVQwskDHUdYHS4QFnoECBIQAQ&amp;amp;usg=AOvVaw2kR1wc3JVnglAce4z8i-IH" target="_blank" rel="noopener noreferrer"&gt;AWS Marketplace&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://www.python.org/" target="_blank" rel="noopener noreferrer"&gt;Python&lt;/a&gt;&amp;nbsp;3.8 or later (recommended) is used to run the&amp;nbsp;init.py&amp;nbsp;script that dynamically creates a&amp;nbsp;CloudFormation&amp;nbsp;stack in the account specified as an input parameter.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/sdk-for-python/" target="_blank" rel="noopener noreferrer"&gt;AWS SDK for Python (Boto3)&lt;/a&gt; version 1.26.0 or later recommended.&lt;/li&gt; 
 &lt;li&gt;Operating system with iSCSI boot support (Windows Server 2022 and Red Hat Enterprise Linux 9 AMIs are provided).&lt;/li&gt; 
 &lt;li&gt;Internet access to AWS service endpoints for the private subnet hosting the recovery Lambda function.&lt;/li&gt; 
 &lt;li&gt;Download the repository &lt;a href="https://github.com/amznganske/ec2-outposts-autorestart_3Pstorage" target="_blank" rel="noopener"&gt;ec2-outposts-autorestart_3Pstorage&lt;/a&gt;.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Walkthrough&lt;/h2&gt; 
&lt;p&gt;The first step is to deploy an EC2 instance configured to boot from a volume on the third-party storage that is prepared with an OS boot image. This step uses the launch wizard portion of the solution.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Download and extract the OutpostServer_Recovery_3Pstorage repository to the management computer that has the AWS SDK for Python (Boto3) and Python installed.&lt;/li&gt; 
 &lt;li&gt;Run launch_wizard from the sample-outposts-third-party-storage-integration directory. You can run it interactively or provide arguments for Region, subnet, iPXE AMI, storage vendor, storage management IP address, and credentials.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-2.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25766 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-2.png" alt="" width="1428" height="740"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 2 – Running launch wizard&lt;/p&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;When prompted for a feature name, enter sanboot.&lt;/li&gt; 
 &lt;li&gt;For Guest OS type, enter Linux or Windows.&lt;/li&gt; 
 &lt;li&gt;When prompted “Do you want to continue with this unverified AMI?”, select Y.&lt;/li&gt; 
 &lt;li&gt;The launch wizard will provide a list of instance types available on the Outpost server associated with the subnet you specified. Enter the instance type that you want to use.&lt;/li&gt; 
 &lt;li&gt;The launch wizard will now prompt you for optional EC2 Key Pair, Security Group, and Instance Profile settings for the EC2 instance that you are launching.&lt;/li&gt; 
 &lt;li&gt;Next, the launch wizard prompts you to specify an instance name. Note that specifying an instance name is required to set up automated instance recovery because the instance name is used as part of the recovery process.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-3.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25767 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-3.png" alt="" width="1432" height="565"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 3 – Taking user input for variable values&lt;/p&gt; 
&lt;ol start="9"&gt; 
 &lt;li&gt;The launch wizard prompts for root volume size. This is the root volume that the iPXE AMI boots from. The default is a 1GB volume on the Outpost server instance storage.&lt;/li&gt; 
 &lt;li&gt;Next, the launch wizard prompts you to select which third-party storage controller you want to use based on the management IP address that you specified. This example uses NetApp, so I select a NetApp Storage Virtual Machine (SVM) named outpost_iscsi.&lt;/li&gt; 
 &lt;li&gt;If the connection to the storage array is successful and the protocol is available (iSCSI or NVMe over TCP) you are provided additional storage options for initiator group and logical unit number (LUN).&lt;/li&gt; 
 &lt;li&gt;In this example, we are using NetApp with iSCSI, so I can select an existing initiator group or create a new one.&lt;/li&gt; 
 &lt;li&gt;You can specify an existing initiator qualified name (IQN), or the launch wizard can generate a new one. &lt;strong&gt;IMPORTANT:&lt;/strong&gt; Make sure that IQNs are unique to each instance because duplicates can cause data corruption.&lt;/li&gt; 
 &lt;li&gt;Next, the launch wizard prompts you for the LUNs that you want to connect to this instance. For this example, I use a Windows Server 2022 boot volume that I already created on the NetApp storage array.&lt;/li&gt; 
 &lt;li&gt;You are now asked which storage array target interface you want to use for connecting to these LUNs.&lt;/li&gt; 
 &lt;li&gt;The launch wizard provides the capability to specify guest OS scripts to customize the OS after sanboot. Combining this capability with storage array cloning provides a streamlined process for deploying new instances.&lt;/li&gt; 
 &lt;li&gt;The launch wizard now displays the EC2 user data template that it generated for use with the iPXE AMI and asks if you want to proceed with launching the instance.&lt;/li&gt; 
 &lt;li&gt;After the EC2 instance is launched, select yes to proceed with automated instance recovery setup.&lt;/li&gt; 
&lt;/ol&gt; 
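&lt;p&gt;Because duplicate IQNs can cause data corruption, it can be worth checking uniqueness yourself when you supply existing IQNs. The following sketch generates a name in the standard &lt;code&gt;iqn.yyyy-mm.naming-authority:unique-name&lt;/code&gt; form and reports duplicates in a list. The helper names, the &lt;code&gt;com.example&lt;/code&gt; naming authority, and the date are placeholders, and this is not how the launch wizard generates its IQNs.&lt;/p&gt; 

```python
import uuid
from collections import Counter


def new_iqn(instance_name, naming_authority="com.example", year_month="2026-01"):
    """Build an IQN with a random suffix so that each instance gets its own name."""
    suffix = uuid.uuid4().hex[:12]
    return f"iqn.{year_month}.{naming_authority}:{instance_name}-{suffix}"


def duplicate_iqns(iqns):
    """Return IQNs that appear more than once, compared case-insensitively."""
    counts = Counter(iqn.lower() for iqn in iqns)
    return sorted(iqn for iqn, seen in counts.items() if seen > 1)


print(new_iqn("web-server"))
print(duplicate_iqns(["iqn.2026-01.com.example:a", "IQN.2026-01.com.example:a",
                      "iqn.2026-01.com.example:b"]))
```

&lt;p&gt;Comparing in lowercase is a conservative choice here: two names that differ only in letter case are treated as a collision.&lt;/p&gt; 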
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-4.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25768 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-4.png" alt="" width="1474" height="96"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 4 – Running launch template creation script&lt;/p&gt; 
&lt;h3&gt;Generating EC2 launch templates for recovery and failback&lt;/h3&gt; 
&lt;p&gt;In the second step, we are generating EC2 launch templates for the EC2 instance launched in step 1. Launch templates can be generated for the primary and secondary Outpost servers. The launch template for the secondary Outpost server can be used for automated or manual recovery of the EC2 instance. Failback to the primary Outpost server is manual using the primary launch template.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Select the instance that you want automated recovery for and select the subnet that you launched the instance in. This subnet represents the primary Outpost server that the instance is running on.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-5.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25769 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-5.png" alt="" width="891" height="809"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 5 – Selecting subnets for EC2 instance relaunch&lt;/p&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;When prompted to create a second launch template for Outpost server recovery, select yes, and then select to use the same instance (for recovery on different Outpost server).&lt;/li&gt; 
 &lt;li&gt;When you get a list of available subnets, select the subnet that’s associated with your secondary Outpost server. This is the server that the EC2 instance will be launched on if the EC2 StatusCheckFailed_Instance metric triggers the CloudWatch alarm.&lt;/li&gt; 
 &lt;li&gt;You will see both launch templates created successfully.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Deploying automated EC2 instance recovery&lt;/h3&gt; 
&lt;p&gt;The third step creates a CloudFormation template for monitoring, notifications, and automated recovery of the EC2 instance deployed in step 1. The CloudFormation template automatically captures the instance and secondary launch template information necessary for automatic recovery.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Select Y to set up automated recovery. This will create a CloudFormation stack.&lt;/li&gt; 
 &lt;li&gt;Provide a name and description for the CloudFormation stack.&lt;/li&gt; 
 &lt;li&gt;Select whether you want automated recovery or notification only. This provides flexibility to choose manual or automatic recovery based on whether you want to verify the primary Outpost server is down before initiating recovery.&lt;/li&gt; 
 &lt;li&gt;In the AWS CloudFormation console, monitor the CloudFormation stack creation process.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-6.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25770 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-6.png" alt="" width="1430" height="220"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 6 – CloudFormation stack creation in progress&lt;/p&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;After the CloudFormation stack creation completes, you have successfully deployed an EC2 instance that uses third-party storage for boot and data volumes on a primary Outpost server. You also created instance recovery capabilities by using the Amazon Outpost server automated recovery solution for third-party storage.&lt;/li&gt; 
 &lt;li&gt;You can check the state of the EC2 StatusCheckFailed_Instance alarm under the Alarms section in the Amazon CloudWatch console.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Considerations&lt;/h2&gt; 
&lt;p&gt;The logic discussed in this post relies on the secondary destination Outposts server having a connected service link. For more information about how to create a highly available service link connection for your Outposts servers, see the &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/aws-outposts-high-availability-design/anchor-connectivity.html" target="_blank" rel="noopener noreferrer"&gt;Networking section&lt;/a&gt; of the AWS Outposts High Availability Design and Architecture Considerations whitepaper.&lt;/p&gt; 
&lt;h2&gt;Clean up&lt;/h2&gt; 
&lt;p&gt;Confirm whether it is safe to terminate the Amazon EC2 instance that you launched with this walkthrough. The operating system and data volumes are on the third party storage, so EC2 instance termination only removes the iPXE AMI from the Outposts server instance storage. To clean up, complete the following steps.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Terminate the Amazon EC2 instance. Then, verify that the Instance state is &lt;strong&gt;Terminated&lt;/strong&gt; to ensure that the instance is not using Outposts server resources.&lt;/li&gt; 
 &lt;li&gt;Delete the Amazon EC2 Launch Templates associated with the Amazon EC2 instance that you terminated. The names of the launch templates that were automatically generated will start with ‘lt-‘, followed by the instance name and the instance id. If you generated a recovery launch template, it will have a ‘-recovery’ suffix in the name.&lt;/li&gt; 
 &lt;li&gt;Delete the AWS CloudFormation Stack. The Stack name will start with ‘autorestart-‘ followed by the Amazon EC2 instance name.&lt;/li&gt; 
 &lt;li&gt;Clean up your initiators, initiator group, and LUNs on the third party storage array.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;With the use of custom logic through AWS tools such as CloudFormation,&amp;nbsp;CloudWatch, Amazon SNS, and&amp;nbsp;AWS Lambda, you can architect for HA for stateful workloads on Outposts server. By implementing the custom logic in this post, you can automatically relaunch EC2 instances running on a source Outposts server to a secondary destination Outposts server if an instance fails, and connect to existing volumes on a shared storage appliance for recovery. This also reduces the downtime of your applications in the event of a hardware or service link failure. The code provided in this post can be further expanded upon to meet the unique needs of your workload.&lt;/p&gt; 
&lt;p&gt;While the use of&amp;nbsp;&lt;a href="https://aws.amazon.com/what-is/iac/" target="_blank" rel="noopener noreferrer"&gt;infrastructure-as-code (IaC)&lt;/a&gt;&amp;nbsp;can improve your application’s availability and be used to standardize deployments across multiple Outposts servers, it’s crucial to do regular failure drills to test the custom logic in place. This is to make sure that you understand your application’s expected behavior on relaunch in the event of a failure. To learn more about Outposts servers, visit&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/outposts/latest/server-userguide/what-is-outposts.html" target="_blank" rel="noopener noreferrer"&gt;the Outposts servers User Guide&lt;/a&gt;. Reach out to your AWS account team, or fill out this&amp;nbsp;&lt;a href="https://pages.awscloud.com/GLOBAL_PM_LN_outposts-features_2020084_7010z000001Lpcl_01.LandingPage.html" target="_blank" rel="noopener noreferrer"&gt;form&lt;/a&gt; to learn more about Outposts servers.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Optimizing Compute-Intensive Serverless Workloads with Multi-threaded Rust on AWS Lambda</title>
		<link>https://aws.amazon.com/blogs/compute/optimizing-compute-intensive-serverless-workloads-with-multi-threaded-rust-on-aws-lambda/</link>
					
		
		<dc:creator><![CDATA[Daniel Abib]]></dc:creator>
		<pubDate>Wed, 25 Feb 2026 12:49:44 +0000</pubDate>
				<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Serverless]]></category>
		<guid isPermaLink="false">aa533d430d7b0f6a9e003ec97815f3e0b4968101</guid>

					<description>Customers use 
&lt;a href="https://aws.amazon.com/lambda/"&gt;AWS Lambda&lt;/a&gt; to build Serverless applications for a wide variety of use cases, from simple API backends to complex data processing pipelines. Lambda's flexibility makes it an excellent choice for many workloads, and with support for up to 10,240 MB of memory, you can now tackle compute-intensive tasks that were previously challenging in a Serverless environment. When you configure a Lambda function's memory size, you allocate RAM and Lambda automatically provides proportional CPU power. When you configure 10,240 MB, your Lambda function has access to up to 6 vCPUs.</description>
										<content:encoded>&lt;p&gt;Customers use &lt;a href="https://aws.amazon.com/lambda/"&gt;AWS Lambda&lt;/a&gt; to build Serverless applications for a wide variety of use cases, from simple API backends to complex data processing pipelines. Lambda’s flexibility makes it an excellent choice for many workloads, and with support for up to 10,240 MB of memory, you can now tackle compute-intensive tasks that were previously challenging in a Serverless environment. When you configure a Lambda function’s memory size, you allocate RAM and Lambda automatically provides proportional CPU power. When you configure 10,240 MB, your Lambda function has access to up to 6 vCPUs.&lt;/p&gt; 
&lt;p&gt;However, there’s an important consideration that many developers discover: &lt;strong&gt;simply allocating more memory may not automatically make your function faster.&lt;/strong&gt; If your code runs sequentially, it will only use one vCPU regardless of how many are available. The remaining vCPUs sit idle while you’re still paying for the full memory allocation.&lt;/p&gt; 
&lt;p&gt;To benefit from Lambda’s multi-core capabilities, your code must explicitly implement concurrent processing through multi-threading or parallel execution. Without this, you’re paying for compute power you’re not using.&lt;/p&gt; 
&lt;p&gt;Rust provides excellent support for this pattern. The &lt;a href="https://github.com/aws/aws-lambda-rust-runtime"&gt;AWS Lambda Rust Runtime&lt;/a&gt; provides developers with a language that combines exceptional performance with built-in concurrency primitives. In this post, we show you how to implement multi-threading in Rust to achieve 4-6x performance improvements for CPU-intensive workloads.&lt;/p&gt; 
&lt;h2&gt;Our Test Workload: Why Bcrypt Password Hashing?&lt;/h2&gt; 
&lt;p&gt;For this analysis, we use &lt;strong&gt;bcrypt password hashing&lt;/strong&gt; as our CPU-intensive workload to evaluate multi-core scaling behavior. This choice is deliberate for several reasons:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Real-world relevance&lt;/strong&gt;: Bcrypt is commonly used in authentication systems, making our benchmarks practically relevant rather than synthetic.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Predictable CPU work&lt;/strong&gt;: Bcrypt with cost factor 10 provides approximately 100ms of pure CPU work per operation on typical hardware, creating a consistent and measurable baseline.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Embarrassingly parallel&lt;/strong&gt;: Each hash operation is completely independent, making it an ideal candidate for parallel processing without shared state or lock contention.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;CPU-bound&lt;/strong&gt;: Bcrypt does a fixed, predictable amount of CPU work for a given cost factor and is not memory- or I/O-bound, which isolates the performance characteristics we want to measure.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Throughout this post, we process batches of passwords and measure how multi-threading improves throughput as we scale from 1 to 6 vCPUs.&lt;/p&gt; 
&lt;h2&gt;Understanding Lambda’s vCPU Allocation&lt;/h2&gt; 
&lt;p&gt;AWS Lambda allocates CPU resources proportionally to the configured memory. According to &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html"&gt;AWS Lambda function memory documentation&lt;/a&gt;, at 1,769 MB a function has the equivalent of one vCPU.&lt;/p&gt; 
&lt;p style="text-align: center"&gt;&lt;a href="https://www.youtube.com/watch?v=aW5EtKHTMuQ&amp;amp;t=339s"&gt;&lt;strong&gt;vCPU Allocation by Memory&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt;&lt;/p&gt; 
&lt;table style="margin: 0px auto;height: 258px" width="335"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt; &lt;p style="text-align: center"&gt;Memory (MB)&lt;/p&gt; &lt;/td&gt; 
   &lt;td style="text-align: center"&gt;Approximate vCPUs&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;128 – 1,769&lt;/td&gt; 
   &lt;td style="text-align: center"&gt;~1&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;1,770 – 3,538&lt;/td&gt; 
   &lt;td style="text-align: center"&gt;~2&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;3,539 – 5,307&lt;/td&gt; 
   &lt;td style="text-align: center"&gt;~3&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;5,308 – 7,076&lt;/td&gt; 
   &lt;td style="text-align: center"&gt;~4&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;7,077 – 8,845&lt;/td&gt; 
   &lt;td style="text-align: center"&gt;~5&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;8,846 – 10,240&lt;/td&gt; 
   &lt;td&gt; &lt;p style="text-align: center"&gt;~6&lt;/p&gt; &lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The &lt;code&gt;num_cpus&lt;/code&gt; crate returns the number of logical CPUs visible to the Lambda environment, which may differ from the allocated vCPU share. At lower memory configurations, you may see 2 CPUs reported even though only 1 vCPU worth of compute time is allocated.&lt;/p&gt; 
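&lt;p&gt;The table above can be approximated as a function of the configured memory. The following is a minimal sketch that assumes one vCPU per 1,769 MB, rounded up and capped at 6, which reproduces the table rows. The names &lt;code&gt;approx_vcpus&lt;/code&gt;, &lt;code&gt;MB_PER_VCPU&lt;/code&gt;, and &lt;code&gt;MAX_VCPUS&lt;/code&gt; are illustrative and are not part of any AWS API.&lt;/p&gt; 

```rust
// Assumption taken from the table above: roughly one vCPU per 1,769 MB, at most 6.
const MB_PER_VCPU: u32 = 1769;
const MAX_VCPUS: u32 = 6;

/// Approximate vCPU count for a given Lambda memory setting in MB.
fn approx_vcpus(memory_mb: u32) -> u32 {
    // Integer ceiling division, then clamp to the 1..=6 range from the table.
    let rounded_up = (memory_mb + MB_PER_VCPU - 1) / MB_PER_VCPU;
    rounded_up.clamp(1, MAX_VCPUS)
}
```

&lt;p&gt;For example, &lt;code&gt;approx_vcpus(6144)&lt;/code&gt; evaluates to 4, which matches the 5,308 – 7,076 MB row. Treat the result as a sizing hint for a worker count rather than a guarantee of CPU capacity.&lt;/p&gt; 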
&lt;h2&gt;Solution Overview&lt;/h2&gt; 
&lt;p&gt;The solution consists of a Rust Lambda function that:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Receives a request specifying the number of items to process&lt;/li&gt; 
 &lt;li&gt;Detects available vCPUs and configures a thread pool accordingly&lt;/li&gt; 
 &lt;li&gt;Processes items in parallel using the &lt;a href="https://github.com/rayon-rs/rayon"&gt;Rayon library&lt;/a&gt; (a data parallelism library that allows you to convert sequential iterators into parallel ones with a &lt;code&gt;.par_iter()&lt;/code&gt; call)&lt;/li&gt; 
 &lt;li&gt;Returns performance metrics including duration and throughput&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/24/Picture1-6.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25731 size-large" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/24/Picture1-6-683x1024.png" alt="" width="683" height="1024"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;&lt;em&gt;Architecture Diagram: Lambda receives request, initializes Rayon thread pool based on &lt;code&gt;WORKER_COUNT&lt;/code&gt; environment variable, processes bcrypt hashes in parallel across multiple vCPUs, and returns results.&lt;/em&gt;&lt;/p&gt; 
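&lt;p&gt;To make the fan-out idea concrete without any external crates, here is a standard-library-only sketch that splits a batch into fixed chunks and runs each chunk on its own scoped thread. It is not the Rayon implementation used in this post: Rayon uses a work-stealing scheduler, while this sketch uses static chunking. The function &lt;code&gt;cpu_work&lt;/code&gt; is a hypothetical stand-in for a bcrypt hash, and all names are illustrative.&lt;/p&gt; 

```rust
use std::thread;

// Hypothetical stand-in for one CPU-heavy operation such as a bcrypt hash.
fn cpu_work(seed: u64) -> u64 {
    let mut x = seed;
    for _ in 0..10_000 {
        x = x.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    }
    x
}

// Baseline: process every item on the calling thread.
fn process_sequential(count: usize) -> u64 {
    let mut total = 0u64;
    for i in 0..count {
        total = total.wrapping_add(cpu_work(i as u64));
    }
    total
}

// Split the batch into at most `workers` chunks and run one thread per chunk.
// Returns the combined checksum and the number of threads that were spawned.
fn process_parallel(count: usize, workers: usize) -> (u64, usize) {
    let workers = workers.max(1);
    let mut items = Vec::new();
    for i in 0..count {
        items.push(i as u64);
    }
    // Ceiling division so every item lands in some chunk; never zero.
    let chunk_size = ((count + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let mut handles = Vec::new();
        for chunk in items.chunks(chunk_size) {
            handles.push(s.spawn(move || {
                let mut acc = 0u64;
                for item in chunk {
                    acc = acc.wrapping_add(cpu_work(*item));
                }
                acc
            }));
        }
        let threads_used = handles.len();
        let mut total = 0u64;
        for handle in handles {
            total = total.wrapping_add(handle.join().unwrap());
        }
        (total, threads_used)
    })
}
```

&lt;p&gt;With 20 items and 4 workers, each thread receives 5 items. With 20 items and 6 workers, static chunking produces chunks of 4 and therefore only 5 threads, which is one reason a work-stealing scheduler such as Rayon tends to balance uneven batches better.&lt;/p&gt; 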
&lt;h2&gt;Creating a Multi-threaded Rust Lambda Function&lt;/h2&gt; 
&lt;p&gt;Create a new Lambda project using Cargo Lambda:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;cargo lambda new rust-multithread-demo
cd rust-multithread-demo&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Dependencies&lt;/h3&gt; 
&lt;p&gt;Update &lt;code&gt;Cargo.toml&lt;/code&gt; with the necessary dependencies:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-toml"&gt;[package]
name = "rust-multithread-lambda"
version = "0.1.0"
edition = "2021"

[dependencies]
lambda_runtime = "1.0.0"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bcrypt = "0.15"
rayon = "1.7"
num_cpus = "1.16"

[profile.release]
opt-level = 3
lto = true
codegen-units = 1
strip = true&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The optimization flags in &lt;code&gt;[profile.release]&lt;/code&gt; reduce binary size and improve performance:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;opt-level = 3&lt;/code&gt;: Maximum optimization&lt;/li&gt; 
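 &lt;li&gt;&lt;code&gt;lto = true&lt;/code&gt;: Link-time optimization across crates, which typically shrinks the binary and can improve runtime performance&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;codegen-units = 1&lt;/code&gt;: Compile each crate as a single unit so the optimizer sees more code at once, at the cost of longer build times&lt;/li&gt; 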
 &lt;li&gt;&lt;code&gt;strip = true&lt;/code&gt;: Remove debug symbols&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Implementing the Lambda Entry Point&lt;/h3&gt; 
&lt;p&gt;First, let’s look at how we initialize the thread pool during cold start:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;src/main.rs&lt;/strong&gt;:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-rust"&gt;use lambda_runtime::{run, service_fn, Error, LambdaEvent};
mod handler;
use handler::{function_handler, get_worker_count, init_thread_pool, ProcessRequest};

#[tokio::main]
async fn main() -&amp;gt; Result&amp;lt;(), Error&amp;gt; {
    // Initialize Rayon thread pool at cold start (once per container lifecycle)
    init_thread_pool(get_worker_count());

    run(service_fn(|event: LambdaEvent&amp;lt;ProcessRequest&amp;gt;| async move {
        function_handler(event.payload).await
    }))
    .await
}&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Why initialize in &lt;code&gt;main()&lt;/code&gt; and not in the handler?&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Deterministic Configuration&lt;/strong&gt;: The thread pool is configured once per container, before any requests arrive. This prevents race conditions if multiple requests try to initialize concurrently.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Container Reuse&lt;/strong&gt;: Lambda containers can serve multiple requests. Initializing in &lt;code&gt;main()&lt;/code&gt; ensures the configuration is set during the cold start and persists for all subsequent warm invocations.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Thread pool setup happens during cold start (already counted as initialization time), not during request processing.&lt;/li&gt; 
&lt;/ol&gt; 
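&lt;p&gt;The "configure once, then reuse" behavior can be illustrated with the standard library alone. The sketch below models it with an atomic value where the first caller wins. It is an analogy for how a one-time global configuration behaves, not the Rayon API, and the names &lt;code&gt;POOL_SIZE&lt;/code&gt; and &lt;code&gt;init_pool_size&lt;/code&gt; are hypothetical.&lt;/p&gt; 

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// 0 means "not configured yet"; valid worker counts start at 1.
static POOL_SIZE: AtomicUsize = AtomicUsize::new(0);

/// Records the worker count on the first call and returns the value that is
/// actually in effect. Later calls cannot change it.
fn init_pool_size(workers: usize) -> usize {
    let requested = workers.clamp(1, 6);
    // The exchange only succeeds while the stored value is still 0.
    let _ = POOL_SIZE.compare_exchange(0, requested, Ordering::SeqCst, Ordering::SeqCst);
    POOL_SIZE.load(Ordering::SeqCst)
}
```

&lt;p&gt;Calling &lt;code&gt;init_pool_size(4)&lt;/code&gt; and then &lt;code&gt;init_pool_size(6)&lt;/code&gt; in the same process returns 4 both times, which mirrors why a warm container keeps the worker count it was given at cold start.&lt;/p&gt; 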
&lt;h3&gt;Implementing the Request Handler&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;src/handler.rs&lt;/strong&gt;:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-rust"&gt;use serde::{Deserialize, Serialize};
use std::env;
use std::sync::Once;
use std::time::Instant;
use std::collections::HashSet;
use std::sync::Mutex;
use rayon::prelude::*;

static INIT: Once = Once::new();

#[derive(Deserialize)]
pub struct ProcessRequest {
    count: usize,
    mode: String,
}

#[derive(Serialize)]
pub struct ProcessResponse {
    processed: usize,
    duration_ms: u128,
    mode: String,
    workers: usize,
    detected_cpus: usize,
    avg_ms_per_item: f64,
    memory_used_kb: u64,
    threads_used: usize, // Actual threads that processed items (proves multi-threading)
}

// CPU-intensive bcrypt hashing with cost factor 10
fn hash_password(password: &amp;amp;str) -&amp;gt; Result&amp;lt;String, bcrypt::BcryptError&amp;gt; {
    bcrypt::hash(password, 10)
}

// Process items one at a time (baseline for comparison)
fn process_sequential(items: Vec&amp;lt;String&amp;gt;) -&amp;gt; Result&amp;lt;(Vec&amp;lt;String&amp;gt;, usize), Box&amp;lt;dyn std::error::Error + Send + Sync&amp;gt;&amp;gt; {
    let results: Result&amp;lt;Vec&amp;lt;String&amp;gt;, _&amp;gt; = items
        .iter()
        .map(|item| hash_password(item))
        .collect();
    results
        .map(|r| (r, 1))
        .map_err(|e| Box::new(e) as Box&amp;lt;dyn std::error::Error + Send + Sync&amp;gt;)
}

// Process items in parallel using Rayon's work-stealing scheduler
// Thread pool size is configured once at cold start via init_thread_pool()
fn process_parallel(items: Vec&amp;lt;String&amp;gt;) -&amp;gt; Result&amp;lt;(Vec&amp;lt;String&amp;gt;, usize), Box&amp;lt;dyn std::error::Error + Send + Sync&amp;gt;&amp;gt; {
    let thread_ids: Mutex&amp;lt;HashSet&amp;lt;std::thread::ThreadId&amp;gt;&amp;gt; = Mutex::new(HashSet::new());

    let results: Result&amp;lt;Vec&amp;lt;String&amp;gt;, _&amp;gt; = items
        .par_iter()
        .map(|item| {
            thread_ids.lock().unwrap().insert(std::thread::current().id());
            hash_password(item)
        })
        .collect();

    let threads_used = thread_ids.lock().unwrap().len();
    results
        .map(|r| (r, threads_used))
        .map_err(|e| Box::new(e) as Box&amp;lt;dyn std::error::Error + Send + Sync&amp;gt;)
}

// Get worker count from env var or detect CPUs, clamped to 1-6
pub fn get_worker_count() -&amp;gt; usize {
    if let Ok(count_str) = env::var("WORKER_COUNT") {
        if let Ok(count) = count_str.parse::&amp;lt;usize&amp;gt;() {
            return count.clamp(1, 6);
        }
    }
    num_cpus::get().clamp(1, 6)
}

// Initialize Rayon global thread pool (only once per Lambda container)
pub fn init_thread_pool(workers: usize) {
    INIT.call_once(|| {
        let _ = rayon::ThreadPoolBuilder::new()
            .num_threads(workers)
            .build_global();
    });
}

// Read RSS memory from /proc/self/statm (Linux only; the * 4 below assumes 4 KiB pages)
fn get_memory_usage_kb() -&amp;gt; u64 {
    std::fs::read_to_string("/proc/self/statm")
        .ok()
        .and_then(|s| s.split_whitespace().nth(1)?.parse::&amp;lt;u64&amp;gt;().ok())
        .map(|pages| pages * 4)
        .unwrap_or(0)
}

// Main Lambda handler - processes items sequentially or in parallel
pub async fn function_handler(request: ProcessRequest) -&amp;gt; Result&amp;lt;ProcessResponse, Box&amp;lt;dyn std::error::Error + Send + Sync&amp;gt;&amp;gt; {
    if request.count == 0 { return Err("count must be greater than 0".into()); }
    if request.count &amp;gt; 1000 { return Err("count exceeds maximum of 1000 items".into()); }

    let items: Vec&amp;lt;String&amp;gt; = (0..request.count)
        .map(|i| format!("password_{:06}", i))
        .collect();

    let workers = get_worker_count();
    let mode = match request.mode.as_str() {
        "sequential" =&amp;gt; "sequential",
        "parallel"   =&amp;gt; "parallel",
        _            =&amp;gt; if workers &amp;gt; 1 { "parallel" } else { "sequential" },
    };

    let start = Instant::now();
    let (results, threads_used) = match mode {
        "sequential" =&amp;gt; process_sequential(items)?,
        _            =&amp;gt; process_parallel(items)?,
    };
    let duration_ms = start.elapsed().as_millis();

    Ok(ProcessResponse {
        processed: results.len(),
        duration_ms,
        mode: mode.to_string(),
        workers: if mode == "parallel" { workers } else { 1 },
        detected_cpus: num_cpus::get(),
        avg_ms_per_item: duration_ms as f64 / request.count as f64,
        memory_used_kb: get_memory_usage_kb(),
        threads_used,
    })
}&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Key Implementation Details&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Thread Pool Initialization at Cold Start&lt;/strong&gt;: The code initializes the thread pool in &lt;code&gt;main()&lt;/code&gt; before the Lambda runtime starts, not during request processing. This approach is designed to eliminate race conditions and provide deterministic behavior across all invocations.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Important Note&lt;/strong&gt;: The function initializes the thread pool once per container, so the pool size stays fixed for the lifetime of that container. The thread pool configuration retains its original value even if the &lt;code&gt;WORKER_COUNT&lt;/code&gt; environment variable is changed afterwards; a new value only takes effect in containers that start after the change. For production deployments, keep &lt;code&gt;WORKER_COUNT&lt;/code&gt; consistent for the function’s lifecycle.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Input Validation&lt;/strong&gt;: The handler validates that &lt;code&gt;count&lt;/code&gt; is between 1 and 1000 to prevent resource exhaustion.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Thread Tracking&lt;/strong&gt;: The &lt;code&gt;threads_used&lt;/code&gt; field proves multi-threading is working by tracking unique thread IDs during parallel processing. This provides empirical validation that work is distributed across multiple threads.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Memory Tracking&lt;/strong&gt;: The &lt;code&gt;memory_used_kb&lt;/code&gt; field reports RSS memory usage by reading &lt;code&gt;/proc/self/statm&lt;/code&gt;, providing visibility into actual memory consumption.&lt;/p&gt; 
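&lt;p&gt;The second field of &lt;code&gt;/proc/self/statm&lt;/code&gt; is the resident set size in pages, so converting it to kilobytes depends on the page size. The handler multiplies by 4, which assumes 4 KiB pages. The following sketch makes that assumption an explicit parameter; the function name &lt;code&gt;rss_kb_from_statm&lt;/code&gt; and the &lt;code&gt;page_size_bytes&lt;/code&gt; parameter are hypothetical and not part of the code in this post.&lt;/p&gt; 

```rust
/// Converts the contents of /proc/self/statm into resident memory in KiB.
/// Returns 0 if the second field (resident pages) is missing or not a number.
fn rss_kb_from_statm(contents: String, page_size_bytes: u64) -> u64 {
    let resident_pages: u64 = match contents.split_whitespace().nth(1) {
        Some(field) => field.parse().unwrap_or(0),
        None => 0,
    };
    resident_pages * page_size_bytes / 1024
}
```

&lt;p&gt;For a sample line such as &lt;code&gt;1000 250 100 1 0 300 0&lt;/code&gt; and a 4,096-byte page, the function returns 1000 (250 pages of 4 KiB each). Some Linux kernels use larger pages, in which case the fixed multiplier of 4 would under-report memory.&lt;/p&gt; 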
&lt;p&gt;&lt;strong&gt;Mode Selection&lt;/strong&gt;: The function supports three modes:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;sequential&lt;/code&gt;: Single-threaded processing&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;parallel&lt;/code&gt;: Multi-threaded processing using Rayon&lt;/li&gt; 
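 &lt;li&gt;&lt;code&gt;auto&lt;/code&gt;: Selects parallel processing when more than one worker is configured, otherwise sequential. Any other unrecognized mode value is handled the same way.&lt;/li&gt; 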
&lt;/ul&gt; 
&lt;h2&gt;Building and Deploying&lt;/h2&gt; 
&lt;p&gt;With the implementation complete, let’s compile the function for Lambda’s environment and deploy it to AWS.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;# Build for ARM64 (Graviton2) - recommended for cost efficiency
cargo lambda build --release --arm64

# Or build for x86_64
cargo lambda build --release --x86-64&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The build process produces a binary of approximately &lt;strong&gt;1.7 MB&lt;/strong&gt; (uncompressed) or &lt;strong&gt;0.8 MB&lt;/strong&gt; (zipped).&lt;/p&gt; 
&lt;h3&gt;Deploy to AWS&lt;/h3&gt; 
&lt;p&gt;Use Cargo Lambda to deploy the function with your desired memory configuration and worker count.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;# Deploy with 6144 MB memory (4 vCPUs) and 4 workers
cargo lambda deploy rust-multithread-lambda \
    --memory 6144 \
    --timeout 30 \
    --env-var WORKER_COUNT=4&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: To test different configurations, repeat the build and deploy commands with different &lt;code&gt;--memory&lt;/code&gt; values and &lt;code&gt;WORKER_COUNT&lt;/code&gt; settings for each configuration you want to benchmark. For comprehensive testing across architectures, build with &lt;code&gt;--arm64&lt;/code&gt;, deploy all memory configurations, then rebuild with &lt;code&gt;--x86-64&lt;/code&gt; and deploy again.&lt;/p&gt; 
&lt;h3&gt;Required IAM Permissions&lt;/h3&gt; 
&lt;p&gt;The Lambda execution role needs the following permissions:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-json"&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Test the Function&lt;/h3&gt; 
&lt;p&gt;After deployment, verify the function works correctly by invoking it with a test payload.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;aws lambda invoke \
    --function-name rust-multithread-lambda \
    --payload '{"count":20,"mode":"parallel"}' \
    --cli-binary-format raw-in-base64-out \
    response.json&lt;/code&gt;&lt;/pre&gt; 
&lt;h2&gt;Performance Benchmarks&lt;/h2&gt; 
&lt;p&gt;We tested multiple configurations on ARM64 (Graviton2) to measure the impact of multi-threading.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Test workload&lt;/strong&gt;: Processing 20 bcrypt password hashes (cost factor 10)&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Benchmark results may vary between runs due to factors such as Lambda placement, underlying hardware differences, and AWS infrastructure conditions. The numbers presented here are representative of typical performance observed across multiple test runs.&lt;/p&gt; 
&lt;h3&gt;Performance Results: ARM64 (Graviton2)&lt;/h3&gt; 
&lt;table width="100%"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Memory&lt;/td&gt; 
   &lt;td&gt;vCPUs&lt;/td&gt; 
   &lt;td&gt;Workers&lt;/td&gt; 
   &lt;td&gt;Avg (ms)&lt;/td&gt; 
   &lt;td&gt;P50 (ms)&lt;/td&gt; 
   &lt;td&gt;P95 (ms)&lt;/td&gt; 
   &lt;td&gt;P99 (ms)&lt;/td&gt; 
   &lt;td&gt;Min&lt;/td&gt; 
   &lt;td&gt;Max&lt;/td&gt; 
   &lt;td&gt;Speedup&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;1536 MB&lt;/td&gt; 
   &lt;td&gt;~1&lt;/td&gt; 
   &lt;td&gt;1&lt;/td&gt; 
   &lt;td&gt;1,885&lt;/td&gt; 
   &lt;td&gt;1,882&lt;/td&gt; 
   &lt;td&gt;1,898&lt;/td&gt; 
   &lt;td&gt;1,898&lt;/td&gt; 
   &lt;td&gt;1,877&lt;/td&gt; 
   &lt;td&gt;1,907&lt;/td&gt; 
   &lt;td&gt;1.00x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;2048 MB&lt;/td&gt; 
   &lt;td&gt;~2&lt;/td&gt; 
   &lt;td&gt;2&lt;/td&gt; 
   &lt;td&gt;1,334&lt;/td&gt; 
   &lt;td&gt;1,331&lt;/td&gt; 
   &lt;td&gt;1,341&lt;/td&gt; 
   &lt;td&gt;1,341&lt;/td&gt; 
   &lt;td&gt;1,324&lt;/td&gt; 
   &lt;td&gt;1,356&lt;/td&gt; 
   &lt;td&gt;1.41x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;4096 MB&lt;/td&gt; 
   &lt;td&gt;~3&lt;/td&gt; 
   &lt;td&gt;3&lt;/td&gt; 
   &lt;td&gt;685&lt;/td&gt; 
   &lt;td&gt;683&lt;/td&gt; 
   &lt;td&gt;699&lt;/td&gt; 
   &lt;td&gt;699&lt;/td&gt; 
   &lt;td&gt;669&lt;/td&gt; 
   &lt;td&gt;704&lt;/td&gt; 
   &lt;td&gt;2.75x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;6144 MB&lt;/td&gt; 
   &lt;td&gt;~4&lt;/td&gt; 
   &lt;td&gt;4&lt;/td&gt; 
   &lt;td&gt;463&lt;/td&gt; 
   &lt;td&gt;464&lt;/td&gt; 
   &lt;td&gt;467&lt;/td&gt; 
   &lt;td&gt;467&lt;/td&gt; 
   &lt;td&gt;453&lt;/td&gt; 
   &lt;td&gt;469&lt;/td&gt; 
   &lt;td&gt;4.07x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;8192 MB&lt;/td&gt; 
   &lt;td&gt;~5&lt;/td&gt; 
   &lt;td&gt;5&lt;/td&gt; 
   &lt;td&gt;338&lt;/td&gt; 
   &lt;td&gt;343&lt;/td&gt; 
   &lt;td&gt;345&lt;/td&gt; 
   &lt;td&gt;345&lt;/td&gt; 
   &lt;td&gt;325&lt;/td&gt; 
   &lt;td&gt;346&lt;/td&gt; 
   &lt;td&gt;5.57x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;10240 MB&lt;/td&gt; 
   &lt;td&gt;~6&lt;/td&gt; 
   &lt;td&gt;6&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;280&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;278&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;292&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;292&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;271&lt;/td&gt; 
   &lt;td&gt;293&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;6.73x&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
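&lt;p&gt;The Speedup column is the 1,536 MB single-worker average divided by the average for each configuration. The sketch below recomputes it from the ARM64 averages in the table above and adds a parallel efficiency figure (speedup divided by worker count). The function names are illustrative, and efficiency is an added metric that the original measurements do not report.&lt;/p&gt; 

```rust
// Average durations in ms from the ARM64 table above, as (workers, avg_ms).
const ARM64_AVG_MS: [(u32, f64); 6] = [
    (1, 1885.0),
    (2, 1334.0),
    (3, 685.0),
    (4, 463.0),
    (5, 338.0),
    (6, 280.0),
];

/// Speedup relative to the single-worker baseline.
fn speedup(baseline_ms: f64, measured_ms: f64) -> f64 {
    baseline_ms / measured_ms
}

/// Parallel efficiency: speedup divided by worker count (1.0 is linear scaling).
fn efficiency(baseline_ms: f64, measured_ms: f64, workers: u32) -> f64 {
    speedup(baseline_ms, measured_ms) / workers as f64
}
```

&lt;p&gt;With 6 workers the speedup is 1,885 / 280, or about 6.73, so the efficiency is slightly above 1.0. One plausible explanation is that the 1,536 MB baseline sits below the 1,769 MB threshold for a full vCPU, so the baseline itself runs on less than one vCPU of compute.&lt;/p&gt; 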
&lt;h3&gt;Performance Results: x86_64&lt;/h3&gt; 
&lt;table width="100%"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Memory&lt;/td&gt; 
   &lt;td&gt;vCPUs&lt;/td&gt; 
   &lt;td&gt;Workers&lt;/td&gt; 
   &lt;td&gt;Avg (ms)&lt;/td&gt; 
   &lt;td&gt;P50 (ms)&lt;/td&gt; 
   &lt;td&gt;P95 (ms)&lt;/td&gt; 
   &lt;td&gt;P99 (ms)&lt;/td&gt; 
   &lt;td&gt;Min&lt;/td&gt; 
   &lt;td&gt;Max&lt;/td&gt; 
   &lt;td&gt;Speedup&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;1536 MB&lt;/td&gt; 
   &lt;td&gt;~1&lt;/td&gt; 
   &lt;td&gt;1&lt;/td&gt; 
   &lt;td&gt;1,671&lt;/td&gt; 
   &lt;td&gt;1,675&lt;/td&gt; 
   &lt;td&gt;1,681&lt;/td&gt; 
   &lt;td&gt;1,681&lt;/td&gt; 
   &lt;td&gt;1,659&lt;/td&gt; 
   &lt;td&gt;1,684&lt;/td&gt; 
   &lt;td&gt;1.00x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;2048 MB&lt;/td&gt; 
   &lt;td&gt;~2&lt;/td&gt; 
   &lt;td&gt;2&lt;/td&gt; 
   &lt;td&gt;1,253&lt;/td&gt; 
   &lt;td&gt;1,249&lt;/td&gt; 
   &lt;td&gt;1,265&lt;/td&gt; 
   &lt;td&gt;1,265&lt;/td&gt; 
   &lt;td&gt;1,241&lt;/td&gt; 
   &lt;td&gt;1,294&lt;/td&gt; 
   &lt;td&gt;1.33x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;4096 MB&lt;/td&gt; 
   &lt;td&gt;~3&lt;/td&gt; 
   &lt;td&gt;3&lt;/td&gt; 
   &lt;td&gt;892&lt;/td&gt; 
   &lt;td&gt;891&lt;/td&gt; 
   &lt;td&gt;899&lt;/td&gt; 
   &lt;td&gt;899&lt;/td&gt; 
   &lt;td&gt;888&lt;/td&gt; 
   &lt;td&gt;900&lt;/td&gt; 
   &lt;td&gt;1.87x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;6144 MB&lt;/td&gt; 
   &lt;td&gt;~4&lt;/td&gt; 
   &lt;td&gt;4&lt;/td&gt; 
   &lt;td&gt;429&lt;/td&gt; 
   &lt;td&gt;425&lt;/td&gt; 
   &lt;td&gt;443&lt;/td&gt; 
   &lt;td&gt;443&lt;/td&gt; 
   &lt;td&gt;417&lt;/td&gt; 
   &lt;td&gt;449&lt;/td&gt; 
   &lt;td&gt;3.89x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;8192 MB&lt;/td&gt; 
   &lt;td&gt;~5&lt;/td&gt; 
   &lt;td&gt;5&lt;/td&gt; 
   &lt;td&gt;330&lt;/td&gt; 
   &lt;td&gt;323&lt;/td&gt; 
   &lt;td&gt;349&lt;/td&gt; 
   &lt;td&gt;349&lt;/td&gt; 
   &lt;td&gt;317&lt;/td&gt; 
   &lt;td&gt;358&lt;/td&gt; 
   &lt;td&gt;5.06x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;10240 MB&lt;/td&gt; 
   &lt;td&gt;~6&lt;/td&gt; 
   &lt;td&gt;6&lt;/td&gt; 
   &lt;td&gt;292&lt;/td&gt; 
   &lt;td&gt;292&lt;/td&gt; 
   &lt;td&gt;298&lt;/td&gt; 
   &lt;td&gt;298&lt;/td&gt; 
   &lt;td&gt;291&lt;/td&gt; 
   &lt;td&gt;298&lt;/td&gt; 
   &lt;td&gt;5.72x&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h3&gt;Architecture Comparison&lt;/h3&gt; 
&lt;table width="100%"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Memory&lt;/td&gt; 
   &lt;td&gt;Workers&lt;/td&gt; 
   &lt;td&gt;ARM64 Avg&lt;/td&gt; 
   &lt;td&gt;x86_64 Avg&lt;/td&gt; 
   &lt;td&gt;Diff %&lt;/td&gt; 
   &lt;td&gt;Faster Arch&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;1536 MB&lt;/td&gt; 
   &lt;td&gt;1&lt;/td&gt; 
   &lt;td&gt;1,885 ms&lt;/td&gt; 
   &lt;td&gt;1,671 ms&lt;/td&gt; 
   &lt;td&gt;-12.8%&lt;/td&gt; 
   &lt;td&gt;x86_64&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;2048 MB&lt;/td&gt; 
   &lt;td&gt;2&lt;/td&gt; 
   &lt;td&gt;1,334 ms&lt;/td&gt; 
   &lt;td&gt;1,253 ms&lt;/td&gt; 
   &lt;td&gt;-6.4%&lt;/td&gt; 
   &lt;td&gt;x86_64&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;4096 MB&lt;/td&gt; 
   &lt;td&gt;3&lt;/td&gt; 
   &lt;td&gt;685 ms&lt;/td&gt; 
   &lt;td&gt;892 ms&lt;/td&gt; 
   &lt;td&gt;+23.2%&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;6144 MB&lt;/td&gt; 
   &lt;td&gt;4&lt;/td&gt; 
   &lt;td&gt;463 ms&lt;/td&gt; 
   &lt;td&gt;429 ms&lt;/td&gt; 
   &lt;td&gt;-7.9%&lt;/td&gt; 
   &lt;td&gt;x86_64&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;8192 MB&lt;/td&gt; 
   &lt;td&gt;5&lt;/td&gt; 
   &lt;td&gt;338 ms&lt;/td&gt; 
   &lt;td&gt;330 ms&lt;/td&gt; 
   &lt;td&gt;-2.4%&lt;/td&gt; 
   &lt;td&gt;x86_64&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;10240 MB&lt;/td&gt; 
   &lt;td&gt;6&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;280 ms&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;292 ms&lt;/td&gt; 
   &lt;td&gt;+4.1%&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h3&gt;Key Observations&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Cold Start Performance&lt;/strong&gt;: Rust’s cold start initialization times are consistently between 19-28 ms across all memory configurations and architectures. ARM64 (&lt;a href="https://aws.amazon.com/pm/ec2-graviton/"&gt;Graviton2&lt;/a&gt;) shows slightly faster cold starts (19-23 ms) compared to x86_64 (26-29 ms). Both are significantly faster than interpreted runtimes because the binary is pre-compiled.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Near-Linear Scaling&lt;/strong&gt;: Both architectures achieve impressive speedups:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;ARM64: &lt;strong&gt;6.73x speedup&lt;/strong&gt; with 6 workers (above the nominal 6x, likely because the 1,536 MB single-worker baseline receives less than one full vCPU)&lt;/li&gt; 
 &lt;li&gt;x86_64: 5.72x speedup with 6 workers&lt;/li&gt; 
&lt;/ul&gt; 
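&lt;p&gt;As a quick check, the speedup figures can be recomputed from the average durations in the Architecture Comparison table. The following is a minimal Python sketch; the helper names &lt;code&gt;speedup&lt;/code&gt; and &lt;code&gt;efficiency&lt;/code&gt;, and the efficiency metric itself (speedup divided by worker count), are illustrative and not part of the benchmark code.&lt;/p&gt; 

```python
# Average durations (ms) from the Architecture Comparison table, keyed by worker count.
ARM64_AVG_MS = {1: 1885, 2: 1334, 3: 685, 4: 463, 5: 338, 6: 280}
X86_64_AVG_MS = {1: 1671, 2: 1253, 3: 892, 4: 429, 5: 330, 6: 292}


def speedup(avg_ms, workers):
    """Speedup relative to the single-worker run on the same architecture."""
    return avg_ms[1] / avg_ms[workers]


def efficiency(avg_ms, workers):
    """Parallel efficiency: speedup divided by the number of workers."""
    return speedup(avg_ms, workers) / workers


for name, table in (("ARM64", ARM64_AVG_MS), ("x86_64", X86_64_AVG_MS)):
    for workers in sorted(table):
        print(f"{name} {workers} workers: "
              f"{speedup(table, workers):.2f}x speedup, "
              f"{efficiency(table, workers):.0%} efficiency")
```

&lt;p&gt;An efficiency above 100% for ARM64 at 6 workers corresponds to the 6.73x result reported above.&lt;/p&gt; 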
&lt;p&gt;&lt;strong&gt;Latency Consistency&lt;/strong&gt;: The P95 and P99 metrics show excellent consistency:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;ARM64 at 6 vCPUs: P50=278ms, P95=292ms, P99=292ms (low variance)&lt;/li&gt; 
 &lt;li&gt;x86_64 at 6 vCPUs: P50=292ms, P95=298ms, P99=298ms&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Both architectures show consistent latency at maximum parallelization.&lt;/p&gt; 
&lt;h2&gt;Cost Analysis&lt;/h2&gt; 
&lt;p&gt;Let’s analyze the cost implications of different configurations for processing 20 bcrypt hashes.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Cost Comparison: ARM64 vs x86_64&lt;/strong&gt; (us-east-1, as of January 2026):&lt;/p&gt; 
&lt;table width="100%"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Config&lt;/td&gt; 
   &lt;td&gt;Memory&lt;/td&gt; 
   &lt;td&gt;Workers&lt;/td&gt; 
   &lt;td&gt;ARM64 Duration&lt;/td&gt; 
   &lt;td&gt;ARM64 Cost/1M&lt;/td&gt; 
   &lt;td&gt;x86_64 Duration&lt;/td&gt; 
   &lt;td&gt;x86_64 Cost/1M&lt;/td&gt; 
   &lt;td&gt;Cheaper Arch&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;1 vCPU&lt;/td&gt; 
   &lt;td&gt;1536 MB&lt;/td&gt; 
   &lt;td&gt;1&lt;/td&gt; 
   &lt;td&gt;1,885 ms&lt;/td&gt; 
   &lt;td&gt;$38.60&lt;/td&gt; 
   &lt;td&gt;1,671 ms&lt;/td&gt; 
   &lt;td&gt;$42.78&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;2 vCPU&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;2048 MB&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;1,334 ms&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;$36.46&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;1,253 ms&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;$42.77&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64 *&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;3 vCPU&lt;/td&gt; 
   &lt;td&gt;4096 MB&lt;/td&gt; 
   &lt;td&gt;3&lt;/td&gt; 
   &lt;td&gt;685 ms&lt;/td&gt; 
   &lt;td&gt;$37.47&lt;/td&gt; 
   &lt;td&gt;892 ms&lt;/td&gt; 
   &lt;td&gt;$60.80&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;4 vCPU&lt;/td&gt; 
   &lt;td&gt;6144 MB&lt;/td&gt; 
   &lt;td&gt;4&lt;/td&gt; 
   &lt;td&gt;463 ms&lt;/td&gt; 
   &lt;td&gt;$37.97&lt;/td&gt; 
   &lt;td&gt;429 ms&lt;/td&gt; 
   &lt;td&gt;$44.00&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;5 vCPU&lt;/td&gt; 
   &lt;td&gt;8192 MB&lt;/td&gt; 
   &lt;td&gt;5&lt;/td&gt; 
   &lt;td&gt;338 ms&lt;/td&gt; 
   &lt;td&gt;$36.94&lt;/td&gt; 
   &lt;td&gt;330 ms&lt;/td&gt; 
   &lt;td&gt;$45.10&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;6 vCPU&lt;/td&gt; 
   &lt;td&gt;10240 MB&lt;/td&gt; 
   &lt;td&gt;6&lt;/td&gt; 
   &lt;td&gt;280 ms&lt;/td&gt; 
   &lt;td&gt;$38.27&lt;/td&gt; 
   &lt;td&gt;292 ms&lt;/td&gt; 
   &lt;td&gt;$49.87&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h5&gt;* Lowest-cost configuration overall&lt;/h5&gt; 
&lt;p&gt;&lt;strong&gt;Cost Formulas:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;ARM64: (Memory in GB) × (Duration in seconds) × $0.0000133334&lt;/li&gt; 
 &lt;li&gt;x86_64: (Memory in GB) × (Duration in seconds) × $0.0000166667 (25% higher rate)&lt;/li&gt; 
&lt;/ul&gt; 
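&lt;p&gt;The formulas above can be sketched in Python as follows. This is a minimal illustration that computes only the duration charge; the function name &lt;code&gt;duration_cost_per_million&lt;/code&gt; is hypothetical. The figures in the table come out roughly $1 higher per million invocations than this calculation, which suggests that they include charges beyond the duration formula. That interpretation is an assumption.&lt;/p&gt; 

```python
# Per GB-second rates from the cost formulas above (us-east-1).
RATE_PER_GB_SECOND = {"arm64": 0.0000133334, "x86_64": 0.0000166667}


def duration_cost_per_million(arch, memory_mb, duration_ms):
    """Duration charge in USD for one million invocations (no request or other charges)."""
    memory_gb = memory_mb / 1024
    duration_s = duration_ms / 1000
    return memory_gb * duration_s * RATE_PER_GB_SECOND[arch] * 1_000_000


# 2 vCPU configuration from the table: 2048 MB, 1,334 ms (ARM64) and 1,253 ms (x86_64).
arm = duration_cost_per_million("arm64", 2048, 1334)
x86 = duration_cost_per_million("x86_64", 2048, 1253)
print(f"ARM64: ${arm:.2f}  x86_64: ${x86:.2f}")
```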
&lt;p&gt;&lt;strong&gt;Key Insight&lt;/strong&gt;: The &lt;strong&gt;2 vCPU ARM64 configuration provides the lowest cost&lt;/strong&gt; at $36.46 per million invocations while achieving 1.41x speedup. All ARM64 configurations remain cost-competitive ($36-$39 range) despite significant performance differences, demonstrating how increased throughput can offset higher memory costs.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Choosing the Right Configuration&lt;/strong&gt;:&lt;/p&gt; 
&lt;table width="100%"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;Priority&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;Recommended Config&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;Rationale&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Lowest Cost&lt;/td&gt; 
   &lt;td&gt;ARM64, 2048 MB, 2 workers&lt;/td&gt; 
   &lt;td&gt;$36.46/1M invocations, 1.41x speedup&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Balanced&lt;/td&gt; 
   &lt;td&gt;ARM64, 4096 MB, 3 workers&lt;/td&gt; 
   &lt;td&gt;$37.47/1M invocations, 2.75x speedup&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Low Latency&lt;/td&gt; 
   &lt;td&gt;ARM64, 10240 MB, 6 workers&lt;/td&gt; 
   &lt;td&gt;280ms avg, 6.73x speedup&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h2&gt;When to Use Multi-threaded Rust on Lambda&lt;/h2&gt; 
&lt;h3&gt;Recommended Use Cases&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Batch data processing&lt;/strong&gt;: Transform, validate, or enrich large datasets&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cryptographic operations&lt;/strong&gt;: Hashing, encryption, digital signatures&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Image/video processing&lt;/strong&gt;: Resize, transcode, analyze media files&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Scientific computing&lt;/strong&gt;: Simulations, data analysis, machine learning inference&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;High-volume workloads&lt;/strong&gt;: Functions invoked &amp;gt;100,000 times per day benefit from optimization&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;When to Consider Alternatives&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;I/O-bound operations&lt;/strong&gt;: Use async Rust instead of multi-threading for database queries or API calls&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Simple transformations&lt;/strong&gt;: Functions completing in &amp;lt;100ms rarely benefit from parallelization&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Low-volume workloads&lt;/strong&gt;: Development overhead may not be justified for &amp;lt;10,000 invocations per day&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Rapid prototyping&lt;/strong&gt;: Python or Node.js may be more appropriate when iteration speed is critical&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Cleanup&lt;/h2&gt; 
&lt;p&gt;To delete the resources created in this post:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;# Delete the Lambda function
aws lambda delete-function --function-name rust-multithread-lambda

# Delete the CloudWatch log group
aws logs delete-log-group --log-group-name /aws/lambda/rust-multithread-lambda&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If you deployed multiple configurations for testing, repeat the delete commands for each function name. If you deployed with the SAM template, delete the CloudFormation stack instead to remove everything at once:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;aws cloudformation delete-stack --stack-name rust-multithread-benchmark&lt;/code&gt;&lt;/pre&gt; 
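&lt;p&gt;If you prefer to script the per-function cleanup, the following Python sketch loops over a list of function names and deletes each function together with its log group. It assumes boto3 is installed and credentials are configured; the &lt;code&gt;cleanup&lt;/code&gt; helper and the function names in &lt;code&gt;main&lt;/code&gt; are hypothetical placeholders.&lt;/p&gt; 

```python
def cleanup(lambda_client, logs_client, function_names):
    """Delete each function and its /aws/lambda/NAME log group; return the deleted names."""
    deleted = []
    for name in function_names:
        lambda_client.delete_function(FunctionName=name)
        logs_client.delete_log_group(logGroupName=f"/aws/lambda/{name}")
        deleted.append(name)
    return deleted


def main():
    import boto3  # imported here so the helper above has no hard dependency

    # Hypothetical names: replace with the functions you actually deployed.
    names = ["rust-multithread-lambda-arm64", "rust-multithread-lambda-x86"]
    cleanup(boto3.client("lambda"), boto3.client("logs"), names)
```

&lt;p&gt;Call &lt;code&gt;main()&lt;/code&gt; when you’re ready to delete. The sketch does no error handling, so it stops at the first function or log group that doesn’t exist.&lt;/p&gt; 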
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;When you allocate more memory to your Lambda function, AWS provides proportionally more vCPUs—up to 6 vCPUs at 10,240 MB. However, &lt;strong&gt;sequential code only uses one vCPU&lt;/strong&gt;, leaving the additional compute power idle while you pay for the full allocation. Multi-threaded Rust with Rayon enables you to harness all available vCPUs for CPU-intensive workloads, transforming unused capacity into real performance gains.&lt;/p&gt; 
&lt;p&gt;Our benchmarks demonstrate this clearly:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Near-linear scaling&lt;/strong&gt;: ARM64 achieved &lt;strong&gt;6.73x speedup&lt;/strong&gt; with 6 workers—you get proportional returns on your vCPU investment&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Fast cold starts&lt;/strong&gt;: 19-28 ms initialization across all configurations, eliminating the cold start concerns often associated with compiled languages&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Consistent latency&lt;/strong&gt;: ARM64 at 6 vCPUs shows only a 14 ms gap between P50 (278 ms) and P99 (292 ms), which is critical for predictable response times&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cost efficiency&lt;/strong&gt;: ARM64 was cheaper than x86_64 in every tested configuration, by roughly 10% to 38% per million invocations, with better scaling at maximum parallelization&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;The key takeaway&lt;/strong&gt;: If your Lambda function performs CPU-intensive work and you’re allocating more than 1,769 MB of memory, you likely have multiple vCPUs available. Without multi-threading, those vCPUs sit idle. Rayon’s parallel iterators allow you to switch from sequential to parallel processing by changing &lt;code&gt;.iter()&lt;/code&gt; to &lt;code&gt;.par_iter()&lt;/code&gt; in your code.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Recommended starting point&lt;/strong&gt;: ARM64 with 4096 MB (3 workers) offers an excellent balance of cost and performance for most workloads. Scale up to 6 vCPUs for latency-critical applications, or down to 2 vCPUs for maximum cost savings.&lt;/p&gt; 
&lt;h2&gt;Additional Resources&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://github.com/awslabs/aws-lambda-rust-runtime"&gt;AWS Lambda Rust Runtime&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://www.cargo-lambda.info/"&gt;Cargo Lambda Documentation&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.rs/rayon/latest/rayon/"&gt;Rayon Data Parallelism Library&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html"&gt;AWS Lambda Memory and CPU Configuration&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/lambda/pricing/"&gt;AWS Lambda Pricing&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;em&gt;The complete sample code, SAM template, and test scripts from this post are available in the &lt;/em&gt;&lt;a href="https://github.com/aws-samples/sample-rust-multithread-lambda"&gt;&lt;em&gt;GitHub repository&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Amazon SageMaker AI now hosts NVIDIA Evo-2 NIM microservices</title>
		<link>https://aws.amazon.com/blogs/compute/amazon-sagemaker-ai-now-hosting-nvidia-evo-2-nim-microservices/</link>
					
		
		<dc:creator><![CDATA[Malvika Viswanathan]]></dc:creator>
		<pubDate>Tue, 24 Feb 2026 18:48:08 +0000</pubDate>
				<category><![CDATA[Amazon SageMaker AI]]></category>
		<category><![CDATA[Amazon SageMaker JumpStart]]></category>
		<category><![CDATA[Amazon SageMaker Unified Studio]]></category>
		<category><![CDATA[Announcements]]></category>
		<category><![CDATA[AWS Marketplace]]></category>
		<category><![CDATA[AWS Partner Network]]></category>
		<guid isPermaLink="false">ddd72291399cfef3d140f16b7df049b17d7a3ba9</guid>

					<description>This post is co-written with Neel Patel, Abdullahi Olaoye, Kristopher Kersten, and Aniket Deshpande from NVIDIA. Today, we’re excited to announce that the NVIDIA Evo-2 NIM microservice is now listed in Amazon SageMaker JumpStart. You can use this launch to deploy accelerated and specialized NIM microservices to build, experiment, and responsibly scale your drug discovery […]</description>
										<content:encoded>&lt;p&gt;&lt;em&gt;This post is co-written with Neel Patel, Abdullahi Olaoye, Kristopher Kersten, and Aniket Deshpande from NVIDIA.&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;Today, we’re excited to announce that the NVIDIA Evo-2 &lt;a href="https://www.nvidia.com/en-us/ai-data-science/products/nim-microservices/" target="_blank" rel="noopener noreferrer"&gt;NIM microservice&lt;/a&gt; is now listed in &lt;a href="https://aws.amazon.com/sagemaker/ai/jumpstart/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker JumpStart&lt;/a&gt;. You can use this launch to deploy accelerated and specialized NIM microservices to build, experiment, and responsibly scale your drug discovery workflows on &lt;a href="https://aws.amazon.com/" target="_blank" rel="noopener noreferrer"&gt;Amazon Web Services (AWS)&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;In this post, we demonstrate how to get started with these models using &lt;a href="https://aws.amazon.com/sagemaker/ai/studio/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker Studio&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;NVIDIA NIM microservices on AWS&lt;/h2&gt; 
&lt;p&gt;NVIDIA NIM integrates closely with AWS managed services, such as &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt;, &lt;a href="https://aws.amazon.com/eks/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Kubernetes Service (Amazon EKS)&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/sagemaker/ai/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker AI&lt;/a&gt;, to support deployment of generative AI models at scale. As part of &lt;a href="https://www.nvidia.com/en-us/data-center/products/ai-enterprise/" target="_blank" rel="noopener noreferrer"&gt;NVIDIA AI Enterprise&lt;/a&gt;, which is available in the &lt;a href="https://aws.amazon.com/marketplace/pp/prodview-ozgjkov6vq3l6?applicationId=AWSMPContessa&amp;amp;ref_=beagle&amp;amp;sr=0-2" target="_blank" rel="noopener noreferrer"&gt;AWS Marketplace&lt;/a&gt;, NVIDIA NIM is a set of microservices designed to accelerate the deployment of generative AI. These prebuilt containers support a broad spectrum of generative AI models, from open source community models, to &lt;a href="https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/" target="_blank" rel="noopener noreferrer"&gt;NVIDIA Nemotron&lt;/a&gt; and custom models. NIM microservices are deployed with just a few lines of code, or with a few actions in the SageMaker Studio console. Engineered to facilitate seamless generative AI inferencing at scale, NIM ensures that generative AI applications can be deployed on various AWS services.&lt;/p&gt; 
&lt;h2&gt;NVIDIA BioNeMo Evo 2 overview&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://www.nvidia.com/en-us/clara/biopharma/" target="_blank" rel="noopener noreferrer"&gt;NVIDIA BioNeMo&lt;/a&gt; is a platform of NIM microservices, developer tools, and AI models that accelerate building, adapting, and deploying biomolecular AI models for drug discovery. It packages curated training recipes, data loaders, and domain-optimized pretrained models for DNA, RNA, and proteins, alongside &lt;a href="https://developer.nvidia.com/gpu-accelerated-libraries" target="_blank" rel="noopener noreferrer"&gt;NVIDIA CUDA-X libraries&lt;/a&gt; such as &lt;a href="https://developer.nvidia.com/cuequivariance" target="_blank" rel="noopener noreferrer"&gt;NVIDIA cuEquivariance&lt;/a&gt;. These components power tasks such as 3D structure prediction, de novo design, virtual screening, docking, and property prediction with GPU-accelerated performance.&lt;/p&gt; 
&lt;p&gt;NVIDIA NIM microservices provide optimized, API-first inference that integrates directly into enterprise pipelines across on-premises and the cloud, providing scalable and secure deployment with faster time-to-market and lower Total Cost of Ownership (TCO). The Evo 2 NIM delivers a 40-billion parameter foundation model (FM) trained on a vast dataset of genomes that can be used to predict protein function, identify mutations, and accelerate bioengineering research. Furthermore, the Evo 2 NIM can be chained with other NIM microservices such as ESMFold to create end-to-end, containerized workflows that cut time-to-insight while streamlining deployment through consistent APIs.&lt;/p&gt; 
&lt;h2&gt;SageMaker Studio overview&lt;/h2&gt; 
&lt;p&gt;SageMaker Studio is a web-based integrated development environment (IDE) for machine learning (ML) that provides a unified visual interface for all of the tools that you need to complete each step of the ML development lifecycle. SageMaker Studio provides complete access, control, and visibility into each step of the ML workflow, from data preparation to model building, training, and deployment. &lt;/p&gt; 
&lt;p&gt;The key features of SageMaker Studio include:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Unified interface&lt;/strong&gt;: Access all SageMaker capabilities through a single, web-based visual interface&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Jupyter notebooks&lt;/strong&gt;: Fully managed Jupyter notebooks with pre-configured kernels for popular ML frameworks&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Model management&lt;/strong&gt;: Browse, deploy, and manage models from AWS Marketplace and other sources through an intuitive interface&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;: Share notebooks, experiments, and models with your team members&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Built-in security&lt;/strong&gt;: Integrated with &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; for secure access control&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cost management&lt;/strong&gt;: Monitor and control costs with built-in usage tracking and resource management tools&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Amazon SageMaker JumpStart overview&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/sagemaker/ai/jumpstart/"&gt;SageMaker JumpStart&lt;/a&gt; is a fully managed service that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is model hubs, which offer a vast catalog of pre-trained models, such as Mistral, for a variety of tasks. You can now discover and deploy Evo 2 NIM in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, so you can derive model performance and MLOps controls with &lt;a href="https://aws.amazon.com/sagemaker/ai/"&gt;Amazon SageMaker AI&lt;/a&gt; features such as &lt;a href="https://aws.amazon.com/sagemaker/ai/pipelines/"&gt;Amazon SageMaker Pipelines&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html"&gt;Amazon SageMaker Debugger&lt;/a&gt;, or container logs. The model is deployed in a secure AWS environment and in your VPC, helping to support data security for enterprise security needs.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;Before getting started with deployment, make sure that your IAM service role for SageMaker AI has the AmazonSageMakerFullAccess managed policy attached.&lt;/p&gt; 
&lt;p&gt;To deploy the NVIDIA NIM microservice successfully, also make sure that your IAM role has the following permissions, and that you have the authority to make AWS Marketplace subscriptions in the AWS account used:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;aws-marketplace:ViewSubscriptions&lt;/li&gt; 
 &lt;li&gt;aws-marketplace:Unsubscribe&lt;/li&gt; 
 &lt;li&gt;aws-marketplace:Subscribe&lt;/li&gt; 
&lt;/ul&gt; 
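&lt;p&gt;As an illustration, the three permissions above could be expressed as an identity-based IAM policy. The following Python sketch builds and prints such a policy document; the statement ID is an arbitrary label, and the wildcard resource is an assumption that you might want to scope further for your organization.&lt;/p&gt; 

```python
import json

# Identity-based policy granting the three AWS Marketplace actions listed above.
marketplace_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowMarketplaceSubscriptions",  # illustrative statement ID
            "Effect": "Allow",
            "Action": [
                "aws-marketplace:ViewSubscriptions",
                "aws-marketplace:Unsubscribe",
                "aws-marketplace:Subscribe",
            ],
            "Resource": "*",
        }
    ],
}

print(json.dumps(marketplace_policy, indent=2))
```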
&lt;p&gt;If your account is already subscribed to the model, then you can skip to the following Deploy section. Otherwise, start by subscribing to the model package and move to the Deploy section after.&lt;/p&gt; 
&lt;h2&gt;Subscribe to the model package&lt;/h2&gt; 
&lt;p&gt;To subscribe to the model package, complete the following steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the SageMaker JumpStart portal from the SageMaker AI page.&lt;/li&gt; 
 &lt;li&gt;Search for &lt;strong&gt;Evo 2 NIM&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose View model, and on the Model details page choose Subscribe. This will take you to the AWS Marketplace listing for the Evo 2 NIM.&lt;/li&gt; 
 &lt;li&gt;On the AWS Marketplace listing page, choose View purchase options, review the purchase terms, and choose Subscribe if you and your organization agree with the EULA, pricing, and support terms.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Continue&lt;/strong&gt; to proceed with the configuration, and choose an &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/" target="_blank" rel="noopener noreferrer"&gt;AWS Region&lt;/a&gt; where you have the service quota for the desired instance type.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;A product &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Resource Name (ARN)&lt;/a&gt; is displayed. This is the model package ARN that you need to specify while creating a deployable model using the SageMaker SDK.&lt;/p&gt; 
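&lt;p&gt;Because the ARN is copied by hand, it can help to check its shape before passing it to the SDK. The following Python sketch splits an ARN into the standard colon-separated fields and verifies that it points at a SageMaker model package. The helper name &lt;code&gt;parse_model_package_arn&lt;/code&gt; and the example ARN are hypothetical.&lt;/p&gt; 

```python
def parse_model_package_arn(arn):
    """Split a SageMaker model package ARN into its parts, or raise ValueError."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, resource_name = parts[5].partition("/")
    if resource_type != "model-package" or not resource_name:
        raise ValueError(f"Not a model package ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "name": resource_name,
    }


# Placeholder ARN for illustration only; use the ARN shown after you subscribe.
example = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-evo2-nim"
print(parse_model_package_arn(example))
```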
&lt;h2&gt;Option 1: Deploy the Evo 2 NIM using SageMaker Studio&lt;/h2&gt; 
&lt;p&gt;The following section outlines how to deploy the Evo 2 NIM using SageMaker Studio.&lt;/p&gt; 
&lt;h3&gt;Getting started with SageMaker Studio&lt;/h3&gt; 
&lt;p&gt;Begin by accessing the &lt;a href="https://aws.amazon.com/console/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt; and navigating to the SageMaker AI service. When you’re in the SageMaker AI console, locate &lt;strong&gt;Studio&lt;/strong&gt; in the left navigation panel and choose &lt;strong&gt;Open Studio&lt;/strong&gt; next to your user profile. If you haven’t set up a SageMaker Studio domain yet, then you must create a new domain and user profile first. This launches the web-based SageMaker Studio interface where you can manage all aspects of your ML workflow.&lt;/p&gt; 
&lt;h3&gt;Navigating to model packages&lt;/h3&gt; 
&lt;p&gt;Within SageMaker Studio, look for &lt;strong&gt;Models&lt;/strong&gt; in the left sidebar and choose the &lt;strong&gt;JumpStart base models&lt;/strong&gt; tab within the &lt;strong&gt;Models&lt;/strong&gt; interface. This section contains all available model packages in &lt;strong&gt;SageMaker JumpStart&lt;/strong&gt;, including those from the AWS Marketplace.&lt;/p&gt; 
&lt;h3&gt;Locating the Evo-2 NIM model&lt;/h3&gt; 
&lt;p&gt;Use the search functionality to find the NVIDIA Evo-2 NIM model by searching for terms such as “Evo-2” or “NVIDIA”. When you locate the model package in the filtered results, choose it to view the &lt;strong&gt;Model overview&lt;/strong&gt; page. This page provides an overview of the model and may include a &lt;strong&gt;Notebooks&lt;/strong&gt; tab with a sample notebook that shows how to use the NIM. You can choose &lt;strong&gt;Open in JupyterLab&lt;/strong&gt; to open the notebook in JupyterLab and use it as a starting point.&lt;/p&gt; 
&lt;h3&gt;Configuring the model deployment&lt;/h3&gt; 
&lt;p&gt;On the model package overview page, choose the &lt;strong&gt;Deploy&lt;/strong&gt; button on the top right to begin the deployment process. You must configure several important settings: provide a unique endpoint name (such as “Evo-2-nim-endpoint”), choose an appropriate instance type (ml.g6e.12xlarge is recommended for optimal performance), set the initial instance count (typically 1 for initial testing), and specify an endpoint configuration name. Review all of these settings carefully before proceeding.&lt;/p&gt; 
&lt;h3&gt;Initiating and monitoring the deployment&lt;/h3&gt; 
&lt;p&gt;After verifying your configuration settings, choose &lt;strong&gt;Deploy&lt;/strong&gt; to start the deployment process for creating a &lt;strong&gt;Real-time inference endpoint&lt;/strong&gt;. Navigate to the &lt;strong&gt;Deployments&lt;/strong&gt; section and then the &lt;strong&gt;Endpoints&lt;/strong&gt; section in the left sidebar to monitor the deployment progress. The endpoint status initially shows &lt;strong&gt;Creating&lt;/strong&gt; and typically takes 5–10 minutes to complete. You can track the progress and should see the status change to &lt;strong&gt;InService&lt;/strong&gt; once the deployment is successful.&lt;/p&gt; 
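&lt;p&gt;The same status check can be done programmatically. The following Python sketch polls the SageMaker &lt;code&gt;describe_endpoint&lt;/code&gt; API until the endpoint reports &lt;strong&gt;InService&lt;/strong&gt;. The helper name &lt;code&gt;wait_for_endpoint&lt;/code&gt;, the polling interval, and the attempt limit are illustrative assumptions.&lt;/p&gt; 

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_attempts=60):
    """Poll describe_endpoint until the endpoint is InService; raise on failure or timeout."""
    for _ in range(max_attempts):
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        time.sleep(poll_seconds)  # still Creating (or another transitional state)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_attempts} checks")
```

&lt;p&gt;For example, you could call &lt;code&gt;wait_for_endpoint(boto3.client("sagemaker"), "Evo-2-nim-endpoint")&lt;/code&gt;. boto3 also provides a built-in &lt;code&gt;endpoint_in_service&lt;/code&gt; waiter that serves the same purpose.&lt;/p&gt; 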
&lt;h3&gt;Testing and validation&lt;/h3&gt; 
&lt;p&gt;When your endpoint is deployed and shows the &lt;strong&gt;InService&lt;/strong&gt; status, you can optionally test it directly through the SageMaker Studio interface. Choose your deployed endpoint from the endpoints list to access the &lt;strong&gt;Endpoint summary&lt;/strong&gt; page. Scroll down and select the &lt;strong&gt;Playground&lt;/strong&gt; tab. If available, you will see two options: &lt;strong&gt;Test the sample request&lt;/strong&gt; and &lt;strong&gt;Use Python SDK example code&lt;/strong&gt;. You can use either option to validate the deployment with a sample DNA sequence. This confirms that the endpoint is working correctly before you integrate it into your applications.&lt;/p&gt; 
&lt;h2&gt;Option 2: Deploy Evo 2 using the SageMaker SDK&lt;/h2&gt; 
&lt;p&gt;In this section, we walk through deploying the Evo-2 NIM through the SageMaker SDK. Make sure that your account-level service quota for ml.g6e.12xlarge for endpoint usage is at least one instance. NVIDIA also provides a list of instance types that support deployment; refer to the AWS Marketplace listing for the model to see them. To request a service quota increase, refer to &lt;a href="https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html" target="_blank" rel="noopener noreferrer"&gt;AWS service quotas&lt;/a&gt;.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import sagemaker
import boto3
from sagemaker import ModelPackage, get_execution_role
import json
# Initialize SageMaker session and role
role = get_execution_role()
sagemaker_session = sagemaker.Session()
# Model Package ARN from your AWS Marketplace subscription
# Replace this with your actual Model Package ARN after subscription
model_package_arn = "arn:aws:sagemaker:&amp;lt;region&amp;gt;:&amp;lt;account-id&amp;gt;:model-package/Evo-2-nim-model"
# Create model from AWS Marketplace Model Package
model = ModelPackage(
    role=role, 
    model_package_arn=model_package_arn,
    sagemaker_session=sagemaker_session
)
# Deploy the model to an endpoint
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g6e.12xlarge",  # Using recommended NVIDIA GPU instance
    endpoint_name="Evo-2-endpoint",
    wait=True
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Run Inference with Evo 2 SageMaker endpoint&lt;/h3&gt; 
&lt;p&gt;After the endpoint is in service, you can send an inference request with a sample DNA sequence. NIM on SageMaker supports the OpenAI API inference request format. For an explanation of the supported parameters, refer to the &lt;a href="https://docs.api.nvidia.com/nim/reference/colabfold-msa-search-infer" target="_blank" rel="noopener noreferrer"&gt;Evo-2 API documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Real-time inference example&lt;/h3&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Reuse the Region of the current boto3 session
region = boto3.Session().region_name
sm_runtime = boto3.client("sagemaker-runtime", region_name=region)

# Request payload: a DNA prompt plus sampling parameters
generate_payload = {
    "sequence": "ACGTACGTACGT",
    "num_tokens": 100,
    "temperature": 0.7,
    "top_k": 3,
}

# Use the endpoint name chosen in the deploy step
response = sm_runtime.invoke_endpoint(
    EndpointName="Evo-2-endpoint",
    ContentType="application/json",
    Body=json.dumps(generate_payload),
)

result = json.loads(response["Body"].read())

print("Generated DNA:", result["sequence"])
print("Elapsed (ms):", result.get("elapsed_ms"))
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;Example output:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Generated DNA: ACGTACATATGTTCGTACATTCGCACAGACGCCATTTTGAAAAATGCTTTAAATGGATTCAGAATTGGTCAAAATGCATAAATCCATCAAAATTTTTTTC&lt;br&gt; Elapsed (ms): 10770&lt;/p&gt; 
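&lt;p&gt;Before sending a request, you might want to validate the prompt locally. The following Python sketch builds the same JSON body as the example above after checking the sequence. The helper name &lt;code&gt;build_generate_payload&lt;/code&gt; is hypothetical, and restricting prompts to the characters A, C, G, and T is an assumption rather than a documented requirement of the NIM.&lt;/p&gt; 

```python
import json

VALID_BASES = set("ACGT")  # assumption: prompts use only the four DNA bases


def build_generate_payload(sequence, num_tokens=100, temperature=0.7, top_k=3):
    """Validate a DNA prompt and return the JSON body for invoke_endpoint."""
    sequence = sequence.strip().upper()
    invalid = sorted(set(sequence) - VALID_BASES)
    if not sequence or invalid:
        raise ValueError(f"Invalid DNA prompt; unexpected characters: {invalid}")
    payload = {
        "sequence": sequence,
        "num_tokens": num_tokens,
        "temperature": temperature,
        "top_k": top_k,
    }
    return json.dumps(payload)


body = build_generate_payload("acgtacgtacgt")
print(body)
```

&lt;p&gt;You can pass the returned string as the &lt;code&gt;Body&lt;/code&gt; argument of &lt;code&gt;invoke_endpoint&lt;/code&gt;.&lt;/p&gt; 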
&lt;h2&gt;Cleaning up&lt;/h2&gt; 
&lt;p&gt;To avoid unwanted charges, complete the steps in this section to clean up your resources.&lt;/p&gt; 
&lt;h3&gt;Deleting the endpoint from SageMaker Studio&lt;/h3&gt; 
&lt;p&gt;In SageMaker Studio, navigate to the &lt;strong&gt;Endpoints&lt;/strong&gt; section in the left sidebar under &lt;strong&gt;Inference&lt;/strong&gt; to view all your active endpoints. Locate your Evo-2 NIM endpoint in the list and select it to open the endpoint details page. On this page, there is a &lt;strong&gt;Delete&lt;/strong&gt; button. Choose &lt;strong&gt;Delete&lt;/strong&gt; and confirm the deletion when prompted. The endpoint status changes to &lt;strong&gt;Deleting&lt;/strong&gt; and disappears from your endpoints list when the deletion is complete. This process typically takes a few minutes, and when it’s deleted the endpoint stops incurring charges immediately.&lt;/p&gt; 
&lt;h3&gt;Deleting the endpoint using the SageMaker SDK&lt;/h3&gt; 
&lt;p&gt;The SageMaker endpoint that you deployed incurs costs if you leave it running. Use the following code to delete the endpoint if you want to stop incurring charges. For more details, go to &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-delete-resources.html" target="_blank" rel="noopener noreferrer"&gt;Delete endpoints and resources&lt;/a&gt;.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;# Delete endpoint when done (important for cost management)
predictor.delete_endpoint()&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;The availability of NVIDIA Evo-2 NIM microservices on Amazon SageMaker JumpStart represents a significant advancement for researchers and organizations working in drug discovery. This solution provides a 40-billion parameter genomic foundation model that can be used to predict protein function, identify mutations, and accelerate bioengineering research. You can choose between two deployment options, SageMaker Studio or the SageMaker SDK, to match your workflow and technical expertise. The optimized performance of these NIM microservices, combined with the scalability and security of SageMaker, enables faster time-to-insight while streamlining the deployment of complex biomolecular AI models. We encourage you to try the Evo-2 NIM today and look out for future releases of the MSA-search and Boltz-2 NIMs to accelerate your drug discovery workflows and use the power of NVIDIA’s specialized microservices on AWS infrastructure.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Building fault-tolerant applications with AWS Lambda durable functions</title>
		<link>https://aws.amazon.com/blogs/compute/building-fault-tolerant-long-running-application-with-aws-lambda-durable-functions/</link>
					
		
		<dc:creator><![CDATA[Rahul Pisal]]></dc:creator>
		<pubDate>Fri, 06 Feb 2026 16:54:39 +0000</pubDate>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<guid isPermaLink="false">7a6c1a48050b7b4f20e0d12430c82dc3fe579fc1</guid>

					<description>Business applications often coordinate multiple steps that need to run reliably or wait for extended periods, such as customer onboarding, payment processing, or orchestrating large language model inference. These critical processes require completion despite temporary disruptions or system failures. Developers currently spend significant time implementing mechanisms to track progress, handle failures, and manage resources when […]</description>
										<content:encoded>&lt;p&gt;Business applications often coordinate multiple steps that need to run reliably or wait for extended periods, such as customer onboarding, payment processing, or orchestrating large language model inference. These critical processes require completion despite temporary disruptions or system failures. Developers currently spend significant time implementing mechanisms to track progress, handle failures, and manage resources when waiting for external events, shifting focus from business logic to undifferentiated tasks.&lt;/p&gt; 
&lt;p&gt;At re:Invent 2025,&amp;nbsp;&lt;a href="https://aws.amazon.com/" target="_blank" rel="noopener noreferrer"&gt;Amazon Web Services (AWS)&lt;/a&gt;&amp;nbsp;launched&amp;nbsp;&lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;&amp;nbsp;durable functions, a new capability extending Lambda’s event-driven programming model with built-in capabilities to build fault-tolerant multi-step applications and AI workflows using familiar programming languages. At its core, durable functions are regular Lambda functions, so your development and operational processes for Lambda continue to apply. However, when you create a Lambda function you can now enable durable execution, so that you can checkpoint progress, automatically recover from failures, and suspend execution for up to one year when waiting on long-running tasks, such as human-in-the-loop processes.&lt;/p&gt; 
&lt;h2&gt;How Lambda durable functions work&lt;/h2&gt; 
&lt;p&gt;When working with standard Lambda functions, your code runs from start to finish in a single invocation. If a failure occurs at any point during the execution, the entire function must be retried by the invoking event source. Any state that needs to be preserved between executions must be explicitly saved and retrieved. This is typically done by using external storage services such as&amp;nbsp;&lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt;&amp;nbsp;or&amp;nbsp;&lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service (Amazon S3)&lt;/a&gt;. Furthermore, you must typically guard against duplicate (concurrent) invocations of the same event and have a strategy to safely deploy updates while continuing to process events.&lt;/p&gt; 
&lt;p&gt;In contrast, with Lambda durable functions, developers use durable operations such as “Steps” and “Waits” in the event handler to checkpoint progress, handle failures, and suspend execution during wait periods without incurring compute charges for on-demand functions. Lambda automatically persists these durable operations, and any optional state returned from them, in a fully managed durable execution backend. If a failure occurs during the execution, or if your function resumes after being paused, Lambda invokes your function again and restores (replays) the previous state by running the event handler from the start while skipping over completed durable operations. To streamline this checkpoint/replay mechanism, you can use the Lambda durable execution SDK to wrap or annotate your event handler, which enhances the existing Lambda context with several new methods such as&amp;nbsp;&lt;code&gt;context.step()&lt;/code&gt;&amp;nbsp;and &lt;code&gt;context.wait()&lt;/code&gt;. Furthermore, you can use methods such as&amp;nbsp;&lt;code&gt;context.waitForCallback()&lt;/code&gt;&amp;nbsp;to wait on external jobs or asynchronous processes, such as “human-in-the-loop” scenarios. The execution is paused until a&amp;nbsp;&lt;code&gt;SendDurableExecutionCallbackSuccess&lt;/code&gt;&amp;nbsp;or&amp;nbsp;&lt;code&gt;SendDurableExecutionCallbackFailure&lt;/code&gt;&amp;nbsp;response is sent to the Lambda API.&lt;/p&gt; 
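&lt;p&gt;The checkpoint/replay behavior described above can be illustrated with a small, self-contained TypeScript sketch. It is a toy in-memory model and not the Lambda durable execution SDK: &lt;code&gt;ToyDurableContext&lt;/code&gt;, &lt;code&gt;toyHandler&lt;/code&gt;, and &lt;code&gt;runWithReplay&lt;/code&gt; are hypothetical names, steps are synchronous, and checkpoints live in a plain object instead of a managed backend.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;// Toy illustration only: this is NOT the Lambda durable execution SDK.
type StepFn = { (): string };
type Checkpoints = { [stepName: string]: string };

class ToyDurableContext {
  // Names of the steps whose code actually ran during this invocation
  executedSteps: string[] = [];
  checkpoints: Checkpoints;

  constructor(checkpoints: Checkpoints) {
    this.checkpoints = checkpoints;
  }

  // Run fn once; on replay, return the stored result instead of running fn again
  step(name: string, fn: StepFn): string {
    if (name in this.checkpoints) {
      return this.checkpoints[name];
    }
    const result = fn();
    this.checkpoints[name] = result; // checkpoint the result before moving on
    this.executedSteps.push(name);
    return result;
  }
}

// Hypothetical handler whose second step fails once with a temporary error
function toyHandler(context: ToyDurableContext, failOnce: { pending: boolean }): string {
  const profile = context.step("create-profile", function () {
    return "profile-123";
  });
  return context.step("send-welcome", function () {
    if (failOnce.pending) {
      failOnce.pending = false;
      throw new Error("temporary failure");
    }
    return "welcome sent to " + profile;
  });
}

// Invoke the handler, let it fail, then invoke it again from the start (the replay)
function runWithReplay(): { result: string; firstRun: string[]; secondRun: string[] } {
  const store: Checkpoints = {};
  const failOnce = { pending: true };
  const first = new ToyDurableContext(store);
  try {
    toyHandler(first, failOnce);
  } catch {
    // The first invocation ends here; "create-profile" is already checkpointed
  }
  const second = new ToyDurableContext(store);
  const result = toyHandler(second, failOnce);
  return { result: result, firstRun: first.executedSteps, secondRun: second.executedSteps };
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;In this sketch, the first invocation runs only &lt;code&gt;create-profile&lt;/code&gt; before failing. The second invocation starts the handler from the top, receives the stored result of &lt;code&gt;create-profile&lt;/code&gt; without running its code again, and runs only &lt;code&gt;send-welcome&lt;/code&gt;.&lt;/p&gt; 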
&lt;h2&gt;Getting started&lt;/h2&gt; 
&lt;p&gt;Use the&amp;nbsp;&lt;a href="https://aws.amazon.com/serverless/sam/" target="_blank" rel="noopener noreferrer"&gt;AWS Serverless Application Model (AWS SAM)&lt;/a&gt;&amp;nbsp;to create a new durable function by running&amp;nbsp;&lt;code&gt;sam init&lt;/code&gt;&amp;nbsp;and choosing an AWS Quick Start Template. Lambda durable functions are also supported by the&amp;nbsp;&lt;a href="https://aws.amazon.com/cdk/" target="_blank" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (AWS CDK)&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI)&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt;, and other infrastructure as code (IaC) frameworks such as Terraform.&lt;/p&gt; 
&lt;p&gt;Consider the following function, which performs user onboarding. First, it creates a user profile based on some data, then it sends out an email for verification and waits until the user either confirms the email address, or a 24-hour timeout is reached. Finally, it sends out a confirmation.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;import {
  DurableContext,
  withDurableExecution,
} from '@aws/durable-execution-sdk-js';
export const handler = withDurableExecution(
  async (event: OnboardingEvent, context: DurableContext) =&amp;gt; {
    try {    
      // Create user profile
      const profile = await context.step("create-profile", async () =&amp;gt;
        createUserProfile(event.email, event.name)
      );
      // Wait for email verification via callback
      const verification = await context.waitForCallback(
        "wait-for-email-verification",
        async (callbackId) =&amp;gt; {
          // Send email to user and pass callbackId
          await sendVerificationEmail(profile, callbackId);
        },
        {
          timeout: { hours: 24 } 
        }
      );
      // Send confirmation and welcome email
      const result = await context.step("complete-onboarding", async () =&amp;gt; {
        if (!verification || !verification.verified) {
          return { ...profile, status: 'failed' };
        }
        await sendWelcomeEmail(profile.email, profile.name);
        return { ...profile, status: 'active' };
      });
      return result;
    } catch (error) {
      // omitted 
    }
  }
);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Durable functions have built-in and fully customizable error handling for steps. For example, if the profile was successfully created and verified, but a temporary error occurred when sending out the confirmation, then the step is retried. The retry skips over any previously completed checkpoints, such as the profile creation and callback. Only the code within the send confirmation step is run again.&lt;/p&gt; 
&lt;p&gt;Next, update the AWS SAM template to include your durable function. You create a Lambda durable function by including the &lt;code&gt;DurableConfig&lt;/code&gt; setting for your function. Note that you currently cannot add a durable configuration to a function that was originally created without it. The&amp;nbsp;&lt;code&gt;ExecutionTimeout&lt;/code&gt;&amp;nbsp;setting defines the time after which the durable execution as a whole times out, which protects against runaway executions and deadlocks caused by application bugs. This setting is separate from the invocation timeout, which defines how long a single invocation can run. The maximum timeout for a single function invocation remains unchanged at 15 minutes. With Lambda durable functions, you typically see multiple invocations per durable execution, such as when using the wait capabilities in the SDK or automatic retries. You can set &lt;code&gt;ExecutionTimeout&lt;/code&gt; to up to one year when using asynchronous invocations.&lt;/p&gt; 
&lt;p&gt;The&amp;nbsp;&lt;code&gt;RetentionPeriodInDays&lt;/code&gt;&amp;nbsp;setting defines how long the execution data of a durable execution remains available to you after the execution completes.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
 
Resources:
  UserOnboardingFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: UserOnboardingFunction
      CodeUri: ./src
      Handler: index.handler
      Runtime: nodejs24.x
      Architectures:
        - x86_64
      MemorySize: 256
      Timeout: 60                 # Timeout for an individual invocation
      DurableConfig:              # This makes the function a durable function
        ExecutionTimeout: 90000   # 25h timeout for the durable execution overall
        RetentionPeriodInDays: 7
  UserOnboardingFunctionRole:
    Type: AWS::IAM::Role
    # omitted for brevity&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
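&lt;p&gt;The &lt;code&gt;ExecutionTimeout&lt;/code&gt; value in the template is expressed in seconds. As a small worked example, the following TypeScript sketch derives 90000 from the 24-hour callback timeout used in the handler plus one hour of headroom. The headroom is an assumption about how 25 hours was chosen, and &lt;code&gt;hoursToSeconds&lt;/code&gt; is a hypothetical helper, not part of any AWS library.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;// Hypothetical helper: convert hours to the seconds value used for ExecutionTimeout
function hoursToSeconds(hours: number): number {
  return hours * 60 * 60;
}

// 24-hour callback timeout plus one hour of headroom (assumed rationale for 25h)
const callbackTimeoutHours = 24;
const headroomHours = 1;
const executionTimeoutSeconds = hoursToSeconds(callbackTimeoutHours + headroomHours);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Keeping the overall execution timeout longer than the longest wait in the handler leaves time for the remaining steps to run after the callback arrives.&lt;/p&gt; 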
&lt;p&gt;You must include the necessary permissions for your function. For example, the&amp;nbsp;&lt;code&gt;AWSLambdaBasicDurableExecutionRole&lt;/code&gt; managed policy follows the principle of least privilege: it allows only the minimal&amp;nbsp;&lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt;&amp;nbsp;actions needed to create and retrieve checkpoints and to write logs. It therefore doesn’t include permissions to invoke other (durable) functions or manage callbacks. Refer to the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;&amp;nbsp;for more details.&lt;/p&gt; 
&lt;h2&gt;Testing locally&lt;/h2&gt; 
&lt;p&gt;Before deploying your function, you can test it locally using the AWS SAM CLI &lt;code&gt;sam local invoke&lt;/code&gt; command.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/05/compute-2471-image-1.png"&gt;&lt;/p&gt; 
&lt;p&gt;AWS SAM locally invokes your function and runs the event handler until it reaches the&amp;nbsp;&lt;code&gt;context.waitForCallback()&lt;/code&gt; call. To complete callbacks, AWS SAM offers new commands to interact with your durable functions. In this example, you send a&amp;nbsp;&lt;code&gt;Success&lt;/code&gt;&amp;nbsp;response to complete the callback. You can also include relevant data in the response. You can send the response directly by following the on-screen guide, or by running the following AWS SAM CLI command from another process.&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;sam local callback succeed &amp;lt;your-callback-id&amp;gt; --result '&amp;lt;your data&amp;gt;'&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/05/compute-2471-image-2.png"&gt;&lt;/p&gt; 
&lt;p&gt;To inspect an execution, you can use AWS SAM to retrieve the durable execution history of your function, which includes details about steps, callbacks, and wait durations, as shown in the following command.&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;sam local execution history &amp;lt;execution-arn&amp;gt;&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/05/compute-2471-image-3.png"&gt;&lt;/p&gt; 
&lt;p&gt;Depending on your use case, you can instead send a &lt;code&gt;Failure&lt;/code&gt; response to a callback and handle the resulting error in your code, for example by performing compensation logic in a subsequent step:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;sam local callback fail &amp;lt;your-callback-id&amp;gt; --error-data '&amp;lt;your data&amp;gt;'&lt;/code&gt;&lt;/p&gt; 
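&lt;p&gt;The compensation pattern mentioned above might look like the following TypeScript sketch. It runs against a synchronous stub instead of the real SDK context, so &lt;code&gt;StubContext&lt;/code&gt;, &lt;code&gt;onboard&lt;/code&gt;, and the &lt;code&gt;delete-profile&lt;/code&gt; step are hypothetical. It also assumes that a &lt;code&gt;Failure&lt;/code&gt; response surfaces in the handler as a thrown error, which you should confirm against the SDK documentation.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;// Stub standing in for the durable context; only the shape this sketch needs.
type Action = { (): string };

class StubContext {
  log: string[] = [];
  callbackOutcome: "success" | "failure";

  constructor(callbackOutcome: "success" | "failure") {
    this.callbackOutcome = callbackOutcome;
  }

  step(name: string, action: Action): string {
    this.log.push(name);
    return action();
  }

  // Assumption: a Failure response is raised in the handler as an error
  waitForCallback(name: string): string {
    this.log.push(name);
    if (this.callbackOutcome === "failure") {
      throw new Error("callback failed");
    }
    return "verified";
  }
}

function onboard(context: StubContext): string {
  const profileId = context.step("create-profile", function () {
    return "profile-123";
  });
  try {
    context.waitForCallback("wait-for-email-verification");
  } catch {
    // Compensation runs in its own step so that it is checkpointed like any other work
    return context.step("delete-profile", function () {
      return "rolled back " + profileId;
    });
  }
  return context.step("complete-onboarding", function () {
    return "active " + profileId;
  });
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;With a stub configured for &lt;code&gt;"failure"&lt;/code&gt;, the handler undoes the profile creation instead of completing onboarding. With &lt;code&gt;"success"&lt;/code&gt;, it skips the compensation step entirely.&lt;/p&gt; 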
&lt;p&gt;Now that you have verified that your function works as intended, deploy it to AWS using the&amp;nbsp;&lt;code&gt;sam&amp;nbsp;deploy&lt;/code&gt; command.&lt;/p&gt; 
&lt;h2&gt;Best practices and considerations&lt;/h2&gt; 
&lt;p&gt;Invoking a Lambda durable function requires a qualified&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Resource Name (ARN)&lt;/a&gt;,&amp;nbsp;that is, an ARN that ends with a version number or alias. We recommend that you don’t use the&amp;nbsp;&lt;code&gt;$LATEST&lt;/code&gt;&amp;nbsp;qualifier except for rapid prototyping or local testing. Using explicit versions ensures that replays always run the same code that started the execution, which keeps execution deterministic and prevents inconsistencies when you update your function code while executions are in progress.&lt;/p&gt; 
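&lt;p&gt;As an illustration, a qualified ARN is the function ARN with a version number or alias appended after a colon. The following TypeScript sketch builds one. &lt;code&gt;qualifyFunctionArn&lt;/code&gt; is a hypothetical helper, the account ID and alias are placeholders, and rejecting &lt;code&gt;$LATEST&lt;/code&gt; outright is a design choice for this sketch that is stricter than the recommendation above.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;// Hypothetical helper: append a version or alias to an unqualified function ARN.
// Unqualified format: arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME
function qualifyFunctionArn(functionArn: string, qualifier: string): string {
  if (functionArn.split(":").length !== 7) {
    throw new Error("Expected an unqualified Lambda function ARN");
  }
  if (qualifier === "$LATEST") {
    // Stricter than required: this sketch only accepts published versions or aliases
    throw new Error("Use a published version or alias instead of $LATEST");
  }
  return functionArn + ":" + qualifier;
}

// Placeholder account ID and alias for illustration
const qualifiedArn = qualifyFunctionArn(
  "arn:aws:lambda:us-east-1:123456789012:function:UserOnboardingFunction",
  "prod"
);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 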
&lt;p&gt;We recommend bundling the durable execution SDK with your function code using your preferred package manager. The SDKs are fast-moving, so you can update dependencies as new features become available.&lt;/p&gt; 
&lt;p&gt;There are&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-execution-sdk.html#durable-sdk-operations" target="_blank" rel="noopener noreferrer"&gt;other durable operations&lt;/a&gt;&amp;nbsp;in the Lambda durable functions SDK that you can use to build your application:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;waitForCondition()&lt;/code&gt;: Pauses the execution of your function until a condition is met, for example until a job that you poll through an API reaches a given status. For this to work, you provide a &lt;code&gt;waitStrategy&lt;/code&gt; and a check function that polls the status.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;parallel()&lt;/code&gt;: Runs multiple durable operations in parallel within the same function, with configurable options such as the maximum number of concurrent branches and desired failure behavior. This streamlines managing durability and checkpointing for simultaneous asynchronous actions.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;map()&lt;/code&gt;: Creates a durable operation and checkpoint for each item of an array, based on the provided mapping function. The items are processed concurrently.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;invoke()&lt;/code&gt;: Invokes another Lambda function and waits for its result. The SDK creates a checkpoint, invokes the target function, and resumes your function when the invocation completes. This enables function composition and workflow decomposition.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Refer to the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-execution-sdk.html" target="_blank" rel="noopener noreferrer"&gt;developer guide&lt;/a&gt;&amp;nbsp;for more details.&lt;/p&gt; 
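&lt;p&gt;To make the idea behind &lt;code&gt;waitForCondition()&lt;/code&gt; more concrete, the following TypeScript sketch models a check function and a wait strategy as plain synchronous functions. It does not call the SDK, and the shapes of &lt;code&gt;CheckFn&lt;/code&gt; and &lt;code&gt;WaitStrategy&lt;/code&gt;, the &lt;code&gt;pollUntil&lt;/code&gt; helper, and the simulated job are all assumptions made for illustration. Waiting is only added up, not actually performed.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;// Toy model of polling with a wait strategy; not the SDK's waitForCondition().
type CheckFn = { (attempt: number): string };
// Returns the number of seconds to wait before the next check, or null when done
type WaitStrategy = { (status: string, attempt: number): number | null };

function pollUntil(
  check: CheckFn,
  strategy: WaitStrategy,
  maxAttempts: number
): { status: string; attempts: number; waitedSeconds: number } {
  let waitedSeconds = 0;
  let attempt = 1;
  while (true) {
    const status = check(attempt);
    const delay = strategy(status, attempt);
    if (delay === null) {
      return { status: status, attempts: attempt, waitedSeconds: waitedSeconds };
    }
    if (attempt === maxAttempts) {
      throw new Error("condition not met after " + maxAttempts + " attempts");
    }
    // A durable wait would suspend the function here instead of counting seconds
    waitedSeconds += delay;
    attempt += 1;
  }
}

// Simulated job that reports SUCCEEDED on the third check
function checkJob(attempt: number): string {
  return attempt === 3 ? "SUCCEEDED" : "RUNNING";
}

// Exponential backoff: 5s, 10s, 20s, and so on until the job succeeds
function backoff(status: string, attempt: number): number | null {
  return status === "SUCCEEDED" ? null : 5 * Math.pow(2, attempt - 1);
}

const pollResult = pollUntil(checkJob, backoff, 10);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Here the job is checked three times, with simulated waits of 5 and 10 seconds between checks. In a durable function, those waits are the periods during which execution is suspended.&lt;/p&gt; 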
&lt;p&gt;Lambda compute charges apply to all invocations, including any replays. When using wait operations, the function suspends execution and, for on-demand functions, doesn’t incur duration charges until execution resumes. You’re also charged for durable operations, data written, and data retention. To learn more about Lambda durable functions pricing, refer to the&amp;nbsp;&lt;a href="https://aws.amazon.com/lambda/pricing/?trk=c4ea046f-18ad-4d23-a1ac-cdd1267f942c&amp;amp;sc_channel=el" target="_blank" rel="noopener noreferrer"&gt;Lambda pricing&lt;/a&gt;&amp;nbsp;page.&lt;/p&gt; 
&lt;p&gt;For the latest Region availability, visit the&amp;nbsp;&lt;a href="https://builder.aws.com/build/capabilities" target="_blank" rel="noopener noreferrer"&gt;AWS Capabilities by Region page&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;AWS Lambda durable functions extend the Lambda programming model to streamline building fault-tolerant, long-running applications using familiar programming patterns. You can use Lambda durable functions to write multi-step workflows in your preferred programming language, using built-in methods that automatically handle progress checkpointing and error recovery. This streamlines your architectures so that you can focus on your business logic, and it helps you optimize cost because you pay only for active compute time.&lt;/p&gt; 
&lt;p&gt;You can build durable functions for Python-based or Node.js-based Lambda functions using the Lambda API,&amp;nbsp;&lt;a href="https://aws.amazon.com/console/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt;, AWS CLI, AWS CloudFormation, AWS SAM, AWS SDK, and AWS CDK.&lt;/p&gt; 
&lt;p&gt;To get started, visit the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;Lambda Developer Guide&lt;/a&gt;&amp;nbsp;or watch the&amp;nbsp;&lt;a href="https://www.youtube.com/watch?v=XJ80NBOwsow" target="_blank" rel="noopener noreferrer"&gt;re:Invent breakout session&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Serverless ICYMI Q4 2025</title>
		<link>https://aws.amazon.com/blogs/compute/serverless-icymi-q4-2025/</link>
					
		
		<dc:creator><![CDATA[Julian Wood]]></dc:creator>
		<pubDate>Fri, 30 Jan 2026 15:23:57 +0000</pubDate>
				<category><![CDATA[Amazon API Gateway]]></category>
		<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon DynamoDB]]></category>
		<category><![CDATA[Amazon EC2 Container Registry]]></category>
		<category><![CDATA[Amazon Elastic Container Service]]></category>
		<category><![CDATA[Amazon EventBridge]]></category>
		<category><![CDATA[Amazon Simple Queue Service (SQS)]]></category>
		<category><![CDATA[Amazon Simple Storage Service (S3)]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[AWS Serverless Application Model]]></category>
		<category><![CDATA[AWS Step Functions]]></category>
		<category><![CDATA[Serverless]]></category>
		<category><![CDATA[Strands Agents]]></category>
		<category><![CDATA[serverless]]></category>
		<category><![CDATA[Serverless ICYMI]]></category>
		<guid isPermaLink="false">c010d77d402d1cc5648d23c95ebb47993b11000f</guid>

					<description>Stay current with the latest serverless innovations that can transform your applications. In this 31st quarterly recap, discover the most impactful AWS serverless launches, features, and resources from Q4 2025 that you might have missed.</description>
										<content:encoded>&lt;p&gt;Stay current with the latest serverless innovations that can transform your applications. In this 31st quarterly recap, discover the most impactful AWS serverless launches, features, and resources from Q4 2025 that you might have missed.&lt;/p&gt; 
&lt;p&gt;In case you missed our last ICYMI, check out what happened in &lt;a href="https://aws.amazon.com/blogs/compute/serverless-icymi-q3-2025/" target="_blank" rel="noopener noreferrer"&gt;Q3 2025&lt;/a&gt;.&lt;/p&gt; 
&lt;div id="attachment_25659" style="width: 596px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/2025-Q4-calendar.png"&gt;&lt;img aria-describedby="caption-attachment-25659" loading="lazy" class="size-full wp-image-25659" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/2025-Q4-calendar.png" alt="2025 Q4 calendar" width="586" height="148"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25659" class="wp-caption-text"&gt;2025 Q4 calendar&lt;/p&gt;
&lt;/div&gt; 
&lt;h2&gt;Serverless at re:Invent 2025&lt;/h2&gt; 
&lt;p&gt;This post covers the biggest serverless announcements from re:Invent 2025, highlights key feature updates that can improve your applications, and shares valuable resources to keep you informed.&lt;/p&gt; 
&lt;p&gt;AWS re:Invent 2025 had more than 60,000 in-person attendees and more than 2 million online viewers for the keynotes. The event featured 3,500 sessions from 3,000 speakers, which included information on 530 AWS service and feature announcements.&lt;/p&gt; 
&lt;div id="attachment_25665" style="width: 942px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Keynote-Igniting-the-serverless-movement.png"&gt;&lt;img aria-describedby="caption-attachment-25665" loading="lazy" class="size-full wp-image-25665" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Keynote-Igniting-the-serverless-movement.png" alt="Keynote Igniting the serverless movement" width="932" height="555"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25665" class="wp-caption-text"&gt;Keynote Igniting the serverless movement&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;The serverless content consisted of two tracks: Containers and Serverless (CNS) and Application Integration (API). These tracks included 150 unique sessions watched in-person by more than 16,000 attendees. There were developer-focused experiences including a &lt;a href="https://builder.aws.com/content/3515K374s531rNhcd2gu3HIV5BX/the-road-to-reinvent-hackathon-what-it-is-and-how-to-watch" target="_blank" rel="noopener noreferrer"&gt;Road to re:Invent Hackathon&lt;/a&gt;, AWS Builder Loft, and Builders Arena. &lt;a href="https://catalog.workshops.aws/serverlesspresso/en-US" target="_blank" rel="noopener noreferrer"&gt;Serverlesspresso&lt;/a&gt;, the coffee shop powered by serverless technology, operated in two locations during the event: the Expo Hall and the certification lounge.&lt;/p&gt; 
&lt;div id="attachment_25667" style="width: 1034px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Serverless-and-developer-community-photo.jpeg"&gt;&lt;img aria-describedby="caption-attachment-25667" loading="lazy" class="size-large wp-image-25667" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Serverless-and-developer-community-photo-1024x683.jpeg" alt="Serverless and developer community photo" width="1024" height="683"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25667" class="wp-caption-text"&gt;Serverless and developer community photo&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Find a curated list of serverless videos on &lt;a href="https://www.youtube.com/playlist?list=PLJo-rJlep0ECbKWbv1Ie-MdKFfSmqjmma" target="_blank" rel="noopener noreferrer"&gt;Serverless Land YouTube&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;AWS Lambda durable functions&lt;/h2&gt; 
&lt;p&gt;Managing state across multi-step serverless workflows has traditionally required complex external orchestration tools. &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;durable functions&lt;/a&gt; expand how developers can use Lambda. You can now build reliable multi-step applications and AI workflows directly within Lambda.&lt;/p&gt; 
&lt;div id="attachment_25662" style="width: 1034px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-durable-functions-code.png"&gt;&lt;img aria-describedby="caption-attachment-25662" loading="lazy" class="size-large wp-image-25662" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-durable-functions-code-1024x685.png" alt="AWS Lambda durable functions code" width="1024" height="685"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25662" class="wp-caption-text"&gt;AWS Lambda durable functions code&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Durable functions automatically checkpoint progress by saving the current state and completed steps at key points during execution. This allows them to suspend execution for up to one year during long-running tasks and recover from failures by resuming from the last checkpoint rather than restarting from the beginning, all without requiring additional infrastructure management.&lt;/p&gt; 
&lt;p&gt;Developers can now build in Python or TypeScript and wrap calls in steps with automatic retries and checkpointing. You can use waits to suspend execution for minutes, hours, or even up to a year without paying for idle compute. Durable functions use a replay mechanism to maintain state and handle failures gracefully. When recovering from a failure or resuming after a wait, Lambda re-runs your function code from the start and skips durable operations that already completed, reusing their checkpointed results, so state stays consistent without data loss. This also means you don’t need complex external orchestration tools for many use cases. This can be helpful for AI workflows and multi-step applications where you need reliable state management without managing external infrastructure.&lt;/p&gt; 
&lt;p&gt;For more information, &lt;a href="https://aws.amazon.com/blogs/aws/build-multi-step-applications-and-ai-workflows-with-aws-lambda-durable-functions/" target="_blank" rel="noopener noreferrer"&gt;read the launch blog post&lt;/a&gt; and watch the re:Invent Breakout Session video: &lt;a href="https://www.youtube.com/watch?v=XJ80NBOwsow" target="_blank" rel="noopener noreferrer"&gt;Deep Dive on AWS Lambda durable functions (CNS380)&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;AWS Lambda Managed Instances&lt;/h2&gt; 
&lt;p&gt;Lambda now offers &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" target="_blank" rel="noopener noreferrer"&gt;Lambda Managed Instances&lt;/a&gt;, a new compute option that combines &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2&lt;/a&gt; flexibility with fully managed infrastructure. AWS automatically handles instance provisioning, scaling, and maintenance while allowing access to the full range of EC2 capabilities, including Graviton4, network-optimized instances, and other specialized compute options.&lt;/p&gt; 
&lt;div id="attachment_25663" style="width: 829px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-Managed-Instances-configuration.png"&gt;&lt;img aria-describedby="caption-attachment-25663" loading="lazy" class="size-large wp-image-25663" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-Managed-Instances-configuration-819x1024.png" alt="AWS Lambda Managed Instances configuration" width="819" height="1024"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25663" class="wp-caption-text"&gt;AWS Lambda Managed Instances configuration&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Your functions run on dedicated EC2 capacity from your account, in your own &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Virtual Private Cloud (Amazon VPC)&lt;/a&gt;. AWS still manages the operational overhead, including OS patching, load balancing, and auto-scaling. This gives you access to specialized hardware options while maintaining the serverless operational model. You can further reduce costs by using EC2 pricing models, including &lt;a href="https://aws.amazon.com/savingsplans/compute-pricing/" target="_blank" rel="noopener noreferrer"&gt;Compute Savings Plans&lt;/a&gt; and &lt;a href="https://aws.amazon.com/ec2/pricing/reserved-instances/" target="_blank" rel="noopener noreferrer"&gt;Reserved Instances&lt;/a&gt; for Lambda workloads. Each instance can handle multiple concurrent requests, making this particularly valuable for high-volume, steady-state workloads where predictable pricing and specific hardware requirements matter.&lt;/p&gt; 
&lt;p&gt;For more information, read the &lt;a href="https://aws.amazon.com/blogs/aws/introducing-aws-lambda-managed-instances-serverless-simplicity-with-ec2-flexibility/" target="_blank" rel="noopener noreferrer"&gt;launch blog post&lt;/a&gt; and watch the re:Invent Breakout Session video: &lt;a href="https://www.youtube.com/watch?v=7mWa2HpCZfg" target="_blank" rel="noopener noreferrer"&gt;Lambda Managed Instances: EC2 Power with Serverless Simplicity (CNS382)&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Other Lambda announcements&lt;/h2&gt; 
&lt;p&gt;Multi-tenant SaaS applications face challenges like data leakage between tenants and noisy neighbor effects where one tenant’s workload impacts others. They also struggle with implementing custom isolation mechanisms. &lt;a href="https://aws.amazon.com/blogs/compute/building-multi-tenant-saas-applications-with-aws-lambdas-new-tenant-isolation-mode/" target="_blank" rel="noopener noreferrer"&gt;Tenant isolation mode&lt;/a&gt; addresses these by processing function invocations in separate execution environments for each tenant. This manages tenant-level compute environment isolation automatically.&lt;/p&gt; 
&lt;div id="attachment_25664" style="width: 905px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-tenant-isolation.png"&gt;&lt;img aria-describedby="caption-attachment-25664" loading="lazy" class="size-full wp-image-25664" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-tenant-isolation.png" alt="AWS Lambda tenant isolation" width="895" height="226"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25664" class="wp-caption-text"&gt;AWS Lambda tenant isolation&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Lambda adds &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/aws-lambda-provisioned-mode-sqs-esm/" target="_blank" rel="noopener noreferrer"&gt;Provisioned Mode&lt;/a&gt; for &lt;a href="https://aws.amazon.com/sqs/" target="_blank" rel="noopener noreferrer"&gt;Amazon SQS&lt;/a&gt; event-source mappings, providing predictable performance and reduced cold starts for high-throughput SQS processing workloads.&lt;/p&gt; 
&lt;p&gt;You can now send &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/10/aws-lambda-payload-size-256-kb-1-mb-invocations/" target="_blank" rel="noopener noreferrer"&gt;up to 1 MB of data in asynchronous Lambda invocations&lt;/a&gt;, increased from 256 KB, helping you build more complex data processing scenarios.&lt;/p&gt; 
&lt;p&gt;Lambda functions now support &lt;a href="https://aws.amazon.com/blogs/compute/aws-lambda-networking-over-ipv6/" target="_blank" rel="noopener noreferrer"&gt;IPv6 networking&lt;/a&gt;, so you don’t need &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html" target="_blank" rel="noopener noreferrer"&gt;NAT Gateways&lt;/a&gt; when accessing the internet or other AWS services from VPC-connected functions.&lt;/p&gt; 
&lt;div id="attachment_25666" style="width: 1034px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Lambda-internet-connectivity.png"&gt;&lt;img aria-describedby="caption-attachment-25666" loading="lazy" class="size-large wp-image-25666" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Lambda-internet-connectivity-1024x419.png" alt="Lambda internet connectivity through a NAT Gateway (IPv4) and Lambda internet connectivity through an egress-only internet gateway (IPv6)." width="1024" height="419"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25666" class="wp-caption-text"&gt;Lambda internet connectivity through a NAT Gateway (IPv4) and Lambda internet connectivity through an egress-only internet gateway (IPv6).&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/aws-lambda-rust/" target="_blank" rel="noopener noreferrer"&gt;Lambda Rust support&lt;/a&gt; is now generally available, moving from experimental status. This is backed by AWS Support and the Lambda availability SLA.&lt;/p&gt; 
&lt;p&gt;Lambda has expanded its runtime support by adding &lt;a href="https://aws.amazon.com/blogs/compute/python-3-14-runtime-now-available-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Python 3.14&lt;/a&gt;, &lt;a href="https://aws.amazon.com/blogs/compute/node-js-24-runtime-now-available-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Node.js 24&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/blogs/compute/aws-lambda-now-supports-java-25/" target="_blank" rel="noopener noreferrer"&gt;Java 25&lt;/a&gt; as both managed runtimes and container base images, providing access to the latest language features and ensuring long-term support.&lt;/p&gt; 
&lt;h2&gt;Amazon ECS&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/ecs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Container Service (Amazon ECS)&lt;/a&gt; Express Mode streamlines the deployment and management of containerized applications by automating the infrastructure setup that traditionally slows down developers.&lt;/p&gt; 
&lt;div id="attachment_25661" style="width: 798px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Amazon-ECS-Express-Mode-deployment.png"&gt;&lt;img aria-describedby="caption-attachment-25661" loading="lazy" class="size-large wp-image-25661" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Amazon-ECS-Express-Mode-deployment-788x1024.png" alt="Amazon ECS Express Mode deployment" width="788" height="1024"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25661" class="wp-caption-text"&gt;Amazon ECS Express Mode deployment&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;This means you can focus on building applications while deploying with confidence using AWS best practices. Express Mode lets you deploy production-ready containerized web applications and APIs with a single command. This automatically handles domains, networking, load balancing, &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; roles, and auto-scaling through simplified APIs. When your applications evolve and require advanced features, you can seamlessly configure and access the full capabilities of the resources, including Amazon ECS. Learn more from the &lt;a href="https://aws.amazon.com/blogs/aws/build-production-ready-applications-without-infrastructure-complexity-using-amazon-ecs-express-mode/" target="_blank" rel="noopener noreferrer"&gt;launch blog post&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Amazon ECS announced a public preview of a &lt;a href="https://aws.amazon.com/blogs/containers/accelerate-container-troubleshooting-with-the-fully-managed-amazon-ecs-mcp-server-preview/" target="_blank" rel="noopener noreferrer"&gt;fully managed MCP server&lt;/a&gt;, enabling AI-powered experiences for development and operations. The Model Context Protocol (MCP) server provides enterprise-grade capabilities like automatic updates and patching, centralized security through AWS IAM integration, comprehensive audit logging via &lt;a href="https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html" target="_blank" rel="noopener noreferrer"&gt;AWS CloudTrail&lt;/a&gt;, and the proven scalability, reliability, and support of AWS.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/ecr/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Container Registry (ECR)&lt;/a&gt; &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-ecr-managed-container-image-signing/" target="_blank" rel="noopener noreferrer"&gt;managed container image signing&lt;/a&gt; enhances your security posture and eliminates the operational overhead of setting up signing. Container image signing allows you to verify that images are from trusted sources. ECR automatically signs images as they are pushed using the identity of the entity pushing the image. Signing operations are logged through CloudTrail for full auditability.&lt;/p&gt; 
&lt;h2&gt;Amazon API Gateway&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/api-gateway/" target="_blank" rel="noopener noreferrer"&gt;Amazon API Gateway&lt;/a&gt;&amp;nbsp;allows you to improve the responsiveness of your REST APIs by &lt;a href="https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/" target="_blank" rel="noopener noreferrer"&gt;progressively streaming response payloads&lt;/a&gt; back to the client. With this new capability, you can use streamed responses to enhance user experience when building LLM-driven applications (such as AI agents and chatbots), improve time-to-first-byte (TTFB) performance for web and mobile applications, stream large files, and perform long-running operations while reporting incremental progress using protocols such as &lt;a href="https://en.wikipedia.org/wiki/Server-sent_events" target="_blank" rel="noopener noreferrer"&gt;server-sent events&lt;/a&gt; (SSE).&lt;/p&gt; 
&lt;div id="attachment_25661" style="width: 798px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/06/compute-2459-apigw-streaming-compar.gif"&gt;&lt;img aria-describedby="caption-attachment-25661" loading="lazy" class="aligncenter size-full wp-image-25083" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/06/compute-2459-apigw-streaming-compar.gif" alt="" width="1032" height="500"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25661" class="wp-caption-text"&gt;Amazon API Gateway streaming&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;API Gateway introduces &lt;a href="https://aws.amazon.com/blogs/compute/build-scalable-rest-apis-using-amazon-api-gateway-private-integration-with-application-load-balancer/" target="_blank" rel="noopener noreferrer"&gt;private integration&lt;/a&gt; with &lt;a href="https://aws.amazon.com/elasticloadbalancing/application-load-balancer/" target="_blank" rel="noopener noreferrer"&gt;Application Load Balancers (ALBs)&lt;/a&gt;. You can use this to expose your VPC-based applications securely through REST APIs without exposing your ALBs to the public internet.&lt;/p&gt; 
&lt;p&gt;You can also now configure &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-api-gateway-tls-security-rest-apis/" target="_blank" rel="noopener noreferrer"&gt;enhanced TLS security policies&lt;/a&gt; on API endpoints and custom domain names, providing you with greater control over the security posture of your APIs.&lt;/p&gt; 
&lt;h2&gt;Amazon EventBridge&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/eventbridge/" target="_blank" rel="noopener noreferrer"&gt;Amazon EventBridge&lt;/a&gt; introduced an &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/eventbridge-enhanced-visual-rule-builder" target="_blank" rel="noopener noreferrer"&gt;enhanced visual rule builder&lt;/a&gt; that helps developers discover and subscribe to events from custom applications and over 200 AWS services. The console-based interface integrates the EventBridge &lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-schema-registry.html" target="_blank" rel="noopener noreferrer"&gt;schema registry&lt;/a&gt; with a comprehensive event catalog and intuitive drag-and-drop canvas that simplifies building event-driven applications. Developers can browse and search through events with readily available sample payloads and schemas without having to hunt through individual service documentation. The schema-aware visual builder guides developers through creating event filter patterns and rules, reducing syntax errors and accelerating development time.&lt;/p&gt; 
&lt;p&gt;EventBridge also allows targeting &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-eventbridge-sqs-fair-queue-targets/" target="_blank" rel="noopener noreferrer"&gt;SQS fair queues&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;AWS Step Functions&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/step-functions" target="_blank" rel="noopener noreferrer"&gt;AWS Step Functions&lt;/a&gt; allows for enhanced local testing through the &lt;a href="https://aws.amazon.com/blogs/aws/accelerate-workflow-development-with-enhanced-local-testing-in-aws-step-functions/" target="_blank" rel="noopener noreferrer"&gt;TestState API&lt;/a&gt;, providing programmatic access to comprehensive testing capabilities without deploying to AWS. This helps you build automated test suites that validate your workflow definitions locally on your development machines. Test error handling patterns, data transformations, and mock service integrations using your preferred testing frameworks.&lt;/p&gt; 
&lt;p&gt;There is also a new &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/10/aws-step-functions-metrics-dashboard/" target="_blank" rel="noopener noreferrer"&gt;metrics dashboard&lt;/a&gt;, giving you visibility into your workflow operations at both the account and state machine levels.&lt;/p&gt; 
&lt;h2&gt;Other announcements&lt;/h2&gt; 
&lt;p&gt;Savings Plans flexible pricing model extends to AWS managed database services with the launch of &lt;a href="https://aws.amazon.com/blogs/aws/introducing-database-savings-plans-for-aws-databases/" target="_blank" rel="noopener noreferrer"&gt;Database Savings Plans&lt;/a&gt;. This helps reduce database costs by up to 35% when committing to a consistent amount of usage ($/hour) over a&amp;nbsp;1-year&amp;nbsp;term. Savings automatically apply each hour to eligible usage across supported database services, and additional usage beyond the commitment is billed at on-demand rates.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt; now supports &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-dynamodb-multi-attribute-composite-keys-global-secondary-indexes/" target="_blank" rel="noopener noreferrer"&gt;multi-attribute composite keys in global secondary indexes&lt;/a&gt;. You no longer need to concatenate values into synthetic keys manually, which sometimes results in the need to backfill data before adding new indexes. Instead, you can create primary keys using up to eight existing attributes, making it easier to model diverse access patterns and adapt to new query requirements.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; introduced &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-bedrock-agentcore-quality-evaluations-policy-controls/" target="_blank" rel="noopener noreferrer"&gt;AgentCore with quality evaluations and policy controls&lt;/a&gt; for deploying trusted AI agents at scale.&lt;/p&gt; 
&lt;p&gt;Bedrock also added &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-bedrock-18-fully-managed-open-weight-models/" target="_blank" rel="noopener noreferrer"&gt;18 fully managed open weight models&lt;/a&gt;, expanding AI model options for developers.&lt;/p&gt; 
&lt;p&gt;The &lt;a href="https://strandsagents.com/latest/" target="_blank" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt; is an open source framework that takes a model-driven approach to building and running AI agents in just a few lines of code. TypeScript support is &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/typescript-strands-agents-preview/" target="_blank" rel="noopener noreferrer"&gt;now available&lt;/a&gt; in preview so you can choose between Python and TypeScript for building Strands Agents.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-s3-vectors-generally-available/" target="_blank" rel="noopener noreferrer"&gt;Amazon S3 Vectors&lt;/a&gt; became generally available. S3 Vectors delivers purpose-built, cost-optimized vector storage for AI agents, inference, Retrieval Augmented Generation (RAG), and semantic search at billion-vector scale.&lt;/p&gt; 
&lt;h2&gt;Serverless blog posts&lt;/h2&gt; 
&lt;h3&gt;October&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/breaking-down-monolith-workflows-modularizing-aws-step-functions-workflows/" target="_blank" rel="noopener noreferrer"&gt;Breaking down monolith workflows: Modularizing AWS Step Functions workflows&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/introducing-aws-lambda-event-source-mapping-tools-in-the-aws-serverless-mcp-server/" target="_blank" rel="noopener noreferrer"&gt;Introducing AWS Lambda event source mapping tools in the AWS Serverless MCP Server&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/processing-amazon-s3-objects-at-scale-with-aws-step-functions-distributed-map-s3-prefix/" target="_blank" rel="noopener noreferrer"&gt;Processing Amazon S3 objects at scale with AWS Step Functions Distributed Map S3 prefix&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;November&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/aws-lambda-networking-over-ipv6/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda networking over IPv6&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/orchestrating-big-data-processing-with-aws-step-functions-distributed-map/" target="_blank" rel="noopener noreferrer"&gt;Orchestrating big data processing with AWS Step Functions Distributed Map&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/optimizing-nested-json-array-processing-using-aws-step-functions-distributed-map/" target="_blank" rel="noopener noreferrer"&gt;Optimizing nested JSON array processing using AWS Step Functions Distributed Map&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/improve-api-discoverability-with-the-new-amazon-api-gateway-portal/" target="_blank" rel="noopener noreferrer"&gt;Improve API discoverability with the new Amazon API Gateway Portal&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/" target="_blank" rel="noopener noreferrer"&gt;Building responsive APIs with Amazon API Gateway response streaming&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/python-3-14-runtime-now-available-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Python 3.14 runtime now available in AWS Lambda&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/building-serverless-applications-with-rust-on-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Building serverless applications with Rust on AWS Lambda&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/handle-unpredictable-processing-times-with-operational-consistency-when-integrating-asynchronous-aws-services-with-an-aws-step-functions-state-machine/" target="_blank" rel="noopener noreferrer"&gt;Handle unpredictable processing times with operational consistency when integrating asynchronous AWS services with an AWS Step Functions state machine&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/aws-lambda-now-supports-java-25/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda now supports Java 25&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/enhancing-api-security-with-amazon-api-gateway-tls-security-policies/" target="_blank" rel="noopener noreferrer"&gt;Enhancing API security with Amazon API Gateway TLS security policies&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/improving-throughput-of-serverless-streaming-workloads-for-kafka/" target="_blank" rel="noopener noreferrer"&gt;Improving throughput of serverless streaming workloads for Kafka&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/build-scalable-rest-apis-using-amazon-api-gateway-private-integration-with-application-load-balancer/" target="_blank" rel="noopener noreferrer"&gt;Build scalable REST APIs using Amazon API Gateway private integration with Application Load Balancer&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/serverless-strategies-for-streaming-llm-responses/" target="_blank" rel="noopener noreferrer"&gt;Serverless strategies for streaming LLM responses&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/building-multi-tenant-saas-applications-with-aws-lambdas-new-tenant-isolation-mode/" target="_blank" rel="noopener noreferrer"&gt;Building multi-tenant SaaS applications with AWS Lambda’s new tenant isolation mode&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/orchestrating-large-scale-document-processing-with-aws-step-functions-and-amazon-bedrock-batch-inference" target="_blank" rel="noopener noreferrer"&gt;Orchestrating large-scale document processing with AWS Step Functions and Amazon Bedrock batch inference&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/node-js-24-runtime-now-available-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Node.js 24 runtime now available in AWS Lambda&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Serverless Office Hours&lt;/h2&gt; 
&lt;p&gt;Join our livestream every Tuesday at 11 AM PT for live discussions, Q&amp;amp;A sessions, and deep dives into serverless technologies. Episodes are available on-demand at &lt;a href="https://serverlessland.com/office-hours" target="_blank" rel="noopener noreferrer"&gt;serverlessland.com/office-hours&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;October&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Oct 7 – &lt;a href="https://www.youtube.com/watch?v=XTVgHC7K2-s" target="_blank" rel="noopener noreferrer"&gt;Amazon API Gateway Routing Rules&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Oct 14 – &lt;a href="https://www.youtube.com/watch?v=eKN5TgxA4R8" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB Global Tables&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Oct 21 – &lt;a href="https://www.youtube.com/watch?v=ZGElhJmN_8o" target="_blank" rel="noopener noreferrer"&gt;Building agents with Amazon Bedrock AgentCore&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Oct 28 – &lt;a href="https://www.youtube.com/watch?v=mZ1xksrL8Lw" target="_blank" rel="noopener noreferrer"&gt;What’s new with Observability&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;November&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Nov 4 – &lt;a href="https://www.youtube.com/watch?v=fTOg4FRFEZA" target="_blank" rel="noopener noreferrer"&gt;Getting your AI spec right!&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Nov 11 – &lt;a href="https://www.youtube.com/watch?v=RlG71WUZa7Q" target="_blank" rel="noopener noreferrer"&gt;Running Swift in AWS Lambda&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Nov 18 – &lt;a href="https://www.youtube.com/watch?v=N3uo__CCXKg" target="_blank" rel="noopener noreferrer"&gt;What’s new in EventCatalog&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Nov 24 – &lt;a href="https://www.youtube.com/watch?v=CwECZ4SHwQ4" target="_blank" rel="noopener noreferrer"&gt;pre:Invent 2025&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;December&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Dec 9 – &lt;a href="https://www.youtube.com/watch?v=b5VtHydva1A" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda Managed Instances&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Dec 16 – &lt;a href="https://www.youtube.com/watch?v=giNnpHauWT0" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda durable functions&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Still looking for more?&lt;/h2&gt; 
&lt;p&gt;The&amp;nbsp;&lt;a href="http://aws.amazon.com/serverless" target="_blank" rel="noopener noreferrer"&gt;Serverless landing page&lt;/a&gt;&amp;nbsp;has overall information about building serverless applications. The&amp;nbsp;&lt;a href="https://aws.amazon.com/lambda/resources/?aws-lambda-resources-blog.sort-by=item.additionalFields.createdDate&amp;amp;aws-lambda-resources-blog.sort-order=desc" target="_blank" rel="noopener noreferrer"&gt;Lambda resources page&lt;/a&gt;&amp;nbsp;contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.&lt;/p&gt; 
&lt;p&gt;You can also&amp;nbsp;follow the Serverless Developer Advocacy team to see the latest news, follow conversations, and interact with the team.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Julian Wood:&amp;nbsp;&lt;a href="https://twitter.com/julian_wood" target="_blank" rel="noopener noreferrer"&gt;@julian_wood&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/julianrwood/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/julianrwood/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Eric Johnson:&amp;nbsp;&lt;a href="https://twitter.com/edjgeek" target="_blank" rel="noopener noreferrer"&gt;@edjgeek&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/singledigit/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/singledigit/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Gunnar Grosch: &lt;a href="https://x.com/GunnarGrosch" target="_blank" rel="noopener noreferrer"&gt;@GunnarGrosch&lt;/a&gt;, &lt;a href="https://se.linkedin.com/in/gunnargrosch" target="_blank" rel="noopener noreferrer"&gt;https://se.linkedin.com/in/gunnargrosch&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Erik Hanchet: &lt;a href="https://x.com/ErikCH" target="_blank" rel="noopener noreferrer"&gt;@ErikCH&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/erikhanchett/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/erikhanchett/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Salih Gueler: &lt;a href="https://x.com/salihgueler" target="_blank" rel="noopener noreferrer"&gt;@salihgueler&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/salihgueler/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/salihgueler/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Marcia Villalba:&amp;nbsp;&lt;a href="https://twitter.com/mavi888uy/" target="_blank" rel="noopener noreferrer"&gt;@mavi888uy&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/marciavillalba" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/marciavillalba&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;And finally, visit &lt;a href="http://serverlessland.com/" target="_blank" rel="noopener noreferrer"&gt;Serverless Land&lt;/a&gt;&amp;nbsp;for all your serverless needs.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>More room to build: serverless services now support payloads up to 1 MB</title>
		<link>https://aws.amazon.com/blogs/compute/more-room-to-build-serverless-services-now-support-payloads-up-to-1-mb/</link>
					
		
		<dc:creator><![CDATA[Anton Aleksandrov]]></dc:creator>
		<pubDate>Thu, 29 Jan 2026 22:16:14 +0000</pubDate>
				<category><![CDATA[Amazon EventBridge]]></category>
		<category><![CDATA[Amazon Simple Queue Service (SQS)]]></category>
		<category><![CDATA[Announcements]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Serverless]]></category>
		<guid isPermaLink="false">2de167a1befc19d6f6074428fc4217704a9fe6de</guid>

					<description>To support cloud applications that increasingly depend on rich contextual data, AWS is raising the maximum payload size from 256 KB to 1 MB for asynchronous AWS Lambda function invocations, Amazon SQS, and Amazon EventBridge. Developers can use this enhancement to build and maintain context-rich event-driven systems and reduce the need for complex workarounds such as data chunking or external large object storage.</description>
										<content:encoded>&lt;p&gt;To support cloud applications that increasingly depend on rich contextual data, AWS has raised the maximum payload size from 256 KB to 1 MB for asynchronous &lt;a href="https://aws.amazon.com/lambda/"&gt;AWS Lambda&lt;/a&gt; function invocations, &lt;a href="https://aws.amazon.com/sqs/"&gt;Amazon Simple Queue Service&lt;/a&gt; (Amazon SQS), and &lt;a href="https://aws.amazon.com/eventbridge/"&gt;Amazon EventBridge&lt;/a&gt;. Developers can use this enhancement to build and maintain context-rich event-driven systems and reduce the need for complex workarounds such as data chunking or external large object storage.&lt;/p&gt; 
&lt;h1&gt;Overview&lt;/h1&gt; 
&lt;p&gt;Modern cloud applications rely on context-rich, structured data to drive intelligent behavior. Large language model (LLM) prompts, telemetry signals, personalization data, machine learning (ML) outputs, and user interaction logs are no longer simple strings. Instead, they’re typically complex, nested JSON or YAML objects carrying meaningful context. Previously, developers working with serverless services such as Amazon SQS, Lambda (asynchronous invocations and Amazon SQS event-source mapping), or EventBridge had to carefully manage their data to fit within the 256 KB payload size limit. This commonly meant chunking larger payloads, externalizing payloads to object stores such as &lt;a href="https://aws.amazon.com/s3/"&gt;Amazon S3&lt;/a&gt;, or using &lt;a href="https://aws.amazon.com/blogs/compute/optimizing-network-footprint-in-serverless-applications/"&gt;data compression&lt;/a&gt;. These workarounds added complexity and latency, creating edge cases that were difficult to monitor and debug.&lt;/p&gt; 
&lt;p&gt;With the recent launches, you can now transmit payloads up to 1 MB, significantly reducing the need for complex data chunking and architectural workarounds. This increased capacity streamlines design patterns, reduces operational overhead, and makes event-driven systems more intuitive to build and maintain. Developers can now include richer data in single payloads—from detailed LLM prompts and full system states to comprehensive context and complete transaction histories.&lt;/p&gt; 
&lt;p&gt;The new 1 MB payload size limit applies to asynchronous Lambda function invocations, whether you trigger them using &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventsourcemapping.html"&gt;SQS event-source mapping&lt;/a&gt;, the &lt;a href="https://aws.amazon.com/cli/"&gt;AWS Command Line Interface&lt;/a&gt; (AWS CLI), &lt;a href="https://builder.aws.com/build/tools"&gt;AWS SDKs&lt;/a&gt;, the &lt;a href="https://docs.aws.amazon.com/lambda/latest/api/API_Invoke.html"&gt;Lambda Invoke API&lt;/a&gt;, or AWS services such as EventBridge. The increased limit also extends to all messages and events flowing through Amazon SQS queues and EventBridge Event Buses.&lt;/p&gt; 
&lt;h1&gt;Getting started&lt;/h1&gt; 
&lt;p&gt;There’s nothing you need to do to get started. This enhancement is automatically applied to all new and existing Lambda functions, SQS queues, and EventBridge Event Buses.&lt;/p&gt; 
&lt;p&gt;If you were previously chunking data at a 256 KB (or lower) threshold, then you might need to change your service configurations or business logic code to start using the new limit. For example, if you’ve explicitly set the Amazon SQS &lt;strong&gt;MaximumMessageSize&lt;/strong&gt; attribute, then you might need to adjust it to a new desired value. Larger payloads might also result in higher costs, as described in the following section.&lt;/p&gt; 
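&lt;p&gt;As an illustration of the kind of business logic that might need revisiting, the following minimal sketch checks a serialized event against the old and new thresholds. The function names and the sample event are hypothetical, and the sketch assumes the 1 MB limit equals 1,048,576 bytes and that only the JSON body counts toward it. Confirm the exact quota and what is counted (for example, message attributes) for each service you use.&lt;/p&gt;

```python
import json

# Assumption: 256 KB and 1 MB are treated as 262,144 and 1,048,576 bytes.
# Verify the exact quota for each service before relying on these constants.
OLD_LIMIT_BYTES = 256 * 1024
NEW_LIMIT_BYTES = 1024 * 1024


def payload_size_bytes(event: dict) -> int:
    """Return the size of the event once serialized as compact UTF-8 JSON."""
    return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))


def needs_chunking(event: dict, limit_bytes: int = NEW_LIMIT_BYTES) -> bool:
    """Return True when the serialized event would exceed the given limit."""
    return payload_size_bytes(event) > limit_bytes


# Hypothetical event of roughly 300 KB: too large for the old threshold,
# but small enough to send whole under the new one.
event = {"reviewId": "r-123", "context": "x" * 300_000}
print(needs_chunking(event, OLD_LIMIT_BYTES))
print(needs_chunking(event, NEW_LIMIT_BYTES))
```

&lt;p&gt;Keeping the threshold in a single constant, as in this sketch, makes it easier to raise the limit in one place instead of hunting for hard-coded values.&lt;/p&gt;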
&lt;h1&gt;Real-world example: rich event context in agentic event-driven architectures&lt;/h1&gt; 
&lt;p&gt;Event-driven architectures allow services to operate independently without centralized coordination. In these systems, comprehensive event context is essential. With the increased 1 MB payload limit, events can now carry more comprehensive data—from user profiles and order details to historical interactions. This enables services such as inventory, shipping, and notifications to act autonomously.&lt;/p&gt; 
&lt;p&gt;Consider the following example. In hospitality and quick-service industries, customer satisfaction depends on timely, thoughtful service recovery. When a guest submits negative feedback through a survey, review, or complaint form, service teams must gather context, interpret the issue, and craft a response. Traditionally, this meant manually piecing together visit logs, loyalty data, and prior complaints. Now, this can be fully automated using an AI agent powered by AWS serverless services and &lt;a href="https://aws.amazon.com/bedrock/"&gt;Amazon Bedrock&lt;/a&gt;, as shown in the following figure.&lt;/p&gt; 
&lt;p style="text-align: center"&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/09/25/compute-2424-img1.png"&gt;&lt;img loading="lazy" class="aligncenter size-full wp-image-24614" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/09/25/compute-2424-img1.png" alt="" width="1313" height="609"&gt;&lt;/a&gt;Figure 1: Customer feedback processing pipeline&lt;/p&gt; 
&lt;p&gt;The workflow:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Receive&lt;/strong&gt;: A new review is submitted through the Review application and emitted as an event to an EventBridge Event Bus.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Detect&lt;/strong&gt;: The Event Bus delivers the event to the downstream Feedback analysis agent. The agent, running in a Lambda function, recognizes the review as a low rating or a complaint.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Enrich&lt;/strong&gt;: Using its attached MCP tools, the agent collects the guest’s visit metadata, booking details, loyalty activity, and complaint history, and combines them into a single structured JSON payload (up to 1 MB).&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Queue&lt;/strong&gt;: The payload is sent to an SQS queue for further asynchronous processing by downstream components.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Generate&lt;/strong&gt;: A separate Lambda function polls messages from Amazon SQS and invokes an Amazon Bedrock model to analyze the full complaint context, draft a personalized response, suggest a gesture (such as a refund or credit), and classify issue severity.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Deliver&lt;/strong&gt;: The message is logged and sent to the customer, and to the service team for further analysis.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;This use case demonstrates the importance of rich context: current and previous visit details, loyalty tier, prior interactions, and feedback history. Previously, teams had to offload pieces of context to Amazon S3 and reference them externally, adding latency and architectural complexity. The new 1 MB payload size means that all this information can be transported together, improving the efficiency of the serverless agentic workflow and streamlining maintenance.&lt;/p&gt; 
&lt;h1&gt;Best practices when using large payloads&lt;/h1&gt; 
&lt;p&gt;The following sections outline best practices that you should apply when using larger payloads.&lt;/p&gt; 
&lt;h2&gt;Performance considerations&lt;/h2&gt; 
&lt;p&gt;Monitor Lambda function memory usage carefully when working with larger payloads, because parsing and processing complex JSON objects can increase memory usage and execution duration. Test your systems thoroughly under load, especially for high-throughput applications, by benchmarking with realistic payload sizes and traffic patterns. Although the payload limit has increased to 1 MB, the Lambda 15-minute timeout and memory limits remain unchanged. When applicable, you can &lt;a href="https://aws.amazon.com/blogs/compute/optimizing-network-footprint-in-serverless-applications/"&gt;use compression&lt;/a&gt; to process even larger datasets efficiently, but remember to account for the added CPU overhead of compression and decompression in your performance calculations. Read the &lt;a href="https://aws.amazon.com/blogs/compute/monitoring-best-practices-for-event-delivery-with-amazon-eventbridge/"&gt;Monitoring best practices for event delivery with Amazon EventBridge&lt;/a&gt; post for more best practices on tuning the performance of your event-driven architectures.&lt;/p&gt; 
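&lt;p&gt;The compression approach mentioned above could look like the following minimal sketch, which uses gzip and Base64 from the Python standard library. The function names are hypothetical, and the choice of gzip with Base64 is an assumption made for illustration, not a method prescribed by the services. Base64 keeps the result safe to carry as text, but it adds roughly one third to the compressed size, so measure the result against your real payloads.&lt;/p&gt;

```python
import base64
import gzip
import json


def compress_payload(event: dict) -> str:
    """Serialize the event as JSON, gzip it, and return Base64 text."""
    raw = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_payload(encoded: str) -> dict:
    """Reverse compress_payload: Base64-decode, gunzip, and parse the JSON."""
    return json.loads(gzip.decompress(base64.b64decode(encoded)))


# Hypothetical, highly repetitive event; real data will compress less well.
event = {"guestId": "g-42", "history": ["late check-in"] * 5000}
encoded = compress_payload(event)
restored = decompress_payload(encoded)
print(len(json.dumps(event)), len(encoded), restored == event)
```

&lt;p&gt;Both the producer and the consumer pay the CPU cost of this round trip, which is the overhead to include in your performance calculations.&lt;/p&gt;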
&lt;h2&gt;Operational guidelines&lt;/h2&gt; 
&lt;p&gt;Configure &lt;a href="https://aws.amazon.com/what-is/dead-letter-queue/"&gt;dead-letter queues&lt;/a&gt; (DLQs) to make sure that failed messages are retained for inspection and troubleshooting. This becomes especially important with larger payloads, because debugging complex data structures requires access to the complete message context. Implement robust error handling and retries to manage transient failures, particularly when processing rich payload content that may contain nested structures or complex relationships.&lt;/p&gt; 
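&lt;p&gt;For Amazon SQS, a DLQ is attached to a source queue through the queue's RedrivePolicy attribute, which is a JSON string. The following sketch only builds that string. The function name, the default retry count, and the queue ARN in the usage note are hypothetical.&lt;/p&gt;

```python
import json


def build_redrive_policy(dlq_arn, max_receive_count=5):
    """Return the JSON string used as an SQS queue's RedrivePolicy attribute."""
    # SQS accepts maxReceiveCount values from 1 to 1000.
    if max_receive_count not in range(1, 1001):
        raise ValueError("max_receive_count must be between 1 and 1000")
    return json.dumps(
        {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    )
```

&lt;p&gt;You would pass the result, for example build_redrive_policy("arn:aws:sqs:us-east-1:123456789012:orders-dlq", 3), as the RedrivePolicy value when you set the source queue's attributes.&lt;/p&gt;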
&lt;p&gt;To further optimize throughput, you can batch similar smaller events together into a single payload. However, avoid mixing unrelated events and maintain clear boundaries between different business domains and processes.&lt;/p&gt; 
&lt;p&gt;Always make sure that your downstream dependencies are capable of handling larger payloads.&lt;/p&gt; 
&lt;h2&gt;When to use external storage&lt;/h2&gt; 
&lt;p&gt;Even with the increased 1 MB payload limit, there are scenarios where patterns such as &lt;a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/StoreInLibrary.html"&gt;claim check&lt;/a&gt; remain a sound architectural choice. These patterns involve storing a full payload in an external system, such as Amazon S3, and passing a lightweight reference through your event stream. This approach continues to provide value when payloads exceed the new limit, when data needs to be reused by multiple consumers, or when strict governance, traceability, and security requirements are involved. For example, audit logs, image metadata, or large ML inference inputs may still surpass the 1 MB boundary, even when compressed. Instead of risking truncation or fragmentation, a claim check enables consistent, scalable access to the complete data set.&lt;/p&gt; 
&lt;p&gt;You can use open source libraries such as the &lt;a href="https://github.com/aws/eventbridge-kafka-connector"&gt;Kafka sink connector for EventBridge&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-managing-large-messages.html"&gt;Amazon SQS Extended Client Library&lt;/a&gt; (available for Python and Java) that abstract complexities of storing large objects in external storage.&lt;/p&gt; 
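&lt;p&gt;To make the claim check pattern concrete, here is a minimal sketch that sends small payloads inline and replaces large ones with a reference. The InMemoryStore class is a stand-in for an external store such as Amazon S3, and all names and the key format are assumptions. The libraries mentioned above handle this for you in production.&lt;/p&gt;

```python
import json
import uuid

MAX_INLINE_BYTES = 1024 * 1024  # assumed byte value of the 1 MB limit


class InMemoryStore:
    """Stand-in for an external object store such as Amazon S3."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = body

    def get(self, key):
        return self._objects[key]


def to_message(payload, store):
    """Send small payloads inline; store large ones and send a reference."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) in range(MAX_INLINE_BYTES + 1):
        return {"inline": payload}
    key = f"claims/{uuid.uuid4()}.json"  # hypothetical key layout
    store.put(key, body)
    return {"claimCheck": key}


def from_message(message, store):
    """Resolve a message back to its full payload on the consumer side."""
    if "claimCheck" in message:
        return json.loads(store.get(message["claimCheck"]))
    return message["inline"]
```

&lt;p&gt;Because the consumer resolves both shapes through from_message, producers can switch between inline and referenced payloads without changing downstream code.&lt;/p&gt;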
&lt;h2&gt;Cost management&lt;/h2&gt; 
&lt;p&gt;Although larger payloads enable richer context in your applications, logging full payloads can increase storage and processing costs. Services such as CloudWatch Logs charge based on data volume, thus implementing selective logging, payload truncation, or sampling becomes crucial for high-volume events. Consider logging only essential fields or implementing smart sampling strategies based on business importance.&lt;/p&gt; 
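&lt;p&gt;The following sketch combines the three techniques named above: it logs only selected fields, truncates the body to a short preview, and samples low-priority events deterministically by event ID. The field names, the 256-character preview, and the 10 percent default rate are illustrative assumptions.&lt;/p&gt;

```python
import hashlib
import json

ESSENTIAL_FIELDS = ("eventId", "severity", "customerTier")  # hypothetical field names
MAX_LOGGED_CHARS = 256


def sampled(event_id, rate_percent):
    """Deterministically keep roughly rate_percent of events, keyed by event ID."""
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket in range(rate_percent)


def log_record(event, rate_percent=10):
    """Return a compact log line, or None when the event is sampled out."""
    high_priority = event.get("severity") == "high"
    if not (high_priority or sampled(event["eventId"], rate_percent)):
        return None
    record = {k: event[k] for k in ESSENTIAL_FIELDS if k in event}
    # Keep only a truncated preview of the body instead of the full payload.
    record["bodyPreview"] = json.dumps(event.get("body", ""))[:MAX_LOGGED_CHARS]
    return json.dumps(record)
```

&lt;p&gt;Hashing the event ID rather than calling a random number generator means that every service logging the same event makes the same sampling decision, which keeps traces complete for the events you do keep.&lt;/p&gt;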
&lt;p&gt;For full payload archival and retention, evaluate cost-effective storage solutions such as Amazon S3 with appropriate lifecycle policies. This can include moving older logs to cheaper storage tiers or implementing automated cleanup procedures for non-critical data. Balance your retention needs with cost optimization by defining clear policies for what data needs to be kept and for how long.&lt;/p&gt; 
&lt;p&gt;Review the pricing pages for &lt;a href="https://aws.amazon.com/lambda/pricing/"&gt;AWS Lambda&lt;/a&gt;, &lt;a href="https://aws.amazon.com/eventbridge/pricing/"&gt;Amazon EventBridge&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/sqs/pricing/"&gt;Amazon SQS&lt;/a&gt; to learn about the costs of delivering and processing events and messages.&lt;/p&gt; 
&lt;h1&gt;Conclusion&lt;/h1&gt; 
&lt;p&gt;The increase in maximum payload size from 256 KB to 1 MB enables developers to build more efficient distributed architectures. You can use this enhancement to transport richer context in event and message payloads, reducing the need for complex workarounds that previously added architectural complexity and operational overhead. This added room to transmit rich context means that you can streamline your workflows, improve observability, and reduce architectural complexity whether using choreography or orchestration patterns.&lt;/p&gt; 
&lt;p&gt;Go to the developer guides for &lt;a href="https://docs.aws.amazon.com/lambda/"&gt;AWS Lambda&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html"&gt;Amazon EventBridge&lt;/a&gt;, and &lt;a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html"&gt;Amazon SQS&lt;/a&gt; to learn more about how to take advantage of this update.&lt;/p&gt; 
&lt;p&gt;To learn more about serverless architectures, visit &lt;a href="https://serverlessland.com/"&gt;Serverless Land&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Simplify network segmentation for AWS Outposts racks with multiple local gateway routing domains</title>
		<link>https://aws.amazon.com/blogs/compute/simplify-network-segmentation-for-aws-outposts-racks-with-multiple-local-gateway-routing-domains/</link>
					
		
		<dc:creator><![CDATA[Brianna Rosentrater]]></dc:creator>
		<pubDate>Fri, 16 Jan 2026 18:49:35 +0000</pubDate>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[AWS Outposts rack]]></category>
		<guid isPermaLink="false">4043a52fea844cf29608e9ad0cbbd0e13a14705d</guid>

					<description>AWS now supports multiple local gateway (LGW) routing domains on AWS Outposts racks to simplify network segmentation. Network segmentation is the practice of splitting a computer network into isolated subnetworks, or network segments. This reduces the attack surface so that if a host on one network segment is compromised, the hosts on the other network segments are not affected. Many customers in regulated industries such as manufacturing, health care and life sciences, banking, and others implement network segmentation as part of their on-premises network security standards to reduce the impact of a breach and help address compliance requirements.</description>
										<content:encoded>&lt;p&gt;AWS now supports multiple &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing-domains.html" target="_blank" rel="noopener noreferrer"&gt;local gateway (LGW) routing domains&lt;/a&gt; on &lt;a href="https://aws.amazon.com/outposts/rack/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts racks&lt;/a&gt; to simplify network segmentation. Network segmentation is the practice of splitting a computer network into isolated subnetworks, or network segments. This reduces the attack surface so that if a host on one network segment is compromised, the hosts on the other network segments are not affected. Many customers in regulated industries such as manufacturing, health care and life sciences, banking, and others implement network segmentation as part of their on-premises network security standards to reduce the impact of a breach and help address compliance requirements. Some AWS services also have network requirements that specify certain IP ranges to be used for endpoints, and may or may not support &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/routing.html#ip-addressing" target="_blank" rel="noopener noreferrer"&gt;customers bringing their own IP pool&lt;/a&gt; (also called CoIP routing, see &lt;a href="https://aws.amazon.com/blogs/compute/how-to-choose-between-coip-and-direct-vpc-routing-modes-on-aws-outposts-rack/" target="_blank" rel="noopener noreferrer"&gt;How to choose between CoIP and Direct VPC routing (DVR) modes on AWS Outposts rack&lt;/a&gt; for more information). Customers want the flexibility to use both routing modes (CoIP and DVR) on the same logical Outpost. With this new feature, AWS Outposts racks now support multiple LGW routing domains to meet subnetwork isolation and cloud service network requirements in an on-premises environment. 
For example, a leading automotive company deploys latency-sensitive manufacturing workloads on Outposts racks in a multi-AZ architecture for resiliency. This feature provides traffic separation between routing domains and enables both customer-owned IP (CoIP) and direct VPC routing (DVR) modes on the same logical Outpost.&lt;/p&gt; 
&lt;p&gt;In this post you will learn how to use multiple LGW routing domains on Outposts racks and considerations for implementation.&lt;/p&gt; 
&lt;h2&gt;Overview&lt;/h2&gt; 
&lt;p&gt;With the introduction of multiple LGW routing domains on Outposts, you can now create multiple routing domains and associate one or more VLANs with each routing domain. This allows you to integrate your Outposts rack into your existing on-premises network schema. Each LGW routing domain will have a unique &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/vif-vif-groups.html" target="_blank" rel="noopener noreferrer"&gt;LGW Virtual Interface (VIF) Group&lt;/a&gt; and an &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing.html" target="_blank" rel="noopener noreferrer"&gt;LGW Route Table&lt;/a&gt;, enabling logical network traffic isolation. You can have a mix of up to 10 active routing domains with route tables using either DVR or CoIP routing mode, and you can make changes to these routing domains as needed in a self-service fashion allowing for network flexibility as architectures are updated over time. These settings can be found in the AWS Outposts console under the &lt;strong&gt;Networking&lt;/strong&gt; tab in the menu.&lt;/p&gt; 
&lt;p&gt;The following diagram shows an example of 3 VPCs, each with at least 1 subnet on the Outpost rack, and each VPC corresponds to its own routing domain. Each routing domain can then be associated with one or more VLANs, and one or more VPCs. You can only associate a VPC to one LGW routing domain per Outpost.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/12/computeblog-2512-images-1.png" alt="Architecture diagram showing 3 routing domains uplinking to an on-premises network."&gt;&lt;/p&gt; 
&lt;p&gt;Figure 1 – Architecture diagram showing 3 routing domains&lt;/p&gt; 
&lt;h2&gt;Walkthrough&lt;/h2&gt; 
&lt;p&gt;Before creating an LGW routing domain, you first need to &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing-domains.html#vif-best-practices" target="_blank" rel="noopener noreferrer"&gt;create an LGW VIF group&lt;/a&gt; and an LGW route table. A local gateway routing domain is the association of a local gateway route table and local gateway VIF group. Each VIF group can be associated with one or more VLANs, but a route table can only be associated with one VIF group.&lt;/p&gt; 
&lt;p&gt;To create an LGW VIF group, navigate to the AWS Outposts console, go to &lt;strong&gt;LGW virtual interfaces groups&lt;/strong&gt;, and select &lt;strong&gt;Create VIF group&lt;/strong&gt;. Enter your VIF details, which include &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/outposts-rack2ndgen-local-rack.html#local-gateway-bgp-connectivity" target="_blank" rel="noopener noreferrer"&gt;BGP and VLAN routing information&lt;/a&gt;. You must create 4 LGW VIFs per VIF group.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/12/computeblog-2512-images-2.png" alt="Creating VIF group for RD1 routing domain"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 2 – Creating VIF group for RD1 routing domain&lt;/p&gt; 
&lt;p&gt;After creating your VIF group, create an LGW route table. You’ll have the option to use &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing.html#direct-vpc-routing" target="_blank" rel="noopener noreferrer"&gt;Direct VPC Routing (DVR)&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing.html#ip-addressing" target="_blank" rel="noopener noreferrer"&gt;Customer-owned IP address pool (CoIP)&lt;/a&gt; routing. If you select CoIP routing, you’ll have the option to enter your CIDR before creating the route table. The routing mode of an LGW route table cannot be changed after creation. However, if you need to change the routing mode of a VIF group, you can disassociate its LGW route table and attach a new route table that uses the other mode.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/12/computeblog-2512-images-3.png"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 3 – Creating LGW route table for RD1 routing domain&lt;/p&gt; 
&lt;p&gt;After you’ve created your LGW route table and VIF group, you can proceed to the final step which is to &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing-domains.html#creating-routing-domains" target="_blank" rel="noopener noreferrer"&gt;create your LGW routing domain&lt;/a&gt; where you will associate the LGW route table and VIF group.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/12/computeblog-2512-images-4.png" alt="Create LGW routing domain form for RD1 example"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 4 – Creating LGW routing domain for RD1&lt;/p&gt; 
&lt;p&gt;You can view and create up to 10 active routing domains through the AWS Outposts console under the &lt;strong&gt;Networking&lt;/strong&gt; tab.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/12/computeblog-2512-images-5.png"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 5 – Local Gateway (LGW) routing domains&lt;/p&gt; 
&lt;h2&gt;Considerations&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;The multiple LGW routing domains feature is available only on &lt;a href="https://aws.amazon.com/blogs/aws/announcing-second-generation-aws-outposts-racks-with-breakthrough-performance-and-scalability-on-premises/" target="_blank" rel="noopener noreferrer"&gt;second-generation Outposts racks&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Avoid overlapping IP addresses across subnetworks and local routing domains as those can create IP routing conflicts.&lt;/li&gt; 
 &lt;li&gt;A VIF group can only be associated to one LGW route table/routing domain at a time. A routing domain is the association of a VIF group and LGW route table.&lt;/li&gt; 
 &lt;li&gt;LGW routing domains provide logical isolation of local network traffic. However, all traffic still travels across your &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/local-rack.html#link-aggregation" target="_blank" rel="noopener noreferrer"&gt;local gateway Link Aggregation Control Protocol (LACP) Link Aggregation Group (LAG)&lt;/a&gt; to uplink into your on-premises network.&lt;/li&gt; 
 &lt;li&gt;Additional network isolation can be achieved through &lt;a href="https://secure.cisco.com/secure-firewall/docs/virtual-routing-and-forwarding" target="_blank" rel="noopener noreferrer"&gt;Virtual Routing and Forwarding (VRF)&lt;/a&gt; on Cisco platforms or &lt;a href="https://www.juniper.net/documentation/us/en/software/junos/routing-overview/topics/concept/routing-instances-overview.html" target="_blank" rel="noopener noreferrer"&gt;Routing Instances&lt;/a&gt; on Juniper equipment, providing logical separation of routing tables and enabling secure multi-tenancy within the same physical infrastructure.&lt;/li&gt; 
 &lt;li&gt;You can only associate a VPC to one LGW routing domain per Outpost. You can self-serve to change VPC association as needed. Multiple on-premises &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/local-rack.html#vlans" target="_blank" rel="noopener noreferrer"&gt;VLANs&lt;/a&gt; can be connected to a single routing domain.&lt;/li&gt; 
&lt;/ul&gt; 
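&lt;p&gt;The rules in this post can be summarized as a small validation model. The following sketch is a toy illustration only: the class names, field names, and error messages are hypothetical, and it does not call any AWS API. It encodes the limits stated above, namely up to 10 routing domains, 4 LGW VIFs per VIF group, one route table per VIF group, and one routing domain per VPC on an Outpost.&lt;/p&gt;

```python
from dataclasses import dataclass, field

MAX_ROUTING_DOMAINS = 10  # limit stated in the post
VIFS_PER_GROUP = 4        # requirement stated in the post
ROUTING_MODES = ("DVR", "CoIP")


@dataclass
class RoutingDomain:
    name: str
    vif_group: str
    route_table: str
    mode: str
    vif_count: int = VIFS_PER_GROUP
    vpcs: set = field(default_factory=set)


class Outpost:
    """Toy model that rejects configurations the post says are not allowed."""

    def __init__(self):
        self.domains = {}

    def add_domain(self, domain):
        if len(self.domains) == MAX_ROUTING_DOMAINS:
            raise ValueError("at most 10 active routing domains")
        if domain.mode not in ROUTING_MODES:
            raise ValueError("mode must be DVR or CoIP")
        if domain.vif_count != VIFS_PER_GROUP:
            raise ValueError("a VIF group needs 4 LGW VIFs")
        for existing in self.domains.values():
            if existing.vif_group == domain.vif_group:
                raise ValueError("VIF group already belongs to a routing domain")
            if existing.route_table == domain.route_table:
                raise ValueError("route table already belongs to a routing domain")
        self.domains[domain.name] = domain

    def associate_vpc(self, vpc_id, domain_name):
        # A VPC can be associated with only one routing domain per Outpost.
        for existing in self.domains.values():
            if vpc_id in existing.vpcs:
                raise ValueError("VPC is already associated with a routing domain")
        self.domains[domain_name].vpcs.add(vpc_id)
```

&lt;p&gt;A model like this can be useful for checking a planned layout, such as one DVR domain and one CoIP domain on the same Outpost, before you create the resources in the console.&lt;/p&gt;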
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;This post demonstrated how to configure multiple local routing domains on Outposts racks to integrate into your on-premises network. For more information, see the &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing-domains.html"&gt;LGW routing domains&lt;/a&gt; section in the AWS Outposts user guide. Reach out to your AWS account team to learn more about Outposts racks network configuration options.&lt;/p&gt; 
&lt;p&gt;In addition to multiple LGW routing domains, we have also announced several updates to Outposts in the past week to help you meet digital sovereignty and local data processing needs. To learn more, read the following announcements:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li style="list-style-type: none"&gt; 
  &lt;ul&gt; 
   &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/opening-the-aws-european-sovereign-cloud/"&gt;AWS Outposts as an option to extend the AWS European Sovereign Cloud&lt;/a&gt;&lt;/li&gt; 
   &lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/01/amazon-s3-second-generation-aws-outposts-racks/"&gt;Amazon S3 on Outposts now available on second-generation Outposts racks&lt;/a&gt;&lt;/li&gt; 
   &lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/01/second-generation-aws-outposts-racks-additional-aws-regions/"&gt;Second-generation Outposts racks now supported in the South America (São Paulo) and Europe (Stockholm) Regions&lt;/a&gt;&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;To discuss Outposts with an expert on any of these topics, submit &lt;a href="https://pages.awscloud.com/GLOBAL_PM_LN_outposts-features_2020084_7010z000001Lpcl_01.LandingPage.html" target="_blank" rel="noopener noreferrer"&gt;this form&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Optimizing storage performance for Amazon EKS on AWS Outposts</title>
		<link>https://aws.amazon.com/blogs/compute/optimizing-storage-performance-for-amazon-eks-on-aws-outposts/</link>
					
		
		<dc:creator><![CDATA[Arun Kumar]]></dc:creator>
		<pubDate>Tue, 13 Jan 2026 18:57:12 +0000</pubDate>
				<category><![CDATA[Amazon Elastic Block Store (Amazon EBS)]]></category>
		<category><![CDATA[Amazon Elastic File System (EFS)]]></category>
		<category><![CDATA[Amazon Elastic Kubernetes Service]]></category>
		<category><![CDATA[AWS Outposts]]></category>
		<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[Amazon EBS]]></category>
		<category><![CDATA[Amazon EFS]]></category>
		<category><![CDATA[Amazon EKS]]></category>
		<category><![CDATA[Amazon S3]]></category>
		<guid isPermaLink="false">2e302008fe1896e9f4a550585f79afd24a8f81e9</guid>

					<description>&lt;a href="https://aws.amazon.com/eks/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Kubernetes Service (Amazon EKS)&lt;/a&gt; on 
&lt;a href="https://aws.amazon.com/outposts/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts&lt;/a&gt; brings the power of managed 
&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-concepts.html" target="_blank" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; to your on-premises infrastructure. Use Amazon EKS on Outposts rack to create hybrid cloud deployments that maintain consistent AWS experiences across environments. As organizations increasingly adopt edge computing and hybrid architectures, storage optimization and performance tuning become critical for successful workload deployment.</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://aws.amazon.com/eks/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Kubernetes Service (Amazon EKS)&lt;/a&gt; on &lt;a href="https://aws.amazon.com/outposts/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts&lt;/a&gt; brings the power of managed &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-concepts.html" target="_blank" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; to your on-premises infrastructure. Use Amazon EKS on Outposts rack to create hybrid cloud deployments that maintain consistent AWS experiences across environments. As organizations increasingly adopt edge computing and hybrid architectures, storage optimization and performance tuning become critical for successful workload deployment.&lt;/p&gt; 
&lt;p&gt;Outposts extend AWS infrastructure, services, APIs, and tools to virtually any datacenter, co-location space, or on-premises facility. In this blog post, you will learn about your storage options and their performance characteristics, which are essential for building resilient, high-performing applications using Amazon EKS on Outposts.&lt;/p&gt; 
&lt;h2&gt;Amazon EKS on Outposts deployment options&lt;/h2&gt; 
&lt;p&gt;The following two sections outline the differences between &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/eks-outposts.html" target="_blank" rel="noopener noreferrer"&gt;Amazon EKS extended and local cluster deployment options available on Outposts&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Amazon EKS extended cluster architecture&lt;/h3&gt; 
&lt;p&gt;Amazon EKS extended clusters on Outposts provide a powerful solution for organizations seeking to use the benefits of Kubernetes while maintaining certain workloads on-premises, as shown in the following figure. This hybrid architecture allows businesses to extend their EKS clusters from the AWS Cloud to their own data centers or edge locations using Outposts. The Kubernetes control plane remains in the &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/" target="_blank" rel="noopener noreferrer"&gt;AWS Region&lt;/a&gt;, providing centralized management and benefiting from the AWS infrastructure in the cloud and on the Outpost.&lt;/p&gt; 
&lt;p&gt;Outposts is designed to be a connected service, and needs reliable network connectivity to the AWS Region using the &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/service-links.html" target="_blank" rel="noopener noreferrer"&gt;Outposts service link&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic1.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25558 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic1.png" alt="" width="1430" height="1698"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 1 : Extended cluster&lt;/p&gt; 
&lt;h3&gt;Amazon EKS local cluster architecture&lt;/h3&gt; 
&lt;p&gt;Amazon EKS local clusters deploy the Kubernetes control plane on your Outpost, as shown in the following figure. This provides greater network resilience against outages as cluster operations run entirely on the Outposts and reduces the dependency on network connectivity to the AWS Region. Having the Kubernetes control plane hosted on your Outpost also reduces latency for cluster operations.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic2.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25557 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic2.png" alt="" width="1430" height="1925"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 2: Local cluster&lt;/p&gt; 
&lt;h2&gt;Storage options for Amazon EKS extended clusters on Outposts&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/" target="_blank" rel="noopener noreferrer"&gt;Persistent Volumes (PV)&lt;/a&gt; and &lt;a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/" target="_blank" rel="noopener noreferrer"&gt;Persistent Volume Claims (PVC)&lt;/a&gt; serve as a critical abstraction layer in Kubernetes, separating the storage consumption details from storage provisioning, and allowing administrators to manage storage resources independently from how applications consume them. PVs and PVCs preserve data across pod restarts and rescheduling events, making them essential for applications that need to maintain state, such as databases, file storage systems, and other data-intensive workloads. The abstraction provided by PVs and PVCs enables platform-agnostic storage management, where applications can request storage through PVCs without needing to know the underlying storage implementation details. PVs and PVCs support dynamic provisioning through &lt;a href="https://kubernetes.io/docs/concepts/storage/storage-classes/" target="_blank" rel="noopener noreferrer"&gt;Storage Classes&lt;/a&gt;, allowing for automated storage allocation based on application demands, while also providing features such as access modes, capacity management, and reclaim policies to effectively manage the storage lifecycle in a Kubernetes cluster.&lt;/p&gt; 
&lt;h3&gt;Integrating Amazon EBS with Amazon EKS&lt;/h3&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/ebs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Block Store (Amazon EBS)&lt;/a&gt; provides high-performance block storage that’s ideal for low-latency applications providing consistent performance. When deployed on Outposts racks, EBS volumes are stored on the Outposts hardware, providing significant performance advantages over network-attached storage solutions, as shown in the following figure.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic3.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25556 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic3.png" alt="" width="1430" height="1961"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 3 : Integrating Amazon EBS with Amazon EKS on Outposts&lt;/p&gt; 
&lt;h3&gt;Benefits and use cases&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Storage&lt;/strong&gt;: EBS volumes on Outposts racks provide data access without dependency on external connectivity.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Performance:&lt;/strong&gt; Local storage delivers consistent latency and high IOPS/throughput.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cost: &lt;/strong&gt;On-premises storage eliminates data transfer costs and reduces bandwidth needs, lowering the total cost of ownership.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Implementation considerations&lt;/h3&gt; 
&lt;p&gt;Consider the following when using EBS on Outposts rack:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;EBS volumes on Outposts are tied to a single rack and to the Availability Zone that the Outpost is homed to, so applications need to address single-point-of-failure risks.&lt;/li&gt; 
 &lt;li&gt;Protect data using EBS snapshots in the parent Region and schedule regular backups.&lt;/li&gt; 
 &lt;li&gt;Capacity on Outposts is finite. Monitor Outposts storage usage and plan expansions proactively to avoid insufficient capacity errors.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Refer to &lt;a href="https://github.com/kubernetes-sigs/aws-ebs-csi-driver/tree/master/examples/kubernetes/dynamic-provisioning" target="_blank" rel="noopener noreferrer"&gt;Dynamic Volume Provisioning&lt;/a&gt; to learn more about deploying a pod with an EBS volume attached.&lt;/p&gt; 
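&lt;p&gt;As a minimal sketch of dynamic provisioning with the Amazon EBS CSI driver, the following function builds a StorageClass and a PersistentVolumeClaim as plain Python dictionaries that can be serialized to JSON for kubectl. The resource names and the gp2 default are assumptions, so check which volume types your Outpost supports before using it.&lt;/p&gt;

```python
import json


def ebs_storage_manifests(claim_name, size_gib, volume_type="gp2"):
    """Build a StorageClass and PVC for the Amazon EBS CSI driver."""
    storage_class = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": "ebs-outposts"},  # hypothetical name
        "provisioner": "ebs.csi.aws.com",
        # Delay provisioning until a pod is scheduled, so the volume is
        # created in the zone of the node that will consume it.
        "volumeBindingMode": "WaitForFirstConsumer",
        "parameters": {"type": volume_type},  # assumed volume type
    }
    claim = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            # EBS volumes attach to one node at a time.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class["metadata"]["name"],
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }
    return [storage_class, claim]


manifests = ebs_storage_manifests("app-data", 20)
manifest_json = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2
)
```

&lt;p&gt;Writing manifest_json to a file gives you a single document that you can review and apply, and a pod then references the claim by name in its volumes section.&lt;/p&gt;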
&lt;h3&gt;Amazon EFS with Amazon EKS&lt;/h3&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/efs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic File System (Amazon EFS)&lt;/a&gt; provides scalable, shared file storage that can be accessed across multiple AWS Availability Zones (AZs) and on-premises environments. Although Amazon EFS with Amazon EKS on Outposts maintains the same setup procedures as standard cloud deployments, there is a critical dependency on the &lt;a href="https://docs.aws.amazon.com/outposts/latest/server-userguide/service-links.html" target="_blank" rel="noopener noreferrer"&gt;service link&lt;/a&gt; connection between your Outposts and the AWS Region. Amazon EFS is not a locally supported service on Outposts, so connectivity to the AWS Region is required to use this service with your Outpost.&lt;/p&gt; 
&lt;p&gt;Amazon EFS allows multiple pods to concurrently access shared file systems. It is well-suited for applications that need collaborative data access, content management, and distributed processing workloads.&lt;/p&gt; 
&lt;h4&gt;Amazon EFS as a persistent storage solution for Amazon EKS extended cluster instances&lt;/h4&gt; 
&lt;p&gt;Amazon EFS as a PV for your Amazon EKS extended cluster operates through a hybrid architecture where the Amazon EFS file system resides in the Region, but mount points can be created on the worker nodes running on Outposts subnets through the service link as shown in the following figure.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic4.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25555 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic4.png" alt="" width="1430" height="1897"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 4 : Amazon EFS as a persistent storage solution for extended clusters&lt;/p&gt; 
&lt;h4&gt;Benefits and use cases&lt;/h4&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Shared storage capabilities&lt;/strong&gt;: multiple pods can access a centralized file system, enabling shared data, code, and assets across instances.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: storage capacity and performance automatically scale with usage, eliminating manual provisioning and upfront planning.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Compliance&lt;/strong&gt;: Amazon EFS provides full file system features and compatibility for traditional applications, such as locking, permissions, and directory structure.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h4&gt;Challenges and limitations&lt;/h4&gt; 
&lt;p&gt;Consider the following when using Amazon EFS with Outposts:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Network latency: file access involves network traversal to Amazon EFS in the Region, adding more latency and making small or metadata operations potentially slow for latency-sensitive applications.&lt;/li&gt; 
 &lt;li&gt;Throughput: aggregate throughput is restricted by the available bandwidth on the service link between the Outposts and AWS Region. This impacts concurrent access and large file transfers during peak usage.&lt;/li&gt; 
 &lt;li&gt;Dependency on AWS Region connectivity: Amazon EFS needs continuous connectivity to the parent Region. Disruptions may affect file system availability, operations, and disaster recovery processes.&lt;/li&gt; 
 &lt;li&gt;Data transfer charges: because the Amazon EFS file system is in the parent AWS Region while the Amazon EKS worker nodes and pods run on the Outpost, additional data transfer charges apply.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;You can refer to &lt;a href="https://aws.amazon.com/efs/features/" target="_blank" rel="noopener noreferrer"&gt;Amazon EFS Features&lt;/a&gt; and &lt;a href="https://aws.amazon.com/efs/when-to-choose-efs/" target="_blank" rel="noopener noreferrer"&gt;When to Choose Amazon EFS&lt;/a&gt; for more detailed insights into its capabilities and use cases.&lt;/p&gt; 
&lt;h4&gt;Deploying pods on extended clusters using Amazon EFS as PV&lt;/h4&gt; 
&lt;p&gt;Refer to &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html" target="_blank" rel="noopener noreferrer"&gt;Use Elastic File System Storage with Amazon EFS&lt;/a&gt; for deployment guidance. Note: create the Amazon EFS mount targets in subnets that are in the same Availability Zone (AZ) as the Outposts subnets.&lt;/p&gt; 
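&lt;p&gt;For comparison with block storage, the following sketch builds a statically provisioned PersistentVolume and a matching claim for the Amazon EFS CSI driver. The function name, resource names, and the file system ID in the usage note are placeholders, and the 5Gi capacity is only there because Kubernetes requires a value.&lt;/p&gt;

```python
def efs_volume_manifests(file_system_id, claim_name):
    """Build a static PersistentVolume and PVC backed by an EFS file system."""
    volume_name = f"{claim_name}-pv"
    volume = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": volume_name},
        "spec": {
            # Kubernetes requires a capacity; EFS itself grows elastically.
            "capacity": {"storage": "5Gi"},
            "volumeMode": "Filesystem",
            # Many pods on different nodes can mount the same file system.
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            # An empty class name opts out of dynamic provisioning.
            "storageClassName": "",
            "csi": {"driver": "efs.csi.aws.com", "volumeHandle": file_system_id},
        },
    }
    claim = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",
            "volumeName": volume_name,  # bind to the volume defined above
            "resources": {"requests": {"storage": "5Gi"}},
        },
    }
    return [volume, claim]
```

&lt;p&gt;Calling efs_volume_manifests("fs-0123456789abcdef0", "shared-assets") returns two dictionaries that you can serialize to JSON or YAML. The ReadWriteMany access mode is what lets multiple pods share the file system.&lt;/p&gt;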
&lt;h3&gt;Amazon S3 with Amazon EKS extended cluster&lt;/h3&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service (Amazon S3)&lt;/a&gt; on Outposts delivers local object storage on your Outposts, allowing applications to use Amazon S3 APIs for storing and retrieving data while keeping it onsite. It is ideal for workloads that need Amazon S3 compatibility, low latency access to object data, and local data residency.&lt;/p&gt; 
&lt;p&gt;You should use Amazon S3 access point &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Resource Names (ARNs)&lt;/a&gt; and not bucket ARNs for proper integration with Amazon EKS workloads.&lt;/p&gt; 
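&lt;p&gt;As a minimal sketch of that distinction, the following Python helper checks whether an ARN names an S3 on Outposts access point or a bucket before you place it in a workload configuration. It assumes the S3 on Outposts ARN layout in which the resource part reads outpost/OUTPOST_ID/accesspoint/NAME or outpost/OUTPOST_ID/bucket/NAME. The function names, account ID, Outpost ID, and resource names are hypothetical and used only for illustration.&lt;/p&gt;

```python
from typing import Optional


def outposts_resource_type(arn: str) -> Optional[str]:
    """Return 'accesspoint' or 'bucket' for an S3 on Outposts ARN, else None."""
    # An ARN has six colon-separated fields:
    # arn:PARTITION:SERVICE:REGION:ACCOUNT_ID:RESOURCE
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3-outposts":
        return None
    # Assumed resource layout: outpost/OUTPOST_ID/TYPE/NAME
    resource = parts[5].split("/")
    if len(resource) != 4 or resource[0] != "outpost":
        return None
    if not resource[1] or not resource[3]:
        return None
    if resource[2] not in ("accesspoint", "bucket"):
        return None
    return resource[2]


def require_access_point_arn(arn: str) -> str:
    """Return the ARN unchanged if it is an access point ARN, else raise."""
    if outposts_resource_type(arn) != "accesspoint":
        raise ValueError("expected an S3 on Outposts access point ARN: " + arn)
    return arn


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    base = "arn:aws:s3-outposts:us-west-2:123456789012:outpost/op-0123456789abcdef0"
    print(outposts_resource_type(base + "/accesspoint/example-ap"))
    print(outposts_resource_type(base + "/bucket/example-bucket"))
```

&lt;p&gt;A check like this could run in a deployment script or CI step so that a bucket ARN supplied by mistake is rejected before the workload is deployed.&lt;/p&gt;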
&lt;p&gt;Learn more about &lt;a href="https://aws.amazon.com/s3/outposts/" target="_blank" rel="noopener noreferrer"&gt;Amazon S3 on Outposts&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic5.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25554 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic5.png" alt="" width="1368" height="1980"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 5 : Amazon S3 with Amazon EKS extended cluster on Outposts&lt;/p&gt; 
&lt;h3&gt;Benefits and use cases&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Data archiving and compliance: Enables cost-effective, locally retained storage for logs, audit trails, regulatory compliance, backups, and sensitive healthcare data with strict residency requirements.&lt;/li&gt; 
 &lt;li&gt;Content distribution and media: Provides ultra-low latency local storage for serving static content, media streaming, digital asset management, and gaming asset delivery.&lt;/li&gt; 
 &lt;li&gt;Data lake and analytics: Supports local data processing for analytics, ETL, machine learning (ML), real-time Internet of Things (IoT) data handling, and business intelligence with reduced latency and transfer costs.&lt;/li&gt; 
 &lt;li&gt;Application integration: Seamlessly integrates with Amazon S3 compatible apps for backup, synchronization, microservices storage, API-driven workflows, and container image management on-premises.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Refer to&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/s3-outposts/S3OnOutpostsRestrictionsLimitations.html" target="_blank" rel="noopener noreferrer"&gt;How is Amazon S3 on Outposts different from Amazon S3&lt;/a&gt; and the&amp;nbsp;&lt;a href="https://aws.amazon.com/s3/storage-classes/#topic-6" target="_blank" rel="noopener noreferrer"&gt;Amazon S3 on Outposts&lt;/a&gt;&amp;nbsp;documentation to learn more.&lt;/p&gt; 
&lt;h4&gt;Deploying pods on extended clusters using Amazon S3 as PV&lt;/h4&gt; 
&lt;p&gt;&lt;strong&gt;Step 1: &lt;/strong&gt;Create an Amazon S3 on Outposts bucket&lt;br&gt; &lt;strong&gt;Step 2: &lt;/strong&gt;Create an Amazon S3 access point (necessary for Amazon EKS integration)&lt;br&gt; &lt;strong&gt;Step 3: &lt;/strong&gt;Configure IAM roles and policies&lt;br&gt; &lt;strong&gt;Step 4: &lt;/strong&gt;Install the Amazon S3 CSI driver&lt;br&gt; &lt;strong&gt;Step 5:&amp;nbsp;&lt;/strong&gt;Deploy your pod with the Amazon S3 volume attached&lt;br&gt; &lt;strong&gt;Step 6:&lt;/strong&gt; Complete the Amazon S3 configuration with Kubernetes&lt;/p&gt; 
&lt;p&gt;Refer to the &lt;a href="https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/outpost_bucket.yaml" target="_blank" rel="noopener noreferrer"&gt;Static Provisioning on Outposts bucket&lt;/a&gt; example for more details on Step 5.&lt;/p&gt; 
&lt;h2&gt;Best practices for optimizing performance&lt;/h2&gt; 
&lt;p&gt;Optimizing performance starts with selecting the right storage type for your workload: Amazon EBS for low-latency, high-throughput block storage; Amazon EFS for shared POSIX-compliant file systems; and Amazon S3 for scalable object storage with API compatibility. Ensure proper volume sizing, monitor usage proactively, and configure CPU and memory requests accurately to balance performance and efficiency—auto scaling and QoS classes can further optimize resource management. Improve data locality by using local storage, apply caching with intelligent eviction, and design for efficient, asynchronous, and compressed data access patterns.&lt;/p&gt; 
&lt;h2&gt;Monitoring and observability&lt;/h2&gt; 
&lt;p&gt;Monitoring key performance metrics is essential to maintain storage efficiency and application reliability. For Amazon EBS, track IOPS, throughput, latency, burst balance, queue depth, and snapshot performance to avoid degradation—see the &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/outposts-cloudwatch-metrics.html#metrics-ebs" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch metrics for Amazon EBS&lt;/a&gt; for the full list. For Amazon EFS, monitor total I/O, throughput, client connections, metadata operations, burst credits, and Regional data transfers to support effective capacity planning—refer to &lt;a href="https://docs.aws.amazon.com/efs/latest/ug/efs-metrics.html" target="_blank" rel="noopener noreferrer"&gt;CloudWatch metrics for Amazon EFS&lt;/a&gt;. For Amazon S3, observe request and error rates, data transfer, storage usage, latency, multipart upload efficiency, and access patterns to optimize performance and cost—see &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html" target="_blank" rel="noopener noreferrer"&gt;Metrics and dimensions&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Security considerations&lt;/h2&gt; 
&lt;p&gt;Strong security practices are critical for Amazon EKS on Outposts. Use &lt;a href="https://aws.amazon.com/kms/" target="_blank" rel="noopener noreferrer"&gt;AWS Key Management Service (AWS KMS)&lt;/a&gt; for Amazon EBS encryption, encrypt Amazon EFS data at rest and in transit, and enable server- or client-side encryption for Amazon S3. Enforce TLS for all data transfers and apply key rotation with compliance controls. Implement least-privilege IAM policies, scoped roles, and Kubernetes Role-Based Access Control (RBAC) for granular pod access. Secure traffic with security groups and network access control lists (network ACLs), and maintain audit logs for all storage operations.&lt;/p&gt; 
&lt;h2&gt;Cost optimization strategies&lt;/h2&gt; 
&lt;p&gt;Manage storage costs by right-sizing volumes, automating lifecycle policies, selecting appropriate storage classes, monitoring data transfer, and using de-duplication and compression where applicable. Lower operational expenses through automated backups, infrastructure as code (IaC), monitoring automation, leveraging managed services, applying cost allocation tags, and conducting regular usage reviews.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Amazon EKS on Outposts empowers organizations to build hybrid applications with storage options that align to performance, compliance, and data residency needs. By selecting the right storage solution for each workload and leveraging Outposts’ local infrastructure, you can reduce latency, minimize network dependencies, and maintain consistency across environments. As Outposts capabilities continue to evolve, they offer a strong foundation for modern, resilient, and cost-efficient hybrid cloud architectures.&lt;/p&gt; 
&lt;p&gt;Reach out to your AWS account team, or fill out this&amp;nbsp;&lt;a href="https://pages.awscloud.com/GLOBAL_PM_LN_outposts-features_2020084_7010z000001Lpcl_01.LandingPage.html" target="_blank" rel="noopener noreferrer"&gt;form&lt;/a&gt;&amp;nbsp;to learn more about running containerized applications on Outposts.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
	</channel>
</rss>