<?xml version="1.0" encoding="UTF-8" standalone="no"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" version="2.0">

<channel>
	<title>AWS Compute Blog</title>
	<atom:link href="https://aws.amazon.com/blogs/compute/feed/" rel="self" type="application/rss+xml"/>
	<link>https://aws.amazon.com/blogs/compute/</link>
	<description/>
	<lastBuildDate>Tue, 14 Apr 2026 16:18:12 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>AWS Outposts monitoring and reporting: A comprehensive Amazon EventBridge solution</title>
		<link>https://aws.amazon.com/blogs/compute/aws-outposts-monitoring-and-reporting-a-comprehensive-amazon-eventbridge-solution/</link>
					
		
		<dc:creator><![CDATA[Matt Price]]></dc:creator>
		<pubDate>Tue, 14 Apr 2026 16:18:12 +0000</pubDate>
				<category><![CDATA[Amazon DynamoDB]]></category>
		<category><![CDATA[Amazon Elastic Block Store (Amazon EBS)]]></category>
		<category><![CDATA[Amazon EventBridge]]></category>
		<category><![CDATA[Amazon RDS]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[AWS Organizations]]></category>
		<category><![CDATA[AWS Outposts]]></category>
		<category><![CDATA[AWS Outposts rack]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Resource Access Manager (RAM)]]></category>
		<guid isPermaLink="false">60eb57ed8879462a862a621ab1a93ec42341ab0d</guid>

					<description>Organizations using AWS Outposts racks commonly manage capacity from a single AWS account and share resources through AWS Resource Access Manager (AWS RAM) with other AWS accounts (consumer accounts) within AWS Organizations. In this post, we demonstrate one approach to create a multi-account serverless solution to surface costs in shared AWS Outposts environments using Amazon […]</description>
										<content:encoded>&lt;p&gt;Organizations using &lt;a href="https://aws.amazon.com/outposts/rack/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts racks&lt;/a&gt; commonly manage capacity from a single AWS account and share resources through &lt;a href="https://aws.amazon.com/ram/" target="_blank" rel="noopener noreferrer"&gt;AWS Resource Access Manager&lt;/a&gt; (AWS RAM) with other AWS accounts (consumer accounts) within &lt;a href="https://aws.amazon.com/organizations/" target="_blank" rel="noopener noreferrer"&gt;AWS Organizations&lt;/a&gt;. In this post, we demonstrate one approach to create a multi-account serverless solution to surface costs in shared AWS Outposts environments using &lt;a href="https://aws.amazon.com/eventbridge/" target="_blank" rel="noopener noreferrer"&gt;Amazon EventBridge&lt;/a&gt;, &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt;. This solution reports on instance runtime and allocated storage for &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt;, &lt;a href="https://aws.amazon.com/rds" target="_blank" rel="noopener noreferrer"&gt;Amazon Relational Database Services (Amazon RDS)&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/ebs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Block Store (Amazon EBS)&lt;/a&gt; services running on Outposts racks. In turn, teams can track the cost of infrastructure associated with their workloads across AWS accounts. This solution is a framework that can be customized to meet your organization’s specific business objectives.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;The following is the &lt;a href="https://developer.hashicorp.com/terraform" target="_blank" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;-based reference architecture used to represent the solution, including EventBridge, DynamoDB, and Lambda across a multi-account environment. Relevant launch events are tracked in EventBridge that invoke Lambda functions, which are logged in DynamoDB tables (&lt;a href="https://github.com/aws-samples/sample-outposts-monitoring-and-reports" target="_blank" rel="noopener noreferrer"&gt;see sample code&lt;/a&gt;). This allows reporting on captured event data through the &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/index.html" target="_blank" rel="noopener noreferrer"&gt;AWS SDK for Python (Boto3)&lt;/a&gt;.&amp;nbsp;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-1.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25970" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-1.png" alt="AWS architecture diagram showing data collection and workload account integration with EventBridge, CloudTrail, and Outposts" width="1280" height="720"&gt;&lt;/a&gt;&lt;br&gt; &lt;em&gt;Figure 1: Reference architecture for reporting solution on AWS Outposts&amp;nbsp;&lt;/em&gt;&lt;/p&gt; 
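&lt;p&gt;As a hedged sketch of the event-capture path (not the sample repository’s actual code; the table name and attribute layout here are hypothetical assumptions), a Lambda function in the data collection account might flatten a received launch event into a DynamoDB item like this:&lt;/p&gt; 

```python
# Hedged sketch only: the table name and event shape below are illustrative
# assumptions, not the schema used by the sample repository.

TABLE_NAME = "outposts-usage-events"  # hypothetical table name

def event_to_item(event):
    """Flatten an EventBridge event carrying a CloudTrail RunInstances
    record into a single DynamoDB item."""
    detail = event["detail"]
    instance = detail["responseElements"]["instancesSet"]["items"][0]
    return {
        "account_id": event["account"],
        "instance_id": instance["instanceId"],
        "instance_type": instance["instanceType"],
        "event_name": detail["eventName"],
        "event_time": detail["eventTime"],
    }

def handler(event, context):
    import boto3  # deferred so event_to_item() stays testable offline
    item = event_to_item(event)
    boto3.resource("dynamodb").Table(TABLE_NAME).put_item(Item=item)
    return item
```

&lt;p&gt;The Lambda functions in the sample repository implement this pattern for EC2, RDS, and EBS events; the repository defines the real table schemas.&lt;/p&gt; 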
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;The following prerequisites are necessary to implement this solution:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;At least two active AWS accounts in the same &lt;a href="https://aws.amazon.com/organizations/" target="_blank" rel="noopener noreferrer"&gt;AWS Organization&lt;/a&gt; as the Outposts owner account. 
  &lt;ul&gt; 
   &lt;li&gt;One AWS account, which is the data collection account to store the event data (this doesn’t have to be the account that owns the Outposts).&lt;/li&gt; 
   &lt;li&gt;Workload accounts where resources are deployed on Outposts.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI)&lt;/a&gt; installed and configured on an administrative instance. For more information, see &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html" target="_blank" rel="noopener noreferrer"&gt;Installing, updating, and uninstalling the AWS CLI &lt;/a&gt;in the AWS CLI documentation.&lt;/li&gt; 
 &lt;li&gt;Terraform installed on the same administrative instance. For more information, see the &lt;a href="https://learn.hashicorp.com/tutorials/terraform/install-cli" target="_blank" rel="noopener noreferrer"&gt;Terraform documentation&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Make sure that you have the &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; permissions necessary to create the AWS resources using Terraform in all accounts.&lt;/li&gt; 
 &lt;li&gt;Prior experience with Terraform deployments on AWS. To increase your familiarity, you can explore &lt;a href="https://learn.hashicorp.com/collections/terraform/aws-get-started" target="_blank" rel="noopener noreferrer"&gt;Get Started – AWS&lt;/a&gt; on the HashiCorp website.&lt;/li&gt; 
 &lt;li&gt;Access to clone the &lt;a href="https://github.com/aws-samples/sample-outposts-monitoring-and-reports" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts Monitoring and Reporting&lt;/a&gt; git repository.&lt;/li&gt; 
 &lt;li&gt;AWS SDK for Python (Boto3) installed and configured on a local machine.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Walkthrough&lt;/h2&gt; 
&lt;p&gt;The following sections walk you through how to deploy this solution.&lt;/p&gt; 
&lt;h3&gt;Deploying in data collection account&lt;/h3&gt; 
&lt;p&gt;Step 1: In the data collection account, create an Amazon S3 bucket in the target Region to hold the Terraform state file.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws s3 mb s3://state-bucket-name&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 2: Clone the repository. On your local machine, clone the repository that contains the sample by running the following command:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;git clone https://github.com/aws-samples/sample-outposts-monitoring-and-reports.git&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Navigate to the cloned repository by running the following command:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;cd sample-outposts-monitoring-and-reports/data_collection&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 3: Edit providers.tf to configure the AWS provider.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;

provider "aws" {
&amp;nbsp;&amp;nbsp;region = ""
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 4: Edit backend.tf to provide the Terraform state bucket and the &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/" target="_blank" rel="noopener noreferrer"&gt;AWS Region&lt;/a&gt; to which the Outpost is anchored.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;terraform {
&amp;nbsp;&amp;nbsp;backend "s3" {
&amp;nbsp;&amp;nbsp; &amp;nbsp;bucket = ""
&amp;nbsp;&amp;nbsp; &amp;nbsp;key &amp;nbsp; &amp;nbsp;= "terraform.tfstate"
&amp;nbsp;&amp;nbsp; &amp;nbsp;region = ""
&amp;nbsp;&amp;nbsp;}
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 5: Modify variables.tf. From the root directory of the cloned repository, modify the variables.tf file with the target Region and workload accounts, as shown in the following example. The target Region is the collection destination.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;variable "aws_region" {
&amp;nbsp;&amp;nbsp;description = "AWS region for resources"
&amp;nbsp;&amp;nbsp;type &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;= string
&amp;nbsp;&amp;nbsp;default &amp;nbsp; &amp;nbsp; = ""
}

variable "allowed_account_id" {
&amp;nbsp;&amp;nbsp;description = "AWS account ID allowed to put events to the event bus"
&amp;nbsp;&amp;nbsp;

}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Initialize the configuration directory of the data collection account to download and install the providers defined in the configuration by running the following command:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform init&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;All resources are deployed with minimal permissions to serve as an example. We recommend reviewing all configurations to make sure that they meet your organizational security policies.&lt;/p&gt; 
&lt;p&gt;Step 6: Deploy the infrastructure in the data collection account. Run terraform plan on the configuration and review which resources are created:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform plan&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;When you have reviewed the plan, run the following command and enter “yes” to accept the changes and deploy:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform apply&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Deployment should take less than 5 minutes. If you receive any errors, review the previously mentioned steps to ensure that you followed them in their entirety. If the errors persist, reach out to AWS Support for additional guidance.&lt;/p&gt; 
&lt;h3&gt;Deploying in workload account&lt;/h3&gt; 
&lt;p&gt;The data collection account receives events from EventBridge and performs analysis and storage of the AWS Outposts resource data.&lt;/p&gt; 
&lt;p&gt;Step 1: Navigate to the workload account directory by running the following command:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;cd ../workload_account&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 2: Edit variables.tf to set the Region and the event bus &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Resource Name (ARN)&lt;/a&gt;.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;variable "aws_region" {
&amp;nbsp;&amp;nbsp;description = "AWS region for resources"
&amp;nbsp;&amp;nbsp;type &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;= string
&amp;nbsp;&amp;nbsp;default &amp;nbsp; &amp;nbsp; = ""
}

variable "event_bus_arn" {
&amp;nbsp;&amp;nbsp;description = "target event bus arn"
&amp;nbsp;&amp;nbsp;type &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;= string
&amp;nbsp;&amp;nbsp;default &amp;nbsp; &amp;nbsp; = ""
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Set the default value of event_bus_arn to the ARN of the event bus in the data collection account.&lt;/p&gt; 
&lt;p&gt;Step 3: Run the following command to generate backend.tf and create the Terraform state bucket for each workload account:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;./init-backend.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This is an idempotent operation that creates a file from the template and, if it doesn’t exist, a bucket with a fixed name that includes the account ID.&lt;/p&gt; 
&lt;p&gt;Step 4: Initialize the configuration directory of the workload account to download and install the providers defined in the configuration by running the following command:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform init&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Step 5: Deploy the infrastructure in the workload account. Run terraform plan on the configuration and review which resources are created:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform plan&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;After you have reviewed the plan, run the following command and enter “yes” to accept the changes and deploy:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform apply&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Deployment should take less than 5 minutes. If you receive any errors, follow the troubleshooting steps in the previous section.&lt;/p&gt; 
&lt;p&gt;At this point, any Amazon EC2 or Amazon RDS instances and Amazon EBS volumes created in the workload account are logged to the DynamoDB tables in the data collection account. Repeat Steps 3–5 for each workload account running resources on AWS Outposts, using the appropriate account credentials. If you’re deploying at scale and using &lt;a href="https://docs.aws.amazon.com/controltower/latest/userguide/what-is-control-tower.html" target="_blank" rel="noopener noreferrer"&gt;AWS Control Tower&lt;/a&gt;, consider using &lt;a href="https://docs.aws.amazon.com/controltower/latest/userguide/aft-overview.html" target="_blank" rel="noopener noreferrer"&gt;AWS Control Tower Account Factory for Terraform (AFT)&lt;/a&gt;.&lt;/p&gt; 
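&lt;p&gt;To confirm that a workload account can reach the shared bus before waiting for real launch events, you can publish a synthetic event. The following is a hedged sketch: the bus ARN, event source, and detail shape are placeholders, and because the deployed rules match real service events rather than this source, the check only verifies that put-events permissions and the resource policy are in place.&lt;/p&gt; 

```python
# Hedged smoke test: publish a synthetic event to the data collection
# event bus. The ARN, source, and detail payload below are placeholders.
import json

def build_entry(bus_arn, instance_id):
    """Build a single PutEvents entry targeting the shared event bus."""
    return {
        "Source": "smoke.test",
        "DetailType": "Synthetic Outposts Event",
        "Detail": json.dumps({"instanceId": instance_id}),
        "EventBusName": bus_arn,
    }

if __name__ == "__main__":
    import boto3  # deferred so build_entry() stays testable offline
    entry = build_entry(
        "arn:aws:events:us-west-2:111122223333:event-bus/example-bus", "i-0abc"
    )
    response = boto3.client("events").put_events(Entries=[entry])
    print("failed entries:", response["FailedEntryCount"])
```

&lt;p&gt;A FailedEntryCount of zero indicates the workload account was able to put events onto the bus.&lt;/p&gt; 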
&lt;h2&gt;Running monthly reports&lt;/h2&gt; 
&lt;p&gt;With this solution in place, reports can be generated on demand, and the example Python scripts shown can be modified to support your needs. Reports can be created from a local machine with credentials that have access to the DynamoDB tables in the data collection account. The following examples are run from the source directory of the cloned repository.&lt;/p&gt; 
&lt;p&gt;Run the following command to view the report for Amazon RDS usage in September 2025:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;./rds_runtime_calculator.py --year 2025 --month 9 --output rds_report.csv&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-2.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25971" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-2.png" alt="Spreadsheet showing RDS database instances with configuration details, storage allocation, and operational status in us-west-2 region" width="1519" height="155"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 2: Example of RDS runtime report&amp;nbsp;&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;Run the following command to view the report for Amazon EBS usage in September 2025:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;./ebs_volume_reporter.py --year 2025 --month 9 --output ebs_report.csv&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-4.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25973" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-4.png" alt="EBS volume tracking table showing volume configurations, lifecycle hours, and active/deleted status in us-west-2" width="1431" height="95"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 3: Example of EBS usage report&amp;nbsp;&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;Run the following command to view the report for Amazon EC2 usage in September 2025:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;./ec2_runtime_calculator.py --month 9 --year 2025 --output ec2_report.csv&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-6.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25975" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/03/computeblog-2496-6.png" alt="EC2 instance tracking table showing c5.large instances with runtime hours and running/stopped status on AWS Outposts" width="1431" height="139"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 4: Example of EC2 runtime report&amp;nbsp;&lt;/em&gt;&lt;/p&gt; 
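&lt;p&gt;If the bundled scripts don’t match your reporting needs, a custom report is a short Boto3 script. The following is a hedged sketch of the aggregation idea only: the table name and attribute names (started_at, stopped_at, instance_id) are assumptions for illustration, and the repository’s scripts define the real schema.&lt;/p&gt; 

```python
# Hedged sketch of a custom report: sum runtime hours per instance from
# items in a DynamoDB table. Table and attribute names are illustrative.
from datetime import datetime

def runtime_hours(start_iso, end_iso):
    """Hours between two ISO-8601 timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds() / 3600

def report(items):
    """Sum runtime hours per instance across captured start/stop events."""
    totals = {}
    for item in items:
        hours = runtime_hours(item["started_at"], item["stopped_at"])
        totals[item["instance_id"]] = totals.get(item["instance_id"], 0.0) + hours
    return totals

if __name__ == "__main__":
    import boto3  # deferred so report() stays testable offline
    table = boto3.resource("dynamodb").Table("outposts-usage-events")
    print(report(table.scan()["Items"]))
```

&lt;p&gt;The same aggregation pattern extends to allocated EBS storage or RDS instance hours by swapping the attributes being summed.&lt;/p&gt; 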
&lt;h2&gt;Cleaning up&lt;/h2&gt; 
&lt;p&gt;Complete the following steps to clean up the resources that were deployed by this solution. For each workload account, run the following commands:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;cd sample-outposts-monitoring-and-reports/workload_account&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform destroy &lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Enter “yes” to proceed. You can then manually empty and remove the Terraform state S3 bucket for that account.&lt;/p&gt; 
&lt;p&gt;For the data collection account, run the following commands:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-js"&gt;cd ../data_collection&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-js"&gt;terraform destroy&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Enter “yes” to proceed. You can then manually empty and remove the Terraform state S3 bucket for that account.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Customers who have shared multi-account Outposts deployments can use this solution to create account-level reporting for Outposts resources using real-time event capture and processing, state analysis and categorization, historical usage metrics, and serverless architecture. Teams can use this to visualize and report on the costs of running their workloads on Outposts. The event-driven design supports accurate tracking while maintaining low operational overhead. The solution scales effectively across multiple Outposts and accounts, providing a unified view of hybrid infrastructure. Keep in mind that you can extend the functionality described here to meet your business objectives.&lt;/p&gt; 
&lt;p&gt;Deploy this solution today using the &lt;a href="https://github.com/aws-samples/sample-outposts-monitoring-and-reports" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; to gain financial insights to share with the tenants of your Outposts workload accounts. Reach out to your AWS account team, or fill out &lt;a href="https://pages.awscloud.com/GLOBAL_PM_LN_outposts-features_2020084_7010z000001Lpcl_01.LandingPage.html" target="_blank" rel="noopener noreferrer"&gt;this form&lt;/a&gt; to learn more about Outposts.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Building Memory-Intensive Apps with AWS Lambda Managed Instances</title>
		<link>https://aws.amazon.com/blogs/compute/building-memory-intensive-apps-with-aws-lambda-managed-instances/</link>
					
		
		<dc:creator><![CDATA[Guy Haddad]]></dc:creator>
		<pubDate>Fri, 10 Apr 2026 19:54:44 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Simple Storage Service (S3)]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Customer Solutions]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[Amazon S3]]></category>
		<category><![CDATA[AWS Compute]]></category>
		<guid isPermaLink="false">c4d2a0fd8a069c4ff4c99146159ea8e803cf7d0e</guid>

					<description>Building memory-intensive applications with AWS Lambda just got easier. AWS Lambda Managed Instances gives you up to 32 GB of memory—3x more than standard AWS Lambda—while maintaining the serverless experience you know. Modern applications increasingly require substantial memory resources to process large datasets, perform complex analytics, and deliver real-time insights for use cases such as […]</description>
										<content:encoded>&lt;p&gt;Building memory-intensive applications with &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; just got easier. &lt;a href="https://aws.amazon.com/lambda/lambda-managed-instances/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda Managed Instances&lt;/a&gt; gives you up to 32 GB of memory—3x more than standard AWS Lambda—while maintaining the serverless experience you know. Modern applications increasingly require substantial memory resources to process large datasets, perform complex analytics, and deliver real-time insights for use cases such as in-memory analytics, Machine Learning (ML) model inference, and real-time semantic search. AWS Lambda Managed Instances gives you a familiar serverless programming model and experience combined with the flexibility of being able to choose the underlying &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2&lt;/a&gt; instance types and providing developers with access to large memory configurations.&lt;/p&gt; 
&lt;p&gt;In this post, you will see how AWS Lambda Managed Instances enables memory-intensive workloads that were previously challenging to run in serverless environments, using an AI-powered customer analytics application as a practical example. For predictable workloads, it can also deliver cost savings of up to 33% compared to standard Lambda, while eliminating the operational overhead of managing EC2 instances.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Understanding AWS Lambda Managed Instances&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;AWS Lambda Managed Instances runs your AWS Lambda functions on the Amazon EC2 instance types of your choice in your account, including &lt;a href="https://aws.amazon.com/ec2/graviton/" target="_blank" rel="noopener noreferrer"&gt;Graviton4&lt;/a&gt; and memory-optimized instance types. AWS handles underlying infrastructure lifecycle including provisioning, scaling, patching, and routing, while you benefit from Amazon EC2 pricing advantages like &lt;a href="https://aws.amazon.com/savingsplans/" target="_blank" rel="noopener noreferrer"&gt;Savings Plans&lt;/a&gt; and &lt;a href="https://aws.amazon.com/aws-cost-management/aws-cost-optimization/reserved-instances/" target="_blank" rel="noopener noreferrer"&gt;Reserved Instances&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Key benefits include:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Flexible instance selection:&lt;/strong&gt; Choose from compute-optimized (C), general-purpose (M), and memory-optimized (R) instance families&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Configurable memory-CPU ratios:&lt;/strong&gt; Optimize resource allocation for your workload&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Multi-concurrent invocations:&lt;/strong&gt; One execution environment handles multiple invocations simultaneously, improving utilization for I/O-heavy applications&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Dynamic scaling:&lt;/strong&gt; Instances scale based on CPU utilization without cold starts&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;AWS Lambda Managed Instances is best suited for high-volume, predictable workloads that benefit from sustained compute capacity and larger memory configurations.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Memory-Intensive Workloads Work Best with AWS Lambda Managed Instances&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;This blog focuses on one of AWS Lambda Managed Instances’ most powerful capabilities: running memory-intensive workloads that exceed standard AWS Lambda’s 10 GB memory and 250 MB deployment package limits. Here are the use cases where AWS Lambda Managed Instances helps:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;In-Memory Analytics&lt;/strong&gt; — Load gigabytes of structured data into memory at initialization and serve sub-millisecond analytical queries across thousands of invocations&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ML Model Inference&lt;/strong&gt; — Keep large model weights resident in memory across invocations for consistent, low-latency inference without a dedicated endpoint.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Real-Time Semantic Search&lt;/strong&gt; — Build vector similarity search over large embedding indexes held entirely in memory, enabling natural language queries over millions of records without an external vector database.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Graph Processing&lt;/strong&gt; — Hold large graph structures in memory for traversal algorithms that require the full graph to be accessible at once.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Scientific &amp;amp; Numerical Computing&lt;/strong&gt; — Run simulations, Monte Carlo methods, and large matrix operations that require substantial working memory and benefit from memory-optimized Amazon EC2 instance families.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Large-Scale Report Generation&lt;/strong&gt; — Aggregate and transform multi-gigabyte datasets in memory to generate complex reports or dashboards on demand, without staging data through intermediate storage.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;&lt;strong&gt;Use Case: AI-Powered Customer Analytics with AWS Lambda Managed Instances&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;To demonstrate the power of AWS Lambda Managed Instances for memory-intensive applications, we built an AI-powered customer analytics application that combines in-memory data processing with ML-based semantic search. The application loads 1 million customer behavioral records (sessions, purchases, browsing patterns) from a Parquet file in S3 into a Pandas DataFrame, along with an embeddings cache consuming 200 MB, then responds to three kinds of analytics queries:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Customer Analysis&lt;/strong&gt; — Deep-dive into individual customer behavior: engagement scores, conversion rates, purchase patterns, and AI-generated customer segments&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Semantic Search&lt;/strong&gt; — Natural language queries powered by FastEmbed (sentence-transformers/all-MiniLM-L6-v2) that find similar customers using vector similarity&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cohort Analysis&lt;/strong&gt; — Real-time segmentation by device, country, age group with aggregated metrics&lt;/li&gt; 
&lt;/ol&gt; 
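&lt;p&gt;The vector-similarity step behind the semantic search feature can be sketched as follows. This is a minimal illustration, not the application’s code: in the real application the embeddings come from FastEmbed (all-MiniLM-L6-v2), while the tiny vectors and record ids here are stand-ins.&lt;/p&gt; 

```python
# Minimal sketch: rank cached record embeddings by cosine similarity to a
# query embedding. Vectors and ids below are illustrative stand-ins.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, embedding_cache, k):
    """Return the ids of the k records most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), rec_id) for rec_id, vec in embedding_cache.items()),
        reverse=True,
    )
    return [rec_id for _, rec_id in scored[:k]]
```

&lt;p&gt;Because the embedding cache lives entirely in memory, each query is a scan over cached vectors with no round trip to an external vector database.&lt;/p&gt; 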
&lt;h3&gt;&lt;strong&gt;Architecture Overview&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Our AI-powered customer analytics application demonstrates this in practice: 1 million records in memory (200 MB), a compact sentence transformer model for semantic search, sub-second query performance, and zero infrastructure to manage. The solution uses a simple, serverless architecture:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Customer transaction data (Parquet format) is stored in Amazon S3&lt;/li&gt; 
 &lt;li&gt;Amazon Cognito User Pool authenticates users and issues JWT tokens for API access&lt;/li&gt; 
 &lt;li&gt;Amazon API Gateway routes requests with Cognito authorizer validation, rate limiting (5 requests/second, burst 10), X-Ray tracing, and access logging&lt;/li&gt; 
 &lt;li&gt;AWS Lambda function with AWS Lambda Managed Instances loads the entire dataset (200 MB) and the all-MiniLM-L6-v2 model (900 MB) into memory during initialization while also performing threaded embeddings cache generation. This step can consume about 14 GB of the allocated memory, exceeding standard AWS Lambda’s 10 GB limit&lt;/li&gt; 
 &lt;li&gt;Analytics queries execute against the in-memory data using the model&lt;/li&gt; 
 &lt;li&gt;Results are returned in milliseconds for interactive analysis&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-1.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26050" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-1.png" alt="Architecture diagram" width="1566" height="718"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Deploy the Application&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;The following steps walk you through deploying the application to AWS using the AWS Serverless Application Model (AWS SAM). The deployment process packages your Lambda function code, uploads artifacts to Amazon S3, and provisions all required AWS resources, including Lambda functions, IAM roles, and any configured VPC networking, via AWS CloudFormation.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Make sure you have the following tools installed locally:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS CLI&lt;/a&gt; configured with credentials&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/serverless/sam/" target="_blank" rel="noopener noreferrer"&gt;SAM CLI&lt;/a&gt; installed&lt;/li&gt; 
 &lt;li&gt;Python 3.13+ installed locally&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://www.docker.com/" target="_blank" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; or &lt;a href="https://runfinch.com/" target="_blank" rel="noopener noreferrer"&gt;Finch&lt;/a&gt; (required for container builds)&lt;/li&gt; 
 &lt;li&gt;AWS account with appropriate permissions&lt;/li&gt; 
 &lt;li&gt;A VPC with at least 2 subnets (across different Availability Zones) and a security group — required for the Lambda Managed Instances capacity provider&lt;/li&gt; 
 &lt;li&gt;Supported regions: Check &lt;a href="https://builder.aws.com/capabilities/" target="_blank" rel="noopener noreferrer"&gt;AWS Capabilities by Region&lt;/a&gt; for supported regions&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;&lt;strong&gt;Getting Started&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;The complete source code for this application is available in our GitHub repository. To deploy it yourself, follow the steps below and refer to the full &lt;a href="https://github.com/aws-samples/sample-lambda-managed-instances-analytics"&gt;deployment instructions&lt;/a&gt; hosted on GitHub.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;1. Clone the repository&lt;/strong&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;git clone https://github.com/aws-samples/sample-lambda-managed-instances-analytics.git&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;2. Navigate to the project folder&lt;/strong&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;cd sample-lambda-managed-instances-analytics
chmod +x setup-data.sh deploy-lambda.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;3. Generate sample data and upload to S3&lt;/strong&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;./setup-data.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;This script will create an S3 bucket (if needed), generate 1M rows of sample data, and upload the data to S3.&lt;/p&gt; 
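&lt;p&gt;The real generator lives in the repository, but its shape is easy to sketch. The following Python illustration (column names and value ranges are hypothetical, not taken from setup-data.sh) shows the kind of tabular sample data such a script produces before upload:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import csv
import io
import random

def generate_sample_rows(n, seed=42):
    # Illustrative schema only; the real script defines its own columns.
    random.seed(seed)
    countries = ["USA", "UK", "DE", "IN", "JP"]
    devices = ["mobile", "desktop", "tablet"]
    return [
        {
            "user_id": f"user-{i:07d}",
            "country": random.choice(countries),
            "device": random.choice(devices),
            "engagement_score": round(random.uniform(0, 100), 2),
            "purchases": random.randint(0, 50),
        }
        for i in range(n)
    ]

def rows_to_csv(rows):
    # Serialize to CSV in memory; the real script writes the data out and uploads it to S3.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()&lt;/code&gt;&lt;/pre&gt; 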
&lt;p&gt;&lt;strong&gt;4. Build and deploy the Lambda function&lt;/strong&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;./deploy-lambda.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;This script builds the container image with FastEmbed, pushes it to Amazon ECR, and deploys the Lambda function along with the capacity provider, Amazon API Gateway endpoint, and Amazon Cognito user pool. After deployment, it automatically generates the UI authentication configuration and prompts you to create a test user.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-2.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26051" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-2.png" alt="SAM template" width="484" height="221"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-3.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26052" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-3.png" alt="Capacity provider configuration" width="1071" height="430"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Run the Application&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;1. Start the UI&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The application includes a simple HTML-based UI through which you can test the AWS Lambda function using Amazon API Gateway:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;cd ui &amp;amp;&amp;amp; python3 -m http.server 8000&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;2. Open your browser at &lt;a href="http://localhost:8000" target="_blank" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt; and choose ‘Sign In’ to authenticate via Cognito using the username and password that you created during deployment.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-4.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26053" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-4.png" alt="Starting the UI" width="2232" height="256"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;3. Enter your API endpoint URL, test the connection, and choose System Info.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-5-2.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26054" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-5-2.png" alt="Testing the connection" width="2230" height="1206"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Test the Application&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;a. Customer Analysis&lt;/strong&gt; — Enter one or more user IDs to retrieve customer behavior details: engagement scores, conversion rates, purchase patterns, and AI-generated customer segments.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-6.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26055" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-6.png" alt="Running customer analysis" width="1240" height="798"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;b. Semantic Search&lt;/strong&gt; — Enter natural language queries such as “list high value customers from USA” in the Semantic Search panel and verify the results. The response is fast because the analytics data and FastEmbed models are loaded into memory during the init phase.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-7-1.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26056" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-7-1.png" alt="Running semantic search" width="1240" height="798"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;c. Cohort Analysis&lt;/strong&gt; — Enter the query parameters to get real-time segmentation by device, country, and age group with aggregated metrics.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-8-1.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26057" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-8-1.png" alt="Running cohort analysis" width="1227" height="833"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;AWS Lambda Managed Instances automatically publishes metrics to Amazon CloudWatch, giving you visibility into function performance and capacity utilization. Monitor &lt;strong&gt;InitDuration&lt;/strong&gt; to track dataset and model load time at startup, &lt;strong&gt;MaxMemoryUsed&lt;/strong&gt; to confirm your data fits within configured memory, and &lt;strong&gt;ProvisionedConcurrencySpilloverInvocations&lt;/strong&gt; to detect when AWS Lambda Managed Instances capacity is exhausted.&lt;/p&gt; 
&lt;p&gt;Enable &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights.html" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS Lambda Insights&lt;/strong&gt;&lt;/a&gt; for enhanced per-invocation metrics including CPU time and memory utilization over time. Use &lt;strong&gt;Amazon CloudWatch Log Insights&lt;/strong&gt; to query INIT_START, INIT_END, and REPORT log entries for initialization and memory details per invocation.&lt;/p&gt; 
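&lt;p&gt;As an illustration, a CloudWatch Logs Insights query along these lines, run against the function’s log group, aggregates initialization time and peak memory from the REPORT entries (field units may need adjusting for your account):&lt;/p&gt; 
&lt;pre&gt;&lt;code&gt;filter @type = "REPORT"
| stats avg(@initDuration) as avgInitMs,
        max(@maxMemoryUsed) as peakMemUsed,
        avg(@duration) as avgDurationMs
  by bin(1h)&lt;/code&gt;&lt;/pre&gt; 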
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-9-1.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-26058" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/04/09/imageComputeBlog-2543-9-1.png" alt="AWS Lambda Insights" width="1660" height="735"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;What Makes This Better with AWS Lambda Managed Instances&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Without AWS Lambda Managed Instances, building this same application would require one of these alternatives:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Option A: EC2 with auto-scaling&lt;/strong&gt; — Full control, full responsibility: patching, scaling policies, load balancing, and deployment pipelines — all on you.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Option B: Redesign for standard Lambda&lt;/strong&gt; — Swap in-memory data for an external database and replace the ML model with an &lt;a href="https://aws.amazon.com/sagemaker/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker&lt;/a&gt; endpoint. More latency, more cost, more complexity.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;With AWS Lambda Managed Instances, you write a single AWS Lambda function, define a Capacity Provider, and deploy with SAM. AWS Lambda handles the Amazon EC2 instances, scaling, and lifecycle, giving you the memory you need with the operational simplicity you want. The in-memory approach eliminates network latency and disk I/O, delivering consistent sub-200ms response times for complex analytics.&lt;/p&gt; 
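&lt;p&gt;The in-memory pattern is simple to sketch: anything assigned at module scope runs once per execution environment during the init phase, so each invocation becomes a pure in-memory lookup. The handler and dataset below are hypothetical stand-ins for the FastEmbed-backed function in the repository:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import random
import statistics

# Init phase: runs once per execution environment. Load the full dataset
# into memory so invocations never touch a database or disk.
random.seed(7)
DATASET = [{"user_id": i, "score": random.uniform(0, 100)} for i in range(10000)]
BY_USER = {row["user_id"]: row for row in DATASET}

def handler(event, context=None):
    # Invoke phase: aggregate directly from memory.
    user_ids = event.get("user_ids", [])
    scores = [BY_USER[u]["score"] for u in user_ids if u in BY_USER]
    if not scores:
        return {"count": 0}
    return {"count": len(scores), "mean_score": statistics.mean(scores)}&lt;/code&gt;&lt;/pre&gt; 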
&lt;h2&gt;&lt;strong&gt;Cost Considerations &lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;AWS Lambda Managed Instances uses Amazon EC2-based pricing with a management fee. For predictable workloads, you can leverage Amazon EC2 Savings Plans or Reserved Instances to reduce costs significantly.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Example cost comparison&lt;/strong&gt; (us-east-1, 32 GB memory, 1M invocations/month):&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;AWS Lambda (standard):&lt;/strong&gt; ~$267/month (on-demand pricing)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;AWS Lambda Managed Instances:&lt;/strong&gt; ~$180/month (with 1-year Compute Savings Plan)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Savings:&lt;/strong&gt; 33% reduction&lt;/li&gt; 
&lt;/ul&gt; 
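&lt;p&gt;The savings figure follows directly from the two monthly estimates:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;standard = 267.0  # approx. monthly cost, standard Lambda on-demand
managed = 180.0   # approx. monthly cost, Managed Instances with a 1-year Savings Plan
savings_pct = round((standard - managed) / standard * 100)
print(savings_pct)  # 33&lt;/code&gt;&lt;/pre&gt; 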
&lt;p&gt;The cost benefits increase with higher memory configurations and sustained workloads that can take advantage of Amazon EC2 pricing discounts.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Best Practices&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Based on experience building this solution, here are key recommendations:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Memory sizing:&lt;/strong&gt; Start with your dataset size plus 50% overhead for processing. Monitor Amazon CloudWatch metrics to optimize.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Initialization strategy:&lt;/strong&gt; Load large datasets during the init phase to amortize the cost across multiple invocations.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Concurrency configuration:&lt;/strong&gt; Set PerExecutionEnvironmentMaxConcurrency based on your workload’s I/O characteristics. Higher values work well for I/O-bound analytics.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data format:&lt;/strong&gt; Use columnar formats like Parquet for efficient memory usage and fast loading.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Monitoring:&lt;/strong&gt; Track initialization duration, memory utilization, and invocation latency in Amazon CloudWatch to identify optimization opportunities.&lt;/li&gt; 
&lt;/ul&gt; 
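&lt;p&gt;The memory sizing rule of thumb can be written down directly; the 50% overhead factor is the starting point suggested above, to be refined against observed MaxMemoryUsed:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;def recommended_memory_gb(dataset_gb, overhead=0.5):
    # Dataset size plus processing headroom; tune after monitoring CloudWatch.
    return dataset_gb * (1 + overhead)

print(recommended_memory_gb(20))  # 30.0&lt;/code&gt;&lt;/pre&gt; 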
&lt;h2&gt;&lt;strong&gt;Cleanup&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;When you’re done exploring the solution, it’s good practice to remove all provisioned resources to avoid ongoing charges. For the full cleanup commands and exact steps, refer to the project’s README.md in the GitHub repository.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;AWS Lambda Managed Instances opens up a new class of serverless applications that support larger AWS Lambda layer packages and more memory. Memory-intensive workloads — in-memory analytics, ML inference, graph processing, scientific computing — can now run with the simplicity of AWS Lambda and the resources of Amazon EC2. The customer analytics example demonstrates how in-memory processing with AWS Lambda Managed Instances delivers performance improvements over traditional database queries while maintaining serverless benefits like automatic scaling and pay-per-use pricing.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Ready to get started?&lt;/strong&gt; Explore the &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda Managed Instances documentation&lt;/a&gt; and try building your own memory-intensive serverless application. You can find the complete code for &lt;a href="https://github.com/aws-samples/sample-lambda-managed-instances-analytics"&gt;this example on GitHub&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2</title>
		<link>https://aws.amazon.com/blogs/compute/accelerate-cpu-based-ai-inference-workloads-using-intel-amx-on-amazon-ec2/</link>
					
		
		<dc:creator><![CDATA[Santosh Kumar]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 16:43:10 +0000</pubDate>
				<category><![CDATA[*Post Types]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[PyTorch on AWS]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">21db657322c27b28f881000b3cc565d6157c04e7</guid>

					<description>This post shows you how to accelerate your AI inference workloads by up to 76% using Intel Advanced Matrix Extensions (AMX) – an accelerator that uses specialized hardware and instructions to perform matrix operations directly on processor cores – on Amazon Elastic Compute Cloud (Amazon EC2) 8th generation instances. You'll learn when CPU-based inference is cost-effective, how to enable AMX with minimal code changes, and which configurations deliver optimal performance for your models.</description>
										<content:encoded>&lt;p&gt;This post shows you how to accelerate your AI inference workloads by up to 76% using &lt;a href="https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/what-is-intel-amx.html" target="_blank" rel="noopener noreferrer"&gt;Intel Advanced Matrix Extensions (AMX)&lt;/a&gt; – an accelerator that uses specialized hardware and instructions to perform matrix operations directly on processor cores – on &lt;a href="https://aws.amazon.com/pm/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt; 8th generation instances. You’ll learn when CPU-based inference is cost-effective, how to enable AMX with minimal code changes, and which configurations deliver optimal performance for your models.&lt;/p&gt; 
&lt;p&gt;Many organizations find that CPU-based inference is more suitable for their production Artificial Intelligence/Machine Learning (AI/ML) workloads after evaluating factors like cost, operational complexity, and infrastructure compatibility. As more organizations deploy AI solutions, improving how models run on standard CPUs has become a critical cost control strategy for workloads where CPU inference provides the right balance of performance and economics.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://my.idc.com/getdoc.jsp?containerId=prUS52530724" target="_blank" rel="noopener noreferrer"&gt;IDC&lt;/a&gt;, a global market intelligence and advisory firm, projects that worldwide AI spending will reach $632 billion by 2028, growing at a 29% compound annual growth rate from 2024, with inference costs representing a significant portion of operational expenses. &lt;a href="https://www.deloitte.com/us/en/about/press-room/deloitte-2026-tmt-predictions.html" target="_blank" rel="noopener noreferrer"&gt;Deloitte&lt;/a&gt;, a leading professional services firm specializing in technology consulting and research, forecasts that inference – the running of AI models – will make up two-thirds of all AI compute by 2026, far exceeding initial training costs. This makes optimizing AI/ML inference on CPU crucial for controlling long-term AI/ML operational expenses.&lt;/p&gt; 
&lt;p&gt;At the core of AI inference workloads are matrix multiplication operations – the mathematical foundation of neural networks that drives computational demand. These matrix-heavy operations create a performance bottleneck for CPU-based inference, resulting in suboptimal performance for AI/ML workloads. This creates three key challenges for organizations: balancing cost optimization with performance requirements, meeting real-time latency demands, and scaling efficiently with variable workload demands. Intel’s Advanced Matrix Extensions (AMX) technology addresses these challenges by accelerating matrix operations directly on CPU cores, making CPU-based inference competitive and cost-effective.&lt;/p&gt; 
&lt;h3&gt;AMX capabilities and architecture&lt;/h3&gt; 
&lt;p&gt;AMX supports multiple data formats: &lt;a href="https://www.intel.com/content/www/us/en/content-details/671279/bfloat16-hardware-numerics-definition.html" target="_blank" rel="noopener noreferrer"&gt;BF16&lt;/a&gt;, which preserves the range of 32-bit floating point operations in half the space; INT8, which maximizes throughput when a small accuracy trade-off is acceptable; and FP16, which offers a balance between the two. This flexibility lets you match precision to your specific needs.&lt;/p&gt; 
&lt;p&gt;Introduced in 2023 with 4th Generation &lt;a href="https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html" target="_blank" rel="noopener noreferrer"&gt;Intel Xeon Scalable processors&lt;/a&gt;, AMX consists of eight 1KB tile registers (specialized on-chip memory for matrix data) and a Tile Matrix Multiply Unit (TMUL – dedicated hardware for matrix calculations) that enables processors to perform 2048 INT8 operations or 1024 BF16 operations per cycle. These tile registers provide efficient matrix storage, reducing memory access overhead and improving computational efficiency for matrix operations central to neural networks.&amp;nbsp;For real-world customer workloads, this translates to significantly faster inference times for &lt;a href="https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/" target="_blank" rel="noopener noreferrer"&gt;transformer&lt;/a&gt; models, recommendation systems, and natural language processing tasks, while reducing the total cost of ownership through improved resource utilization and lower infrastructure requirements.&lt;/p&gt; 
&lt;div id="attachment_25812" style="width: 567px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/1-ComputeBlog-2473-AMX-Architecture.png"&gt;&lt;img aria-describedby="caption-attachment-25812" loading="lazy" class=" wp-image-25812" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/20/1-ComputeBlog-2473-AMX-Architecture.png" alt="Architecture diagram of Intel Advanced Matrix Extensions (AMX) showing the key components: Intel Xeon CPU with AMX support, tile architecture with 8 tiles of 1 KiB each as 2D registers, Tile Matrix Multiply Unit (TMUL) with data flow between them, supported data types (BF16, INT8, FP16), and AMX instruction categories (Configuration, Data Management, Operations)" width="557" height="453"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25812" class="wp-caption-text"&gt;Figure 1: AMX Architecture showing AMX tile registers, processing units, and data flow within CPU core&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note: &lt;/em&gt;&lt;/strong&gt;&lt;em&gt;AMX operations, including tile setup and memory-to-tile data movement (which are handled automatically by the system), introduce small overhead that may outweigh benefits for smaller models or single-batch processing where insufficient matrix operations cannot amortize these costs, making batch size optimization critical for performance gains.&lt;/em&gt;&lt;/p&gt; 
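&lt;p&gt;Those per-cycle figures translate into a theoretical per-core peak once a clock frequency is assumed (the 2.0 GHz below is illustrative, not a measured Xeon frequency):&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;def peak_tops(ops_per_cycle, clock_ghz):
    # operations/cycle x cycles/second, expressed in tera-operations per second
    return ops_per_cycle * clock_ghz * 1e9 / 1e12

print(peak_tops(2048, 2.0))  # INT8: 4.096 TOPS per core
print(peak_tops(1024, 2.0))  # BF16: 2.048 TOPS per core&lt;/code&gt;&lt;/pre&gt; 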
&lt;h2&gt;When to choose CPU inference with AMX&lt;/h2&gt; 
&lt;p&gt;CPU inference with AMX acceleration benefits workloads including:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Batch processing and traditional ML&lt;/strong&gt;: Content summarization, recommendation systems, and analytical workloads benefit from CPU’s cost efficiency and ability to handle sparse data structures and branching logic.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Small to medium-sized models: &lt;/strong&gt;Models under 7B parameters and batch sizes of 8-16 samples achieve excellent performance through optimized threading, making CPUs ideal for applications like fraud detection and chatbots.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Variable demand workloads&lt;/strong&gt;: E-commerce systems and applications with unpredictable traffic patterns can quickly scale CPU instances up or down based on demand, avoiding the fixed costs of dedicated accelerator hardware that sits idle during low-traffic periods.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Complex business logic&lt;/strong&gt;: Applications like financial risk assessment and content moderation that need to combine ML predictions with business rules and conditional logic work well on CPUs, which handle mixed workloads better than specialized accelerators.&lt;/p&gt; 
&lt;h2&gt;Implementation: AMX optimization with PyTorch&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://pytorch.org/" target="_blank" rel="noopener noreferrer"&gt;PyTorch&lt;/a&gt;, a popular open-source machine learning framework, includes built-in Intel optimizations through &lt;a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/onednn.html" target="_blank" rel="noopener noreferrer"&gt;oneDNN&lt;/a&gt; (Intel’s Deep Neural Network library) that automatically use AMX when available. Setup requires installing dependencies and configuring environment variables for optimal performance.&lt;/p&gt; 
&lt;h3&gt;Install dependencies&lt;/h3&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Install transformers and torch
pip install torch transformers&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Configure environment variables&lt;/h3&gt; 
&lt;p&gt;These environment variables tell the oneDNN library how to optimize your inference workload for AMX.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Enable AMX instruction set (tells oneDNN to use AMX tiles for matrix operations): &lt;pre&gt;&lt;code class="lang-bash"&gt;export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Optimize thread affinity (binds threads to CPU cores for better cache performance): &lt;pre&gt;&lt;code class="lang-bash"&gt;export KMP_AFFINITY=granularity=fine,compact,1,0&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Use all available CPU cores for parallel processing: &lt;pre&gt;&lt;code class="lang-bash"&gt;export OMP_NUM_THREADS=$(nproc)&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Cache compiled kernels (avoids recompilation overhead on subsequent runs): &lt;pre&gt;&lt;code class="lang-bash"&gt;export ONEDNN_PRIMITIVE_CACHE_CAPACITY=4096&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Set default precision to BF16 (enables automatic AMX acceleration): &lt;pre&gt;&lt;code class="lang-bash"&gt;export ONEDNN_DEFAULT_FPMATH_MODE=bf16&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;(Optional) Enable verbose logging to verify AMX activation: &lt;pre&gt;&lt;code class="lang-bash"&gt;export ONEDNN_VERBOSE=1&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
&lt;/ol&gt; 
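&lt;p&gt;Before benchmarking, it’s worth confirming that the instance actually exposes AMX. On Linux, the kernel reports flags such as amx_tile, amx_bf16, and amx_int8 in /proc/cpuinfo when the feature is available; a minimal check (returning False where the file doesn’t exist) looks like this:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;def has_amx(cpuinfo_path="/proc/cpuinfo"):
    # Scan the kernel's CPU feature flags for the AMX tile extension.
    try:
        with open(cpuinfo_path) as f:
            return "amx_tile" in f.read()
    except OSError:
        return False

print("AMX available:", has_amx())&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Setting ONEDNN_VERBOSE=1 provides a second confirmation: oneDNN’s verbose output names the instruction set used by each primitive, so AMX-backed kernels are visible there.&lt;/p&gt; 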
&lt;h3&gt;BF16 optimization example&lt;/h3&gt; 
&lt;p&gt;With environment variables configured, implementing BF16 optimization requires minimal to no code changes. The following example demonstrates how PyTorch automatically leverages AMX tile registers for matrix operations when BF16 precision is used.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This is a simplified example for demonstration purposes; adapt the code to your specific use case and requirements.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import time

# Load model and tokenizer from HuggingFace
model_name = "google/gemma-3-1b-it"

model_revision = "dcc83ea841ab6100d6b47a070329e1ba4cf78752"
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    revision=model_revision
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    revision=model_revision
)
# Fix tokenizer padding issue for batch processing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Enable BF16 precision for automatic AMX acceleration
model = model.to(dtype=torch.bfloat16)
model.eval()  # Set to inference mode

# Inference function with BF16 autocast
def run_optimized_inference(prompts):
    inputs = tokenizer(prompts, padding=True, 
                      return_tensors="pt")  # Tokenize input
    
    with torch.no_grad():  # Disable gradients for inference
        with torch.amp.autocast('cpu',
                               dtype=torch.bfloat16):  # BF16 autocast
            outputs = model.generate(
                **inputs,
                max_length=100,     # Set maximum sequence length 
                do_sample=False     # Use greedy decoding
            )
    return outputs

# Example usage with performance measurement
prompts = ["What are the benefits of cloud computing?"]
start_time = time.time()
results = run_optimized_inference(prompts)  # Run BF16-optimized inference
elapsed_time = time.time() - start_time
tokens_generated = len(results[0]) - len(tokenizer.encode(
    prompts[0]))  # Count new tokens

# Display results and performance metrics
print(tokenizer.decode(results[0], skip_special_tokens=True))
print(f"Latency: {elapsed_time*1000:.1f}ms, "
      f"Throughput: {tokens_generated/elapsed_time:.1f} "
      f"tokens/sec")&lt;/code&gt;&lt;/pre&gt; 
&lt;h2&gt;Performance benchmarks&lt;/h2&gt; 
&lt;p&gt;To validate AMX performance benefits, we conducted benchmarks across multiple popular language models representing different use cases and model sizes.&lt;/p&gt; 
&lt;h3&gt;Benchmarking methodology and environment&lt;/h3&gt; 
&lt;p&gt;We tested two improvements: hardware generation advances (m8i vs m7i) and AMX optimization impact (FP32 vs BF16). This shows you both upgrade paths for your workloads.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Models tested&lt;/strong&gt;: BigBird-RoBERTa-large (355M), Microsoft DialoGPT-large (762M), Google Gemma-3-1b-it (1B), DeepSeek-R1-Distill-Qwen-1.5B (1.5B), Llama-3.2-3B-Instruct (3B), YOLOv5&amp;nbsp;(tested with 30 images at ~1200×800 resolution with 5 iterations for each image)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon EC2 instance types&lt;/strong&gt;: &lt;a href="https://aws.amazon.com/ec2/instance-types/m8i/" target="_blank" rel="noopener noreferrer"&gt;m8i.4xlarge&lt;/a&gt;, &lt;a href="https://aws.amazon.com/ec2/instance-types/m7i/" target="_blank" rel="noopener noreferrer"&gt;m7i.4xlarge&lt;/a&gt; (8&lt;sup&gt;th&lt;/sup&gt; &amp;amp; 7&lt;sup&gt;th&lt;/sup&gt; gen general-purpose Amazon EC2 instances with 16 vCPUs and 64 GiB memory, both AMX-capable)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch sizes&lt;/strong&gt;: 1, 8, 32&amp;nbsp;(number of input samples processed simultaneously in a single inference call)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Iterations&lt;/strong&gt;: 5 runs per configuration&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Comparison types&lt;/strong&gt;: 
  &lt;ul&gt; 
   &lt;li&gt;Instance generation comparison (m8i vs m7i performance)&lt;/li&gt; 
   &lt;li&gt;AMX optimization impact (32-bit floating-point (FP32) vs Brain Floating Point 16 (BF16) on same instance)&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Optimizations&lt;/strong&gt;: FP32 baseline vs BF16 AMX&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Framework&lt;/strong&gt;:&amp;nbsp;PyTorch 2.8.0 (which has built-in Intel optimizations)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Region&lt;/strong&gt;: AWS us-west-2&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Measurement methodology&lt;/strong&gt;: In our benchmarks, ‘inference latency’ represents the complete model inference execution time including input tokenization and full sequence generation (for generative models) or complete forward pass (for non-generative models). Each measurement is the average of 5 iterations after warm-up iterations, excluding model loading time. We use this metric because AMX’s matrix multiplication acceleration improves performance throughout the complete forward pass.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Throughout this blog, FP32 refers to the default 32-bit floating-point precision, while BF16 refers to Brain Floating Point 16-bit precision with AMX acceleration enabled.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: Performance results are based on internal testing and may vary depending on specific workloads, configurations, and environments.&lt;/p&gt; 
&lt;h3&gt;Detailed result: BigBird-RoBERTa-large&lt;/h3&gt; 
&lt;p&gt;This benchmark represents document classification, content summarization, and text analysis workloads, typical of batch processing where high throughput matters and of offline inference scenarios where strict latency requirements are not critical.&lt;/p&gt; 
&lt;div id="attachment_25811" style="width: 1441px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/2-ComputeBlog-2473-latency-datatypeVsbatch-roberta.png"&gt;&lt;img aria-describedby="caption-attachment-25811" loading="lazy" class="wp-image-25811 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/20/2-ComputeBlog-2473-latency-datatypeVsbatch-roberta.png" alt="Bar chart comparing BigBird-RoBERTa-large inference latency between m7i and m8i instances with FP32 and BF16 precision across batch sizes 1, 8, and 32, showing 55-67% latency reduction with BF16 AMX." width="1431" height="728"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25811" class="wp-caption-text"&gt;Figure 2: m7i.4xlarge vs m8i.4xlarge inference latency comparison for model BigBird-RoBERTa-large (355M parameters)&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25828" style="width: 2507px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/3-ComputeBlog-2473-throughput-roberta.png"&gt;&lt;img aria-describedby="caption-attachment-25828" loading="lazy" class="wp-image-25828 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/20/3-ComputeBlog-2473-throughput-roberta.png" alt="Bar chart comparing throughput for the BigBird-RoBERTa-large model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32. m8i.4xlarge achieves 4–25% higher throughput, with the largest gain at FP32 batch size 1 (25%, from 1214.29 to 1512.03 tokens/sec). BF16(AMX) batch size 1 reaches the highest overall throughput at 3391.06 tokens/sec on m8i.4xlarge with a 14 % improvement over m7i.4xlarge. Throughput gains with BF16(AMX) are smaller at larger batch sizes (4–5%), as AMX overhead limits scaling for this smaller model." width="2497" height="1274"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25828" class="wp-caption-text"&gt;Figure 3: m7i.4xlarge vs m8i.4xlarge throughput comparison for BigBird-RoBERTa-large model across batch sizes 1, 8, and 32&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25829" style="width: 2122px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/4-ComputeBlog-2473-latency-instancetypeVsbatch-roberta.png"&gt;&lt;img aria-describedby="caption-attachment-25829" loading="lazy" class="wp-image-25829 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/4-ComputeBlog-2473-latency-instancetypeVsbatch-roberta.png" alt="Bar chart comparing inference latency for bigbird-roberta-large between FP32 and BF16(AMX) data types on m8i.4xlarge and m7i.4xlarge instances at batch sizes 1, 8, and 32, showing BF16(AMX) reduces latency by 55–69% compared to FP32 across all configurations" width="2112" height="1164"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25829" class="wp-caption-text"&gt;Figure 4: FP32 vs BF16 inference latency comparison for model BigBird-RoBERTa-large (355M parameters) on m7i.4xlarge and m8i.4xlarge instances across batch sizes&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;BigBird-RoBERTa-large model benchmarking demonstrates three key performance improvements. &lt;strong&gt;Figure 2&lt;/strong&gt; shows that m8i hardware delivers a 4-20% latency reduction across batch sizes compared to m7i for both FP32 and BF16 with AMX, providing immediate benefits without application changes. With AMX and BF16, performance gains decrease at higher batch sizes as AMX overhead exceeds benefits for smaller models like BigBird-RoBERTa-large. &lt;strong&gt;Figure 3&lt;/strong&gt; validates these improvements with corresponding 4-25% throughput gains, enabling better resource utilization for production applications. &lt;strong&gt;Figure 4&lt;/strong&gt; demonstrates that enabling AMX with BF16 optimization provides the most significant impact, reducing m8i latency by 55-67% compared to the non-AMX FP32 baseline, enabling 2-3x higher processing capacity and reduced compute costs.&lt;/p&gt; 
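&lt;p&gt;The 2-3x capacity claim follows from the latency numbers: at fixed concurrency, a fractional latency reduction r multiplies throughput by 1/(1 - r):&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;def capacity_multiple(latency_reduction):
    # A 55% latency cut lets each worker complete 1/0.45 = 2.2x the requests.
    return 1 / (1 - latency_reduction)

print(round(capacity_multiple(0.55), 1))  # 2.2
print(round(capacity_multiple(0.67), 1))  # 3.0&lt;/code&gt;&lt;/pre&gt; 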
&lt;p&gt;The analysis above demonstrates the methodology for interpreting benchmark results, using BigBird-RoBERTa-large as a representative example. The remaining models (DialoGPT-large, Gemma-3-1b-it, DeepSeek-R1-Distill-Qwen-1.5B, and Llama-3.2-3B-Instruct) follow identical testing procedures and exhibit similar performance patterns, with variations primarily in the magnitude of improvements based on model size and architecture. The comprehensive analysis of all tested models and their performance implications is synthesized in the following section.&lt;/p&gt; 
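&lt;p&gt;The BF16(AMX) configurations above rely on PyTorch’s CPU autocast path. As a minimal sketch (using a small stand-in model rather than BigBird-RoBERTa-large), BF16 inference looks like this; on AMX-capable Xeon processors, oneDNN dispatches the BF16 matrix multiplications to AMX tiles automatically:&lt;/p&gt;

```python
# Sketch: BF16 autocast for CPU inference in PyTorch, the mechanism behind
# the BF16(AMX) results above. The model here is an illustrative stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 64)).eval()
x = torch.randn(8, 512)  # batch size 8, where AMX gains are largest

# Under CPU autocast, linear layers run in bfloat16; AMX acceleration is
# applied transparently by oneDNN when the hardware supports it.
with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)

print(out.dtype)
```
&lt;p&gt;No application changes beyond the autocast context are required, which is why the same script can be run on both instance generations for comparison.&lt;/p&gt;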
&lt;h3&gt;Benchmarking result for additional models&lt;/h3&gt; 
&lt;p&gt;To validate AMX’s effectiveness across diverse AI workloads, we benchmarked five additional models representing different use cases and model sizes. Each model follows the same testing methodology described above, with performance patterns showing how AMX benefits vary based on model architecture, parameter count, and batch size.&lt;/p&gt; 
&lt;h4&gt;DialoGPT-large (762M) – Conversational AI&lt;/h4&gt; 
&lt;p&gt;This benchmark represents conversational AI, chatbots, and real-time dialogue systems where low latency and consistent response times are critical for user experience.&lt;/p&gt; 
&lt;div id="attachment_25808" style="width: 1441px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/5-ComputeBlog-2473-latency-datatypeVsbatch-dialogpt.png"&gt;&lt;img aria-describedby="caption-attachment-25808" loading="lazy" class="size-full wp-image-25808" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/5-ComputeBlog-2473-latency-datatypeVsbatch-dialogpt.png" alt="Bar chart comparing inference latency for the DialoGPT-large model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 9– 25% latency reduction, with the largest improvement at FP32 batch size 32 (25%)" width="1431" height="733"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25808" class="wp-caption-text"&gt;Figure 5: m7i.4xlarge vs m8i.4xlarge inference latency comparison for model DialoGPT-large (762M parameters)&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25830" style="width: 2507px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/6-ComputeBlog-2473-throughput-dialogpt.png"&gt;&lt;img aria-describedby="caption-attachment-25830" loading="lazy" class="size-full wp-image-25830" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/6-ComputeBlog-2473-throughput-dialogpt.png" alt="Bar chart comparing throughput for the DialoGPT-large model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 10–34% higher throughput, with the largest gain at FP32 batch size 32 (34%) and BF16(AMX) batch size 32 reaching the highest overall throughput at 355.9 tokens/sec" width="2497" height="1283"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25830" class="wp-caption-text"&gt;Figure 6: m7i.4xlarge vs m8i.4xlarge throughput comparison for DialoGPT-large model across batch sizes 1, 8, and 32&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25831" style="width: 2118px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/7-ComputeBlog-2473-latency-instancetypeVsbatch-dialogpt.png"&gt;&lt;img aria-describedby="caption-attachment-25831" loading="lazy" class="size-full wp-image-25831" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/7-ComputeBlog-2473-latency-instancetypeVsbatch-dialogpt.png" alt="Bar chart comparing inference latency for DialoGPT-large between FP32 and BF16(AMX) data types on m8i.4xlarge and m7i.4xlarge instances at batch sizes 1, 8, and 32, showing BF16(AMX) increases latency at batch size 1 (negative improvement of -44% and -45%) but reduces latency at larger batch sizes, with up to 43% reduction at m7i.4xlarge batch size 32" width="2108" height="1164"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25831" class="wp-caption-text"&gt;Figure 7: FP32 vs BF16 inference latency comparison for model DialoGPT-large (762M parameters) on m7i.4xlarge and m8i.4xlarge instances across batch sizes&lt;/p&gt;
&lt;/div&gt; 
&lt;h4&gt;Gemma-3-1b-it (1B) – General Purpose&lt;/h4&gt; 
&lt;p&gt;This benchmark represents general-purpose language understanding tasks, content generation, and smaller model deployments suitable for cost-sensitive applications and variable demand workloads.&lt;/p&gt; 
&lt;div id="attachment_25805" style="width: 1441px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/8-ComputeBlog-2473-latency-datatypeVsbatch-gemma.png"&gt;&lt;img aria-describedby="caption-attachment-25805" loading="lazy" class="size-full wp-image-25805" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/8-ComputeBlog-2473-latency-datatypeVsbatch-gemma.png" alt="Bar chart comparing inference latency for the Gemma-3-1b-it model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 7– 17% latency reduction, with the largest improvement at BF16(AMX) batch size 1 (17%)" width="1431" height="730"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25805" class="wp-caption-text"&gt;Figure 8: M7i.4xlarge vs M8i.4xlarge inference latency comparison for model Gemma-3-1b-it (1B parameters)&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25832" style="width: 2507px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/9-ComputeBlog-2473-throughput-gemma-1.png"&gt;&lt;img aria-describedby="caption-attachment-25832" loading="lazy" class="size-full wp-image-25832" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/9-ComputeBlog-2473-throughput-gemma-1.png" alt="Bar chart comparing throughput for the Gemma-3-1b-it model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 7–20% higher throughput, with the largest gain at BF16(AMX) batch size 1 (20%) and BF16(AMX) batch size 32 reaching the highest overall throughput at 127.8 tokens/sec" width="2497" height="1278"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25832" class="wp-caption-text"&gt;Figure 9: m7i.4xlarge vs m8i.4xlarge latency and throughput comparison for Gemma-3-1b-it across model batch sizes 1, 8, and 32&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25833" style="width: 2118px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/10-ComputeBlog-2473-latency-instancetypeVsbatch-gemma-1.png"&gt;&lt;img aria-describedby="caption-attachment-25833" loading="lazy" class="size-full wp-image-25833" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/10-ComputeBlog-2473-latency-instancetypeVsbatch-gemma-1.png" alt="Bar chart comparing inference latency for Gemma-3-1b-it between FP32 and BF16(AMX) data types on m8i.4xlarge and m7i.4xlarge instances at batch sizes 1, 8, and 32, showing BF16(AMX) reduces latency by 24–42% at larger batch sizes but slightly increases latency at m7i.4xlarge batch size 1 (-4%), with the best improvement of 42% on m8i.4xlarge at batch size 8" width="2108" height="1164"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25833" class="wp-caption-text"&gt;Figure 10: FP32 vs BF16 inference latency comparison for model Gemma-3-1b-it (1B parameters) on m7i.4xlarge and m8i.4xlarge instances across batch sizes&lt;/p&gt;
&lt;/div&gt; 
&lt;h4&gt;DeepSeek-R1-Distill-Qwen-1.5B (1.5B) – Reasoning&lt;/h4&gt; 
&lt;p&gt;This benchmark represents reasoning and analytical workloads, including complex decision-making systems, financial analysis, and applications requiring sophisticated logic processing.&lt;/p&gt; 
&lt;div id="attachment_25802" style="width: 1441px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/11-ComputeBlog-2473-latency-instancetypeVsbatch-deepseek.png"&gt;&lt;img aria-describedby="caption-attachment-25802" loading="lazy" class="size-full wp-image-25802" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/11-ComputeBlog-2473-latency-instancetypeVsbatch-deepseek.png" alt="Bar chart comparing inference latency for the DeepSeek-R1-Distill-Qwen-1.5B model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 7–16% latency reduction, with the largest improvements at BF16(AMX) batch sizes 1 and 8 (both 16%)" width="1431" height="730"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25802" class="wp-caption-text"&gt;Figure 11: m7i.4xlarge vs m8i.4xlarge inference latency comparison for model DeepSeek-R1-Distill-Qwen-1.5B (1.5B parameters)&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25834" style="width: 2507px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/12-ComputeBlog-2473-throughput-deepseek.png"&gt;&lt;img aria-describedby="caption-attachment-25834" loading="lazy" class="size-full wp-image-25834" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/12-ComputeBlog-2473-throughput-deepseek.png" alt="Bar chart comparing throughput for the DeepSeek-R1-Distill-Qwen-1.5B model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 8–19% higher throughput, with the largest gains at BF16(AMX) batch sizes 1 and 8 (both 19%) and BF16(AMX) batch size 32 reaching the highest overall throughput at 415.1 tokens/sec" width="2497" height="1278"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25834" class="wp-caption-text"&gt;Figure 12: m7i.4xlarge vs m8i.4xlarge latency and throughput comparison for DeepSeek-R1-Distill-Qwen-1.5B model across batch sizes 1, 8, and 32&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25835" style="width: 2118px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/13-ComputeBlog-2473-latency-instancetypeVsbatch-deepseek-1.png"&gt;&lt;img aria-describedby="caption-attachment-25835" loading="lazy" class="size-full wp-image-25835" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/13-ComputeBlog-2473-latency-instancetypeVsbatch-deepseek-1.png" alt="Bar chart comparing inference latency for DeepSeek-R1-Distill-Qwen-1.5B between FP32 and BF16(AMX) data types on m8i.4xlarge and m7i.4xlarge instances at batch sizes 1, 8, and 32, showing BF16(AMX) reduces latency by 17–68% across all configurations, with the largest improvement of 68% on m8i.4xlarge at batch size 8 and consistently strong reductions of 59–66% at larger batch sizes" width="2108" height="1164"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25835" class="wp-caption-text"&gt;Figure 13: FP32 vs BF16 inference latency comparison for model DeepSeek-R1-Distill-Qwen-1.5B (1.5B parameters) on m7i.4xlarge and m8i.4xlarge instances across batch sizes&lt;/p&gt;
&lt;/div&gt; 
&lt;h4&gt;Llama-3.2-3B-Instruct (3B) – Large model&lt;/h4&gt; 
&lt;p&gt;This benchmark represents larger model deployments for complex instruction-following tasks, advanced content generation, and applications requiring higher model capacity while maintaining cost efficiency.&lt;/p&gt; 
&lt;div id="attachment_25799" style="width: 1441px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/14-ComputeBlog-2473-latency-instancetypeVsbatch-llama.png"&gt;&lt;img aria-describedby="caption-attachment-25799" loading="lazy" class="size-full wp-image-25799" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/14-ComputeBlog-2473-latency-instancetypeVsbatch-llama.png" alt="Bar chart comparing inference latency for the Llama-3.2-3B-Instruct model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 8–15% latency reduction, with the largest improvement at FP32 batch size 8 (15%) and consistent gains of 12–14% with BF16(AMX) at smaller batch sizes" width="1431" height="730"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25799" class="wp-caption-text"&gt;Figure 14: m7i.4xlarge vs m8i.4xlarge inference latency comparison for model Llama-3.2-3B-Instruct (3B parameters)&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25836" style="width: 2507px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/15-ComputeBlog-2473-throughput-llama.png"&gt;&lt;img aria-describedby="caption-attachment-25836" loading="lazy" class="size-full wp-image-25836" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/15-ComputeBlog-2473-throughput-llama.png" alt="Bar chart comparing throughput for the Llama-3.2-3B-Instruct model between m7i.4xlarge and m8i.4xlarge instances across FP32 and BF16(AMX) data types at batch sizes 1, 8, and 32, showing m8i.4xlarge achieves 8– 17% higher throughput, with the largest gains at FP32 batch size 8 and BF16(AMX) batch size 1 (both 17%) and BF16(AMX) batch size 32 reaching the highest overall throughput at 187.3 tokens/sec" width="2497" height="1278"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25836" class="wp-caption-text"&gt;Figure 15: m7i.4xlarge vs m8i.4xlarge latency and throughput comparison for Llama-3.2-3B-Instruct model across batch sizes 1, 8, and 32&lt;/p&gt;
&lt;/div&gt; 
&lt;div id="attachment_25837" style="width: 2118px" class="wp-caption alignnone"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/16-ComputeBlog-2473-latency-instancetypeVsbatch-llama-1.png"&gt;&lt;img aria-describedby="caption-attachment-25837" loading="lazy" class="size-full wp-image-25837" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/08/16-ComputeBlog-2473-latency-instancetypeVsbatch-llama-1.png" alt="Bar chart comparing inference latency for Llama-3.2-3B-Instruct between FP32 and BF16(AMX) data types on m8i.4xlarge and m7i.4xlarge instances at batch sizes 1, 8, and 32, showing BF16(AMX) reduces latency by 24–72% across all configurations, with the largest improvements of 72% on both m8i.4xlarge batch size 8 and m7i.4xlarge batch size 8, and consistently strong reductions of 68–70% at batch size 32" width="2108" height="1164"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25837" class="wp-caption-text"&gt;Figure 16: FP32 vs BF16 inference latency comparison for model Llama-3.2-3B-Instruct (3B parameters) on m7i.4xlarge and m8i.4xlarge instances across batch sizes&lt;/p&gt;
&lt;/div&gt; 
&lt;h4&gt;YOLOv5 – Computer vision model&lt;/h4&gt; 
&lt;p&gt;This benchmark represents computer vision workloads including object detection, image classification, and real-time video processing applications where consistent throughput is important for production deployments.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Instance type&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="2"&gt;&lt;strong&gt;Inference latency in Sec &lt;/strong&gt;(Processing time per image)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="2"&gt; &lt;p&gt;&lt;strong&gt;Throughput&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;(Image processed per sec)&lt;/p&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FP32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FP32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;m8i.4xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;0.034&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;0.029&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;29.23&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;34.63&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;m7i.4xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;0.038&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;0.031&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;26.39&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;32.28&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;m8i improvement&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;10.5%&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;6.5%&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;10.8%&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;7.3%&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;&lt;strong&gt;Key insights:&lt;/strong&gt; m8i instances deliver 7-11% better performance than m7i across both precision formats. Combining hardware upgrade with AMX optimization, m8i with BF16 delivers up to 24% lower latency and 31% higher throughput compared to m7i with FP32.&lt;/p&gt; 
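&lt;p&gt;The combined figures quoted above follow directly from the table values. A quick check of the arithmetic:&lt;/p&gt;

```python
# Recomputing the quoted YOLOv5 combined gains from the table above:
# m7i FP32 vs m8i BF16(AMX).
m7i_fp32_latency, m8i_bf16_latency = 0.038, 0.029   # sec per image
m7i_fp32_tput, m8i_bf16_tput = 26.39, 34.63         # images per sec

latency_gain = (m7i_fp32_latency - m8i_bf16_latency) / m7i_fp32_latency * 100
tput_gain = (m8i_bf16_tput - m7i_fp32_tput) / m7i_fp32_tput * 100
print(f"{latency_gain:.0f}% lower latency, {tput_gain:.0f}% higher throughput")
# → 24% lower latency, 31% higher throughput
```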
&lt;h2&gt;Benchmark result summary&lt;/h2&gt; 
&lt;p&gt;The detailed graphs above demonstrate consistent performance patterns across the tested models. Key findings:&lt;/p&gt; 
&lt;h3&gt;M8i vs M7i instance performance&lt;/h3&gt; 
&lt;p&gt;m8i instances deliver 9-14% better performance than m7i on average (up to 20%) across the tested models through hardware advances: up to 4.6x larger L3 cache, higher base frequencies, up to 2.5x higher &lt;a href="https://en.wikipedia.org/wiki/DDR5_SDRAM" target="_blank" rel="noopener noreferrer"&gt;DDR5&lt;/a&gt; bandwidth, and enhanced AMX execution with FP16 support.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;m8i average latency improvement*&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BigBird-RoBERTa-large (355M)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Document analysis&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DialoGPT-large (762M)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Conversational AI&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;14%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Gemma-3-1b-it (1B)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;General purpose&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DeepSeek-R1 (1.5B)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Reasoning tasks&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;11%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Llama-3.2-3B (3B)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Large model deployment&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;12%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;YOLOv5&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Computer vision&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;9%&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;* Average across all tested configurations (FP32 and BF16 at batch sizes 1, 8, and 32)&lt;/p&gt; 
&lt;h3&gt;AMX acceleration impact (FP32 vs BF16)&lt;/h3&gt; 
&lt;p&gt;BF16 precision with AMX delivers 21-72% performance improvements at batch sizes of 8 and above compared to FP32 baseline on the same instance type. These results compare FP32 vs BF16 performance on m8i.4xlarge, with performance gains varying by model size and batch configuration. Larger batch sizes show greater AMX benefits.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;Model&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="3"&gt;Latency improvement (%)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Batch 1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Batch 8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Batch 32&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BigBird-RoBERTa-large&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;55&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;67&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;63&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DialoGPT-large&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;– 44*&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;21&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;30&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Gemma-3-1b-it&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;6&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;42&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;24&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DeepSeek-R1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;24&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;68&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;59&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Llama-3.2-3B&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;27&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;72&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;68&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;* &lt;em&gt;At batch size 1, DialoGPT-large’s autoregressive decoding generates tokens sequentially, producing many small matrix operations where AMX tile setup overhead exceeds the acceleration benefit. At batch sizes 8 and above, multiple sequences are processed in parallel, creating larger matrix operations that amortize this overhead and deliver 21-30% improvement.&lt;/em&gt;&lt;/p&gt; 
&lt;h4&gt;Performance patterns by batch size&lt;/h4&gt; 
&lt;p&gt;Larger models (1B+ parameters) show consistently better AMX performance across the tested batch sizes:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Batch size 1&lt;/strong&gt;: Mixed results – larger models show 6-27% improvement, smaller models may experience AMX overhead&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch size 8&lt;/strong&gt;: Strong performance gains of 21-72% across the tested models, with larger models showing greater benefits&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch size 32&lt;/strong&gt;: Significant improvements of 24-68% for most models, demonstrating AMX’s batch processing strength&lt;/li&gt; 
&lt;/ul&gt; 
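&lt;p&gt;The batch-size sweep behind these patterns can be sketched as follows. This is an illustrative harness with a small stand-in model, not the exact benchmark script used for the results in this post:&lt;/p&gt;

```python
# Sketch: sweep batch sizes 1, 8, 32 and record per-pass latency and
# throughput, mirroring the methodology behind the figures above.
import time
import torch
import torch.nn as nn

model = nn.Linear(256, 256).eval()  # stand-in for the benchmarked models

results = {}
for batch in (1, 8, 32):
    x = torch.randn(batch, 256)
    with torch.inference_mode():
        model(x)                          # warm-up pass
        start = time.perf_counter()
        model(x)                          # timed pass
        elapsed = time.perf_counter() - start
    results[batch] = {"latency_s": elapsed, "throughput": batch / elapsed}

for batch, r in results.items():
    print(batch, r)
```
&lt;p&gt;Running the same sweep under FP32 and under BF16 autocast on each instance type yields the comparison matrices shown in the figures.&lt;/p&gt;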
&lt;h4&gt;Batch size optimization guidelines&lt;/h4&gt; 
&lt;p&gt;AMX performance scales with batch size, with the optimal range varying by model size. Performance saturates beyond batch size 16 due to hardware limits, including memory bandwidth and compute bottlenecks.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Model Size&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Performance Gain&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Recommended Batch Size&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&amp;lt;1B parameters&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;21-67%&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;8-32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Batch 1 results vary by architecture*&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1-2B parameters&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;42-68%&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;4-16&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;6-24% gains even at batch 1&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;3B+ parameters&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;27-72%&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1-8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Benefits across batch sizes&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;* Encoder models (BigBird) show 55% gains at batch 1; autoregressive models (DialoGPT) may experience overhead.&lt;/p&gt; 
&lt;h4&gt;Combined performance benefits&lt;/h4&gt; 
&lt;p&gt;When we combine AMX optimization with 8th generation instances (m8i), the performance improvements compound significantly. For example, Llama-3.2-3B-Instruct running with BF16 AMX on m8i instances can achieve up to 76% better performance compared to FP32 inference on m7i instances at optimal batch sizes (batch 8: m7i FP32 45.51s vs m8i BF16 10.93s = 76% improvement; batch 32: m7i FP32 62.60s vs m8i BF16 17.47s = 72% improvement).&lt;/p&gt; 
&lt;h3&gt;Throughput scaling&lt;/h3&gt; 
&lt;p&gt;Across the tested models, throughput (tokens/sec) increases proportionally with latency reduction. This consistent relationship demonstrates that AMX optimizations translate directly to improved inference efficiency.&lt;/p&gt; 
&lt;h3&gt;Price-performance analysis: Gemma-3-1b-it model&lt;/h3&gt; 
&lt;p&gt;While m8i.4xlarge instances are priced slightly higher than m7i.4xlarge ($0.847 vs $0.806 per hour in us-west-2), they deliver superior price-performance. To illustrate the economic benefits, we analyzed cost per 1 million tokens using Gemma-3-1b-it as a representative example. m8i delivers up to 13% better price-performance over m7i through hardware generation advances, with both instances running BF16 AMX.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Batch Size&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Data Type&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="2"&gt;&lt;strong&gt;m7i.4xlarge&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="2"&gt;&lt;strong&gt;m8i.4xlarge&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Price-Performance improvement&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Throughput&lt;br&gt; &lt;/strong&gt;(tokens/sec)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$ per 1M token&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Throughput&lt;br&gt; &lt;/strong&gt;(tokens/sec)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$ per 1M token&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;14.3&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$15.66&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;17.2&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$13.67&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;13%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;71&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$3.16&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;82.3&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$2.86&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;9%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;119.1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$1.88&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;127.8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$1.84&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;2%&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
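&lt;p&gt;The $ per 1M token values above derive from the hourly on-demand price and the measured throughput. A small helper reproduces the batch size 1 row; minor differences from the table come from rounding of the reported throughput:&lt;/p&gt;

```python
# Cost per 1M tokens = hourly price / tokens generated per hour, scaled to
# one million tokens. Prices are the us-west-2 on-demand rates quoted above.
def cost_per_million_tokens(price_per_hour, tokens_per_sec):
    return price_per_hour / (tokens_per_sec * 3600) * 1e6

m8i = cost_per_million_tokens(0.847, 17.2)  # batch 1, BF16(AMX)
m7i = cost_per_million_tokens(0.806, 14.3)  # batch 1, BF16(AMX)
print(f"m8i ${m8i:.2f} vs m7i ${m7i:.2f} per 1M tokens")
```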
&lt;p&gt;Combining the hardware upgrade with BF16 AMX optimization delivers up to 44% better price-performance compared to FP32 on m7i.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;strong&gt;Batch Size&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="3"&gt;&lt;strong&gt;m8i.4xlarge&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" colspan="3"&gt;&lt;strong&gt;m7i.4xlarge&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd" rowspan="2"&gt;&lt;p&gt;&lt;strong&gt;Price-Performance improvement&lt;/strong&gt;&lt;/p&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Data Type&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Throughput&lt;br&gt; &lt;/strong&gt;(tokens/sec)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$ per 1M token&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Data Type&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Throughput&lt;br&gt; &lt;/strong&gt;(tokens/sec)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$ per 1M token&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;17.2&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$13.67&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FP32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;14.9&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$15.03&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;9%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;82.3&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$2.86&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FP32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;44.1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$5.08&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;44%&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;BF16(AMX)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;127.8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$1.84&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FP32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;89.2&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;$2.51&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;27%&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h4&gt;Key findings from the price-performance analysis:&lt;/h4&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Combined optimization delivers up to 44% better price-performance&lt;/strong&gt;: m8i with AMX and BF16 outperforms m7i with FP32 at batch size 8, achieving $2.86 per 1M tokens – consistent with our batch size optimization guidelines, where batch sizes of 4-16 deliver optimal results for 1B models like Gemma-3-1b-it powering applications such as chatbots and fraud detection.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Larger batches maximize cost efficiency&lt;/strong&gt;: Batch size 32 reduces costs further to $1.84 per 1M tokens, a 27% improvement over m7i FP32 – ideal for throughput-oriented workloads like content summarization and recommendation systems where latency requirements are flexible.&lt;/li&gt; 
&lt;/ul&gt; 
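&lt;p&gt;The $ per 1M token figures above follow directly from instance hourly price divided by token throughput. This sketch reproduces the batch size 8 comparison; the hourly rates are assumptions back-calculated from the table values (approximately the us-east-1 On-Demand prices for each instance), not published benchmark inputs.&lt;/p&gt;

```python
# Reproduce the $ per 1M token figures from the tables above.
# Hourly rates are assumptions back-calculated from the table values
# (approximately us-east-1 On-Demand prices for each instance).
M8I_HOURLY = 0.8467  # m8i.4xlarge, assumed $/hr
M7I_HOURLY = 0.8064  # m7i.4xlarge, assumed $/hr

def cost_per_million_tokens(hourly_rate, tokens_per_sec):
    """Dollars to generate one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_rate / tokens_per_hour * 1_000_000

# Batch size 8: m8i BF16(AMX) vs m7i FP32
m8i_cost = cost_per_million_tokens(M8I_HOURLY, 82.3)  # about $2.86
m7i_cost = cost_per_million_tokens(M7I_HOURLY, 44.1)  # about $5.08
improvement = 1 - m8i_cost / m7i_cost                 # about 0.44 (44%)
print(f"${m8i_cost:.2f} vs ${m7i_cost:.2f} per 1M tokens ({improvement:.0%} better)")
```

&lt;p&gt;Running the same arithmetic on the batch size 32 row ($1.84 vs $2.51) yields the 27% figure.&lt;/p&gt;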
&lt;h3&gt;Production deployment recommendation&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;BF16 AMX&lt;/strong&gt;:&amp;nbsp;Delivers 21-72% performance improvements at recommended batch sizes while maintaining model accuracy, making it suitable for production workloads including fraud detection systems, content moderation, and real-time recommendation engines&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch processing&lt;/strong&gt;: Target batch sizes of 4-16 based on your use case – smaller batches (1-4) for latency-sensitive applications like chatbots, larger batches (8-16) for throughput-focused scenarios like document analysis and offline processing&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Instance selection&lt;/strong&gt;:&amp;nbsp;m8i instances provide consistent 9-14% performance improvements over m7i, delivering immediate ROI for existing CPU inference workloads without requiring application changes&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Model size consideration&lt;/strong&gt;:&amp;nbsp;Larger models (1B+ parameters) show better AMX utilization across batch sizes, making them ideal candidates for m8i deployment in complex reasoning and content generation applications&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Conclusion and next steps&lt;/h2&gt; 
&lt;p&gt;By using Intel AMX on Amazon EC2 8th generation instances, you can achieve substantial performance improvements for AI inference workloads. Our benchmarks demonstrate&amp;nbsp;up to 72% performance improvements across popular language models, making CPU inference more competitive for batch processing, real-time applications, recommender systems, and variable demand workloads while delivering substantial cost savings through improved resource utilization.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Key takeaways:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;BF16 AMX optimization&lt;/strong&gt;&amp;nbsp;delivers up to 72% performance improvements across model sizes, with batch 8 showing 21-72% gains and batch 32 showing 24-68% gains&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch sizes of 4-8 &lt;/strong&gt;provide optimal performance for most models—DialoGPT achieves 21% improvement in latency at batch 8, while Llama-3.2-3B achieves 72% improvement&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;8th generation instances&lt;/strong&gt;&amp;nbsp;deliver up to 14% performance improvements over m7i across the tested workloads&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Combined optimizations&lt;/strong&gt;&amp;nbsp;(m8i + BF16 AMX) can achieve compound performance improvements up to 76% in optimal configurations (vs m7i FP32), making CPU inference highly competitive for cost-sensitive applications&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;m8i instances deliver up to 13% better price-performance vs m7i&lt;/strong&gt; (lower cost per 1M tokens), based on our analysis of the Gemma-3-1b-it model&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Proper environment configuration&lt;/strong&gt; is critical for AMX activation&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;You can implement these optimizations immediately. &lt;/strong&gt;AMX hardware acceleration combined with PyTorch’s Intel-specific enhancements requires only a few environment variable settings and delivers substantial speed gains. Begin with BF16 optimization on your existing models, then explore INT8 quantization for additional gains.&lt;/p&gt; 
&lt;h3&gt;Next steps:&lt;/h3&gt; 
&lt;ol&gt; 
 &lt;li&gt;Launch an Intel-based Amazon EC2 8th generation instance (m8i.4xlarge)&lt;/li&gt; 
 &lt;li&gt;Install PyTorch (includes built-in Intel optimizations)&lt;/li&gt; 
 &lt;li&gt;Configure AMX environment variables&lt;/li&gt; 
 &lt;li&gt;Measure performance improvements&lt;/li&gt; 
 &lt;li&gt;Scale your optimized inference workloads&lt;/li&gt; 
&lt;/ol&gt; 
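&lt;p&gt;Steps 2-4 above can be sketched as follows. &lt;code&gt;ONEDNN_MAX_CPU_ISA&lt;/code&gt; and &lt;code&gt;OMP_NUM_THREADS&lt;/code&gt; are standard oneDNN and OpenMP controls; the thread count shown assumes an m8i.4xlarge (16 vCPUs) and should be tuned to your instance, and the &lt;code&gt;generate()&lt;/code&gt; call assumes a HuggingFace-style model object.&lt;/p&gt;

```python
import os

# Step 3: set AMX-related variables before importing PyTorch,
# because oneDNN and OpenMP read them at library load time.
os.environ["ONEDNN_MAX_CPU_ISA"] = "AVX512_CORE_AMX"  # allow oneDNN to dispatch AMX kernels
os.environ["OMP_NUM_THREADS"] = "16"                  # assumes m8i.4xlarge (16 vCPUs)

try:
    import torch

    def bf16_generate(model, inputs):
        # Steps 2 and 4: run inference under BF16 autocast, then compare
        # tokens/sec against an FP32 baseline. `model` is any torch module
        # exposing a generate() method (e.g., a HuggingFace causal LM).
        with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
            return model.generate(**inputs)
except ImportError:
    pass  # PyTorch not installed here; the variables still apply once it is
```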
&lt;h2&gt;Additional resources&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/what-is-intel-amx.html" target="_blank" rel="noopener noreferrer"&gt;Intel AMX documentation&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/ec2/instance-types/m8i/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2 m8i instances&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html" target="_blank" rel="noopener noreferrer"&gt;PyTorch Intel optimizations guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://huggingface.co/models" target="_blank" rel="noopener noreferrer"&gt;HuggingFace model hub&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://github.com/oneapi-src/oneDNN" target="_blank" rel="noopener noreferrer"&gt;oneDNN library documentation&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Build high-performance apps with AWS Lambda Managed Instances</title>
		<link>https://aws.amazon.com/blogs/compute/build-high-performance-apps-with-aws-lambda-managed-instances/</link>
					
		
		<dc:creator><![CDATA[Debasis Rath]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 14:53:01 +0000</pubDate>
				<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">423c73bf0bbaf0cc504a6aca239ab3187bf33a14</guid>

					<description>In this post, you will learn how to configure AWS Lambda Managed Instances by creating a Capacity Provider that defines your compute infrastructure, associating your Lambda function with that provider, and publishing a function version to provision the execution environments. We will conclude with production best practices including scaling strategies, thread safety, and observability for reliable performance.</description>
					<content:encoded>&lt;p&gt;High-performance applications such as CPU-intensive processing, memory-heavy analytics, and steady-state data pipelines often require more predictable compute resources than standard &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/welcome.html" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; configurations provide. &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda Managed Instances (LMI)&lt;/a&gt; addresses this by letting you run Lambda functions on selected Amazon EC2 &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html" target="_blank" rel="noopener noreferrer"&gt;instance types&lt;/a&gt; while preserving the Lambda programming model. You can choose from over 400 Amazon Elastic Compute Cloud (Amazon EC2) instance types across the general purpose, compute optimized, and memory optimized instance families to match workload requirements. AWS Lambda continues to manage infrastructure operations such as instance lifecycle management, operating system patching, runtime updates, request routing, and automatic scaling. This approach gives your teams greater control over compute characteristics and the &lt;a href="https://aws.amazon.com/ec2/pricing/" target="_blank" rel="noopener noreferrer"&gt;EC2 pricing model&lt;/a&gt;, while reducing the operational overhead of managing servers or clusters.&lt;/p&gt; 
&lt;p&gt;In this post, you will learn how to configure AWS Lambda Managed Instances by creating a &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-capacity-providers.html" target="_blank" rel="noopener noreferrer"&gt;Capacity Provider&lt;/a&gt; that defines your compute infrastructure, associating your Lambda function with that provider, and publishing a function version to provision the execution environments. We will conclude with production best practices including scaling strategies, thread safety, and observability for reliable performance.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/create-lmi.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25941" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/create-lmi.png" alt="Figure 1. Creating Function on LMI" width="1358" height="467"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 1. Creating Function on LMI&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;Creating Capacity Providers&lt;/h2&gt; 
&lt;p&gt;A Capacity Provider defines the infrastructure blueprint for running LMI functions on Amazon EC2. It specifies instance types, network placement, and scaling behavior. To create a Capacity Provider, you need two parameters: an IAM role (Capacity Provider Operator Role) granting Lambda permissions to launch and manage instances and your VPC configuration with subnets and security groups. Create this role in your account with the &lt;code&gt;&lt;a href="https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSLambdaManagedEC2ResourceOperator.html" target="_blank" rel="noopener noreferrer"&gt;AWSLambdaManagedEC2ResourceOperator&lt;/a&gt;&lt;/code&gt; managed policy following the &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html" target="_blank" rel="noopener noreferrer"&gt;Principle of Least Privilege&lt;/a&gt; (granting only the minimum permissions necessary).&lt;/p&gt; 
&lt;p&gt;This &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/lambda/create-capacity-provider.html" target="_blank" rel="noopener noreferrer"&gt;command&lt;/a&gt; creates a Capacity Provider with instance types and scaling configuration:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-ruby"&gt;aws lambda create-capacity-provider \
  --capacity-provider-name my-lmi-capacity \
  --vpc-config SubnetIds=subnet-abc123,subnet-def456,SecurityGroupIds=sg-xyz789 \
  --permissions-config CapacityProviderOperatorRoleArn=arn:aws:iam::123456789012:role/LMIOperatorRole \
  --instance-requirements Architectures=x86_64,AllowedInstanceTypes=c5.2xlarge,r5.4xlarge \
  --capacity-provider-scaling-config MaxVCpuCount=50,ScalingMode=Auto \
  --region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This command returns a Capacity Provider ARN that you’ll use to create your LMI function. Your function’s behavior depends on four main configurations in the capacity provider:&lt;/p&gt; 
&lt;h3&gt;Instance selection&lt;/h3&gt; 
&lt;p&gt;Lambda currently supports three Amazon EC2 instance families (.large and up): C (&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/compute-optimized-instances.html" target="_blank" rel="noopener noreferrer"&gt;compute optimized&lt;/a&gt;) for CPU-heavy work, M (&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/general-purpose-instances.html" target="_blank" rel="noopener noreferrer"&gt;general purpose&lt;/a&gt;) for balanced workloads, and R (&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/memory-optimized-instances.html" target="_blank" rel="noopener noreferrer"&gt;memory optimized&lt;/a&gt;) for large datasets. Choose x86 (Intel/AMD) or ARM (Graviton) architectures. If you don’t specify instance types, Lambda defaults to appropriate instances based on your function’s memory and CPU configuration. This is the recommended starting point unless you have specific performance requirements. When you need more control, use &lt;code&gt;AllowedInstanceTypes&lt;/code&gt; to specify only the instance types that Lambda can use or use &lt;code&gt;ExcludedInstanceTypes&lt;/code&gt; to exclude specific types while allowing all other instance types. You can’t use both parameters together.&lt;/p&gt; 
&lt;h3&gt;VPC and networking&lt;/h3&gt; 
&lt;p&gt;Configure multiple subnets across Availability Zones. Lambda creates a minimum Amazon EC2 fleet of three instances distributed across your configured Availability Zones to maintain availability and resiliency. Egress traffic from functions, including Amazon CloudWatch Logs, transits through the Amazon EC2 instance’s network interface in your Amazon Virtual Private Cloud (Amazon VPC). As functions send logs and metrics to CloudWatch, you will need internet access through a NAT Gateway or &lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints.html" target="_blank" rel="noopener noreferrer"&gt;VPC endpoints&lt;/a&gt; with &lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/what-is-privatelink.html" target="_blank" rel="noopener noreferrer"&gt;AWS PrivateLink&lt;/a&gt; for Amazon CloudWatch. This only affects egress traffic; function invoke requests don’t flow through your VPC. Security groups attached to your instances should allow only the traffic your function code needs. With LMI, configure VPC once at the Capacity Provider level instead of per function, simplifying management for multiple LMI functions. Standard Lambda functions continue to use their own VPC configurations. This Capacity Provider VPC configuration applies only to LMI functions.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-2-11.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25946" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-2-11.png" alt="Figure 2. LMI Networking" width="1543" height="680"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Figure 2. LMI Networking&lt;/strong&gt;&lt;/p&gt; 
&lt;h3&gt;Scaling configuration&lt;/h3&gt; 
&lt;p&gt;Set &lt;strong&gt;MaxVCpuCount&lt;/strong&gt; to cap compute capacity and control costs. New invocations throttle when you reach this limit until capacity frees up. Lambda monitors CPU utilization and scales instances automatically. Choose automatic scaling mode where Lambda tunes thresholds based on load patterns, or manual mode where you set a target CPU utilization percentage. Multiple functions can share the same Capacity Provider to reduce costs through better resource utilization, though you might want separate providers for functions with different performance or isolation requirements.&lt;/p&gt; 
&lt;h3&gt;Security&lt;/h3&gt; 
&lt;p&gt;Lambda encrypts &lt;a href="https://docs.aws.amazon.com/ebs/latest/userguide/ebs-encryption.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Block Store (Amazon EBS)&lt;/a&gt; volumes attached to EC2 instances with a service-managed key by default. You can provide your own &lt;a href="https://docs.aws.amazon.com/kms/latest/developerguide/overview.html" target="_blank" rel="noopener noreferrer"&gt;AWS Key Management Service (AWS KMS) key&lt;/a&gt; for encryption. Place instances in private subnets with restrictive security groups for enhanced security.&lt;/p&gt; 
&lt;h2&gt;Creating Lambda Managed Instance Functions&lt;/h2&gt; 
&lt;p&gt;You create an LMI function similarly to creating a standard Lambda function. You package your code, set your runtime, assign an execution role, and configure memory. The difference is specifying a &lt;code&gt;CapacityProviderConfig&lt;/code&gt; to tell Lambda which Capacity Provider to use and how to size each execution environment. Specify &lt;code&gt;CapacityProviderConfig&lt;/code&gt; during function creation with the Capacity Provider ARN and configure two execution environment settings. &lt;code&gt;ExecutionEnvironmentMemoryGiBPerVCpu&lt;/code&gt; sets the memory-to-vCPU ratio (2:1, 4:1, or 8:1) based on your workload type, and &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt; defines how many concurrent requests share each execution environment. This table shows how memory and vCPU allocation maps across the supported execution environment ratios.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
    &lt;td style="padding: 10px" colspan="2"&gt;2:1 Ratio (Compute optimized)&lt;/td&gt; 
    &lt;td style="padding: 10px" colspan="2"&gt;4:1 Ratio (General purpose)&lt;/td&gt; 
    &lt;td style="padding: 10px" colspan="2"&gt;8:1 Ratio (Memory optimized)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Memory (GB)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;vCPU(s)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;Memory (GB)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;vCPU(s)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;Memory (GB)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;vCPU(s)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;2&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;1&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;4&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;1&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;1&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;4&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;2&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;2&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;16&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;2&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;6&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;3&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;12&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;3&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;24&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;3&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;4&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;16&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;4&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;4&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;10&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;5&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;20&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;5&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;12&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;6&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;24&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;6&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;14&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;7&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;28&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;7&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;16&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;…&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;…&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;16&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h3&gt;Function Memory-to-CPU configuration&lt;/h3&gt; 
&lt;p&gt;Set the function’s memory size (up to 32 GB for LMI) and the &lt;code&gt;ExecutionEnvironmentMemoryGiBPerVCpu&lt;/code&gt; ratio. The default ratio is 2:1. A 2:1 ratio maps to compute optimized instances for CPU-intensive tasks like video encoding, 4:1 maps to general purpose instances for balanced workloads, and 8:1 maps to memory optimized instances for large in-memory datasets or caching. You must set memory in multiples of the ratio. LMI requires a 2 GB minimum because execution environments need sufficient memory to handle multiple concurrent requests, and supports up to 32 GB of memory per execution environment.&lt;/p&gt; 
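&lt;p&gt;A minimal sketch of the sizing rule above: vCPUs follow from function memory and the configured ratio, and memory must be a multiple of that ratio. The helper name is illustrative, not part of the Lambda API.&lt;/p&gt;

```python
def vcpus_for(memory_gib, ratio):
    """vCPUs allocated to an execution environment, per the ratio table above."""
    if memory_gib % ratio:
        raise ValueError("memory must be a multiple of the ratio")
    return int(memory_gib // ratio)

vcpus_for(4, 4.0)   # 4 GB at 4:1 (general purpose)   -> 1 vCPU
vcpus_for(32, 2.0)  # 32 GB at 2:1 (compute optimized) -> 16 vCPUs
vcpus_for(24, 8.0)  # 24 GB at 8:1 (memory optimized)  -> 3 vCPUs
```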
&lt;h3&gt;Multi-Concurrency settings&lt;/h3&gt; 
&lt;p&gt;LMI supports &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-runtimes.html" target="_blank" rel="noopener noreferrer"&gt;multiple concurrent invocations&lt;/a&gt; sharing the same execution environment, reducing cost per invocation by maximizing vCPU utilization. This is particularly effective for I/O-bound workloads, where invocations waiting on database queries or API calls yield vCPU usage to other invocations during idle periods. Lambda defaults the max concurrency per execution environment based on your runtime: Node.js (64 per vCPU), Java and .NET (32 per vCPU), and Python (16 per vCPU). Use &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt; to set a lower limit based on your workload’s resource needs. Decrease it if you’re experiencing memory pressure or CPU contention. When environments reach their configured max concurrency, new invocations throttle until capacity frees up at the execution environment level. This table captures the maximum concurrency per vCPU for each supported programming language.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Language&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;Default Max Concurrency&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Node.js&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;64 per vCPU&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Java&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;32 per vCPU&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;.NET&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;32 per vCPU&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Python&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;16 per vCPU&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
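&lt;p&gt;The runtime defaults above combine with vCPU count to give each execution environment’s default max concurrency. A small sketch (the mapping keys are illustrative, not Lambda runtime identifiers):&lt;/p&gt;

```python
# Default max concurrency per execution environment = per-vCPU default * vCPUs.
DEFAULT_PER_VCPU = {"nodejs": 64, "java": 32, "dotnet": 32, "python": 16}

def default_env_concurrency(runtime, vcpus):
    """Default concurrent invocations one execution environment accepts."""
    return DEFAULT_PER_VCPU[runtime] * vcpus

default_env_concurrency("python", 1)  # 1-vCPU Python environment -> 16
default_env_concurrency("nodejs", 2)  # 2-vCPU Node.js environment -> 128
```

&lt;p&gt;Lower the limit with &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt; when invocations are memory- or CPU-hungry.&lt;/p&gt;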
&lt;p&gt;This &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/lambda/create-function.html" target="_blank" rel="noopener noreferrer"&gt;command&lt;/a&gt; creates a Lambda function and associates it with your Capacity Provider:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;aws lambda create-function \
  --function-name my-lmi-function \
  --runtime python3.13 \
  --role arn:aws:iam::123456789012:role/LambdaExecutionRole \
  --handler app.lambda_handler \
  --zip-file fileb://function.zip \
  --memory-size 4096 \
  --capacity-provider-config '{
    "LambdaManagedInstancesCapacityProviderConfig": {
      "CapacityProviderArn": "arn:aws:lambda:us-east-1:123456789012:capacity-provider:my-lmi-capacity",
      "ExecutionEnvironmentMemoryGiBPerVCpu": 4.0,
      "PerExecutionEnvironmentMaxConcurrency": 10
    }
  }' \
  --region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Publishing Lambda Managed Instance Functions&lt;/h2&gt; 
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt;&amp;nbsp;publish a function version before invoking an LMI function. Publishing triggers Lambda to provision Amazon EC2 instances and initialize execution environments, so that the configured baseline capacity is ready before you start invoking. Expect a brief delay before your code goes live as Lambda provisions and launches Amazon EC2 instances. With LMI, execution environments pre-warm after publishing and remain invoke-ready, without cold starts for published versions. Standard Lambda environments initialize on first invoke (cold starts).&lt;/p&gt; 
&lt;p&gt;This &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/lambda/publish-version.html" target="_blank" rel="noopener noreferrer"&gt;command&lt;/a&gt; publishes a Lambda function version and provisions capacity:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws lambda publish-version --function-name my-lmi-function \
--region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;After publishing, the function works with standard invocation methods including direct invokes, &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventsourcemapping.html" target="_blank" rel="noopener noreferrer"&gt;event source mappings&lt;/a&gt;, and service integrations with Amazon API Gateway, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB Streams, and Amazon EventBridge.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-3-8.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25947" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-3-8.png" alt="Figure 3. LMI Invocation from event sources" width="1073" height="519"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 3. LMI Invocation from event sources&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;Scaling LMI Functions&lt;/h2&gt; 
&lt;p&gt;Lambda monitors &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-scaling.html" target="_blank" rel="noopener noreferrer"&gt;CPU utilization&lt;/a&gt; at Capacity Provider level. When CPU utilization reaches the target threshold, Lambda automatically provisions additional EC2 instances, and creates more execution environments on those instances, up to the &lt;code&gt;MaxVCpuCount&lt;/code&gt; limit you configured for your capacity provider. As demand decreases, Lambda consolidates workloads onto fewer EC2 instances. You can choose &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-scaling.html" target="_blank" rel="noopener noreferrer"&gt;automatic scaling mode&lt;/a&gt; (Lambda adjusts thresholds based on your patterns) or manual mode (you set a target CPU percentage). Automatic mode works for variable traffic patterns or when getting started. Manual mode fits when you have predictable patterns and want precise control over scaling thresholds for cost optimization.&lt;/p&gt; 
&lt;h3&gt;Min and max execution environments&lt;/h3&gt; 
&lt;p&gt;Control scaling at the function level with min and max execution environments. The default minimum is 3 execution environments to maintain &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html" target="_blank" rel="noopener noreferrer"&gt;high availability&lt;/a&gt; across Availability Zones. Your total function concurrency equals the number of execution environments multiplied by &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt;. For example, with min set to 3 and &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt; of 10, you have provided capacity for 30 concurrent invocations. With max set to 20, you can scale up to 200 concurrent invocations with incoming traffic, based on CPU utilization or concurrency saturation per execution environment. Set max to cap total concurrency and prevent noisy neighbor issues when multiple functions share a Capacity Provider. LMI maintains a minimum number of execution environments with a minimum Amazon EC2 fleet, while standard Lambda scales to zero when idle. Set both min and max to 0 to deactivate a function without deleting it.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-4-7.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25936" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/25/image-4-7.png" alt="Figure 4. LMI Scaling" width="1241" height="615"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Figure 4. LMI Scaling&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;This command updates the minimum and maximum execution environments for your function:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;aws lambda put-function-scaling-config \
  --function-name my-lmi-function \
  --qualifier $LATEST \
  --function-scaling-config MinExecutionEnvironments=5,MaxExecutionEnvironments=20 \
  --region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;We’ll cover scaling patterns and throughput optimization strategies in depth in a separate blog post.&lt;/p&gt; 
&lt;h2&gt;Best practices and production considerations&lt;/h2&gt; 
&lt;h3&gt;Thread safety&lt;/h3&gt; 
&lt;p&gt;Since LMI supports multiple invocations sharing execution environments, your code must be &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html" target="_blank" rel="noopener noreferrer"&gt;thread-safe.&lt;/a&gt; Code that isn’t thread-safe causes data corruption, security issues, or unpredictable behavior under concurrent load.&lt;/p&gt; 
&lt;h4&gt;Thread safety essentials&lt;/h4&gt; 
&lt;p&gt;Avoid mutating shared objects or global variables. Use thread-local storage for request-specific data. Initialize shared clients (AWS SDK, database connections) outside the function handler and verify that configurations remain immutable during invocations. Write to &lt;code&gt;/tmp&lt;/code&gt; using request-specific file names to prevent concurrent writes.&lt;/p&gt; 
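&lt;p&gt;As an illustration of these practices, the following hedged Node.js sketch initializes shared, frozen configuration outside the handler and derives a request-specific &lt;code&gt;/tmp&lt;/code&gt; file name from the request ID, so concurrent invocations sharing an execution environment never write to the same file. The handler shape and names are our own, not taken from the LMI documentation:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;const fs = require('fs');
const path = require('path');

// Shared, effectively immutable configuration: safe to initialize once per
// execution environment and read from many concurrent invocations.
const config = Object.freeze({ bucket: 'example-bucket' });

exports.handler = async function (event, context) {
  // context.awsRequestId is unique per invocation, so concurrent invocations
  // sharing this environment get distinct scratch files in /tmp.
  const scratchFile = path.join('/tmp', 'scratch-' + context.awsRequestId + '.json');
  fs.writeFileSync(scratchFile, JSON.stringify(event));
  try {
    const data = JSON.parse(fs.readFileSync(scratchFile, 'utf8'));
    return { bucket: config.bucket, payload: data };
  } finally {
    fs.unlinkSync(scratchFile); // clean up so /tmp does not fill under sustained load
  }
};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 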
&lt;h4&gt;Runtime-specific guidance&lt;/h4&gt; 
&lt;p&gt;Java applications should use immutable objects, thread-safe collections, and proper synchronization. Node.js applications should use async context for request isolation. Python applications run separate processes per execution environment, so focus on interprocess coordination and file locking for &lt;code&gt;/tmp&lt;/code&gt; access.&lt;/p&gt; 
&lt;h3&gt;Workload optimization&lt;/h3&gt; 
&lt;p&gt;I/O-bound workloads perform better with higher concurrency per environment. Use asynchronous patterns and non-blocking I/O to maximize efficiency. CPU-bound workloads get no benefit from concurrency greater than one per vCPU. Instead, configure more vCPUs per function for true parallelism for compute-heavy tasks like data transformation or image processing.&lt;/p&gt; 
&lt;h3&gt;Testing&lt;/h3&gt; 
&lt;p&gt;Validate your code under concurrent execution. Test with multiple simultaneous invocations to detect race conditions and shared state issues before production deployment. You can use LocalStack for local emulation of LMI. Learn more about LocalStack’s LMI support in their &lt;a href="https://blog.localstack.cloud/testing-locally-with-lambda-managed-instances/" target="_blank" rel="noopener noreferrer"&gt;announcement blog&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Compatibility&lt;/h3&gt; 
&lt;p&gt;Tools like &lt;a href="https://docs.aws.amazon.com/powertools/" target="_blank" rel="noopener noreferrer"&gt;Powertools&lt;/a&gt; for AWS work with LMI without code changes. However, if you’re reusing existing Lambda function code, layers, or packaged dependencies on LMI, test for thread safety and compatibility with the multi-concurrent execution model before production deployment.&lt;/p&gt; 
&lt;h3&gt;Observability&lt;/h3&gt; 
&lt;p&gt;LMI automatically publishes CloudWatch metrics at two levels: capacity provider (CPU, memory, network, and disk utilization across your Amazon EC2 fleet) and execution environment (concurrency, CPU, and memory per function). Monitor &lt;code&gt;CPUUtilization&lt;/code&gt; to understand scaling headroom and right-size your &lt;code&gt;MaxVCpuCount&lt;/code&gt;. Track &lt;code&gt;ExecutionEnvironmentConcurrency&lt;/code&gt; against &lt;code&gt;ExecutionEnvironmentConcurrencyLimit&lt;/code&gt; to catch throttling before it impacts users. Lambda publishes metrics at 5-minute intervals. Use CloudWatch alarms to stay ahead of capacity limits in production.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;AWS Lambda Managed Instances combines serverless simplicity with compute flexibility, helping you run high-performance workloads with reduced operational complexity. You maintain the familiar programming model of Lambda while accessing the diverse instance types of Amazon EC2 and predictable pricing, making it well-suited for data processing pipelines, compute-intensive operations, and cost-sensitive steady-state applications.&lt;/p&gt; 
&lt;p&gt;Ready to get started with LMI?&amp;nbsp;Deploy our&amp;nbsp;&lt;a href="https://github.com/aws-samples/sample-aws-lambda-managed-instances/tree/main/examples/fsi/sample-retirement-savings-simulator" target="_blank" rel="noopener noreferrer"&gt;Monte Carlo risk simulation example&amp;nbsp;&lt;/a&gt;from GitHub to see LMI in action with a real compute-intensive workload. The sample includes complete infrastructure code and walks you through capacity provider configuration, function setup, and performance optimization.&lt;/p&gt; 
&lt;p&gt;We want to hear from you. Share your feedback, questions, and use cases on &lt;a href="https://repost.aws/" target="_blank" rel="noopener noreferrer"&gt;re:Post&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Enhancing auto scaling resilience by tracking worker utilization metrics</title>
		<link>https://aws.amazon.com/blogs/compute/enhancing-auto-scaling-resilience-by-tracking-worker-utilization-metrics/</link>
					
		
		<dc:creator><![CDATA[Brian Moore]]></dc:creator>
		<pubDate>Tue, 24 Mar 2026 16:17:58 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Auto Scaling]]></category>
		<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Resilience]]></category>
		<guid isPermaLink="false">d9fc642874b341b8afa90f9f3c8c6eeed67691fb</guid>

					<description>A resilient auto scaling policy requires metrics that correlate with application utilization, which may not be tied to system resources. Traditionally, auto scaling policies track system resources such as CPU utilization. These metrics are easily available, but they only work when resource consumption correlates with worker capacity. Factors such as high variance in request processing time, mixed instance types, or natural changes in application behavior over time can break this assumption.</description>
										<content:encoded>&lt;p&gt;A resilient auto scaling policy requires metrics that correlate with application utilization, which may not be tied to system resources. Traditionally, auto scaling policies track system resource such as CPU utilization. These metrics are easily available, but they only work when resource consumption correlates with worker capacity. Factors such as high variance in request processing time, mixed instance types, or natural changes in application behavior over time can break this assumption.&lt;/p&gt; 
&lt;p&gt;Worker utilization tracking offers an alternative approach. Using a combination of total worker slots, work in flight, and work waiting in the backlog, a utilization value can be calculated for use in an auto scaling policy. This approach remains accurate across fleets with mixed instance types and applications with variable latencies, and it requires no changes as your application evolves.&lt;/p&gt; 
&lt;h2&gt;The limitations of resource-based scaling&lt;/h2&gt; 
&lt;p&gt;Traditional auto scaling policies track system resource metrics like CPU utilization, assuming a direct correlation between resource consumption and available application capacity. Consider an application that reads messages from &lt;a href="https://aws.amazon.com/sqs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Queue Service (SQS)&lt;/a&gt;, processes them, and writes results to &lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt;. If this application uses a fixed-size thread pool to process messages, such as 10 worker threads, the application reaches maximum capacity when all threads are busy, regardless of CPU utilization.&lt;/p&gt; 
&lt;p&gt;In our example, each worker spends most of its time waiting for DynamoDB responses rather than consuming CPU. All 10 threads become occupied handling requests, but CPU utilization stays low. From the perspective of the auto scaling policy, the fleet looks like it has enough capacity because plenty of CPU headroom remains. Meanwhile, new messages accumulate in the SQS queue because no workers are available to process them.&lt;/p&gt; 
&lt;p&gt;For queue-based workloads, &lt;a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html#scale-sqs-queue-custom-metric" target="_blank" rel="noopener noreferrer"&gt;AWS provides guidance&lt;/a&gt; to scale based on an acceptable backlog per worker. This is a calculated target based on your application’s average processing latency (queue delay). This works well when processing times are consistent, but breaks down if an application has variable latency characteristics.&lt;/p&gt; 
&lt;p&gt;Consider an image processing application that initially handles thumbnails taking 500 ms each. Using the traditional guidance with a target latency of 5 seconds you calculate an acceptable backlog of 10 messages per worker and deploy your scaling policy. Over time, the application evolves to also process 4K photos which take 2 seconds each. Eventually 4K photos are 50% of your traffic and total latency for queued messages has increased to 12.5 seconds, 2.5x more than your initial target.&lt;/p&gt; 
&lt;p&gt;The scaling policy is no longer fit for its intended purpose because your original latency assumptions no longer reflect reality. To keep this type of scaling effective you must also remember to update your scaling policies as your application behavior evolves.&lt;/p&gt; 
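&lt;p&gt;The drift in this example is plain arithmetic, and you can verify it directly:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-code"&gt;// 50% thumbnails at 0.5 s each, 50% 4K photos at 2 s each
const averageLatencySeconds = 0.5 * 0.5 + 0.5 * 2.0; // 1.25 s per queued message
const backlogPerWorker = 10;                         // from the original 5 s target / 0.5 s per message
const queuedLatencySeconds = backlogPerWorker * averageLatencySeconds; // 12.5 s, 2.5x the 5 s target&lt;/code&gt;&lt;/pre&gt; 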
&lt;p&gt;A shift to using mixed instance types in your application can lead to additional complexity when using traditional resource-based scaling policies. Different instance types may handle the same workload at different CPU levels leading to an unbalanced average that misrepresents your actual application health. By changing your mental model to consider how much work your application can accept instead of how much of a system resource is available you can improve your scaling rules and better model your application’s capacity.&lt;/p&gt; 
&lt;h2&gt;Understanding worker utilization&lt;/h2&gt; 
&lt;p&gt;Worker utilization measures the ratio of active work to available processing capacity. To calculate it, divide total work by total workers.&lt;/p&gt; 
&lt;p&gt;We use an SQS-based processing application as an example to demonstrate how worker utilization operates, but this approach can also be applied to other applications where work units and worker capacity are measurable. In our example application total work consists of messages waiting to be processed plus messages currently being processed. &lt;a href="https://aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; provides these values through the &lt;code&gt;ApproximateNumberOfMessagesVisible&lt;/code&gt; metric (messages waiting in the queue) and the &lt;code&gt;ApproximateNumberOfMessagesNotVisible&lt;/code&gt; metric (messages currently being processed or in flight). Each host in your application should publish the number of available workers as a custom CloudWatch metric with at least a 1-minute period. For Java thread pools or Python multiprocessing pools, this represents the pool or process count. The formula works regardless of the metric period. Using the shortest period possible allows more responsive target tracking and enables &lt;a href="https://aws.amazon.com/blogs/compute/faster-scaling-with-amazon-ec2-auto-scaling-target-tracking/" target="_blank" rel="noopener noreferrer"&gt;Fast Target Tracking&lt;/a&gt; if your application has sub-minute data points.&lt;/p&gt; 
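&lt;p&gt;As a sketch of the publishing side, the following illustrative JavaScript builds the &lt;code&gt;TotalWorkers&lt;/code&gt; metric payload each host reports once a minute. The helper names are our own; with the AWS SDK for JavaScript v3 you would pass the returned parameters to &lt;code&gt;PutMetricDataCommand&lt;/code&gt; and send them with a &lt;code&gt;CloudWatchClient&lt;/code&gt;:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-code"&gt;// Build PutMetricData parameters for the custom worker-count metric.
// The 'YourApp' namespace must match the namespace used in the scaling policy.
function totalWorkersMetric(poolSize) {
  return {
    Namespace: 'YourApp',
    MetricData: [{ MetricName: 'TotalWorkers', Value: poolSize, Unit: 'Count' }]
  };
}

// Publish the pool size on a fixed period; a 1-minute (or shorter) period
// keeps target tracking responsive. sendMetric is the injected SDK call.
function startWorkerReporting(sendMetric, poolSize) {
  return setInterval(function () { sendMetric(totalWorkersMetric(poolSize)); }, 60000);
}&lt;/code&gt;&lt;/pre&gt; 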
&lt;p&gt;To derive the formula, we can use the following CloudWatch Metric Math expressions:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;totalWork&lt;/code&gt; = FILL(&lt;code&gt;backlog&lt;/code&gt;, REPEAT) + FILL(&lt;code&gt;inFlight&lt;/code&gt;, REPEAT)&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;utilizationRatio&lt;/code&gt; = &lt;code&gt;totalWork&lt;/code&gt; / &lt;code&gt;workers&lt;/code&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Where:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;backlog&lt;/code&gt; = &lt;code&gt;ApproximateNumberOfMessagesVisible&lt;/code&gt; with the &lt;code&gt;Maximum&lt;/code&gt; statistic.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;inFlight&lt;/code&gt; = &lt;code&gt;ApproximateNumberOfMessagesNotVisible&lt;/code&gt; with the &lt;code&gt;Maximum&lt;/code&gt; statistic.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;workers&lt;/code&gt; = Your custom &lt;code&gt;TotalWorkers&lt;/code&gt; metric with the &lt;code&gt;Sum&lt;/code&gt; statistic.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Putting the components together the final expression for your target tracking scaling policy uses the following formula:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;IF(FILL(workers, 0) &amp;gt; 0, utilizationRatio, IF(totalWork &amp;gt; 0, 1, 0))&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;The FILL function uses last known values if SQS metrics are delayed, and the IF statement handles the case where you have no traffic and your fleet scales to zero instances. When there are no available workers, the formula metric reports 1 to indicate that the workers are fully saturated. This prevents the application from getting stuck at zero capacity and not being able to respond to any requests.&lt;/p&gt; 
&lt;p&gt;In this formula, a value of 1 or higher represents full or over saturation, where all workers are busy with no spare capacity, like running at 100% CPU. Values below 1 indicate available capacity for your application to process more work.&lt;/p&gt; 
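&lt;p&gt;The same logic can be mirrored offline to sanity-check a target value before deploying a policy. This sketch (the function name is ours) reproduces the utilization ratio and the zero-worker guard in plain JavaScript:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-code"&gt;// Mirror of the metric math: utilization = (backlog + inFlight) / workers,
// with the zero-worker guard from the IF expression.
function workerUtilization(backlog, inFlight, workers) {
  const totalWork = backlog + inFlight;
  if (workers === 0) {
    // No workers available: report full saturation if any work exists, else 0
    return totalWork === 0 ? 0 : 1;
  }
  return totalWork / workers;
}&lt;/code&gt;&lt;/pre&gt; 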
&lt;p&gt;For applications without a measurable backlog metric, you can track worker utilization using only the in-flight work. This approach works for APIs or other synchronous workloads where work arrives and is immediately assigned to workers rather than queuing. In these cases, the formula becomes:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;IF(FILL(workers, 0) &amp;gt; 0, utilizationRatio, IF(FILL(inFlight, 0) &amp;gt; 0, 1, 0))&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;In this scenario the utilization ratio is calculated as follows:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;utilizationRatio&lt;/code&gt; = FILL(&lt;code&gt;inFlight&lt;/code&gt;, REPEAT) / &lt;code&gt;workers&lt;/code&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The definitions of &lt;code&gt;workers&lt;/code&gt; and &lt;code&gt;inFlight&lt;/code&gt; remain the same for this formula. The primary difference is that the ratio directly tracks workers available and does not consider the backlog as an option.&lt;/p&gt; 
&lt;h2&gt;How worker utilization prevents outages&lt;/h2&gt; 
&lt;p&gt;Worker utilization-based scaling works for any application that can define available workers and total work. When the ratio of total work to available workers exceeds your threshold, the system scales out. This approach measures whether workers are available to handle the workload and treats application bottlenecks consistently. Whether workers are waiting on network I/O, performing CPU-intensive calculations, or experiencing another bottleneck doesn’t matter; the only question is whether total work exceeds available worker capacity. Any situation causing messages to accumulate on the queue increases the utilization ratio and triggers scale-out.&lt;/p&gt; 
&lt;h2&gt;Implementing worker utilization scaling&lt;/h2&gt; 
&lt;p&gt;To set up worker utilization-based auto scaling, identify the metrics used in the formula discussed earlier. First, identify a metric that tracks the work currently in flight. For SQS-based processing, AWS provides this metric. Second, publish a custom metric from your application representing the total workers. Optionally, you can also identify a metric that tracks the backlog of waiting work.&lt;/p&gt; 
&lt;p&gt;Using CloudWatch metric math, you calculate the utilization metric and use it in a target tracking scaling policy. Here is an example &lt;a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt; snippet showing the metric math configuration for an &lt;a href="https://aws.amazon.com/pm/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2&lt;/a&gt; Auto Scaling group. This snippet shows only the scaling policy configuration and is only an example; fully test it with your application before using it in production. Your complete template also needs IAM roles with appropriate permissions for SQS, DynamoDB, and CloudWatch access.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-yaml"&gt;ScalingPolicy: 
  Type: AWS::AutoScaling::ScalingPolicy 
  Properties: 
    AutoScalingGroupName: !Ref AutoScalingGroup 
    PolicyType: TargetTrackingScaling 
    TargetTrackingConfiguration: 
      TargetValue: 0.7 
      CustomizedMetricSpecification: 
        Metrics: 
          - Id: backlog 
            MetricStat: 
            Metric: 
              Namespace: AWS/SQS 
              MetricName: ApproximateNumberOfMessagesVisible 
              Dimensions: 
                - Name: QueueName 
                  Value: !GetAtt ProcessingQueue.QueueName 
              Stat: Maximum 
          - Id: inFlight 
            MetricStat: 
            Metric: 
              Namespace: AWS/SQS 
              MetricName: ApproximateNumberOfMessagesNotVisible 
              Dimensions: 
                - Name: QueueName 
                  Value: !GetAtt ProcessingQueue.QueueName 
              Stat: Maximum 
          - Id: workers 
            MetricStat: 
            Metric: 
              Namespace: YourApp 
              MetricName: TotalWorkers 
            Stat: Sum 
          - Id: totalWork 
            Expression: FILL(backlog, REPEAT) + FILL(inFlight, REPEAT) 
          - Id: utilizationRatio 
            Expression: totalWork / workers 
          - Id: utilization 
            Expression: IF(FILL(workers, 0) &amp;gt; 0, utilizationRatio, IF(totalWork &amp;gt; 0, 1, 0)) 
            ReturnData: true&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;This approach also works for &lt;a href="https://aws.amazon.com/ecs/" target="_blank" rel="noopener noreferrer"&gt;Amazon ECS&lt;/a&gt; services using &lt;a href="https://aws.amazon.com/autoscaling/" target="_blank" rel="noopener noreferrer"&gt;AWS Application Auto Scaling&lt;/a&gt;. The metric math configuration remains the same, but you create an &lt;code&gt;AWS::ApplicationAutoScaling::ScalingPolicy&lt;/code&gt; resource instead, adapting the parameters accordingly.&lt;/p&gt; 
&lt;h2&gt;Choosing a target utilization&lt;/h2&gt; 
&lt;p&gt;Since the worker utilization metric directly tracks the available capacity of your application, the target utilization value you choose reflects your organization’s balance between cost efficiency and availability. Lower target values provide more headroom for traffic spikes and faster response to load changes but result in higher infrastructure costs due to lower utilization. Higher target values maximize cost efficiency by keeping workers busy but leave less headroom for sudden traffic increases.&lt;/p&gt; 
&lt;p&gt;When choosing a target consider traffic patterns, acceptable latency during scale-out events, and cost sensitivity. Applications with unpredictable traffic spikes may benefit from lower targets, while an application with predictable load can safely use higher targets. Start with a moderate value like 0.7 and adjust based on observed behavior and your business requirements. If you previously tracked a resource utilization metric such as CPU, consider starting with the same target.&lt;/p&gt; 
&lt;h2&gt;Monitoring resource utilization for cost optimization&lt;/h2&gt; 
&lt;p&gt;While worker utilization drives scaling decisions, CPU and latency should be regularly evaluated to ensure cost-effective operations. Resource-based metrics can identify host resizing opportunities to better match your application requirements. If no scale-in happens when CPU utilization is consistently low, you are likely running instances that are too large for your workload. By using worker utilization in an auto scaling policy, you can switch to a different instance type without adjusting the auto scaling policy. The formula automatically adapts as you add different instance types or update the capacity per worker.&lt;/p&gt; 
&lt;p&gt;Conversely, if CPU utilization is consistently high while worker utilization remains at your target, your instances might be undersized. Upgrading to larger instance types can improve per-worker throughput, allowing each worker to process tasks faster. Changes to your auto scaling policy are not needed in this situation either. As messages are processed faster, they spend less time in the in-flight state, and the utilization ratio naturally adjusts.&lt;/p&gt; 
&lt;p&gt;This approach manages application availability independent of instance size, while resource utilization guides cost optimization. Each can be optimized independently without complex coordination.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Worker utilization-based auto scaling reduces the operational burden of continuously validating your scaling rules as application requirements and infrastructure change. By tracking the ratio of work to workers, your auto scaling policies automatically respond to capacity constraints based on available work. The approach works across workloads with discrete processing units and remains effective when you modify instance configurations or application worker pool sizes.&lt;/p&gt; 
&lt;p&gt;Implementation requires identifying a metric for available work, publishing a custom metric representing total workers, and using CloudWatch metric math in a target tracking scaling policy. This setup provides resilience that scaling based solely on resource metrics cannot achieve, while maintaining the flexibility to optimize costs and change your instance size without impacting system availability.&lt;/p&gt; 
&lt;p&gt;To get started:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Identify an application in your environment that uses a worker pool.&lt;/li&gt; 
 &lt;li&gt;Instrument the application to publish worker count metrics.&lt;/li&gt; 
 &lt;li&gt;Configure a scaling policy tracking worker utilization.&lt;/li&gt; 
 &lt;li&gt;Monitor how the system responds to traffic changes and capacity events.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Learn more&lt;/h2&gt; 
&lt;p&gt;To learn more about auto scaling and monitoring, see the following resources:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2 Auto Scaling target tracking scaling policies&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-autoscaling-targettracking.html" target="_blank" rel="noopener noreferrer"&gt;AWS Application Auto Scaling for Amazon ECS services&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html" target="_blank" rel="noopener noreferrer"&gt;Using Amazon CloudWatch metric math&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html" target="_blank" rel="noopener noreferrer"&gt;Publishing custom CloudWatch metrics&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Best practices for Lambda durable functions using a fraud detection example</title>
		<link>https://aws.amazon.com/blogs/compute/best-practices-for-lambda-durable-functions-using-a-fraud-detection-example/</link>
					
		
		<dc:creator><![CDATA[Debasis Rath]]></dc:creator>
		<pubDate>Mon, 23 Mar 2026 22:04:39 +0000</pubDate>
				<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<guid isPermaLink="false">8e5c3ce20aad30d0530d3aa36548678e22b7a636</guid>

					<description>This post walks through a fraud detection system built with durable functions. It also highlights the best practices that you can apply to your own production workflows, from approval processes to data pipelines to AI agent orchestration.</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda durable functions&lt;/a&gt;&amp;nbsp;extend the Lambda programming model to build fault-tolerant multi-step applications and AI workflows using familiar programming languages. They preserve progress despite interruptions and execution can suspend for up to one year, for human approvals, scheduled delays, or other external events, without incurring compute charges for on-demand functions.&lt;/p&gt; 
&lt;p&gt;This post walks through a fraud detection system built with durable functions. It also highlights the best practices that you can apply to your own production workflows, from approval processes to data pipelines to AI agent orchestration. You will learn how to handle concurrent notifications, wait for customer responses, and recover from failures without losing progress. If you are new to durable functions, check out the &lt;a href="https://aws.amazon.com/blogs/compute/building-fault-tolerant-long-running-application-with-aws-lambda-durable-functions/" target="_blank" rel="noopener noreferrer"&gt;Introduction to Durable Functions blog post&lt;/a&gt;&amp;nbsp;first.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Fraud detection with human-in-the-loop&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Consider a credit card fraud detection system, which uses an AI agent to analyze incoming transactions and assign risk scores. For ambiguous cases (medium-risk scores), the system needs human approval before authorizing a transaction. The workflow branches based on risk:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Low risk (score &amp;lt; 3)&lt;/strong&gt;: Authorize immediately&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;High risk (score ≥ 5)&lt;/strong&gt;: Send to the fraud department immediately&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Medium risk (score 3–4)&lt;/strong&gt;: Suspend transaction, send SMS and email to cardholder, wait up to 24 hours for confirmation (wait time is customizable)&lt;/li&gt; 
&lt;/ul&gt; 
&lt;div id="attachment_25907" style="width: 946px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25907" loading="lazy" class="wp-image-25907 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/23/compute-2476-arch-diag.png" alt="Figure 1. Agentic Fraud Detection with durable Lambda functions" width="936" height="508"&gt;
 &lt;p id="caption-attachment-25907" class="wp-caption-text"&gt;Figure 1. Agentic Fraud Detection with durable Lambda functions&lt;/p&gt;
&lt;/div&gt; 
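&lt;p&gt;The branching above can be sketched with the &lt;code&gt;context.step&lt;/code&gt; API used later in this post. This is a minimal, hedged illustration: the step names, the stub scorer, and the elided wait are our own, not the repository&#8217;s actual code:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Risk thresholds from the workflow description
const HIGH_RISK_MIN = 5;   // score 5 and above: fraud department
const MEDIUM_RISK_MIN = 3; // score 3-4: suspend and ask the cardholder

// Stub risk scorer; the real system calls an AI agent
function scoreTransaction(tx) {
  return tx.amount &gt;= 5000 ? 4 : 1;
}

async function routeTransaction(context, tx) {
  const score = await context.step('risk-score-' + tx.id, async function () {
    return scoreTransaction(tx);
  });
  if (score &gt;= HIGH_RISK_MIN) {
    return context.step('notify-fraud-dept-' + tx.id, async function () {
      return { action: 'fraud-review', id: tx.id };
    });
  }
  if (score &gt;= MEDIUM_RISK_MIN) {
    // Suspend and durably wait (up to 24 hours) for cardholder confirmation;
    // the wait primitive is elided in this sketch.
    return { action: 'suspended-pending-confirmation', id: tx.id };
  }
  return context.step('authorize-' + tx.id, async function () {
    return { action: 'authorized', id: tx.id };
  });
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 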
&lt;p&gt;With human-in-the-loop workflows, response times can vary from minutes to hours. These delays introduce the need to durably preserve the state without consuming compute resources while waiting. With financial systems, we must also implement idempotency to guard against duplicate messages (invocations) and recover from failures without reprocessing completed work. To address these requirements, developers implement polling patterns with external state stores like Amazon DynamoDB or Amazon Simple Storage Service (Amazon S3) to manage idempotency, pay for idle compute while waiting for callbacks, introduce external orchestration components, or build asynchronous message-driven systems to handle long-processing tasks.&lt;/p&gt; 
&lt;p&gt;Lambda durable functions provide a new alternative to address these challenges through durable execution, a pattern that uses checkpoints (saved state snapshots) to preserve progress and replays from saved state to recover from failures or resume after waiting. With checkpointing capabilities, you no longer need to pay Lambda compute charges while waiting, whether for callbacks, scheduled delays, or external events. Learn how to implement durable functions using the complete fraud detection implementation at this&amp;nbsp;&lt;a href="https://github.com/aws-samples/sample-lambda-durable-functions/tree/main/Industry%20Solutions/Financial%20Services%20%28FSI%29/FraudDetection" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. You can deploy it to your AWS account and experiment with the code as you read. The repository includes deployment instructions, sample data, and helper functions for testing.&lt;/p&gt; 
&lt;p&gt;As we walk through the code, we’ll focus on best practices for designing workflows with durable execution and how to apply these patterns correctly in production workflows.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Design steps to be idempotent&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Durable execution is designed to preserve progress through checkpoints and replay, but that reliability model means step logic can execute more than once. When steps retry, how do you prevent duplicate actions like charges to the credit card or repeated customer SMS or email notifications?&lt;/p&gt; 
&lt;p&gt;Durable functions use&amp;nbsp;&lt;strong&gt;&lt;em&gt;at-least-once execution&lt;/em&gt;&lt;/strong&gt;&amp;nbsp;by default, executing each step at least one time, potentially more if failures occur. When a step fails, it retries. There are two strategies to design idempotent steps that prevent duplicate side effects: using external API idempotency keys and using the at-most-once step semantics built into durable functions.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Strategy A&lt;/strong&gt;: External API Idempotency Keys&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Strategy A: Use external API idempotency keys
await context.step(`authorize-${tx.id}`, async () =&amp;gt; {
  return payment.charges.create({
    amount: tx.amount,
    currency: 'usd',
    idempotency_key: `tx-${tx.id}`, // Prevents duplicate charges
    description: `Transaction ${tx.id}`
  });
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice the configuration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;idempotency_key in API call&lt;/strong&gt;: If the step retries, the payment processor recognizes it’s a duplicate request and returns the original result&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Defense in depth&lt;/strong&gt;: Two layers of protection: Lambda checkpointing and external API idempotency&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Each layer provides independent protection. If Lambda’s checkpoint fails, the external API prevents duplicate charges. For legacy systems without idempotency support, where it’s critical that an operation is not executed more than once, use at-most-once semantics:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Strategy B&lt;/strong&gt;: Use At-Most-Once Semantics&lt;/p&gt; 
&lt;p&gt;At-most-once execution runs each step zero or one time, never more:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Strategy B: At-most-once step semantics
await context.step("charge-legacy-system", async () =&amp;gt; {
  return await legacyPaymentSystem.charge(tx.amount);
}, {
  semantics: StepSemantics.AtMostOncePerRetry,
  retryStrategy: createRetryStrategy({ maxAttempts: 0 })
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This checkpoints before the step executes, preventing re-execution on retries. The tradeoff: if the step fails, you must decide whether to retry (risking duplicates) or fail the entire workflow.&lt;/p&gt; 
&lt;p&gt;Use idempotency for critical side effects like payment processing, database writes, external API calls, state transitions, and resource provisioning. Read more in the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-execution-idempotency.html" target="_blank" rel="noopener noreferrer"&gt;idempotency documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Prevent duplicate executions with DurableExecutionName&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Idempotent steps prevent duplicate side effects within a single execution, but what about duplicate workflow executions running concurrently? For example, duplicate messages in the queue, users clicking “Submit” multiple times in the UI, or the same event arriving via multiple channels like webhook and API. Without protection, each invocation creates a separate durable execution, potentially running the fraud check multiple times, sending duplicate notifications, and creating confusion about which execution is authoritative. Durable functions provide &lt;code&gt;DurableExecutionName&lt;/code&gt; to help ensure only one concurrent execution per unique name.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Invoke fraud detection function with execution name
await lambda.invoke({
  FunctionName: 'fraud-detection',
  InvocationType: 'Event',
  DurableExecutionName: `tx-${transactionId}`,
  Payload: JSON.stringify({
    id: transactionId,
    amount: 6500,
    location: 'New York, NY',
    vendor: 'Amazon.com'
  })
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice the configuration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;DurableExecutionName: tx-${transactionId}&lt;/strong&gt;: Uses the transaction ID as a unique execution identifier&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;InvocationType: ‘Event’&lt;/strong&gt;: Asynchronous invocation supports long-running workflows beyond 15 minutes&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;One execution per transaction&lt;/strong&gt;: If three invocations arrive with the same transaction ID, only the first creates an execution. Subsequent requests with the same execution name and payload receive an idempotent response returning the existing execution’s ARN, rather than creating a new execution.&lt;/li&gt; 
&lt;/ul&gt; 
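&lt;p&gt;The deduplication behavior in the last bullet can be sketched as a small, illustrative simulation. This is not the service API: the real bookkeeping happens inside Lambda, and the ARN format below is a placeholder.&lt;/p&gt;

```javascript
// Illustrative simulation of DurableExecutionName deduplication.
// The real behavior is implemented by the Lambda service; the ARN
// format below is a placeholder, not the actual format.
const executions = new Map();

function startDurableExecution(name, payload) {
  if (executions.has(name)) {
    // Duplicate name: idempotent response returning the existing ARN
    return { executionArn: executions.get(name), deduplicated: true };
  }
  const executionArn = `arn:example:execution/${name}`;
  executions.set(name, executionArn);
  return { executionArn, deduplicated: false };
}

// Three invocations with the same transaction ID: only the first starts
const a = startDurableExecution("tx-123", { amount: 6500 });
const b = startDurableExecution("tx-123", { amount: 6500 });
console.log(a.deduplicated, b.deduplicated);    // false true
console.log(a.executionArn === b.executionArn); // true
```

&lt;p&gt;Calling the sketch twice with the same name returns the same ARN, mirroring the idempotent response behavior described above.&lt;/p&gt;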
&lt;p&gt;Lambda durable functions work with Lambda event sources, including event source mappings (ESM) such as&amp;nbsp;&lt;a href="https://aws.amazon.com/sqs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Queue Service (Amazon SQS)&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/kinesis/" target="_blank" rel="noopener noreferrer"&gt;Amazon Kinesis&lt;/a&gt;, and DynamoDB Streams. ESMs invoke durable functions synchronously and inherit Lambda’s&amp;nbsp;&lt;a href="https://docs.amazonaws.cn/en_us/lambda/latest/dg/durable-invoking-esm.html" target="_blank" rel="noopener noreferrer"&gt;15-minute invocation limit&lt;/a&gt;. Therefore, like direct Request/Response invocations, durable function executions triggered through event source mappings cannot exceed 15 minutes.&lt;/p&gt; 
&lt;p&gt;For workflows exceeding 15 minutes, use an intermediary Lambda function between the event source mapping and durable function:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Intermediary function for SQS -&amp;gt; Durable function
export const handler = async (event) =&amp;gt; {
  for (const record of event.Records) {
    const transaction = JSON.parse(record.body);
    await lambda.invoke({
      FunctionName: process.env.FRAUD_DETECTION_FUNCTION,
      InvocationType: 'Event',
      DurableExecutionName: `tx-${transaction.id}`,
      Payload: JSON.stringify(transaction)
    });
  }
};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This removes the 15-minute limit, allows executions up to one year, and lets you derive the execution name from each message for idempotency. Use&amp;nbsp;&lt;a href="https://aws.amazon.com/powertools-for-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Powertools for AWS Lambda&lt;/a&gt; to prevent duplicate invocations of the durable function when the event source mapping retries the intermediary function. Additionally, configure failure handling for your event source to capture failed invocations for future redrive or replay, for example dead-letter queues for SQS, or on-failure destinations for other event sources.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Match timeouts to invocation type&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;One important configuration detail ties these patterns together: matching your timeout settings to your invocation type. Lambda synchronous invocations (&lt;code&gt;RequestResponse&lt;/code&gt;) have a hard 15-minute timeout limit. If you configure a durable execution to run for 24 hours but invoke it synchronously, the synchronous invocation fails immediately with an exception. Durable functions support workflows up to one year when invoked asynchronously.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Lambda function configuration
{
  FunctionName: 'fraud-detection',
  Timeout: 300,             // Function timeout: 5 minutes per active phase
  MemorySize: 512,
  DurableConfig: {
    ExecutionTimeout: 90000 // 90,000 seconds = 25 hours total
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;And invoke asynchronously:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Async invocation for long-running workflow
await lambda.invoke({
  FunctionName: 'fraud-detection',
  InvocationType: 'Event',
  DurableExecutionName: `tx-${transactionId}`,
  Payload: JSON.stringify(transaction)
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice the configuration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Timeout: 300&lt;/strong&gt;: Lambda function timeout (5 minutes in this example, up to a maximum of 15 minutes). This defines the maximum duration for each active execution phase, including the initial invocation and any subsequent replays. Set this to cover the longest expected active processing time in your workflow.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ExecutionTimeout: 90000&lt;/strong&gt;: Durable execution timeout in seconds (90,000 seconds, or 25 hours) covering the workflow’s expected total duration, including suspension periods. Set this slightly above the longest wait timeout (here, the 1-day callback windows) to avoid edge cases.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;InvocationType: ‘Event’&lt;/strong&gt;: Asynchronous invocation removes the 15-minute limit and enables executions up to one year.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The Lambda function timeout applies to active execution phases (AI calls, notification sending). During suspension (waiting for callbacks), the function isn’t running, so this timeout doesn’t apply. Setting the durable execution timeout to a meaningful boundary prevents workflows from running longer than expected. Without an explicit timeout, executions can run up to the maximum lifetime of one year.&lt;/p&gt; 
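&lt;p&gt;As a sanity check, the relationship between the longest wait and the execution timeout can be computed directly. The one-hour buffer below is an illustrative assumption, not a service requirement.&lt;/p&gt;

```javascript
// Derive an ExecutionTimeout (in seconds) from the longest configured
// wait plus a safety buffer, per the guidance above. The 1-hour default
// buffer is an illustrative choice, not a service requirement.
const SECONDS_PER_HOUR = 3600;

function executionTimeoutSeconds(longestWaitHours, bufferHours = 1) {
  return (longestWaitHours + bufferHours) * SECONDS_PER_HOUR;
}

// 24-hour callback window + 1-hour buffer = 90000 seconds (25 hours),
// matching the DurableConfig example above.
console.log(executionTimeoutSeconds(24)); // 90000
```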
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Synchronous (RequestResponse)&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Asynchronous (Event)&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Total duration&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Under 15 minutes&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 1 year&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Caller needs result&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Yes&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;No&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Idempotency support&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Yes&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Yes&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Waits with suspension&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Yes&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Yes&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h2&gt;&lt;strong&gt;Execute concurrent operations with context.parallel()&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;In the fraud detection workflow, the system notifies the cardholder through multiple channels such as SMS and email. Implementing parallel branches by hand introduces complexity: managing execution state across branches, handling synchronization, and coordinating branch completion. Durable functions simplify this with&amp;nbsp;&lt;code&gt;context.parallel()&lt;/code&gt;, which executes branches concurrently, maintains durable checkpoints for each branch, and provides configurable options to handle partial completions. By checkpointing and managing state internally, durable functions help make sure that state is preserved even across retries and failures. Note that&amp;nbsp;&lt;code&gt;context.parallel()&lt;/code&gt;&amp;nbsp;manages only the internal execution state of each branch. If your branches interact with shared external state (such as a database), you’re responsible for managing concurrent access to that state.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Human-in-the-loop: verify via email AND SMS (first response wins)
let verified = await context.parallel("human-verification", [
  (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationEmail",
    async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'email', tx)
  ),
  (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationSMS",
    async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'sms', tx)
  )
], {
  maxConcurrency: 2,
  completionConfig: {
    minSuccessful: 1 // Continue after 1 success
  }
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice the configuration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;maxConcurrency: 2&lt;/strong&gt;: Both notifications sent at the same time&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;minSuccessful: 1&lt;/strong&gt;: We only need one channel to succeed, whichever responds first wins&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Each parallel branch waits for its callback independently, and the durable execution checkpoints each branch as part of the execution state. Using the&amp;nbsp;&lt;code&gt;minSuccessful&lt;/code&gt;&amp;nbsp;parameter, you control the minimum number of successful branch executions required for the parallel operation to complete. In this example, only one of the two branches needs to succeed. Verifications through SMS or email are both valid, and the workflow resumes as soon as either channel completes successfully. We call this the&amp;nbsp;&lt;strong&gt;first-response-wins&lt;/strong&gt;&amp;nbsp;pattern. This pattern works well when you only need a single successful result from any parallel branch and want the remaining branches to stop blocking progress.&lt;/p&gt; 
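&lt;p&gt;The parallel result can then drive the authorize-or-escalate decision. The &lt;code&gt;hasFailure&lt;/code&gt; and &lt;code&gt;successCount&lt;/code&gt; fields below are assumed from the complete example later in this post; adjust them to the result type your SDK version actually returns.&lt;/p&gt;

```javascript
// Decide the final action from a parallel result. The { hasFailure,
// successCount } shape is assumed from the complete example later in
// this post, not a documented contract.
function finalAction(parallelResult) {
  return !parallelResult.hasFailure && parallelResult.successCount > 0
    ? "authorize"
    : "escalate";
}

console.log(finalAction({ hasFailure: false, successCount: 1 })); // authorize
console.log(finalAction({ hasFailure: true, successCount: 0 }));  // escalate
```

&lt;p&gt;Keeping this decision in a pure helper makes it easy to unit test outside the durable execution context.&lt;/p&gt;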
&lt;p&gt;But what happens if neither channel responds? Without timeouts, this workflow could remain suspended for up to the configured execution lifetime.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Always configure callback timeouts&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Let’s add timeout protection to the parallel verification from the previous section.&amp;nbsp;&lt;code&gt;context.waitForCallback()&lt;/code&gt;&amp;nbsp;accepts a&amp;nbsp;&lt;code&gt;timeout&lt;/code&gt;&amp;nbsp;option that bounds how long each branch waits before throwing an exception. By wrapping the parallel call in a try/catch, you can implement fallback logic when users don’t respond in time.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Enhanced: parallel verification with timeout and error handling
let verified;
try {
  verified = await context.parallel("human-verification", [
    (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationEmail",
      async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'email', tx),
      { timeout: { days: 1 } }  // Wait up to 1 day for email response
    ),
    (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationSMS",
      async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'sms', tx),
      { timeout: { days: 1 } }  // Wait up to 1 day for SMS response
    )
  ], {
    maxConcurrency: 2,
    completionConfig: {
      minSuccessful: 1
    }
  });
} catch (error) {
  const isTimeout = error.message?.includes("timeout");
  if (isTimeout) {
    context.logger.warn("Customer verification timeout", { error, txId: tx.id });
    // Fallback: escalate to fraud department
    return await context.step("sendToFraudDepartment", async () =&amp;gt;
      sendToFraudDepartment(tx, true)
    );
  }
  throw error; // Re-throw non-timeout errors
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice what changed from the previous section:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;timeout: { days: 1 }&lt;/strong&gt;: Each callback branch now has a maximum wait time of 1 day. If neither the email nor SMS callback arrives within that window, a timeout exception is thrown.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;try/catch with timeout detection&lt;/strong&gt;: The catch block distinguishes between timeout errors and other exceptions. When a timeout occurs, the workflow implements fallback logic by escalating the transaction to the fraud department, while non-timeout errors are re-thrown to be handled by the durable execution retry mechanism.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Without this error handling, an unhandled timeout fails the entire execution. The timeout also works with the&amp;nbsp;&lt;code&gt;minSuccessful&lt;/code&gt;&amp;nbsp;configuration: if one branch times out but the other succeeds, the parallel operation still completes successfully since only one successful result is required.&lt;/p&gt; 
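&lt;p&gt;Matching on the error message, as the catch block above does, is fragile. Isolating the check in a small helper keeps the detection logic in one place; if your SDK version exposes a typed timeout error, prefer checking that instead (the string match below is an assumption carried over from the example).&lt;/p&gt;

```javascript
// Centralized timeout detection. String matching on error.message is an
// assumption carried over from the example above; prefer a typed error
// check if your SDK version provides one.
function isTimeoutError(error) {
  return typeof error?.message === "string" &&
    error.message.toLowerCase().includes("timeout");
}

console.log(isTimeoutError(new Error("Callback timeout exceeded"))); // true
console.log(isTimeoutError(new Error("AccessDenied")));              // false
console.log(isTimeoutError(null));                                   // false
```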
&lt;p&gt;For advanced use cases where the callback handler performs long-running work, you can also configure a&amp;nbsp;&lt;code&gt;heartbeatTimeout&lt;/code&gt;&amp;nbsp;to detect stalled callbacks before the main timeout expires. See the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;Lambda Developer Guide&lt;/a&gt;&amp;nbsp;for details.&lt;/p&gt; 
&lt;p&gt;Use callback timeouts for human approvals, external API callbacks, asynchronous processing, and third-party integrations.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Putting it all together: complete fraud detection implementation&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Now let’s see how all the best practices work together in the complete fraud detection workflow:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;import { withDurableExecution } from "@aws/durable-execution-sdk-js";
import { BedrockAgentCoreClient, InvokeAgentRuntimeCommand } from "@aws-sdk/client-bedrock-agentcore";

const agentRuntimeArn = process.env.AGENT_RUNTIME_ARN;
const agentRegion = process.env.AGENT_REGION || 'us-east-1';
const client = new BedrockAgentCoreClient({ region: agentRegion });

export const handler = withDurableExecution(async (event, context) =&amp;gt; {
  const tx = {
    id: event.id,
    amount: event.amount,
    location: event.location,
    vendor: event.vendor
  };

  // AI fraud assessment with error handling
  tx.score = await context.step("fraudCheck", async () =&amp;gt; {
    try {
      const payloadJson = JSON.stringify({ input: { amount: tx.amount } });
      const command = new InvokeAgentRuntimeCommand({
        agentRuntimeArn: agentRuntimeArn,
        qualifier: 'DEFAULT',
        payload: Buffer.from(payloadJson, 'utf-8'),
        contentType: 'application/json',
        accept: 'application/json'
      });
      const response = await client.send(command);
      const responseText = await response.response.transformToString();
      const result = JSON.parse(responseText);
      return result?.output?.risk_score ?? 5;  // Default to high-risk if score unavailable
    } catch (error) {
      context.logger.error("Fraud check failed", { error, txId: tx.id });
      return 5;
    }
  });

  // Route based on AI decision
  if (tx.score &amp;lt; 3) {
    // Best Practice: Idempotent authorization
    return await context.step(`authorize-${tx.id}`, async () =&amp;gt;
      authorizeTransaction(tx, { idempotency_key: `tx-${tx.id}` })
    );
  }

  if (tx.score &amp;gt;= 5) {
    return await context.step(`sendToFraudDepartment-${tx.id}`, async () =&amp;gt;
      sendToFraudDepartment(tx)
    );
  }

  // Medium risk: need human verification
  await context.step(`suspend-${tx.id}`, async () =&amp;gt; suspendTransaction(tx));

  // Best Practice: Concurrent operations with timeout configuration
  let verified;
  try {
    verified = await context.parallel("human-verification", [
      (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationEmail",
        async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'email', tx),
        { timeout: { days: 1 } }
      ),
      (ctx) =&amp;gt; ctx.waitForCallback("SendVerificationSMS",
        async (callbackId) =&amp;gt; sendCustomerNotification(callbackId, 'sms', tx),
        { timeout: { days: 1 } }
      )
    ], {
      maxConcurrency: 2,
      completionConfig: {
        minSuccessful: 1
      }
    });
  } catch (error) {
    const isTimeout = error.message?.includes("timeout");
    context.logger.warn(
      isTimeout ? "Customer verification timeout" : "Customer verification failed",
      { error, txId: tx.id }
    );
    return await context.step(`timeout-escalate-${tx.id}`, async () =&amp;gt;
      sendToFraudDepartment(tx, true)
    );
  }

  // Idempotent final step with idempotency key
  return await context.step(`finalize-${tx.id}`, async () =&amp;gt; {
    const action = !verified.hasFailure &amp;amp;&amp;amp; verified.successCount &amp;gt; 0
      ? "authorize"
      : "escalate";
    if (action === "authorize") {
      return authorizeTransaction(tx, true, { idempotency_key: `finalize-${tx.id}` });
    }
    return sendToFraudDepartment(tx, true);
  });
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice how the best practices work together:&amp;nbsp;&lt;code&gt;context.parallel()&lt;/code&gt;&amp;nbsp;sends SMS and email concurrently, resuming when either channel responds. Both callbacks configure 1-day timeouts with try/catch handling that escalates on timeout. The&amp;nbsp;&lt;code&gt;DurableExecutionName: tx-${transactionId}&lt;/code&gt;&amp;nbsp;parameter (specified at invocation time, shown in the following CLI example) provides execution-level deduplication, while idempotency keys in the authorization steps prevent duplicate charges at the application layer. Asynchronous invocation (&lt;code&gt;InvocationType: 'Event'&lt;/code&gt;) enables the 24-hour wait period.&lt;/p&gt; 
&lt;p&gt;Once deployed, invoke the function asynchronously with a sample transaction to see it in action:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;transactionId="123456789"
aws lambda invoke \
  --function-name 'fraud-detection:$LATEST' \
  --invocation-type Event \
  --durable-execution-name "tx-${transactionId}" \
  --cli-binary-format raw-in-base64-out \
  --payload "{\"id\": \"${transactionId}\", \"amount\": 6500, \"location\": \"New York, NY\", \"vendor\": \"Amazon.com\"}" \
  --region us-east-2 \
  response.json&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Upon successful invocation, you can view the execution state in the Lambda console’s durable operations view. The execution shows a suspended state, waiting for customer response:&lt;/p&gt; 
&lt;div id="attachment_25859" style="width: 911px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25859" loading="lazy" class="size-full wp-image-25859" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/17/compute-2476-image-2.png" alt="Figure 2: Suspended execution state" width="901" height="495"&gt;
 &lt;p id="caption-attachment-25859" class="wp-caption-text"&gt;Figure 2: Suspended execution state&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Notice the &lt;code&gt;fraudCheck&lt;/code&gt; and &lt;code&gt;suspendTransaction&lt;/code&gt; steps show as succeeded with checkpointed results. The human-verification parallel operation shows that both SMS and email branches started. The timeline shows the function in a suspended state. Simulate a customer response by sending a callback success through the console, AWS Command Line Interface (AWS CLI) or Lambda API:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;div class="hide-language"&gt; 
  &lt;pre&gt;&lt;code class="lang-code"&gt;aws lambda send-durable-execution-callback-success \
	--callback-id &amp;lt;CALLBACK_ID_FROM_EMAIL_OR_SMS&amp;gt; \
	--result '{"status":"approved","channel":"email"}' \
	--cli-binary-format raw-in-base64-out&lt;/code&gt;&lt;/pre&gt; 
 &lt;/div&gt; 
&lt;/div&gt; 
&lt;div id="attachment_25860" style="width: 911px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25860" loading="lazy" class="size-full wp-image-25860" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/17/compute-2476-image-3.png" alt="Figure 3: Completed execution with customer approval" width="901" height="597"&gt;
 &lt;p id="caption-attachment-25860" class="wp-caption-text"&gt;Figure 3: Completed execution with customer approval&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;After receiving the customer’s approval, the durable execution resumes from its checkpoint, authorizes the transaction, and completes. The execution spanned hours but consumed only seconds of compute time.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;With durable functions, Lambda extends beyond single-event processing to power core business processes and long-running workflows, while retaining the operational simplicity, reliability, and scale that define Lambda. You can build applications that run for days or months, survive failures, and resume where they left off, all within the familiar event-driven programming model.&lt;/p&gt; 
&lt;p&gt;Deploy the fraud detection workflow from our&amp;nbsp;&lt;a href="https://github.com/aws-samples/sample-lambda-durable-functions/tree/main" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&amp;nbsp;and experiment with human-in-the-loop patterns in your own account. For core concepts, see&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/compute/building-fault-tolerant-long-running-application-with-aws-lambda-durable-functions/" target="_blank" rel="noopener noreferrer"&gt;Introduction to AWS Lambda Durable Functions&lt;/a&gt;. For comprehensive documentation, see the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;Lambda Developer Guide&lt;/a&gt;. Browse&amp;nbsp;&lt;a href="https://serverlessland.com/search?search=Durable+function" target="_blank" rel="noopener noreferrer"&gt;Serverless Land&lt;/a&gt;&amp;nbsp;for reference architectures and discover where durable execution fits in your designs.&lt;/p&gt; 
&lt;p&gt;Share your feedback, questions, and use cases in the SDK repositories or on&amp;nbsp;&lt;a href="https://repost.aws/" target="_blank" rel="noopener noreferrer"&gt;re:Post&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Testing Step Functions workflows: a guide to the enhanced TestState API</title>
		<link>https://aws.amazon.com/blogs/compute/testing-step-functions-workflows-a-guide-to-the-enhanced-teststate-api/</link>
					
		
		<dc:creator><![CDATA[D Surya Sai]]></dc:creator>
		<pubDate>Sun, 22 Mar 2026 17:06:38 +0000</pubDate>
				<category><![CDATA[AWS Step Functions]]></category>
		<category><![CDATA[Compute]]></category>
		<guid isPermaLink="false">2757f33197f633fca8298a2313f813daf0bb5967</guid>

					<description>AWS Step Functions recently announced new enhancements to local testing capabilities for Step Functions, introducing API-based testing that developers can use to validate workflows before deploying to AWS. As detailed in our Announcement blog post, the TestState API transforms Step Functions development by enabling individual state testing in isolation or as complete workflows. This supports […]</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://aws.amazon.com/step-functions/" target="_blank" rel="noopener noreferrer"&gt;AWS Step Functions&lt;/a&gt; recently announced new enhancements to local testing capabilities for Step Functions, introducing API-based testing that developers can use to validate workflows before deploying to AWS. As detailed in our Announcement &lt;a href="https://aws.amazon.com/blogs/aws/accelerate-workflow-development-with-enhanced-local-testing-in-aws-step-functions/" target="_blank" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;, the TestState API transforms Step Functions development by enabling individual state testing in isolation or as complete workflows. This supports mocked responses and actual AWS service integrations, and provides advanced capabilities. These capabilities include Map/Parallel states, error simulation with retry mechanisms, context object validation, and detailed inspection metadata for comprehensive local testing of your serverless application.&lt;/p&gt; 
&lt;p&gt;The TestState API can be accessed through multiple interfaces, such as the &lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface&lt;/a&gt; (AWS CLI), the &lt;a href="https://aws.amazon.com/what-is/sdk/" target="_blank" rel="noopener noreferrer"&gt;AWS SDK&lt;/a&gt;, and &lt;a href="https://www.localstack.cloud/" target="_blank" rel="noopener noreferrer"&gt;LocalStack&lt;/a&gt;. By default, the TestState API in the AWS CLI and SDK runs against the remote &lt;a href="https://docs.aws.amazon.com/general/latest/gr/step-functions.html#step-functions_region" target="_blank" rel="noopener noreferrer"&gt;AWS endpoint&lt;/a&gt;, providing validation against the actual Step Functions service infrastructure. We’ve partnered with LocalStack to offer an additional testing endpoint for the TestState API. Developers can use LocalStack for unit testing their workflows by changing the &lt;a href="https://aws.amazon.com/what-is/sdk/" target="_blank" rel="noopener noreferrer"&gt;AWS SDK&lt;/a&gt; client endpoint configuration to point to LocalStack at &lt;code&gt;&lt;em&gt;http://localhost.localstack.cloud:4566/&lt;/em&gt;&lt;/code&gt; instead of the &lt;a href="https://docs.aws.amazon.com/general/latest/gr/step-functions.html#step-functions_region" target="_blank" rel="noopener noreferrer"&gt;AWS endpoint&lt;/a&gt;. This approach provides complete network isolation when needed. For a streamlined development experience, you can also use the &lt;a href="https://docs.localstack.cloud/aws/tooling/vscode-extension/" target="_blank" rel="noopener noreferrer"&gt;LocalStack VSCode extension&lt;/a&gt; to automatically configure your environment to point to the LocalStack endpoint. This approach is detailed in this AWS &lt;a href="https://aws.amazon.com/blogs/compute/enhance-the-local-testing-experience-for-serverless-applications-with-localstack/" target="_blank" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;This blog post demonstrates building test suites to unit test your Step Functions workflows using the AWS SDK for Python using the &lt;a href="https://docs.pytest.org/en/stable/" target="_blank" rel="noopener noreferrer"&gt;pytest framework&lt;/a&gt;. The complete implementation is available in the &lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Building test cases using the TestState API&lt;/h2&gt; 
&lt;p&gt;This example workflow implements a real-world ecommerce order processing system using &lt;a href="https://jsonata.org/" target="_blank" rel="noopener noreferrer"&gt;JSONata&lt;/a&gt; for advanced data transformations. It incorporates complex Step Functions patterns including distributed Map states, Parallel execution, and waitForTaskToken callback mechanisms. The process validates orders through &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; functions, distributes order item processing with configurable failure tolerance, runs parallel payment and inventory updates, handles human approval workflows using task tokens, and then persists orders in Amazon DynamoDB with notification delivery. This workflow demonstrates advanced error handling with multiple Catchers and Retriers, exponential backoff for Lambda throttling and DynamoDB limits, and sophisticated state transitions that were previously challenging to test locally, making it a good candidate for demonstrating the enhanced TestState API’s local testing features.&lt;/p&gt; 
&lt;p&gt;The complete workflow is available in the &lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;, where you can examine the full state machine definition and see how JSONata expressions handle data transformation throughout the execution flow.&lt;/p&gt; 
&lt;div id="attachment_25870" style="width: 872px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-25870" loading="lazy" class="size-full wp-image-25870" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/03/18/compute-2435-img.png" alt="Figure 1: State machine workflow that demonstrates a real-world ecommerce order processing system." width="862" height="1292"&gt;
 &lt;p id="caption-attachment-25870" class="wp-caption-text"&gt;Figure 1: State machine workflow that demonstrates a real-world ecommerce order processing system.&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Effective Step Functions testing requires a systematic approach to TestState API integration that provides state validation, error simulation, and assertion capabilities. The testing framework is built using Python’s pytest framework, using &lt;a href="https://docs.pytest.org/en/stable/explanation/fixtures.html" target="_blank" rel="noopener noreferrer"&gt;fixtures&lt;/a&gt; to automatically provide pre-configured runner instances that handle TestState API client initialization and state machine definition loading. This eliminates repetitive setup code and provides consistent test environments. The enhanced TestState API supports both mock integrations and actual integrations with AWS services, providing flexibility in testing strategies. For this demonstration, you use mock integrations to show how complete local testing can be achieved without deploying any resources to an AWS account.&lt;/p&gt; 
&lt;p&gt;This framework is built for demonstration purposes, and you can similarly build your own testing frameworks using other programming languages such as &lt;a href="https://www.java.com/en/" target="_blank" rel="noopener noreferrer"&gt;Java&lt;/a&gt; or &lt;a href="https://nodejs.org/en" target="_blank" rel="noopener noreferrer"&gt;Node.js&lt;/a&gt;. The testing framework uses method chaining patterns to create readable test cases with comprehensive assertion methods, automatic output chaining between state executions, and error simulation for testing retry mechanisms, backoff intervals, and catch blocks across AWS service error conditions.&lt;/p&gt; 
&lt;p&gt;The following test implementations demonstrate the testing capabilities that are achievable with the enhanced TestState API in local development environments. The test cases run against the preceding state machine.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Test Case 1: Lambda throttling and retry mechanism testing&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Service integrations used by state machines, such as AWS Lambda and Amazon &lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;DynamoDB&lt;/a&gt;, may face throttling depending on their usage. A key capability of the enhanced TestState API is its ability to simulate retry mechanisms with control over retry counts and backoff intervals. This test demonstrates the enhanced TestState API’s retry testing capabilities through the &lt;code&gt;stateConfiguration.retrierRetryCount&lt;/code&gt;&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_TestState.html#StepFunctions-TestState-request-stateConfiguration" target="_blank" rel="noopener noreferrer"&gt;parameter&lt;/a&gt; and &lt;code&gt;inspectionData.errorDetails&lt;/code&gt; &lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_InspectionErrorDetails.html" target="_blank" rel="noopener noreferrer"&gt;response field&lt;/a&gt;. The &lt;code&gt;errorDetails&lt;/code&gt; field provides &lt;code&gt;retryBackoffIntervalSeconds&lt;/code&gt; for validating exponential backoff calculations, &lt;code&gt;retryIndex&lt;/code&gt; for tracking retry attempt sequences, and &lt;code&gt;catchIndex&lt;/code&gt; for identifying which error handler processed the exception. These enhanced inspection capabilities enable validation of retry logic, &lt;a href="https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/" target="_blank" rel="noopener noreferrer"&gt;backoff strategies&lt;/a&gt;, and error propagation patterns across complex state machine workflows.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def test_lambda_throttling_retry_mechanism(self, runner):
"""Test retry mechanism for Lambda.TooManyRequestsException"""
throttling_error = {
"Error": "Lambda.TooManyRequestsException",
"Cause": "Request rate exceeded"
}

# Test first retry attempt
(runner
.with_input({"orderId": "order-retry-test"})
.with_mock_error(throttling_error)
.with_retrier_retry_count(0)
.execute("ValidateOrder")
.assert_retriable()
.assert_error("Lambda.TooManyRequestsException"))

# Verify exponential backoff calculation
response = runner.get_response()
error_details = response['inspectionData']['errorDetails']
assert error_details['retryBackoffIntervalSeconds'] == 2

# Test retry exhaustion
(runner
.with_retrier_retry_count(3)
.execute("ValidateOrder")
.assert_caught_error()
.assert_next_state("ValidationFailed"))&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;&lt;strong&gt;Test Case 2: Map state testing with tolerance thresholds&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/state-map.html" target="_blank" rel="noopener noreferrer"&gt;Distributed Map states&lt;/a&gt; present unique testing challenges due to their parallel processing nature and failure tolerance capabilities. The enhanced TestState API provides specialized configuration options for testing these complex scenarios.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def test_map_state_tolerated_failure_threshold(self, runner):
"""Test Map state with tolerated failure threshold"""
test_input = {
"orderId": "order-map-test",
"orderItems": [
{"itemId": "item-1"}, {"itemId": "item-2"}, 
{"itemId": "item-3"}, {"itemId": "item-4"}
]
}

# Test normal Map state execution
map_success_result = [
{"itemId": "item-1", "processed": True},
{"itemId": "item-2", "processed": True}
]

(runner
.with_input(test_input)
.with_mock_result(map_success_result)
.execute("ProcessOrderItems")
.assert_succeeded()
.assert_next_state("ParallelProcessing"))

# Test tolerance threshold exceeded scenario
tolerance_error = {
"Error": "States.ExceedToleratedFailureThreshold",
"Cause": "Map state exceeded tolerated failure threshold"
}

(runner
.with_input(test_input)
.with_mock_error(tolerance_error)
.execute("ProcessOrderItems")
.assert_caught_error()
.assert_next_state("ValidationFailed"))&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This test demonstrates the enhanced TestState API’s Map state testing capabilities through the &lt;code&gt;stateConfiguration.mapIterationFailureCount&lt;/code&gt; &lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_TestState.html#StepFunctions-TestState-request-stateConfiguration" target="_blank" rel="noopener noreferrer"&gt;parameter&lt;/a&gt; for simulating iteration failures. The API provides comprehensive &lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_TestState.html#API_TestState_ResponseSyntax" target="_blank" rel="noopener noreferrer"&gt;inspection data&lt;/a&gt;, including &lt;code&gt;inspectionData.afterItemSelector&lt;/code&gt; for validating &lt;code&gt;ItemSelector&lt;/code&gt; transformations, &lt;code&gt;inspectionData.afterItemBatcher&lt;/code&gt; for batch processing validation, and &lt;code&gt;inspectionData.toleratedFailureCount&lt;/code&gt; and &lt;code&gt;inspectionData.toleratedFailurePercentage&lt;/code&gt; for threshold verification. When the specified failure count exceeds the configured tolerance, the API correctly returns &lt;code&gt;States.ExceedToleratedFailureThreshold&lt;/code&gt;, enabling testing of Map state resilience patterns.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Test Case 3: WaitForCallback pattern testing&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;The &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/connect-to-resource.html#connect-wait-token" target="_blank" rel="noopener noreferrer"&gt;waitForCallback&lt;/a&gt; integration requires context object construction to simulate realistic execution environments, particularly for human approval workflows.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def test_context_object_usage_in_jsonata_expressions(self, runner):
"""Test Context object usage in waitForTaskToken scenarios"""
test_input = {
"orderId": "order-context-test",
"amount": 125.0
}

context_data = {
"Task": {"Token": "ahbdgftgehbdcndsjnwjkhas327yr4hendc73yehdb723y"},
"Execution": {
"Id": "arn:aws:states:us-east-1:123456789012:execution:test:exec-123"
},
"State": {
"Name": "WaitForApproval",
"EnteredTime": "2025-01-15T10:45:00Z"
}
}

mock_result = {
"approved": True,
"taskToken": "ahbdgftgehbdcndsjnwjkhas327yr4hendc73yehdb723y"
}

(runner
.with_input(test_input)
.with_context(context_data)
.with_mock_result(mock_result)
.execute("WaitForApproval")
.assert_succeeded()
.assert_next_state("CheckApproval"))

# Verify JSONata expressions processed context correctly
response = runner.get_response()
after_args = json.loads(response['inspectionData']['afterArguments'])
assert after_args['Payload']['taskToken'] == context_data['Task']['Token']&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This test demonstrates the enhanced TestState API’s support for &lt;code&gt;waitForCallback&lt;/code&gt; integrations through the &lt;code&gt;context&lt;/code&gt; parameter for realistic Context object simulation. The API enables comprehensive testing of JSONata expressions that reference &lt;code&gt;$states.context.Task.Token&lt;/code&gt;, &lt;code&gt;$states.context.Execution.Id&lt;/code&gt;, and other context fields. The &lt;code&gt;inspectionData.afterArguments&lt;/code&gt; &lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_TestState.html#API_TestState_ResponseSyntax" target="_blank" rel="noopener noreferrer"&gt;response field&lt;/a&gt; validates that JSONata expressions correctly processed the context data, while the API automatically handles the complexity of task token embedding in service integration payloads for &lt;code&gt;waitForCallback&lt;/code&gt; testing scenarios.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Test Case 4: Happy path testing – complete workflow validation&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Happy path testing validates that workflows execute correctly under normal operating conditions. The enhanced TestState API allows you to chain state executions together, automatically passing outputs between states to simulate a complete workflow execution.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def test_complete_order_processing_workflow(self, runner):
"""Integration test: Complete happy path workflow using method chaining"""
test_input = {
"orderId": "order-12345",
"amount": 150.75,
"customerEmail": "customer@example.com",
"orderItems": [
{"itemId": "item-1", "quantity": 2, "price": 50.25}
]
}

# Test ValidateOrder state
(runner
.with_input(test_input)
.with_mock_result({"statusCode": 200, "isValid": True})
.execute("ValidateOrder")
.assert_succeeded()
.assert_next_state("CheckValidation"))

# Test CheckValidation choice state (no mock needed)
validation_output = runner.get_output()
(runner
.with_input(validation_output)
.clear_mocks()
.execute("CheckValidation")
.assert_succeeded()
.assert_next_state("ProcessOrderItems"))&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This test demonstrates how the TestState API maintains state context between executions, enabling realistic workflow simulation. The &lt;code&gt;get_output()&lt;/code&gt; method retrieves the processed output from one state to use as input for the next, mimicking actual Step Functions execution behavior.&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note&lt;/strong&gt;: The code snippet above shows only the first two states of the complete workflow test for brevity. The full test code with all states (&lt;code&gt;ProcessOrderItems&lt;/code&gt;, &lt;code&gt;ParallelProcessing&lt;/code&gt;, &lt;code&gt;WaitForApproval&lt;/code&gt;, &lt;code&gt;CheckApproval&lt;/code&gt;, &lt;code&gt;SaveOrderDetails&lt;/code&gt;, and &lt;code&gt;SendNotification&lt;/code&gt;) can be viewed in the complete &lt;/em&gt;&lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;&lt;em&gt;GitHub repository&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, demonstrating end-to-end workflow validation using the same method chaining pattern.&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Integration with modern CI/CD pipelines&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;In this section, we explore how to integrate the preceding unit tests into a CI/CD pipeline so that workflow logic is validated locally before deployment.&lt;/p&gt; 
&lt;p&gt;The sample &lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;repository&lt;/a&gt; includes a GitHub Actions workflow that demonstrates how TestState API testing integrates into continuous integration and continuous delivery (CI/CD) pipelines. The workflow (&lt;code&gt;.github/workflows/test-and-deploy.yml&lt;/code&gt;) provides a two-step process that validates workflow logic before any AWS resources are deployed with the &lt;a href="https://aws.amazon.com/serverless/sam/" target="_blank" rel="noopener noreferrer"&gt;AWS Serverless Application Model&lt;/a&gt; (AWS SAM).&lt;/p&gt; 
&lt;p&gt;The CI/CD pipeline follows this pattern:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Unit Tests&lt;/strong&gt;: Executes the complete TestState API test suite using &lt;code&gt;pytest tests/unit_test.py -v&lt;/code&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;SAM Deploy&lt;/strong&gt;: Deploys AWS resources using &lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-cli-command-reference-sam-build.html" target="_blank" rel="noopener noreferrer"&gt;sam build&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-cli-command-reference-sam-deploy.html" target="_blank" rel="noopener noreferrer"&gt;sam deploy&lt;/a&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;To enable the GitHub Actions workflow to deploy resources to your AWS account, configure AWS credentials in your GitHub repository settings. For detailed setup instructions, see the AWS &lt;a href="https://aws.amazon.com/blogs/compute/using-github-actions-to-deploy-serverless-applications/" target="_blank" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;The following secrets must be configured in your GitHub repository settings:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;AWS_REGION&lt;/code&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;In production environments, you can typically extend this basic pipeline with additional stages. The enhanced pipeline often begins with deploying to a development account first, followed by integration testing against deployed resources. The final stage involves moving to production with proper approval gates, security scanning, and compliance checks.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;The enhanced TestState API enables testing Step Functions workflows locally without requiring AWS deployments, accelerating development cycles and reducing testing time. This post demonstrates how to implement testing for state types including Map states with tolerance thresholds, retry mechanisms with exponential backoff, and &lt;code&gt;waitForTaskToken&lt;/code&gt; patterns with context object simulation, using mock integrations for isolated testing.&lt;/p&gt; 
&lt;p&gt;By integrating TestState API testing into CI/CD pipelines, you can validate workflow logic before deployment, reducing the risk of production issues. The GitHub Actions workflow example demonstrates an implementation that runs tests and deploys resources in a controlled sequence. The complete code examples and testing framework are available in the &lt;a href="https://github.com/aws-samples/sample-stepfunctions-testing-with-testStateAPI/" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; to implement similar testing practices for Step Functions workflows.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Enabling high availability of Amazon EC2 instances on AWS Outposts servers (Part 3)</title>
		<link>https://aws.amazon.com/blogs/compute/enabling-high-availability-of-amazon-ec2-instances-on-aws-outposts-servers-part-3/</link>
					
		
		<dc:creator><![CDATA[Brianna Rosentrater]]></dc:creator>
		<pubDate>Fri, 06 Mar 2026 23:11:22 +0000</pubDate>
				<category><![CDATA[Amazon CloudWatch]]></category>
		<category><![CDATA[Amazon Simple Notification Service (SNS)]]></category>
		<category><![CDATA[AWS CloudFormation]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[AWS Outposts]]></category>
		<category><![CDATA[AWS Outposts servers]]></category>
		<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">704bf252a8b038a74199bfc881ff1b43524c00b1</guid>

					<description>This post is part 3 of the three-part series ‘Enabling high availability of Amazon EC2 instances on&amp;nbsp;AWS Outposts&amp;nbsp;servers’. We provide you with code samples and considerations for implementing custom logic to automate&amp;nbsp;Amazon Elastic Compute Cloud (EC2) relaunch on Outposts servers. This post focuses on guidance for using Outposts servers with third party storage for boot […]</description>
										<content:encoded>&lt;p&gt;This post is part 3 of the three-part series ‘Enabling high availability of Amazon EC2 instances on&amp;nbsp;&lt;a href="https://aws.amazon.com/outposts/servers/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts&lt;/a&gt;&amp;nbsp;servers’. We provide you with code samples and considerations for implementing custom logic to automate&amp;nbsp;&lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (EC2&lt;/a&gt;) relaunch on Outposts servers. This post focuses on guidance for using Outposts servers with third party storage for boot and data volumes, whereas &lt;a href="https://aws.amazon.com/blogs/compute/enabling-high-availability-of-amazon-ec2-instances-on-aws-outposts-servers-part-1/" target="_blank" rel="noopener noreferrer"&gt;part 1&lt;/a&gt; and&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/compute/enabling-high-availability-of-amazon-ec2-instances-on-aws-outposts-servers-part-2/" target="_blank" rel="noopener noreferrer"&gt;part 2&lt;/a&gt; focus on automating EC2 relaunch between standalone servers. Outposts servers support integration with&amp;nbsp;&lt;a href="https://www.dell.com/en-us/shop/storage-servers-and-networking-for-business/sf/power-store"&gt;Dell PowerStore&lt;/a&gt;,&amp;nbsp;&lt;a href="https://www.hpe.com/us/en/storage/alletra.html"&gt;HPE Alletra Storage MP B10000&amp;nbsp;systems&lt;/a&gt;, &lt;a href="https://www.netapp.com/data-management/ontap-data-management-software/"&gt;NetApp on-premises enterprise storage arrays&lt;/a&gt;, and &lt;a href="https://www.purestorage.com/products/nvme/flasharray-x.html"&gt;Pure Storage FlashArray&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Outposts servers provide compute and networking services that are designed for low-latency, local data processing needs for on-premises locations such as retail stores, branch offices, healthcare provider locations, or environments that are space-constrained. Outposts servers use &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html"&gt;EC2 instance store storage&lt;/a&gt; to provide non-durable block-level storage to the instances running stateless workloads. For applications that require persistent storage, you can create a three-tier architecture by connecting your Outposts servers to a third-party storage appliance. In this post, you will learn how to implement custom logic to provide high availability (HA) for your applications running on Outposts servers using two or more servers for N+1 fault tolerance. The code provided is meant to help you get started, and can be modified further for your unique workload needs.&lt;/p&gt; 
&lt;h2&gt;Overview&lt;/h2&gt; 
&lt;p&gt;In the following sections we show how custom logic can automate EC2 instance relaunch between two or more Outposts servers using boot and data volumes on third-party storage. If your EC2 instance fails while using this solution, an &lt;a href="https://aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; alarm monitoring the EC2 StatusCheckFailed_Instance metric of your source EC2 instance is triggered, and you receive an &lt;a href="https://aws.amazon.com/sns/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Notification Service&lt;/a&gt; (Amazon SNS) notification. An &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; function then relaunches your EC2 instance onto the destination Outposts server that you’ve set up for resiliency. This is done using a launch template created during setup, and the script connects your relaunched instance to the existing boot and data volumes on your third-party storage appliance. The storage device provides shared storage for your Outposts servers: if a single server fails, new instances can connect to existing volumes on the array. This allows for a zero data loss &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-of-on-premises-applications-to-aws/recovery-objectives.html" target="_blank" rel="noopener noreferrer"&gt;Recovery Point Objective (RPO)&lt;/a&gt; and a &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-of-on-premises-applications-to-aws/recovery-objectives.html" target="_blank" rel="noopener noreferrer"&gt;Recovery Time Objective (RTO)&lt;/a&gt; equaling the time it takes to launch your EC2 instance. Take advantage of the features on your storage appliance for configuring data durability and resiliency to hardware failures, and make sure that you are regularly backing up your SAN volumes.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-1-2.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25778 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-1-2.png" alt="" width="1124" height="604"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;span style="font-size: 16px"&gt;Figure 1 – Solution Architecture for automated EC2 Relaunch&lt;/span&gt;&lt;/p&gt; 
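&lt;p&gt;&lt;em&gt;To make the monitoring piece concrete, the following sketch builds the parameter set for such a CloudWatch alarm. The alarm name, period, and evaluation values are illustrative assumptions, not the exact values used by the solution; the dictionary keys match the keyword arguments of the boto3 &lt;code&gt;put_metric_alarm&lt;/code&gt; call, which is left commented out so the sketch runs without AWS access.&lt;/em&gt;&lt;/p&gt;

```python
def build_status_check_alarm(instance_id, sns_topic_arn):
    """Alarm definition for the EC2 StatusCheckFailed_Instance metric.

    Period and evaluation values are illustrative assumptions; tune them
    for your tolerance of false positives versus detection speed.
    """
    return {
        "AlarmName": f"status-check-failed-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_Instance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                 # evaluate the metric every minute
        "EvaluationPeriods": 2,       # two consecutive failed checks
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # notify SNS, which invokes the Lambda
    }

# With AWS access, the alarm would be created with:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(
#     **build_status_check_alarm("i-0abc123", "arn:aws:sns:...:topic"))
```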
&lt;h3&gt;Prerequisites&lt;/h3&gt; 
&lt;p&gt;The following prerequisites are required to complete the walkthrough:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Two Outposts servers that can be set up as an&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-i-strategies-for-recovery-in-the-cloud/" target="_blank" rel="noopener noreferrer"&gt;active-active or active-passive&lt;/a&gt; resilient pair.&lt;/li&gt; 
 &lt;li&gt;For workloads with a low threshold for downtime, ensure that the secondary Outposts server used for recovery has its own service link connection.&lt;/li&gt; 
 &lt;li&gt;Outposts servers must be colocated within the same Layer 2 (L2) network.&lt;/li&gt; 
 &lt;li&gt;Network latency between the Outposts servers must not exceed 5ms round trip time (RTT).&lt;/li&gt; 
 &lt;li&gt;A storage appliance that supports the iSCSI protocol. Credentials to manage the storage appliance initiator/target mappings. &lt;a href="https://aws.amazon.com/blogs/compute/new-simplifying-the-use-of-third-party-block-storage-with-aws-outposts/" target="_blank" rel="noopener noreferrer"&gt;See Simplifying the use of third-party block storage with AWS Outposts&lt;/a&gt; for more information.&lt;/li&gt; 
 &lt;li&gt;If you’re setting this up from an&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/sharing-outposts.html" target="_blank" rel="noopener noreferrer"&gt;Outposts consumer account&lt;/a&gt;, you must configure &lt;a href="https://aws.amazon.com/blogs/mt/monitoring-best-practices-for-aws-outposts/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch cross-account observability&lt;/a&gt;&amp;nbsp;between the consumer account and the Outposts owning account to view Outposts metrics in your consumer account.&lt;/li&gt; 
 &lt;li&gt;Create &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-launch-templates.html"&gt;launch templates&lt;/a&gt; for the EC2 instances that you want to protect; the launch wizard helps you create these.&lt;/li&gt; 
 &lt;li&gt;Credentials with permissions for &lt;a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt;, &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2&lt;/a&gt;, and (optional) &lt;a href="https://aws.amazon.com/secrets-manager/" target="_blank" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt; if authentication is required. &lt;code&gt;IAM Permission Examples.md&lt;/code&gt; is provided in the repository.&lt;/li&gt; 
 &lt;li&gt;A Windows or Linux host that can access the storage appliance and your AWS account (management computer).&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://us-east-1.console.aws.amazon.com/marketplace/search/listing/prodview-ytzcqvandumqm" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts iPXE Amazon Machine Image&lt;/a&gt; (AMI) from the &lt;a href="https://www.google.com/url?sa=t&amp;amp;source=web&amp;amp;rct=j&amp;amp;opi=89978449&amp;amp;url=https://aws.amazon.com/marketplace&amp;amp;ved=2ahUKEwig5aGHmaGQAxVQwskDHUdYHS4QFnoECBIQAQ&amp;amp;usg=AOvVaw2kR1wc3JVnglAce4z8i-IH" target="_blank" rel="noopener noreferrer"&gt;AWS Marketplace&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://www.python.org/" target="_blank" rel="noopener noreferrer"&gt;Python&lt;/a&gt;&amp;nbsp;3.8 or later (recommended) is used to run the&amp;nbsp;init.py&amp;nbsp;script that dynamically creates a&amp;nbsp;CloudFormation&amp;nbsp;stack in the account specified as an input parameter.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/sdk-for-python/" target="_blank" rel="noopener noreferrer"&gt;AWS SDK for Python (Boto3)&lt;/a&gt; version 1.26.0 or later recommended.&lt;/li&gt; 
 &lt;li&gt;Operating system with iSCSI boot support (Windows Server 2022 and Red Hat Enterprise Linux 9 AMIs are provided).&lt;/li&gt; 
 &lt;li&gt;Internet access to AWS service endpoints for the private subnet hosting the recovery Lambda function.&lt;/li&gt; 
 &lt;li&gt;Download the repository &lt;a href="https://github.com/amznganske/ec2-outposts-autorestart_3Pstorage" target="_blank" rel="noopener"&gt;ec2-outposts-autorestart_3Pstorage&lt;/a&gt;.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Walkthrough&lt;/h2&gt; 
&lt;p&gt;The first step is to deploy an EC2 instance configured to boot from a volume on the third-party storage that is prepared with an OS boot image. This step uses the launch wizard portion of the solution.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Download and extract the ec2-outposts-autorestart_3Pstorage repository to the management computer that has the AWS SDK for Python (Boto3) and Python installed.&lt;/li&gt; 
 &lt;li&gt;Run &lt;code&gt;launch_wizard&lt;/code&gt; from the sample-outposts-third-party-storage-integration directory. You can run it interactively or provide arguments for Region, subnet, iPXE AMI, storage vendor, storage management IP address, and credentials.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-2.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25766 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-2.png" alt="" width="1428" height="740"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 2 – Running launch wizard&lt;/p&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;When prompted for a feature name, enter &lt;code&gt;sanboot&lt;/code&gt;.&lt;/li&gt; 
 &lt;li&gt;For Guest OS type, enter Linux or Windows.&lt;/li&gt; 
 &lt;li&gt;When prompted “Do you want to continue with this unverified AMI?”, select Y.&lt;/li&gt; 
 &lt;li&gt;The launch wizard will provide a list of instance types available on the Outpost server associated with the subnet you specified. Enter the instance type that you want to use.&lt;/li&gt; 
 &lt;li&gt;The launch wizard will now prompt you for optional EC2 Key Pair, Security Group, and Instance Profile settings for the EC2 instance that you are launching.&lt;/li&gt; 
 &lt;li&gt;Next, the launch wizard prompts you to specify an instance name. Note that specifying an instance name is required to set up automated instance recovery because the instance name is used as part of the recovery process.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-3.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25767 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-3.png" alt="" width="1432" height="565"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 3 – Taking user input for variable values&lt;/p&gt; 
&lt;ol start="9"&gt; 
 &lt;li&gt;The launch wizard prompts for root volume size. This is the root volume that the iPXE AMI boots from. The default is a 1GB volume on the Outpost server instance storage.&lt;/li&gt; 
 &lt;li&gt;Next, the launch wizard prompts you to select which third-party storage controller you want to use based on the management IP address that you specified. In this example, we are using NetApp, so we select a NetApp Storage Virtual Machine (SVM) named outpost_iscsi.&lt;/li&gt; 
 &lt;li&gt;If the connection to the storage array is successful and the protocol is available (iSCSI or NVMe over TCP) you are provided additional storage options for initiator group and logical unit number (LUN).&lt;/li&gt; 
 &lt;li&gt;In this example, we are using NetApp with iSCSI, so we can select an existing initiator group or create a new one.&lt;/li&gt; 
 &lt;li&gt;You can specify an existing initiator qualified name (IQN), or the launch wizard can generate a new one. &lt;strong&gt;IMPORTANT:&lt;/strong&gt; Make sure that IQNs are unique to each instance because duplicates can cause data corruption.&lt;/li&gt; 
 &lt;li&gt;Next, the launch wizard prompts you for which LUNs you want to connect to this instance. For this example, we use a Windows Server 2022 boot volume that was already created on the NetApp storage array.&lt;/li&gt; 
 &lt;li&gt;You are now asked which storage array target interface you want to use for connecting to these LUNs.&lt;/li&gt; 
 &lt;li&gt;The launch wizard provides the capability to specify guest OS scripts to customize the OS after sanboot. Combining this capability with storage array cloning provides a streamlined process for deploying new instances.&lt;/li&gt; 
 &lt;li&gt;The launch wizard now displays the EC2 user data template that it generated for use with the iPXE AMI and asks if you want to proceed with launching the instance.&lt;/li&gt; 
 &lt;li&gt;After the EC2 instance is launched, select yes to proceed with automated instance recovery setup.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-4.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25768 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-4.png" alt="" width="1474" height="96"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 4 – Running launch template creation script&lt;/p&gt; 
&lt;h3&gt;Generating EC2 launch templates for recovery and failback&lt;/h3&gt; 
&lt;p&gt;In the second step, we are generating EC2 launch templates for the EC2 instance launched in step 1. Launch templates can be generated for the primary and secondary Outpost servers. The launch template for the secondary Outpost server can be used for automated or manual recovery of the EC2 instance. Failback to the primary Outpost server is manual using the primary launch template.&lt;/p&gt; 
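&lt;p&gt;&lt;em&gt;As a rough sketch of how a recovery function might consume the secondary launch template, the relaunch reduces to a boto3 &lt;code&gt;run_instances&lt;/code&gt; call driven by the template. Function and template names here are hypothetical; the actual Lambda code ships in the repository. The boto3 call is commented out so the sketch runs without AWS access.&lt;/em&gt;&lt;/p&gt;

```python
def build_relaunch_params(launch_template_name, instance_name):
    """run_instances kwargs that relaunch a failed instance from the
    secondary Outposts server's launch template (names hypothetical)."""
    return {
        "MinCount": 1,
        "MaxCount": 1,
        # Subnet and instance type come from the launch template itself,
        # which pins the instance to the secondary Outposts server.
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        # The instance name tag matters: the recovery process uses it
        # to identify which instance is being recovered.
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": instance_name}],
        }],
    }

# Inside the recovery Lambda the call would look like:
# import boto3
# boto3.client("ec2").run_instances(
#     **build_relaunch_params("my-secondary-template", "my-app-server"))
```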
&lt;ol&gt; 
 &lt;li&gt;Select the instance that you want automated recovery for and select the subnet that you launched the instance in. This subnet represents the primary Outpost server that the instance is running on.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-5.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25769 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-5.png" alt="" width="891" height="809"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 5 – Selecting subnets for EC2 instance relaunch&lt;/p&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;When prompted to create a second launch template for Outpost server recovery, select yes, and then choose to use the same instance (for recovery on a different Outpost server).&lt;/li&gt; 
 &lt;li&gt;When you get a list of available subnets, select the subnet that’s associated with your secondary Outpost server. This is the server that the EC2 instance will be launched on if the EC2 StatusCheckFailed_Instance metric triggers the CloudWatch alarm.&lt;/li&gt; 
 &lt;li&gt;You will see both launch templates created successfully.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Deploying automated EC2 instance recovery&lt;/h3&gt; 
&lt;p&gt;The third step creates a CloudFormation template for monitoring, notifications, and automated recovery of the EC2 instance deployed in step 1. The CloudFormation template automatically captures the instance and secondary launch template information necessary for automatic recovery.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Select Y to set up automated recovery. This will create a CloudFormation stack.&lt;/li&gt; 
 &lt;li&gt;Provide a name and description for the CloudFormation stack.&lt;/li&gt; 
 &lt;li&gt;Select whether you want automated recovery or notification only. This provides flexibility to choose manual or automatic recovery based on whether you want to verify the primary Outpost server is down before initiating recovery.&lt;/li&gt; 
 &lt;li&gt;In the AWS CloudFormation console, monitor the CloudFormation stack creation process.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-6.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25770 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/25/ComputeBlog-2445-image-6.png" alt="" width="1430" height="220"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 6 – CloudFormation stack creation in progress&lt;/p&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;After the CloudFormation stack is complete, you have successfully deployed an EC2 instance that uses third-party storage for boot and data volumes on a primary Outposts server. You have also created instance recovery capabilities by using the Outposts server automated recovery solution for third-party storage.&lt;/li&gt; 
 &lt;li&gt;You can verify that the EC2 StatusCheckFailed_Instance alarm is healthy under the Alarms section in the Amazon CloudWatch console.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Considerations&lt;/h2&gt; 
&lt;p&gt;The logic discussed in this post relies on the secondary destination Outposts server having a connected service link. For more information about how to create a highly available service link connection for your Outposts servers, see the &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/aws-outposts-high-availability-design/anchor-connectivity.html" target="_blank" rel="noopener noreferrer"&gt;Networking section&lt;/a&gt; of the AWS Outposts High Availability Design and Architecture Considerations whitepaper.&lt;/p&gt; 
&lt;h2&gt;Clean up&lt;/h2&gt; 
&lt;p&gt;Confirm whether it is safe to terminate the Amazon EC2 instance that you launched with this walkthrough. The operating system and data volumes are on the third-party storage, so EC2 instance termination only removes the iPXE AMI from the Outposts server instance storage. To clean up, complete the following steps.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Terminate the Amazon EC2 instance. Then, verify that the Instance state is &lt;strong&gt;Terminated&lt;/strong&gt; to ensure that the instance is not using Outposts server resources.&lt;/li&gt; 
 &lt;li&gt;Delete the Amazon EC2 launch templates associated with the Amazon EC2 instance that you terminated. The names of the launch templates that were automatically generated will start with ‘lt-‘, followed by the instance name and the instance ID. If you generated a recovery launch template, it will have a ‘-recovery’ suffix in the name.&lt;/li&gt; 
 &lt;li&gt;Delete the AWS CloudFormation Stack. The Stack name will start with ‘autorestart-‘ followed by the Amazon EC2 instance name.&lt;/li&gt; 
 &lt;li&gt;Clean up your initiators, initiator group, and LUNs on the third-party storage array.&lt;/li&gt; 
&lt;/ol&gt; 
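&lt;p&gt;The same clean-up can also be scripted from the AWS CLI. The following is a minimal sketch; the instance ID, launch template names, and stack name are illustrative placeholders that follow the naming conventions described above:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;# Terminate the instance (placeholder instance ID)
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0

# Delete the auto-generated launch templates (placeholder names)
aws ec2 delete-launch-template --launch-template-name lt-myinstance-i-0123456789abcdef0
aws ec2 delete-launch-template --launch-template-name lt-myinstance-i-0123456789abcdef0-recovery

# Delete the automated recovery CloudFormation stack (placeholder name)
aws cloudformation delete-stack --stack-name autorestart-myinstance&lt;/code&gt;&lt;/pre&gt; 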
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;With the use of custom logic through AWS tools such as CloudFormation,&amp;nbsp;CloudWatch, Amazon SNS, and&amp;nbsp;AWS Lambda, you can architect for high availability (HA) for stateful workloads on Outposts servers. By implementing the custom logic in this post, you can automatically relaunch EC2 instances running on a source Outposts server to a secondary destination Outposts server if an instance fails, and connect to existing volumes on a shared storage appliance for recovery. This also reduces the downtime of your applications in the event of a hardware or service link failure. The code provided in this post can be further expanded upon to meet the unique needs of your workload.&lt;/p&gt; 
&lt;p&gt;While the use of&amp;nbsp;&lt;a href="https://aws.amazon.com/what-is/iac/" target="_blank" rel="noopener noreferrer"&gt;infrastructure-as-code (IaC)&lt;/a&gt;&amp;nbsp;can improve your application’s availability and be used to standardize deployments across multiple Outposts servers, it’s crucial to do regular failure drills to test the custom logic in place. This is to make sure that you understand your application’s expected behavior on relaunch in the event of a failure. To learn more about Outposts servers, visit&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/outposts/latest/server-userguide/what-is-outposts.html" target="_blank" rel="noopener noreferrer"&gt;the Outposts servers User Guide&lt;/a&gt;. Reach out to your AWS account team, or fill out this&amp;nbsp;&lt;a href="https://pages.awscloud.com/GLOBAL_PM_LN_outposts-features_2020084_7010z000001Lpcl_01.LandingPage.html" target="_blank" rel="noopener noreferrer"&gt;form&lt;/a&gt; to learn more about Outposts servers.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Optimizing Compute-Intensive Serverless Workloads with Multi-threaded Rust on AWS Lambda</title>
		<link>https://aws.amazon.com/blogs/compute/optimizing-compute-intensive-serverless-workloads-with-multi-threaded-rust-on-aws-lambda/</link>
					
		
		<dc:creator><![CDATA[Daniel Abib]]></dc:creator>
		<pubDate>Wed, 25 Feb 2026 12:49:44 +0000</pubDate>
				<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Serverless]]></category>
		<guid isPermaLink="false">aa533d430d7b0f6a9e003ec97815f3e0b4968101</guid>

					<description>Customers use 
&lt;a href="https://aws.amazon.com/lambda/"&gt;AWS Lambda&lt;/a&gt; to build Serverless applications for a wide variety of use cases, from simple API backends to complex data processing pipelines. Lambda's flexibility makes it an excellent choice for many workloads, and with support for up to 10,240 MB of memory, you can now tackle compute-intensive tasks that were previously challenging in a Serverless environment. When you configure a Lambda function's memory size, you allocate RAM and Lambda automatically provides proportional CPU power. When you configure 10,240 MB, your Lambda function has access to up to 6 vCPUs.</description>
										<content:encoded>&lt;p&gt;Customers use &lt;a href="https://aws.amazon.com/lambda/"&gt;AWS Lambda&lt;/a&gt; to build Serverless applications for a wide variety of use cases, from simple API backends to complex data processing pipelines. Lambda’s flexibility makes it an excellent choice for many workloads, and with support for up to 10,240 MB of memory, you can now tackle compute-intensive tasks that were previously challenging in a Serverless environment. When you configure a Lambda function’s memory size, you allocate RAM and Lambda automatically provides proportional CPU power. When you configure 10,240 MB, your Lambda function has access to up to 6 vCPUs.&lt;/p&gt; 
&lt;p&gt;However, there’s an important consideration that many developers discover: &lt;strong&gt;simply allocating more memory may not automatically make your function faster.&lt;/strong&gt; If your code runs sequentially, it will only use one vCPU regardless of how many are available. The remaining vCPUs sit idle while you’re still paying for the full memory allocation.&lt;/p&gt; 
&lt;p&gt;To benefit from Lambda’s multi-core capabilities, your code should explicitly implement concurrent processing through multi-threading or parallel execution. Without this, you’re paying for compute power you’re not using.&lt;/p&gt; 
&lt;p&gt;Rust provides excellent support for this pattern. The &lt;a href="https://github.com/aws/aws-lambda-rust-runtime"&gt;AWS Lambda Rust Runtime&lt;/a&gt; provides developers with a language that combines exceptional performance with built-in concurrency primitives. In this post, we show you how to implement multi-threading in Rust to achieve 4-6x performance improvements for CPU-intensive workloads.&lt;/p&gt; 
&lt;h2&gt;Our Test Workload: Why Bcrypt Password Hashing?&lt;/h2&gt; 
&lt;p&gt;For this analysis, we use &lt;strong&gt;bcrypt password hashing&lt;/strong&gt; as our CPU-intensive workload to evaluate multi-core scaling behavior. This choice is deliberate for several reasons:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Real-world relevance&lt;/strong&gt;: Bcrypt is commonly used in authentication systems, making our benchmarks practically relevant rather than synthetic.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Predictable CPU work&lt;/strong&gt;: Bcrypt with cost factor 10 provides approximately 100ms of pure CPU work per operation on typical hardware, creating a consistent and measurable baseline.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Embarrassingly parallel&lt;/strong&gt;: Each hash operation is completely independent, making it an ideal candidate for parallel processing without shared state or lock contention.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;CPU-bound&lt;/strong&gt;: Bcrypt is deterministic and CPU-bound (not memory or I/O bound), isolating the performance characteristics we want to measure.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Throughout this post, we process batches of passwords and measure how multi-threading improves throughput as we scale from 1 to 6 vCPUs.&lt;/p&gt; 
&lt;h2&gt;Understanding Lambda’s vCPU Allocation&lt;/h2&gt; 
&lt;p&gt;AWS Lambda allocates CPU resources proportionally to the configured memory. According to &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html"&gt;AWS Lambda function memory documentation&lt;/a&gt;, at 1,769 MB a function has the equivalent of one vCPU.&lt;/p&gt; 
&lt;p style="text-align: center"&gt;&lt;a href="https://www.youtube.com/watch?v=aW5EtKHTMuQ&amp;amp;t=339s"&gt;&lt;strong&gt;vCPU Allocation by Memory&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt;&lt;/p&gt; 
&lt;table style="margin: 0px auto;height: 258px" width="335"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt; &lt;p style="text-align: center"&gt;Memory (MB)&lt;/p&gt; &lt;/td&gt; 
   &lt;td style="text-align: center"&gt;Approximate vCPUs&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;128 – 1,769&lt;/td&gt; 
   &lt;td style="text-align: center"&gt;~1&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;1,770 – 3,538&lt;/td&gt; 
   &lt;td style="text-align: center"&gt;~2&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;3,539 – 5,307&lt;/td&gt; 
   &lt;td style="text-align: center"&gt;~3&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;5,308 – 7,076&lt;/td&gt; 
   &lt;td style="text-align: center"&gt;~4&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;7,077 – 8,845&lt;/td&gt; 
   &lt;td style="text-align: center"&gt;~5&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="text-align: center"&gt;8,846 – 10,240&lt;/td&gt; 
   &lt;td&gt; &lt;p style="text-align: center"&gt;~6&lt;/p&gt; &lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The &lt;code&gt;num_cpus&lt;/code&gt; crate returns the number of logical CPUs visible to the Lambda environment, which may differ from the allocated vCPU share. At lower memory configurations, you may see 2 CPUs reported even though only 1 vCPU worth of compute time is allocated.&lt;/p&gt; 
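&lt;p&gt;The table follows a simple ceiling rule: roughly one vCPU per 1,769 MB, capped at 6. As a hedged sketch (AWS documents the 1,769 MB-per-vCPU ratio but not an exact formula, and &lt;code&gt;approx_vcpus&lt;/code&gt; is our own helper name, not an AWS API):&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-rust"&gt;// Approximate vCPU count for a given Lambda memory configuration (MB).
// Ceiling division by 1,769 MB per vCPU, clamped to the 1-6 range.
fn approx_vcpus(memory_mb: u32) -&gt; u32 {
    ((memory_mb + 1768) / 1769).clamp(1, 6)
}

fn main() {
    assert_eq!(approx_vcpus(1536), 1);  // ~1 vCPU
    assert_eq!(approx_vcpus(1770), 2);  // ~2 vCPUs
    assert_eq!(approx_vcpus(8845), 5);  // ~5 vCPUs
    assert_eq!(approx_vcpus(10240), 6); // ~6 vCPUs (maximum)
}&lt;/code&gt;&lt;/pre&gt; 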
&lt;h2&gt;Solution Overview&lt;/h2&gt; 
&lt;p&gt;The solution consists of a Rust Lambda function that:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Receives a request specifying the number of items to process&lt;/li&gt; 
 &lt;li&gt;Detects available vCPUs and configures a thread pool accordingly&lt;/li&gt; 
 &lt;li&gt;Processes items in parallel using the &lt;a href="https://github.com/rayon-rs/rayon"&gt;Rayon library&lt;/a&gt; (a data parallelism library that allows you to convert sequential iterators into parallel ones with a &lt;code&gt;.par_iter()&lt;/code&gt; call)&lt;/li&gt; 
 &lt;li&gt;Returns performance metrics including duration and throughput&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/24/Picture1-6.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25731 size-large" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/24/Picture1-6-683x1024.png" alt="" width="683" height="1024"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;&lt;em&gt;Architecture Diagram: Lambda receives request, initializes Rayon thread pool based on &lt;code&gt;WORKER_COUNT&lt;/code&gt; environment variable, processes bcrypt hashes in parallel across multiple vCPUs, and returns results.&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;Creating a Multi-threaded Rust Lambda Function&lt;/h2&gt; 
&lt;p&gt;Create a new Lambda project using Cargo Lambda:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;cargo lambda new rust-multithread-demo
cd rust-multithread-demo&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Dependencies&lt;/h3&gt; 
&lt;p&gt;Update &lt;code&gt;Cargo.toml&lt;/code&gt; with the necessary dependencies:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-toml"&gt;[package]
name = "rust-multithread-lambda"
version = "0.1.0"
edition = "2021"

[dependencies]
lambda_runtime = "1.0.0"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bcrypt = "0.15"
rayon = "1.7"
num_cpus = "1.16"

[profile.release]
opt-level = 3
lto = true
codegen-units = 1
strip = true&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The optimization flags in &lt;code&gt;[profile.release]&lt;/code&gt; reduce binary size and improve performance:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;opt-level = 3&lt;/code&gt;: Maximum optimization&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;lto = true&lt;/code&gt;: Link-time optimization for smaller binaries&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;strip = true&lt;/code&gt;: Remove debug symbols&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Implementing the Lambda Entry Point&lt;/h3&gt; 
&lt;p&gt;First, let’s look at how we initialize the thread pool during cold start:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;src/main.rs&lt;/strong&gt;:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-rust"&gt;use lambda_runtime::{run, service_fn, Error, LambdaEvent};
mod handler;
use handler::{function_handler, get_worker_count, init_thread_pool, ProcessRequest};

#[tokio::main]
async fn main() -&amp;gt; Result&amp;lt;(), Error&amp;gt; {
    // Initialize Rayon thread pool at cold start (once per container lifecycle)
    init_thread_pool(get_worker_count());

    run(service_fn(|event: LambdaEvent&amp;lt;ProcessRequest&amp;gt;| async move {
        function_handler(event.payload).await
    }))
    .await
}&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Why initialize in &lt;code&gt;main()&lt;/code&gt; and not in the handler?&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Deterministic Configuration&lt;/strong&gt;: The thread pool is configured once per container, before any requests arrive. This prevents race conditions if multiple requests try to initialize concurrently.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Container Reuse&lt;/strong&gt;: Lambda containers can serve multiple requests. Initializing in &lt;code&gt;main()&lt;/code&gt; ensures the configuration is set during the cold start and persists for all subsequent warm invocations.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Thread pool setup happens during cold start (already counted as initialization time), not during request processing.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Implementing the Request Handler&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;src/handler.rs&lt;/strong&gt;:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-rust"&gt;use serde::{Deserialize, Serialize};
use std::env;
use std::sync::Once;
use std::time::Instant;
use std::collections::HashSet;
use std::sync::Mutex;
use rayon::prelude::*;

static INIT: Once = Once::new();

#[derive(Deserialize)]
pub struct ProcessRequest {
    count: usize,
    mode: String,
}

#[derive(Serialize)]
pub struct ProcessResponse {
    processed: usize,
    duration_ms: u128,
    mode: String,
    workers: usize,
    detected_cpus: usize,
    avg_ms_per_item: f64,
    memory_used_kb: u64,
    threads_used: usize, // Actual threads that processed items (proves multi-threading)
}

// CPU-intensive bcrypt hashing with cost factor 10
fn hash_password(password: &amp;amp;str) -&amp;gt; Result&amp;lt;String, bcrypt::BcryptError&amp;gt; {
    bcrypt::hash(password, 10)
}

// Process items one at a time (baseline for comparison)
fn process_sequential(items: Vec&amp;lt;String&amp;gt;) -&amp;gt; Result&amp;lt;(Vec&amp;lt;String&amp;gt;, usize), Box&amp;lt;dyn std::error::Error + Send + Sync&amp;gt;&amp;gt; {
    let results: Result&amp;lt;Vec&amp;lt;String&amp;gt;, _&amp;gt; = items
        .iter()
        .map(|item| hash_password(item))
        .collect();
    results
        .map(|r| (r, 1))
        .map_err(|e| Box::new(e) as Box&amp;lt;dyn std::error::Error + Send + Sync&amp;gt;)
}

// Process items in parallel using Rayon's work-stealing scheduler
// Thread pool size is configured once at cold start via init_thread_pool()
fn process_parallel(items: Vec&amp;lt;String&amp;gt;) -&amp;gt; Result&amp;lt;(Vec&amp;lt;String&amp;gt;, usize), Box&amp;lt;dyn std::error::Error + Send + Sync&amp;gt;&amp;gt; {
    let thread_ids: Mutex&amp;lt;HashSet&amp;lt;std::thread::ThreadId&amp;gt;&amp;gt; = Mutex::new(HashSet::new());

    let results: Result&amp;lt;Vec&amp;lt;String&amp;gt;, _&amp;gt; = items
        .par_iter()
        .map(|item| {
            thread_ids.lock().unwrap().insert(std::thread::current().id());
            hash_password(item)
        })
        .collect();

    let threads_used = thread_ids.lock().unwrap().len();
    results
        .map(|r| (r, threads_used))
        .map_err(|e| Box::new(e) as Box&amp;lt;dyn std::error::Error + Send + Sync&amp;gt;)
}

// Get worker count from env var or detect CPUs, clamped to 1-6
pub fn get_worker_count() -&amp;gt; usize {
    if let Ok(count_str) = env::var("WORKER_COUNT") {
        if let Ok(count) = count_str.parse::&amp;lt;usize&amp;gt;() {
            return count.clamp(1, 6);
        }
    }
    num_cpus::get().clamp(1, 6)
}

// Initialize Rayon global thread pool (only once per Lambda container)
pub fn init_thread_pool(workers: usize) {
    INIT.call_once(|| {
        let _ = rayon::ThreadPoolBuilder::new()
            .num_threads(workers)
            .build_global();
    });
}

// Read RSS memory from /proc/self/statm (Linux only)
fn get_memory_usage_kb() -&amp;gt; u64 {
    std::fs::read_to_string("/proc/self/statm")
        .ok()
        .and_then(|s| s.split_whitespace().nth(1)?.parse::&amp;lt;u64&amp;gt;().ok())
        .map(|pages| pages * 4)
        .unwrap_or(0)
}

// Main Lambda handler - processes items sequentially or in parallel
pub async fn function_handler(request: ProcessRequest) -&amp;gt; Result&amp;lt;ProcessResponse, Box&amp;lt;dyn std::error::Error + Send + Sync&amp;gt;&amp;gt; {
    if request.count == 0 { return Err("count must be greater than 0".into()); }
    if request.count &amp;gt; 1000 { return Err("count exceeds maximum of 1000 items".into()); }

    let items: Vec&amp;lt;String&amp;gt; = (0..request.count)
        .map(|i| format!("password_{:06}", i))
        .collect();

    let workers = get_worker_count();
    let mode = match request.mode.as_str() {
        "sequential" =&amp;gt; "sequential",
        "parallel"   =&amp;gt; "parallel",
        _            =&amp;gt; if workers &amp;gt; 1 { "parallel" } else { "sequential" },
    };

    let start = Instant::now();
    let (results, threads_used) = match mode {
        "sequential" =&amp;gt; process_sequential(items)?,
        _            =&amp;gt; process_parallel(items)?,
    };
    let duration_ms = start.elapsed().as_millis();

    Ok(ProcessResponse {
        processed: results.len(),
        duration_ms,
        mode: mode.to_string(),
        workers: if mode == "parallel" { workers } else { 1 },
        detected_cpus: num_cpus::get(),
        avg_ms_per_item: duration_ms as f64 / request.count as f64,
        memory_used_kb: get_memory_usage_kb(),
        threads_used,
    })
}&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Key Implementation Details&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Thread Pool Initialization at Cold Start&lt;/strong&gt;: The code initializes the thread pool in &lt;code&gt;main()&lt;/code&gt; before the Lambda runtime starts, not during request processing. This approach is designed to eliminate race conditions and provide deterministic behavior across all invocations.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Important Note&lt;/strong&gt;: Lambda initializes the thread pool once per container. The thread pool configuration retains its original value even if you change the &lt;code&gt;WORKER_COUNT&lt;/code&gt; environment variable between invocations within the same container. For production deployments, keep &lt;code&gt;WORKER_COUNT&lt;/code&gt; consistent for the function’s lifecycle.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Input Validation&lt;/strong&gt;: The handler validates that &lt;code&gt;count&lt;/code&gt; is between 1 and 1000 to prevent resource exhaustion.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Thread Tracking&lt;/strong&gt;: The &lt;code&gt;threads_used&lt;/code&gt; field proves multi-threading is working by tracking unique thread IDs during parallel processing. This provides empirical validation that work is distributed across multiple threads.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Memory Tracking&lt;/strong&gt;: The &lt;code&gt;memory_used_kb&lt;/code&gt; field reports RSS memory usage by reading &lt;code&gt;/proc/self/statm&lt;/code&gt;, providing visibility into actual memory consumption.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Mode Selection&lt;/strong&gt;: The function supports three modes:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;sequential&lt;/code&gt;: Single-threaded processing&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;parallel&lt;/code&gt;: Multi-threaded processing using Rayon&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;auto&lt;/code&gt;: Automatically selects based on available workers&lt;/li&gt; 
&lt;/ul&gt; 
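&lt;p&gt;An invocation payload selects the item count and the mode. For example (illustrative payloads; any &lt;code&gt;mode&lt;/code&gt; value other than &lt;code&gt;sequential&lt;/code&gt; or &lt;code&gt;parallel&lt;/code&gt; falls through to automatic selection):&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-json"&gt;{"count": 20, "mode": "sequential"}
{"count": 20, "mode": "parallel"}
{"count": 20, "mode": "auto"}&lt;/code&gt;&lt;/pre&gt; 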
&lt;h2&gt;Building and Deploying&lt;/h2&gt; 
&lt;p&gt;With the implementation complete, let’s compile the function for Lambda’s environment and deploy it to AWS.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;# Build for ARM64 (Graviton2) - recommended for cost efficiency
cargo lambda build --release --arm64

# Or build for x86_64
cargo lambda build --release --x86-64&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The build process produces a binary of approximately &lt;strong&gt;1.7 MB&lt;/strong&gt; (uncompressed) or &lt;strong&gt;0.8 MB&lt;/strong&gt; (zipped).&lt;/p&gt; 
&lt;h3&gt;Deploy to AWS&lt;/h3&gt; 
&lt;p&gt;Use Cargo Lambda to deploy the function with your desired memory configuration and worker count.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;# Deploy with 6144 MB memory (4 vCPUs) and 4 workers
cargo lambda deploy rust-multithread-lambda \
    --memory 6144 \
    --timeout 30 \
    --env-var WORKER_COUNT=4&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: To test different configurations, repeat the build and deploy commands with different &lt;code&gt;--memory&lt;/code&gt; values and &lt;code&gt;WORKER_COUNT&lt;/code&gt; settings for each configuration you want to benchmark. For comprehensive testing across architectures, build with &lt;code&gt;--arm64&lt;/code&gt;, deploy all memory configurations, then rebuild with &lt;code&gt;--x86-64&lt;/code&gt; and deploy again.&lt;/p&gt; 
&lt;h3&gt;Required IAM Permissions&lt;/h3&gt; 
&lt;p&gt;The Lambda execution role needs the following permissions:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-json"&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Test the Function&lt;/h3&gt; 
&lt;p&gt;After deployment, verify the function works correctly by invoking it with a test payload.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;aws lambda invoke \
    --function-name rust-multithread-lambda \
    --payload '{"count":20,"mode":"parallel"}' \
    --cli-binary-format raw-in-base64-out \
    response.json&lt;/code&gt;&lt;/pre&gt; 
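&lt;p&gt;A successful invocation writes a response like the following to &lt;code&gt;response.json&lt;/code&gt; (field names match the &lt;code&gt;ProcessResponse&lt;/code&gt; struct; values are illustrative):&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-json"&gt;{
  "processed": 20,
  "duration_ms": 463,
  "mode": "parallel",
  "workers": 4,
  "detected_cpus": 4,
  "avg_ms_per_item": 23.15,
  "memory_used_kb": 14200,
  "threads_used": 4
}&lt;/code&gt;&lt;/pre&gt; 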
&lt;h2&gt;Performance Benchmarks&lt;/h2&gt; 
&lt;p&gt;We tested multiple configurations on ARM64 (Graviton2) to measure the impact of multi-threading.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Test workload&lt;/strong&gt;: Processing 20 bcrypt password hashes (cost factor 10)&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Benchmark results may vary between runs due to factors such as Lambda placement, underlying hardware differences, and AWS infrastructure conditions. The numbers presented here are representative of typical performance observed across multiple test runs.&lt;/p&gt; 
&lt;h3&gt;Performance Results: ARM64 (Graviton2)&lt;/h3&gt; 
&lt;table width="100%"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Memory&lt;/td&gt; 
   &lt;td&gt;vCPUs&lt;/td&gt; 
   &lt;td&gt;Workers&lt;/td&gt; 
   &lt;td&gt;Avg (ms)&lt;/td&gt; 
   &lt;td&gt;P50 (ms)&lt;/td&gt; 
   &lt;td&gt;P95 (ms)&lt;/td&gt; 
   &lt;td&gt;P99 (ms)&lt;/td&gt; 
   &lt;td&gt;Min&lt;/td&gt; 
   &lt;td&gt;Max&lt;/td&gt; 
   &lt;td&gt;Speedup&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;1536 MB&lt;/td&gt; 
   &lt;td&gt;~1&lt;/td&gt; 
   &lt;td&gt;1&lt;/td&gt; 
   &lt;td&gt;1,885&lt;/td&gt; 
   &lt;td&gt;1,882&lt;/td&gt; 
   &lt;td&gt;1,898&lt;/td&gt; 
   &lt;td&gt;1,898&lt;/td&gt; 
   &lt;td&gt;1,877&lt;/td&gt; 
   &lt;td&gt;1,907&lt;/td&gt; 
   &lt;td&gt;1.00x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;2048 MB&lt;/td&gt; 
   &lt;td&gt;~2&lt;/td&gt; 
   &lt;td&gt;2&lt;/td&gt; 
   &lt;td&gt;1,334&lt;/td&gt; 
   &lt;td&gt;1,331&lt;/td&gt; 
   &lt;td&gt;1,341&lt;/td&gt; 
   &lt;td&gt;1,341&lt;/td&gt; 
   &lt;td&gt;1,324&lt;/td&gt; 
   &lt;td&gt;1,356&lt;/td&gt; 
   &lt;td&gt;1.41x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;4096 MB&lt;/td&gt; 
   &lt;td&gt;~3&lt;/td&gt; 
   &lt;td&gt;3&lt;/td&gt; 
   &lt;td&gt;685&lt;/td&gt; 
   &lt;td&gt;683&lt;/td&gt; 
   &lt;td&gt;699&lt;/td&gt; 
   &lt;td&gt;699&lt;/td&gt; 
   &lt;td&gt;669&lt;/td&gt; 
   &lt;td&gt;704&lt;/td&gt; 
   &lt;td&gt;2.75x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;6144 MB&lt;/td&gt; 
   &lt;td&gt;~4&lt;/td&gt; 
   &lt;td&gt;4&lt;/td&gt; 
   &lt;td&gt;463&lt;/td&gt; 
   &lt;td&gt;464&lt;/td&gt; 
   &lt;td&gt;467&lt;/td&gt; 
   &lt;td&gt;467&lt;/td&gt; 
   &lt;td&gt;453&lt;/td&gt; 
   &lt;td&gt;469&lt;/td&gt; 
   &lt;td&gt;4.07x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;8192 MB&lt;/td&gt; 
   &lt;td&gt;~5&lt;/td&gt; 
   &lt;td&gt;5&lt;/td&gt; 
   &lt;td&gt;338&lt;/td&gt; 
   &lt;td&gt;343&lt;/td&gt; 
   &lt;td&gt;345&lt;/td&gt; 
   &lt;td&gt;345&lt;/td&gt; 
   &lt;td&gt;325&lt;/td&gt; 
   &lt;td&gt;346&lt;/td&gt; 
   &lt;td&gt;5.57x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;10240 MB&lt;/td&gt; 
   &lt;td&gt;~6&lt;/td&gt; 
   &lt;td&gt;6&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;280&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;278&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;292&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;292&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;271&lt;/td&gt; 
   &lt;td&gt;293&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;6.73x&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h3&gt;Performance Results: x86_64&lt;/h3&gt; 
&lt;table width="100%"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Memory&lt;/td&gt; 
   &lt;td&gt;vCPUs&lt;/td&gt; 
   &lt;td&gt;Workers&lt;/td&gt; 
   &lt;td&gt;Avg (ms)&lt;/td&gt; 
   &lt;td&gt;P50 (ms)&lt;/td&gt; 
   &lt;td&gt;P95 (ms)&lt;/td&gt; 
   &lt;td&gt;P99 (ms)&lt;/td&gt; 
   &lt;td&gt;Min&lt;/td&gt; 
   &lt;td&gt;Max&lt;/td&gt; 
   &lt;td&gt;Speedup&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;1536 MB&lt;/td&gt; 
   &lt;td&gt;~1&lt;/td&gt; 
   &lt;td&gt;1&lt;/td&gt; 
   &lt;td&gt;1,671&lt;/td&gt; 
   &lt;td&gt;1,675&lt;/td&gt; 
   &lt;td&gt;1,681&lt;/td&gt; 
   &lt;td&gt;1,681&lt;/td&gt; 
   &lt;td&gt;1,659&lt;/td&gt; 
   &lt;td&gt;1,684&lt;/td&gt; 
   &lt;td&gt;1.00x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;2048 MB&lt;/td&gt; 
   &lt;td&gt;~2&lt;/td&gt; 
   &lt;td&gt;2&lt;/td&gt; 
   &lt;td&gt;1,253&lt;/td&gt; 
   &lt;td&gt;1,249&lt;/td&gt; 
   &lt;td&gt;1,265&lt;/td&gt; 
   &lt;td&gt;1,265&lt;/td&gt; 
   &lt;td&gt;1,241&lt;/td&gt; 
   &lt;td&gt;1,294&lt;/td&gt; 
   &lt;td&gt;1.33x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;4096 MB&lt;/td&gt; 
   &lt;td&gt;~3&lt;/td&gt; 
   &lt;td&gt;3&lt;/td&gt; 
   &lt;td&gt;892&lt;/td&gt; 
   &lt;td&gt;891&lt;/td&gt; 
   &lt;td&gt;899&lt;/td&gt; 
   &lt;td&gt;899&lt;/td&gt; 
   &lt;td&gt;888&lt;/td&gt; 
   &lt;td&gt;900&lt;/td&gt; 
   &lt;td&gt;1.87x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;6144 MB&lt;/td&gt; 
   &lt;td&gt;~4&lt;/td&gt; 
   &lt;td&gt;4&lt;/td&gt; 
   &lt;td&gt;429&lt;/td&gt; 
   &lt;td&gt;425&lt;/td&gt; 
   &lt;td&gt;443&lt;/td&gt; 
   &lt;td&gt;443&lt;/td&gt; 
   &lt;td&gt;417&lt;/td&gt; 
   &lt;td&gt;449&lt;/td&gt; 
   &lt;td&gt;3.89x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;8192 MB&lt;/td&gt; 
   &lt;td&gt;~5&lt;/td&gt; 
   &lt;td&gt;5&lt;/td&gt; 
   &lt;td&gt;330&lt;/td&gt; 
   &lt;td&gt;323&lt;/td&gt; 
   &lt;td&gt;349&lt;/td&gt; 
   &lt;td&gt;349&lt;/td&gt; 
   &lt;td&gt;317&lt;/td&gt; 
   &lt;td&gt;358&lt;/td&gt; 
   &lt;td&gt;5.06x&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;10240 MB&lt;/td&gt; 
   &lt;td&gt;~6&lt;/td&gt; 
   &lt;td&gt;6&lt;/td&gt; 
   &lt;td&gt;292&lt;/td&gt; 
   &lt;td&gt;292&lt;/td&gt; 
   &lt;td&gt;298&lt;/td&gt; 
   &lt;td&gt;298&lt;/td&gt; 
   &lt;td&gt;291&lt;/td&gt; 
   &lt;td&gt;298&lt;/td&gt; 
   &lt;td&gt;5.72x&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h3&gt;Architecture Comparison&lt;/h3&gt; 
&lt;table width="100%"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Memory&lt;/td&gt; 
   &lt;td&gt;Workers&lt;/td&gt; 
   &lt;td&gt;ARM64 Avg&lt;/td&gt; 
   &lt;td&gt;x86_64 Avg&lt;/td&gt; 
   &lt;td&gt;Diff %&lt;/td&gt; 
   &lt;td&gt;Faster Arch&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;1536 MB&lt;/td&gt; 
   &lt;td&gt;1&lt;/td&gt; 
   &lt;td&gt;1,885 ms&lt;/td&gt; 
   &lt;td&gt;1,671 ms&lt;/td&gt; 
   &lt;td&gt;-12.8%&lt;/td&gt; 
   &lt;td&gt;x86_64&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;2048 MB&lt;/td&gt; 
   &lt;td&gt;2&lt;/td&gt; 
   &lt;td&gt;1,334 ms&lt;/td&gt; 
   &lt;td&gt;1,253 ms&lt;/td&gt; 
   &lt;td&gt;-6.4%&lt;/td&gt; 
   &lt;td&gt;x86_64&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;4096 MB&lt;/td&gt; 
   &lt;td&gt;3&lt;/td&gt; 
   &lt;td&gt;685 ms&lt;/td&gt; 
   &lt;td&gt;892 ms&lt;/td&gt; 
   &lt;td&gt;+23.2%&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;6144 MB&lt;/td&gt; 
   &lt;td&gt;4&lt;/td&gt; 
   &lt;td&gt;463 ms&lt;/td&gt; 
   &lt;td&gt;429 ms&lt;/td&gt; 
   &lt;td&gt;-7.9%&lt;/td&gt; 
   &lt;td&gt;x86_64&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;8192 MB&lt;/td&gt; 
   &lt;td&gt;5&lt;/td&gt; 
   &lt;td&gt;338 ms&lt;/td&gt; 
   &lt;td&gt;330 ms&lt;/td&gt; 
   &lt;td&gt;-2.4%&lt;/td&gt; 
   &lt;td&gt;x86_64&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;10240 MB&lt;/td&gt; 
   &lt;td&gt;6&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;280 ms&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;292 ms&lt;/td&gt; 
   &lt;td&gt;+4.1%&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h3&gt;Key Observations&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Cold Start Performance&lt;/strong&gt;: Rust’s cold start initialization times stay in a narrow 19-29 ms band across all memory configurations and architectures. ARM64 (&lt;a href="https://aws.amazon.com/pm/ec2-graviton/"&gt;Graviton2&lt;/a&gt;) shows slightly faster cold starts (19-23 ms) than x86_64 (26-29 ms). Both are significantly faster than interpreted runtimes because the binary is pre-compiled.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Near-Linear Scaling&lt;/strong&gt;: Both architectures achieve impressive speedups:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;ARM64: &lt;strong&gt;6.73x speedup&lt;/strong&gt; with 6 workers, slightly exceeding the theoretical 6x&lt;/li&gt; 
 &lt;li&gt;x86_64: 5.72x speedup with 6 workers&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Latency Consistency&lt;/strong&gt;: The P95 and P99 metrics show excellent consistency:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;ARM64 at 6 vCPUs: P50=278ms, P95=292ms, P99=292ms (low variance)&lt;/li&gt; 
 &lt;li&gt;x86_64 at 6 vCPUs: P50=292ms, P95=298ms, P99=298ms&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Both architectures show consistent latency at maximum parallelization.&lt;/p&gt; 
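&lt;p&gt;For reference, summary statistics like the percentile columns above can be reproduced from raw per-invocation latencies in a few lines of Python. This is an illustrative sketch only (not the benchmark harness used in this post), using a simple nearest-rank percentile and hypothetical samples in the same range as the 6 vCPU x86_64 results:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-python"&gt;# Nearest-rank percentile over a list of latency samples (in ms).
def percentile(samples, p):
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

# Hypothetical per-invocation latencies (ms), for illustration only.
latencies = [292, 291, 293, 298, 292, 292, 294, 296, 292, 297]
p50, p95, p99 = (percentile(latencies, p) for p in (50, 95, 99))&lt;/code&gt;&lt;/pre&gt; 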
&lt;h2&gt;Cost Analysis&lt;/h2&gt; 
&lt;p&gt;Let’s analyze the cost implications of different configurations for processing 20 bcrypt hashes.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Cost Comparison: ARM64 vs x86_64&lt;/strong&gt; (us-east-1, as of January 2026):&lt;/p&gt; 
&lt;table width="100%"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Config&lt;/td&gt; 
   &lt;td&gt;Memory&lt;/td&gt; 
   &lt;td&gt;Workers&lt;/td&gt; 
   &lt;td&gt;ARM64 Duration&lt;/td&gt; 
   &lt;td&gt;ARM64 Cost/1M&lt;/td&gt; 
   &lt;td&gt;x86_64 Duration&lt;/td&gt; 
   &lt;td&gt;x86_64 Cost/1M&lt;/td&gt; 
   &lt;td&gt;Cheaper Arch&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;1 vCPU&lt;/td&gt; 
   &lt;td&gt;1536 MB&lt;/td&gt; 
   &lt;td&gt;1&lt;/td&gt; 
   &lt;td&gt;1,885 ms&lt;/td&gt; 
   &lt;td&gt;$38.60&lt;/td&gt; 
   &lt;td&gt;1,671 ms&lt;/td&gt; 
   &lt;td&gt;$42.78&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;2 vCPU&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;2048 MB&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;1,334 ms&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;$36.46&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;1,253 ms&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;$42.77&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64 *&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;3 vCPU&lt;/td&gt; 
   &lt;td&gt;4096 MB&lt;/td&gt; 
   &lt;td&gt;3&lt;/td&gt; 
   &lt;td&gt;685 ms&lt;/td&gt; 
   &lt;td&gt;$37.47&lt;/td&gt; 
   &lt;td&gt;892 ms&lt;/td&gt; 
   &lt;td&gt;$60.80&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;4 vCPU&lt;/td&gt; 
   &lt;td&gt;6144 MB&lt;/td&gt; 
   &lt;td&gt;4&lt;/td&gt; 
   &lt;td&gt;463 ms&lt;/td&gt; 
   &lt;td&gt;$37.97&lt;/td&gt; 
   &lt;td&gt;429 ms&lt;/td&gt; 
   &lt;td&gt;$44.00&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;5 vCPU&lt;/td&gt; 
   &lt;td&gt;8192 MB&lt;/td&gt; 
   &lt;td&gt;5&lt;/td&gt; 
   &lt;td&gt;338 ms&lt;/td&gt; 
   &lt;td&gt;$36.94&lt;/td&gt; 
   &lt;td&gt;330 ms&lt;/td&gt; 
   &lt;td&gt;$45.10&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;6 vCPU&lt;/td&gt; 
   &lt;td&gt;10240 MB&lt;/td&gt; 
   &lt;td&gt;6&lt;/td&gt; 
   &lt;td&gt;280 ms&lt;/td&gt; 
   &lt;td&gt;$38.27&lt;/td&gt; 
   &lt;td&gt;292 ms&lt;/td&gt; 
   &lt;td&gt;$49.87&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;ARM64&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
 &lt;p&gt;&lt;em&gt;* Cheaper despite a longer duration: ARM64’s per GB-second rate is 25% lower, which more than offsets the 81 ms gap.&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Cost Formulas:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;ARM64: (Memory in GB) × (Duration in seconds) × $0.0000133334&lt;/li&gt; 
 &lt;li&gt;x86_64: (Memory in GB) × (Duration in seconds) × $0.0000166667 (25% higher rate)&lt;/li&gt; 
&lt;/ul&gt; 
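&lt;p&gt;As a quick sanity check, here is a small Python sketch of the duration charge in these formulas. It approximately reproduces the table values when memory is converted at 1 GB = 1000 MB; small differences come from rounding and from the per-request charge, which is omitted here:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-python"&gt;ARM64_RATE = 0.0000133334   # USD per GB-second (us-east-1)
X86_RATE = 0.0000166667     # 25% higher rate

def duration_cost_per_million(memory_mb, duration_ms, rate):
    # (Memory in GB) x (Duration in seconds) x rate, scaled to 1M invocations.
    # Converting MB to GB with a factor of 1000 best matches the table values.
    return (memory_mb / 1000) * (duration_ms / 1000) * rate * 1_000_000

arm = duration_cost_per_million(2048, 1334, ARM64_RATE)  # ~$36.43
x86 = duration_cost_per_million(2048, 1253, X86_RATE)    # ~$42.77&lt;/code&gt;&lt;/pre&gt; 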
&lt;p&gt;&lt;strong&gt;Key Insight&lt;/strong&gt;: The &lt;strong&gt;2 vCPU ARM64 configuration provides the lowest cost&lt;/strong&gt; at $36.46 per million invocations while achieving 1.41x speedup. All ARM64 configurations remain cost-competitive ($36-$39 range) despite significant performance differences, demonstrating how increased throughput can offset higher memory costs.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Choosing the Right Configuration&lt;/strong&gt;:&lt;/p&gt; 
&lt;table width="100%"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;Priority&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;Recommended Config&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt;&lt;strong&gt;Rationale&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Lowest Cost&lt;/td&gt; 
   &lt;td&gt;ARM64, 2048 MB, 2 workers&lt;/td&gt; 
   &lt;td&gt;$36.46/1M invocations, 1.41x speedup&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Balanced&lt;/td&gt; 
   &lt;td&gt;ARM64, 4096 MB, 3 workers&lt;/td&gt; 
   &lt;td&gt;$37.47/1M invocations, 2.75x speedup&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;Low Latency&lt;/td&gt; 
   &lt;td&gt;ARM64, 10240 MB, 6 workers&lt;/td&gt; 
   &lt;td&gt;280ms avg, 6.73x speedup&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h2&gt;When to Use Multi-threaded Rust on Lambda&lt;/h2&gt; 
&lt;h3&gt;Recommended Use Cases&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Batch data processing&lt;/strong&gt;: Transform, validate, or enrich large datasets&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cryptographic operations&lt;/strong&gt;: Hashing, encryption, digital signatures&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Image/video processing&lt;/strong&gt;: Resize, transcode, analyze media files&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Scientific computing&lt;/strong&gt;: Simulations, data analysis, machine learning inference&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;High-volume workloads&lt;/strong&gt;: Functions invoked &amp;gt;100,000 times per day benefit from optimization&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;When to Consider Alternatives&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;I/O-bound operations&lt;/strong&gt;: Use async Rust instead of multi-threading for database queries or API calls&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Simple transformations&lt;/strong&gt;: Functions completing in &amp;lt;100ms rarely benefit from parallelization&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Low-volume workloads&lt;/strong&gt;: Development overhead may not be justified for &amp;lt;10,000 invocations per day&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Rapid prototyping&lt;/strong&gt;: Python or Node.js may be more appropriate when iteration speed is critical&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Cleanup&lt;/h2&gt; 
&lt;p&gt;To delete the resources created in this post:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;# Delete the Lambda function
aws lambda delete-function --function-name rust-multithread-lambda

# Delete the CloudWatch log group
aws logs delete-log-group --log-group-name /aws/lambda/rust-multithread-lambda&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If you deployed multiple configurations for testing, delete each function individually by repeating the delete command with each function name, or delete the CloudFormation stack created by the SAM template for bulk cleanup:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-bash"&gt;aws cloudformation delete-stack --stack-name rust-multithread-benchmark&lt;/code&gt;&lt;/pre&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;When you allocate more memory to your Lambda function, AWS provides proportionally more vCPUs—up to 6 vCPUs at 10,240 MB. However, &lt;strong&gt;sequential code only uses one vCPU&lt;/strong&gt;, leaving the additional compute power idle while you pay for the full allocation. Multi-threaded Rust with Rayon enables you to harness all available vCPUs for CPU-intensive workloads, transforming unused capacity into real performance gains.&lt;/p&gt; 
&lt;p&gt;Our benchmarks demonstrate this clearly:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Near-linear scaling&lt;/strong&gt;: ARM64 achieved &lt;strong&gt;6.73x speedup&lt;/strong&gt; with 6 workers—you get proportional returns on your vCPU investment&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Fast cold starts&lt;/strong&gt;: 19-28 ms initialization across all configurations, eliminating the cold start concerns often associated with compiled languages&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Consistent latency&lt;/strong&gt;: ARM64 at 6 vCPUs shows only 14 ms between P50 (278 ms) and P99 (292 ms), critical for predictable response times&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cost efficiency&lt;/strong&gt;: ARM64 is 15-20% cheaper than x86_64 with better scaling at maximum parallelization&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;The key takeaway&lt;/strong&gt;: If your Lambda function performs CPU-intensive work and you’re allocating more than 1,769 MB of memory, you likely have multiple vCPUs available. Without multi-threading, those vCPUs sit idle. Rayon’s parallel iterators allow you to switch from sequential to parallel processing by changing &lt;code&gt;.iter()&lt;/code&gt; to &lt;code&gt;.par_iter()&lt;/code&gt; in your code.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Recommended starting point&lt;/strong&gt;: ARM64 with 4096 MB (3 workers) offers an excellent balance of cost and performance for most workloads. Scale up to 6 vCPUs for latency-critical applications, or down to 2 vCPUs for maximum cost savings.&lt;/p&gt; 
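&lt;p&gt;A rough rule of thumb for sizing the thread pool from configured memory: Lambda allocates one full vCPU at 1,769 MB and scales CPU proportionally up to 6 vCPUs at 10,240 MB, and the worker counts used in this post’s configurations follow from rounding that ratio up. The helper below is an illustration of that heuristic, not part of the post’s sample code:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-python"&gt;import math

# Estimate usable worker threads from Lambda memory (MB):
# one vCPU per ~1,769 MB, rounded up, capped at the 6 vCPUs
# available at 10,240 MB.
def worker_count(memory_mb):
    return min(6, max(1, math.ceil(memory_mb / 1769)))

# worker_count(1536) == 1, worker_count(4096) == 3, worker_count(10240) == 6&lt;/code&gt;&lt;/pre&gt; 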
&lt;h2&gt;Additional Resources&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://github.com/awslabs/aws-lambda-rust-runtime"&gt;AWS Lambda Rust Runtime&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://www.cargo-lambda.info/"&gt;Cargo Lambda Documentation&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.rs/rayon/latest/rayon/"&gt;Rayon Data Parallelism Library&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html"&gt;AWS Lambda Memory and CPU Configuration&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/lambda/pricing/"&gt;AWS Lambda Pricing&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
 &lt;p&gt;&lt;em&gt;The complete sample code, SAM template, and test scripts from this post are available in this &lt;/em&gt;&lt;a href="https://github.com/aws-samples/sample-rust-multithread-lambda"&gt;&lt;em&gt;GitHub repository&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Amazon SageMaker AI now hosts NVIDIA Evo-2 NIM microservices</title>
		<link>https://aws.amazon.com/blogs/compute/amazon-sagemaker-ai-now-hosting-nvidia-evo-2-nim-microservices/</link>
					
		
		<dc:creator><![CDATA[Malvika Viswanathan]]></dc:creator>
		<pubDate>Tue, 24 Feb 2026 18:48:08 +0000</pubDate>
				<category><![CDATA[Amazon SageMaker AI]]></category>
		<category><![CDATA[Amazon SageMaker JumpStart]]></category>
		<category><![CDATA[Amazon SageMaker Unified Studio]]></category>
		<category><![CDATA[Announcements]]></category>
		<category><![CDATA[AWS Marketplace]]></category>
		<category><![CDATA[AWS Partner Network]]></category>
		<guid isPermaLink="false">ddd72291399cfef3d140f16b7df049b17d7a3ba9</guid>

					<description>This post is co-written with Neel Patel, Abdullahi Olaoye, Kristopher Kersten, Aniket Deshpande from NVIDIA. Today, we’re excited to announce that the NVIDIA Evo-2 NVIDIA NIM microservice are now listed in Amazon SageMaker JumpStart. You can use this launch to deploy accelerated and specialized NIM microservices to build, experiment, and responsibly scale your drug discovery […]</description>
										<content:encoded>&lt;p&gt;&lt;em&gt;This post is co-written with Neel Patel, Abdullahi Olaoye, Kristopher Kersten, Aniket Deshpande from NVIDIA.&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;Today, we’re excited to announce that the NVIDIA Evo-2 &lt;a href="https://www.nvidia.com/en-us/ai-data-science/products/nim-microservices/" target="_blank" rel="noopener noreferrer"&gt;NIM microservice&lt;/a&gt; is now listed in &lt;a href="https://aws.amazon.com/sagemaker/ai/jumpstart/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker JumpStart&lt;/a&gt;. With this launch, you can deploy accelerated and specialized NIM microservices to build, experiment, and responsibly scale your drug discovery workflows on &lt;a href="https://aws.amazon.com/" target="_blank" rel="noopener noreferrer"&gt;Amazon Web Services (AWS)&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;In this post, we demonstrate how to get started with these models using &lt;a href="https://aws.amazon.com/sagemaker/ai/studio/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker Studio&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;NVIDIA NIM microservices on AWS&lt;/h2&gt; 
&lt;p&gt;NVIDIA NIM integrates closely with AWS managed services, such as &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt;, &lt;a href="https://aws.amazon.com/eks/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Kubernetes Service (Amazon EKS)&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/sagemaker/ai/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker AI&lt;/a&gt;, to support deployment of generative AI models at scale. As part of &lt;a href="https://www.nvidia.com/en-us/data-center/products/ai-enterprise/" target="_blank" rel="noopener noreferrer"&gt;NVIDIA AI Enterprise&lt;/a&gt;, which is available in the &lt;a href="https://aws.amazon.com/marketplace/pp/prodview-ozgjkov6vq3l6?applicationId=AWSMPContessa&amp;amp;ref_=beagle&amp;amp;sr=0-2" target="_blank" rel="noopener noreferrer"&gt;AWS Marketplace&lt;/a&gt;, NVIDIA NIM is a set of microservices designed to accelerate the deployment of generative AI. These prebuilt containers support a broad spectrum of generative AI models, from open source community models, to &lt;a href="https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/" target="_blank" rel="noopener noreferrer"&gt;NVIDIA Nemotron&lt;/a&gt; and custom models. NIM microservices are deployed with just a few lines of code, or with a few actions in the SageMaker Studio console. Engineered to facilitate seamless generative AI inferencing at scale, NIM ensures that generative AI applications can be deployed on various AWS services.&lt;/p&gt; 
&lt;h2&gt;NVIDIA BioNeMo Evo 2 overview&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://www.nvidia.com/en-us/clara/biopharma/" target="_blank" rel="noopener noreferrer"&gt;NVIDIA BioNeMo&lt;/a&gt; is a platform of NIM microservices, developer tools, and AI models that accelerate building, adapting, and deploying biomolecular AI models for drug discovery. It packages curated training recipes, data loaders, and domain-optimized pretrained models for DNA, RNA, and proteins, alongside &lt;a href="https://developer.nvidia.com/gpu-accelerated-libraries" target="_blank" rel="noopener noreferrer"&gt;NVIDIA CUDA-X libraries&lt;/a&gt; such as &lt;a href="https://developer.nvidia.com/cuequivariance" target="_blank" rel="noopener noreferrer"&gt;NVIDIA cuEquivariance&lt;/a&gt;. These components power tasks such as 3D structure prediction, de novo design, virtual screening, docking, and property prediction with GPU-accelerated performance.&lt;/p&gt; 
&lt;p&gt;NVIDIA NIM microservices provide optimized, API-first inference that integrates directly into enterprise pipelines across on-premises and the cloud, providing scalable and secure deployment with faster time-to-market and lower Total Cost of Ownership (TCO). The Evo 2 NIM delivers a 40-billion parameter foundation model (FM) trained on a vast dataset of genomes that can be used to predict protein function, identify mutations, and accelerate bioengineering research. Furthermore, the Evo 2 NIM can be chained with other NIM microservices such as ESMFold to create end-to-end, containerized workflows that cut time-to-insight while streamlining deployment through consistent APIs.&lt;/p&gt; 
&lt;h2&gt;SageMaker Studio overview&lt;/h2&gt; 
&lt;p&gt;SageMaker Studio is a web-based integrated development environment (IDE) for machine learning (ML) that provides a unified visual interface for all of the tools that you need to complete each step of the ML development lifecycle. SageMaker Studio provides complete access, control, and visibility into each step of the ML workflow, from data preparation to model building, training, and deployment. &lt;/p&gt; 
&lt;p&gt;The key features of SageMaker Studio include:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Unified interface&lt;/strong&gt;: Access all SageMaker capabilities through a single, web-based visual interface&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Jupyter notebooks&lt;/strong&gt;: Fully managed Jupyter notebooks with pre-configured kernels for popular ML frameworks&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Model management&lt;/strong&gt;: Browse, deploy, and manage models from AWS Marketplace and other sources through an intuitive interface&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;: Share notebooks, experiments, and models with your team members&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Built-in security&lt;/strong&gt;: Integrated with &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; for secure access control&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cost management&lt;/strong&gt;: Monitor and control costs with built-in usage tracking and resource management tools&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Amazon SageMaker JumpStart overview&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/sagemaker/ai/jumpstart/"&gt;SageMaker JumpStart&lt;/a&gt; is a fully managed service that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is model hubs, which offer a vast catalog of pre-trained models, such as Mistral, for a variety of tasks. You can now discover and deploy Evo 2 NIM in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, so you can derive model performance and MLOps controls with &lt;a href="https://aws.amazon.com/sagemaker/ai/"&gt;Amazon SageMaker AI&lt;/a&gt; features such as &lt;a href="https://aws.amazon.com/sagemaker/ai/pipelines/"&gt;Amazon SageMaker Pipelines&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html"&gt;Amazon SageMaker Debugger&lt;/a&gt;, or container logs. The model is deployed in a secure AWS environment and in your VPC, helping to support data security for enterprise security needs.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;Before getting started with deployment, make sure that your IAM service role for SageMaker AI has the SageMakerFullAccess permission policy attached.&lt;/p&gt; 
&lt;p&gt;To deploy the NVIDIA NIM microservices successfully, make sure that your IAM role has the following permissions, and that you have the authority to make AWS Marketplace subscriptions in the AWS account used:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;aws-marketplace:ViewSubscriptions&lt;/li&gt; 
 &lt;li&gt;aws-marketplace:Unsubscribe&lt;/li&gt; 
 &lt;li&gt;aws-marketplace:Subscribe&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;If your account is already subscribed to the model, you can skip ahead to the Deploy section. Otherwise, start by subscribing to the model package.&lt;/p&gt; 
&lt;h2&gt;Subscribe to the model package&lt;/h2&gt; 
&lt;p&gt;To subscribe to the model package, complete the following steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the SageMaker JumpStart portal from the SageMaker AI page.&lt;/li&gt; 
 &lt;li&gt;Search for &lt;strong&gt;Evo 2 NIM&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;View model&lt;/strong&gt;, and on the &lt;strong&gt;Model details&lt;/strong&gt; page choose &lt;strong&gt;Subscribe&lt;/strong&gt;. This takes you to the AWS Marketplace listing for the Evo 2 NIM.&lt;/li&gt; 
 &lt;li&gt;On the AWS Marketplace listing page, choose &lt;strong&gt;View purchase options&lt;/strong&gt;, review the purchase terms, and choose &lt;strong&gt;Subscribe&lt;/strong&gt; if you and your organization agree with the EULA, pricing, and support terms.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Continue&lt;/strong&gt; to proceed with the configuration and choose an &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/" target="_blank" rel="noopener noreferrer"&gt;AWS Region&lt;/a&gt; where you have the service quota for the desired instance type.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;A product &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Resource Name (ARN)&lt;/a&gt; is displayed. This is the model package ARN that you need to specify while creating a deployable model using the SageMaker SDK.&lt;/p&gt; 
&lt;h2&gt;Option 1: Deploy the Evo 2 NIM using SageMaker Studio&lt;/h2&gt; 
&lt;p&gt;The following section outlines how to deploy the Evo 2 NIM using SageMaker Studio.&lt;/p&gt; 
&lt;h3&gt;Getting started with SageMaker Studio&lt;/h3&gt; 
&lt;p&gt;Begin by accessing the &lt;a href="https://aws.amazon.com/console/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt; and navigating to the SageMaker AI service. When you’re in the SageMaker AI console, locate &lt;strong&gt;Studio&lt;/strong&gt; in the left navigation panel and choose &lt;strong&gt;Open Studio&lt;/strong&gt; next to your user profile. If you haven’t set up a SageMaker Studio domain yet, then you must create a new domain and user profile first. This launches the web-based SageMaker Studio interface where you can manage all aspects of your ML workflow.&lt;/p&gt; 
&lt;h3&gt;Navigating to model packages&lt;/h3&gt; 
&lt;p&gt;Within SageMaker Studio, look for &lt;strong&gt;Models&lt;/strong&gt; in the left sidebar and choose the &lt;strong&gt;JumpStart base models&lt;/strong&gt; tab within the &lt;strong&gt;Models&lt;/strong&gt; interface. This section contains all available model packages in &lt;strong&gt;SageMaker JumpStart&lt;/strong&gt;, including those from the AWS Marketplace.&lt;/p&gt; 
&lt;h3&gt;Locating the Evo-2 NIM model&lt;/h3&gt; 
&lt;p&gt;Use the search functionality to find the NVIDIA Evo-2 NIM model by searching for terms such as “Evo-2” or “NVIDIA”. When you locate the model package in the filtered results, choose it to view the &lt;strong&gt;Model overview&lt;/strong&gt; page. This page provides an overview of the model and may include a &lt;strong&gt;Notebooks&lt;/strong&gt; tab with a sample notebook demonstrating how to use the NIM. You can choose &lt;strong&gt;Open in JupyterLab&lt;/strong&gt; to open the notebook in JupyterLab and use it as a starting point for using the NIM.&lt;/p&gt; 
&lt;h3&gt;Configuring the model deployment&lt;/h3&gt; 
&lt;p&gt;On the model package overview page, choose the &lt;strong&gt;Deploy&lt;/strong&gt; button on the top right to begin the deployment process. You must configure several important settings: provide a unique endpoint name (such as “Evo-2-nim-endpoint”), choose an appropriate instance type (ml.g6e.12xlarge is recommended for optimal performance), set the initial instance count (typically 1 for initial testing), and specify an endpoint configuration name. Review all of these settings carefully before proceeding.&lt;/p&gt; 
&lt;h3&gt;Initiating and monitoring the deployment&lt;/h3&gt; 
&lt;p&gt;After verifying your configuration settings, choose &lt;strong&gt;Deploy&lt;/strong&gt; to start the deployment process for creating a &lt;strong&gt;Real-time inference endpoint&lt;/strong&gt;. Navigate to the &lt;strong&gt;Deployments&lt;/strong&gt; section and then the &lt;strong&gt;Endpoints&lt;/strong&gt; section in the left sidebar to monitor the deployment progress. The endpoint status initially shows &lt;strong&gt;Creating&lt;/strong&gt; and typically takes 5–10 minutes to complete. You can track the progress and should see the status change to &lt;strong&gt;InService&lt;/strong&gt; once the deployment is successful.&lt;/p&gt; 
&lt;h3&gt;Testing and validation&lt;/h3&gt; 
&lt;p&gt;When your endpoint is deployed and shows the &lt;strong&gt;InService&lt;/strong&gt; status, you can optionally test it directly through the SageMaker Studio interface. Choose your deployed endpoint from the endpoints list to access the &lt;strong&gt;Endpoint summary&lt;/strong&gt; page. Scroll down and select the &lt;strong&gt;Playground&lt;/strong&gt; tab. If available, you will see two options: &lt;strong&gt;Test the sample request&lt;/strong&gt; and &lt;strong&gt;Use Python SDK example code&lt;/strong&gt;. You can use either option to validate the deployment with a sample protein sequence. This confirms that the endpoint is working correctly before you integrate it into your applications.&lt;/p&gt; 
&lt;h2&gt;Option 2: Deploy Evo 2 using the SageMaker SDK&lt;/h2&gt; 
&lt;p&gt;In this section we walk through deploying the Evo-2 NIM through the SageMaker SDK. Make sure that you have an account-level service quota of one or more ml.g6e.12xlarge instances for endpoint usage. Furthermore, NVIDIA provides a list of instance types that support deployment; refer to the AWS Marketplace listing for the model to see the supported instance types. To request a service quota increase, go to &lt;a href="https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html" target="_blank" rel="noopener noreferrer"&gt;AWS service quotas&lt;/a&gt;.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import sagemaker
import boto3
from sagemaker import ModelPackage, get_execution_role
import json
# Initialize SageMaker session and role
role = get_execution_role()
sagemaker_session = sagemaker.Session()
# Model Package ARN from your AWS Marketplace subscription
# Replace this with your actual Model Package ARN after subscription
model_package_arn = "arn:aws:sagemaker:&amp;lt;region&amp;gt;:&amp;lt;account-id&amp;gt;:model-package/Evo-2-nim-model"
# Create model from AWS Marketplace Model Package
model = ModelPackage(
    role=role, 
    model_package_arn=model_package_arn,
    sagemaker_session=sagemaker_session
)
# Deploy the model to an endpoint
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g6e.12xlarge",  # Using recommended NVIDIA GPU instance
    endpoint_name="Evo-2-endpoint",
    wait=True
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Run Inference with Evo 2 SageMaker endpoint&lt;/h3&gt; 
&lt;p&gt;When the model is deployed, you can send a sample inference request. NIM on SageMaker supports the OpenAI API inference request format. For an explanation of the supported parameters, go to the &lt;a href="https://docs.api.nvidia.com/nim/reference/colabfold-msa-search-infer" target="_blank" rel="noopener noreferrer"&gt;Evo-2 API documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Real-time inference example&lt;/h3&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;sm_runtime = boto3.client("sagemaker-runtime", region_name=region)

generate_payload = {

 "sequence": "ACGTACGTACGT",

 "num_tokens": 100,

 "temperature": 0.7,

 "top_k": 3,

}

response = sm_runtime.invoke_endpoint(

EndpointName='Evo2-40b-2-1-0',

ContentType="application/json",

Body=json.dumps(generate_payload),

)

result = json.loads(response["Body"].read())

print("Generated DNA:", result["sequence"])
print("Elapsed (ms):", result.get("elapsed_ms"))
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;Example output:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Generated DNA: ACGTACATATGTTCGTACATTCGCACAGACGCCATTTTGAAAAATGCTTTAAATGGATTCAGAATTGGTCAAAATGCATAAATCCATCAAAATTTTTTTC&lt;br&gt; Elapsed (ms): 10770&lt;/p&gt; 
&lt;h2&gt;Cleaning up&lt;/h2&gt; 
&lt;p&gt;To avoid unwanted charges, complete the steps in this section to clean up your resources.&lt;/p&gt; 
&lt;h3&gt;Deleting the endpoint from SageMaker Studio&lt;/h3&gt; 
&lt;p&gt;In SageMaker Studio, navigate to the &lt;strong&gt;Endpoints&lt;/strong&gt; section in the left sidebar under &lt;strong&gt;Inference&lt;/strong&gt; to view all your active endpoints. Locate your Evo-2 NIM endpoint in the list and select it to open the endpoint details page. On this page, there is a &lt;strong&gt;Delete&lt;/strong&gt; button. Choose &lt;strong&gt;Delete&lt;/strong&gt; and confirm the deletion when prompted. The endpoint status changes to &lt;strong&gt;Deleting&lt;/strong&gt; and disappears from your endpoints list when the deletion is complete. This process typically takes a few minutes, and when it’s deleted the endpoint stops incurring charges immediately.&lt;/p&gt; 
&lt;h3&gt;Delete the SageMaker endpoint&lt;/h3&gt; 
&lt;p&gt;The SageMaker endpoint that you deployed incurs costs if you leave it running. Use the following code to delete the endpoint if you want to stop incurring charges. For more details, go to &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-delete-resources.html" target="_blank" rel="noopener noreferrer"&gt;Delete endpoints and resources&lt;/a&gt;.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;# Delete endpoint when done (important for cost management)
predictor.delete_endpoint()&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;The availability of NVIDIA Evo-2 NIM microservices on Amazon SageMaker JumpStart represents a significant advancement for researchers and organizations working in drug discovery. This solution provides GPU-accelerated multiple sequence alignments and dramatically speeds up structure prediction pipelines that are critical for protein design and antibody research. Users can choose between the flexible deployment options, through SageMaker Studio or the SageMaker SDK, to find the approach that best fits their workflow and technical expertise. The optimized performance of these NIM microservices, combined with the scalability and security of SageMaker, enables faster time-to-insight while streamlining the deployment of complex biomolecular AI models. We encourage you to try the Evo-2 NIM today and look out for future releases of the MSA-search and Boltz-2 NIMs to accelerate your drug discovery workflows and use the power of NVIDIA’s specialized microservices on AWS infrastructure.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Building fault-tolerant applications with AWS Lambda durable functions</title>
		<link>https://aws.amazon.com/blogs/compute/building-fault-tolerant-long-running-application-with-aws-lambda-durable-functions/</link>
					
		
		<dc:creator><![CDATA[Rahul Pisal]]></dc:creator>
		<pubDate>Fri, 06 Feb 2026 16:54:39 +0000</pubDate>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<guid isPermaLink="false">7a6c1a48050b7b4f20e0d12430c82dc3fe579fc1</guid>

					<description>Business applications often coordinate multiple steps that need to run reliably or wait for extended periods, such as customer onboarding, payment processing, or orchestrating large language model inference. These critical processes require completion despite temporary disruptions or system failures. Developers currently spend significant time implementing mechanisms to track progress, handle failures, and manage resources when […]</description>
										<content:encoded>&lt;p&gt;Business applications often coordinate multiple steps that need to run reliably or wait for extended periods, such as customer onboarding, payment processing, or orchestrating large language model inference. These critical processes require completion despite temporary disruptions or system failures. Developers currently spend significant time implementing mechanisms to track progress, handle failures, and manage resources when waiting for external events, shifting focus from business logic to undifferentiated tasks.&lt;/p&gt; 
&lt;p&gt;At re:Invent 2025,&amp;nbsp;&lt;a href="https://aws.amazon.com/" target="_blank" rel="noopener noreferrer"&gt;Amazon Web Services (AWS)&lt;/a&gt;&amp;nbsp;launched&amp;nbsp;&lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;&amp;nbsp;durable functions, a new capability extending Lambda’s event-driven programming model with built-in capabilities to build fault-tolerant multi-step applications and AI workflows using familiar programming languages. At its core, durable functions are regular Lambda functions, so your development and operational processes for Lambda continue to apply. However, when you create a Lambda function you can now enable durable execution, so that you can checkpoint progress, automatically recover from failures, and suspend execution for up to one year when waiting on long-running tasks, such as human-in-the-loop processes.&lt;/p&gt; 
&lt;h2&gt;How Lambda durable functions work&lt;/h2&gt; 
&lt;p&gt;When working with standard Lambda functions, your code runs from start to finish in a single invocation. If a failure occurs at any point during the execution, the entire function must be retried by the invoking event source. Any state that needs to be preserved between executions must be explicitly saved and retrieved. This is typically done by using external storage services such as&amp;nbsp;&lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt;&amp;nbsp;or&amp;nbsp;&lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service (Amazon S3)&lt;/a&gt;. Furthermore, you must typically guard against duplicate (concurrent) invocations of the same event and have a strategy to safely deploy updates while continuing to process events.&lt;/p&gt; 
&lt;p&gt;In contrast, with Lambda durable functions, developers use durable operations such as “Steps” and “Waits” in the event handler to checkpoint progress, handle failures, and suspend execution during wait periods without incurring compute charges for on-demand functions. These durable operations and any optional state returned from them are automatically persisted by Lambda in a fully managed durable execution backend. If failures occur during the execution, or if your function resumes after being paused, Lambda invokes your function again, restoring (replaying) the previous state by executing the event handler from the start but skipping over completed durable operations. To streamline this checkpoint/replay mechanism, you can use the Lambda durable execution SDK to wrap or annotate your event handler, which enhances the existing Lambda context with several new methods such as &lt;code&gt;context.step()&lt;/code&gt; and &lt;code&gt;context.wait()&lt;/code&gt;. Furthermore, you can use methods such as &lt;code&gt;context.waitForCallback()&lt;/code&gt; to wait on external jobs or asynchronous processes, such as “human-in-the-loop” scenarios. The execution is paused until a &lt;code&gt;SendDurableExecutionCallbackSuccess&lt;/code&gt; or &lt;code&gt;SendDurableExecutionCallbackFailure&lt;/code&gt; response is sent to the Lambda API.&lt;/p&gt; 
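&lt;p&gt;The checkpoint/replay mechanism can be pictured with a short, self-contained sketch. This is plain JavaScript, not the durable execution SDK: the in-memory &lt;code&gt;checkpoints&lt;/code&gt; map and the standalone &lt;code&gt;step()&lt;/code&gt; function are illustrative stand-ins for the managed durable execution backend and &lt;code&gt;context.step()&lt;/code&gt;.&lt;/p&gt;

```javascript
// Illustrative sketch only: an in-memory stand-in for Lambda's managed
// durable execution backend. The real SDK persists checkpoints for you.
const checkpoints = new Map();

// A step runs its work once, checkpoints the result, and on replay
// returns the checkpointed result instead of re-executing.
async function step(name, fn) {
  if (checkpoints.has(name)) return checkpoints.get(name); // skip on replay
  const result = await fn();
  checkpoints.set(name, result);
  return result;
}

let sideEffects = 0; // counts how often the step body actually runs

async function handler(event) {
  const profile = await step('create-profile', async () => {
    sideEffects += 1; // e.g. a write to a downstream system
    return { email: event.email };
  });
  return step('complete-onboarding', async () => ({ ...profile, status: 'active' }));
}

// The first invocation executes both steps; the simulated replay re-runs
// the handler from the top but serves both steps from their checkpoints.
const demo = handler({ email: 'user@example.com' })
  .then(() => handler({ email: 'user@example.com' })) // simulated replay
  .then((result) => console.log(sideEffects, result.status)); // prints: 1 active
```

&lt;p&gt;The side effect inside the first step runs exactly once, even though the handler body executes twice.&lt;/p&gt;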
&lt;h2&gt;Getting started&lt;/h2&gt; 
&lt;p&gt;Use the&amp;nbsp;&lt;a href="https://aws.amazon.com/serverless/sam/" target="_blank" rel="noopener noreferrer"&gt;AWS Serverless Application Model (AWS SAM)&lt;/a&gt;&amp;nbsp;to create a new durable function with&amp;nbsp;&lt;code&gt;sam init&lt;/code&gt;&amp;nbsp;with an AWS Quick Start Template. Lambda durable functions are also supported by the&amp;nbsp;&lt;a href="https://aws.amazon.com/cdk/" target="_blank" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (AWS CDK)&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI),&lt;/a&gt;&amp;nbsp;&lt;a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt;&amp;nbsp;and other infrastructure as code (IaC) frameworks such as Terraform.&lt;/p&gt; 
&lt;p&gt;Consider the following function, which performs user onboarding. First, it creates a user profile based on some data, then it sends out an email for verification and waits until the user either confirms the email address, or a 24-hour timeout is reached. Finally, it sends out a confirmation.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;import {
  DurableContext,
  withDurableExecution,
} from '@aws/durable-execution-sdk-js';
export const handler = withDurableExecution(
  async (event: OnboardingEvent, context: DurableContext) =&amp;gt; {
    try {    
      // Create user profile
      const profile = await context.step("create-profile", async () =&amp;gt;
        createUserProfile(event.email, event.name)
      );
      // Wait for email verification via callback
      const verification = await context.waitForCallback(
        "wait-for-email-verification",
        async (callbackId) =&amp;gt; {
          // Send email to user and pass callbackId
          await sendVerificationEmail(profile, callbackId);
        },
        {
          timeout: { hours: 24 } 
        }
      );
      // Send confirmation and welcome email
      const result = await context.step("complete-onboarding", async () =&amp;gt; {
        if (!verification || !verification.verified) {
          return { ...profile, status: 'failed' };
        }
        await sendWelcomeEmail(profile.email, profile.name);
        return { ...profile, status: 'active' };
      });
      return result;
    } catch (error) {
      // omitted 
    }
  }
);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Durable functions have built-in and fully customizable error handling for steps. For example, if the profile was successfully created and verified, but a temporary error occurred when sending out the confirmation, then the step is retried. The retry skips over any previously completed checkpoints, such as the profile creation and callback. Only the code within the send confirmation step is run again.&lt;/p&gt; 
&lt;p&gt;Next, you update the AWS SAM template to include your durable function. You create a Lambda durable function by including the &lt;code&gt;DurableConfig&lt;/code&gt; setting for your function. Note that you currently cannot add a durable configuration to a function that was originally created without it. The &lt;code&gt;ExecutionTimeout&lt;/code&gt; defines when the durable execution as a whole times out, protecting against runaway or deadlocked application bugs. This setting is separate from the invocation timeout, which defines how long a single invocation can run. The maximum timeout for a single function invocation remains unchanged at 15 minutes. With Lambda durable functions, you will typically see multiple invocations per durable execution, such as when using the wait capabilities in the SDK or automatic retries. You can set the &lt;code&gt;ExecutionTimeout&lt;/code&gt; to up to one year when using asynchronous invocations.&lt;/p&gt; 
&lt;p&gt;The&amp;nbsp;RetentionPeriodInDays&amp;nbsp;defines how long the execution data of a durable execution is available to you after executions complete.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-yaml"&gt;AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
 
Resources:
  UserOnboardingFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: UserOnboardingFunction
      CodeUri: ./src
      Handler: index.handler
      Runtime: nodejs24.x
      Architectures:
        - x86_64
      MemorySize: 256
      Timeout: 60               # Timeout for an individual invocation
      DurableConfig:            # This makes the function a durable function
        ExecutionTimeout: 90000 # 25-hour timeout for the durable execution overall
        RetentionPeriodInDays: 7
  UserOnboardingFunctionRole:
    Type: AWS::IAM::Role
    # omitted for brevity&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;You must include the necessary permissions for your function. For example, to increase security, the &lt;code&gt;AWSLambdaBasicDurableExecutionRole&lt;/code&gt; managed policy allows only the minimal &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; actions needed to create and retrieve checkpoints and write logs. It therefore does not include permissions to invoke other (durable) functions or manage callbacks. Refer to the &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for more details.&lt;/p&gt; 
&lt;h2&gt;Testing locally&lt;/h2&gt; 
&lt;p&gt;Before deploying your function, you can test it locally using AWS SAM local invoke.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/05/compute-2471-image-1.png"&gt;&lt;/p&gt; 
&lt;p&gt;AWS SAM locally invokes your function and runs the event handler until it reaches the&amp;nbsp;&lt;code&gt;context.waitForCallback()&lt;/code&gt;. To complete callbacks, AWS SAM offers new commands to interact with your durable functions. In this example, you send a&amp;nbsp;&lt;code&gt;Success&lt;/code&gt;&amp;nbsp;response to complete the callback. You can also include relevant data in the response. You can send the response directly using the on-screen guide or using another AWS SAM CLI command from another process.&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;sam local callback succeed &amp;lt;your-callback-id&amp;gt; --result '&amp;lt;your data&amp;gt;'&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/05/compute-2471-image-2.png"&gt;&lt;/p&gt; 
&lt;p&gt;To inspect an execution, you can use AWS SAM to retrieve the durable execution history of your function, which includes details about steps, callbacks, and wait durations, as shown in the following example code.&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;sam local execution history &amp;lt;execution-arn&amp;gt;&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/02/05/compute-2471-image-3.png"&gt;&lt;/p&gt; 
&lt;p&gt;Depending on your use case, you can instead send a Failure response to a callback and handle those errors in your code. For example, by performing compensation logic in a subsequent step:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;sam local callback fail &amp;lt;your-callback-id&amp;gt; --error-data '&amp;lt;your data&amp;gt;'&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;Now that you have verified that your function works as intended, deploy it to AWS using the &lt;code&gt;sam deploy&lt;/code&gt; command.&lt;/p&gt; 
&lt;h2&gt;Best practices and considerations&lt;/h2&gt; 
&lt;p&gt;Invoking a Lambda durable function requires a qualified&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Resource Name (ARN),&lt;/a&gt;&amp;nbsp;such as an alias or version. We recommend that you don’t use the&amp;nbsp;&lt;code&gt;$LATEST&lt;/code&gt;&amp;nbsp;qualifier except for rapid prototyping or local testing. Using explicit versions ensures that replays always run the same code with which the execution was started, which keeps execution deterministic and prevents inconsistencies when you update your function code while executions are in flight.&lt;/p&gt; 
&lt;p&gt;We recommend bundling the durable execution SDK with your function code using your preferred package manager. The SDKs are fast-moving, so you can update dependencies as new features become available.&lt;/p&gt; 
&lt;p&gt;There are&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-execution-sdk.html#durable-sdk-operations" target="_blank" rel="noopener noreferrer"&gt;other durable operations&lt;/a&gt;&amp;nbsp;in the Lambda durable functions SDK that you can use to build your application:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;waitForCondition()&lt;/code&gt;: Pauses the execution of your function until a condition is met, such as the status of a job polled through an API. You provide a waitStrategy and a check function to poll the status.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;parallel()&lt;/code&gt;: Runs multiple durable operations in parallel within the same function, with configurable options such as the maximum number of concurrent branches and desired failure behavior. This streamlines managing durability and checkpointing for simultaneous asynchronous actions.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;map()&lt;/code&gt;: Creates a durable operation and checkpoint for each item of an array, based on the provided mapping function. The items are processed concurrently.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;invoke()&lt;/code&gt;: Invokes another Lambda function and waits for its result. The SDK creates a checkpoint, invokes the target function, and resumes your function when the invocation completes. This enables function composition and workflow decomposition.&lt;/li&gt; 
&lt;/ul&gt; 
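&lt;p&gt;As a rough illustration of the per-item checkpointing idea behind &lt;code&gt;map()&lt;/code&gt;, consider the following self-contained sketch. This is plain JavaScript rather than the SDK, and the function and store names are hypothetical: each item gets its own checkpoint, so a replay only reprocesses items that had not completed.&lt;/p&gt;

```javascript
// Illustrative sketch, not the SDK: map()-style processing where every item
// gets its own checkpoint, keyed by step name and item index.
const itemCheckpoints = new Map();
let processedCount = 0; // how many items were actually computed (not replayed)

async function durableMap(stepName, items, fn) {
  // Items are processed concurrently, one checkpoint per item.
  return Promise.all(items.map(async (item, i) => {
    const key = `${stepName}#${i}`;
    if (itemCheckpoints.has(key)) return itemCheckpoints.get(key); // replay skip
    const result = await fn(item);
    itemCheckpoints.set(key, result);
    return result;
  }));
}

const score = async (n) => {
  processedCount += 1;
  return n * 10;
};

// The first run computes all items; the simulated replay serves them all
// from their checkpoints, so processedCount stays at 3.
const run = durableMap('score-items', [1, 2, 3], score)
  .then(() => durableMap('score-items', [1, 2, 3], score)) // simulated replay
  .then((results) => console.log(results, processedCount));
```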
&lt;p&gt;Refer to the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-execution-sdk.html" target="_blank" rel="noopener noreferrer"&gt;developer guide&lt;/a&gt;&amp;nbsp;for more details.&lt;/p&gt; 
&lt;p&gt;Lambda compute charges apply to all invocations, including any replays. When using wait operations, the function suspends execution and, for on-demand functions, doesn’t incur duration charges until execution resumes. You’re also charged for durable operations, data written, and data retention. To learn more about Lambda durable functions pricing, refer to the&amp;nbsp;&lt;a href="https://aws.amazon.com/lambda/pricing/?trk=c4ea046f-18ad-4d23-a1ac-cdd1267f942c&amp;amp;sc_channel=el" target="_blank" rel="noopener noreferrer"&gt;Lambda pricing&lt;/a&gt;&amp;nbsp;page.&lt;/p&gt; 
&lt;p&gt;For the latest Region availability, visit the&amp;nbsp;&lt;a href="https://builder.aws.com/build/capabilities" target="_blank" rel="noopener noreferrer"&gt;AWS Capabilities by Region page&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;AWS Lambda durable functions extend the Lambda programming model to streamline building fault-tolerant, long-running applications using familiar programming patterns. You can use Lambda durable functions to write multi-step workflows in your preferred programming language, using built-in methods that automatically handle progress checkpointing and error recovery. This streamlines your architectures so that you can focus on your business logic, and optimizes cost because you are charged only for active compute time.&lt;/p&gt; 
&lt;p&gt;You can build durable functions for Python or Node.js based Lambda functions using the Lambda API,&amp;nbsp;&lt;a href="https://aws.amazon.com/console/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt;, AWS CLI, AWS CloudFormation, AWS SAM, AWS SDK, and AWS CDK.&lt;/p&gt; 
&lt;p&gt;To get started, visit the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;Lambda Developer Guide&lt;/a&gt;&amp;nbsp;or watch the&amp;nbsp;&lt;a href="https://www.youtube.com/watch?v=XJ80NBOwsow" target="_blank" rel="noopener noreferrer"&gt;re:Invent breakout session&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Serverless ICYMI Q4 2025</title>
		<link>https://aws.amazon.com/blogs/compute/serverless-icymi-q4-2025/</link>
					
		
		<dc:creator><![CDATA[Julian Wood]]></dc:creator>
		<pubDate>Fri, 30 Jan 2026 15:23:57 +0000</pubDate>
				<category><![CDATA[Amazon API Gateway]]></category>
		<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon DynamoDB]]></category>
		<category><![CDATA[Amazon EC2 Container Registry]]></category>
		<category><![CDATA[Amazon Elastic Container Service]]></category>
		<category><![CDATA[Amazon EventBridge]]></category>
		<category><![CDATA[Amazon Simple Queue Service (SQS)]]></category>
		<category><![CDATA[Amazon Simple Storage Service (S3)]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[AWS Serverless Application Model]]></category>
		<category><![CDATA[AWS Step Functions]]></category>
		<category><![CDATA[Serverless]]></category>
		<category><![CDATA[Strands Agents]]></category>
		<category><![CDATA[serverless]]></category>
		<category><![CDATA[Serverless ICYMI]]></category>
		<guid isPermaLink="false">c010d77d402d1cc5648d23c95ebb47993b11000f</guid>

					<description>Stay current with the latest serverless innovations that can transform your applications. In this 31st quarterly recap, discover the most impactful AWS serverless launches, features, and resources from Q4 2025 that you might have missed.</description>
										<content:encoded>&lt;p&gt;Stay current with the latest serverless innovations that can transform your applications. In this 31st quarterly recap, discover the most impactful AWS serverless launches, features, and resources from Q4 2025 that you might have missed.&lt;/p&gt; 
&lt;p&gt;In case you missed our last ICYMI, check out what happened in &lt;a href="https://aws.amazon.com/blogs/compute/serverless-icymi-q3-2025/" target="_blank" rel="noopener noreferrer"&gt;Q3 2025&lt;/a&gt;.&lt;/p&gt; 
&lt;div id="attachment_25659" style="width: 596px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/2025-Q4-calendar.png"&gt;&lt;img aria-describedby="caption-attachment-25659" loading="lazy" class="size-full wp-image-25659" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/2025-Q4-calendar.png" alt="2025 Q4 calendar" width="586" height="148"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25659" class="wp-caption-text"&gt;2025 Q4 calendar&lt;/p&gt;
&lt;/div&gt; 
&lt;h2&gt;Serverless at re:Invent 2025&lt;/h2&gt; 
&lt;p&gt;This post covers the biggest serverless announcements from re:Invent 2025, highlighting key feature updates that can improve your applications, and shares valuable resources to keep you informed.&lt;/p&gt; 
&lt;p&gt;AWS re:Invent 2025 had more than 60,000 in-person attendees and more than 2 million online viewers for the keynotes. The event featured 3,500 sessions from 3,000 speakers, which included information on 530 AWS service and feature announcements.&lt;/p&gt; 
&lt;div id="attachment_25665" style="width: 942px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Keynote-Igniting-the-serverless-movement.png"&gt;&lt;img aria-describedby="caption-attachment-25665" loading="lazy" class="size-full wp-image-25665" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Keynote-Igniting-the-serverless-movement.png" alt="Keynote Igniting the serverless movement" width="932" height="555"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25665" class="wp-caption-text"&gt;Keynote Igniting the serverless movement&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;The serverless content consisted of two tracks: Containers and Serverless (CNS) and Application Integration (API). These tracks included 150 unique sessions watched in-person by more than 16,000 attendees. There were developer-focused experiences including a &lt;a href="https://builder.aws.com/content/3515K374s531rNhcd2gu3HIV5BX/the-road-to-reinvent-hackathon-what-it-is-and-how-to-watch" target="_blank" rel="noopener noreferrer"&gt;Road to re:Invent Hackathon&lt;/a&gt;, AWS Builder Loft, and Builders Arena. &lt;a href="https://catalog.workshops.aws/serverlesspresso/en-US" target="_blank" rel="noopener noreferrer"&gt;Serverlesspresso&lt;/a&gt;, the coffee shop powered by serverless technology, operated in two locations during the event: the Expo Hall and the certification lounge.&lt;/p&gt; 
&lt;div id="attachment_25667" style="width: 1034px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Serverless-and-developer-community-photo.jpeg"&gt;&lt;img aria-describedby="caption-attachment-25667" loading="lazy" class="size-large wp-image-25667" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Serverless-and-developer-community-photo-1024x683.jpeg" alt="Serverless and developer community photo" width="1024" height="683"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25667" class="wp-caption-text"&gt;Serverless and developer community photo&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Find a curated list of serverless videos on &lt;a href="https://www.youtube.com/playlist?list=PLJo-rJlep0ECbKWbv1Ie-MdKFfSmqjmma" target="_blank" rel="noopener noreferrer"&gt;Serverless Land YouTube&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;AWS Lambda durable functions&lt;/h2&gt; 
&lt;p&gt;Managing state across multi-step serverless workflows has traditionally required complex external orchestration tools. &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html" target="_blank" rel="noopener noreferrer"&gt;durable functions&lt;/a&gt; expand how developers can use Lambda. You can now build reliable multi-step applications and AI workflows directly within Lambda.&lt;/p&gt; 
&lt;div id="attachment_25662" style="width: 1034px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-durable-functions-code.png"&gt;&lt;img aria-describedby="caption-attachment-25662" loading="lazy" class="size-large wp-image-25662" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-durable-functions-code-1024x685.png" alt="AWS Lambda durable functions code" width="1024" height="685"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25662" class="wp-caption-text"&gt;AWS Lambda durable functions code&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Durable functions automatically checkpoint progress by saving the current state and completed steps at key points during execution. This allows them to suspend execution for up to one year during long-running tasks and recover from failures by resuming from the last checkpoint rather than restarting from the beginning, all without requiring additional infrastructure management.&lt;/p&gt; 
&lt;p&gt;Developers can now build in Python or TypeScript, wrapping calls in steps with automatic retries and checkpointing. You can use waits to suspend execution for minutes, hours, or even up to a year without paying for idle compute. Durable functions use a replay mechanism to maintain state and handle failures gracefully: when recovering from a failure, your function code is re-executed from its checkpoints, ensuring state consistency without data loss. This also means you don’t need complex external orchestration tools for many use cases, which is helpful for AI workflows and multi-step applications that need reliable state management without managing external infrastructure.&lt;/p&gt; 
&lt;p&gt;For more information, &lt;a href="https://aws.amazon.com/blogs/aws/build-multi-step-applications-and-ai-workflows-with-aws-lambda-durable-functions/" target="_blank" rel="noopener noreferrer"&gt;read the launch blog post&lt;/a&gt; and watch the re:Invent Breakout Session video: &lt;a href="https://www.youtube.com/watch?v=XJ80NBOwsow" target="_blank" rel="noopener noreferrer"&gt;Deep Dive on AWS Lambda durable functions (CNS380)&lt;/a&gt;&lt;/p&gt; 
&lt;h2&gt;AWS Lambda Managed Instances&lt;/h2&gt; 
&lt;p&gt;Lambda now offers &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" target="_blank" rel="noopener noreferrer"&gt;Lambda Managed Instances&lt;/a&gt;, a new compute option that combines &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2&lt;/a&gt; flexibility with fully managed infrastructure. AWS automatically handles instance provisioning, scaling, and maintenance while allowing access to the full range of EC2 capabilities, including Graviton4, network-optimized instances, and other specialized compute options.&lt;/p&gt; 
&lt;div id="attachment_25663" style="width: 829px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-Managed-Instances-configuration.png"&gt;&lt;img aria-describedby="caption-attachment-25663" loading="lazy" class="size-large wp-image-25663" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-Managed-Instances-configuration-819x1024.png" alt="AWS Lambda Managed Instances configuration" width="819" height="1024"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25663" class="wp-caption-text"&gt;AWS Lambda Managed Instances configuration&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Your functions run on dedicated EC2 capacity from your account, in your own &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Virtual Private Cloud (Amazon VPC)&lt;/a&gt;. AWS still manages the operational overhead, including OS patching, load balancing, and auto-scaling. This gives you access to specialized hardware options while maintaining the serverless operational model. You can further improve costs by using EC2 pricing models, including &lt;a href="https://aws.amazon.com/savingsplans/compute-pricing/" target="_blank" rel="noopener noreferrer"&gt;Compute Savings Plans&lt;/a&gt; and &lt;a href="https://aws.amazon.com/ec2/pricing/reserved-instances/" target="_blank" rel="noopener noreferrer"&gt;Reserved Instances&lt;/a&gt; for Lambda workloads. Each instance can handle multiple concurrent requests, making this particularly valuable for high-volume, steady-state workloads where predictable pricing and specific hardware requirements matter.&lt;/p&gt; 
&lt;p&gt;For more information, read the &lt;a href="https://aws.amazon.com/blogs/aws/introducing-aws-lambda-managed-instances-serverless-simplicity-with-ec2-flexibility/" target="_blank" rel="noopener noreferrer"&gt;launch blog post&lt;/a&gt; and watch the re:Invent Breakout Session video: &lt;a href="https://www.youtube.com/watch?v=7mWa2HpCZfg" target="_blank" rel="noopener noreferrer"&gt;Lambda Managed Instances: EC2 Power with Serverless Simplicity (CNS382)&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Other Lambda announcements&lt;/h2&gt; 
&lt;p&gt;Multi-tenant SaaS applications face challenges such as data leakage between tenants, noisy neighbor effects where one tenant’s workload impacts others, and the difficulty of implementing custom isolation mechanisms. &lt;a href="https://aws.amazon.com/blogs/compute/building-multi-tenant-saas-applications-with-aws-lambdas-new-tenant-isolation-mode/" target="_blank" rel="noopener noreferrer"&gt;Tenant isolation mode&lt;/a&gt; addresses these by processing each tenant’s invocations in separate execution environments, managing tenant-level compute isolation automatically.&lt;/p&gt; 
&lt;div id="attachment_25664" style="width: 905px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-tenant-isolation.png"&gt;&lt;img aria-describedby="caption-attachment-25664" loading="lazy" class="size-full wp-image-25664" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/AWS-Lambda-tenant-isolation.png" alt="AWS Lambda tenant isolation" width="895" height="226"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25664" class="wp-caption-text"&gt;AWS Lambda tenant isolation&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;Lambda adds &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/aws-lambda-provisioned-mode-sqs-esm/" target="_blank" rel="noopener noreferrer"&gt;Provisioned Mode&lt;/a&gt; for &lt;a href="https://aws.amazon.com/sqs/" target="_blank" rel="noopener noreferrer"&gt;Amazon SQS&lt;/a&gt; event-source mappings, providing predictable performance and reduced cold starts for high-throughput SQS processing workloads.&lt;/p&gt; 
&lt;p&gt;You can now send &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/10/aws-lambda-payload-size-256-kb-1-mb-invocations/" target="_blank" rel="noopener noreferrer"&gt;up to 1 MB of data in asynchronous Lambda invocations&lt;/a&gt;, increased from 256 KB, helping you build more complex data processing scenarios.&lt;/p&gt; 
&lt;p&gt;Lambda functions now support &lt;a href="https://aws.amazon.com/blogs/compute/aws-lambda-networking-over-ipv6/" target="_blank" rel="noopener noreferrer"&gt;IPv6 networking&lt;/a&gt;, so you don’t need &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html" target="_blank" rel="noopener noreferrer"&gt;NAT Gateways&lt;/a&gt; when accessing the internet or other AWS services from VPC-connected functions.&lt;/p&gt; 
&lt;div id="attachment_25666" style="width: 1034px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Lambda-internet-connectivity.png"&gt;&lt;img aria-describedby="caption-attachment-25666" loading="lazy" class="size-large wp-image-25666" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Lambda-internet-connectivity-1024x419.png" alt="Lambda internet connectivity through a NAT Gateway (IPv4) and Lambda internet connectivity through an egress-only internet gateway (IPv6)." width="1024" height="419"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25666" class="wp-caption-text"&gt;Lambda internet connectivity through a NAT Gateway (IPv4) and Lambda internet connectivity through an egress-only internet gateway (IPv6).&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/aws-lambda-rust/" target="_blank" rel="noopener noreferrer"&gt;Lambda Rust support&lt;/a&gt; is now generally available, moving from experimental status. This is backed by AWS Support and the Lambda availability SLA.&lt;/p&gt; 
&lt;p&gt;Lambda has expanded its runtime support by adding &lt;a href="https://aws.amazon.com/blogs/compute/python-3-14-runtime-now-available-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Python 3.14&lt;/a&gt;, &lt;a href="https://aws.amazon.com/blogs/compute/node-js-24-runtime-now-available-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Node.js 24&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/blogs/compute/aws-lambda-now-supports-java-25/" target="_blank" rel="noopener noreferrer"&gt;Java 25&lt;/a&gt; as both managed runtimes and container base images, providing access to the latest language features and ensuring long-term support.&lt;/p&gt; 
&lt;h2&gt;Amazon ECS&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/ecs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Container Service (Amazon ECS)&lt;/a&gt; Express Mode streamlines the deployment and management of containerized applications by automating the infrastructure setup that traditionally slows down developers.&lt;/p&gt; 
&lt;div id="attachment_25661" style="width: 798px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Amazon-ECS-Express-Mode-deployment.png"&gt;&lt;img aria-describedby="caption-attachment-25661" loading="lazy" class="size-large wp-image-25661" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/20/Amazon-ECS-Express-Mode-deployment-788x1024.png" alt="Amazon ECS Express Mode deployment" width="788" height="1024"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25661" class="wp-caption-text"&gt;Amazon ECS Express Mode deployment&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;This means you can focus on building applications while deploying with confidence using AWS best practices. Express Mode lets you deploy production-ready containerized web applications and APIs with a single command. This automatically handles domains, networking, load balancing, &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; roles, and auto-scaling through simplified APIs. When your applications evolve and require advanced features, you can seamlessly configure and access the full capabilities of the underlying resources and of Amazon ECS itself. Learn more from the &lt;a href="https://aws.amazon.com/blogs/aws/build-production-ready-applications-without-infrastructure-complexity-using-amazon-ecs-express-mode/" target="_blank" rel="noopener noreferrer"&gt;launch blog post&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Amazon ECS announced a public preview of a &lt;a href="https://aws.amazon.com/blogs/containers/accelerate-container-troubleshooting-with-the-fully-managed-amazon-ecs-mcp-server-preview/" target="_blank" rel="noopener noreferrer"&gt;fully managed MCP server&lt;/a&gt;, enabling AI-powered experiences for development and operations. The Model Context Protocol (MCP) server provides enterprise-grade capabilities like automatic updates and patching, centralized security through AWS IAM integration, comprehensive audit logging via &lt;a href="https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html" target="_blank" rel="noopener noreferrer"&gt;AWS CloudTrail&lt;/a&gt;, and the proven scalability, reliability, and support of AWS.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/ecr/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Container Registry (ECR)&lt;/a&gt; &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-ecr-managed-container-image-signing/" target="_blank" rel="noopener noreferrer"&gt;managed container image signing&lt;/a&gt; enhances your security posture and eliminates the operational overhead of setting up signing. Container image signing allows you to verify that images are from trusted sources. ECR automatically signs images as they are pushed using the identity of the entity pushing the image. Signing operations are logged through CloudTrail for full auditability.&lt;/p&gt; 
&lt;h2&gt;Amazon API Gateway&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/api-gateway/" target="_blank" rel="noopener noreferrer"&gt;Amazon API Gateway&lt;/a&gt;&amp;nbsp;allows you to improve the responsiveness of your REST APIs by &lt;a href="https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/" target="_blank" rel="noopener noreferrer"&gt;progressively streaming response payloads&lt;/a&gt; back to the client. With this new capability, you can use streamed responses to enhance user experience when building LLM-driven applications (such as AI agents and chatbots), improve time-to-first-byte (TTFB) performance for web and mobile applications, stream large files, and perform long-running operations while reporting incremental progress using protocols such as &lt;a href="https://en.wikipedia.org/wiki/Server-sent_events" target="_blank" rel="noopener noreferrer"&gt;server-sent events&lt;/a&gt; (SSE).&lt;/p&gt; 
&lt;div id="attachment_25661" style="width: 798px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/06/compute-2459-apigw-streaming-compar.gif"&gt;&lt;img aria-describedby="caption-attachment-25661" loading="lazy" class="aligncenter size-full wp-image-25083" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/06/compute-2459-apigw-streaming-compar.gif" alt="" width="1032" height="500"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25661" class="wp-caption-text"&gt;Amazon API Gateway streaming&lt;/p&gt;
&lt;/div&gt; 
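To make the SSE protocol mentioned above concrete, here is a minimal client-side parser sketch that collects the `data:` payloads from a server-sent events stream. It covers only the `data` field of the SSE format (real streams can also carry `event`, `id`, and `retry` fields):

```python
def parse_sse(stream_text: str) -> list:
    """Collect data payloads from a server-sent events stream.

    Events are separated by blank lines; each 'data:' line contributes one
    line of the event's payload (multi-line data is joined with newlines).
    """
    events, data_lines = [], []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
        elif line == "" and data_lines:
            events.append("\n".join(data_lines))
            data_lines = []
    if data_lines:  # flush a final event that wasn't blank-line terminated
        events.append("\n".join(data_lines))
    return events

chunks = "data: Hello\n\ndata: streamed\ndata: world\n\n"
print(parse_sse(chunks))  # ['Hello', 'streamed\nworld']
```

In an LLM chat application, each parsed event would typically carry one incremental token or chunk of the model's response.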
&lt;p&gt;API Gateway introduces &lt;a href="https://aws.amazon.com/blogs/compute/build-scalable-rest-apis-using-amazon-api-gateway-private-integration-with-application-load-balancer/" target="_blank" rel="noopener noreferrer"&gt;private integration&lt;/a&gt; with &lt;a href="https://aws.amazon.com/elasticloadbalancing/application-load-balancer/" target="_blank" rel="noopener noreferrer"&gt;Application Load Balancers (ALBs)&lt;/a&gt;. You can use this to expose your VPC-based applications securely through REST APIs without exposing your ALBs to the public internet.&lt;/p&gt; 
&lt;p&gt;You can also now configure &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-api-gateway-tls-security-rest-apis/" target="_blank" rel="noopener noreferrer"&gt;enhanced TLS security policies&lt;/a&gt; on API endpoints and custom domain names, providing you with greater control over the security posture of your APIs.&lt;/p&gt; 
&lt;h2&gt;Amazon EventBridge&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/eventbridge/" target="_blank" rel="noopener noreferrer"&gt;Amazon EventBridge&lt;/a&gt; introduced an &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/eventbridge-enhanced-visual-rule-builder" target="_blank" rel="noopener noreferrer"&gt;enhanced visual rule builder&lt;/a&gt; that helps developers discover and subscribe to events from custom applications and over 200 AWS services. The console-based interface integrates the EventBridge &lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-schema-registry.html" target="_blank" rel="noopener noreferrer"&gt;schema registry&lt;/a&gt; with a comprehensive event catalog and intuitive drag-and-drop canvas that simplifies building event-driven applications. Developers can browse and search through events with readily available sample payloads and schemas without having to hunt through individual service documentation. The schema-aware visual builder guides developers through creating event filter patterns and rules, reducing syntax errors and accelerating development time.&lt;/p&gt; 
&lt;p&gt;EventBridge also allows targeting &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-eventbridge-sqs-fair-queue-targets/" target="_blank" rel="noopener noreferrer"&gt;SQS fair queues&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;AWS Step Functions&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/step-functions" target="_blank" rel="noopener noreferrer"&gt;AWS Step Functions&lt;/a&gt; allows for enhanced local testing through the &lt;a href="https://aws.amazon.com/blogs/aws/accelerate-workflow-development-with-enhanced-local-testing-in-aws-step-functions/" target="_blank" rel="noopener noreferrer"&gt;TestState API&lt;/a&gt;, providing programmatic access to comprehensive testing capabilities without deploying to AWS. This helps you build automated test suites that validate your workflow definitions locally on your development machines. Test error handling patterns, data transformations, and mock service integrations using your preferred testing frameworks.&lt;/p&gt; 
&lt;p&gt;There is also a new &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/10/aws-step-functions-metrics-dashboard/" target="_blank" rel="noopener noreferrer"&gt;metrics dashboard&lt;/a&gt;, giving you visibility into your workflow operations at both the account and state machine levels.&lt;/p&gt; 
&lt;h2&gt;Other announcements&lt;/h2&gt; 
&lt;p&gt;The Savings Plans flexible pricing model extends to AWS managed database services with the launch of &lt;a href="https://aws.amazon.com/blogs/aws/introducing-database-savings-plans-for-aws-databases/" target="_blank" rel="noopener noreferrer"&gt;Database Savings Plans&lt;/a&gt;. This helps reduce database costs by up to 35% when committing to a consistent amount of usage ($/hour) over a&amp;nbsp;1-year&amp;nbsp;term. Savings automatically apply each hour to eligible usage across supported database services, and additional usage beyond the commitment is billed at on-demand rates.&lt;/p&gt; 
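The commitment-plus-overage billing described above can be sketched with a small calculation. All figures here are illustrative, not real AWS pricing, and the discount is assumed to be a single flat rate for simplicity:

```python
def hourly_bill(on_demand_usage: float, commitment: float, discount: float = 0.35) -> float:
    """Hourly cost with a Savings Plan: the committed $/hour is always paid
    and covers on-demand usage worth commitment / (1 - discount); any usage
    beyond that covered amount is billed at on-demand rates."""
    covered = commitment / (1 - discount)
    overage = max(0.0, on_demand_usage - covered)
    return commitment + overage

# Illustrative: $6.50/hr commitment at a 35% discount covers $10.00/hr of
# on-demand usage; $12.00/hr of usage bills $6.50 + $2.00 overage = $8.50.
print(round(hourly_bill(12.00, 6.50), 2))  # 8.5
```

The same shape applies hour by hour, which is why a commitment sized to your steady baseline, with spikes billed on demand, usually maximizes the savings.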
&lt;p&gt;&lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt; now supports &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-dynamodb-multi-attribute-composite-keys-global-secondary-indexes/" target="_blank" rel="noopener noreferrer"&gt;multi-attribute composite keys in global secondary indexes&lt;/a&gt;. You no longer need to concatenate values into synthetic keys manually, which sometimes results in the need to backfill data before adding new indexes. Instead, you can create primary keys using up to eight existing attributes, making it easier to model diverse access patterns and adapt to new query requirements.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; introduced &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-bedrock-agentcore-quality-evaluations-policy-controls/" target="_blank" rel="noopener noreferrer"&gt;AgentCore with quality evaluations and policy controls&lt;/a&gt; for deploying trusted AI agents at scale.&lt;/p&gt; 
&lt;p&gt;Bedrock also added &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-bedrock-18-fully-managed-open-weight-models/" target="_blank" rel="noopener noreferrer"&gt;18 fully managed open weight models&lt;/a&gt;, expanding AI model options for developers.&lt;/p&gt; 
&lt;p&gt;The &lt;a href="https://strandsagents.com/latest/" target="_blank" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt; is an open source framework that takes a model-driven approach to building and running AI agents in just a few lines of code. TypeScript support is &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/typescript-strands-agents-preview/" target="_blank" rel="noopener noreferrer"&gt;now available&lt;/a&gt; in preview so you can choose between Python and TypeScript for building Strands Agents.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-s3-vectors-generally-available/" target="_blank" rel="noopener noreferrer"&gt;Amazon S3 Vectors&lt;/a&gt; became generally available. S3 Vectors delivers purpose-built, cost-optimized vector storage for AI agents, inference, Retrieval Augmented Generation (RAG), and semantic search at billion-vector scale.&lt;/p&gt; 
&lt;h2&gt;Serverless blog posts&lt;/h2&gt; 
&lt;h3&gt;October&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/breaking-down-monolith-workflows-modularizing-aws-step-functions-workflows/" target="_blank" rel="noopener noreferrer"&gt;Breaking down monolith workflows: Modularizing AWS Step Functions workflows&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/introducing-aws-lambda-event-source-mapping-tools-in-the-aws-serverless-mcp-server/" target="_blank" rel="noopener noreferrer"&gt;Introducing AWS Lambda event source mapping tools in the AWS Serverless MCP Server&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/processing-amazon-s3-objects-at-scale-with-aws-step-functions-distributed-map-s3-prefix/" target="_blank" rel="noopener noreferrer"&gt;Processing Amazon S3 objects at scale with AWS Step Functions Distributed Map S3 prefix&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;November&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/aws-lambda-networking-over-ipv6/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda networking over IPv6&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/orchestrating-big-data-processing-with-aws-step-functions-distributed-map/" target="_blank" rel="noopener noreferrer"&gt;Orchestrating big data processing with AWS Step Functions Distributed Map&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/optimizing-nested-json-array-processing-using-aws-step-functions-distributed-map/" target="_blank" rel="noopener noreferrer"&gt;Optimizing nested JSON array processing using AWS Step Functions Distributed Map&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/improve-api-discoverability-with-the-new-amazon-api-gateway-portal/" target="_blank" rel="noopener noreferrer"&gt;Improve API discoverability with the new Amazon API Gateway Portal&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/" target="_blank" rel="noopener noreferrer"&gt;Building responsive APIs with Amazon API Gateway response streaming&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/python-3-14-runtime-now-available-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Python 3.14 runtime now available in AWS Lambda&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/building-serverless-applications-with-rust-on-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Building serverless applications with Rust on AWS Lambda&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/handle-unpredictable-processing-times-with-operational-consistency-when-integrating-asynchronous-aws-services-with-an-aws-step-functions-state-machine/" target="_blank" rel="noopener noreferrer"&gt;Handle unpredictable processing times with operational consistency when integrating asynchronous AWS services with an AWS Step Functions state machine&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/aws-lambda-now-supports-java-25/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda now supports Java 25&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/enhancing-api-security-with-amazon-api-gateway-tls-security-policies/" target="_blank" rel="noopener noreferrer"&gt;Enhancing API security with Amazon API Gateway TLS security policies&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/improving-throughput-of-serverless-streaming-workloads-for-kafka/" target="_blank" rel="noopener noreferrer"&gt;Improving throughput of serverless streaming workloads for Kafka&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/build-scalable-rest-apis-using-amazon-api-gateway-private-integration-with-application-load-balancer/" target="_blank" rel="noopener noreferrer"&gt;Build scalable REST APIs using Amazon API Gateway private integration with Application Load Balancer&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/serverless-strategies-for-streaming-llm-responses/" target="_blank" rel="noopener noreferrer"&gt;Serverless strategies for streaming LLM responses&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/building-multi-tenant-saas-applications-with-aws-lambdas-new-tenant-isolation-mode/" target="_blank" rel="noopener noreferrer"&gt;Building multi-tenant SaaS applications with AWS Lambda’s new tenant isolation mode&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/orchestrating-large-scale-document-processing-with-aws-step-functions-and-amazon-bedrock-batch-inference" target="_blank" rel="noopener noreferrer"&gt;Orchestrating large-scale document processing with AWS Step Functions and Amazon Bedrock batch inference&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/node-js-24-runtime-now-available-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Node.js 24 runtime now available in AWS Lambda&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Serverless Office Hours&lt;/h2&gt; 
&lt;p&gt;Join our livestream every Tuesday at 11 AM PT for live discussions, Q&amp;amp;A sessions, and deep dives into serverless technologies. Episodes are available on-demand at &lt;a href="https://serverlessland.com/office-hours" target="_blank" rel="noopener noreferrer"&gt;serverlessland.com/office-hours&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;October&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Oct 7 – &lt;a href="https://www.youtube.com/watch?v=XTVgHC7K2-s" target="_blank" rel="noopener noreferrer"&gt;Amazon API Gateway Routing Rules&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Oct 14 – &lt;a href="https://www.youtube.com/watch?v=eKN5TgxA4R8" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB Global Tables&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Oct 21 – &lt;a href="https://www.youtube.com/watch?v=ZGElhJmN_8o" target="_blank" rel="noopener noreferrer"&gt;Building agents with Amazon Bedrock AgentCore&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Oct 28 – &lt;a href="https://www.youtube.com/watch?v=mZ1xksrL8Lw" target="_blank" rel="noopener noreferrer"&gt;What’s new with Observability&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;November&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Nov 4 – &lt;a href="https://www.youtube.com/watch?v=fTOg4FRFEZA" target="_blank" rel="noopener noreferrer"&gt;Getting your AI spec right!&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Nov 11 – &lt;a href="https://www.youtube.com/watch?v=RlG71WUZa7Q" target="_blank" rel="noopener noreferrer"&gt;Running Swift in AWS Lambda&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Nov 18 – &lt;a href="https://www.youtube.com/watch?v=N3uo__CCXKg" target="_blank" rel="noopener noreferrer"&gt;What’s new in EventCatalog&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Nov 24 – &lt;a href="https://www.youtube.com/watch?v=CwECZ4SHwQ4" target="_blank" rel="noopener noreferrer"&gt;pre:Invent 2025&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;December&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Dec 9 – &lt;a href="https://www.youtube.com/watch?v=b5VtHydva1A" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda Managed Instances&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Dec 16 – &lt;a href="https://www.youtube.com/watch?v=giNnpHauWT0" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda durable functions&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Still looking for more?&lt;/h2&gt; 
&lt;p&gt;The&amp;nbsp;&lt;a href="http://aws.amazon.com/serverless" target="_blank" rel="noopener noreferrer"&gt;Serverless landing page&lt;/a&gt;&amp;nbsp;has overall information about building serverless applications. The&amp;nbsp;&lt;a href="https://aws.amazon.com/lambda/resources/?aws-lambda-resources-blog.sort-by=item.additionalFields.createdDate&amp;amp;aws-lambda-resources-blog.sort-order=desc" target="_blank" rel="noopener noreferrer"&gt;Lambda resources page&lt;/a&gt;&amp;nbsp;contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.&lt;/p&gt; 
&lt;p&gt;You can also&amp;nbsp;follow the Serverless Developer Advocacy team to see the latest news, follow conversations, and interact with the team.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Julian Wood:&amp;nbsp;&lt;a href="https://twitter.com/julian_wood" target="_blank" rel="noopener noreferrer"&gt;@julian_wood&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/julianrwood/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/julianrwood/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Eric Johnson:&amp;nbsp;&lt;a href="https://twitter.com/edjgeek" target="_blank" rel="noopener noreferrer"&gt;@edjgeek&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/singledigit/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/singledigit/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Gunnar Grosch: &lt;a href="https://x.com/GunnarGrosch" target="_blank" rel="noopener noreferrer"&gt;@GunnarGrosch&lt;/a&gt;, &lt;a href="https://se.linkedin.com/in/gunnargrosch" target="_blank" rel="noopener noreferrer"&gt;https://se.linkedin.com/in/gunnargrosch&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Erik Hanchet: &lt;a href="https://x.com/ErikCH" target="_blank" rel="noopener noreferrer"&gt;@ErikCH&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/erikhanchett/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/erikhanchett/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Salih Gueler: &lt;a href="https://x.com/salihgueler" target="_blank" rel="noopener noreferrer"&gt;@salihgueler&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/salihgueler/" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/salihgueler/&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Marcia Villalba:&amp;nbsp;&lt;a href="https://twitter.com/mavi888uy/" target="_blank" rel="noopener noreferrer"&gt;@mavi888uy&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/marciavillalba" target="_blank" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/marciavillalba&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;And finally, visit &lt;a href="http://serverlessland.com/" target="_blank" rel="noopener noreferrer"&gt;Serverless Land&lt;/a&gt;&amp;nbsp;for all your serverless needs.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>More room to build: serverless services now support payloads up to 1 MB</title>
		<link>https://aws.amazon.com/blogs/compute/more-room-to-build-serverless-services-now-support-payloads-up-to-1-mb/</link>
					
		
		<dc:creator><![CDATA[Anton Aleksandrov]]></dc:creator>
		<pubDate>Thu, 29 Jan 2026 22:16:14 +0000</pubDate>
				<category><![CDATA[Amazon EventBridge]]></category>
		<category><![CDATA[Amazon Simple Queue Service (SQS)]]></category>
		<category><![CDATA[Announcements]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Serverless]]></category>
		<guid isPermaLink="false">2de167a1befc19d6f6074428fc4217704a9fe6de</guid>

					<description>To support cloud applications that increasingly depend on rich contextual data, AWS is raising the maximum payload size from 256 KB to 1 MB for asynchronous AWS Lambda function invocations, Amazon SQS, and Amazon EventBridge. Developers can use this enhancement to build and maintain context-rich event-driven systems and reduce the need for complex workarounds such as data chunking or external large object storage.</description>
										<content:encoded>&lt;p&gt;To support cloud applications that increasingly depend on rich contextual data, AWS has raised the maximum payload size from 256 KB to 1 MB for asynchronous &lt;a href="https://aws.amazon.com/lambda/"&gt;AWS Lambda&lt;/a&gt; function invocations, &lt;a href="https://aws.amazon.com/sqs/"&gt;Amazon Simple Queue Service&lt;/a&gt; (Amazon SQS), and &lt;a href="https://aws.amazon.com/eventbridge/"&gt;Amazon EventBridge&lt;/a&gt;. Developers can use this enhancement to build and maintain context-rich event-driven systems and reduce the need for complex workarounds such as data chunking or external large object storage.&lt;/p&gt; 
&lt;h1&gt;Overview&lt;/h1&gt; 
&lt;p&gt;Modern cloud applications rely on context-rich, structured data to drive intelligent behavior. Large language model (LLM) prompts, telemetry signals, personalization data, machine learning (ML) outputs, and user interaction logs are no longer simple strings. Instead, they’re typically complex, nested JSON or YAML objects carrying meaningful context. Previously, developers working with serverless services such as Amazon SQS, Lambda (asynchronous invocations and Amazon SQS event-source mapping), or EventBridge had to carefully manage their data to fit within the 256 KB payload size limit. This commonly meant chunking larger payloads, externalizing payloads to object stores such as &lt;a href="https://aws.amazon.com/s3/"&gt;Amazon S3&lt;/a&gt;, or using &lt;a href="https://aws.amazon.com/blogs/compute/optimizing-network-footprint-in-serverless-applications/"&gt;data compression&lt;/a&gt;. These workarounds added complexity and latency, creating edge cases that were difficult to monitor and debug.&lt;/p&gt; 
&lt;p&gt;With the recent launches, you can now transmit payloads up to 1 MB, significantly reducing the need for complex data chunking and architectural workarounds. This increased capacity streamlines design patterns, reduces operational overhead, and makes event-driven systems more intuitive to build and maintain. Developers can now include richer data in single payloads—from detailed LLM prompts and full system states to comprehensive context and complete transaction histories.&lt;/p&gt; 
&lt;p&gt;The new 1 MB payload size limit applies to asynchronous Lambda function invocations, whether you trigger them using &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventsourcemapping.html"&gt;SQS event-source mapping&lt;/a&gt;, the &lt;a href="https://aws.amazon.com/cli/"&gt;AWS Command Line Interface&lt;/a&gt; (AWS CLI), &lt;a href="https://builder.aws.com/build/tools"&gt;AWS SDKs&lt;/a&gt;, the &lt;a href="https://docs.aws.amazon.com/lambda/latest/api/API_Invoke.html"&gt;Lambda Invoke API&lt;/a&gt;, or AWS services such as EventBridge. The increased limit also extends to all messages and events flowing through Amazon SQS queues and EventBridge Event Buses.&lt;/p&gt; 
&lt;h1&gt;Getting started&lt;/h1&gt; 
&lt;p&gt;There’s nothing you need to do to get started. This enhancement is automatically applied to all new and existing Lambda functions, SQS queues, and EventBridge Event Buses.&lt;/p&gt; 
&lt;p&gt;If you were previously chunking data at a 256 KB (or lower) threshold, then you might need to change your service configurations or business logic code to take advantage of the new limit. For example, if you’ve explicitly set the Amazon SQS &lt;strong&gt;MaximumMessageSize&lt;/strong&gt; attribute, then you might need to raise it to the new desired value. Larger payloads might also result in higher costs, as described in the following section.&lt;/p&gt; 
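Raising the queue attribute is a one-call change. The sketch below wraps the boto3 `set_queue_attributes` call behind a function that accepts the client as a parameter, so it can be exercised with a stub instead of a live AWS account; the function name and queue URL are illustrative:

```python
def raise_queue_message_size(sqs_client, queue_url: str, size_bytes: int = 1_048_576) -> None:
    """Opt an existing queue into the larger limit by raising its
    MaximumMessageSize attribute (here to 1 MB = 1,048,576 bytes).
    Pass a real boto3 SQS client in production; a stub works for testing."""
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"MaximumMessageSize": str(size_bytes)},
    )

class _StubSQS:
    """Records the call instead of talking to AWS."""
    def set_queue_attributes(self, **kwargs):
        self.last_call = kwargs

stub = _StubSQS()
raise_queue_message_size(stub, "https://sqs.us-east-1.amazonaws.com/123456789012/demo")
print(stub.last_call["Attributes"])  # {'MaximumMessageSize': '1048576'}
```

Note that SQS expects attribute values as strings, hence the `str(size_bytes)` conversion.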
&lt;h1&gt;Real-world example: rich event context in agentic event-driven architectures&lt;/h1&gt; 
&lt;p&gt;Event-driven architectures allow services to operate independently without centralized coordination. In these systems, comprehensive event context is essential. With the increased 1 MB payload limit, events can now carry more comprehensive data—from user profiles and order details to historical interactions. This enables services such as inventory, shipping, and notifications to act autonomously.&lt;/p&gt; 
&lt;p&gt;Consider the following example. In hospitality and quick-service industries, customer satisfaction depends on timely, thoughtful service recovery. When a guest submits negative feedback through a survey, review, or complaint form, service teams must gather context, interpret the issue, and craft a response. Traditionally, this meant manually piecing together visit logs, loyalty data, and prior complaints. Now, this can be fully automated using an AI agent powered by AWS serverless services and &lt;a href="https://aws.amazon.com/bedrock/"&gt;Amazon Bedrock&lt;/a&gt;, as shown in the following figure.&lt;/p&gt; 
&lt;p style="text-align: center"&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/09/25/compute-2424-img1.png"&gt;&lt;img loading="lazy" class="aligncenter size-full wp-image-24614" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/09/25/compute-2424-img1.png" alt="" width="1313" height="609"&gt;&lt;/a&gt;Figure 1: Customer feedback processing pipeline&lt;/p&gt; 
&lt;p&gt;The workflow:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Receive&lt;/strong&gt;: A new review is submitted through the Review application and emitted as an event to EventBridge Event Bus.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Detect&lt;/strong&gt;: The Event Bus delivers the event to the downstream Feedback analysis agent. The agent, running in a Lambda function, recognizes the review as a low rating or complaint.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Enrich&lt;/strong&gt;: The agent uses attached MCP tools to collect the guest’s visit metadata, booking details, loyalty activity, and complaint history into a single structured JSON payload (up to 1 MB).&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Queue&lt;/strong&gt;: The payload is sent to an SQS queue for further asynchronous processing by downstream components.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Generate&lt;/strong&gt;: A separate Lambda function polls messages from Amazon SQS and invokes an Amazon Bedrock model to analyze the full complaint context, draft a personalized response, suggest a gesture (such as a refund or credit), and classify issue severity.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Deliver&lt;/strong&gt;: The message is logged and sent to the customer and to the service team for further analysis.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;This use case demonstrates the importance of rich context: details of current and previous visits, loyalty tier, prior interactions, and feedback history. Previously, teams had to offload pieces of context to Amazon S3 and reference them externally, adding latency and architectural complexity. With the new 1 MB payload size, all of this information can travel together, improving the efficiency of the serverless agentic workflow and streamlining maintenance.&lt;/p&gt; 
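For the occasional event that still exceeds 1 MB, the claim-check pattern remains useful: store the body externally and send only a pointer. The sketch below keeps events inline when they fit and falls back to a stand-in object store otherwise; the `claim_check` field name, key scheme, and dict-as-store are our illustrative assumptions (in practice the store would be Amazon S3):

```python
import json

LIMIT = 1_048_576  # assumed 1 MB event limit

def prepare_event(event: dict, object_store: dict) -> dict:
    """Send rich context inline when it fits; fall back to the claim-check
    pattern (store the body, send a pointer) only for oversized events.
    `object_store` stands in for an external store such as Amazon S3."""
    body = json.dumps(event).encode("utf-8")
    if len(body) <= LIMIT:
        return event
    key = f"events/{len(object_store)}.json"  # illustrative key scheme
    object_store[key] = body
    return {"claim_check": key}

store = {}
print(prepare_event({"guest": "G-42", "history": []}, store))  # sent inline
print(prepare_event({"history": "x" * 2_000_000}, store))      # {'claim_check': 'events/0.json'}
```

With the raised limit, this fallback path should now be the rare exception rather than the default for context-rich events.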
&lt;h1&gt;Best practices when using large payloads&lt;/h1&gt; 
&lt;p&gt;The following sections outline best practices that you should apply when using larger payloads.&lt;/p&gt; 
&lt;h2&gt;Performance considerations&lt;/h2&gt; 
&lt;p&gt;Monitor Lambda function memory usage carefully when working with larger payloads, because parsing and processing complex JSON objects can increase memory usage and execution duration. Test your systems thoroughly under load, especially for high-throughput applications, by benchmarking with realistic payload sizes and traffic patterns. Although the payload limit has increased to 1 MB, the Lambda 15-minute timeout and memory limits remain unchanged. When applicable, you can &lt;a href="https://aws.amazon.com/blogs/compute/optimizing-network-footprint-in-serverless-applications/"&gt;use compression&lt;/a&gt; to process even larger datasets efficiently, but remember to account for the added CPU overhead of compression and decompression in your performance calculations. Read the &lt;a href="https://aws.amazon.com/blogs/compute/monitoring-best-practices-for-event-delivery-with-amazon-eventbridge/"&gt;Monitoring best practices for event delivery with Amazon EventBridge&lt;/a&gt; post for more best practices on tuning the performance of your event-driven architectures.&lt;/p&gt; 
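&lt;p&gt;As a minimal sketch of the compression approach, the following Python code gzips a JSON event detail, base64-encodes it for transport, and checks the result against the 1 MB limit before sending. The payload shape, field names, and helper functions are illustrative assumptions, not an EventBridge API.&lt;/p&gt;

```python
import base64
import gzip
import json

MAX_PAYLOAD_BYTES = 1024 * 1024  # 1 MB limit on the serialized event


def compress_detail(detail: dict) -> str:
    """Gzip a JSON detail object and base64-encode it for transport."""
    raw = json.dumps(detail).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_detail(blob: str) -> dict:
    """Consumer-side reverse of compress_detail."""
    return json.loads(gzip.decompress(base64.b64decode(blob)))


def fits_limit(payload: str) -> bool:
    """Check the serialized payload against the limit before sending."""
    return len(payload.encode("utf-8")) <= MAX_PAYLOAD_BYTES


# A repetitive complaint history compresses well.
detail = {"reviews": [{"rating": 1, "text": "room was cold"}] * 5000}
blob = compress_detail(detail)
assert decompress_detail(blob) == detail and fits_limit(blob)
```

&lt;p&gt;The consumer decompresses on receipt; budget for that CPU cost when sizing the function.&lt;/p&gt;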
&lt;h2&gt;Operational guidelines&lt;/h2&gt; 
&lt;p&gt;Configure &lt;a href="https://aws.amazon.com/what-is/dead-letter-queue/"&gt;dead-letter queues&lt;/a&gt; (DLQs) to make sure that failed messages are retained for inspection and troubleshooting. This becomes especially important with larger payloads, because debugging complex data structures requires access to the complete message context. Implement robust error handling and retries to manage transient failures, particularly when processing rich payload content that may contain nested structures or complex relationships.&lt;/p&gt; 
&lt;p&gt;To further optimize throughput, you can batch multiple smaller, related events together into a single payload. However, avoid mixing unrelated events, and maintain clear boundaries between different business domains and processes.&lt;/p&gt; 
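&lt;p&gt;The batching guidance above can be sketched as follows; the greedy byte-budget packing and the event shape are illustrative assumptions, not a service API.&lt;/p&gt;

```python
import json

MAX_BATCH_BYTES = 1024 * 1024  # stay within the 1 MB entry limit


def batch_events(events, limit=MAX_BATCH_BYTES):
    """Greedily pack related events into payloads of at most `limit` bytes.

    An event larger than `limit` still ends up in a batch of its own and
    would need a pattern such as claim check instead.
    """
    batches, current, current_size = [], [], 2  # 2 bytes for the enclosing "[]"
    for event in events:
        size = len(json.dumps(event).encode("utf-8")) + 2  # +2 for ", " separator
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 2
        current.append(event)
        current_size += size
    if current:
        batches.append(current)
    return batches


orders = [{"orderId": i, "status": "shipped"} for i in range(3)]
batches = batch_events(orders, limit=80)  # artificially small limit for demonstration
```

&lt;p&gt;Each resulting batch serializes to a payload no larger than the limit, so it can be sent as a single event or message.&lt;/p&gt;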
&lt;p&gt;Always make sure that your downstream dependencies are capable of handling larger payloads.&lt;/p&gt; 
&lt;h2&gt;When to use external storage&lt;/h2&gt; 
&lt;p&gt;Even with the increased 1 MB payload limit, there are scenarios where patterns such as &lt;a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/StoreInLibrary.html"&gt;claim check&lt;/a&gt; remain a sound architectural choice. These patterns involve storing a full payload in an external system, such as Amazon S3, and passing a lightweight reference through your event stream. This approach continues to provide value when payloads exceed the new limit, when data needs to be reused by multiple consumers, or when strict governance, traceability, and security requirements are involved. For example, audit logs, image metadata, or large ML inference inputs may still surpass the 1 MB boundary, even when compressed. Instead of risking truncation or fragmentation, a claim check enables consistent, scalable access to the complete data set.&lt;/p&gt; 
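&lt;p&gt;The claim check pattern can be sketched as follows. A plain dictionary stands in for Amazon S3 so that the example is self-contained; in practice you would replace it with s3.put_object and s3.get_object calls against a real bucket, and the threshold and key scheme shown here are assumptions.&lt;/p&gt;

```python
import json
import uuid

# Stand-in for Amazon S3; replace with real PutObject/GetObject calls.
object_store = {}
CLAIM_CHECK_THRESHOLD = 1024 * 1024  # payloads above this go to external storage


def check_in(payload: dict) -> dict:
    """Store a large payload externally and return a lightweight reference."""
    body = json.dumps(payload)
    if len(body.encode("utf-8")) <= CLAIM_CHECK_THRESHOLD:
        return {"inline": payload}          # small enough to travel in the event
    key = f"claims/{uuid.uuid4()}.json"
    object_store[key] = body                # stand-in for an S3 PutObject call
    return {"claimCheck": {"key": key}}     # only the reference is sent


def check_out(message: dict) -> dict:
    """Resolve a message back into the full payload on the consumer side."""
    if "inline" in message:
        return message["inline"]
    return json.loads(object_store[message["claimCheck"]["key"]])


big = {"audit": ["entry"] * 200_000}        # well over 1 MB when serialized
msg = check_in(big)
assert "claimCheck" in msg and check_out(msg) == big
```

&lt;p&gt;Small payloads stay inline, so the external store is only used when the limit would otherwise be exceeded.&lt;/p&gt;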
&lt;p&gt;You can use open source libraries such as the &lt;a href="https://github.com/aws/eventbridge-kafka-connector"&gt;Kafka sink connector for EventBridge&lt;/a&gt; and the &lt;a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-managing-large-messages.html"&gt;Amazon SQS Extended Client Library&lt;/a&gt; (available for Python and Java), which abstract the complexity of storing large objects in external storage.&lt;/p&gt; 
&lt;h2&gt;Cost management&lt;/h2&gt; 
&lt;p&gt;Although larger payloads enable richer context in your applications, logging full payloads can increase storage and processing costs. Services such as CloudWatch Logs charge based on data volume, so selective logging, payload truncation, or sampling becomes crucial for high-volume events. Consider logging only essential fields or implementing smart sampling strategies based on business importance.&lt;/p&gt; 
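&lt;p&gt;One way to combine selective field logging with deterministic sampling is sketched below; the essential field names and the 5% sample rate are illustrative assumptions.&lt;/p&gt;

```python
import hashlib

ESSENTIAL_FIELDS = {"eventId", "type", "severity"}  # illustrative field names
SAMPLE_RATE = 0.05  # keep the full payload for roughly 5% of events


def log_record(event: dict) -> dict:
    """Always keep essential fields; attach the full payload only for a sample."""
    record = {k: v for k, v in event.items() if k in ESSENTIAL_FIELDS}
    # Hash-based sampling keyed on the event ID is deterministic, so retries
    # of the same event make the same logging decision.
    digest = hashlib.sha256(str(event.get("eventId")).encode("utf-8")).hexdigest()
    if int(digest, 16) % 100 < SAMPLE_RATE * 100:
        record["fullPayload"] = event
    return record
```

&lt;p&gt;Only the trimmed record is written to the log stream; the full payload appears for the sampled fraction, keeping ingestion volume predictable.&lt;/p&gt;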
&lt;p&gt;For full payload archival and retention, evaluate cost-effective storage solutions such as Amazon S3 with appropriate lifecycle policies. This can include moving older logs to cheaper storage tiers or implementing automated cleanup procedures for non-critical data. Balance your retention needs with cost optimization by defining clear policies for what data needs to be kept and for how long.&lt;/p&gt; 
&lt;p&gt;Review the pricing pages for &lt;a href="https://aws.amazon.com/lambda/pricing/"&gt;AWS Lambda&lt;/a&gt;, &lt;a href="https://aws.amazon.com/eventbridge/pricing/"&gt;Amazon EventBridge&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/sqs/pricing/"&gt;Amazon SQS&lt;/a&gt; to learn about the costs of delivering and processing events and messages.&lt;/p&gt; 
&lt;h1&gt;Conclusion&lt;/h1&gt; 
&lt;p&gt;The increase in maximum payload size from 256 KB to 1 MB enables developers to build more efficient distributed architectures. You can use this enhancement to transport richer context in event and message payloads, reducing the need for complex workarounds that previously added architectural complexity and operational overhead. With this added room to transmit rich context, you can streamline your workflows and improve observability, whether you use choreography or orchestration patterns.&lt;/p&gt; 
&lt;p&gt;Go to the developer guides for &lt;a href="https://docs.aws.amazon.com/lambda/"&gt;AWS Lambda&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html"&gt;Amazon EventBridge&lt;/a&gt;, and &lt;a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html"&gt;Amazon SQS&lt;/a&gt; to learn more about how to take advantage of this update.&lt;/p&gt; 
&lt;p&gt;To learn more about serverless architectures, visit &lt;a href="https://serverlessland.com/"&gt;Serverless Land&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Simplify network segmentation for AWS Outposts racks with multiple local gateway routing domains</title>
		<link>https://aws.amazon.com/blogs/compute/simplify-network-segmentation-for-aws-outposts-racks-with-multiple-local-gateway-routing-domains/</link>
					
		
		<dc:creator><![CDATA[Brianna Rosentrater]]></dc:creator>
		<pubDate>Fri, 16 Jan 2026 18:49:35 +0000</pubDate>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[AWS Outposts rack]]></category>
		<guid isPermaLink="false">4043a52fea844cf29608e9ad0cbbd0e13a14705d</guid>

					<description>AWS now supports multiple local gateway (LGW) routing domains on AWS Outposts racks to simplify network segmentation. Network segmentation is the practice of splitting a computer network into isolated subnetworks, or network segments. This reduces the attack surface so that if a host on one network segment is compromised, the hosts on the other network segments are not affected. Many customers in regulated industries such as manufacturing, health care and life sciences, banking, and others implement network segmentation as part of their on-premises network security standards to reduce the impact of a breach and help address compliance requirements.</description>
										<content:encoded>&lt;p&gt;AWS now supports multiple &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing-domains.html" target="_blank" rel="noopener noreferrer"&gt;local gateway (LGW) routing domains&lt;/a&gt; on &lt;a href="https://aws.amazon.com/outposts/rack/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts racks&lt;/a&gt; to simplify network segmentation. Network segmentation is the practice of splitting a computer network into isolated subnetworks, or network segments. This reduces the attack surface so that if a host on one network segment is compromised, the hosts on the other network segments are not affected. Many customers in regulated industries such as manufacturing, health care and life sciences, banking, and others implement network segmentation as part of their on-premises network security standards to reduce the impact of a breach and help address compliance requirements. Some AWS services also have network requirements that specify certain IP ranges to be used for endpoints, and may or may not support &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/routing.html#ip-addressing" target="_blank" rel="noopener noreferrer"&gt;customers bringing their own IP pool&lt;/a&gt; (also called CoIP routing, see &lt;a href="https://aws.amazon.com/blogs/compute/how-to-choose-between-coip-and-direct-vpc-routing-modes-on-aws-outposts-rack/" target="_blank" rel="noopener noreferrer"&gt;How to choose between CoIP and Direct VPC routing (DVR) modes on AWS Outposts rack&lt;/a&gt; for more information). Customers want the flexibility to use both routing modes (CoIP and DVR) on the same logical Outpost. With this new feature, AWS Outposts racks now support multiple LGW routing domains to meet subnetwork isolation and cloud service network requirements in an on-premises environment. 
For example, a leading automotive company deploys latency-sensitive manufacturing workloads on Outposts racks in a multi-AZ architecture for resiliency. This feature provides traffic separation between routing domains and enables both customer-owned IP (CoIP) and direct VPC routing (DVR) modes on the same logical Outpost.&lt;/p&gt; 
&lt;p&gt;In this post, you will learn how to use multiple LGW routing domains on Outposts racks, along with considerations for implementation.&lt;/p&gt; 
&lt;h2&gt;Overview&lt;/h2&gt; 
&lt;p&gt;With the introduction of multiple LGW routing domains on Outposts, you can now create multiple routing domains and associate one or more VLANs with each routing domain. This allows you to integrate your Outposts rack into your existing on-premises network schema. Each LGW routing domain has a unique &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/vif-vif-groups.html" target="_blank" rel="noopener noreferrer"&gt;LGW Virtual Interface (VIF) Group&lt;/a&gt; and an &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing.html" target="_blank" rel="noopener noreferrer"&gt;LGW Route Table&lt;/a&gt;, enabling logical network traffic isolation. You can have a mix of up to 10 active routing domains with route tables using either DVR or CoIP routing mode, and you can change these routing domains as needed in a self-service fashion, allowing for network flexibility as architectures evolve over time. These settings can be found in the AWS Outposts console under the &lt;strong&gt;Networking&lt;/strong&gt; tab in the menu.&lt;/p&gt; 
&lt;p&gt;The following diagram shows an example of three VPCs, each with at least one subnet on the Outposts rack, where each VPC corresponds to its own routing domain. Each routing domain can then be associated with one or more VLANs and one or more VPCs. You can only associate a VPC with one LGW routing domain per Outpost.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/12/computeblog-2512-images-1.png" alt="Architecture diagram showing 3 routing domains uplinking to an on-premises network."&gt;&lt;/p&gt; 
&lt;p&gt;Figure 1 – Architecture diagram showing 3 routing domains&lt;/p&gt; 
&lt;h2&gt;Walkthrough&lt;/h2&gt; 
&lt;p&gt;Before creating an LGW routing domain, you’ll first need to &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing-domains.html#vif-best-practices" target="_blank" rel="noopener noreferrer"&gt;create an LGW VIF group&lt;/a&gt; and an LGW route table. A local gateway routing domain is the association of a local gateway route table and a local gateway VIF group. Each VIF group can be associated with one or more VLANs, but a route table can only be associated with one VIF group.&lt;/p&gt; 
&lt;p&gt;To create an LGW VIF group, navigate to the AWS Outposts console, go to &lt;strong&gt;LGW virtual interface groups&lt;/strong&gt;, and select &lt;strong&gt;Create VIF group&lt;/strong&gt;. Enter your VIF details, which include &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/outposts-rack2ndgen-local-rack.html#local-gateway-bgp-connectivity" target="_blank" rel="noopener noreferrer"&gt;BGP and VLAN routing information&lt;/a&gt;; you must create 4 LGW VIFs per VIF group.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/12/computeblog-2512-images-2.png" alt="Creating VIF group for RD1 routing domain"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 2 – Creating VIF group for RD1 routing domain&lt;/p&gt; 
&lt;p&gt;After creating your VIF group, create an LGW route table. You’ll have the option to use &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing.html#direct-vpc-routing" target="_blank" rel="noopener noreferrer"&gt;Direct VPC Routing (DVR)&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing.html#ip-addressing" target="_blank" rel="noopener noreferrer"&gt;Customer-owned IP address pool (CoIP)&lt;/a&gt; routing. If CoIP routing is selected, you’ll have the option to enter your CIDR before creating the table. An LGW route table’s routing mode cannot be changed after creation. However, you can disassociate an LGW route table from a VIF group and attach a new route table if you need to change the routing mode of a VIF group.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/12/computeblog-2512-images-3.png"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 3 – Creating LGW route table for RD1 routing domain&lt;/p&gt; 
&lt;p&gt;After you’ve created your LGW route table and VIF group, you can proceed to the final step which is to &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing-domains.html#creating-routing-domains" target="_blank" rel="noopener noreferrer"&gt;create your LGW routing domain&lt;/a&gt; where you will associate the LGW route table and VIF group.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/12/computeblog-2512-images-4.png" alt="Create LGW routing domain form for RD1 example"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 4 – Creating LGW routing domain for RD1&lt;/p&gt; 
&lt;p&gt;You can view and create up to 10 active routing domains through the AWS Outposts console under the &lt;strong&gt;Networking&lt;/strong&gt; tab.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/12/computeblog-2512-images-5.png"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 5 – Local Gateway (LGW) routing domains&lt;/p&gt; 
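&lt;p&gt;If you prefer to script these steps rather than use the console, the walkthrough maps to EC2 API calls. The following Python sketch uses boto3 with placeholder resource IDs and is an illustration only; validate the calls and parameters against your own environment before use.&lt;/p&gt;

```python
# Sketch: automate the route table creation and VIF group association steps
# with boto3. All resource IDs below are hypothetical placeholders.
VALID_MODES = {"coip", "direct-vpc-routing"}


def route_table_args(local_gateway_id: str, mode: str) -> dict:
    """Validate the routing mode and build CreateLocalGatewayRouteTable arguments."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return {"LocalGatewayId": local_gateway_id, "Mode": mode}


def create_routing_domain(local_gateway_id: str, vif_group_id: str, mode: str = "coip"):
    """Create an LGW route table and associate it with an existing VIF group."""
    import boto3  # deferred so the helper above stays usable without AWS access

    ec2 = boto3.client("ec2")
    # Step 1: create the route table, choosing the routing mode up front
    # (it cannot be changed after creation).
    table = ec2.create_local_gateway_route_table(
        **route_table_args(local_gateway_id, mode)
    )["LocalGatewayRouteTable"]
    # Step 2: associate the route table with the VIF group -- per the
    # walkthrough, this association is the routing domain itself.
    return ec2.create_local_gateway_route_table_virtual_interface_group_association(
        LocalGatewayRouteTableId=table["LocalGatewayRouteTableId"],
        LocalGatewayVirtualInterfaceGroupId=vif_group_id,
    )


# Example (requires real IDs and credentials):
# create_routing_domain("lgw-0abc1234567890def", "lgw-vif-grp-0123456789abcdef0", "coip")
```

&lt;p&gt;Deferring the boto3 import keeps the argument-building helper testable offline; the two API calls mirror the console steps in Figures 3 and 4.&lt;/p&gt;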
&lt;h2&gt;Considerations&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;The multiple LGW routing domains feature is only available on &lt;a href="https://aws.amazon.com/blogs/aws/announcing-second-generation-aws-outposts-racks-with-breakthrough-performance-and-scalability-on-premises/" target="_blank" rel="noopener noreferrer"&gt;second-generation Outposts racks&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Avoid overlapping IP addresses across subnetworks and local routing domains, because they can create IP routing conflicts.&lt;/li&gt; 
 &lt;li&gt;A VIF group can only be associated to one LGW route table/routing domain at a time. A routing domain is the association of a VIF group and LGW route table.&lt;/li&gt; 
 &lt;li&gt;An LGW routing domain allows for logical local network traffic isolation; however, all traffic still travels across your &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/local-rack.html#link-aggregation" target="_blank" rel="noopener noreferrer"&gt;local gateway Link Aggregation Control Protocol (LACP) Link Aggregation Group (LAG)&lt;/a&gt; to uplink into your on-premises network.&lt;/li&gt; 
 &lt;li&gt;Additional network isolation can be achieved through &lt;a href="https://secure.cisco.com/secure-firewall/docs/virtual-routing-and-forwarding" target="_blank" rel="noopener noreferrer"&gt;Virtual Routing and Forwarding (VRF)&lt;/a&gt; on Cisco platforms or &lt;a href="https://www.juniper.net/documentation/us/en/software/junos/routing-overview/topics/concept/routing-instances-overview.html" target="_blank" rel="noopener noreferrer"&gt;Routing Instances&lt;/a&gt; on Juniper equipment, providing logical separation of routing tables and enabling secure multi-tenancy within the same physical infrastructure.&lt;/li&gt; 
 &lt;li&gt;You can only associate a VPC with one LGW routing domain per Outpost, and you can change the VPC association in a self-service fashion as needed. Multiple on-premises &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/local-rack.html#vlans" target="_blank" rel="noopener noreferrer"&gt;VLANs&lt;/a&gt; can be connected to a single routing domain.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;This post demonstrated how to configure multiple local routing domains on Outposts racks to integrate into your on-premises network. For more information, see the &lt;a href="https://docs.aws.amazon.com/outposts/latest/network-userguide/routing-domains.html"&gt;LGW routing domains&lt;/a&gt; section in the AWS Outposts user guide. Reach out to your AWS account team to learn more about Outposts racks network configuration options.&lt;/p&gt; 
&lt;p&gt;In addition to multiple LGW routing domains, we have also announced several updates to Outposts in the past week to help you meet digital sovereignty and local data processing needs. To learn more, read the following announcements:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/opening-the-aws-european-sovereign-cloud/"&gt;AWS Outposts as an option to extend the AWS European Sovereign Cloud&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/01/amazon-s3-second-generation-aws-outposts-racks/"&gt;Amazon S3 on Outposts now available on second-generation Outposts racks&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/01/second-generation-aws-outposts-racks-additional-aws-regions/"&gt;Second-generation Outposts racks now supported in the South America (São Paulo) and Europe (Stockholm) Regions&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;To discuss Outposts with an expert on any of these topics, submit &lt;a href="https://pages.awscloud.com/GLOBAL_PM_LN_outposts-features_2020084_7010z000001Lpcl_01.LandingPage.html" target="_blank" rel="noopener noreferrer"&gt;this form&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Optimizing storage performance for Amazon EKS on AWS Outposts</title>
		<link>https://aws.amazon.com/blogs/compute/optimizing-storage-performance-for-amazon-eks-on-aws-outposts/</link>
					
		
		<dc:creator><![CDATA[Arun Kumar]]></dc:creator>
		<pubDate>Tue, 13 Jan 2026 18:57:12 +0000</pubDate>
				<category><![CDATA[Amazon Elastic Block Store (Amazon EBS)]]></category>
		<category><![CDATA[Amazon Elastic File System (EFS)]]></category>
		<category><![CDATA[Amazon Elastic Kubernetes Service]]></category>
		<category><![CDATA[AWS Outposts]]></category>
		<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[Amazon EBS]]></category>
		<category><![CDATA[Amazon EFS]]></category>
		<category><![CDATA[Amazon EKS]]></category>
		<category><![CDATA[Amazon S3]]></category>
		<guid isPermaLink="false">2e302008fe1896e9f4a550585f79afd24a8f81e9</guid>

					<description>&lt;a href="https://aws.amazon.com/eks/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Kubernetes Service (Amazon EKS)&lt;/a&gt; on 
&lt;a href="https://aws.amazon.com/outposts/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts&lt;/a&gt; brings the power of managed 
&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-concepts.html" target="_blank" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; to your on-premises infrastructure. Use Amazon EKS on Outposts rack to create hybrid cloud deployments that maintain consistent AWS experiences across environments. As organizations increasingly adopt edge computing and hybrid architectures, storage optimization and performance tuning become critical for successful workload deployment.</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://aws.amazon.com/eks/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Kubernetes Service (Amazon EKS)&lt;/a&gt; on &lt;a href="https://aws.amazon.com/outposts/" target="_blank" rel="noopener noreferrer"&gt;AWS Outposts&lt;/a&gt; brings the power of managed &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-concepts.html" target="_blank" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; to your on-premises infrastructure. Use Amazon EKS on Outposts rack to create hybrid cloud deployments that maintain consistent AWS experiences across environments. As organizations increasingly adopt edge computing and hybrid architectures, storage optimization and performance tuning become critical for successful workload deployment.&lt;/p&gt; 
&lt;p&gt;Outposts extend AWS infrastructure, services, APIs, and tools to virtually any data center, co-location space, or on-premises facility. In this blog post, you will learn about your storage options and their performance characteristics, which is essential for building resilient, high-performing applications using Amazon EKS on Outposts.&lt;/p&gt; 
&lt;h2&gt;Amazon EKS on Outposts deployment options&lt;/h2&gt; 
&lt;p&gt;The following two sections outline the differences between &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/eks-outposts.html" target="_blank" rel="noopener noreferrer"&gt;Amazon EKS extended and local cluster deployment options available on Outposts&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Amazon EKS extended cluster architecture&lt;/h3&gt; 
&lt;p&gt;Amazon EKS extended clusters on Outposts provide a powerful solution for organizations seeking to use the benefits of Kubernetes while maintaining certain workloads on-premises, as shown in the following figure. This hybrid architecture allows businesses to extend their EKS clusters from the AWS Cloud to their own data centers or edge locations using Outposts. The Kubernetes control plane remains in the &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/" target="_blank" rel="noopener noreferrer"&gt;AWS Region&lt;/a&gt;, providing centralized management and benefiting from the AWS infrastructure in the cloud and on the Outpost.&lt;/p&gt; 
&lt;p&gt;Outposts is designed to be a connected service and needs reliable network connectivity to the AWS Region using the &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/service-links.html" target="_blank" rel="noopener noreferrer"&gt;Outposts service link&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic1.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25558 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic1.png" alt="" width="1430" height="1698"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 1 : Extended cluster&lt;/p&gt; 
&lt;h3&gt;Amazon EKS local cluster architecture&lt;/h3&gt; 
&lt;p&gt;Amazon EKS local clusters deploy the Kubernetes control plane on your Outpost, as shown in the following figure. This provides greater network resilience against outages as cluster operations run entirely on the Outposts and reduces the dependency on network connectivity to the AWS Region. Having the Kubernetes control plane hosted on your Outpost also reduces latency for cluster operations.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic2.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25557 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic2.png" alt="" width="1430" height="1925"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;&amp;nbsp; Figure 2: Local cluster&lt;/p&gt; 
&lt;h2&gt;Storage options for Amazon EKS extended clusters on Outposts&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/" target="_blank" rel="noopener noreferrer"&gt;Persistent Volumes (PV)&lt;/a&gt; and &lt;a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/" target="_blank" rel="noopener noreferrer"&gt;Persistent Volume Claims (PVC)&lt;/a&gt; serve as a critical abstraction layer in Kubernetes, separating the storage consumption details from storage provisioning, and allowing administrators to manage storage resources independently from how applications consume them. PVs and PVCs make sure of data persistence across pod restarts and rescheduling events, making them essential for applications that need to maintain state, such as databases, file storage systems, and other data-intensive workloads. The abstraction provided by PV and PVC enables platform-agnostic storage management, where applications can request storage through PVCs without needing to know the underlying storage implementation details. PVs and PVCs support dynamic provisioning through &lt;a href="https://kubernetes.io/docs/concepts/storage/storage-classes/" target="_blank" rel="noopener noreferrer"&gt;Storage Classes&lt;/a&gt;, allowing for automated storage allocation based on application demands, while also providing features such as access modes, capacity management, and reclaim policies to effectively manage the storage lifecycle in a Kubernetes cluster.&lt;/p&gt; 
&lt;h3&gt;Integrating Amazon EBS with Amazon EKS&lt;/h3&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/ebs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Block Store (Amazon EBS)&lt;/a&gt; provides high-performance block storage that’s ideal for low-latency applications providing consistent performance. When deployed on Outposts racks, EBS volumes are stored on the Outposts hardware, providing significant performance advantages over network-attached storage solutions, as shown in the following figure.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic3.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25556 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic3.png" alt="" width="1430" height="1961"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 3 : Integrating Amazon EBS with Amazon EKS on Outposts&lt;/p&gt; 
&lt;h3&gt;Benefits and use cases&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Storage&lt;/strong&gt;: EBS volumes on Outposts racks provide data access without dependency on external connectivity.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Performance:&lt;/strong&gt; Local storage delivers consistent latency and high IOPS/throughput.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cost: &lt;/strong&gt;On-premises storage eliminates data transfer costs and reduces bandwidth needs, lowering the total cost of ownership.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Implementation considerations&lt;/h3&gt; 
&lt;p&gt;Consider the following when using EBS on Outposts rack:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;EBS volumes on Outposts are tied to a single rack and to the Availability Zone that the Outpost is homed to, so applications must address single-point-of-failure risks.&lt;/li&gt; 
 &lt;li&gt;Protect data using EBS snapshots in the parent Region and schedule regular backups.&lt;/li&gt; 
 &lt;li&gt;Capacity on Outposts is finite; monitor Outposts storage usage and plan expansions proactively to avoid insufficient-capacity errors.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Refer to &lt;a href="https://github.com/kubernetes-sigs/aws-ebs-csi-driver/tree/master/examples/kubernetes/dynamic-provisioning" target="_blank" rel="noopener noreferrer"&gt;Dynamic Volume Provisioning&lt;/a&gt; to learn more about deploying a pod with an EBS volume attached.&lt;/p&gt; 
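&lt;p&gt;As an illustration of dynamic provisioning, the following manifest sketch defines a StorageClass for the EBS CSI driver and a claim against it. It assumes the aws-ebs-csi-driver is installed; the names, gp2 volume type, and sizes are illustrative, and WaitForFirstConsumer delays provisioning until the pod is scheduled so the volume is created where the pod's node runs.&lt;/p&gt;

```yaml
# Sketch: dynamic EBS provisioning for EKS on Outposts
# (aws-ebs-csi-driver assumed installed; names and sizes are illustrative).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-outposts
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # provision where the pod is scheduled
parameters:
  type: gp2                               # general-purpose volume type on Outposts
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ebs-outposts
  resources:
    requests:
      storage: 20Gi
```

&lt;p&gt;A pod that mounts data-claim triggers volume creation on first scheduling; the claim stays pending until then because of the binding mode.&lt;/p&gt;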
&lt;h3&gt;Amazon EFS with Amazon EKS&lt;/h3&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/efs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic File System (Amazon EFS)&lt;/a&gt; provides scalable, shared file storage that can be accessed across multiple AWS Availability Zones (AZs) and on-premises environments. Although Amazon EFS with Amazon EKS on Outposts maintains the same setup procedures as standard cloud deployments, there is a critical dependency on the &lt;a href="https://docs.aws.amazon.com/outposts/latest/server-userguide/service-links.html" target="_blank" rel="noopener noreferrer"&gt;service link&lt;/a&gt; connection between your Outposts and the AWS Region. Amazon EFS is not a locally supported service on Outposts, so connectivity to the AWS Region is required to use this service with your Outpost.&lt;/p&gt; 
&lt;p&gt;Amazon EFS allows multiple pods to concurrently access shared file systems. It is well-suited for applications that need collaborative data access, content management, and distributed processing workloads.&lt;/p&gt; 
&lt;h4&gt;Amazon EFS as a persistent storage solution for Amazon EKS extended cluster instances&lt;/h4&gt; 
&lt;p&gt;Amazon EFS as a PV for your Amazon EKS extended cluster operates through a hybrid architecture where the Amazon EFS file system resides in the Region, but mount points can be created on the worker nodes running on Outposts subnets through the service link as shown in the following figure.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic4.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25555 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic4.png" alt="" width="1430" height="1897"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 4 : Amazon EFS as a persistent storage solution for extended clusters&lt;/p&gt; 
&lt;h4&gt;Benefits and use cases&lt;/h4&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Shared storage capabilities&lt;/strong&gt;: multiple pods can access a centralized file system, enabling shared data, code, and assets across instances.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: storage capacity and performance automatically scale with usage, eliminating manual provisioning and upfront planning.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Compliance&lt;/strong&gt;: Amazon EFS provides full file system features and compatibility for traditional applications, such as locking, permissions, and directory structure.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h4&gt;Challenges and limitations&lt;/h4&gt; 
&lt;p&gt;Consider the following when using Amazon EFS with Outposts:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Network latency&lt;/strong&gt;: file access involves network traversal to Amazon EFS in the Region, adding latency and making small or metadata-heavy operations potentially slow for latency-sensitive applications.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Throughput&lt;/strong&gt;: aggregate throughput is restricted by the available bandwidth on the service link between the Outposts and the AWS Region. This impacts concurrent access and large file transfers during peak usage.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Dependency on AWS Region connectivity&lt;/strong&gt;: Amazon EFS needs continuous connectivity to the parent Region. Disruptions may affect file system availability, operations, and disaster recovery processes.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data transfer charges&lt;/strong&gt;: because Amazon EFS resides in the parent AWS Region while the Amazon EKS worker nodes and pods run on the Outpost, additional data transfer charges apply.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;You can refer to&amp;nbsp;&lt;a href="https://aws.amazon.com/efs/features/" target="_blank" rel="noopener noreferrer"&gt;Amazon EFS Features&lt;/a&gt; and &lt;a href="https://aws.amazon.com/efs/when-to-choose-efs/" target="_blank" rel="noopener noreferrer"&gt;When to Choose Amazon EFS&lt;/a&gt;&amp;nbsp;for&amp;nbsp;more detailed insights into its capabilities and use cases.&lt;/p&gt; 
&lt;h4&gt;Deploying pods on extended clusters using Amazon EFS as PV&lt;/h4&gt; 
&lt;p&gt;Refer to &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html" target="_blank" rel="noopener noreferrer"&gt;Use Elastic File System Storage with Amazon EFS&lt;/a&gt; for deployment guidance. Note, Create Amazon EFS mount targets in subnets that are in the same Availability Zone (AZ) as the Outposts subnets.&lt;/p&gt; 
&lt;h3&gt;Amazon S3 with Amazon EKS extended cluster&lt;/h3&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service (Amazon S3)&lt;/a&gt;&amp;nbsp;on Outposts delivers local object storage on your Outposts, allowing applications to use Amazon S3 APIs for storing and retrieving data while keeping it onsite. It is ideal for workloads that need Amazon S3 compatibility, low latency access to object data, and local data residency.&lt;/p&gt; 
&lt;p&gt;You should use Amazon S3 access point &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Resource Names (ARNs)&lt;/a&gt; and not bucket ARNs for proper integration with Amazon EKS workloads.&lt;/p&gt; 
&lt;p&gt;Learn more about &lt;a href="https://aws.amazon.com/s3/outposts/" target="_blank" rel="noopener noreferrer"&gt;Amazon S3 on Outposts&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic5.png"&gt;&lt;img loading="lazy" class="aligncenter wp-image-25554 size-full" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/06/pic5.png" alt="" width="1368" height="1980"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 5 : Amazon S3 with Amazon EKS extended cluster on Outposts&lt;/p&gt; 
&lt;h4&gt;Benefits and use cases&lt;/h4&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Data archiving and compliance&lt;/strong&gt;: enables cost-effective, locally retained storage for logs, audit trails, regulatory compliance, backups, and sensitive healthcare data with strict residency requirements.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Content distribution and media&lt;/strong&gt;: provides ultra-low latency local storage for serving static content, media streaming, digital asset management, and gaming asset delivery.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data lake and analytics&lt;/strong&gt;: supports local data processing for analytics, ETL, machine learning (ML), real-time Internet of Things (IoT) data handling, and business intelligence with reduced latency and transfer costs.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Application integration&lt;/strong&gt;: seamlessly integrates with Amazon S3 compatible apps for backup, synchronization, microservices storage, API-driven workflows, and container image management on-premises.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Refer to&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/s3-outposts/S3OnOutpostsRestrictionsLimitations.html" target="_blank" rel="noopener noreferrer"&gt;How is Amazon S3 on Outposts different from Amazon S3&lt;/a&gt; and the&amp;nbsp;&lt;a href="https://aws.amazon.com/s3/storage-classes/#topic-6" target="_blank" rel="noopener noreferrer"&gt;Amazon S3 on Outposts&lt;/a&gt;&amp;nbsp;documentation to learn more.&lt;/p&gt; 
&lt;h4&gt;Deploying pods on extended clusters using Amazon S3 as PV&lt;/h4&gt; 
&lt;p&gt;&lt;strong&gt;Step 1: &lt;/strong&gt;Create an Amazon S3 on Outposts bucket&lt;br&gt; &lt;strong&gt;Step 2: &lt;/strong&gt;Create an Amazon S3 access point (necessary for Amazon EKS integration)&lt;br&gt; &lt;strong&gt;Step 3: &lt;/strong&gt;Configure IAM roles and policies&lt;br&gt; &lt;strong&gt;Step 4: &lt;/strong&gt;Install the Amazon S3 CSI driver&lt;br&gt; &lt;strong&gt;Step 5:&amp;nbsp;&lt;/strong&gt;Deploy your pod with the Amazon S3 volume attached&lt;br&gt; &lt;strong&gt;Step 6:&lt;/strong&gt; Complete the Amazon S3 configuration with Kubernetes&lt;/p&gt; 
&lt;p&gt;Refer to the documentation &lt;a href="https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/outpost_bucket.yaml" target="_blank" rel="noopener noreferrer"&gt;Static Provisioning on Outposts bucket&lt;/a&gt; for more details on Step 5.&lt;/p&gt; 
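&lt;p&gt;As a hedged sketch of the static provisioning manifest used in Step 5, the Mountpoint for Amazon S3 CSI driver takes the S3 on Outposts access point ARN, not the bucket ARN, as the bucket name. All values below are hypothetical placeholders; refer to the linked example for the authoritative Outposts options:&lt;/p&gt; 

```yaml
# Static provisioning with the Mountpoint for Amazon S3 CSI driver
# (s3.csi.aws.com). The access point ARN is a hypothetical placeholder.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-outposts-pv
spec:
  capacity:
    storage: 1200Gi           # required by Kubernetes, but ignored by the driver
  accessModes:
    - ReadWriteMany
  storageClassName: ""        # required for static provisioning
  mountOptions:
    - allow-delete
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume
    volumeAttributes:
      # S3 on Outposts integration requires the access point ARN, not the bucket ARN
      bucketName: arn:aws:s3-outposts:us-west-2:111122223333:outpost/op-01ac5d28a6a232904/accesspoint/my-access-point
```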
&lt;h2&gt;Best practices for optimizing performance&lt;/h2&gt; 
&lt;p&gt;Optimizing performance starts with selecting the right storage type for your workload: Amazon EBS for low-latency, high-throughput block storage; Amazon EFS for shared POSIX-compliant file systems; and Amazon S3 for scalable object storage with API compatibility. Ensure proper volume sizing, monitor usage proactively, and configure CPU and memory requests accurately to balance performance and efficiency—auto scaling and QoS classes can further optimize resource management. Improve data locality by using local storage, apply caching with intelligent eviction, and design for efficient, asynchronous, and compressed data access patterns.&lt;/p&gt; 
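&lt;p&gt;The resource-request guidance above can be expressed directly in a pod spec. The values here are illustrative only; setting requests equal to limits for every container places the pod in the Guaranteed QoS class:&lt;/p&gt; 

```yaml
# Illustrative CPU/memory requests and limits. When requests match limits
# for all containers, Kubernetes assigns the pod the Guaranteed QoS class.
apiVersion: v1
kind: Pod
metadata:
  name: storage-workload
spec:
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:stable
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "500m"
          memory: 512Mi
        limits:
          cpu: "500m"
          memory: 512Mi
```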
&lt;h2&gt;Monitoring and observability&lt;/h2&gt; 
&lt;p&gt;Monitoring key performance metrics is essential to maintain storage efficiency and application reliability. For Amazon EBS, track IOPS, throughput, latency, burst balance, queue depth, and snapshot performance to avoid degradation—see the &lt;a href="https://docs.aws.amazon.com/outposts/latest/userguide/outposts-cloudwatch-metrics.html#metrics-ebs" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch metrics for Amazon EBS&lt;/a&gt; for the full list. For Amazon EFS, monitor total I/O, throughput, client connections, metadata operations, burst credits, and Regional data transfers to support effective capacity planning—refer to &lt;a href="https://docs.aws.amazon.com/efs/latest/ug/efs-metrics.html" target="_blank" rel="noopener noreferrer"&gt;CloudWatch metrics for Amazon EFS&lt;/a&gt;. For Amazon S3, observe request and error rates, data transfer, storage usage, latency, multipart upload efficiency, and access patterns to optimize performance and cost—see &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html" target="_blank" rel="noopener noreferrer"&gt;Metrics and dimensions&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Security considerations&lt;/h2&gt; 
&lt;p&gt;Strong security practices are critical for Amazon EKS on Outposts. Use &lt;a href="https://aws.amazon.com/kms/" target="_blank" rel="noopener noreferrer"&gt;AWS Key Management Service (AWS KMS)&lt;/a&gt; for Amazon EBS encryption, encrypt Amazon EFS data at rest and in transit, and enable server- or client-side encryption for Amazon S3. Enforce TLS for all data transfers and apply key rotation with compliance controls. Implement least privilege IAM policies, scoped roles, and Kubernetes Role-Based Access Control (RBAC) for granular pod access. Secure traffic with security groups and NACLs, and maintain audit logs for all storage operations.&lt;/p&gt; 
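&lt;p&gt;For example, Amazon EBS encryption with AWS KMS can be enforced at provisioning time through the Amazon EBS CSI driver StorageClass parameters. This is a minimal sketch: the KMS key ARN is a hypothetical placeholder, and omitting &lt;code&gt;kmsKeyId&lt;/code&gt; falls back to the default &lt;code&gt;aws/ebs&lt;/code&gt; key:&lt;/p&gt; 

```yaml
# StorageClass for the Amazon EBS CSI driver (ebs.csi.aws.com) with
# encryption enabled. The kmsKeyId ARN is a hypothetical placeholder.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-sc-encrypted
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp2                   # gp2 is the EBS volume type available on Outposts
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab
```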
&lt;h2&gt;Cost optimization strategies&lt;/h2&gt; 
&lt;p&gt;Manage storage costs by right-sizing volumes, automating lifecycle policies, selecting appropriate storage classes, monitoring data transfer, and using de-duplication and compression where applicable. Lower operational expenses through automated backups, infrastructure as code (IaC), monitoring automation, leveraging managed services, applying cost allocation tags, and conducting regular usage reviews.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Amazon EKS on Outposts empowers organizations to build hybrid applications with storage options that align to performance, compliance, and data residency needs. By selecting the right storage solution for each workload and leveraging Outposts’ local infrastructure, you can reduce latency, minimize network dependencies, and maintain consistency across environments. As Outposts capabilities continue to evolve, they offer a strong foundation for modern, resilient, and cost-efficient hybrid cloud architectures.&lt;/p&gt; 
&lt;p&gt;Reach out to your AWS account team, or fill out this&amp;nbsp;&lt;a href="https://pages.awscloud.com/GLOBAL_PM_LN_outposts-features_2020084_7010z000001Lpcl_01.LandingPage.html" target="_blank" rel="noopener noreferrer"&gt;form&lt;/a&gt;&amp;nbsp;to learn more about running containerized applications on Outposts.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>.NET 10 runtime now available in AWS Lambda</title>
		<link>https://aws.amazon.com/blogs/compute/net-10-runtime-now-available-in-aws-lambda/</link>
					
		
		<dc:creator><![CDATA[Henrique Graca]]></dc:creator>
		<pubDate>Thu, 08 Jan 2026 21:01:05 +0000</pubDate>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[AWS .NET Development]]></category>
		<category><![CDATA[AWS CLI]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<guid isPermaLink="false">1c8c9d6ca1873a4a839fea208f2118142da92fc8</guid>

					<description>Amazon Web Services (AWS) Lambda now supports .NET 10 as both a managed runtime and base container image. .NET is a popular language for building serverless applications. Developers can now use the new features and enhancements in .NET when creating serverless applications on Lambda. This includes support for file-based apps to streamline your projects by implementing functions using just a single file.</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; now supports .NET 10 as both a managed runtime and base container image. .NET is a popular language for building serverless applications. Developers can now use the new features and enhancements in .NET when creating serverless applications on Lambda. This includes support for file-based apps to streamline your projects by implementing functions using just a single file.&lt;/p&gt; 
&lt;p&gt;.NET 10 delivers runtime and compiler optimizations including enhancements to the JIT compiler and improvements to Native AOT that reduce executable size and startup time. For details of the .NET 10 features, you can go to the &lt;a href="https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotnet-10/overview" target="_blank" rel="noopener noreferrer"&gt;.NET 10 overview&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;You can develop Lambda functions in .NET 10 using the &lt;a href="https://aws.amazon.com/console/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt;, &lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI),&lt;/a&gt; &lt;a href="https://aws.amazon.com/visualstudio/" target="_blank" rel="noopener noreferrer"&gt;AWS Toolkit for Visual Studio&lt;/a&gt;, &lt;a href="https://github.com/aws/aws-extensions-for-dotnet-cli" target="_blank" rel="noopener noreferrer"&gt; AWS Extensions for .NET CLI (Amazon.Lambda.Tools)&lt;/a&gt;, &lt;a href="https://aws.amazon.com/serverless/sam/" target="_blank" rel="noopener noreferrer"&gt;AWS Serverless Application Model (AWS SAM),&lt;/a&gt; &lt;a href="https://aws.amazon.com/cdk/" target="_blank" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (AWS CDK),&lt;/a&gt; and other infrastructure as code (IaC) tools.&lt;/p&gt; 
&lt;p&gt;You can also use .NET 10 with &lt;a href="https://docs.powertools.aws.dev/lambda/dotnet/" target="_blank" rel="noopener noreferrer"&gt;Powertools for AWS Lambda (.NET)&lt;/a&gt;, a developer toolkit that helps you implement serverless best practices. Use cases include observability, batch processing, &lt;a href="https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html" target="_blank" rel="noopener noreferrer"&gt;AWS Systems Manager Parameter Store&lt;/a&gt; integration, idempotency, and more.&lt;/p&gt; 
&lt;p&gt;This post demonstrates what’s new in the .NET 10 Lambda runtime and how you can use the new .NET 10 runtime in your serverless applications.&lt;/p&gt; 
&lt;h2&gt;File-based C# applications&lt;/h2&gt; 
&lt;p&gt;.NET 10 introduces file-based apps, which are programs contained in a single &lt;code&gt;.cs&lt;/code&gt; file, without a &lt;code&gt;.csproj&lt;/code&gt; file or a complex folder structure. File-based apps are an ideal way to streamline the development and management of .NET Lambda functions. They are fully supported by the Lambda .NET 10 runtime and associated developer tooling.&lt;/p&gt; 
&lt;h3&gt;Creating C# file-based apps&lt;/h3&gt; 
&lt;p&gt;The fastest way to get started creating a C# file-based Lambda function is to use the &lt;code&gt;&lt;a href="https://www.nuget.org/packages/Amazon.Lambda.Templates" target="_blank" rel="noopener noreferrer"&gt;Amazon.Lambda.Templates&lt;/a&gt;&lt;/code&gt; package. Version 8.0.1 of the package adds the &lt;code&gt;lambda.FileBased&lt;/code&gt; template and updates the rest of the templates in the package to .NET 10.&lt;/p&gt; 
&lt;p&gt;Install the package by running the following command:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;dotnet new install Amazon.Lambda.Templates&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;Create a new C# file-based Lambda function by running the following command:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;dotnet new lambda.FileBased -n MyLambdaFunction&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;This creates a file in the current directory called &lt;code&gt;MyLambdaFunction.cs&lt;/code&gt; with all of the required startup code necessary for a Lambda function. The following is the starting content of the file:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;// C# file-based Lambda functions can be deployed to Lambda using the 
// .NET Tool Amazon.Lambda.Tools version 6.0.0 or later. 
// 
// Command to install Amazon.Lambda.Tools 
//   dotnet tool install -g Amazon.Lambda.Tools 
// 
// Command to deploy function 
//    dotnet lambda deploy-function &amp;lt;lambda-function-name&amp;gt; MyLambdaFunction.cs 
// 
// Command to package function 
//    dotnet lambda package MyLambdaFunction.zip MyLambdaFunction.cs 
 
 
#:package Amazon.Lambda.Core@2.8.0 
#:package Amazon.Lambda.RuntimeSupport@1.14.1 
#:package Amazon.Lambda.Serialization.SystemTextJson@2.4.4 
 
// Explicitly setting TargetFramework here is done to avoid 
// having to specify it when packaging the function with Amazon.Lambda.Tools 
#:property TargetFramework=net10.0 
 
// By default, file-based C# apps publish as Native AOT. When packaging the Lambda function, 
// unless the host machine is Amazon Linux, a container build will be required. 
// Amazon.Lambda.Tools will automatically initiate a container build if Docker is installed. 
// Native AOT also requires the code and dependencies be Native AOT compatible. 
// 
// To disable Native AOT uncomment the following line to add the .NET build directive 
// that disables Native AOT. 
//#:property PublishAot=false 
 
using Amazon.Lambda.Core; 
using Amazon.Lambda.RuntimeSupport; 
using Amazon.Lambda.Serialization.SystemTextJson; 
using System.Text.Json.Serialization; 
 
// The function handler that will be called for each Lambda event 
var handler = (string input, ILambdaContext context) =&amp;gt; 
{ 
    return input.ToUpper(); 
}; 
 
// Build the Lambda runtime client passing in the handler to call for each 
// event and the JSON serializer to use for translating Lambda JSON documents 
// to .NET types. 
await LambdaBootstrapBuilder.Create(handler, new SourceGeneratorLambdaJsonSerializer&amp;lt;LambdaSerializerContext&amp;gt;()) 
        .Build() 
        .RunAsync(); 
 
// Since Native AOT is used by default with C# file-based Lambda functions the source generator 
// based Lambda serializer is used. Ensure the input type and return type used by the function 
// handler are registered on the JsonSerializerContext using the JsonSerializable attribute. 
[JsonSerializable(typeof(string))] 
public partial class LambdaSerializerContext : JsonSerializerContext 
{ 
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;File-based functions use &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/csharp-handler.html#csharp-executable-assembly-handlers" target="_blank" rel="noopener noreferrer"&gt;executable assembly handlers&lt;/a&gt;, in which the compiler generates the &lt;code&gt;Main()&lt;/code&gt; method containing your function code. Therefore, your code must include the &lt;code&gt;Amazon.Lambda.RuntimeSupport&lt;/code&gt; NuGet package and implement the &lt;code&gt;LambdaBootstrapBuilder.Create&lt;/code&gt; method to bootstrap the runtime client.&lt;/p&gt; 
&lt;p&gt;File-based applications also use .NET Native AOT by default. You can disable Native AOT by adding &lt;code&gt;#:property PublishAot=false&lt;/code&gt; to the top of the file. For more information on using Native AOT in Lambda, go to &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/dotnet-native-aot.html" target="_blank" rel="noopener noreferrer"&gt;Compile .NET Lambda function code to a native runtime format&lt;/a&gt; in the Lambda documentation.&lt;/p&gt; 
&lt;h3&gt;Deploying C# file-based apps&lt;/h3&gt; 
&lt;p&gt;To deploy your function using the dotnet CLI with the &lt;code&gt;Amazon.Lambda.Tools&lt;/code&gt; extension, pass the .cs filename as an added argument. Native AOT is enabled by default, thus the build must match the target architecture. If you’re building on the same architecture as the target Lambda function and on &lt;a href="https://aws.amazon.com/linux/amazon-linux-2023/" target="_blank" rel="noopener noreferrer"&gt;Amazon Linux 2023,&lt;/a&gt; then the build runs natively. Otherwise, &lt;code&gt;Amazon.Lambda.Tools&lt;/code&gt; uses a Docker container to build the function.&lt;/p&gt; 
&lt;p&gt;For example, to deploy for x86_64 (default architecture):&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;
dotnet lambda deploy-function ToUpper ToUpper.cs \
  --function-runtime dotnet10 --function-role &amp;lt;role-arn&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Alternatively, to deploy for arm64:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;
dotnet lambda deploy-function ToUpper ToUpper.cs \
  --function-runtime dotnet10 --function-role &amp;lt;role-arn&amp;gt; --function-architecture arm64&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Debugging C# file-based apps&lt;/h3&gt; 
&lt;p&gt;Visual Studio Code with the C# Dev Kit supports debugging C# file-based applications.&lt;/p&gt; 
&lt;p&gt;1. Install the test tool.&lt;br&gt; &lt;code&gt;dotnet tool install -g Amazon.Lambda.TestTool&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;2. Start the emulator.&lt;br&gt; &lt;code&gt;dotnet lambda-test-tool start --lambda-emulator-port 5050&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;3. Configure &lt;code&gt;.vscode/launch.json&lt;/code&gt; to attach to the process.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "LambdaDebugFile",
      "type": "coreclr",
      "request": "launch",
      "program": "${fileDirname}/artifacts/Debug/${fileBasenameNoExtension}.dll",
      "cwd": "${workspaceFolder}",
      "console": "internalConsole",
      "stopAtEntry": false,
      "env": {
        "AWS_LAMBDA_RUNTIME_API": "localhost:5050/${fileBasenameNoExtension}"
      },
      "preLaunchTask": "build-active-file"
    }
  ]
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;4. Configure &lt;code&gt;.vscode/tasks.json&lt;/code&gt; to build the active file.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "build-active-file",
      "command": "dotnet",
      "type": "process",
      "args": [
        "build",
        "${file}",
        "--output",
        "./artifacts/Debug"
      ],
      "problemMatcher": "$msCompile"
    }
  ]
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The configuration uses &lt;code&gt;${file}&lt;/code&gt;, which allows the build task to target whichever C# file is currently active in your editor. This enables seamless debugging across multiple single-file functions.&lt;/p&gt; 
&lt;h2&gt;Lambda Managed Instances&lt;/h2&gt; 
&lt;p&gt;The Lambda runtime for .NET 10 includes support for &lt;a href="https://aws.amazon.com/blogs/aws/introducing-aws-lambda-managed-instances-serverless-simplicity-with-ec2-flexibility/" target="_blank" rel="noopener noreferrer"&gt;Lambda Managed Instances&lt;/a&gt;, so that you can run .NET 10 Lambda functions on &lt;a href="https://aws.amazon.com/ec2/graviton/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt; instances while maintaining serverless operational simplicity. Therefore, you can use current-generation EC2 instances, including Graviton4, network-optimized instances, and other specialized compute options, without managing instance lifecycles, operating system patching, or scaling policies. Lambda Managed Instances provides access to Amazon EC2 commitment-based pricing models, such as &lt;a href="https://aws.amazon.com/savingsplans/compute-pricing/" target="_blank" rel="noopener noreferrer"&gt;Compute Savings Plans&lt;/a&gt; and &lt;a href="https://aws.amazon.com/ec2/pricing/reserved-instances/" target="_blank" rel="noopener noreferrer"&gt;Reserved Instances&lt;/a&gt;, which can provide up to a 72% discount over &lt;a href="https://aws.amazon.com/ec2/pricing/on-demand/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2 On-Demand pricing&lt;/a&gt;. This offers significant cost savings for steady-state workloads while maintaining the familiar Lambda programming model. For more information, go to &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" target="_blank" rel="noopener noreferrer"&gt;Lambda Managed Instances&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;With Lambda Managed Instances, each function execution environment can process multiple function invokes at the same time. In .NET, Lambda uses .NET Tasks for asynchronous processing of multiple concurrent requests. You should apply the same concurrency safety practices when using Lambda Managed Instances that you would in any other multi-concurrent environment. For example, any mutable state—including shared collections, database connections, and static objects—must be thread safe. For more information, go to &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-dotnet-runtime.html" target="_blank" rel="noopener noreferrer"&gt;.NET runtime for Lambda Managed Instances&lt;/a&gt;.&lt;/p&gt; 
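&lt;p&gt;As a brief illustration of that guidance, the following hypothetical handler keeps a counter that is shared across concurrent invokes in the same execution environment and updates it atomically rather than with a plain increment. This is a minimal sketch, not part of the .NET 10 release itself:&lt;/p&gt; 

```csharp
using System.Threading;
using Amazon.Lambda.Core;

public class Function
{
    // Shared by every concurrent invoke that lands on the same
    // execution environment, so it must be updated atomically.
    private static long _invocationCount;

    public string FunctionHandler(string input, ILambdaContext context)
    {
        // Interlocked.Increment is atomic; a plain _invocationCount++
        // could lose updates under concurrent invokes.
        long count = Interlocked.Increment(ref _invocationCount);
        return $"{input}:{count}";
    }
}
```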
&lt;h2&gt;Performance considerations&lt;/h2&gt; 
&lt;p&gt;At launch, new Lambda runtimes receive less usage than existing established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. Therefore, AWS recommends not drawing conclusions from side-by-side performance comparisons with other Lambda runtimes until the performance has stabilized.&lt;/p&gt; 
&lt;p&gt;Performance is highly dependent on workload, thus you should conduct your own testing instead of relying on generic test benchmarks. There are a range of features available to reduce the impact of cold starts for Lambda functions that use .NET 10, including &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html" target="_blank" rel="noopener noreferrer"&gt;SnapStart&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html" target="_blank" rel="noopener noreferrer"&gt;provisioned concurrency&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/dotnet-native-aot.html" target="_blank" rel="noopener noreferrer"&gt;Native AOT&lt;/a&gt;, and &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" target="_blank" rel="noopener noreferrer"&gt;Lambda Managed Instances&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;In addition, .NET 10 Lambda workloads might experience slightly lower performance until some added runtime enhancements are released. Go to &lt;a href="https://github.com/dotnet/runtime/issues/120288#issuecomment-2436423945" target="_blank" rel="noopener noreferrer"&gt;dotnet/runtime#120288&lt;/a&gt; for details.&lt;/p&gt; 
&lt;h2&gt;Migrating from .NET 8 to .NET 10&lt;/h2&gt; 
&lt;p&gt;To use .NET 10 and the new file-based features, you must update your tools.&lt;/p&gt; 
&lt;p&gt;1. Install or update the .NET 10 SDK.&lt;/p&gt; 
&lt;p&gt;2. If you are using AWS SAM, then install or update to the latest version.&lt;/p&gt; 
&lt;p&gt;3. If you are using Visual Studio, then install or update the AWS Toolkit for Visual Studio to version 1.83.0.0 or later.&lt;/p&gt; 
&lt;p&gt;4. If you use the &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/csharp-package-cli.html" target="_blank" rel="noopener noreferrer"&gt;.NET Lambda Global Tools extension&lt;/a&gt; (&lt;code&gt;Amazon.Lambda.Tools&lt;/code&gt;) for the .NET CLI, then update to version 6.0.0 or later to support file-based C#.&lt;/p&gt; 
&lt;p&gt;To upgrade a function to .NET 10, check your code and dependencies for compatibility with .NET 10, run tests, and update as necessary. Generative AI can help: consider using &lt;a href="https://aws.amazon.com/transform/custom/" target="_blank" rel="noopener noreferrer"&gt;AWS Transform custom&lt;/a&gt; or coding assistants such as &lt;a href="https://kiro.dev/" target="_blank" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt; to help with upgrades.&lt;/p&gt; 
&lt;h3&gt;Upgrading using the dotnet CLI&lt;/h3&gt; 
&lt;p&gt;For projects using the dotnet CLI with the &lt;code&gt;Amazon.Lambda.Tools&lt;/code&gt; extension:&lt;/p&gt; 
&lt;p&gt;1. Open the &lt;code&gt;.csproj&lt;/code&gt; project file.&lt;/p&gt; 
&lt;p&gt;2. Update the &lt;code&gt;TargetFramework&lt;/code&gt; to &lt;code&gt;net10.0&lt;/code&gt;.&lt;/p&gt; 
&lt;p&gt;3. Update NuGet packages &lt;code&gt;Amazon.Lambda.*&lt;/code&gt; to the latest versions.&lt;/p&gt; 
&lt;p&gt;4. If using &lt;code&gt;aws-lambda-tools-defaults.json&lt;/code&gt;, then set &lt;code&gt;function-runtime&lt;/code&gt; to &lt;code&gt;dotnet10&lt;/code&gt;.&lt;/p&gt; 
&lt;p&gt;5. Run &lt;code&gt;dotnet lambda deploy-function&lt;/code&gt; to deploy.&lt;/p&gt; 
&lt;h3&gt;Upgrading using the AWS Toolkit for Visual Studio&lt;/h3&gt; 
&lt;p&gt;To upgrade a function to .NET 10:&lt;/p&gt; 
&lt;p&gt;1. Open the &lt;code&gt;.csproj&lt;/code&gt; project file and update the &lt;code&gt;TargetFramework&lt;/code&gt; to &lt;code&gt;net10.0&lt;/code&gt;.&lt;/p&gt; 
&lt;p&gt;2. Update NuGet packages to the latest versions.&lt;/p&gt; 
&lt;p&gt;3. Right-click the project in &lt;strong&gt;Solution Explorer&lt;/strong&gt; and choose &lt;strong&gt;Publish to AWS Lambda&lt;/strong&gt;.&lt;/p&gt; 
&lt;h3&gt;Upgrading container image functions&lt;/h3&gt; 
&lt;p&gt;In addition to the preceding changes, note that the .NET 8 and .NET 10 runtimes are built on the provided.al2023 runtime, which is based on the Amazon Linux 2023 minimal container image. The Amazon Linux 2023 minimal image uses &lt;code&gt;microdnf&lt;/code&gt; as its package manager, symlinked as &lt;code&gt;dnf&lt;/code&gt;. This replaces the &lt;code&gt;yum&lt;/code&gt; package manager used in the .NET 6 and earlier Amazon Linux 2-based images. If you deploy your Lambda functions as container images, then you must update your Dockerfiles to use &lt;code&gt;dnf&lt;/code&gt; instead of &lt;code&gt;yum&lt;/code&gt; when upgrading from .NET 6 or earlier base images to the .NET 10 base image.&lt;/p&gt; 
&lt;p&gt;Learn more about the provided.al2023 runtime in the post &lt;a href="https://aws.amazon.com/blogs/compute/introducing-the-amazon-linux-2023-runtime-for-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Introducing the Amazon Linux 2023 runtime for AWS Lambda&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Using the .NET 10 runtime in Lambda&lt;/h2&gt; 
&lt;p&gt;The following sections demonstrate how to use the .NET 10 runtime in Lambda.&lt;/p&gt; 
&lt;h3&gt;The console&lt;/h3&gt; 
&lt;p&gt;On the &lt;strong&gt;Create Function&lt;/strong&gt; page of the &lt;a href="https://console.aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;Lambda console&lt;/a&gt;, choose &lt;strong&gt;.NET 10&lt;/strong&gt; in the &lt;strong&gt;Runtime&lt;/strong&gt; dropdown menu, as shown in the following figure.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/08/computeblog-2512-image-1.png"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 1: Creating a .NET 10 function in the Lambda console&lt;/p&gt; 
&lt;p&gt;To update an existing Lambda function to .NET 10, navigate to the function in the Lambda console. Choose &lt;strong&gt;Edit&lt;/strong&gt; in the &lt;strong&gt;Runtime settings&lt;/strong&gt; panel, then choose &lt;strong&gt;.NET 10&lt;/strong&gt; from the &lt;strong&gt;Runtime&lt;/strong&gt; dropdown menu, as shown in the following figure.&lt;/p&gt; 
&lt;p&gt;&lt;img src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2026/01/08/computeblog-2512-image-2.png"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 2: Editing runtime settings to choose .NET 10&lt;/p&gt; 
&lt;h3&gt;Lambda container image&lt;/h3&gt; 
&lt;p&gt;Change the .NET base image version by modifying the &lt;code&gt;FROM&lt;/code&gt; statement in your Dockerfile:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;FROM public.ecr.aws/lambda/dotnet:10
# Copy function code
COPY artifacts/publish/ ${LAMBDA_TASK_ROOT}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;AWS SAM&lt;/h3&gt; 
&lt;p&gt;In AWS SAM, set the &lt;code&gt;Runtime&lt;/code&gt; attribute to &lt;strong&gt;dotnet10&lt;/strong&gt;:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-ruby"&gt;AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Simple Lambda Function
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: My .NET Lambda Function
      CodeUri: ./src/MyFunction/
      Handler: MyFunction::MyFunction.Function::FunctionHandler
      Runtime: dotnet10&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;For file-based functions, set &lt;code&gt;CodeUri&lt;/code&gt; to the C# file path relative to the AWS SAM template. The &lt;code&gt;Amazon.Lambda.Tools&lt;/code&gt; commands that deploy through &lt;a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt;, such as &lt;code&gt;deploy-serverless&lt;/code&gt; and &lt;code&gt;package-ci&lt;/code&gt;, package the file-based Lambda function as a .NET executable. The &lt;code&gt;Handler&lt;/code&gt; field must be set to the .NET assembly name, which for file-based C# applications is the filename minus the &lt;code&gt;.cs&lt;/code&gt; extension:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-ruby"&gt;ToUpperFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: ToUpperFunction
    Runtime: dotnet10
    CodeUri: ./ToUpperFunction.cs
    MemorySize: 512
    Timeout: 30&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;AWS SAM supports generating new serverless .NET 10 applications using the &lt;code&gt;sam init&lt;/code&gt; command. Refer to the &lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-cli-command-reference-sam-init.html" target="_blank" rel="noopener noreferrer"&gt;AWS SAM documentation&lt;/a&gt; for details.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;AWS Lambda now supports .NET 10 as a managed language runtime to help developers build more efficient, powerful, and scalable serverless applications. .NET 10 language additions include C# 14 features, runtime optimizations, and improved Native AOT support. This release also introduces file-based C# applications for streamlined single-file Lambda functions, and it includes support for Lambda Managed Instances for specialized compute requirements and at-scale cost efficiency.&lt;/p&gt; 
&lt;p&gt;You can build and deploy .NET 10 functions using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of IaC tool. You can also use the .NET 10 container base image if you prefer to build and deploy your functions as container images.&lt;/p&gt; 
&lt;p&gt;Try the .NET 10 runtime in Lambda today and experience the benefits of this updated language version.&lt;/p&gt; 
&lt;p&gt;To find more .NET examples, use the &lt;a href="https://serverlessland.com/patterns?language=.NET" target="_blank" rel="noopener noreferrer"&gt;Serverless Patterns Collection&lt;/a&gt;. For more serverless learning resources, visit &lt;a href="https://serverlessland.com/" target="_blank" rel="noopener noreferrer"&gt;Serverless Land&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Building zero trust generative AI applications in healthcare with AWS Nitro Enclaves</title>
		<link>https://aws.amazon.com/blogs/compute/building-zero-trust-generative-ai-applications-in-healthcare-with-aws-nitro-enclaves/</link>
					
		
		<dc:creator><![CDATA[Nathan Pogue]]></dc:creator>
		<pubDate>Fri, 12 Dec 2025 19:06:03 +0000</pubDate>
				<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[Customer Solutions]]></category>
		<category><![CDATA[Expert (400)]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Healthcare]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">642fc064167586b1ca1fbd39d63e790431c546f7</guid>

					<description>In healthcare, generative AI is transforming how 
&lt;a href="https://aws.amazon.com/blogs/publicsector/how-healthcare-organizations-use-generative-ai-on-aws-to-turn-data-into-better-patient-outcomes/" target="_blank" rel="noopener noreferrer"&gt;medical professionals analyze data&lt;/a&gt;, 
&lt;a href="https://aws.amazon.com/blogs/industries/netsmart-transforms-behavioral-healthcare-with-new-feature-of-aws-healthscribe/" target="_blank" rel="noopener noreferrer"&gt;summarize clinical notes&lt;/a&gt;, and 
&lt;a href="https://aws.amazon.com/solutions/case-studies/generative-ai-ekacare/" target="_blank" rel="noopener noreferrer"&gt;generate insights to improve patient outcomes&lt;/a&gt;. From 
&lt;a href="https://aws.amazon.com/solutions/case-studies/one-medical-case-study/" target="_blank" rel="noopener noreferrer"&gt;automating medical documentation&lt;/a&gt; to assisting in 
&lt;a href="https://www.gehealthcare.com/insights/article/ge-healthcare-unveils-firstofitskind-mri-foundation-model" target="_blank" rel="noopener noreferrer"&gt;diagnostic reasoning&lt;/a&gt;, large language models (LLMs) have the potential to augment clinical workflows and accelerate research. However, these innovations also introduce significant privacy, security, and intellectual property challenges.</description>
										<content:encoded>&lt;p&gt;In healthcare, generative AI is transforming how &lt;a href="https://aws.amazon.com/blogs/publicsector/how-healthcare-organizations-use-generative-ai-on-aws-to-turn-data-into-better-patient-outcomes/" target="_blank" rel="noopener noreferrer"&gt;medical professionals analyze data&lt;/a&gt;, &lt;a href="https://aws.amazon.com/blogs/industries/netsmart-transforms-behavioral-healthcare-with-new-feature-of-aws-healthscribe/" target="_blank" rel="noopener noreferrer"&gt;summarize clinical notes&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/solutions/case-studies/generative-ai-ekacare/" target="_blank" rel="noopener noreferrer"&gt;generate insights to improve patient outcomes&lt;/a&gt;. From &lt;a href="https://aws.amazon.com/solutions/case-studies/one-medical-case-study/" target="_blank" rel="noopener noreferrer"&gt;automating medical documentation&lt;/a&gt; to assisting in &lt;a href="https://www.gehealthcare.com/insights/article/ge-healthcare-unveils-firstofitskind-mri-foundation-model" target="_blank" rel="noopener noreferrer"&gt;diagnostic reasoning&lt;/a&gt;, large language models (LLMs) have the potential to augment clinical workflows and accelerate research. However, these innovations also introduce significant privacy, security, and intellectual property challenges.&lt;/p&gt; 
&lt;p&gt;Healthcare data often contains Protected Health Information (PHI), which is &lt;a href="https://aws.amazon.com/health/healthcare-compliance/" target="_blank" rel="noopener noreferrer"&gt;governed by strict regulations and compliance frameworks&lt;/a&gt;. At the same time, organizations or researchers who have invested substantial time and compute resources into training medical LLMs must protect their proprietary model architectures, weights, and fine-tuned datasets. Traditional deployment models necessitate mutual trust between the model publisher and the healthcare data provider — trust that sensitive data won’t be leaked, and that the model itself won’t be copied, tampered with, or exfiltrated. The absence of a secure and verifiable trust model between model publishers and consumers remains &lt;a href="https://www.sciencedirect.com/science/article/pii/S138650562400443X" target="_blank" rel="noopener noreferrer"&gt;one of the main barriers to scaling generative AI in regulated medical environments&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;To address this concern, both parties need a secure environment to publish and consume models without exposing data or intellectual property. Amazon Web Services &lt;a href="https://aws.amazon.com/ec2/nitro/nitro-enclaves/" target="_blank" rel="noopener noreferrer"&gt;(AWS) Nitro Enclaves&lt;/a&gt; provide isolated, attested, and cryptographically verified compute environments that help protect sensitive workloads. Model owners can encrypt their LLMs with &lt;a href="https://aws.amazon.com/kms/" target="_blank" rel="noopener noreferrer"&gt;AWS Key Management Service (AWS KMS)&lt;/a&gt; and allow only verified Nitro Enclaves to decrypt and run them, making sure that the model can’t be accessed outside the Nitro Enclave. Healthcare organizations and consumers can use this to process sensitive data within their own AWS environment entirely within the Nitro Enclave, helping keep PHI private and contained. Hardware-based attestation provides proof that the Nitro Enclave is running trusted code, so that both sides can exchange information with confidence.&lt;/p&gt; 
&lt;p&gt;In this post, we demonstrate how to deploy a publicly available foundational model (FM) using Nitro Enclaves for isolated, more secure compute, AWS KMS for model encryption, &lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service (Amazon S3)&lt;/a&gt; for storing model artifacts and images, and &lt;a href="https://aws.amazon.com/sqs/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Queue Service (Amazon SQS)&lt;/a&gt; for securely delivering queries, enabling private, privacy-preserving inferences while helping protect both model intellectual property and sensitive patient data.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;This solution outlines how to build a more secure end-to-end pipeline that enables &lt;a href="https://aws.amazon.com/security/zero-trust/" target="_blank" rel="noopener noreferrer"&gt;zero trust&lt;/a&gt; medical LLM publication and inference&amp;nbsp;with Nitro Enclaves. This post walks through setting up an &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt; instance with Nitro Enclaves enabled, downloading and encrypting a publicly available FM to an S3 bucket with an AWS KMS key, sending medical text and image-based queries to an SQS queue for processing, and storing results in an &lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt; table.&lt;/p&gt; 
&lt;p&gt;This project is intended solely for educational and demonstration purposes and isn’t suitable for production or clinical use. Its outputs aren’t validated for clinical accuracy and must not be used for patient care or medical decision-making. Before any real-world deployment, make sure that you implement comprehensive security, privacy, and compliance safeguards. These include health data protection controls, secrets management, and regulatory validation. Furthermore, you must consult the appropriate clinical, legal, and security experts.&lt;/p&gt; 
&lt;p&gt;For demonstration purposes, this solution is deployed in a single AWS account. Ideally, in production, it would be deployed across separate AWS accounts: one for the model owner and one for the model consumer. The model owner can use cross-account &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; permissions and encrypted model sharing through AWS KMS to securely provide access to their model without exposing the underlying weights or logic. At the same time, the consumer can run sensitive inferences within their own environment, maintaining strict data privacy and zero trust principles. In a real-world implementation, the model provider should also establish a robust entitlement and licensing framework to manage customer access, enabling fine-grained control over who can invoke the model, track usage, and support license revocation to immediately remove permissions from specific customers when necessary.&lt;/p&gt; 
&lt;p&gt;The following diagram shows the solution architecture:&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/11/architecture.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25546" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/11/architecture.png" alt="Scope of solution" width="587" height="395"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;The steps of the solution include:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon EC2 setup:&amp;nbsp;&lt;/strong&gt;An EC2 instance is launched with Nitro Enclaves and Trusted Platform Module (TPM) enabled. For this project, a &lt;a href="https://aws.amazon.com/ec2/instance-types/c7i/" target="_blank" rel="noopener noreferrer"&gt;c7i.12xlarge instance&lt;/a&gt; with a &lt;a href="https://aws.amazon.com/ebs/" target="_blank" rel="noopener noreferrer"&gt;150 GB Amazon&amp;nbsp;EBS volume&lt;/a&gt; is used to provide the necessary compute resources for running LLMs.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Public FM download: &lt;/strong&gt;A publicly available FM is retrieved from &lt;a href="https://huggingface.co/" target="_blank" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; and stored in an S3 bucket within the model consumer’s AWS account.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Model encryption:&amp;nbsp;&lt;/strong&gt;The model is encrypted using AWS KMS envelope encryption. Only a Nitro Enclave presenting a &lt;a href="https://docs.aws.amazon.com/enclaves/latest/user/set-up-attestation.html" target="_blank" rel="noopener noreferrer"&gt;valid attestation document&lt;/a&gt; can request the decryption key from AWS KMS, which helps prevent unauthorized access to the model weights outside the Nitro Enclave.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Nitro Enclave setup:&amp;nbsp;&lt;/strong&gt;A &lt;a href="https://www.docker.com/" target="_blank" rel="noopener noreferrer"&gt;Docker image&lt;/a&gt; containing the &lt;a href="https://github.com/ggml-org/llama.cpp" target="_blank" rel="noopener noreferrer"&gt;llama.cpp inference runtime&lt;/a&gt; is built and deployed inside the Nitro Enclave.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Model decryption and setup:&amp;nbsp;&lt;/strong&gt;When the Nitro Enclave launches, it requests decryption of the model artifacts using its attestation credentials. The model is then decrypted in the memory of the Nitro Enclave and loaded by the llama.cpp server, so the decrypted model weights are never visible outside the Nitro Enclave boundary.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Medical query:&amp;nbsp;&lt;/strong&gt;Users can submit either text or image-based queries to the model. Queries are sent through &lt;a href="https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave-concepts.html#term-socket" target="_blank" rel="noopener noreferrer"&gt;vsock&lt;/a&gt;, a secure communication channel from the client application to the model server inside the Nitro Enclave. For image queries, users upload images to an S3 bucket. The upload event sends a message to an SQS queue, which signals the Amazon EC2 parent instance to fetch the image and send it to the Nitro Enclave for the medical LLM to process with its multimodal capabilities.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Message history:&lt;/strong&gt;&amp;nbsp;Each interaction, including the user’s prompt and the model’s response, is logged to&amp;nbsp;a DynamoDB table. This provides a persistent conversation history that enables traceability and auditing while keeping PHI securely stored within the consumer’s account. If necessary, the &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/encryption.tutorial.html" target="_blank" rel="noopener noreferrer"&gt;DynamoDB table can be encrypted&lt;/a&gt; and sealed for another layer of security and privacy.&lt;/li&gt; 
&lt;/ol&gt; 
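&lt;p&gt;Because vsock is a raw byte stream, the client and the server inside the Nitro Enclave need an agreed framing for each query. The following Python sketch shows one common pattern, length-prefixed JSON messages; the field names &lt;code&gt;prompt&lt;/code&gt; and &lt;code&gt;image&lt;/code&gt; are illustrative assumptions, not the exact schema used by the sample repository.&lt;/p&gt;

```python
import json
import struct

def frame_query(prompt, image_b64=None):
    """Serialize a query as JSON and prefix it with a 4-byte big-endian
    length so the enclave-side server knows how many bytes to read."""
    payload = json.dumps({"prompt": prompt, "image": image_b64}).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload

def unframe_query(data):
    """Inverse of frame_query: read the length prefix, then parse the body."""
    (length,) = struct.unpack("!I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))
```

&lt;p&gt;The framing is transport-agnostic, so the same functions work unchanged whether the bytes travel over an &lt;code&gt;AF_VSOCK&lt;/code&gt; socket or an ordinary TCP socket during local testing.&lt;/p&gt;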
&lt;h2&gt;About Google MedGemma 4B&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://deepmind.google/models/gemma/medgemma/" target="_blank" rel="noopener noreferrer"&gt;Google MedGemma&lt;/a&gt; is a family of medically-optimized LLMs built on &lt;a href="https://deepmind.google/models/gemma/gemma-3/" target="_blank" rel="noopener noreferrer"&gt;Gemma 3&lt;/a&gt;, with 4B and 27B parameter variants supporting both &lt;a href="https://developers.google.com/health-ai-developer-foundations/medgemma/model-card#description" target="_blank" rel="noopener noreferrer"&gt;text and multimodal versions for medical image inputs&lt;/a&gt;. The 4B model offers efficiency and strong performance for multimodal tasks such as report generation and medical Q&amp;amp;A, while the 27B models excel at more demanding scenarios, such as electronic health record interpretation and complex longitudinal data analysis.&lt;/p&gt; 
&lt;p&gt;MedGemma models are well-suited for automated radiology report generation, clinical triage and documentation, patient education, medical image pre-interpretation, and medical education systems. The 4B model is ideal for portable or resource-constrained deployments, whereas the 27B multimodal model delivers maximum performance.&lt;/p&gt; 
&lt;p&gt;In this project, MedGemma 4B serves as a reference medical LLM, showing how domain-adapted fine-tuning can enhance a model’s ability to interpret, reason about, and respond to complex medical queries. It also provides a foundation for exploring the safe and effective use of LLMs in healthcare applications, while being securely deployed within a Nitro Enclave.&amp;nbsp;However, you can choose to deploy your own medical FM if needed. For a deeper overview of the 4B model, refer to &lt;a href="https://huggingface.co/google/medgemma-4b-it" target="_blank" rel="noopener noreferrer"&gt;the model card&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;To implement the proposed solution, make sure that you have the following:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;The &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI)&lt;/a&gt; installed on your machine to create the EC2 instance.&lt;/li&gt; 
 &lt;li&gt;AWS permissions with access to EC2 c7i.12xlarge instances and Nitro Enclaves.&lt;/li&gt; 
 &lt;li&gt;Knowledge of Amazon S3, AWS KMS, Amazon SQS, &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;, and DynamoDB.&lt;/li&gt; 
 &lt;li&gt;Basic knowledge of Nitro Enclaves and healthcare data security.&lt;/li&gt; 
 &lt;li&gt;The &lt;a href="https://github.com/aws-samples/sample-for-secure-medical-llm-inference-with-nitro-enclaves" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; cloned to your local machine.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Environment setup&lt;/h2&gt; 
&lt;p&gt;The following sections outline how to set up your environment for this solution.&lt;/p&gt; 
&lt;h3&gt;Create S3 buckets&lt;/h3&gt; 
&lt;p&gt;In this solution, you create two S3 buckets: one for the model artifacts and one for the image inputs.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To create the S3 buckets&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Sign in to the Amazon S3 console, choose &lt;strong&gt;Create bucket&lt;/strong&gt;, and follow the prompts to &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html" target="_blank" rel="noopener noreferrer"&gt;create a new S3 bucket&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;For the model artifact bucket, give it a unique name (for example &lt;code&gt;AWSACCOUNTNUMBER-medgemma-model&lt;/code&gt;) in the same &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/" target="_blank" rel="noopener noreferrer"&gt;Region&lt;/a&gt; you use for the other project resources.&lt;/li&gt; 
 &lt;li&gt;Repeat the same process for the image bucket&amp;nbsp;(for example &lt;code&gt;AWSACCOUNTNUMBER-medgemma-image-inputs&lt;/code&gt;).&lt;/li&gt; 
 &lt;li&gt;Update the &lt;code&gt;S3_BUCKET_NAME&lt;/code&gt; variable with your &lt;strong&gt;model bucket name&lt;/strong&gt; in &lt;code&gt;envelope_encrypt_model.sh&lt;/code&gt; and &lt;code&gt;run.sh&lt;/code&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Create an SQS queue&lt;/h3&gt; 
&lt;p&gt;When images are uploaded to the S3 image bucket, a message is sent to an SQS queue so that the model running in the Nitro Enclave can process each image in turn.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To create an SQS queue&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Sign in to the Amazon SQS console, choose &lt;strong&gt;Create queue&lt;/strong&gt;, and follow the prompts to &lt;a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/creating-sqs-standard-queues.html" target="_blank" rel="noopener noreferrer"&gt;create a new SQS queue&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Standard Queue&lt;/strong&gt;, provide a name, leave the rest as default, and choose &lt;strong&gt;Create queue&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Replace the &lt;code&gt;SQS_QUEUE_URL&lt;/code&gt; variable in &lt;code&gt;image_processor.py&lt;/code&gt; and &lt;code&gt;lambda_function.py&lt;/code&gt; (in the &lt;code&gt;client&lt;/code&gt; and &lt;code&gt;assets&lt;/code&gt; folder, respectively) with your URL.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Create a Lambda function&lt;/h3&gt; 
&lt;p&gt;For image-based queries, MedGemma 4B expects images encoded in base64 format to be passed in the prompt. To convert the images to this format, a Lambda function is invoked using an Amazon S3 trigger when an image is uploaded to the bucket.&lt;/p&gt; 
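&lt;p&gt;The core of that conversion can be sketched in a few lines of Python; the message field names below are illustrative assumptions rather than the exact schema used by the sample’s &lt;code&gt;lambda_function.py&lt;/code&gt;.&lt;/p&gt;

```python
import base64
import json

def build_image_message(image_bytes, prompt):
    """Base64-encode raw image bytes and package them with a text prompt,
    the general shape a multimodal MedGemma prompt expects."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    # Field names here are illustrative, not the sample's exact schema.
    return json.dumps({"prompt": prompt, "image_base64": image_b64})
```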
&lt;p&gt;&lt;strong&gt;To create a Lambda function&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Sign in to the Lambda console, choose &lt;strong&gt;Create function&lt;/strong&gt;, and follow the prompts to &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/getting-started.html" target="_blank" rel="noopener noreferrer"&gt;create a new Lambda function from scratch&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose a name, choose a Python runtime (for example Python 3.13), and paste in the Lambda function code from the &lt;code&gt;assets&lt;/code&gt; folder.&lt;/li&gt; 
 &lt;li&gt;Next, &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/permissions-executionrole-update.html" target="_blank" rel="noopener noreferrer"&gt;update the Lambda function’s IAM role&lt;/a&gt; in&amp;nbsp;&lt;strong&gt;Permissions&lt;/strong&gt;&amp;nbsp;under the &lt;strong&gt;Configuration&lt;/strong&gt; tab with inline policies granting access to the S3 image bucket and the SQS queue that you created. Attach the following policies: 
  &lt;ol type="a"&gt; 
   &lt;li&gt;Amazon S3 policy: 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-json"&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement1",
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::&amp;lt;IMAGE_BUCKET_NAME&amp;gt;",
        "arn:aws:s3:::&amp;lt;IMAGE_BUCKET_NAME&amp;gt;/*"
      ]
    }
  ]
}&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/li&gt; 
   &lt;li&gt;Amazon SQS policy: 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-json"&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "sqs:ListQueues",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "sqs:*",
      "Resource": "arn:aws:sqs:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_NUMBER&amp;gt;:&amp;lt;QUEUE_NAME&amp;gt;"
    }
  ]
}&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;Finally, in the Lambda console, add a trigger, choose Amazon S3, and choose your image bucket. You should see the following when the trigger is enabled.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/image-3-5.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25494" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/image-3-5.png" alt="AWS Lambda configuration interface showing S3 bucket trigger setup for medical image processing workflow" width="737" height="334"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;Create a DynamoDB table&lt;/h3&gt; 
&lt;p&gt;When the queries have been processed by the model for inference, the prompts and responses are logged to a DynamoDB table for auditing and message history purposes.&lt;/p&gt; 
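&lt;p&gt;As a sketch, an item written to that table might look like the following; apart from the &lt;code&gt;ID&lt;/code&gt; partition key, the attribute names are illustrative assumptions (the sample defines its own in &lt;code&gt;direct_query.py&lt;/code&gt; and &lt;code&gt;image_processor.py&lt;/code&gt;).&lt;/p&gt;

```python
import uuid
from datetime import datetime, timezone

def build_history_item(prompt, response):
    """Build one message-history item keyed on the string partition key ID."""
    return {
        "ID": str(uuid.uuid4()),  # partition key (String)
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,      # illustrative attribute name
        "response": response,  # illustrative attribute name
    }
```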
&lt;p&gt;&lt;strong&gt;To create a DynamoDB table&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Sign in to the DynamoDB console, choose &lt;strong&gt;Create table&lt;/strong&gt;, and follow the prompts to &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/getting-started-step-1.html" target="_blank" rel="noopener noreferrer"&gt;create a new DynamoDB table&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Give it a partition key named&amp;nbsp;&lt;code&gt;ID&lt;/code&gt; of type String.&lt;/li&gt; 
 &lt;li&gt;Replace the &lt;code&gt;TABLE_NAME&lt;/code&gt; variable with the table name and the &lt;code&gt;REGION&lt;/code&gt; variable with your AWS Region in the &lt;code&gt;direct_query.py&lt;/code&gt; and &lt;code&gt;image_processor.py&lt;/code&gt; files.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Create an AWS KMS key&lt;/h3&gt; 
&lt;p&gt;An AWS KMS key is used to envelope-encrypt the model artifacts before they are uploaded to the S3 model bucket. During encryption, the AWS KMS key policy is configured with conditions that restrict decryption to only those Nitro Enclaves presenting a valid attestation document. This attestation includes &lt;a href="https://docs.aws.amazon.com/enclaves/latest/user/set-up-attestation.html#where" target="_blank" rel="noopener noreferrer"&gt;platform configuration registers (PCR) hashes &lt;/a&gt;that represent the measured state of the Nitro Enclave, which covers the signed Nitro Enclave image, runtime, and configuration. When the Nitro Enclave is launched, it generates an attestation document signed by the Amazon &lt;a href="https://aws.amazon.com/ec2/nitro/" target="_blank" rel="noopener noreferrer"&gt;EC2 Nitro hypervisor&lt;/a&gt;, proving that its PCR values match the expected trusted measurements defined in the AWS KMS key policy. The key is released only if these PCR hashes align and the attestation is verified by AWS KMS, allowing the Nitro Enclave to decrypt and load the model securely in memory.&lt;/p&gt; 
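&lt;p&gt;The envelope-encryption flow can be sketched as follows. This toy version derives a keystream from a hash purely to illustrate the wrap-and-unwrap structure; the real solution uses the AWS KMS &lt;code&gt;GenerateDataKey&lt;/code&gt; API and a vetted cipher, and AWS KMS, not local code, holds the master key.&lt;/p&gt;

```python
import hashlib
import os

def _keystream(key, length):
    """Hash-counter keystream, for illustration only: real envelope
    encryption uses a vetted cipher such as AES-GCM."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def _xor(data, key):
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def envelope_encrypt(plaintext, master_key):
    """Encrypt data under a fresh data key, then wrap the data key under the
    master key; only the wrapped key and ciphertext leave this function."""
    data_key = os.urandom(32)
    return _xor(data_key, master_key), _xor(plaintext, data_key)

def envelope_decrypt(wrapped_key, ciphertext, master_key):
    """Unwrap the data key with the master key, then decrypt the data."""
    data_key = _xor(wrapped_key, master_key)
    return _xor(ciphertext, data_key)
```

&lt;p&gt;The key property this structure gives is that the bulky model artifacts are encrypted once with a small data key, and only that data key needs to be unwrapped by AWS KMS inside the attested Nitro Enclave.&lt;/p&gt;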
&lt;p&gt;&lt;strong&gt;To create an AWS KMS key&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Sign in to the AWS KMS console, choose &lt;strong&gt;Create key&lt;/strong&gt;, and follow the prompts to &lt;a href="https://docs.aws.amazon.com/kms/latest/developerguide/create-symmetric-cmk.html" target="_blank" rel="noopener noreferrer"&gt;create a new AWS KMS key&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Symmetric&lt;/strong&gt;&amp;nbsp;as the key type and &lt;strong&gt;Encrypt and decrypt&lt;/strong&gt;&amp;nbsp;for the key usage. Make the alias&amp;nbsp;&lt;code&gt;AppKmsKey&lt;/code&gt;. Leave the default settings and choose &lt;strong&gt;Finish&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Replace the &lt;code&gt;REGION&lt;/code&gt; variable in &lt;code&gt;vsock-proxy.yaml&lt;/code&gt; in the &lt;code&gt;client&lt;/code&gt; folder with your AWS Region.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Create an EC2 instance&lt;/h3&gt; 
&lt;p&gt;Now that the necessary resources are set up, you can proceed to launch the EC2 instance and create the Nitro Enclave image. For this solution, a c7i.12xlarge instance with a 150 GB EBS volume is provisioned.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To launch an EC2 instance with Nitro Enclaves enabled&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Within the GitHub repository on your local machine, run &lt;code&gt;./create_ec2.sh&lt;/code&gt; to create the EC2 instance.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;div class="hide-language"&gt; 
  &lt;pre&gt;&lt;code class="lang-bash"&gt;cd scripts 
chmod +x create_ec2.sh 
./create_ec2.sh&lt;/code&gt;&lt;/pre&gt; 
 &lt;/div&gt; 
&lt;/div&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;The script launches an EC2 instance called &lt;code&gt;MedGemmaNitroEnclaveDemo&lt;/code&gt;. When the instance is running, you must &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create-console.html#access_policies_create-json-editor" target="_blank" rel="noopener noreferrer"&gt;create an IAM policy&lt;/a&gt; with the necessary permissions for the resources created previously and attach it to the instance’s IAM role.&lt;/li&gt; 
 &lt;li&gt;Sign in to the IAM console and navigate to &lt;strong&gt;Policies&lt;/strong&gt;, choose&amp;nbsp;&lt;strong&gt;Create policy&lt;/strong&gt;, choose &lt;strong&gt;JSON&lt;/strong&gt;, and paste the following policy, making sure that you update the bucket, queue URL, AWS Region, account number, and table variables:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-json"&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "s3modelbucket",
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::&amp;lt;MODEL_BUCKET_NAME&amp;gt;",
        "arn:aws:s3:::&amp;lt;MODEL_BUCKET_NAME&amp;gt;/*"
      ]
    },
    {
      "Sid": "dynamodb",
      "Effect": "Allow",
      "Action": [
        "dynamodb:*"
      ],
      "Resource": [
        "arn:aws:dynamodb:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_NUMBER&amp;gt;:table/&amp;lt;TABLE_NAME&amp;gt;"
      ]
    },
    {
      "Sid": "sqslist",
      "Effect": "Allow",
      "Action": "sqs:ListQueues",
      "Resource": "*"
    },
    {
      "Sid": "sqsqueue",
      "Effect": "Allow",
      "Action": "sqs:*",
      "Resource": "arn:aws:sqs:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_NUMBER&amp;gt;:&amp;lt;QUEUE_NAME&amp;gt;"
    }
  ]
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="4"&gt; 
 &lt;li&gt;Give it a name (for example &lt;code&gt;enclave-permissions&lt;/code&gt;) and choose &lt;strong&gt;Create policy.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Navigate to &lt;strong&gt;Roles&lt;/strong&gt;, choose&amp;nbsp;&lt;strong&gt;Create role&lt;/strong&gt;, choose &lt;strong&gt;EC2 &lt;/strong&gt;as the&amp;nbsp;&lt;strong&gt;AWS service&lt;/strong&gt;&amp;nbsp;for the &lt;strong&gt;Trusted entity type&lt;/strong&gt;, then choose your policy that you created under &lt;strong&gt;Add permissions.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Update your EC2 instance to use the role by going to the &lt;strong&gt;Security&lt;/strong&gt; setting under the &lt;strong&gt;Actions&lt;/strong&gt; dropdown, then modifying its IAM role.&lt;/li&gt; 
 &lt;li&gt;You can upload the modified repository to your EC2 instance &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/linux-file-transfer-scp.html" target="_blank" rel="noopener noreferrer"&gt;using SCP&lt;/a&gt;. Alternatively, you can transfer the repository through &lt;code&gt;rsync&lt;/code&gt;:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;rsync -avz -e "ssh -i /path/to/directory/sample-for-secure-medical-llm-inference-with-nitro-enclaves/nitro-enclave-key.pem" --exclude='*.pem' /path/to/directory/sample-for-secure-medical-llm-inference-with-nitro-enclaves/ ec2-user@&amp;lt;PUBLIC_IP&amp;gt;:~ 
cd sample-for-secure-medical-llm-inference-with-nitro-enclaves&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="8"&gt; 
 &lt;li&gt;Make the scripts executable (in &lt;code&gt;client&lt;/code&gt;, &lt;code&gt;server&lt;/code&gt; and &lt;code&gt;scripts&lt;/code&gt;).&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;chmod +x *.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Nitro Enclave setup&lt;/h3&gt; 
&lt;p&gt;With Amazon EC2 loaded with the necessary scripts, you can begin building the Nitro Enclave image.&amp;nbsp;During this process, the &lt;a href="https://docs.aws.amazon.com/enclaves/latest/user/building-eif.html" target="_blank" rel="noopener noreferrer"&gt;Docker container is converted into an Enclave Image File (EIF)&lt;/a&gt;, which generates cryptographic measurements (PCR hashes) that uniquely identify the code and configuration of the enclave. These measurements are embedded into the AWS KMS key policy, creating a hardware-attested trust boundary that makes sure only this specific, unmodified Nitro Enclave can decrypt and access the model weights.&lt;/p&gt; 
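For illustration, the kind of key policy statement this process produces looks roughly like the following condensed sketch. The `kms:RecipientAttestation:PCR0` condition key is documented for Nitro Enclaves attestation, but the exact statement shape, principal, and PCR placeholder here are illustrative; the sample repository's setup scripts generate and apply the real measurements for you.

```json
{
  "Sid": "AllowEnclaveDecrypt",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::<ACCOUNT_NUMBER>:role/<INSTANCE_ROLE>" },
  "Action": "kms:Decrypt",
  "Resource": "*",
  "Condition": {
    "StringEqualsIgnoreCase": {
      "kms:RecipientAttestation:PCR0": "<EIF_PCR0_HASH>"
    }
  }
}
```

Because the PCR values change whenever the EIF changes, rebuilding the enclave image requires updating this condition, which is why the setup scripts automate it.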
&lt;ol&gt; 
 &lt;li&gt;Run the complete setup script, which sets up the client on the EC2 parent instance and the server running within the Nitro Enclave. You can observe the different scripts in the &lt;code&gt;client&lt;/code&gt;, &lt;code&gt;server&lt;/code&gt;, and &lt;code&gt;scripts&lt;/code&gt; folders.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;sudo ./run_complete_setup.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;The scripts download the MedGemma 4B model, encrypt the model with the AWS KMS key, build a Docker image that runs a llama.cpp server, start a Nitro Enclave, and decrypt and run the model. This process takes approximately 10 minutes.&lt;/li&gt; 
 &lt;li&gt;The Nitro Enclave runs in debug mode so that you can observe its startup logs. Wait until the llama.cpp server logs indicate that the server is ready and listening.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Inference examples&lt;/h2&gt; 
&lt;p&gt;When the model is decrypted and running on the llama.cpp server within the Nitro Enclave, you can begin to invoke the model with either image or text-based queries.&amp;nbsp;Open a &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/connect.html" target="_blank" rel="noopener noreferrer"&gt;new terminal session in your EC2 instance&lt;/a&gt;. You can navigate to the &lt;code&gt;client&lt;/code&gt; folder to run the scripts for queries.&lt;/p&gt; 
&lt;h3&gt;Image-based queries&lt;/h3&gt; 
&lt;p&gt;For inference on medical images, upload an image to your Amazon S3 image bucket. After the upload completes, run&amp;nbsp;&lt;code&gt;python3 image_processor.py&lt;/code&gt;&amp;nbsp;to pass the image from the SQS queue to the Nitro Enclave for processing. The following are examples of image inputs and model outputs.&lt;/p&gt; 
&lt;h4&gt;Brain CT scan:&lt;/h4&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/image-5.jpg"&gt;&lt;img loading="lazy" class="alignnone wp-image-25498" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/image-5.jpg" alt="Medical brain CT scan image showing anatomical structures in grayscale" width="119" height="119"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Case courtesy of Dr Henry Knipe,&amp;nbsp;&lt;/em&gt;&lt;a href="https://radiopaedia.org/" target="_blank" rel="noopener noreferrer"&gt;&lt;em&gt;Radiopaedia.org&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, rID:&amp;nbsp;&lt;/em&gt;&lt;a href="https://radiopaedia.org/cases/multiple-cerebral-contusions-and-temporal-bone-fracture#image-12495595" target="_blank" rel="noopener noreferrer"&gt;&lt;em&gt;46289&lt;/em&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h4&gt;Model response:&lt;/h4&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/image_1-1.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25518 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/image_1-1.png" alt="Complete processing log showing secure medical image analysis pipeline from upload through diagnostic output" width="1897" height="402"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h4&gt;Chest X-ray:&lt;/h4&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/image-10.jpeg"&gt;&lt;img loading="lazy" class="alignnone wp-image-25485" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/image-10.jpeg" alt="Chest X-ray image in AP sitting position" width="136" height="153"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h4&gt;Model response:&lt;/h4&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/image_2-1.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25519 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/image_2-1.png" alt="Model inference logs showing automated chest X-ray analysis with detailed radiological findings and considerations" width="1897" height="476"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;Text-based queries&lt;/h3&gt; 
&lt;p&gt;For inference on text-based queries, run&amp;nbsp;&lt;code&gt;python3 direct_query.py "&amp;lt;YOUR_MEDICAL_QUERY&amp;gt;"&lt;/code&gt;&amp;nbsp;to invoke the model. The following are examples of text-based inputs and model outputs.&lt;/p&gt; 
&lt;h4&gt;Basic usage:&lt;/h4&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;python3 direct_query.py "What are the symptoms of pneumonia?"&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h4&gt;Model response:&lt;/h4&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/text_1.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25514 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/text_1.png" alt="Model inference output describing pneumonia symptoms, diagnosis, and treatment options" width="1753" height="1050"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h4&gt;Lab result interpretation:&lt;/h4&gt; 
&lt;div class="hide-language"&gt; 
 &lt;div class="hide-language"&gt; 
  &lt;pre&gt;&lt;code class="lang-python"&gt;python3 direct_query.py "Patient has elevated troponin levels (15.2 ng/mL), elevated CK-MB, and ST elevation in leads II, III, aVF. What does this suggest?"&lt;/code&gt;&lt;/pre&gt; 
 &lt;/div&gt; 
&lt;/div&gt; 
&lt;h4&gt;Model response:&lt;/h4&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/text_3.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25516 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/12/09/text_3.png" alt="Model inference output analyzing cardiac lab results and ECG findings" width="1898" height="784"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h2&gt;Cleaning up&lt;/h2&gt; 
&lt;p&gt;To avoid incurring future charges, delete the resources used in this solution:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html" target="_blank" rel="noopener noreferrer"&gt;Stop&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html" target="_blank" rel="noopener noreferrer"&gt;terminate&lt;/a&gt; the EC2 instance.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/empty-bucket.html" target="_blank" rel="noopener noreferrer"&gt;Empty&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html" target="_blank" rel="noopener noreferrer"&gt;delete the S3 buckets&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/getting-started-step-6.html" target="_blank" rel="noopener noreferrer"&gt;Delete the DynamoDB table&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/step-delete-queue.html" target="_blank" rel="noopener noreferrer"&gt;Delete the SQS queue&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/kms/latest/developerguide/deleting-keys.html" target="_blank" rel="noopener noreferrer"&gt;Delete the AWS KMS Key&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Delete the Lambda function.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;You can combine the isolation and attestation capabilities of AWS Nitro Enclaves, the encryption controls of AWS KMS, and the scalability of services such as Amazon S3, Amazon SQS, and Amazon DynamoDB to build a more secure, zero trust pipeline for deploying generative AI models in healthcare. Using Google MedGemma 4B as your reference medical LLM, you can enable privacy-preserving inference where both PHI and model intellectual property remain protected.&amp;nbsp;For more information, consult the following resources:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html" target="_blank" rel="noopener noreferrer"&gt;AWS Nitro Enclaves Developer Guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon S3 Developer Guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB Developer Guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/kms/latest/developerguide/overview.html" target="_blank" rel="noopener noreferrer"&gt;AWS KMS Developer Guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html" target="_blank" rel="noopener noreferrer"&gt;Amazon SQS Developer Guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/welcome.html" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda Developer Guide&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Orchestrating large-scale document processing with AWS Step Functions and Amazon Bedrock batch inference</title>
		<link>https://aws.amazon.com/blogs/compute/orchestrating-large-scale-document-processing-with-aws-step-functions-and-amazon-bedrock-batch-inference/</link>
					
		
		<dc:creator><![CDATA[Brian Zambrano]]></dc:creator>
		<pubDate>Wed, 26 Nov 2025 21:41:51 +0000</pubDate>
				<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Bedrock Knowledge Bases]]></category>
		<category><![CDATA[Amazon Nova]]></category>
		<category><![CDATA[Amazon Textract]]></category>
		<category><![CDATA[AWS Step Functions]]></category>
		<guid isPermaLink="false">04f527fdc75d26217a24ec09b92484b5436dc074</guid>

					<description>Organizations often have large volumes of documents containing valuable information that remains locked away and unsearchable. This solution addresses the need for a 
&lt;strong&gt;scalable, automated text extraction and knowledge base pipeline&lt;/strong&gt; that transforms static document collections into intelligent, searchable repositories for generative AI applications.</description>
										<content:encoded>&lt;p&gt;Organizations often have large volumes of documents containing valuable information that remains locked away and unsearchable. This solution addresses the need for a &lt;strong&gt;scalable, automated text extraction and knowledge base pipeline&lt;/strong&gt; that transforms static document collections into intelligent, searchable repositories for generative AI applications.&lt;/p&gt; 
&lt;p&gt;Organizations can automate the extraction of both content and structured metadata to build comprehensive knowledge bases that power retrieval-augmented generation (RAG) solutions while significantly reducing manual processing costs and time-to-value. The architecture not only demonstrates the processing of 500 research papers automatically, but also scales to handle enterprise document volumes cost-effectively through the &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; batch inference pricing model.&lt;/p&gt; 
&lt;h2&gt;Overview&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/automate-amazon-bedrock-batch-inference-building-a-scalable-and-efficient-pipeline/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock batch inference&lt;/a&gt; is a feature of Amazon Bedrock that offers a 50% discount on inference requests. Although Amazon Bedrock schedules and runs the batch job (needing a minimum of 100 inference requests) as capacity becomes available, the inference won’t be real-time. For use cases where you can accommodate minutes to hours of latency, Amazon Bedrock batch inference is a good option.&lt;/p&gt; 
&lt;p&gt;This post demonstrates how to build an automated, serverless pipeline using &lt;a href="https://aws.amazon.com/step-functions/" target="_blank" rel="noopener noreferrer"&gt;AWS Step Functions&lt;/a&gt;, &lt;a href="https://aws.amazon.com/textract/" target="_blank" rel="noopener noreferrer"&gt;Amazon Textract&lt;/a&gt;, Amazon Bedrock batch inference, and &lt;a href="https://aws.amazon.com/bedrock/knowledge-bases/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Knowledge Bases&lt;/a&gt; to extract text, create metadata, and load it into a knowledge base at scale. The example solution processes 500 research papers in PDF format from &lt;a href="https://www.amazon.science/" target="_blank" rel="noopener noreferrer"&gt;Amazon Science&lt;/a&gt;, extracts text using Amazon Textract, generates structured metadata with Amazon Bedrock batch inference and the &lt;a href="https://aws.amazon.com/ai/generative-ai/nova/" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova Pro&lt;/a&gt; model, and loads the final output, including Amazon Bedrock Knowledge Bases metadata filters, into an Amazon Bedrock Knowledge Base.&lt;/p&gt; 
&lt;h2&gt;Architecture&lt;/h2&gt; 
&lt;p&gt;This solution uses Step Functions with parallel Amazon Textract job processing through child workflows run by &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/state-map-distributed.html" target="_blank" rel="noopener noreferrer"&gt;Distributed Map&lt;/a&gt;. You can use the concurrency controls offered by Distributed Map to process documents as quickly as possible within your Amazon Textract quotas. Increasing processing speed necessitates adjusting your Amazon Textract quota and updating the Distributed Map configuration. Amazon Bedrock batch inference handles concurrency, scaling, and throttling. This means that you can create the job without managing these complexities.&lt;/p&gt; 
&lt;p&gt;In this example implementation, the solution processes research papers to extract metadata such as:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Code availability and repository locations&lt;/li&gt; 
 &lt;li&gt;Dataset availability and access methods&lt;/li&gt; 
 &lt;li&gt;Research methodology types&lt;/li&gt; 
 &lt;li&gt;Reproducibility indicators&lt;/li&gt; 
 &lt;li&gt;Other relevant research attributes&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The high-level parts of this solution include:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Extracting text from PDF documents with Amazon Textract in parallel, through Step Functions Distributed Map.&lt;/li&gt; 
 &lt;li&gt;Analyzing extracted text using Amazon Bedrock batch inference to extract structured metadata.&lt;/li&gt; 
 &lt;li&gt;Loading extracted text and metadata into a searchable knowledge base using Amazon Bedrock Knowledge Bases with &lt;a href="https://aws.amazon.com/opensearch-service/features/serverless/" target="_blank" rel="noopener noreferrer"&gt;Amazon OpenSearch Serverless&lt;/a&gt;.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/26/computeblog-2442-1.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25420 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/26/computeblog-2442-1.png" alt="Complete architecture diagram" width="819" height="1303"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Figure 1. Complete architecture diagram&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;The following prerequisites are necessary to complete this solution:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Access to an &lt;a href="https://portal.aws.amazon.com/gp/aws/developer/registration/index.html" target="_blank" rel="noopener noreferrer"&gt;AWS account&lt;/a&gt; through the &lt;a href="https://aws.amazon.com/console/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt; and the &lt;a href="https://aws.amazon.com/cli" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI)&lt;/a&gt;. The &lt;a href="https://aws.amazon.com/iam" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; user that you use must have permissions to make the necessary AWS service calls and manage AWS resources mentioned in this post. While providing permissions to the IAM user, follow the &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege" target="_blank" rel="noopener noreferrer"&gt;principle of least-privilege&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html" target="_blank" rel="noopener noreferrer"&gt;AWS CLI&lt;/a&gt; installed and configured. If you are using long-term credentials such as access keys, then follow &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html" target="_blank" rel="noopener noreferrer"&gt;manage access keys for IAM users&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/securing_access-keys.html" target="_blank" rel="noopener noreferrer"&gt;secure access keys&lt;/a&gt; for best practices.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://git-scm.com/book/en/v2/Getting-Started-Installing-Git" target="_blank" rel="noopener noreferrer"&gt;Git Installed&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Python 3.13+ installed.&lt;/li&gt; 
 &lt;li&gt;Node and npm installed.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/cdk/" target="_blank" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (AWS CDK)&lt;/a&gt; installed.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Running the solution&lt;/h2&gt; 
&lt;p&gt;The complete solution uses AWS CDK to implement two &lt;a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt; stacks:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;BedrockKnowledgeBaseStack: Creates the knowledge base infrastructure&lt;/li&gt; 
 &lt;li&gt;SFNBatchInferenceStack: Implements the main processing workflow&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;First, clone the GitHub repository into your local development environment and install the requirements:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;git clone https://github.com/aws-samples/sample-step-functions-batch-inference.git
cd sample-step-functions-batch-inference
npm install&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Next, deploy the solution using AWS CDK:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;cdk deploy --all&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;After deploying the AWS CDK stacks, upload your data sources (PDF files) into the AWS CDK-created &lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon S3&lt;/a&gt; input bucket. In this example, I uploaded 500 Amazon Science papers. The input bucket name is included in the AWS CDK outputs:&lt;/p&gt; 
&lt;p&gt;Outputs:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;SFNBatchInference.BatchInputBucketName = sfnbatchinference-batchinputbucket11aaa222-nrjki8tewwww&lt;/code&gt;&lt;/p&gt; 
&lt;h3&gt;Parallel text extraction&lt;/h3&gt; 
&lt;p&gt;The process begins when you upload a manifest.json file to the input bucket. The manifest file lists the files for processing, which already exist in the input bucket. The filenames listed in manifest.json define what constitutes a single processing job run. To create another run, you would create a different manifest.json and upload it to the same S3 bucket.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-json"&gt;[
  {
    "filename": "flexecontrol-flexible-and-efficient-multimodal-control-for-text-to-image-generation.pdf"
  },
  {
    "filename": "adaptive-global-local-context-fusion-for-multi-turn-spoken-language-understanding.pdf"
  }
]
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
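If you have many PDFs, it can be convenient to generate the manifest locally before uploading it. The following is a minimal sketch; the `build_manifest` helper is ours, not part of the repository:

```python
import json


def build_manifest(filenames: list[str]) -> str:
    """Serialize PDF filenames into the manifest.json format the workflow
    expects: a JSON array of {"filename": ...} objects."""
    return json.dumps([{"filename": name} for name in filenames], indent=2)


# Write the manifest locally, then upload it to the input bucket to start a run,
# for example: aws s3 cp manifest.json s3://<BatchInputBucketName>/manifest.json
with open("manifest.json", "w") as f:
    f.write(build_manifest(["paper-one.pdf", "paper-two.pdf"]))
```

Uploading the resulting file to the input bucket is what triggers the workflow, as described next.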
&lt;p&gt;The AWS CDK definition for the input bucket includes &lt;a href="https://aws.amazon.com/eventbridge/" target="_blank" rel="noopener noreferrer"&gt;Amazon EventBridge&lt;/a&gt; notifications and creates a rule that triggers the Step Functions workflow whenever a manifest.json file is uploaded.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-ts"&gt;private createS3Buckets() {
    const batchBucket = new s3.Bucket(this, "BatchInputBucket", {
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    })
    batchBucket.enableEventBridgeNotification()

    new cdk.CfnOutput(this, "BatchInputBucketName", {
      value: batchBucket.bucketName,
      description: "Name of input bucket to send PDF documents that Textract will read.",
    })

    const manifestFileCreatedRule = new eventBridge.Rule(this, "ManifestFileCreatedRule", {
      eventPattern: {
        source: ["aws.s3"],
        detailType: ["Object Created"],
        detail: {
          bucket: {
            name: [batchBucket.bucketName],
          },
          object: {
            key: ["manifest.json"],
          },
        },
      },
    })

    return { batchBucket, manifestFileCreatedRule }
  }
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The first step in the Step Functions workflow is a Distributed Map run that performs the following actions for each PDF in the manifest file:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Starts an Amazon Textract job, providing an &lt;a href="https://aws.amazon.com/sns/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Notification Service (Amazon SNS)&lt;/a&gt; topic for completion notification.&lt;/li&gt; 
 &lt;li&gt;Writes the Step Functions task token to &lt;a href="https://aws.amazon.com/dynamodb/" target="_blank" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt;, pausing the individual child workflow.&lt;/li&gt; 
 &lt;li&gt;Processes the Amazon SNS message when the Amazon Textract job completes, triggering an &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; function.&lt;/li&gt; 
 &lt;li&gt;Uses a Lambda function to retrieve the task token from DynamoDB using the Amazon Textract JobId.&lt;/li&gt; 
 &lt;li&gt;Fetches the raw results from Amazon Textract, organizes the text for readability, and writes the results to an S3 bucket.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/26/computeblog-2442-2.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25425 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/26/computeblog-2442-2.png" alt="First step in the Step Functions workflow" width="1429" height="896"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;A key component of this architecture is the callback pattern that Amazon Textract supports using the NotificationChannel option, as shown in the preceding figure. The AWS CDK definition of the Step Functions state that starts the Amazon Textract job is shown in the following code.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;div class="hide-language"&gt; 
  &lt;pre&gt;&lt;code class="lang-ts"&gt;const startTextractStep = new tasks.CallAwsService(this, "StartTextractJob", {
  service: "textract",
  action: "startDocumentAnalysis",
  resultPath: "$.textractOutput",
  parameters: {
    DocumentLocation: {
      S3Object: {
        Bucket: sourceBucket.bucketName,
        Name: sfn.JsonPath.stringAt("$.filename"),
      },
    },
    FeatureTypes: ["LAYOUT"],
    NotificationChannel: {
      RoleArn: textractRoleArn,
      SnsTopicArn: snsTopicArn,
    },
  },
  iamResources: ["*"],
})
&lt;/code&gt;&lt;/pre&gt; 
 &lt;/div&gt; 
&lt;/div&gt; 
&lt;p&gt;The Lambda function that handles task tokens extracts the Amazon Textract JobId from the Amazon SNS message, fetches the TaskToken from DynamoDB, and resumes the Step Functions workflow by sending the TaskToken:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-ts"&gt;from aws_lambda_powertools.utilities.data_classes import SNSEvent, event_source

@event_source(data_class=SNSEvent)
def handle_textract_task_complete(event, context):
    # Multiple records can be delivered in a single event
    for record in event.records:
        sns_message = json.loads(record.sns.message)
        textract_job_id = sns_message["JobId"]

        # Get both task token and original file from DynamoDB
        ddb_item = _get_item_from_ddb(textract_job_id)

        # Send both the job ID and original file name in the response
        _send_task_success(
            ddb_item["TaskToken"],
            {
                "TextractJobId": textract_job_id,
                "OriginalFile": ddb_item["OriginalFile"],
            },
        )
        # Delete the task token from DynamoDB after use
        _delete_item_from_ddb(textract_job_id)

def _send_task_success(task_token: str, output: None | dict = None) -&amp;gt; None:
    """Sends task success to Step Functions with the provided output"""
    sfn = boto3.client("stepfunctions")
    sfn.send_task_success(taskToken=task_token, output=json.dumps(output or {}))
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The Distributed Map runs up to 10 child workflows concurrently, controlled by the maxConcurrency setting. Although Step Functions supports running up to 10,000 child workflow executions, the practical concurrency for this solution is constrained by Amazon Textract quotas. The startDocumentAnalysis API has a default quota of 10 requests per second (RPS), which means you must consider this limit when scaling your document processing workloads and potentially request quota increases for higher throughput requirements.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-ts"&gt;const distributedMap = new sfn.DistributedMap(this, "DistributedMap", {
  mapExecutionType: sfn.StateMachineType.STANDARD,
  maxConcurrency: 10,
  itemReader: new sfn.S3JsonItemReader({
    bucket: sourceBucket,
    key: "manifest.json",
  }),
  resultPath: "$.files",
})
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
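As a rough planning aid (illustrative arithmetic only, not part of the repository), you can estimate the minimum time needed just to submit all Textract jobs at a given quota. No matter how high maxConcurrency is set, the API quota caps the submission rate:

```python
def min_submission_seconds(num_docs: int, rps_quota: float = 10.0) -> float:
    """Lower bound on job-submission time: the StartDocumentAnalysis quota
    caps how fast child workflows can start Textract jobs, regardless of
    the Distributed Map maxConcurrency setting."""
    return num_docs / rps_quota


# At the default 10 RPS quota, submitting 500 documents takes at least 50 seconds.
print(min_submission_seconds(500))  # 50.0
```

Total wall-clock time is dominated by Textract processing and the batch inference job itself, so treat this only as a floor when sizing quota-increase requests.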
&lt;h3&gt;Running Amazon Bedrock batch inference&lt;/h3&gt; 
&lt;p&gt;When all of the Amazon Textract jobs finish, the Distributed Map state creates an Amazon Bedrock batch inference input file, launches the Amazon Bedrock inference job, and waits for it to complete.&lt;/p&gt; 
&lt;ol start="6"&gt; 
 &lt;li&gt;A Lambda function collects text results from Amazon S3 and creates an Amazon Bedrock batch inference input file with custom prompts.&lt;/li&gt; 
 &lt;li&gt;The workflow starts the Amazon Bedrock batch inference job by calling createModelInvocationJob and sending the batch inference input file as input.&lt;/li&gt; 
 &lt;li&gt;The workflow pauses and stores the task token in DynamoDB.&lt;/li&gt; 
 &lt;li&gt;An EventBridge rule matches Amazon Bedrock batch inference completion events and triggers a Lambda function. The Lambda function retrieves the task token and resumes the workflow, as shown in the following figure.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/26/computeblog-2442-3.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25424 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/26/computeblog-2442-3.png" alt="Lambda function retrieves the task token and resumes the workflow" width="1429" height="1308"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;A batch inference input is a single jsonl file with multiple entries such as the following example. The prompt in each inference request instructs the large language model (LLM) to analyze the paper and extract metadata. Read the full &lt;a href="https://github.com/aws-samples/sample-step-functions-batch-inference/blob/956b5fc645c7de5f43d650d21ef9df011db67170/src/bedrock-batcher/handler.py#L41-L81" target="_blank" rel="noopener noreferrer"&gt;prompt template in the GitHub repository&lt;/a&gt;.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-json"&gt;{
  "recordId": "c1b8a3b2086141f963",
  "modelInput": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "text": "Analyze the following research paper transcript and extract metadata about code and dataset availability. Extract the following metadata from this research paper transcript:\n\n1. **has_code**: Does the paper mention or link to source code? (true/false) ...... Return only valid JSON matching the schema above. Do not include any text outside of the JSON structure."
          }
        ]
      }
    ],
    "inferenceConfig": { "maxTokens": 4096 }
  }
}
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
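To make the record shape concrete, here is a hedged sketch of building one such JSONL line in Python. The `build_batch_record` helper, its recordId scheme, and the prompt text are illustrative stand-ins for what the repository's batcher Lambda actually produces:

```python
import hashlib
import json


def build_batch_record(filename: str, transcript: str, prompt: str) -> str:
    """Return one line of a Bedrock batch inference JSONL input file,
    mirroring the record shape shown above."""
    return json.dumps({
        # Deterministic ID derived from the source filename (illustrative scheme).
        "recordId": hashlib.sha256(filename.encode()).hexdigest()[:18],
        "modelInput": {
            "messages": [
                {"role": "user", "content": [{"text": f"{prompt}\n\n{transcript}"}]}
            ],
            "inferenceConfig": {"maxTokens": 4096},
        },
    })


# One record per document; the lines are concatenated into a single .jsonl file.
line = build_batch_record("paper.pdf", "Extracted text...", "Extract metadata as JSON.")
```

A batch job needs at least 100 such records, so in practice the batcher Lambda appends one line per Textract result before uploading the file to Amazon S3.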
&lt;h3&gt;Populating the Amazon Bedrock Knowledge Base&lt;/h3&gt; 
&lt;p&gt;After the batch inference completes, the workflow does the following:&lt;/p&gt; 
&lt;ol start="10"&gt; 
 &lt;li&gt;Extracts inference results and creates metadata files based on the Amazon Bedrock inference results (example metadata shown in the following figure).&lt;/li&gt; 
 &lt;li&gt;Starts an Amazon Bedrock Knowledge Base ingestion job.&lt;/li&gt; 
 &lt;li&gt;Monitors the ingestion job status using Step Functions Wait and Choice states.&lt;/li&gt; 
 &lt;li&gt;Sends a completion notification through Amazon SNS.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/26/computeblog-2442-4.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25423 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/26/computeblog-2442-4.png" alt="Populating the Amazon Bedrock Knowledge Base" width="1332" height="1689"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;The following shows the example metadata format:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-json"&gt;{
  "metadataAttributes": {
    "has_code": true,
    "has_dataset": false,
    "code_availability": "publicly_available",
    "dataset_availability": "not_available",
    "research_type": "methodology",
    "is_reproducible": true,
    "code_repository_url": "https://github.com/amazon-science/PIXELS"
  }
}
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Testing the knowledge base&lt;/h2&gt; 
&lt;p&gt;After the workflow completes successfully, you can test the knowledge base to verify that the documents and metadata have been properly ingested and are searchable. There are two practical methods for testing an Amazon Bedrock Knowledge Base:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Using the Console&lt;/li&gt; 
 &lt;li&gt;Using the AWS SDK to run a query&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Testing through the Console&lt;/h2&gt; 
&lt;p&gt;The Console provides an intuitive interface for testing your knowledge base queries with metadata filters:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://console.aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Knowledge Bases&lt;/strong&gt; under the &lt;strong&gt;Build&lt;/strong&gt; section.&lt;/li&gt; 
 &lt;li&gt;Choose the knowledge base created by the AWS CDK deployment (the name will be output by the AWS CDK stack).&lt;/li&gt; 
 &lt;li&gt;Choose the &lt;strong&gt;Test&lt;/strong&gt; button in the upper right corner.&lt;/li&gt; 
 &lt;li&gt;In the test interface, choose your preferred foundation model (FM) (such as Amazon Nova Pro).&lt;/li&gt; 
 &lt;li&gt;Expand the &lt;strong&gt;Configurations&lt;/strong&gt; column, then navigate to the &lt;strong&gt;Filters&lt;/strong&gt; section.&lt;/li&gt; 
 &lt;li&gt;Configure filters based on the extracted metadata, as shown in the following figure.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/25/computeblog-2442-5.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25422 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/25/computeblog-2442-5.png" alt="Configure filters based on the extracted metadata" width="383" height="247"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Enter a natural language query related to your documents, for example: “Recent research on retrieval augmented generation?”&lt;/p&gt; 
&lt;p&gt;The console displays the generated response along with source attributions showing which documents were retrieved and used to formulate the answer, filtered by your specified metadata attributes, as shown in the following figure.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/25/compute-2442-6.png"&gt;&lt;img loading="lazy" class="alignnone wp-image-25421 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/25/compute-2442-6.png" alt="A chat example" width="1095" height="1074"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h2&gt;Testing via API&lt;/h2&gt; 
&lt;p&gt;For programmatic testing and integration into applications, use the AWS SDK with metadata filtering. The following is a Python example using boto3:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;model_arn = "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0"

# Query for papers with publicly available code
response = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': "What recent research has been done on RAG?"},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': knowledge_base_id,
            'modelArn': model_arn,
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'filter': {"equals": {"key": "has_code", "value": True}},
                }
            },
        },
    },
)

# Display results
print(f"Response: {response['output']['text']}\n")
print("Source Documents:")

for citation in response.get('citations', []):
    for reference in citation.get('retrievedReferences', []):
        metadata = reference.get('metadata', {})
        print(f" Document: {reference['location']['s3Location']['uri']}\n")
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The following is the test script output:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;Response: Recent research on Retrieval-Augmented Generation (RAG) has focused on enhancing the system's ability to dynamically retrieve and utilize relevant information from a Vector Database (VDB) to improve decision-making and performance. Key innovations include:

1. **Dynamic Retrieval and Utilization**: The system is designed to query the VDB for contextually relevant past experiences, which significantly improves decision quality and accelerates performance by leveraging a growing repository of relevant experiences.

2. **Teacher-Student Instructional Tuning**: A novel mechanism where a Teacher agent refines a Student agent's core policy through direct interaction. The Teacher generates a modified SYSTEM prompt based on the Student's actions, creating a meta-learning loop that enhances the Student's reasoning policy over time.
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;This solution demonstrates how to combine multiple AWS AI and serverless services to build a scalable document processing pipeline. Organizations can use AWS Step Functions for orchestration, Amazon Textract for document processing, Amazon Bedrock batch inference for intelligent content analysis, and Amazon Bedrock Knowledge Bases for searchable storage. In turn, they can automate the extraction of insights from large document collections while optimizing costs.&lt;/p&gt; 
&lt;p&gt;By following this solution, you can build a solid foundation for production-scale document processing pipelines that retain the flexibility to adapt to your specific requirements while ensuring reliability, scalability, and operational excellence. To learn more about &lt;a href="https://serverlessland.com/" target="_blank" rel="noopener noreferrer"&gt;serverless architectures&lt;/a&gt;, visit Serverless Land.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Node.js 24 runtime now available in AWS Lambda</title>
		<link>https://aws.amazon.com/blogs/compute/node-js-24-runtime-now-available-in-aws-lambda/</link>
					
		
		<dc:creator><![CDATA[Andrea Amorosi]]></dc:creator>
		<pubDate>Tue, 25 Nov 2025 22:19:46 +0000</pubDate>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[AWS Cloud Development Kit]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[AWS Serverless Application Model]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Lambda@Edge]]></category>
		<category><![CDATA[Serverless]]></category>
		<category><![CDATA[serverless]]></category>
		<guid isPermaLink="false">a0c9b23009a13ae2fa4ecabf27ace335fd9e3f25</guid>

					<description>You can now develop AWS Lambda&amp;nbsp;functions using Node.js&amp;nbsp;24, either as a managed runtime or using the container base image. Node.js 24 is in&amp;nbsp;active LTS status&amp;nbsp;and ready for production use. It is expected to be supported with security patches and bugfixes until April 2028. The Lambda runtime for Node.js 24 includes a new implementation of the […]</description>
										<content:encoded>&lt;p&gt;You can now develop &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;&amp;nbsp;functions using &lt;a href="https://nodejs.org/" target="_blank" rel="noopener noreferrer"&gt;Node.js&lt;/a&gt;&amp;nbsp;24, either as a managed runtime or using the container base image. Node.js 24 is in&amp;nbsp;&lt;a href="https://nodejs.org/en/blog/release/v24.11.0" target="_blank" rel="noopener noreferrer"&gt;active LTS status&lt;/a&gt;&amp;nbsp;and ready for production use. It is expected to be supported with security patches and bugfixes until April 2028.&lt;/p&gt; 
&lt;p&gt;The Lambda runtime for Node.js 24 includes a new implementation of the Runtime Interface Client (RIC), which integrates your function’s code with the Lambda service. Written in TypeScript, the new RIC streamlines and simplifies Node.js support in Lambda, removing several legacy features. In particular, callback-based function handlers are no longer supported.&lt;/p&gt; 
&lt;p&gt;Node.js 24 includes several additions to the language, such as &lt;a href="https://github.com/tc39/proposal-explicit-resource-management" target="_blank" rel="noopener noreferrer"&gt;Explicit Resource Management&lt;/a&gt;, as well as changes to the runtime implementation and the standard library. With this release, Node.js developers can take advantage of these new features and enhancements when creating serverless applications on Lambda.&lt;/p&gt; 
&lt;p&gt;You can develop Node.js 24 Lambda functions using the&amp;nbsp;&lt;a href="https://aws.amazon.com/console/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI)&lt;/a&gt;,&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/" target="_blank" rel="noopener noreferrer"&gt;AWS SDK for JavaScript&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/serverless/sam/" target="_blank" rel="noopener noreferrer"&gt;AWS Serverless Application Model (AWS SAM)&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/cdk/" target="_blank" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (AWS CDK)&lt;/a&gt;, and other infrastructure as code tools. You can use Node.js 24 with&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/powertools/typescript/latest/" target="_blank" rel="noopener noreferrer"&gt;Powertools for AWS Lambda (TypeScript)&lt;/a&gt;, a developer toolkit to implement serverless best practices and increase developer velocity. Powertools includes libraries to support common tasks such as observability, &lt;a href="https://aws.amazon.com/systems-manager/" target="_blank" rel="noopener noreferrer"&gt;AWS Systems Manager&lt;/a&gt; Parameter Store integration, idempotency, batch processing,&amp;nbsp;&lt;a href="https://docs.powertools.aws.dev/lambda/typescript/latest/#features" target="_blank" rel="noopener noreferrer"&gt;and more&lt;/a&gt;. You can also use Node.js 24 with&amp;nbsp;&lt;a href="https://aws.amazon.com/lambda/edge/" target="_blank" rel="noopener noreferrer"&gt;Lambda@Edge&lt;/a&gt; to customize low-latency content delivered through&amp;nbsp;&lt;a href="https://aws.amazon.com/cloudfront/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudFront&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;This blog post highlights important changes to the Node.js runtime, notable Node.js language updates, and how you can use the new Node.js 24 runtime in your serverless applications.&lt;/p&gt; 
&lt;h2&gt;Node.js 24 runtime changes&lt;/h2&gt; 
&lt;p&gt;The Lambda runtime for Node.js 24 includes the following changes relative to the Node.js 22 and earlier runtimes.&lt;/p&gt; 
&lt;h3&gt;Removing support for callback-based function handlers&lt;/h3&gt; 
&lt;p&gt;Starting with the Node.js 24 runtime, Lambda no longer supports the callback-based handler signature for asynchronous operations. Callback-based handlers take three parameters, the third of which is a callback function. For example:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;export const handler = (event, context, callback) =&amp;gt; {
    try {
        // Some processing...
        
        // Success case
        // First parameter (error) is null, second is the result
        callback(null, {
            statusCode: 200,
            body: JSON.stringify({
                message: "Operation completed successfully"
            })
        });
        
    } catch (error) {
        // Error case
        // First parameter contains the error
        callback(error);
    }
};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The modern approach to asynchronous programming in Node.js is to use the &lt;code&gt;async/await&lt;/code&gt; pattern. Lambda introduced support for &lt;code&gt;async&lt;/code&gt; handlers with the Node.js 8 runtime, launched in 2018. Here’s how the above function looks when using an &lt;code&gt;async&lt;/code&gt; handler:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;export const handler = async (event, context) =&amp;gt; {
    try {
	  // Some processing
        
        return {
            statusCode: 200,
            body: JSON.stringify({
                message: "Operation completed successfully"
            })
        };
        
    } catch (error) {
        // Handle the error here, or rethrow to fail the invocation
        throw error;
    }
};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The Node.js 24 runtime still supports synchronous function handlers that do not use callbacks:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;export const handler = (event, context) =&amp;gt; {
    // Perform some synchronous data processing
    // Return response
    return {
        statusCode: 200,
        body: JSON.stringify(response)
    };
};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;And Node.js 24 still supports response streaming, enabling more responsive applications by accelerating the time-to-first-byte:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;export const handler = awslambda.streamifyResponse(async (event, responseStream, context) =&amp;gt; {
    // Convert event to a readable stream
    const&amp;nbsp;requestStream = Readable.from(Buffer.from(JSON.stringify(event)));
    // Stream the response using pipeline
    await pipeline(requestStream, responseStream);
});&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This change to remove support for callback-based function handlers only affects Node.js 24 (and later) runtimes. Existing runtimes for Node.js 22 and earlier continue to support callback-based function handlers. When migrating functions that use callback-based handlers to Node.js 24, you need to modify your code to use one of the supported function handler signatures.&lt;/p&gt; 
&lt;p&gt;As part of this change, &lt;code&gt;context.callbackWaitsForEmptyEventLoop&lt;/code&gt; is removed. In addition, the previously deprecated &lt;code&gt;context.succeed&lt;/code&gt;, &lt;code&gt;context.fail&lt;/code&gt;, and &lt;code&gt;context.done&lt;/code&gt; methods have also been removed. This aligns the runtime with modern Node.js patterns for clearer, more consistent error and result handling.&lt;/p&gt; 
&lt;h3&gt;Harmonizing streaming and non-streaming behavior for unresolved promises&lt;/h3&gt; 
&lt;p&gt;The Node.js 24 runtime also resolves a previous inconsistency in how unresolved promises were handled. Previously, Lambda did not wait for unresolved promises once the handler returned, &lt;em&gt;except when using response streaming&lt;/em&gt;. Starting with Node.js 24, streaming behavior is consistent with non-streaming behavior: Lambda no longer waits for unresolved promises once your handler returns or the response stream ends. Any background work (for example, pending timers, fetches, or queued callbacks) is not awaited implicitly. If your response depends on additional asynchronous operations, await them in your handler or integrate them into the streaming pipeline before closing the stream or returning, so that the response only completes after all required work has finished.&lt;/p&gt; 
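&lt;p&gt;The following minimal sketch contrasts the two patterns. Here &lt;code&gt;flushLogs&lt;/code&gt; is a hypothetical stand-in for background work, such as flushing buffered telemetry, that your response depends on:&lt;/p&gt;

```javascript
// "flushLogs" is a hypothetical stand-in for background work, such as
// flushing buffered telemetry, that the response depends on.
let flushed = false;
const flushLogs = () =>
  new Promise((resolve) => setTimeout(() => { flushed = true; resolve(); }, 10));

// Unsafe on Node.js 24: the promise is not awaited, so the execution
// environment may be frozen before flushLogs() completes.
export const fireAndForgetHandler = async (event) => {
  flushLogs(); // fire-and-forget: no longer waited for after the handler returns
  return { statusCode: 200 };
};

// Safe: the handler resolves only after all required work has finished.
export const handler = async (event) => {
  await flushLogs();
  return { statusCode: 200, flushed };
};
```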
&lt;h3&gt;Experimental Node.js features&lt;/h3&gt; 
&lt;p&gt;Node.js enables certain experimental features by default in the upstream language releases. These include support for loading ECMAScript modules (ES modules) using &lt;code&gt;require()&lt;/code&gt; and for automatically detecting whether a file is an ES module or a CommonJS module. Because they are experimental, these features may be unstable or undergo breaking changes in future Node.js updates. To provide a stable experience, Lambda disables these features by default in the corresponding Lambda runtimes.&lt;/p&gt; 
&lt;p&gt;Lambda allows you to &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-nodejs.html#nodejs-experimental-features" target="_blank" rel="noopener noreferrer"&gt;re-enable these features&lt;/a&gt; by adding the &lt;code&gt;--experimental-require-module&lt;/code&gt; flag or the &lt;code&gt;--experimental-detect-module&lt;/code&gt; flag to the &lt;code&gt;NODE_OPTIONS&lt;/code&gt; environment variable. Enabling experimental Node.js features may affect performance and stability, and these features can change or be removed in future Node.js releases; such issues are not covered by AWS Support or the Lambda SLA.&lt;/p&gt; 
&lt;h3&gt;ES modules in CloudFormation inline functions&lt;/h3&gt; 
&lt;p&gt;With &lt;a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt; inline functions, you provide your function code directly in the CloudFormation template. They’re particularly useful when &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/walkthrough-lambda-backed-custom-resources.html" target="_blank" rel="noopener noreferrer"&gt;deploying custom resources&lt;/a&gt;. With inline functions, the code filename is always &lt;code&gt;index.js&lt;/code&gt;, which by default Node.js interprets as a CommonJS module. With the Node.js 24 runtime, you can use ES modules when authoring inline functions by passing the &lt;code&gt;--experimental-detect-module&lt;/code&gt; flag via the &lt;code&gt;NODE_OPTIONS&lt;/code&gt; environment variable. Previously, you needed a zip or container package to use ES modules. With Node.js 24, you can write inline functions using standard ESM syntax (&lt;code&gt;import&lt;/code&gt;/&lt;code&gt;export&lt;/code&gt; and top‑level &lt;code&gt;await&lt;/code&gt;), which simplifies small utilities and bootstrap logic without requiring a packaging step.&lt;/p&gt; 
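&lt;p&gt;As an illustrative sketch (the resource name is hypothetical, and the referenced execution role must be defined elsewhere in the template), an inline Node.js 24 function authored as an ES module might look like the following:&lt;/p&gt;

```yaml
Resources:
  InlineEsmFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: nodejs24.x
      Handler: index.handler
      Role: !GetAtt InlineEsmFunctionRole.Arn  # hypothetical execution role
      Environment:
        Variables:
          NODE_OPTIONS: --experimental-detect-module
      Code:
        ZipFile: |
          # ESM syntax in index.js, enabled by --experimental-detect-module
          import { randomUUID } from "node:crypto";
          export const handler = async () => ({
            statusCode: 200,
            body: JSON.stringify({ id: randomUUID() }),
          });
```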
&lt;h2&gt;Node.js 24 language features&lt;/h2&gt; 
&lt;p&gt;Node.js 24 introduces several language updates and features that enhance developer productivity and improve application performance.&lt;/p&gt; 
&lt;p&gt;Node.js 24 includes Undici 7, a newer version of the HTTP client that powers global &lt;code&gt;fetch&lt;/code&gt;. This version brings performance improvements and broader protocol capabilities. Network‑heavy Lambda functions that call AWS services or external APIs can benefit from better connection management and throughput, especially when reusing clients or using HTTP/2 where supported. Most applications should work without changes, but you should validate behavior for advanced scenarios, such as custom headers or streaming bodies, and continue to define HTTP clients outside of the handler to maximize connection reuse across invocations.&lt;/p&gt; 
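&lt;p&gt;For example, the following sketch keeps request configuration outside the handler so warm invocations reuse the keep-alive connection pool behind global &lt;code&gt;fetch&lt;/code&gt; (the handler shape and header value are illustrative):&lt;/p&gt;

```javascript
// Sketch: keep configuration and clients outside the handler so warm
// invocations reuse the keep-alive connection pool behind global fetch
// (powered by Undici 7 on Node.js 24). The header value is illustrative.
const baseHeaders = { "user-agent": "my-service/1.0" };

export const handler = async (event) => {
  // Global fetch reuses pooled connections across invocations while the
  // execution environment stays warm.
  const res = await fetch(event.url, { headers: baseHeaders });
  return { statusCode: res.status, body: await res.text() };
};
```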
&lt;p&gt;The JavaScript Explicit Resource Management syntax (&lt;code&gt;using&lt;/code&gt; and &lt;code&gt;await using&lt;/code&gt;) enables deterministic clean-up of resources when a block completes. For Lambda handlers, this makes it easier to ensure short‑lived objects, such as streams, temporary buffers, or file handles, are disposed of promptly, which reduces the risk of resource leaks across warm invocations. You should continue to define long‑lived clients, for example SDK clients or database pools, outside the handler to benefit from connection reuse, and apply explicit disposal only to resources you want to tear down at the end of each invocation.&lt;/p&gt; 
&lt;p&gt;Finally, the &lt;code&gt;AsyncLocalStorage&lt;/code&gt; API now uses &lt;code&gt;AsyncContextFrame&lt;/code&gt; by default, improving the performance and reliability of async context propagation. This benefits common serverless patterns, such as correlating logs and propagating tracing IDs and request‑scoped metadata across &lt;code&gt;async&lt;/code&gt;/&lt;code&gt;await&lt;/code&gt; boundaries, timers, and streams, without manual parameter threading. If you already use &lt;code&gt;AsyncLocalStorage&lt;/code&gt;‑based libraries for logging or observability, you may see lower overhead and more consistent context propagation in Node.js 24.&lt;/p&gt; 
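&lt;p&gt;As a minimal sketch of this pattern, the following handler propagates a request ID through awaited work without passing it as a parameter (the &lt;code&gt;log&lt;/code&gt; helper and the store shape are illustrative):&lt;/p&gt;

```javascript
import { AsyncLocalStorage } from "node:async_hooks";

// Minimal sketch of request-scoped context propagation; the "log" helper and
// the store shape ({ requestId }) are illustrative.
const requestContext = new AsyncLocalStorage();

function log(message) {
  const store = requestContext.getStore();
  console.log(`[requestId=${store?.requestId ?? "unknown"}] ${message}`);
}

export const handler = async (event, context) =>
  requestContext.run({ requestId: context.awsRequestId }, async () => {
    log("start"); // no requestId parameter threading needed
    await new Promise((resolve) => setTimeout(resolve, 10));
    log("done"); // the context survives the await boundary
    return requestContext.getStore().requestId;
  });
```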
&lt;p&gt;For a detailed overview of Node.js 24 language features, see the&amp;nbsp;&lt;a href="https://nodejs.org/en/blog/release/v24.0.0" target="_blank" rel="noopener noreferrer"&gt;Node.js 24 release blog post&lt;/a&gt;&amp;nbsp;and the&amp;nbsp;&lt;a href="https://github.com/nodejs/node/blob/main/doc/changelogs/CHANGELOG_V24.md" target="_blank" rel="noopener noreferrer"&gt;Node.js 24 changelog&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Performance considerations&lt;/h2&gt; 
&lt;p&gt;At launch, new Lambda runtimes receive less usage than existing established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing conclusions from side-by-side performance comparisons with other Lambda runtimes until the performance has stabilized. Since performance is highly dependent on workload, customers with performance-sensitive workloads should conduct their own testing, instead of relying on generic test benchmarks.&lt;/p&gt; 
&lt;p&gt;Builders should continue to measure and test function performance and optimize function code and configuration for any impact. To learn more about how to optimize Node.js performance in Lambda, see our blog post&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/compute/optimizing-node-js-dependencies-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Optimizing Node.js dependencies in AWS Lambda&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Migration from earlier Node.js runtimes&lt;/h2&gt; 
&lt;p&gt;We’ve already discussed changes that are new to the Node.js 24 runtime, such as removing support for callback-based function handlers. As a reminder, we’ll also recap some earlier changes for customers upgrading functions from older Node.js runtimes.&lt;/p&gt; 
&lt;h3&gt;AWS SDK for JavaScript&lt;/h3&gt; 
&lt;p&gt;Up until Node.js 16, Lambda’s Node.js runtimes included the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/welcome.html" target="_blank" rel="noopener noreferrer"&gt;AWS SDK for JavaScript version 2&lt;/a&gt;. This has since been superseded by the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/" target="_blank" rel="noopener noreferrer"&gt;AWS SDK for JavaScript version 3&lt;/a&gt;, which was&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/developer/modular-aws-sdk-for-javascript-is-now-generally-available/" target="_blank" rel="noopener noreferrer"&gt;released in December 2020&lt;/a&gt;.&amp;nbsp;Starting with Node.js 18, and continuing with Node.js 24, the Lambda Node.js runtimes include version 3. If you are upgrading from Node.js 16 or an earlier runtime and your function uses the included version 2 SDK, you must &lt;a href="https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/migrating-to-v3.html" target="_blank" rel="noopener noreferrer"&gt;upgrade your code to use the v3 SDK&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;For optimal performance, and to have full control over your code dependencies, we recommend&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/developer/reduce-lambda-cold-start-times-migrate-to-aws-sdk-for-javascript-v3/" target="_blank" rel="noopener noreferrer"&gt;bundling and minifying the AWS SDK&lt;/a&gt;&amp;nbsp;in your deployment package, rather than using the SDK included in the runtime. For more information, see&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/compute/optimizing-node-js-dependencies-in-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Optimizing Node.js dependencies in AWS Lambda&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Amazon Linux 2023&lt;/h3&gt; 
&lt;p&gt;The Node.js 24 runtime is based on the&amp;nbsp;&lt;code&gt;provided.al2023&lt;/code&gt;&amp;nbsp;runtime, which is based on the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/linux/al2023/ug/minimal-container.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Linux 2023 minimal container image&lt;/a&gt;. The Amazon Linux 2023 minimal image uses&amp;nbsp;&lt;code&gt;microdnf&lt;/code&gt;&amp;nbsp;as a package manager,&amp;nbsp;symlinked&amp;nbsp;as&amp;nbsp;&lt;code&gt;dnf&lt;/code&gt;. This replaces the&amp;nbsp;&lt;code&gt;yum&lt;/code&gt;&amp;nbsp;package manager used in Node.js 18 and earlier AL2-based images. If you deploy your Lambda function as a container image, you must update your Dockerfile to use&amp;nbsp;&lt;code&gt;dnf&lt;/code&gt;&amp;nbsp;instead of&amp;nbsp;&lt;code&gt;yum&lt;/code&gt;&amp;nbsp;when upgrading to the Node.js 24 base image from Node.js 18 or earlier.&lt;/p&gt; 
&lt;p&gt;Learn more about the&amp;nbsp;provided.al2023&amp;nbsp;runtime in the blog post&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/compute/introducing-the-amazon-linux-2023-runtime-for-aws-lambda/" target="_blank" rel="noopener noreferrer"&gt;Introducing the Amazon Linux 2023 runtime for AWS Lambda&lt;/a&gt;&amp;nbsp;and the&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/aws/amazon-linux-2023-a-cloud-optimized-linux-distribution-with-long-term-support/" target="_blank" rel="noopener noreferrer"&gt;Amazon Linux 2023 launch blog post&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Using the Node.js 24 runtime in AWS Lambda&lt;/h2&gt; 
&lt;p&gt;Finally, we’ll review how to configure your functions to use Node.js 24, using a range of deployment tools.&lt;/p&gt; 
&lt;h3&gt;AWS Management Console&lt;/h3&gt; 
&lt;p&gt;When using the &lt;a href="https://console.aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda Console&lt;/a&gt;, you can choose &lt;strong&gt;Node.js 24.x&lt;/strong&gt; in the &lt;em&gt;Runtime&lt;/em&gt; dropdown when creating a function:&lt;/p&gt; 
&lt;div id="attachment_25400" style="width: 740px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/25/Creating-Node.js-function-in-the-AWS-Management-Console.png"&gt;&lt;img aria-describedby="caption-attachment-25400" loading="lazy" class="size-full wp-image-25400" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/25/Creating-Node.js-function-in-the-AWS-Management-Console.png" alt="Creating Node.js function in the AWS Management Console" width="730" height="559"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25400" class="wp-caption-text"&gt;Creating Node.js function in the AWS Management Console&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;To update an existing Lambda function to Node.js 24, navigate to the function in the Lambda console, click &lt;strong&gt;Edit&lt;/strong&gt; in the &lt;em&gt;Runtime settings&lt;/em&gt; panel, then choose&amp;nbsp;&lt;strong&gt;Node.js 24.x &lt;/strong&gt;from the&amp;nbsp;&lt;em&gt;Runtime&lt;/em&gt;&amp;nbsp;dropdown:&lt;/p&gt; 
&lt;div id="attachment_25401" style="width: 739px" class="wp-caption aligncenter"&gt;
 &lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/25/Editing-Node.js-function-runtime.png"&gt;&lt;img aria-describedby="caption-attachment-25401" loading="lazy" class="size-full wp-image-25401" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/25/Editing-Node.js-function-runtime.png" alt="Editing Node.js function runtime" width="729" height="490"&gt;&lt;/a&gt;
 &lt;p id="caption-attachment-25401" class="wp-caption-text"&gt;Editing Node.js function runtime&lt;/p&gt;
&lt;/div&gt; 
&lt;h3&gt;AWS Lambda container image&lt;/h3&gt; 
&lt;p&gt;Change the Node.js &lt;a href="https://gallery.ecr.aws/lambda/nodejs" target="_blank" rel="noopener noreferrer"&gt;base image version&lt;/a&gt; by modifying the &lt;code&gt;FROM&lt;/code&gt; statement in your Dockerfile.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;FROM public.ecr.aws/lambda/nodejs:24
# Copy function code
COPY lambda_handler.mjs ${LAMBDA_TASK_ROOT}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;AWS Serverless Application Model&lt;/h3&gt; 
&lt;p&gt;In&amp;nbsp;AWS SAM, set the&amp;nbsp;&lt;code&gt;Runtime&lt;/code&gt;&amp;nbsp;attribute to&amp;nbsp;&lt;code&gt;nodejs24.x&lt;/code&gt;&amp;nbsp;to use this version:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-ruby"&gt;AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs24.x
      CodeUri: my_function/.
      Description: My Node.js Lambda Function&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;AWS SAM supports generating this template with Node.js 24 for new serverless applications using the &lt;code&gt;sam init&lt;/code&gt; command. For more information, refer to the &lt;a href="https://docs.aws.amazon.com/serverless-application-model/" target="_blank" rel="noopener noreferrer"&gt;AWS SAM&amp;nbsp;documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;AWS Cloud Development Kit (AWS CDK)&lt;/h3&gt; 
&lt;p&gt;In&amp;nbsp;AWS CDK, set the runtime attribute to&amp;nbsp;&lt;code&gt;Runtime.NODEJS_24_X&lt;/code&gt;&amp;nbsp;to use this version.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as path from "path";
import { Construct } from "constructs";
export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    // The code that defines your stack goes here
    // The Node.js 24 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, "node24LambdaFunction", {
      runtime: lambda.Runtime.NODEJS_24_X,
      code: lambda.Code.fromAsset(path.join(__dirname, "/../lambda")),
      handler: "index.handler",
    });
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;AWS Lambda now supports Node.js 24 as a managed runtime and container base image. This release uses a new runtime interface client, removes support for callback-based function handlers, and includes several other changes to streamline and simplify Node.js support in Lambda.&lt;/p&gt; 
&lt;p&gt;You can build and deploy functions using Node.js 24 using the&amp;nbsp;AWS Management Console,&amp;nbsp;AWS CLI,&amp;nbsp;AWS SDK,&amp;nbsp;AWS SAM,&amp;nbsp;AWS CDK, or your choice of infrastructure as code tool. You can also use the&amp;nbsp;&lt;a href="https://gallery.ecr.aws/lambda/nodejs" target="_blank" rel="noopener noreferrer"&gt;Node.js 24 container base image&lt;/a&gt;&amp;nbsp;if you prefer to build and deploy your functions using container images.&lt;/p&gt; 
&lt;p&gt;To find more Node.js examples, use the&amp;nbsp;&lt;a href="https://serverlessland.com/patterns?language=Node.js" target="_blank" rel="noopener noreferrer"&gt;Serverless Patterns Collection&lt;/a&gt;. For more serverless learning resources, visit&amp;nbsp;&lt;a href="https://serverlessland.com/" target="_blank" rel="noopener noreferrer"&gt;Serverless Land&lt;/a&gt;.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Performance benefits of new Amazon EC2 R8a memory-optimized instances</title>
		<link>https://aws.amazon.com/blogs/compute/performance-benefits-of-new-amazon-ec2-r8a-memory-optimized-instances/</link>
					
		
		<dc:creator><![CDATA[Tyler Jones]]></dc:creator>
		<pubDate>Tue, 25 Nov 2025 19:32:37 +0000</pubDate>
				<category><![CDATA[Compute]]></category>
		<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[AMD]]></category>
		<category><![CDATA[MySQL]]></category>
		<guid isPermaLink="false">394b96335fdb5c6d7dbdb59d9e32010b282512d7</guid>

					<description>Recently we announced the availability of Amazon Elastic Compute Cloud (Amazon EC2) R8a instances, the latest addition to the AMD memory-optimized instance family. These instances are powered by the 5th Generation AMD EPYC (codename Turin) processors with a maximum frequency of 4.5 GHz. In this post I take these instances for a spin and benchmark MySQL later on, but first I discuss the top things you should know about these instances.</description>
										<content:encoded>&lt;p&gt;Recently we announced the availability of &lt;a href="https://aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt; R8a instances, the latest addition to the AMD memory-optimized instance family. These instances are powered by the 5th Generation AMD EPYC (codename Turin) processors with a maximum frequency of 4.5 GHz. In this post I take these instances for a spin and benchmark MySQL later on, but first I discuss the top things you should know about these instances.&lt;/p&gt; 
&lt;h2&gt;Notable characteristics of R8a instances&lt;/h2&gt; 
&lt;p&gt;Each vCPU on an R8a instance corresponds to a physical CPU core (a design we introduced with the 7th generation AMD instances), which means there is no simultaneous multi-threading (SMT).&amp;nbsp;Because each vCPU is mapped to a dedicated physical core, you get more predictable and consistent performance: there is no resource sharing or potential interference between threads, which is particularly important for performance-sensitive workloads where consistent latency is essential. When evaluating and adopting R8a instances, make sure to re-evaluate your thresholds for CPU usage. You can likely get more out of each instance’s CPU without impacting any of your workload’s SLA metrics.&lt;/p&gt; 
&lt;p&gt;R8a instances feature sizes of up to 192 vCPU with 1,536 GiB RAM. The following table shows the detailed specs:&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Instance size&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;vCPU&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Memory (GiB)&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Instance storage&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Network bandwidth (Gbps)&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;EBS bandwidth (Gbps)&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.medium&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 12.5&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 10&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.large&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;2&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;16&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 12.5&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 10&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;4&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 12.5&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 10&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.2xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;8&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;64&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 15&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 10&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.4xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;16&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;128&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 15&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 10&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.8xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;32&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;256&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;15&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.12xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;48&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;384&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;22.5&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;15&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.16xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;64&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;512&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;30&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;20&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.24xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;96&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;768&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;40&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;30&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.48xlarge&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;192&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1536&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;75&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;60&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.metal-24xl&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;96&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;768&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;40&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;30&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;r8a.metal-48xl&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;192&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1536&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EBS Only&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;75&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;60&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h2&gt;Testing MySQL performance using HammerDB&lt;/h2&gt; 
&lt;p&gt;R8a instances are a great choice for MySQL databases,&amp;nbsp;so MySQL seemed like a natural place to showcase some of these instances’ capabilities.&amp;nbsp;To test MySQL, I used a series of scripts written by my colleagues to track MySQL performance across software versions and different EC2 instances. These scripts are stored in the&amp;nbsp;&lt;a href="https://github.com/aws/repro-collection" target="_blank" rel="noopener noreferrer"&gt;repro-collection&lt;/a&gt;&amp;nbsp;repository, which is an open source, extensible framework for performance testing that addresses real-world workloads rather than micro-benchmarks. It is built to provide a performance measurement reference usable across multiple organizations, and it’s currently centered on MySQL and actively used in discussions with Linux Kernel developers and maintainers. Furthermore, it helps track any performance impacts created by code changes to MySQL. The scripts contained in this repository set up a MySQL database to be tested, and a load generator running the&amp;nbsp;&lt;a href="https://www.hammerdb.com/" target="_blank" rel="noopener noreferrer"&gt;HammerDB&lt;/a&gt; benchmark. &lt;/p&gt; 
&lt;p&gt;For this benchmark I used an r6a.24xlarge instance for the load generator, and r6a.xlarge, r7a.xlarge, and r8a.xlarge instances for the MySQL database server, all deployed in the same &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/" target="_blank" rel="noopener noreferrer"&gt;AWS Availability Zone (AZ)&lt;/a&gt;. I chose a single AZ setup to minimize any latency variability from crossing multiple AZs. This is not meant to be a production-like setup, and I highly recommend using multiple AZs for production workloads. Each MySQL instance was tested separately using the same HammerDB load generator. Each test was run three times, and the results were averaged across the three runs. A diagram of the architecture is shown in the following figure:&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/18/benefits-r8a-solution-overview-transparent-bg.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25289" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/18/benefits-r8a-solution-overview-transparent-bg.png" alt="Performance testing architecture showing r6a/r7a/r8a instance types with HammerDB load generator executing 9 test runs" width="1430" height="1522"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;HammerDB overall results&lt;/h3&gt; 
&lt;p&gt;R8a instances show great results in the HammerDB benchmark for MySQL databases. For HammerDB’s overall score category, R8a instances outscored R7a instances by 55% and outscored R6a instances by 74%.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/18/benefits-r8a-OverallHammerDBScore-transparent-bg.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25287" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/18/benefits-r8a-OverallHammerDBScore-transparent-bg.png" alt="Performance comparison chart showing r6a, r7a, and r8a instance scores" width="1431" height="953"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;HammerDB transactions per minute test&lt;/h3&gt; 
&lt;p&gt;R8a instances also showed a notable improvement in this category. When compared to previous generation R7a instances,&amp;nbsp;R8a outperformed R7a by 32%. When compared to R6a instances, R8a outperformed them by 63%.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/18/benefits-r8a-HammerDBTPM-white-bg.jpg"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25286" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/18/benefits-r8a-HammerDBTPM-white-bg-scaled.jpg" alt=" Performance comparison showing r6a (91,105), r7a (112,686), and r8a (148,478) transactions per minute" width="2560" height="1708"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;HammerDB P99 latency results&lt;/h3&gt; 
&lt;p&gt;R8a instances showed improved P99 latency results, reflecting the efficiency gains driven by the new 5th Generation AMD EPYC CPUs and higher memory bandwidth.&amp;nbsp;R8a shows a 14% latency reduction when compared to R7a, and a 25% latency reduction when compared to R6a.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/18/benefits-r8a-HammerDBP99Latency-transparent-bg.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-25283" src="https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2025/11/18/benefits-r8a-HammerDBP99Latency-transparent-bg.png" alt="P99 latency comparison showing decrease from 39.93ms (r6a) to 30.02ms (r8a) across instance generations" width="1431" height="953"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Built on the &lt;a href="https://aws.amazon.com/ec2/nitro/" target="_blank" rel="noopener noreferrer"&gt;AWS Nitro System&lt;/a&gt; using sixth generation Nitro Cards, R8a instances are ideal for high performance, memory-intensive workloads such as SQL and NoSQL databases, as demonstrated by the benchmarking shown in this post, as well as distributed web scale in-memory caches, in-memory databases, real-time big data analytics, and Electronic Design Automation (EDA) applications. R8a instances offer 12 sizes, including 2 bare metal sizes. Amazon EC2 R8a instances are SAP-certified and provide 38% more SAPS when compared to R7a instances. If you’re still running 6th generation R6a instances, then I highly encourage you to migrate to the 8th generation instances to take advantage of their clear price performance benefits. Staying on modern infrastructure is a great way to drive down costs and provide more features for your customers, and there are clear gains to be had based on the testing shown in this post.&lt;/p&gt; 
&lt;p&gt;Start optimizing your high performance, memory-intensive workloads today by migrating to R8a instances. Visit the &lt;a href="https://aws.amazon.com/ec2/instance-types/r8a/" target="_blank" rel="noopener noreferrer"&gt;Amazon EC2 R8a instances&lt;/a&gt; page to learn more and get started with the improved price performance of R8a instances today!&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
	</channel>
</rss>