<?xml version="1.0" encoding="UTF-8" standalone="no"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" version="2.0">

<channel>
	<title>Artificial Intelligence</title>
	<atom:link href="https://aws.amazon.com/blogs/machine-learning/feed/" rel="self" type="application/rss+xml"/>
	<link>https://aws.amazon.com/blogs/machine-learning/</link>
	<description>Official Machine Learning Blog of Amazon Web Services</description>
	<lastBuildDate>Thu, 09 Apr 2026 17:33:28 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>Understanding Amazon Bedrock model lifecycle</title>
		<link>https://aws.amazon.com/blogs/machine-learning/understanding-amazon-bedrock-model-lifecycle/</link>
					
		
		<dc:creator><![CDATA[Saurabh Trikande]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 17:33:28 +0000</pubDate>
				<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Machine Learning]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<guid isPermaLink="false">0b68fb49f6a084b3383633311e7892a1cb1081eb</guid>

					<description>This post shows you how to manage FM transitions in Amazon Bedrock, so you can make sure your AI applications remain operational as models evolve. We discuss the three lifecycle states, how to plan migrations with the new extended access feature, and practical strategies to transition your applications to newer models without disruption.</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; regularly releases new foundation model (FM) versions with better capabilities, accuracy, and safety. Understanding the model lifecycle is essential for effective planning and management of AI applications built on Amazon Bedrock. Before migrating your applications, you can test these models through the Amazon Bedrock console or API to evaluate their performance and compatibility.&lt;/p&gt; 
&lt;p&gt;This post shows you how to manage FM transitions in Amazon Bedrock, so you can make sure your AI applications remain operational as models evolve. We discuss the three lifecycle states, how to plan migrations with the new extended access feature, and practical strategies to transition your applications to newer models without disruption.&lt;/p&gt; 
&lt;h2&gt;Amazon Bedrock model lifecycle overview&lt;/h2&gt; 
&lt;p&gt;A model offered on Amazon Bedrock can exist in one of three states: &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-lifecycle.html" target="_blank" rel="noopener noreferrer"&gt;Active, Legacy, or End-of-Life (EOL)&lt;/a&gt;. A model’s current state is visible both on the Amazon Bedrock console and in API responses. For example, when you make a &lt;a href="https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetFoundationModel.html" target="_blank" rel="noopener noreferrer"&gt;GetFoundationModel&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListFoundationModels.html" target="_blank" rel="noopener noreferrer"&gt;ListFoundationModels&lt;/a&gt; call, the state of the model is shown in the &lt;code&gt;modelLifecycle&lt;/code&gt; field of the response.&lt;/p&gt; 
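&lt;p&gt;As a minimal sketch, you can automate this check with the AWS SDK for Python (Boto3). The &lt;code&gt;list_foundation_models&lt;/code&gt; call and the &lt;code&gt;modelLifecycle&lt;/code&gt; response field are part of the Bedrock control-plane API; the grouping helper below is illustrative.&lt;/p&gt;

```python
# Sketch: group Amazon Bedrock foundation models by lifecycle state.
# group_by_lifecycle is a pure helper over the ListFoundationModels
# response shape; list_models_by_state makes the real call and needs
# AWS credentials plus boto3.
from collections import defaultdict

def group_by_lifecycle(model_summaries):
    """Group modelSummaries entries by their modelLifecycle status."""
    groups = defaultdict(list)
    for summary in model_summaries:
        status = summary.get("modelLifecycle", {}).get("status", "UNKNOWN")
        groups[status].append(summary["modelId"])
    return dict(groups)

def list_models_by_state(region_name="us-east-1"):
    """Call the Bedrock control-plane API (requires AWS credentials)."""
    import boto3
    client = boto3.client("bedrock", region_name=region_name)
    summaries = client.list_foundation_models()["modelSummaries"]
    return group_by_lifecycle(summaries)
```

&lt;p&gt;Running &lt;code&gt;list_models_by_state()&lt;/code&gt; from a scheduled job gives you an early signal when a model you depend on moves to &lt;code&gt;Legacy&lt;/code&gt;.&lt;/p&gt;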
&lt;p&gt;The following diagram illustrates the details around each model state.&lt;/p&gt; 
&lt;p&gt;&lt;img class="alignnone size-full wp-image-122007" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2025/12/22/ml-19718-image-1.png" alt="" width="1588" height="762"&gt;&lt;/p&gt; 
&lt;p&gt;The state details are as follows:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;ACTIVE&lt;/strong&gt; – Active models receive ongoing maintenance, updates, and bug fixes from their providers. While a model is &lt;code&gt;Active&lt;/code&gt;, you can use it for inference through APIs like &lt;code&gt;InvokeModel&lt;/code&gt; or &lt;code&gt;Converse&lt;/code&gt;, customize it (if supported), and request quota increases through AWS Service Quotas.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;LEGACY&lt;/strong&gt; – When a model provider transitions a model to &lt;code&gt;Legacy&lt;/code&gt; state, Amazon Bedrock notifies customers at least 6 months before the EOL date, providing essential time to plan and execute a migration to newer or alternative model versions. During the &lt;code&gt;Legacy&lt;/code&gt; period, existing customers can continue using the model, though new customers might be unable to access it, and accounts that have not invoked the model for 15 days or more might lose access. Purchasing new provisioned throughput model units also becomes unavailable, and model customization capabilities might face restrictions. For models with EOL dates after February 1, 2026, Amazon Bedrock introduces an additional phase within the &lt;code&gt;Legacy&lt;/code&gt; state: 
  &lt;ul&gt; 
   &lt;li&gt;&lt;strong&gt;Public extended access period&lt;/strong&gt; – After spending a minimum of 3 months in &lt;code&gt;Legacy&lt;/code&gt; status, the model enters this extended access phase. Active users can continue using it for at least another 3 months until EOL. During extended access, quota increase requests through AWS Service Quotas are not expected to be approved, so plan your capacity needs before a model enters this phase. During this period, pricing may be adjusted (see Pricing during extended access below), and customers will receive notifications about the transition date and any changes.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;END-OF-LIFE (EOL)&lt;/strong&gt; – When a model reaches its EOL date, it becomes completely inaccessible across all AWS Regions unless specifically noted in the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-lifecycle.html#versions-for-eol" target="_blank" rel="noopener noreferrer"&gt;EOL list&lt;/a&gt;. API requests to EOL models will fail unless special arrangements exist between the customer and provider for continued access. The transition to EOL requires proactive customer action—migration doesn’t happen automatically. Organizations must update their application code to use alternative models before the EOL date arrives.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;After a model launches on Amazon Bedrock, it remains available for at least 12 months and stays in &lt;code&gt;Legacy&lt;/code&gt; state for at least 6 months before EOL. This timeline helps customers plan migrations without rushing.&lt;/p&gt; 
&lt;h2&gt;Pricing during extended access&lt;/h2&gt; 
&lt;p&gt;During the extended access period, pricing may be adjusted by the model provider. If pricing changes are planned, you will be notified in the initial legacy announcement and before any subsequent changes take effect, so there will be no surprise retroactive price increases. Customers with existing private pricing agreements with model providers or those using provisioned throughput will continue to operate under their current pricing terms during the extended access period. This makes sure customers who have made specific arrangements with model providers or invested in provisioned capacity will not be unexpectedly affected by any pricing changes.&lt;/p&gt; 
&lt;h2&gt;Communication process for model state changes&lt;/h2&gt; 
&lt;p&gt;Customers will receive a notification 6 months prior to a model’s EOL date when the model provider transitions a model to Legacy state. This proactive communication approach ensures that customers have sufficient time to plan and execute their migration strategies before a model becomes EOL.&lt;br&gt; Notifications include details about the model being deprecated, important dates, extended access availability, and when the model will be EOL. AWS uses multiple channels to ensure these important communications reach the right people, including:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Email notifications&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://health.console.aws.amazon.com/health/home#/account/dashboard/scheduled-changes?viewType=table"&gt;AWS Health Dashboard&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Alerts in the Amazon Bedrock console&lt;/li&gt; 
 &lt;li&gt;Programmatic access through the AWS Health API&lt;/li&gt; 
&lt;/ul&gt; 
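&lt;p&gt;Notifications surfaced in the AWS Health Dashboard can also be retrieved with the AWS Health API. The sketch below builds a &lt;code&gt;DescribeEvents&lt;/code&gt; filter for upcoming Bedrock scheduled changes; it assumes Boto3 and a Business or Enterprise Support plan (required for the Health API), and the exact filter values are illustrative.&lt;/p&gt;

```python
# Sketch: retrieve Bedrock lifecycle notifications via the AWS Health API.
# bedrock_lifecycle_filter is a pure helper; describe_bedrock_events makes
# the real call and needs boto3, AWS credentials, and a Business or
# Enterprise Support plan. The filter values are illustrative.
def bedrock_lifecycle_filter():
    """Build a DescribeEvents filter for upcoming Bedrock scheduled changes."""
    return {
        "services": ["BEDROCK"],
        "eventTypeCategories": ["scheduledChange"],
        "eventStatusCodes": ["upcoming"],
    }

def describe_bedrock_events():
    """The AWS Health API is served from the us-east-1 global endpoint."""
    import boto3
    health = boto3.client("health", region_name="us-east-1")
    return health.describe_events(filter=bedrock_lifecycle_filter())["events"]
```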
&lt;h2&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127325" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/Picture-1.png" alt="" width="838" height="618"&gt;&lt;/h2&gt; 
&lt;p&gt;To make sure you receive these notifications, verify and configure your account contact email addresses. By default, notifications are sent to your account’s root user email and alternate contacts (operations, security, and billing). You can review and update these contacts on your &lt;a href="https://console.aws.amazon.com/billing/home#/account"&gt;AWS Account page&lt;/a&gt; in the Alternate contacts section. To add additional recipients or delivery channels (such as Slack or email distribution lists), go to the &lt;a href="https://console.aws.amazon.com/notifications"&gt;AWS User Notifications console&lt;/a&gt; and choose AWS managed notifications subscriptions to manage your delivery channels and account contacts. If you are not receiving expected notifications, check that your email addresses are correctly configured in these settings and that notification emails from health@aws.com are not being filtered by your email provider.&lt;/p&gt; 
&lt;h2&gt;Migration strategies and best practices&lt;/h2&gt; 
&lt;p&gt;When migrating to a newer model, update your application code and check that your &lt;a href="https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html" target="_blank" rel="noopener noreferrer"&gt;service quotas&lt;/a&gt; can handle expected volume. Planning ahead helps you transition smoothly with minimal disruption.&lt;/p&gt; 
&lt;h3&gt;Planning your migration timeline&lt;/h3&gt; 
&lt;p&gt;Start planning as soon as a model enters &lt;code&gt;Legacy&lt;/code&gt; state:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Assessment phase&lt;/strong&gt; – Evaluate your current usage of the legacy model, including which applications depend on it, typical request patterns, and specific behaviors or outputs that your applications rely on.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Research phase&lt;/strong&gt; – Investigate the recommended replacement model, understanding its capabilities, differences from the legacy model, new features that could enhance your applications, and the new model’s Regional availability. Review API changes and documentation.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Testing phase&lt;/strong&gt; – Conduct thorough testing with the new model and compare performance metrics between models. This helps identify adjustments needed in your application code or prompt engineering.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Migration phase&lt;/strong&gt; – Implement changes using a phased deployment approach. Monitor system performance during transition and maintain rollback capability.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Operational phase&lt;/strong&gt; – After migration, continuously monitor your applications and user feedback to make sure they’re performing as expected with the new model.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Technical migration steps&lt;/h3&gt; 
&lt;p&gt;Test your migration thoroughly:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Update API references&lt;/strong&gt; – Modify your application code to reference the new model ID. For example, change from &lt;code&gt;anthropic.claude-3-5-sonnet-20240620-v1:0&lt;/code&gt; to &lt;code&gt;anthropic.claude-sonnet-4-5-20250929-v1:0&lt;/code&gt;, or use &lt;a href="https://aws.amazon.com/blogs/machine-learning/unlock-global-ai-inference-scalability-using-new-global-cross-region-inference-on-amazon-bedrock-with-anthropics-claude-sonnet-4-5/" target="_blank" rel="noopener noreferrer"&gt;global cross-Region inference&lt;/a&gt; with &lt;code&gt;global.anthropic.claude-sonnet-4-5-20250929-v1:0&lt;/code&gt;. Update prompt structures according to the new model’s best practices. For more detailed guidance, refer to &lt;a href="https://aws.amazon.com/blogs/machine-learning/migrate-from-anthropics-claude-sonnet-3-x-to-claude-sonnet-4-x-on-amazon-bedrock/" target="_blank" rel="noopener noreferrer"&gt;Migrate from Anthropic’s Claude Sonnet 3.x to Claude Sonnet 4.x on Amazon Bedrock&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Request quota increases&lt;/strong&gt; – Before fully migrating, make sure you have sufficient quotas for the new model by requesting increases through the AWS Service Quotas console if necessary.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Adjust prompts&lt;/strong&gt; – Newer models might respond differently to the same prompts. Review and refine your prompts according to the new model’s specifications. You can also use tools such as the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-management-optimize.html" target="_blank" rel="noopener noreferrer"&gt;prompt optimizer in Amazon Bedrock&lt;/a&gt; to assist with rewriting your prompt for the target model.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Update response handling&lt;/strong&gt; – If the new model returns responses in a different format or with different characteristics, update your parsing and processing logic accordingly.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Optimize token usage&lt;/strong&gt; – Take advantage of efficiency improvements in newer models by reviewing and optimizing your token usage patterns. For example, models that support &lt;a href="https://aws.amazon.com/bedrock/prompt-caching/" target="_blank" rel="noopener noreferrer"&gt;prompt caching&lt;/a&gt; can reduce the cost and latency of your invocations.&lt;/li&gt; 
&lt;/ul&gt; 
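&lt;p&gt;To make the first step above a one-line change, keep the model ID in a single configuration value. The sketch below uses the Bedrock Runtime &lt;code&gt;Converse&lt;/code&gt; API with the model IDs from the example; the request-building helper and its inference settings are illustrative.&lt;/p&gt;

```python
# Sketch: keep the model ID in one configuration value so migrating is a
# one-line change. The Converse request shape follows the Bedrock Runtime
# API; the inference settings here are illustrative defaults.
LEGACY_MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"
TARGET_MODEL_ID = "anthropic.claude-sonnet-4-5-20250929-v1:0"

def build_converse_request(model_id, prompt):
    """Assemble a Converse request; only model_id changes during migration."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

def invoke(prompt, model_id=TARGET_MODEL_ID):
    """Make the real call (requires AWS credentials and boto3)."""
    import boto3
    runtime = boto3.client("bedrock-runtime")
    response = runtime.converse(**build_converse_request(model_id, prompt))
    return response["output"]["message"]["content"][0]["text"]
```

&lt;p&gt;During testing you can pass &lt;code&gt;LEGACY_MODEL_ID&lt;/code&gt; explicitly to compare outputs before cutting over.&lt;/p&gt;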
&lt;h3&gt;Testing strategies&lt;/h3&gt; 
&lt;p&gt;Thorough testing is critical for a successful migration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Side-by-side comparison&lt;/strong&gt; – Run the same requests against both the legacy and new models to compare outputs and identify any differences that might affect your application. For production environments, consider shadow testing—sending duplicate requests to the new model alongside your existing model without affecting end users. With this approach, you can evaluate model performance, latency, error rates, and other operational factors before full migration. Perform A/B testing for user impact assessment by routing a controlled percentage of live traffic to the new model while monitoring key metrics such as user engagement, task completion rates, satisfaction scores, and business KPIs.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Performance testing&lt;/strong&gt; – Measure response times, token usage, and other performance metrics to understand how the new model performs compared to the legacy version. Validate business-specific success metrics.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Regression and edge case testing&lt;/strong&gt; – Make sure existing functionality continues to work as expected with the new model. Pay special attention to unusual or complex inputs that might reveal differences in how the models handle challenging scenarios.&lt;/li&gt; 
&lt;/ul&gt; 
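&lt;p&gt;The side-by-side comparison described above can be sketched as a small harness that runs the same prompts against both models and records output and latency. The &lt;code&gt;compare_models&lt;/code&gt; helper below is illustrative; it accepts any invoke callable, so it can wrap a real Bedrock Runtime client or a stub during dry runs.&lt;/p&gt;

```python
# Sketch: run the same prompts against the legacy and candidate models and
# record output plus latency for review. invoke_fn is any callable taking
# (model_id, prompt) and returning text, so a stub works for dry runs.
import time

def compare_models(invoke_fn, prompts, legacy_id, candidate_id):
    """Return one comparison row per prompt."""
    rows = []
    for prompt in prompts:
        row = {"prompt": prompt}
        for label, model_id in (("legacy", legacy_id), ("candidate", candidate_id)):
            start = time.perf_counter()
            output = invoke_fn(model_id, prompt)
            row[label] = {
                "output": output,
                "latency_s": round(time.perf_counter() - start, 3),
            }
        rows.append(row)
    return rows
```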
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;The model lifecycle policy in Amazon Bedrock gives you clear stages for managing FM evolution. Transition periods offer extended access options, and provisions for fine-tuned models help you balance innovation with stability.&lt;/p&gt; 
&lt;p&gt;Stay informed about model states through the AWS Health Dashboard, plan migrations when models enter the &lt;code&gt;Legacy&lt;/code&gt; state, and test newer versions thoroughly. These guidelines can help you maintain continuity in your AI applications while using improved capabilities in newer models.&lt;/p&gt; 
&lt;p&gt;If you have further questions or concerns, reach out to your AWS team. We want to help you and facilitate a smooth transition as you continue to take advantage of the latest advancements in FM technology.&lt;/p&gt; 
&lt;p&gt;For continued learning and implementation support, explore the official &lt;a href="https://docs.aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock documentation&lt;/a&gt; for comprehensive guides and &lt;a href="https://docs.aws.amazon.com/bedrock/latest/APIReference/" target="_blank" rel="noopener noreferrer"&gt;API references&lt;/a&gt;. Additionally, visit the &lt;a href="https://aws.amazon.com/blogs/machine-learning/" target="_blank" rel="noopener noreferrer"&gt;AWS Machine Learning Blog&lt;/a&gt; and AWS Architecture Center for real-world case studies, &lt;a href="https://aws.amazon.com/blogs/machine-learning/migrate-from-anthropics-claude-3-5-sonnet-to-claude-4-sonnet-on-amazon-bedrock/" target="_blank" rel="noopener noreferrer"&gt;migration best practices&lt;/a&gt;, and reference architectures that can help optimize your model lifecycle management strategy.&lt;/p&gt; 
&lt;hr&gt; 
&lt;h3&gt;About the authors&lt;/h3&gt; 
&lt;p style="clear: both"&gt;&lt;strong&gt;&lt;img loading="lazy" class="size-full wp-image-38198 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/06/15/Saurabh-Trikande.jpg" alt="" width="100" height="118"&gt;Saurabh Trikande&lt;/strong&gt; is a Senior Product Manager for Amazon Bedrock and Amazon SageMaker Inference. He is passionate about working with customers and partners, motivated by the goal of democratizing AI. He focuses on core challenges related to deploying complex AI applications, inference with multi-tenant models, cost optimizations, and making the deployment of generative AI models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.&lt;/p&gt; 
&lt;p style="clear: both"&gt;&lt;strong&gt;&lt;img loading="lazy" class="size-full wp-image-116211 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2025/09/10/melanie_ml19602.png" alt="Melanie" width="100" height="133"&gt;Melanie Li&lt;/strong&gt;, PhD, is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions using state-of-the-art AI/ML tools. She has been actively involved in multiple generative AI initiatives across APJ, harnessing the power of LLMs. Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.&lt;/p&gt; 
&lt;p style="clear: both"&gt;&lt;strong&gt;&lt;img loading="lazy" class="alignleft size-thumbnail wp-image-117485" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2025/10/03/derrchoo-100x133.jpg" alt="" width="100" height="133"&gt;Derrick Choo&lt;/strong&gt;&amp;nbsp;is a Senior Solutions Architect at AWS who accelerates enterprise digital transformation through cloud adoption, AI/ML, and generative AI solutions. He specializes in full-stack development and ML, designing end-to-end solutions spanning frontend interfaces, IoT applications, data integrations, and ML models, with a particular focus on computer vision and multi-modal systems.&lt;/p&gt; 
&lt;p style="clear: both"&gt;&lt;strong&gt;&lt;img loading="lazy" class="wp-image-117483 size-thumbnail alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2025/10/03/jldean-100x133.jpg" alt="" width="100" height="133"&gt;Jared Dean &lt;/strong&gt;is a Principal AI/ML Solutions Architect at AWS. Jared works with customers across industries to develop machine learning applications that improve efficiency. He is interested in all things AI, technology, and BBQ.&lt;/p&gt; 
&lt;p style="clear: both"&gt;&lt;img loading="lazy" class="size-full wp-image-105388 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2025/04/28/jbodea.jpeg" alt="" width="100" height="133"&gt;&lt;strong&gt;Julia Bodia&lt;/strong&gt; is Principal Product Manager for Amazon Bedrock.&lt;/p&gt; 
&lt;p style="clear: both"&gt;&lt;img loading="lazy" class="wp-image-127557 size-full alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/01/pooja-1.jpeg" alt="" width="100" height="133"&gt;&lt;strong&gt;Pooja Rao&lt;/strong&gt; is a Senior Program Manager at AWS, leading quota and capacity management and supporting business development for the Bedrock Go-To-Market team. Outside of work, she enjoys reading, traveling, and spending time with her family.&lt;/p&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>The future of managing agents at scale: AWS Agent Registry now in preview</title>
		<link>https://aws.amazon.com/blogs/machine-learning/the-future-of-managing-agents-at-scale-aws-agent-registry-now-in-preview/</link>
					
		
		<dc:creator><![CDATA[Preethi C N]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 17:28:20 +0000</pubDate>
				<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Bedrock AgentCore]]></category>
		<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<guid isPermaLink="false">3f19bed60e801f394bcf1bb23adf8920f84fd039</guid>

					<description>Today, we're announcing AWS Agent Registry (preview) in AgentCore, a single place to discover, share, and reuse AI agents, tools, and agent skills across your enterprise.</description>
										<content:encoded>&lt;p&gt;&lt;em&gt;Now available through Amazon Bedrock AgentCore, use AWS Agent Registry to discover, share, and reuse agents, tools, and agent skills across your organization.&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;As enterprises scale to hundreds or thousands of agents, platform teams face three critical challenges: visibility (knowing what agents exist across the organization), control (governing who can publish and what becomes discoverable organization-wide), and reuse (preventing teams from rebuilding capabilities that already exist). Without a centralized system, agent sprawl accelerates, compliance risks grow, and development effort is wasted on duplicate work.&lt;/p&gt; 
&lt;p&gt;These challenges are compounded by reality: no organization’s agent landscape lives entirely within one provider. Agents are built across AWS services, other cloud platforms, and on-premises environments. A registry that only covers part of the stack leaves the rest invisible, and invisible agents can’t be discovered, governed, or reused.&lt;/p&gt; 
&lt;p&gt;Solving this requires more than a place to list what exists. Platform teams need to build agents, publish them with approval workflows, help teams discover and reuse what exists, govern who can publish and consume, monitor what’s running in production, and retire what’s no longer needed. Today, we’re announcing AWS Agent Registry (preview) in AgentCore, a single place to discover, share, and reuse AI agents, tools, and agent skills across your enterprise.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/agentcore/" target="_blank" rel="noopener noreferrer"&gt;AgentCore&lt;/a&gt; is the platform to build, connect, and optimize agents at scale, designed from the ground up for agents: open to any model, any framework, any enterprise architecture. Whether you’re shipping your first agent or your thousandth, you have one platform that scales with you. The registry extends that same flexibility to how you organize and govern what you’ve built. It indexes agents regardless of where they’re built or hosted – on AWS, other cloud providers, or on premises.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;What’s available in preview today&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;The registry stores metadata for every agent, tool, MCP server, agent skill, and custom resource as a structured record. It captures who published each record, what protocols it implements, what it exposes, and how to invoke it. The registry supports established standards like MCP and A2A natively, with the flexibility to define custom schemas for your organization. There are two ways to register a record. You can provide metadata manually through the console, AWS SDK, or API, specifying capability descriptions, ownership, compliance status, and usage documentation. Or you can point to an MCP or A2A endpoint, and the registry will automatically pull in the details. Your registry can reflect your full agent landscape from day one, not only the pieces that happen to run on AWS.&lt;/p&gt; 
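&lt;p&gt;To make the record concept concrete, the sketch below assembles a hypothetical registry record with the fields described above (publisher, protocol, endpoint, capabilities, and custom metadata). The field names and the &lt;code&gt;make_registry_record&lt;/code&gt; helper are assumptions for illustration; the actual AWS Agent Registry schema may differ.&lt;/p&gt;

```python
# Hypothetical sketch of a registry record, based on the fields described
# in this post; the real AWS Agent Registry schema may differ.
def make_registry_record(name, publisher, protocol, endpoint,
                         capabilities, custom_metadata=None):
    """Assemble a draft record; records start as drafts before approval."""
    supported = {"MCP", "A2A", "CUSTOM"}
    if protocol not in supported:
        raise ValueError("unsupported protocol: " + protocol)
    return {
        "name": name,
        "publisher": publisher,
        "protocol": protocol,
        "endpoint": endpoint,
        "capabilities": list(capabilities),
        "customMetadata": dict(custom_metadata or {}),
        "status": "DRAFT",  # draft, then pending approval, then approved
        "version": 1,
    }
```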
&lt;p&gt;The registry is accessible through the AgentCore &lt;a href="https://us-east-1.console.aws.amazon.com/bedrock-agentcore/registry?region=us-east-1" target="_blank" rel="noopener"&gt;Console&lt;/a&gt;, APIs, and as an MCP server. Any MCP-compatible client can query it directly, including Kiro and Claude Code. For organizations with custom identity providers, OAuth-based access means that teams can build their own discovery UIs without requiring IAM credentials.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-128059 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/08/ml-20825-image-1-new.png" alt="" width="3024" height="1844"&gt;&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;&lt;em&gt;Finding what already exists&lt;/em&gt;&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Without a central registry, developers search externally for third-party tools or duplicate work that a neighboring team already shipped. You lose visibility into what’s been built, who owns it, and whether it’s approved for use. The registry solves this with hybrid search that combines keyword and semantic matching: all queries use keyword matching, but longer, natural language queries also use semantic understanding to surface conceptually related results. This means a search for “payment processing” surfaces tools tagged as “billing” or “invoicing,” even if they’re named differently. Discovery becomes the path of least resistance. Teams can search by name, description, and resource type to find what already exists before building something new. Developers search the registry first. If a vetted capability exists, they use it. If it doesn’t, they build it, register it, and make it available to everyone else. You can see what exists across your organization.&lt;/p&gt; 
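&lt;p&gt;The routing behind hybrid search can be illustrated with a toy sketch (not the registry’s actual implementation): every query gets keyword matching, while multi-word, natural-language queries also get a semantic pass, faked here with a small synonym table standing in for embedding similarity.&lt;/p&gt;

```python
# Toy illustration of hybrid search routing (not the registry's actual
# implementation). All queries use keyword matching; multi-word queries
# also expand terms via a tiny synonym table that stands in for
# embedding-based semantic similarity.
SYNONYMS = {
    "payment": {"billing", "invoicing"},
    "payments": {"billing", "invoicing"},
}

def hybrid_search(records, query):
    """records: dicts with 'name' and 'description'; returns matching names."""
    words = query.lower().split()
    terms = set(words)
    if len(words) >= 2:  # longer queries get the semantic pass
        for word in words:
            terms.update(SYNONYMS.get(word, set()))
    hits = []
    for record in records:
        text = (record["name"] + " " + record["description"]).lower()
        if any(term in text for term in terms):
            hits.append(record["name"])
    return hits
```

&lt;p&gt;With this routing, a query like “payment processing tools” surfaces a record described as “billing,” mirroring the behavior described above.&lt;/p&gt;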
&lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;For Zuora, an AI-first monetization and revenue management platform deploying 50 agents across Sales, Finance, Product, and Developer teams, the AWS Agent Registry in AgentCore gives Principal Architects a unified view to discover, manage, and catalog every agent, tool, and skill in use. This centralized approach enables teams to find and reuse existing assets rather than rebuilding from scratch. Standardized metadata ensures each agent and tool includes consistent details on ownership and capabilities, giving teams end-to-end visibility and accountability across the entire agent ecosystem.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt; 
&lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;– Pete Hirsch, Chief Product and Technology Officer, Zuora&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-128056" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/08/ml-20825-image-2.png" alt="" width="1422" height="784"&gt;&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;&lt;em&gt;Governing what gets published&lt;/em&gt;&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Without governance, anyone can register anything. You lose control over what becomes discoverable, can’t enforce standards, can’t track ownership, and can’t manage agents from development to retirement. When you have a few agents, you can manage them in a spreadsheet. When you have hundreds or thousands, you need a system that enforces standards automatically.&lt;/p&gt; 
&lt;p&gt;The registry gives you control over what gets published and who can access it. Admins use IAM policies to define who can register agents, tools, and agent skills and who can discover them. Every record follows an approval workflow: they start as drafts, move to pending approval, and become discoverable to the broader organization once approved. The registry tracks agents across their entire lifecycle, from initial development through deployment to eventual retirement. Records are versioned to track changes over time, and organizations can deprecate records that are no longer in use. The registry provides hooks to integrate your existing approval workflows. You can add custom metadata to each entry through a record, capturing information like team ownership, compliance status, or deployment environment.&lt;/p&gt; 
&lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Southwest Airlines is enabling an enterprise-wide agent catalog and governance across the enterprise. AWS Agent Registry in AgentCore solves the critical discoverability challenge— enabling teams to find and reuse existing agents instead of rebuilding capabilities from scratch. With managed governance across multiple platforms, every agent carries standardized ownership metadata and policy enforcement. This will prevent agent sprawl across the organization while establishing the foundation for scaling thousands of agents with enterprise-grade governance from day one.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt; 
&lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;– Justin Bundick, VP AI and Intelligent Platforms, Southwest Airlines&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-128057" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/08/ml-20825-image-3.png" alt="" width="1342" height="810"&gt;&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Where we’re headed &lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;We’re building toward a future where the registry spans every AWS service where agents are built, including Amazon Quick and Kiro. Agents will be automatically indexed the moment they’re deployed. Developers will search from the IDE, business users will discover agents in their workspace, and admins will govern from the console, all backed by the same source of truth. Cross-registry federation will let you connect multiple registries and search across them as one. You will be able to define categories and taxonomies that match how your organization thinks about agents, backed by structured metadata schemas capturing ownership, compliance status, cost center, and whatever else your governance model requires. Over time, operational intelligence from AgentCore Observability will surface alongside registry records: invocation counts, latency, uptime, and usage patterns, helping you understand not only what exists, but what’s actively working in production.&lt;/p&gt; 
&lt;p&gt;Beyond AWS Agent Registry, we’re building toward connecting with external partner catalogs. We’re excited about early partner interest in centralized discovery and governance across your technology landscape.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Get started&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Today’s preview is the starting line. No more rebuilding what already exists. No more agents deployed without visibility. The AWS Agent Registry gives you one place to discover, govern, and reuse every agent across your enterprise.&lt;/p&gt; 
&lt;p&gt;AWS Agent Registry is available in preview today through AgentCore in five &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agentcore-regions.html" target="_blank" rel="noopener noreferrer"&gt;AWS Regions&lt;/a&gt;: US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Europe (Ireland).&lt;/p&gt; 
&lt;p&gt;Get started with AWS Agent Registry through the AgentCore &lt;a href="https://us-east-1.console.aws.amazon.com/bedrock-agentcore/registry?region=us-east-1" target="_blank" rel="noopener"&gt;Console&lt;/a&gt;. Learn more by reading the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/registry.html" target="_blank" rel="noopener"&gt;documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-128054" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/08/Preethi-Color-Final.jpg" alt="" width="1054" height="1268"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Preethi CN&lt;/h3&gt; 
  &lt;p&gt;Preethi CN is Director of AgentCore in the Agentic AI Organization, with over 20 years of expertise in embedded and cloud software development. In her 14 years at Amazon, she has architected large-scale distributed systems and driven AI innovations across Retail, Alexa, and AWS, delivering breakthroughs in multimodal AI. She led speech recognition for Alexa, Computer Vision services at AWS, and generative AI transformation that revolutionized how organizations extract insights from unstructured content at scale. As a technical advisor to the Agentic AI Organization, she has provided strategic oversight across Amazon Quick, Kiro, and AWS Transform. Most recently, she crafted the vision and led the launch of AgentCore, the platform for building, connecting, and optimizing production-ready AI agents at scale.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Embed a live AI browser agent in your React app with Amazon Bedrock AgentCore</title>
		<link>https://aws.amazon.com/blogs/machine-learning/embed-a-live-ai-browser-agent-in-your-react-app-with-amazon-bedrock-agentcore/</link>
					
		
		<dc:creator><![CDATA[Sundar Raghavan]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 17:06:07 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Bedrock AgentCore]]></category>
		<category><![CDATA[Amazon Machine Learning]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[AIML]]></category>
		<category><![CDATA[artificial-intelligence]]></category>
		<guid isPermaLink="false">4ebd6f69bbc9b32a4e0686b243d4f64076a626af</guid>

					<description>This post walks you through three steps: starting a session and generating the Live View URL, rendering the stream in your React application, and wiring up an AI agent that drives the browser while your users watch. At the end, you will have a working sample application you can clone and run.</description>
										<content:encoded>&lt;p&gt;When you build AI-powered applications, your users must understand and trust AI agents that navigate websites and interact with web content on their behalf. When an agent interacts with web content autonomously, your users require visibility into those actions to maintain confidence and control, which they don’t currently have.&lt;/p&gt; 
&lt;p&gt;The &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/browser-tool.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore Browser&lt;/a&gt; &lt;code&gt;BrowserLiveView&lt;/code&gt; component addresses this challenge by providing a real-time video feed of the agent’s browsing session directly within your React application. This component, part of the &lt;a href="https://github.com/aws/bedrock-agentcore-sdk-typescript/tree/main" target="_blank" rel="noopener noreferrer"&gt;Bedrock AgentCore TypeScript SDK&lt;/a&gt;, streamlines the integration by embedding a live browser stream with three lines of JavaScript XML (JSX).&lt;/p&gt; 
&lt;p&gt;The &lt;code&gt;BrowserLiveView&lt;/code&gt; component uses the &lt;a href="https://aws.amazon.com/hpc/dcv/" target="_blank" rel="noopener noreferrer"&gt;Amazon DCV&lt;/a&gt; protocol to render the browser session, creating transparency into agent actions. Implementation requires only a presigned URL from your server, without requiring you to build streaming infrastructure.&lt;/p&gt; 
&lt;p&gt;This post walks you through three steps: starting a session and generating the Live View URL, rendering the stream in your React application, and wiring up an AI agent that drives the browser while your users watch. At the end, you will have a working sample application you can clone and run.&lt;/p&gt; 
&lt;h2&gt;Why embed Live View in your application&lt;/h2&gt; 
&lt;p&gt;Embedding Live View inside your own application unlocks additional value for your users at scale.&lt;/p&gt; 
&lt;p&gt;With an embedded Live View, your users follow every navigation, form submission, and search query as the agent performs it. They get immediate visual confirmation that the agent is on the right page, interacting with the correct elements, and progressing through the workflow. This real-time feedback loop gives end users direct insight into agent behavior without waiting for the final result.&lt;/p&gt; 
&lt;p&gt;Users who delegate browsing tasks to an AI agent are more confident when they can observe the work. Watching the agent fill in a form field by field is more reassuring than receiving a text confirmation. For regulated workflows, visual evidence of agent actions can support audit requirements.&lt;/p&gt; 
&lt;p&gt;In workflows that require human supervision, like handling customer accounts and processing sensitive data, a supervisor can use the embedded Live View to watch the agent in real time and intervene if needed, without leaving your application.&lt;/p&gt; 
&lt;p&gt;Organizations also gain audit trail support through visual evidence of agent actions, which proves valuable for compliance requirements and troubleshooting scenarios. Combined with session recordings to Amazon Simple Storage Service (Amazon S3) and console-based session replay, you get both real-time observation and post-hoc review.&lt;/p&gt; 
&lt;h2&gt;How it works&lt;/h2&gt; 
&lt;p&gt;The integration has three components.&lt;/p&gt; 
&lt;p&gt;The user’s web browser runs a React application containing the &lt;code&gt;BrowserLiveView&lt;/code&gt; component, which receives a SigV4-presigned URL and establishes a persistent WebSocket connection to receive the DCV video stream from a remote browser session. The React application handles video rendering and user interface presentation while maintaining the WebSocket connection for continuous streaming.&lt;/p&gt; 
&lt;p&gt;The application server functions as an AI agent within the Amazon Bedrock session lifecycle, orchestrating the connection between client browsers and cloud-hosted browser sessions. It starts sessions using the Amazon Bedrock AgentCore API and generates SigV4-presigned URLs that grant secure, time-limited access to the Live View stream. This layer handles session management, authentication, and stream distribution.&lt;/p&gt; 
&lt;p&gt;AWS Cloud hosts Amazon Bedrock AgentCore Browser and Amazon Bedrock services that provide the underlying browser automation and streaming capabilities. Amazon Bedrock AgentCore hosts the isolated cloud browser sessions within AWS Cloud and provides both the automation endpoint (Playwright CDP) and the Live View streaming endpoint (DCV).&lt;/p&gt; 
&lt;p&gt;The key efficiency advantage with this architecture is that the DCV Live View stream flows directly from Amazon Bedrock AgentCore to the user’s browser. It doesn’t pass through your application server. Your server generates the URL and runs the agent, but the video stream is a direct WebSocket connection from AWS to the client. This helps minimize latency and reduce infrastructure requirements.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/architecture_liveview.png"&gt;&lt;img loading="lazy" class="alignnone size-large wp-image-127767" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/architecture_liveview-1024x566.png" alt="" width="1024" height="566"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Figure 1: &lt;/strong&gt;Solution architecture showing the data flow between three components. The numbered arrows in the diagram represent the following data flows:&lt;br&gt; &lt;strong&gt;Arrow 1 (gray, solid): &lt;/strong&gt;The client sends prompts and polls status from the Application Server using REST.&lt;br&gt; &lt;strong&gt;Arrow 2 (orange, solid): &lt;/strong&gt;The Application Server calls the Amazon Bedrock Converse API for AI model reasoning.&lt;br&gt; &lt;strong&gt;Arrow 3 (blue, solid): &lt;/strong&gt;The Application Server runs browser tools against Amazon Bedrock AgentCore Browser using &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/browser-quickstart-playwright.html" target="_blank" rel="noopener noreferrer"&gt;Playwright Chrome DevTools Protocol&lt;/a&gt; (CDP).&lt;br&gt; &lt;strong&gt;Arrow 4 (red, dashed): &lt;/strong&gt;The DCV Live View stream flows directly from Amazon Bedrock AgentCore to the User Browser, bypassing the Application Server.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;Before you begin, verify that you have the following:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://nodejs.org/" target="_blank" rel="noopener noreferrer"&gt;Node.js 20 or later&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;An AWS account &lt;/strong&gt;in a &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agentcore-regions.html" target="_blank" rel="noopener noreferrer"&gt;supported AWS Region&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;AWS credentials with &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/browser-quickstart.html#browser-prerequisites" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore Browser permissions&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Access to an AI model to drive the agent (this post uses the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-call.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Converse API&lt;/a&gt; with Anthropic Claude, but Live View is model-agnostic and you can use a model provider or agent framework of your choice)&lt;/li&gt; 
&lt;/ul&gt; 
&lt;blockquote&gt;
 &lt;p&gt;&lt;strong&gt;Important: &lt;/strong&gt;Live View (Steps 1 and 2) requires only &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/browser-quickstart.html#browser-prerequisites" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore permissions&lt;/a&gt;. It does not depend on Amazon Bedrock or any specific AI model. The AI agent in Step 3 uses the Amazon Bedrock Converse API, which requires additional Amazon Bedrock permissions, but this is specific to our sample. You can substitute a model provider or agent framework of your choice. Use temporary credentials from &lt;a href="https://docs.aws.amazon.com/singlesignon/latest/userguide/what-is.html" target="_blank" rel="noopener noreferrer"&gt;AWS IAM Identity Center&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html" target="_blank" rel="noopener noreferrer"&gt;AWS Security Token Service (AWS STS)&lt;/a&gt;. Do not use long-lived access keys. Follow the &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html" target="_blank" rel="noopener noreferrer"&gt;principle of least privilege&lt;/a&gt; when configuring &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; permissions.&lt;/p&gt;
&lt;/blockquote&gt; 
&lt;p&gt;Install the &lt;a href="https://github.com/aws/bedrock-agentcore-sdk-typescript" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore TypeScript SDK&lt;/a&gt;:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;npm install bedrock-agentcore&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;For the AI agent in Step 3, you also need the AWS SDK for JavaScript:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;npm install @aws-sdk/client-bedrock-runtime&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The code in this post runs in two environments: the server-side code (Steps 1 and 3) runs in Node.js, and the client-side code (Step 2) runs in a React application bundled with Vite. The sample application at the end of this post packages everything together.&lt;/p&gt; 
&lt;h2&gt;Step-by-step implementation&lt;/h2&gt; 
&lt;h3&gt;1: Start a browser session and generate the Live View URL&lt;/h3&gt; 
&lt;p&gt;On your application server, use the &lt;code&gt;Browser&lt;/code&gt; class to start a session and generate the presigned URL. The API returns a session identifier and a streaming URL, which the server converts into a presigned URL with a defined expiration time (300 seconds by default). The presigned URL carries SigV4 credentials in its query parameters, so no secrets reach the browser. Pass this URL to your frontend through an API endpoint.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;import { Browser } from 'bedrock-agentcore/browser'

const browser = new Browser({
  region: 'us-west-2'
})
await browser.startSession({
  viewport: { width: 1920, height: 1080 }
})

const signedUrl =
  await browser.generateLiveViewUrl()
// Send signedUrl to your frontend via API&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
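&lt;p&gt;Because the presigned URL expires (300 seconds by default), it can help to check validity before handing it to the frontend. The following sketch is not part of the AgentCore SDK; it assumes only the standard SigV4 query parameters (&lt;code&gt;X-Amz-Date&lt;/code&gt;, &lt;code&gt;X-Amz-Expires&lt;/code&gt;) that presigning emits:&lt;/p&gt;

```typescript
// Sketch (not SDK code): check whether a SigV4-presigned URL is still
// within its validity window. Relies on the standard X-Amz-Date
// (YYYYMMDDTHHMMSSZ) and X-Amz-Expires (seconds) query parameters.
function isPresignedUrlValid(url: string, now: Date = new Date()): boolean {
  const params = new URL(url).searchParams
  const amzDate = params.get('X-Amz-Date')         // e.g. 20260409T173328Z
  const expiresParam = params.get('X-Amz-Expires') // e.g. "300"
  if (!amzDate || !expiresParam) return false
  // Parse the compact ISO 8601 timestamp into a UTC epoch value
  const m = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(amzDate)
  if (!m) return false
  const signedAt = Date.UTC(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6])
  return now.getTime() < signedAt + Number(expiresParam) * 1000
}
```

&lt;p&gt;If the check fails, generate a fresh URL with &lt;code&gt;generateLiveViewUrl()&lt;/code&gt; rather than passing a stale one to the component.&lt;/p&gt;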
&lt;h3&gt;2: Render the BrowserLiveView component in your React app&lt;/h3&gt; 
&lt;p&gt;In your React application, import the &lt;code&gt;BrowserLiveView&lt;/code&gt; component from the Bedrock AgentCore TypeScript SDK and render it with the presigned URL. The component handles the WebSocket connection, DCV protocol negotiation, video stream decoding, and frame rendering. It automatically scales to fit its parent container while preserving the aspect ratio. The &lt;code&gt;remoteWidth&lt;/code&gt; and &lt;code&gt;remoteHeight&lt;/code&gt; props must match the viewport that you set in Step 1; mismatched values cause cropping or black bars.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;import { BrowserLiveView }
  from 'bedrock-agentcore/browser/live-view'

&amp;lt;BrowserLiveView
  signedUrl={presignedUrl}
  remoteWidth={1920}
  remoteHeight={1080}
/&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;After adding this component, the Live View begins streaming as soon as the presigned URL is valid and the browser session is active. You should see the remote browser’s desktop appear within the component’s container. If the container remains empty, verify that the presigned URL hasn’t expired and that the browser session is still running.&lt;/p&gt; 
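&lt;p&gt;The cropping and black-bar behavior follows from simple aspect-ratio math. The following sketch (not SDK code) shows how a fixed remote resolution is fitted into a container while preserving the aspect ratio; if the declared remote size doesn’t match the real viewport, the scale factor is computed against the wrong ratio:&lt;/p&gt;

```typescript
// Sketch (not part of the SDK): fit a remote viewport into a container
// while preserving aspect ratio. When remoteWidth/remoteHeight don't
// match the actual session viewport, the scale factor is derived from
// the wrong ratio, which produces cropping or black bars.
interface Size { width: number; height: number }

function fitStream(remote: Size, container: Size): Size {
  // Scale by the tighter dimension so the whole frame stays visible
  const scale = Math.min(
    container.width / remote.width,
    container.height / remote.height,
  )
  return {
    width: Math.round(remote.width * scale),
    height: Math.round(remote.height * scale),
  }
}
```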
&lt;h3&gt;3: Connect an AI agent to drive browser actions&lt;/h3&gt; 
&lt;p&gt;With the Live View streaming, you need something interesting to watch. The following example uses the Amazon Bedrock Converse API, but Live View is model agnostic. You can use an AI model or agent framework of your choice to drive the browser.&lt;/p&gt; 
&lt;p&gt;The code creates a &lt;code&gt;PlaywrightBrowser&lt;/code&gt; client, which starts a new AgentCore Browser session and connects to it using the Playwright Chrome DevTools protocol. This is the same type of cloud browser session as Step 1 but accessed through the Playwright automation interface rather than the Live View interface.&lt;/p&gt; 
&lt;p&gt;The model decides which browser tools to call, including &lt;code&gt;navigate&lt;/code&gt;, &lt;code&gt;click&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;getText&lt;/code&gt;, &lt;code&gt;getHtml&lt;/code&gt;, and &lt;code&gt;pressKey&lt;/code&gt;. Your server runs these tools and feeds the results back to the model for the next iteration.&lt;/p&gt; 
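&lt;p&gt;The Converse API expects each tool in the &lt;code&gt;toolSpec&lt;/code&gt; format with a JSON Schema input definition. A minimal, illustrative definition for the &lt;code&gt;navigate&lt;/code&gt; tool might look like the following (the exact schemas in the sample application may differ):&lt;/p&gt;

```typescript
// Illustrative toolConfig in the Converse toolSpec format; the tool
// names and schemas here are modeled on the sample application and are
// assumptions, not copied from it.
const browserTools = {
  tools: [
    {
      toolSpec: {
        name: 'navigate',
        description: 'Load a URL in the cloud browser session',
        inputSchema: {
          json: {
            type: 'object',
            properties: {
              url: { type: 'string', description: 'Absolute URL to open' },
            },
            required: ['url'],
          },
        },
      },
    },
    // ...click, type, getText, getHtml, and pressKey follow the same shape
  ],
}
```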
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;import { BedrockRuntimeClient, ConverseCommand }
  from '@aws-sdk/client-bedrock-runtime'
import { PlaywrightBrowser }
  from 'bedrock-agentcore/browser/playwright'

const browser = new PlaywrightBrowser({
  region: 'us-west-2'
})
await browser.startSession()

// Define browser tools as JSON Schema
// (navigate, click, type, getText, and more)

while (step &amp;lt; maxSteps) {
  const response = await bedrockClient.send(
    new ConverseCommand({
      modelId: modelId,
      system: [{ text: systemPrompt }],
      messages,
      toolConfig: browserTools,
    })
  )

  if (response.stopReason === 'tool_use') {
    // Run browser tool, add result
    // to conversation, continue loop
  } else {
    break // Final answer from model
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The model is configurable. You can use Anthropic Claude, Amazon Nova, or any other Amazon Bedrock model that supports tool use. Every tool call that the model makes is visible to your user through the Live View. They see the browser navigate, the search box fill in, and the results page load.&lt;/p&gt; 
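&lt;p&gt;The control flow of this loop can be exercised without AWS access by stubbing the Converse client. The following self-contained sketch uses simplified stand-in types, not the SDK’s, to show the &lt;code&gt;stopReason&lt;/code&gt;-driven cycle:&lt;/p&gt;

```typescript
// Self-contained sketch of the tool-use loop with a stubbed client.
// Types and responses are simplified stand-ins, not SDK types.
type StubResponse = {
  stopReason: 'tool_use' | 'end_turn'
  toolUse?: { name: string; input: Record<string, unknown> }
}

// Stub model: requests one navigation, then returns a final answer.
function stubConverse(turn: number): StubResponse {
  return turn === 0
    ? {
        stopReason: 'tool_use',
        toolUse: { name: 'navigate', input: { url: 'https://en.wikipedia.org' } },
      }
    : { stopReason: 'end_turn' }
}

function runAgentLoop(maxSteps = 5): string[] {
  const log: string[] = []
  for (let step = 0; step < maxSteps; step++) {
    const response = stubConverse(step)
    if (response.stopReason === 'tool_use' && response.toolUse) {
      // In the real loop: run the browser tool via PlaywrightBrowser,
      // append the tool result to the message history, and call Converse again.
      log.push(`tool:${response.toolUse.name}`)
    } else {
      log.push('final-answer')
      break // model returned its final answer
    }
  }
  return log
}
```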
&lt;p&gt;&lt;strong&gt;Note: &lt;/strong&gt;The TypeScript SDK also includes a &lt;a href="https://github.com/aws/bedrock-agentcore-sdk-typescript/tree/main/src/tools/browser/integrations/vercel-ai" target="_blank" rel="noopener noreferrer"&gt;Vercel AI SDK integration&lt;/a&gt; (&lt;code&gt;BrowserTools&lt;/code&gt;) that wraps these browser operations as framework-native tools.&lt;/p&gt; 
&lt;h2&gt;Try it using the sample application&lt;/h2&gt; 
&lt;p&gt;We built a complete sample application on GitHub that puts Steps 1–3 together. The sample includes a React dashboard with the embedded Live View, an activity log showing agent reasoning and actions, and a Fastify server running the AI agent. The agent navigates to Wikipedia, searches for a topic, reads the page content, and summarizes what it finds while you watch every step. You can download it from the &lt;a href="https://github.com/awslabs/bedrock-agentcore-samples-typescript/tree/main/use-cases/browser-live-view-agent" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub repository&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/browser_live_view.gif"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127779" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/browser_live_view.gif" alt="" width="960" height="757"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Figure 2: &lt;/strong&gt;The sample application mid-run. The left panel shows the &lt;code&gt;BrowserLiveView&lt;/code&gt; component streaming a Wikipedia page that the agent has navigated to. The right panel shows the activity log with timestamped tool calls (&lt;code&gt;navigate&lt;/code&gt;, &lt;code&gt;getText&lt;/code&gt;, &lt;code&gt;click&lt;/code&gt;). At the bottom, the prompt input field and Launch Agent button are visible.&lt;/p&gt; 
&lt;h3 id="_To_clone_and_run"&gt;To clone and run the sample application&lt;/h3&gt; 
&lt;p&gt;Complete the following steps to clone and run the sample application.&lt;/p&gt; 
&lt;ol start="1"&gt; 
 &lt;li&gt;Clone the repository and navigate to the sample folder.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;git clone https://github.com/awslabs/bedrock-agentcore-samples-typescript.git
cd bedrock-agentcore-samples-typescript
cd use-cases/browser-live-view-agent&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Install the dependencies.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;npm install&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Export your AWS credentials.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;export AWS_ACCESS_KEY_ID=&amp;lt;your-access-key&amp;gt;
export AWS_SECRET_ACCESS_KEY=&amp;lt;your-secret-key&amp;gt;
export AWS_SESSION_TOKEN=&amp;lt;your-session-token&amp;gt;
export AWS_REGION=us-west-2&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;blockquote&gt;
 &lt;p&gt;&lt;strong&gt;Important: &lt;/strong&gt;Use temporary credentials. Do not commit credentials to source control.&lt;/p&gt;
&lt;/blockquote&gt; 
&lt;ol start="4"&gt; 
 &lt;li&gt;Start the application. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-bash"&gt;npm run dev&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;Open &lt;code&gt;http://localhost:5173&lt;/code&gt;, enter a prompt, and choose &lt;strong&gt;Launch Agent&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Bundler configuration&lt;/h2&gt; 
&lt;p&gt;The &lt;code&gt;BrowserLiveView&lt;/code&gt; component uses the &lt;a href="https://docs.aws.amazon.com/dcv/latest/websdkguide/what-is.html" target="_blank" rel="noopener noreferrer"&gt;Amazon DCV Web Client SDK&lt;/a&gt;, which ships vendored files inside the &lt;code&gt;bedrock-agentcore&lt;/code&gt; npm package. You don’t need to download or install DCV separately. Your Vite configuration needs three additions:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;resolve.alias&lt;/code&gt; points the &lt;code&gt;dcv&lt;/code&gt; and &lt;code&gt;dcv-ui&lt;/code&gt; bare specifiers to the vendored SDK files.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;resolve.dedupe&lt;/code&gt; ensures that React and shared dependencies resolve from your &lt;code&gt;node_modules&lt;/code&gt;, not from the vendored path.&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;viteStaticCopy&lt;/code&gt; copies DCV runtime files (workers, WASM decoders) to your build output.&lt;/li&gt; 
&lt;/ul&gt; 
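&lt;p&gt;Taken together, the three additions might look like the following sketch. The vendored paths are placeholders, not the real package layout; copy the exact values from the sample’s &lt;code&gt;vite.config.ts&lt;/code&gt;:&lt;/p&gt;

```typescript
// Sketch of the three Vite additions. The <...> path segments are
// placeholders; take the real values from the sample's vite.config.ts.
import path from 'node:path'
import { defineConfig } from 'vite'
import { viteStaticCopy } from 'vite-plugin-static-copy'

export default defineConfig({
  resolve: {
    // Point the bare `dcv` / `dcv-ui` specifiers at the vendored SDK files
    alias: {
      dcv: path.resolve('node_modules/bedrock-agentcore/<vendored-dcv-path>'),
      'dcv-ui': path.resolve('node_modules/bedrock-agentcore/<vendored-dcv-ui-path>'),
    },
    // Keep React and shared dependencies resolving from your node_modules
    dedupe: ['react', 'react-dom'],
  },
  plugins: [
    // Copy DCV runtime files (workers, WASM decoders) to the build output
    viteStaticCopy({
      targets: [
        { src: 'node_modules/bedrock-agentcore/<dcv-runtime-files>', dest: 'dcv' },
      ],
    }),
  ],
})
```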
&lt;p&gt;The sample application’s &lt;code&gt;vite.config.ts&lt;/code&gt; has the complete configuration ready to use. For more details on the &lt;code&gt;BrowserLiveView&lt;/code&gt; component, see the &lt;a href="https://github.com/aws/bedrock-agentcore-sdk-typescript/tree/main/src/tools/browser/live-view" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;live-view source directory&lt;/strong&gt;&lt;/a&gt; in the TypeScript SDK.&lt;/p&gt; 
&lt;h2&gt;Clean up resources&lt;/h2&gt; 
&lt;p&gt;To avoid incurring charges, stop the browser session and shut down the application when you’re done:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;In the application UI, choose &lt;strong&gt;Stop Session&lt;/strong&gt; to end the Amazon Bedrock AgentCore Browser session.&lt;/li&gt; 
 &lt;li&gt;In your terminal, press Ctrl+C to stop the development servers.&lt;/li&gt; 
 &lt;li&gt;If you created any IAM roles or policies specifically for this demo, &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_manage_delete.html" target="_blank" rel="noopener noreferrer"&gt;delete them from the IAM console&lt;/a&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Amazon Bedrock AgentCore Browser sessions incur charges while active. For pricing details, refer to the &lt;a href="https://aws.amazon.com/bedrock/agentcore/pricing/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore pricing page&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Next steps&lt;/h2&gt; 
&lt;p&gt;Now that you have a working Live View integration, here are some things to explore.&lt;/p&gt; 
&lt;p&gt;To get started, clone the &lt;a href="https://github.com/awslabs/bedrock-agentcore-samples-typescript/tree/main/use-cases/browser-live-view-agent" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;sample application&lt;/strong&gt;&lt;/a&gt;, fill in your AWS credentials, and run &lt;code&gt;npm run dev&lt;/code&gt; to see the full demo in action. For instructions, refer to the &lt;a href="#_To_clone_and_run"&gt;&lt;strong&gt;To clone and run the sample application&lt;/strong&gt;&lt;/a&gt; section in this post.&lt;/p&gt; 
&lt;p&gt;The sample application defaults to Anthropic Claude, but you can switch to Amazon Nova or another Amazon Bedrock model that supports tool use by setting the &lt;code&gt;BEDROCK_MODEL_ID&lt;/code&gt; environment variable. For a list of available models and their tool use capabilities, refer to the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-supported-models-features.html" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon Bedrock model documentation&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;The React dashboard in the sample application is a starting point for your own implementation. You can adapt the layout to match your design system, integrate the Live View into an existing application, or add controls that let users intervene mid-workflow. For guidance on building React applications with the AgentCore SDK, refer to the &lt;a href="https://github.com/aws/bedrock-agentcore-sdk-typescript/blob/main/docs/BROWSER_LIVE_VIEW.md" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Bedrock AgentCore TypeScript SDK documentation&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;The &lt;code&gt;BrowserLiveView&lt;/code&gt; component supports multiple instances on the same page, each streaming a different browser session. This capability is useful for monitoring dashboards. The component’s source code, including scaling logic and DCV authentication flow, is available in the &lt;a href="https://github.com/aws/bedrock-agentcore-sdk-typescript/tree/main/src/tools/browser/live-view" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;live-view source directory&lt;/strong&gt;&lt;/a&gt; in the TypeScript SDK.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, you learned how to use the &lt;code&gt;BrowserLiveView&lt;/code&gt; component to embed a Live View of an Amazon Bedrock AgentCore Browser session into your React application. The three-step implementation, combined with an architecture that streams video directly from AWS to client browsers, makes live agent visualization accessible without specialized streaming expertise.&lt;/p&gt; 
&lt;p&gt;For a deeper look at Amazon Bedrock AgentCore Browser capabilities, refer to the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/browser-tool.html" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon Bedrock AgentCore Browser documentation&lt;/strong&gt;&lt;/a&gt;. If you have feedback or questions, open an issue in the &lt;a href="https://github.com/aws/bedrock-agentcore-sdk-typescript/issues" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub repository&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt; 
&lt;blockquote&gt;
 &lt;p&gt;&lt;strong&gt;Important: &lt;/strong&gt;This sample application is intended for local development and demonstration. For production use, add authentication to your API endpoints, enable HTTPS, restrict CORS origins, implement rate limiting, and follow the AWS Well-Architected Framework security pillar.&lt;/p&gt;
&lt;/blockquote&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone wp-image-90341" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2024/10/23/sundar.jpeg" alt="" width="82" height="123"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;&lt;a href="https://www.linkedin.com/in/sundar-raghavan-4838a526/" target="_blank" rel="noopener"&gt;Sundar Raghavan&lt;/a&gt;&lt;/h3&gt; 
  &lt;p&gt;Sundar Raghavan is a Senior Solutions Architect at AWS on the Agentic AI Foundation team. He shaped the developer experience for Amazon Bedrock AgentCore, contributing to the SDK, CLI, and starter toolkit, and now focuses on integrations with AI agent frameworks. Previously, Sundar worked as a Generative AI Specialist, helping customers design AI applications on Amazon Bedrock. In his free time, he loves exploring new places, sampling local eateries, and embracing the great outdoors.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone wp-image-127746" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/radheshyam-200x300.jpeg" alt="" width="81" height="122"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;&lt;a href="https://www.linkedin.com/in/radhe-shyam-75626833" target="_blank" rel="noopener"&gt;Radhe Shyam&lt;/a&gt;&lt;/h3&gt; 
  &lt;p&gt;Radhe Shyam is a Senior Front End Engineer on the Agentic AI Foundation team at AWS, where he builds the user experiences for Amazon Bedrock AgentCore, including browser session replay and live view tooling for agentic workflows. With nearly seven years at Amazon spanning domains from Amazon SageMaker Canvas to Prime Video, he is passionate about building performant, accessible front-end systems that bring complex AI and ML capabilities to a broader audience of builders.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone wp-image-127770" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/SauravDas-683x1024.jpeg" alt="" width="81" height="121"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;&lt;a href="https://www.linkedin.com/in/the-saurav-das/" target="_blank" rel="noopener"&gt;Saurav Das&lt;/a&gt;&lt;/h3&gt; 
  &lt;p&gt;Saurav Das is part of the Amazon Bedrock AgentCore Product Management team. He has more than 15 years of experience working with cloud, data, and infrastructure technologies. He has a deep interest in solving customer challenges centered around data and AI infrastructure.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Introducing stateful MCP client capabilities on Amazon Bedrock AgentCore Runtime</title>
		<link>https://aws.amazon.com/blogs/machine-learning/introducing-stateful-mcp-client-capabilities-on-amazon-bedrock-agentcore-runtime/</link>
					
		
		<dc:creator><![CDATA[Evandro Franco]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 14:47:57 +0000</pubDate>
				<category><![CDATA[Amazon Bedrock AgentCore]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">f3d7c75c035fc0f9ece6f077ddf091fd61c25664</guid>

					<description>In this post, you will learn how to build stateful MCP servers that request user input during execution, invoke LLM sampling for dynamic content generation, and stream progress updates for long-running tasks. You will see code examples for each capability and deploy a working stateful MCP server to Amazon Bedrock AgentCore Runtime.</description>
										<content:encoded>&lt;p&gt;Stateful MCP client capabilities on &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/mcp-stateful-features.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore Runtime&lt;/a&gt; now enable interactive, multi-turn agent workflows that were previously impossible with stateless implementations. Developers building AI agents often struggle when their workflows must pause mid-execution to ask users for clarification, request large language model (LLM)-generated content, or provide real-time progress updates during long-running operations, stateless MCP servers can’t handle these scenarios. This solves these limitations by introducing three client capabilities from the MCP specification:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Elicitation (request user input mid-execution)&lt;/li&gt; 
 &lt;li&gt;Sampling (request LLM-generated content from the client)&lt;/li&gt; 
 &lt;li&gt;Progress notification (stream real-time updates)&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;These capabilities transform one-way tool execution into bidirectional conversations between your MCP server and clients.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" target="_blank" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt; is an open standard defining how LLM applications connect with external tools and data sources. The specification defines server capabilities (tools, prompts, and resources that servers expose) and client capabilities (features clients offer back to servers). While our previous release focused on hosting stateless MCP servers on AgentCore Runtime, this new capability completes the bidirectional protocol implementation. Clients connecting to AgentCore-hosted MCP servers can now respond to server-initiated requests. In this post, you will learn how to build stateful MCP servers that request user input during execution, invoke LLM sampling for dynamic content generation, and stream progress updates for long-running tasks. You will see code examples for each capability and deploy a working stateful MCP server to Amazon Bedrock AgentCore Runtime.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;From stateless to stateful MCP&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;The original MCP server support on AgentCore used stateless mode: each incoming HTTP request was independent, with no shared context between calls. This model is straightforward to deploy and reason about, and it works well for tool servers that receive inputs and return outputs. However, it has a fundamental constraint. The server can’t maintain a conversation thread across requests, ask the user for clarification in the middle of a tool call, or report progress back to the client as work happens.&lt;/p&gt; 
&lt;p&gt;Stateful mode removes that constraint. When you run your MCP server with &lt;code&gt;stateless_http=False&lt;/code&gt;, AgentCore Runtime provisions a dedicated microVM for each user session. The microVM persists for the session’s lifetime (up to 8 hours, or 15 minutes of inactivity, per the &lt;code&gt;idleRuntimeSessionTimeout&lt;/code&gt;&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-lifecycle-settings.html#configuration-attributes" target="_blank" rel="noopener noreferrer"&gt;setting&lt;/a&gt;), with CPU, memory, and filesystem isolation between sessions. The protocol maintains continuity through a &lt;code&gt;Mcp-Session-Id&lt;/code&gt; header: the server returns this identifier during the initialize handshake, and the client includes it in every subsequent request to route back to the same session.&lt;/p&gt; 
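&lt;p&gt;To make the session-routing contract concrete, the following minimal sketch models it in plain Python. This is an illustration of the protocol behavior only, not AgentCore’s actual implementation; the class and method names are made up:&lt;/p&gt; 

```python
import uuid

class SessionRouter:
    """Toy model of Mcp-Session-Id routing, for illustration only."""

    def __init__(self):
        # session_id maps to that session's private state
        # (a dedicated microVM, in the real service).
        self._sessions = {}

    def initialize(self):
        # During the initialize handshake the server mints a session ID
        # and returns it in the Mcp-Session-Id response header.
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {}
        return session_id

    def route(self, session_id):
        # Every subsequent request carries the header; unknown or expired
        # IDs get a 404, telling the client to re-initialize.
        state = self._sessions.get(session_id)
        if state is None:
            return 404, None
        return 200, state

    def expire(self, session_id):
        # Stands in for the 8-hour cap or the 15-minute idle timeout.
        self._sessions.pop(session_id, None)
```

&lt;p&gt;A client that receives the 404 simply runs the initialize handshake again to obtain a fresh session ID and a fresh session.&lt;/p&gt; 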
&lt;p&gt;The following table summarizes the key differences:&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Stateless mode&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Stateful mode&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;stateless_http setting&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;TRUE&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;FALSE&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Session isolation&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Dedicated microVM per session&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Dedicated microVM per session&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Session lifetime&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 8 hours; 15-min idle timeout&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Up to 8 hours; 15-min idle timeout&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Client capabilities&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Not supported&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Elicitation, sampling, progress notifications&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Recommended for&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Simple tool serving&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Interactive, multi-turn workflows&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;When a session expires or the server is restarted, subsequent requests with the old session ID return a 404. At that point, clients must re-initialize the connection to obtain a new session ID and start a fresh session. The configuration change to enable stateful mode is a single flag in your server startup:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;mcp.run(
    transport="streamable-http",
    host="0.0.0.0",
    port=8000,
    stateless_http=False  # Enable stateful mode
)&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Beyond this flag, the three client capabilities become available automatically once the MCP client declares support for them during the initialization handshake.&lt;/p&gt; 
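&lt;p&gt;For reference, that declaration happens in the client’s &lt;code&gt;initialize&lt;/code&gt; request. The following abbreviated sketch shows the shape of a request from a client that supports elicitation and sampling; the field values are illustrative, so consult the MCP specification for the authoritative schema:&lt;/p&gt; 

```python
# Abbreviated sketch of an MCP initialize request; values are illustrative.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-11-25",
        # An empty object means "supported"; omitting a key means the
        # server must not issue requests that rely on that capability.
        "capabilities": {
            "elicitation": {},
            "sampling": {},
        },
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}
```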
&lt;h2&gt;&lt;strong&gt;The three new client capabilities&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Stateful mode brings three client capabilities from the MCP specification. Each addresses a different interaction pattern that agents encounter in production workflows.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Elicitation&lt;/strong&gt;&amp;nbsp;allows a server to pause execution and request structured input from the user through the client. The tool can ask targeted questions at the right moment in its workflow, gathering a preference, confirming a decision, or collecting a value that depends on earlier results. The server sends an&amp;nbsp;&lt;code&gt;elicitation/create&lt;/code&gt;&amp;nbsp;request with a message and an optional JSON schema describing the expected response structure. The client renders an appropriate input interface, and the user can accept (providing the data), decline, or cancel.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Sampling&lt;/strong&gt; allows a server to request an LLM-generated completion from the client through &lt;code&gt;sampling/createMessage&lt;/code&gt;. This is the mechanism that makes it possible for tool logic on the server to use language model capabilities without holding its own model credentials. The server provides a prompt and optional model preferences; the client forwards the request to its connected LLM and returns the generated response. Practical uses include generating personalized summaries, creating natural-language explanations of structured data, or producing recommendations based on earlier conversation context.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Progress notifications&lt;/strong&gt;&amp;nbsp;allow a server to report incremental progress during long-running operations. Using&amp;nbsp;&lt;code&gt;ctx.report_progress(progress, total)&lt;/code&gt;, the server emits updates that clients can display as a progress bar or status indicator. For operations that span multiple steps, for example, searching across data sources, this keeps users informed rather than watching a blank screen.&lt;/p&gt; 
&lt;p&gt;All three capabilities are opt-in at the client level: a client declares which capabilities it supports during initialization, and the server must only use capabilities the client has advertised.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Elicitation: server-initiated user input&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Elicitation is the mechanism by which an MCP server pauses mid-execution and asks the client to collect specific information from the user. The server sends an&amp;nbsp;&lt;code&gt;elicitation/create&lt;/code&gt;&amp;nbsp;JSON-RPC request containing a human-readable message and a&amp;nbsp;&lt;code&gt;requestedSchema&lt;/code&gt;&amp;nbsp;that describes the expected response. The client presents this as a form or prompt, and the user’s response (or explicit decline) is returned to the server so execution can continue. The MCP specification supports two elicitation modes:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Form mode&lt;/strong&gt;: structured data collection directly through the MCP client. Suitable for preferences, configuration inputs, and confirmations that don’t involve sensitive data.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;URL mode&lt;/strong&gt;: directs the user to an external URL for interactions that must not pass through the MCP client, such as OAuth flows, payment processing, or credential entry.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The response uses a three-action model: &lt;code&gt;accept&lt;/code&gt; (user provided data), &lt;code&gt;decline&lt;/code&gt; (user explicitly rejected the request), or &lt;code&gt;cancel&lt;/code&gt; (user dismissed without choosing). Servers should handle each case appropriately.&lt;/p&gt; 
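&lt;p&gt;Before looking at the full tool, it helps to see the three-action model in isolation. The following sketch uses simplified stand-in types (FastMCP’s real result classes differ in detail) to show one reasonable handling policy, in particular that &lt;code&gt;decline&lt;/code&gt; is a final answer while &lt;code&gt;cancel&lt;/code&gt; can be retried later:&lt;/p&gt; 

```python
from dataclasses import dataclass, field

# Simplified stand-ins for MCP elicitation results; FastMCP's real types
# (AcceptedElicitation and friends) differ in detail.
@dataclass
class ElicitResult:
    action: str                  # "accept", "decline", or "cancel"
    data: dict = field(default_factory=dict)

def handle(result):
    """One reasonable policy for the three response actions."""
    if result.action == "accept":
        return f"proceeding with {result.data}"
    if result.action == "decline":
        # An explicit "no": respect it and don't re-prompt.
        return "aborted: user declined"
    # "cancel": dismissed without deciding; safe to offer again later.
    return "paused: user cancelled"
```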
&lt;p&gt;&lt;strong&gt;Server&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The&amp;nbsp;&lt;code&gt;add_expense_interactive&lt;/code&gt;&amp;nbsp;tool walks a user through four sequential questions before writing to Amazon DynamoDB. Each step defines its expected input as a separate Pydantic model, because the form mode schema must be a flat object. You could collect all four fields in a single model with four properties, but splitting them here gives the user one focused question at a time, which is the interactive pattern elicitation is designed for.&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;&lt;strong&gt;agents/mcp_client_features.py&lt;/strong&gt;&lt;/code&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import os
from pydantic import BaseModel
from fastmcp import FastMCP, Context
from fastmcp.server.elicitation import AcceptedElicitation
from dynamo_utils import FinanceDB

mcp = FastMCP(name='ElicitationMCP')

_region = os.environ.get('AWS_REGION') or os.environ.get('AWS_DEFAULT_REGION') or 'us-east-1'
db = FinanceDB(region_name=_region)

class AmountInput(BaseModel):
    amount: float

class DescriptionInput(BaseModel):
    description: str

class CategoryInput(BaseModel):
    category: str  # one of: food, transport, bills, entertainment, other

class ConfirmInput(BaseModel):
    confirm: str  # Yes or No

@mcp.tool()
async def add_expense_interactive(user_alias: str, ctx: Context) -&amp;gt; str:
    """Interactively add a new expense using elicitation.

    Args:
        user_alias: User identifier
    """
    # Step 1: Ask for the amount
    result = await ctx.elicit('How much did you spend?', AmountInput)
    if not isinstance(result, AcceptedElicitation):
        return 'Expense entry cancelled.'
    amount = result.data.amount

    # Step 2: Ask for a description
    result = await ctx.elicit('What was it for?', DescriptionInput)
    if not isinstance(result, AcceptedElicitation):
        return 'Expense entry cancelled.'
    description = result.data.description

    # Step 3: Select a category
    result = await ctx.elicit(
        'Select a category (food, transport, bills, entertainment, other):',
        CategoryInput
    )
    if not isinstance(result, AcceptedElicitation):
        return 'Expense entry cancelled.'
    category = result.data.category

    # Step 4: Confirm before saving
    confirm_msg = (
        f'Confirm: add expense of ${amount:.2f} for {description}'
        f' (category: {category})? Reply Yes or No'
    )
    result = await ctx.elicit(confirm_msg, ConfirmInput)
    if not isinstance(result, AcceptedElicitation) or result.data.confirm != 'Yes':
        return 'Expense entry cancelled.'

    return db.add_transaction(user_alias, 'expense', -abs(amount), description, category)

if __name__ == '__main__':
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=8000,
        stateless_http=False
    )&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Each&amp;nbsp;&lt;code&gt;await ctx.elicit()&lt;/code&gt;&amp;nbsp;suspends the tool and sends an&amp;nbsp;&lt;code&gt;elicitation/create&lt;/code&gt;&amp;nbsp;request over the active session. The&amp;nbsp;&lt;code&gt;isinstance(result, AcceptedElicitation)&lt;/code&gt;&amp;nbsp;check handles&amp;nbsp;&lt;code&gt;decline&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;cancel&lt;/code&gt;&amp;nbsp;uniformly at every step.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Client&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Registering an&amp;nbsp;&lt;code&gt;elicitation_handler&lt;/code&gt;&amp;nbsp;on&amp;nbsp;&lt;code&gt;fastmcp.Client&lt;/code&gt;&amp;nbsp;is both how the handler is wired in and how the client advertises elicitation support to the server during initialization.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import asyncio
from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

# Pre-loaded responses simulate the user answering each question in sequence
_responses = iter([
    {'amount': 45.50},
    {'description': 'Lunch at the office'},
    {'category': 'food'},
    {'confirm': 'Yes'},
])

async def elicit_handler(message, response_type, params, context):
    # In production: render a form and return the user's input
    response = next(_responses)
    print(f'  Server asks: {message}')
    print(f'  Responding:  {response}\n')
    return response

transport = StreamableHttpTransport(url=mcp_url, headers=headers)

async with Client(transport, elicitation_handler=elicit_handler) as client:
    await asyncio.sleep(2)  # allow session initialization
    result = await client.call_tool('add_expense_interactive', {'user_alias': 'me'})

print(result.content[0].text)
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Running this against the deployed server:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;Server asks: How much did you spend?
Responding:&amp;nbsp; {'amount': 45.5}

Server asks: What was it for?
Responding:&amp;nbsp; {'description': 'Lunch at the office'}

Server asks: Select a category (food, transport, bills, entertainment, other):
Responding:&amp;nbsp; {'category': 'food'}

Server asks: Confirm: add expense of $45.50 for Lunch at the office (category: food)? Reply Yes or No
Responding:&amp;nbsp; {'confirm': 'Yes'}

Expense of $45.50 added for me&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The complete working example, including DynamoDB setup and AgentCore deployment, is available in the&amp;nbsp;&lt;a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples/tree/main/01-tutorials/01-AgentCore-runtime/08-mcp-e2e/02-client-e2e" target="_blank" rel="noopener noreferrer"&gt;GitHub sample repository&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Use elicitation when your tool needs information that depends on earlier results, is better collected interactively than upfront, or varies across users in ways that cannot be parameterized in advance. A travel booking tool that first searches destinations and then asks the user to choose among them is a natural fit. A financial workflow that confirms a transaction amount before submitting is another. Elicitation isn’t appropriate for sensitive inputs like passwords or API keys; use URL mode or a secure out-of-band channel for those.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Sampling: server-initiated LLM generation&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Sampling is the mechanism by which an MCP server requests an LLM completion from the client. The server sends a&amp;nbsp;&lt;code&gt;sampling/createMessage&lt;/code&gt;&amp;nbsp;request containing a list of conversation messages, a system prompt, and optional model preferences. The client forwards the request to its connected language model (subject to user approval) and returns the generated response. The server receives a structured result containing the generated text, the model used, and the stop reason.&lt;/p&gt; 
&lt;p&gt;This capability inverts the typical flow: instead of the client asking the server for tool results, the server asks the client for model output. The benefit is that the server doesn’t need API keys or a direct model integration. The client retains full control over which model is used, and the MCP specification calls for a human-in-the-loop step where users can review and approve sampling requests before they are forwarded.&lt;/p&gt; 
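&lt;p&gt;That human-in-the-loop step can be pictured as a small gate in the client. The callables below are hypothetical stand-ins for a real client’s UI prompt and model integration, not fastmcp APIs:&lt;/p&gt; 

```python
def forward_with_approval(request, ask_user, invoke_llm):
    """Sketch of the human-in-the-loop step the specification calls for.

    All three arguments are hypothetical callables standing in for a real
    client's UI and model integration; this is not a fastmcp API.
    """
    preview = str(request.get("messages", ""))[:200]
    if ask_user(f"Server requests an LLM completion: {preview}"):
        return invoke_llm(request)
    # The user vetoed the request; the server gets an error instead of text.
    return {"error": "sampling request rejected by user"}
```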
&lt;p&gt;Servers can express model preferences using capability priorities (&lt;code&gt;costPriority,&amp;nbsp;speedPriority,&amp;nbsp;intelligencePriority&lt;/code&gt;) and optional model hints. These are advisory; the client makes the final selection based on what models it has access to.&lt;/p&gt; 
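&lt;p&gt;One way to picture the advisory priorities is as weights in a scoring function over the client’s available models. The catalog, model names, and scores below are entirely made up for illustration; a real client may select models however it chooses:&lt;/p&gt; 

```python
# Hypothetical catalog: each model scored 0-1 on cost-efficiency, speed,
# and intelligence. All names and values are made up for illustration.
CATALOG = {
    "fast-small-model": {"cost": 0.9, "speed": 0.9, "intelligence": 0.4},
    "balanced-model":   {"cost": 0.6, "speed": 0.6, "intelligence": 0.7},
    "frontier-model":   {"cost": 0.2, "speed": 0.3, "intelligence": 0.95},
}

def pick_model(cost_priority=0.0, speed_priority=0.0, intelligence_priority=0.0):
    """Rank available models by the server's advisory priorities.

    The client still makes the final call; preferences only bias the choice.
    """
    def score(attrs):
        return (cost_priority * attrs["cost"]
                + speed_priority * attrs["speed"]
                + intelligence_priority * attrs["intelligence"])
    return max(CATALOG, key=lambda name: score(CATALOG[name]))
```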
&lt;p&gt;&lt;strong&gt;Server&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The&amp;nbsp;&lt;code&gt;analyze_spending&lt;/code&gt;&amp;nbsp;tool fetches transactions from DynamoDB, builds a prompt from the structured data, and delegates the analysis to the client’s LLM via&amp;nbsp;&lt;code&gt;ctx.sample()&lt;/code&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;agents/mcp_client_features.py&lt;/strong&gt;&amp;nbsp;(added tool, same file as elicitation)&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;@mcp.tool()
async def analyze_spending(user_alias: str, ctx: Context) -&amp;gt; str:
    """Fetch expenses from DynamoDB and ask the client's LLM to analyze them.

    Args:
        user_alias: User identifier
    """
    transactions = db.get_transactions(user_alias)
    if not transactions:
        return f'No transactions found for {user_alias}.'

    lines = '\n'.join(
        f"- {t['description']} (${abs(float(t['amount'])):.2f}, {t['category']})"
        for t in transactions
    )

    prompt = (
        f'Here are the recent expenses for a user:\n{lines}\n\n'
        f'Please analyze the spending patterns and give 3 concise, '
        f'actionable recommendations to improve their finances. '
        f'Keep the response under 120 words.'
    )

    ai_analysis = 'Analysis unavailable.'
    try:
        response = await ctx.sample(messages=prompt, max_tokens=300)
        if hasattr(response, 'text') and response.text:
            ai_analysis = response.text
    except Exception:
        pass

    return f'Spending Analysis for {user_alias}:\n\n{ai_analysis}'
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The tool calls&amp;nbsp;&lt;code&gt;await ctx.sample()&lt;/code&gt;&amp;nbsp;and suspends. The server sends a&amp;nbsp;&lt;code&gt;sampling/createMessage&lt;/code&gt;&amp;nbsp;request to the client over the open session. When the client returns the LLM response, execution resumes.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Client&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The&amp;nbsp;&lt;code&gt;sampling_handler&lt;/code&gt;&amp;nbsp;receives the prompt from the server and forwards it to a language model. In this example, that’s Claude Haiku on Amazon Bedrock. Registering the handler is also how the client declares sampling support to the server during initialization.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import json
import asyncio
import boto3
from mcp.types import CreateMessageResult, TextContent
from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

MODEL_ID = 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
bedrock = boto3.client('bedrock-runtime', region_name=region)

def _invoke_bedrock(prompt: str, max_tokens: int) -&amp;gt; str:
    body = json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': max_tokens,
        'messages': [{'role': 'user', 'content': prompt}]
    })
    resp = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    return json.loads(resp['body'].read())['content'][0]['text']

async def sampling_handler(messages, params, ctx):
    """Called by fastmcp.Client when the server issues ctx.sample()."""
    prompt = messages if isinstance(messages, str) else ' '.join(
        m.content.text for m in messages if hasattr(m.content, 'text')
    )
    max_tokens = params.maxTokens if params and hasattr(params, 'maxTokens') and params.maxTokens else 300
    text = await asyncio.to_thread(_invoke_bedrock, prompt, max_tokens)
    return CreateMessageResult(
        role='assistant',
        content=TextContent(type='text', text=text),
        model=MODEL_ID,
        stopReason='endTurn'
    )

transport = StreamableHttpTransport(url=mcp_url, headers=headers)

async with Client(transport, sampling_handler=sampling_handler) as client:
    result = await client.call_tool('analyze_spending', {'user_alias': 'me'})

print(result.content[0].text)
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Running this against a user with four seeded expenses:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;Spending Analysis for me:

Total Spending: $266.79

Breakdown:
- Food: $130.80 (49%)
- Bills: $120.00 (45%)
- Entertainment: $15.99 (6%)

3 Actionable Recommendations:

1. Meal prep at home — cook groceries into multiple meals to reduce restaurant
&amp;nbsp;&amp;nbsp; spending and lower food costs by 20-30%.

2. Review entertainment subscriptions — audit all subscriptions and cancel
&amp;nbsp;&amp;nbsp; unused services or share family plans.

3. Reduce energy costs — use programmable thermostats, LED bulbs, and unplug
&amp;nbsp;&amp;nbsp; devices to lower electricity bills by 10-15% monthly.&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Use sampling when your tool must produce natural-language output that benefits from a language model’s capabilities. A tool that has collected a user’s travel preferences and wants to generate a tailored trip itinerary narrative is a good example. Sampling isn’t appropriate for deterministic operations like database queries, calculations, or API calls with well-defined outputs. We recommend that you use tool logic for those.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Progress notifications: real-time operation feedback&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Progress notifications are events that a server sends during long-running operations to keep the client and the user informed about how much work has been completed.&amp;nbsp;&lt;code&gt;await ctx.report_progress(progress, total)&lt;/code&gt;&amp;nbsp;emits a&amp;nbsp;&lt;code&gt;notifications/progress&lt;/code&gt; message and returns immediately. The server doesn’t wait for a response; it’s fire-and-forget in both directions. The client receives the notification asynchronously and can render a progress bar, log a status line, or use it to prevent the user from assuming the connection has stalled. The pattern is to call&amp;nbsp;&lt;code&gt;report_progress&lt;/code&gt;&amp;nbsp;at each logical step of a multi-stage operation, with&amp;nbsp;&lt;code&gt;progress&lt;/code&gt;&amp;nbsp;incrementing toward&amp;nbsp;&lt;code&gt;total&lt;/code&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Server&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The&amp;nbsp;&lt;code&gt;generate_report&lt;/code&gt;&amp;nbsp;tool builds a monthly financial report in five steps, emitting a progress notification at the start of each one.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;agents/mcp_progress_server.py&lt;/strong&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import os
from fastmcp import FastMCP, Context
from dynamo_utils import FinanceDB

mcp = FastMCP(name='Progress-MCP-Server')

_region = os.environ.get('AWS_REGION') or os.environ.get('AWS_DEFAULT_REGION') or 'us-east-1'
db = FinanceDB(region_name=_region)

@mcp.tool()
async def generate_report(user_alias: str, ctx: Context) -&amp;gt; str:
    """Generate a monthly financial report, streaming progress at each stage.

    Args:
        user_alias: User identifier
    """
    total = 5

    # Step 1: Fetch transactions
    await ctx.report_progress(progress=1, total=total)
    transactions = db.get_transactions(user_alias)

    # Step 2: Group by category
    await ctx.report_progress(progress=2, total=total)
    by_category = {}
    for t in transactions:
        cat = t['category']
        by_category[cat] = by_category.get(cat, 0) + abs(float(t['amount']))

    # Step 3: Fetch budgets
    await ctx.report_progress(progress=3, total=total)
    budgets = {b['category']: float(b['monthly_limit']) for b in db.get_budgets(user_alias)}

    # Step 4: Compare spending vs budgets
    await ctx.report_progress(progress=4, total=total)
    lines = []
    for cat, spent in sorted(by_category.items(), key=lambda x: -x[1]):
        limit = budgets.get(cat)
        if limit:
            pct = (spent / limit) * 100
            status = 'OVER' if spent &amp;gt; limit else 'OK'
            lines.append(f'  {cat:&amp;lt;15} ${spent:&amp;gt;8.2f} / ${limit:.2f}  [{pct:.0f}%] {status}')
        else:
            lines.append(f'  {cat:&amp;lt;15} ${spent:&amp;gt;8.2f}  (no budget set)')

    # Step 5: Format and return
    await ctx.report_progress(progress=5, total=total)
    total_spent = sum(by_category.values())
    return (
        f'Monthly Report for {user_alias}\n'
        f'{"=" * 50}\n'
        f'  {"Category":&amp;lt;15} {"Spent":&amp;gt;10}   {"Budget":&amp;gt;8}  Status\n'
        f'{"-" * 50}\n'
        + '\n'.join(lines)
        + f'\n{"-" * 50}\n'
        f'  {"TOTAL":&amp;lt;15} ${total_spent:&amp;gt;8.2f}\n'
    )

if __name__ == '__main__':
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=8000,
        stateless_http=False
    )
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Each&amp;nbsp;&lt;code&gt;await ctx.report_progress()&lt;/code&gt;&amp;nbsp;is fire-and-forget: the notification is sent and execution moves immediately to the next step.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Client&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The&amp;nbsp;&lt;code&gt;progress_handler&lt;/code&gt;&amp;nbsp;receives&amp;nbsp;progress,&amp;nbsp;total, and an optional&amp;nbsp;message&amp;nbsp;each time the server emits a notification. Registering the handler is how the client declares progress support during initialization.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import logging
logging.getLogger('mcp.client.streamable_http').setLevel(logging.ERROR)

from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

async def progress_handler(progress: float, total: float | None, message: str | None):
    pct = int((progress / total) * 100) if total else 0
    filled = pct // 5
    bar = '#' * filled + '-' * (20 - filled)
    print(f'\r  Progress: [{bar}] {pct}% ({int(progress)}/{int(total or 0)})',
          end='', flush=True)
    if total and progress &amp;gt;= total:
        print('  Done!')

transport = StreamableHttpTransport(url=mcp_url, headers=headers)

async with Client(transport, progress_handler=progress_handler) as client:
    result = await client.call_tool('generate_report', {'user_alias': 'me'})

print(result.content[0].text)
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;As the server moves through its five stages, the client renders the bar in place:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;  Progress: [####----------------] 20% (1/5)
&amp;nbsp; Progress: [########------------] 40% (2/5)
&amp;nbsp; Progress: [############--------] 60% (3/5)
&amp;nbsp; Progress: [################----] 80% (4/5)
&amp;nbsp; Progress: [####################] 100% (5/5)&amp;nbsp; Done!&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Use progress notifications for any tool call that takes more than a few seconds and involves discrete, measurable steps. Operations like searching multiple data sources, running a sequence of API calls, processing a batch of records, or running a multi-step booking workflow are all good candidates. A tool that completes in under a second generally does not need progress reporting; the overhead of emitting events is not worthwhile for fast operations.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;In this post, you have been introduced to stateful MCP client capabilities on Amazon Bedrock AgentCore Runtime. We explained the difference between stateless and stateful MCP deployments, walked through elicitation, sampling, and progress notifications with code examples, and showed how to deploy a stateful MCP server into AgentCore Runtime. With these capabilities, you can build MCP servers that engage users in structured conversations, use the client’s LLM for content generation, and provide real-time visibility into long-running operations, all hosted on managed, isolated infrastructure powered by AgentCore Runtime. We encourage you to explore the following resources to get started:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples/tree/main/01-tutorials/01-AgentCore-runtime/08-mcp-e2e" target="_blank" rel="noopener noreferrer"&gt;GitHub sample code&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-how-it-works.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore Runtime documentation&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/mcp-stateful-features.html" target="_blank" rel="noopener noreferrer"&gt;Stateful MCP features documentation&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" target="_blank" rel="noopener noreferrer"&gt;MCP specification 2025-11-25&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/accelerate-development-with-the-amazon-bedrock-agentcore-mcpserver/" target="_blank" rel="noopener noreferrer"&gt;Prior post: Hosting MCP servers on AgentCore Runtime&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;hr&gt; 
&lt;h2&gt;&lt;strong&gt;About the Authors&lt;/strong&gt;&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-127657" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/03/ml20073-image-1.png" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Evandro Franco&lt;/h3&gt; 
  &lt;p&gt;Evandro Franco is a Sr. Data Scientist working on Amazon Web Services. He is part of the Global GTM team that helps AWS customers overcome business challenges related to AI/ML on top of AWS, mainly on Amazon Bedrock AgentCore and Strands Agents. He has more than 18 years of experience working with technology, from software development, infrastructure, serverless, to machine learning. In his free time, Evandro enjoys playing with his son, mainly building some funny Lego bricks.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone wp-image-127658" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/03/ml20073-image-2.png" alt="" width="300" height="300"&gt;
  &lt;/div&gt; 
  &lt;h3&gt;Phelipe Fabres&lt;/h3&gt; 
  &lt;p&gt;Phelipe Fabres is a Sr. Solutions Architect for Generative AI at AWS for Startups. He is part of a global Frontier AI team with a focus on customers that are building foundation models/LLMs/SLMs. He has extensive experience with agentic systems and software-driven AI systems. He has more than 10 years of experience in software development, from monoliths to event-driven architectures, and holds a Ph.D. in Graph Theory. In his free time, Phelipe enjoys playing with his daughter, mainly board games and drawing princesses.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-127659" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/03/ml20073-image-3.jpeg" alt="" width="208" height="208"&gt;
  &lt;/div&gt; 
  &lt;h3&gt;Zihang Huang&lt;/h3&gt; 
  &lt;p&gt;Zihang Huang is a Solutions Architect at AWS and an expert in agentic solutions for connected vehicles, smart home, renewable energy, and industrial IoT. Currently, he focuses on agentic AI solutions with AgentCore, physical AI, IoT, edge computing, and big data. Before AWS, he gained technical experience at Bosch and Alibaba Cloud.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-medium wp-image-127660" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/03/ml20073-image-4-225x300.jpg" alt="" width="225" height="300"&gt;
  &lt;/div&gt; 
  &lt;h3&gt;Sayee Kulkarni&lt;/h3&gt; 
  &lt;p&gt;Sayee Kulkarni is a Software Development Engineer on the AWS Bedrock AgentCore service. Her team is responsible for building and maintaining the AgentCore Runtime platform, a foundational component that enables customers to leverage agentic AI capabilities. She is driven by delivering tangible customer value, and this customer-centric focus motivates her work. Sayee has led key initiatives including MCP Stateful capabilities and other core platform features, enabling customers to build more sophisticated and production-ready AI agents.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Customize Amazon Nova models with Amazon Bedrock fine-tuning</title>
		<link>https://aws.amazon.com/blogs/machine-learning/customize-amazon-nova-models-with-amazon-bedrock-fine-tuning/</link>
					
		
		<dc:creator><![CDATA[Bhavya Sruthi Sode]]></dc:creator>
		<pubDate>Wed, 08 Apr 2026 19:51:50 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Nova]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">b788a9cc21bca716ad316bd590218b47ee4c78f6</guid>

					<description>In this post, we'll walk you through a complete implementation of model fine-tuning in Amazon Bedrock using Amazon Nova models, demonstrating each step through an intent classifier example that achieves superior performance on a domain specific task. Throughout this guide, you'll learn to prepare high-quality training data that drives meaningful model improvements, configure hyperparameters to optimize learning without overfitting, and deploy your fine-tuned model for improved accuracy and reduced latency. We'll show you how to evaluate your results using training metrics and loss curves.</description>
										<content:encoded>&lt;p&gt;Today, we’re sharing how &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; makes it straightforward to customize &lt;a href="https://aws.amazon.com/nova/" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova models&lt;/a&gt; for your specific business needs. As customers scale their AI deployments, they need models that reflect proprietary knowledge and workflows — whether that means maintaining a consistent brand voice in customer communications, handling complex industry-specific workflows or accurately classifying intents in a high-volume airline reservation system. Techniques like prompt engineering and Retrieval-Augmented Generation (RAG) provide the model with additional context to improve task performance, but these techniques do not instill native understanding into the model.&lt;/p&gt; 
&lt;p&gt;Amazon Bedrock supports three customization approaches for Nova models: supervised fine-tuning (SFT), which trains the model on labeled input-output examples; reinforcement fine-tuning (RFT), which uses a reward function to guide learning toward target behaviors; and model distillation, which transfers knowledge from a larger teacher model into a smaller, faster student model. Each technique embeds new knowledge directly into the model weights, rather than supplying it at inference time through prompts or retrieved context. With these approaches, you get faster inference, lower token costs, and higher accuracy on the tasks that matter most to your business. Amazon Bedrock manages the training process automatically, requiring only that you upload your data to &lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service (Amazon S3)&lt;/a&gt; and initiate the job through the AWS Management Console, CLI, or API. Deep machine learning expertise is not required. Nova models support on-demand invocation of customized models in Amazon Bedrock. This means you pay only per-call at the standard rate for the model, instead of needing to purchase more expensive allocated capacity (Provisioned Throughput).&lt;/p&gt; 
&lt;p&gt;In this post, we’ll walk you through a complete implementation of model fine-tuning in Amazon Bedrock using Amazon Nova models, demonstrating each step through an intent classifier example that achieves superior performance on a domain specific task. Throughout this guide, you’ll learn to prepare high-quality training data that drives meaningful model improvements, configure hyperparameters to optimize learning without overfitting, and deploy your fine-tuned model for improved accuracy and reduced latency. We’ll show you how to evaluate your results using training metrics and loss curves.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Understanding fine-tuning and when to use it&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Context-engineering techniques such as prompt engineering or Retrieval-Augmented Generation (RAG) place information into the model’s prompt. These approaches offer significant advantages: they take effect immediately with no training required, allow for dynamic information updates, and work with multiple foundation models without modification. However, these techniques consume context window tokens on every invocation, which can increase cumulative costs and latency over time. More importantly, they do not generalize well. The model is simply reading instructions each time rather than having internalized the knowledge, so it can struggle with novel phrasings, edge cases, or tasks that require reasoning beyond what was explicitly provided in the prompt. Customization techniques, by comparison, incorporate the new knowledge directly into the model by adding an adapter matrix of additional weights and customizing those (“parameter-efficient fine-tuning”, aka “PEFT”). The resulting customized model has acquired new domain-specific skills. Customization allows faster and more efficient small models to reach performance comparable to larger models in the specific training domain.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;When to fine-tune: &lt;/strong&gt;Consider fine-tuning when you have a high-volume, well-defined task where you can assemble quality labeled examples or a reward function. Use cases include training a model to correctly render your company’s logo, embedding brand tone and company policies into the model, or replacing a traditional ML classifier with a small LLM. For example, Amazon Customer Service &lt;a href="https://aws.amazon.com/blogs/machine-learning/transforming-enterprise-operations-four-high-impact-use-cases-with-amazon-nova/" target="_blank" rel="noopener noreferrer"&gt;customized Nova Micro&lt;/a&gt; for specialized customer support to improve accuracy and reduce latency, improving accuracy by 5.4% on domain-specific issues and 7.3% on general issues.&lt;/p&gt; 
&lt;p&gt;Fine-tuned small LLMs like Nova Micro are increasingly replacing traditional ML classifiers for tasks such as intent detection. They deliver the flexibility and world knowledge of an LLM at the speed and cost of a lightweight model. Unlike classifiers, LLMs handle natural variation in phrasing, slang, and context without retraining, and fine-tuning sharpens their accuracy further for the specific task. We demonstrate this with an intent classifier example later in this blog.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;When NOT to fine-tune: &lt;/strong&gt;Fine-tuning requires assembling quality labeled data or a reward function and executing a training job, which involves upfront time and cost, so it may not pay off for low-volume or rapidly evolving tasks where prompt engineering or RAG is sufficient. For high-volume applications, however, this initial investment can reduce per-request inference costs and latency.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Customization approaches&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Amazon Bedrock offers three customization approaches for Nova models:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Supervised fine-tuning (SFT)&lt;/strong&gt; customizes the model to learn patterns from labeled data that you supply. This post demonstrates this technique in action.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Reinforcement fine-tuning (RFT)&lt;/strong&gt; takes a different approach, using training data combined with a reward function, either custom code or an LLM acting as a judge, to guide the learning process.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Model distillation&lt;/strong&gt;, for scenarios requiring knowledge transfer, lets you compress insights from large teacher models into smaller, more efficient student models suitable for resource-constrained devices.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Amazon Bedrock automatically uses parameter efficient fine-tuning (PEFT) techniques appropriate to the model for customizing Nova models. This reduces memory requirements and accelerates training compared to full fine-tuning, while maintaining model quality. Having established when and why to use fine-tuning, let’s explore how Amazon Bedrock simplifies the implementation process, and which Nova models support this customization approach.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Understanding Amazon Nova models on Amazon Bedrock&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Amazon Bedrock fully automates infrastructure provisioning, compute management, and training orchestration. You upload data to S3 and start training with a single API call, without managing clusters and GPUs or configuring distributed training pipelines. It provides clear documentation for data preparation (including format specifications and schema requirements), sensible hyperparameter defaults (such as &lt;code&gt;epochCount&lt;/code&gt;, &lt;code&gt;learningRateMultiplier&lt;/code&gt;), and training visibility through loss curves that help you monitor convergence in real-time.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Nova Models: &lt;/strong&gt;Several of the Nova models support fine-tuning (see the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/custom-model-supported.html" target="_blank" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;). After training completes, you have the option to host the customized Nova model on Amazon Bedrock using cost-effective on-demand inference, at the same low inference price as the non-customized model.&lt;/p&gt; 
&lt;p&gt;Nova 2 Lite, for example, is a fast, cost-effective reasoning model. As a multimodal foundation model, it processes text, images, and video within a 1-million token context window. This context window supports analysis of documents longer than 400 pages or 90-minute videos in a single prompt. It excels at document processing, video understanding, code generation, and agentic workflows. Nova 2 Lite supports both SFT and RFT.&lt;/p&gt; 
&lt;p&gt;The smallest Nova model, Nova Micro, is also particularly useful because it offers fast, low-cost inference with LLM intelligence. Nova Micro is ideal for pipeline processing tasks done as part of a larger system, such as fixing addresses or extracting data fields from text. In this post, we show an example of customizing Nova Micro for a classification task instead of building a custom data science model. The following table shows both Nova 1 and Nova 2 models and their availability as of publication time, indicating which models currently allow RFT or SFT. These capabilities are subject to change; see the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html" target="_blank" rel="noopener noreferrer"&gt;online documentation&lt;/a&gt; for the most current model availability and &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/custom-model-supported.html" target="_blank" rel="noopener noreferrer"&gt;customization&lt;/a&gt; support, and the &lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/what-is-nova.html" target="_blank" rel="noopener noreferrer"&gt;Nova User Guide&lt;/a&gt; for more detail on the models.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Capabilities&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Input&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Status&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Bedrock fine-tuning&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Nova Premier&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Most capable model for complex tasks and teacher for model distillation&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Text, images, video (excluding audio)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Text&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Generally available&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Can be used as a teacher for model distillation&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Nova Pro&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Multimodal model with best combination of accuracy, speed, and cost for a wide range of tasks&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Text, images, video&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Text&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Generally available&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;SFT&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Nova 2 Lite&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Low cost multimodal model with fast processing&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Text, images, video&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Text&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Generally available&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;RFT, SFT&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Nova Lite&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Low cost multimodal model with fast processing&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Text, images, video&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Text&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Generally available&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;SFT&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Nova Micro&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Lowest latency responses at low cost&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Text&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Text&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Generally available&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;SFT&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;Now that you understand how Nova models support fine-tuning through the Amazon Bedrock managed infrastructure, let’s examine a real-world scenario that demonstrates these capabilities in action.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Use case example – intent detection (replacing traditional ML models)&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Intent detection determines the category of the user’s intended interaction from their input. For example, in an airline travel assistance system, the user might be trying to get information about a previously booked flight or asking a question about airline services, such as how to transport a pet. Systems often route the inquiry to specific agents based on intent. Intent detection systems must operate quickly and economically at high volume.&lt;/p&gt; 
&lt;p&gt;The traditional solution for such a system has been to train a machine-learning model. While this is effective, developers are more often turning to small LLMs for these tasks. LLMs offer more flexibility, can quickly be modified through prompt changes, and come with extensive world knowledge built in. Their understanding of shorthand, texting slang, equivalent words, and context can provide a better user experience, and the LLM development experience is familiar for AI engineers.&lt;/p&gt; 
&lt;p&gt;For our example, we will customize the Nova Micro model on the open-source &lt;a href="https://www.kaggle.com/datasets/hassanamin/atis-airlinetravelinformationsystem" target="_blank" rel="noopener noreferrer"&gt;Airline Travel Information System (ATIS)&lt;/a&gt; dataset, an industry-standard benchmark for intent-based systems. Nova Micro achieves 41.4% accuracy on ATIS with no customization, but a simple training job customizing it for the specific task improves its accuracy to 97%.&lt;/p&gt; 
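Accuracy figures like these can be reproduced with a small evaluation harness once the customized model is deployed. A hedged sketch: the model deployment ARN, the test-file path, and the record field names are placeholders, and the abbreviated system prompt stands in for the full intent list used in training (the prompt must match the one in the training data).

```python
import json

# Must match the system prompt used in the training data (abbreviated here).
SYSTEM_PROMPT = ("Classify the intent of airline queries. "
                 "Respond with only the intent name, nothing else.")

def accuracy(predictions, labels):
    """Fraction of exact (whitespace-insensitive) matches."""
    hits = sum(p.strip() == l.strip() for p, l in zip(predictions, labels))
    return hits / len(labels)

def classify(client, model_id, utterance):
    """Single intent prediction via the Bedrock Runtime Converse API."""
    resp = client.converse(
        modelId=model_id,
        system=[{"text": SYSTEM_PROMPT}],
        messages=[{"role": "user", "content": [{"text": utterance}]}],
        inferenceConfig={"maxTokens": 20, "temperature": 0.0},
    )
    return resp["output"]["message"]["content"][0]["text"]

if __name__ == "__main__":
    import boto3  # local import: only needed when actually calling AWS
    runtime = boto3.client("bedrock-runtime")
    model_id = "arn:aws:bedrock:us-east-1:111122223333:custom-model-deployment/example"  # placeholder
    with open("atis-test.jsonl") as f:  # held-out test set; path is a placeholder
        rows = [json.loads(line) for line in f]
    preds = [classify(runtime, model_id, r["text"]) for r in rows]
    print(f"accuracy: {accuracy(preds, [r['intent'] for r in rows]):.1%}")
```

Running the same harness against the base and the customized model gives a like-for-like comparison on the held-out test set.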
&lt;h2&gt;&lt;strong&gt;Technical implementation: Fine-tuning process&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;The two critical factors that drive model fine-tuning success are &lt;strong&gt;data quality &lt;/strong&gt;and &lt;strong&gt;hyperparameter selection&lt;/strong&gt;. Getting these right determines whether your model converges efficiently or requires costly retraining. Let’s walk through each component of the implementation process, starting with how to prepare your training data.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Data preparation&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Amazon Bedrock requires JSONL (JSON Lines) format because it supports efficient streaming of large datasets during training, so your data can be processed incrementally without memory constraints. This format also simplifies validation, because each line can be checked independently for errors. Verify that each row in the JSONL file is valid JSON; if the file format is invalid, the Amazon Bedrock model creation job will fail with an error. For more detail, see the &lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/fine-tune-prepare-data-understanding.html" target="_blank" rel="noopener noreferrer"&gt;documentation on Nova model fine-tuning&lt;/a&gt;. We used a script to format the ATIS dataset as JSONL. Nova Micro accepts a separate validation set, so we then split off 10% of the data as a validation set (Nova 2 models do this automatically during customization). We also reserved a test set of records, which the model was not trained on, to facilitate clean testing results.&lt;/p&gt; 
&lt;p&gt;For our intent classifier example, our input data is text only. However, when fine-tuning multimodal models, also make sure you are using only supported image formats (PNG, JPEG, and GIF). Make sure your training examples span the important cases. Validate your dataset with your team and remove ambiguous or contradictory answers before fine-tuning.&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;{"schemaVersion": "bedrock-conversation-2024", "system": [{"text": "Classify the intent of airline queries. Choose one intent from this list: abbreviation, aircraft, aircraft+flight+flight_no, airfare, airfare+flight_time, airline, airline+flight_no, airport, capacity, cheapest, city, distance, flight, flight+airfare, flight_no, flight_time, ground_fare, ground_service, ground_service+ground_fare, meal, quantity, restriction\n\nRespond with only the intent name, nothing else."}], "messages": [{"role": "user", "content": [{"text": "show me the morning flights from boston to philadelphia"}]}, {"role": "assistant", "content": [{"text": "flight"}]}]}&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Prepared row in a training data sample (note that although it appears wrapped, JSONL format is really a single row per example)&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Important: The system prompt appears in the training data. The system prompt used for training must match the system prompt used for inference, because the model learns the system prompt as context that triggers its fine-tuned behavior.&lt;/strong&gt;&lt;/p&gt; 
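Because one invalid line fails the whole training job, it pays to validate the file before uploading. A minimal sketch; the required top-level keys below are an assumption inferred from the bedrock-conversation-2024 sample shown above, so adjust them to the schema documentation for your model.

```python
import json

# Assumed required keys, based on the sample record above; "system" is treated
# as optional per record. Check the Nova data-preparation docs for your model.
REQUIRED_KEYS = {"schemaVersion", "messages"}

def validate_jsonl(lines):
    """Return a list of (line_number, error) tuples; empty means the file looks valid."""
    errors = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            errors.append((i, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((i, f"invalid JSON: {exc}"))
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            errors.append((i, f"missing keys: {sorted(missing)}"))
    return errors

if __name__ == "__main__":
    with open("focused-training-data-v2.jsonl") as f:  # file name from the console example
        problems = validate_jsonl(f)
    for lineno, msg in problems:
        print(f"line {lineno}: {msg}")
```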
&lt;h3&gt;&lt;strong&gt;Data privacy considerations&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;When fine-tuning with sensitive data:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Anonymize or mask PII (names, email addresses, phone numbers, payment details) before uploading to Amazon S3.&lt;/li&gt; 
 &lt;li&gt;Consider data residency requirements for regulatory compliance.&lt;/li&gt; 
 &lt;li&gt;Amazon Bedrock does not use your training data to improve base models.&lt;/li&gt; 
 &lt;li&gt;For enhanced security, consider using Amazon Virtual Private Cloud (VPC) endpoints for private connectivity between S3 and Amazon Bedrock, eliminating exposure to the public internet.&lt;/li&gt; 
&lt;/ul&gt; 
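For the first bullet, even a lightweight masking pass before upload helps. A minimal sketch; the regex patterns are illustrative assumptions only, and a production pipeline should use a vetted PII-detection service or library rather than hand-rolled patterns.

```python
import re

# Illustrative patterns only; order matters (card numbers would otherwise
# also match the broader phone pattern).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def mask_pii(text):
    """Replace each matched span with a [TYPE] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

if __name__ == "__main__":
    print(mask_pii("Contact jane.doe@example.com or call 555-123-4567"))
```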
&lt;h3&gt;&lt;strong&gt;Key hyperparameters&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Three hyperparameters control your training job’s behavior. Amazon Bedrock sets reasonable defaults, and you can often use them with no adjustment, but understanding them helps you optimize results and reach your target accuracy. &lt;strong&gt;Getting these settings right can save you hours of training time and minimize compute costs.&lt;/strong&gt; Here are the hyperparameters for the Nova understanding models; consult the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/custom-models-hp.html" target="_blank" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for other models.&lt;/p&gt; 
&lt;p&gt;The first hyperparameter, &lt;code&gt;epochCount&lt;/code&gt;, specifies how many complete passes the model makes through your dataset. Think of it like reading a book multiple times to improve comprehension. After the first read you might retain 60% of the material; a second pass raises comprehension to 80%. However, after you understand 100% of the material, additional readings waste training time without producing gains. Amazon Nova models support 1 to 5 epochs with a default of 2. Larger datasets typically converge with fewer epochs, while smaller datasets benefit from more iterations. For our ATIS intent classifier example with ~5000 combined samples, we set &lt;code&gt;epochCount&lt;/code&gt; to 3.&lt;/p&gt; 
&lt;p&gt;The &lt;code&gt;learningRateMultiplier&lt;/code&gt; controls how aggressively the model learns from errors; it is essentially the step size for corrections. If the learning rate is too high, the model can overshoot and jump to wrong conclusions. If the rate is too low, it forms conclusions slowly. We use 1e-5 (0.00001) for the ATIS example, which provides stable, gradual learning. The &lt;code&gt;learningRateWarmupSteps&lt;/code&gt; parameter gradually increases the learning rate to the specified value over a set number of iterations, which reduces instability at the start of training. We use the default value of 10 for our example.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Why this matters to you:&lt;/strong&gt; Setting the right epoch count avoids wasted training time and costs. Each epoch represents another pass through the complete training data, which will increase the number of tokens processed (the main cost in model training—see “Cost and training time” later in this post). Too few epochs mean your model might not learn the training data effectively enough. Finding this balance early saves both time and budget. The learning rate directly impacts your model’s accuracy and training efficiency, potentially meaning the difference between a model that converges in hours versus one that never reaches acceptable performance.&lt;/p&gt; 
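As a back-of-the-envelope check on training cost, the total tokens processed scale linearly with the epoch count, since each epoch is one full pass over the dataset. The per-example token count below is an assumed illustrative figure, not a measurement of the ATIS data.

```python
def training_tokens(examples, avg_tokens_per_example, epochs):
    """Total tokens processed during training: one full pass per epoch."""
    return examples * avg_tokens_per_example * epochs

if __name__ == "__main__":
    # Illustrative numbers: ~5,000 ATIS samples, an assumed ~120 tokens each, 3 epochs.
    total = training_tokens(5_000, 120, 3)
    print(f"{total:,} training tokens")  # 1,800,000 training tokens
```

Adding a fourth epoch would add another 600,000 tokens of training cost, which is why it only makes sense if the loss curve shows the model is still learning.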
&lt;h2&gt;&lt;strong&gt;Starting a fine-tuning job&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;The prerequisite of fine-tuning is creating an S3 bucket with training data.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;S3 bucket setup &lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Create an S3 bucket in the same region as your Amazon Bedrock job with the following security configurations:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Enable server-side encryption&lt;/strong&gt; (SSE-S3 or SSE-KMS) to protect training data at rest.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Block public access&lt;/strong&gt; on the bucket to prevent unauthorized exposure.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Enable S3 versioning&lt;/strong&gt; to protect training data from accidental overwrites and track changes across training iterations.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Apply the same encryption and access controls to your output S3 bucket. Upload your JSONL file to the new S3 bucket under the /training-data prefix. Versioning is especially valuable when you’re experimenting with different dataset versions to optimize results.&lt;/p&gt; 
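The bucket settings above can also be scripted. A sketch using boto3 with a hypothetical bucket name; the SSE-S3 choice and region default are assumptions, so substitute SSE-KMS and your own region as needed.

```python
def bucket_security_settings():
    """Configuration payloads mirroring the three recommendations above."""
    return {
        "encryption": {  # SSE-S3; swap in SSE-KMS for customer-managed keys
            "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
        },
        "public_access": {
            "BlockPublicAcls": True, "IgnorePublicAcls": True,
            "BlockPublicPolicy": True, "RestrictPublicBuckets": True,
        },
        "versioning": {"Status": "Enabled"},
    }

def create_training_bucket(bucket, region="us-east-1"):
    import boto3  # local import keeps the config helper testable without AWS
    s3 = boto3.client("s3", region_name=region)
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":  # other regions require an explicit location
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**kwargs)
    cfg = bucket_security_settings()
    s3.put_bucket_encryption(
        Bucket=bucket, ServerSideEncryptionConfiguration=cfg["encryption"])
    s3.put_public_access_block(
        Bucket=bucket, PublicAccessBlockConfiguration=cfg["public_access"])
    s3.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration=cfg["versioning"])

if __name__ == "__main__":
    create_training_bucket("amzn-s3-demo-bucket")  # bucket name from the examples
```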
&lt;p&gt;To create a supervised fine-tuning job:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;In the &lt;a href="https://console.aws.amazon.com/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt;, choose &lt;strong&gt;Amazon Bedrock&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Test, Chat/Text playground&lt;/strong&gt; and confirm that Nova Micro appears in the model selector drop-down list.&lt;/li&gt; 
 &lt;li&gt;Under Custom model, choose &lt;strong&gt;Create&lt;/strong&gt;, and then select &lt;strong&gt;Supervised fine-tuning job&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127722" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/ML-20219-1.png" alt="Amazon Bedrock Custom Models management interface showing three customization techniques: Reinforcement fine-tuning (new), Supervised fine-tuning, and Distillation, with a models management section displaying action buttons and navigation menu." width="1347" height="579"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Figure 1:&lt;/em&gt;&lt;em&gt; Creating supervised fine-tuning job&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Specify “&lt;strong&gt;Nova Micro&lt;/strong&gt;” model as the source model.&lt;/li&gt; 
 &lt;li&gt;In the Training data section, enter the S3 URI path to your JSONL training file (for example, &lt;code&gt;s3://amzn-s3-demo-bucket/training-data/focused-training-data-v2.jsonl)&lt;/code&gt;.&lt;/li&gt; 
 &lt;li&gt;In the Output data section, specify the S3 URI path where training outputs will be stored (for example, &lt;code&gt;s3://amzn-s3-demo-bucket/output-data/&lt;/code&gt;).&lt;/li&gt; 
 &lt;li&gt;Expand the Hyperparameters section and configure the following values: &lt;code&gt;epochCount: 3&lt;/code&gt;, &lt;code&gt;learningRateMultiplier: 1e-5&lt;/code&gt;, &lt;code&gt;learningRateWarmupSteps: 10&lt;/code&gt;&lt;/li&gt; 
 &lt;li&gt;Select the IAM role with least-privilege S3 access permissions or you can create one. The role should have: 
  &lt;ul&gt; 
   &lt;li&gt;Scoped permissions limited to specific actions (&lt;code&gt;s3:GetObject&lt;/code&gt; and &lt;code&gt;s3:PutObject&lt;/code&gt;) on specific bucket paths (for example,&lt;code&gt; arn:aws:s3:::your-bucket-name/training-data/*&lt;/code&gt; and &lt;code&gt;arn:aws:s3:::your-bucket-name/output-data/*&lt;/code&gt;)&lt;/li&gt; 
    &lt;li&gt;Condition keys where appropriate, to avoid over-provisioning access.&lt;/li&gt; 
   &lt;li&gt;For detailed guidance on S3 permission best practices and security configurations, refer to the &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html" target="_blank" rel="noopener noreferrer"&gt;AWS IAM Best Practices documentation&lt;/a&gt;.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;Choose Create job.&lt;/li&gt; 
&lt;/ol&gt; 
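The console steps above map to a single CreateModelCustomizationJob API call. A hedged boto3 sketch: the job name, role ARN, bucket, and base-model identifier are placeholders (look up the exact identifier in the documentation), and hyperparameter values are passed as strings.

```python
def build_job_request(job_name, model_name, role_arn, bucket):
    """Request body for create_model_customization_job, mirroring the
    console settings above; hyperparameter values are strings."""
    return {
        "jobName": job_name,
        "customModelName": model_name,
        "roleArn": role_arn,
        # Placeholder; confirm the exact base-model identifier in the docs.
        "baseModelIdentifier": "amazon.nova-micro-v1:0:128k",
        "customizationType": "FINE_TUNING",
        "trainingDataConfig": {
            "s3Uri": f"s3://{bucket}/training-data/focused-training-data-v2.jsonl"},
        "outputDataConfig": {"s3Uri": f"s3://{bucket}/output-data/"},
        "hyperParameters": {
            "epochCount": "3",
            "learningRateMultiplier": "0.00001",
            "learningRateWarmupSteps": "10",
        },
    }

if __name__ == "__main__":
    import boto3  # local import: only needed to actually submit the job
    bedrock = boto3.client("bedrock")
    request = build_job_request(
        "nova-micro-atis",
        "nova-micro-atis-custom",
        "arn:aws:iam::111122223333:role/BedrockFineTuneRole",  # placeholder role
        "amzn-s3-demo-bucket",
    )
    job = bedrock.create_model_customization_job(**request)
    print("started:", job["jobArn"])
```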
&lt;h3&gt;&lt;strong&gt;Monitoring job status&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;To monitor the training job’s status and convergence:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Monitor the job status in the &lt;strong&gt;Custom models &lt;/strong&gt;dashboard.&lt;/li&gt; 
 &lt;li&gt;Wait for the &lt;strong&gt;Data validation &lt;/strong&gt;phase to complete, followed by the &lt;strong&gt;Training &lt;/strong&gt;phase (completion time ranges from minutes to hours depending on dataset size and modality).&lt;/li&gt; 
 &lt;li&gt;After training completes, choose your job name to view the &lt;strong&gt;Training metrics &lt;/strong&gt;tab and verify the loss curve shows proper convergence.&lt;/li&gt; 
 &lt;li&gt;If the job completes successfully, a custom model is created and ready for inference. You can deploy the customized Nova model for on-demand inference.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127721" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/ML-20219-2.png" alt="AWS Bedrock console showing completed fine-tuning job for Nova Micro model nova-micro-atis-20260209 with data validation and training status both marked as completed on February 9, 2026." width="1494" height="1028"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Figure 2: Verifying job status&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
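The same status check can be done programmatically by polling GetModelCustomizationJob; a sketch with a placeholder job ARN (the terminal-state set below reflects the API's documented status values).

```python
import time

TERMINAL_STATES = {"Completed", "Failed", "Stopped"}

def is_done(status):
    """True once the customization job has reached a terminal state."""
    return status in TERMINAL_STATES

def wait_for_job(bedrock, job_arn, poll_seconds=60):
    """Poll get_model_customization_job until the job finishes."""
    while True:
        resp = bedrock.get_model_customization_job(jobIdentifier=job_arn)
        print("status:", resp["status"])
        if is_done(resp["status"]):
            return resp
        time.sleep(poll_seconds)

if __name__ == "__main__":
    import boto3  # local import: only needed when actually polling AWS
    job_arn = "arn:aws:bedrock:us-east-1:111122223333:model-customization-job/example"  # placeholder
    result = wait_for_job(boto3.client("bedrock"), job_arn)
    if result["status"] == "Completed":
        print("custom model ARN:", result.get("outputModelArn"))
```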
&lt;h3&gt;&lt;strong&gt;Evaluating training success&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;With Amazon Bedrock, you can evaluate your fine-tuning job’s effectiveness through training metrics and loss curves. By analyzing the training loss progression across steps and epochs, you can assess whether your model is learning effectively and determine if hyperparameter adjustments are needed for optimal performance. Amazon Bedrock customization automatically stores training artifacts, including validation results, metrics, logs, and training data in your designated S3 bucket, giving you complete visibility into the training process. Training metrics data lets you track how your model performs with specific hyperparameters and make informed tuning decisions.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127720" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/ML-20219-3.png" alt="Training metrics table showing decreasing loss values across 5 training steps in epoch 0, from 4.04 to 2.34" width="514" height="266"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Figure 3: Example training metrics in CSV format&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;You can visualize your model’s training progress directly from the Amazon Bedrock Custom Models console. Select your customized model to access detailed metrics, including an interactive training loss curve that shows how effectively your model learned from the training data over time (Figure 4). The loss curve gives insight into how training progressed and whether hyperparameters need modification for effective training.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127719" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/ML-20219-4.png" alt="Training loss graph showing decreasing model performance metrics from 2.9 to 0.6 over 600 training steps for model examplebank-large-20260119-183250" width="1150" height="1086"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Figure 4: Analyzing the loss curve from the training metrics&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;This loss curve shows that the model is performing well: the decreasing loss confirms the model successfully learned from your training data. Ideally, while the model is learning, the training loss and validation loss curves should track each other closely. A well-configured model shows steady convergence—the loss decreases smoothly without dramatic fluctuations. If you see oscillating patterns in your loss curve (wild swings up and down), reduce your &lt;code&gt;learningRateMultiplier&lt;/code&gt; by 50% and restart training. If your loss decreases too slowly (a flat or barely declining curve), increase your &lt;code&gt;learningRateMultiplier&lt;/code&gt; by 2x. If your loss plateaus early (flattens before reaching good accuracy), increase your &lt;code&gt;epochCount&lt;/code&gt; by 1-2 epochs.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127718" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/ML-20219-5.png" alt="Machine learning training loss curves showing three scenarios: converging too slow, oscillating, and optimal convergence patterns" width="1430" height="805"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Figure 5:&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;&lt;strong&gt;Understanding the loss curve&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Key takeaway:&lt;/strong&gt; Your loss curve tells the complete story. A smooth downward trend means success. Wild oscillations mean that your learning rate is too high. Flat lines mean you need more epochs or better data. Monitor this one metric to avoid costly retraining.&lt;/p&gt; 
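&lt;p&gt;The triage rules above can be sketched as a small helper. This is an illustrative heuristic, not part of Amazon Bedrock; the thresholds are assumptions you would tune for your own runs, and the sample losses come from Figure 3.&lt;/p&gt;

```python
# Heuristic triage of the loss-curve patterns described above; the thresholds
# are illustrative assumptions, not Bedrock-documented values.

def diagnose_loss_curve(losses, plateau_tol=0.02):
    """Suggest a hyperparameter adjustment from a list of per-step training losses."""
    deltas = [later - earlier for earlier, later in zip(losses, losses[1:])]
    rises = sum(1 for d in deltas if d > 0)
    if rises >= len(deltas) // 2 + 1:
        # Oscillating: loss climbs about as often as it falls
        return "reduce learningRateMultiplier by 50% and restart"
    tail = deltas[len(deltas) // 2:]
    if plateau_tol > max(abs(d) for d in tail):
        # Flattened early: loss barely moves in the second half of training
        return "increase epochCount by 1-2 epochs"
    if sum(deltas) > -0.5:
        # Declining, but too slowly overall
        return "increase learningRateMultiplier by 2x"
    return "converging normally"

# The decreasing losses from Figure 3 look healthy:
print(diagnose_loss_curve([4.04, 3.60, 3.10, 2.70, 2.34]))  # converging normally
```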
&lt;h2&gt;&lt;strong&gt;Customization best practices&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;&lt;strong&gt;Maximizing your fine-tuning success&lt;/strong&gt; starts with data quality. Small, high-quality datasets consistently outperform large, noisy ones. Focus on curating labeled examples that accurately represent your target domain rather than collecting massive volumes of mediocre data. Each training sample should be properly formatted and validated before use, as clean data directly translates to better model performance. Remember to specify an appropriate system prompt.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Common pitfalls to avoid&lt;/strong&gt; include over-training (running too many epochs after convergence), suboptimal data formatting (inconsistent JSON/JSONL structures), and poorly chosen hyperparameter settings. We recommend validating your training data format before starting and monitoring loss curves actively during training. Watch for signs that your model has converged; continuing training beyond this point wastes resources without improving results.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Cost and training time&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Training the customized Nova Micro model for our ATIS example with 4,978 combined examples and 3 training epochs (~1.75M total tokens) completed in about 1.5 hours and cost only $2.18, plus a $1.75 monthly recurring storage fee for the model. On-Demand inference using customized Amazon Nova models is charged at the same rate as the non-customized models. See the &lt;a href="https://aws.amazon.com/bedrock/pricing/" target="_blank" rel="noopener noreferrer"&gt;Bedrock pricing&lt;/a&gt; page for reference. The managed fine-tuning provided by Amazon Bedrock and the Amazon Nova models bring fine-tuning well within cost thresholds for most organizations. The ease of use and cost effectiveness opens new possibilities for customizing models to produce better and faster results without maintaining long prompts or knowledge bases of information specific to your organization.&lt;/p&gt; 
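&lt;p&gt;The arithmetic behind that estimate can be reproduced with a back-of-the-envelope sketch. The tokens-per-example and per-million-token figures below are implied by this example, not official prices; always check the Bedrock pricing page for current rates.&lt;/p&gt;

```python
# Back-of-the-envelope reconstruction of the training cost above. The
# tokens-per-example and per-million-token figures are implied by this example,
# not official prices; check the Bedrock pricing page for current rates.

examples = 4978
epochs = 3
avg_tokens_per_example = 117       # implied by ~1.75M total tokens / (4,978 x 3)
implied_rate_per_million = 1.25    # USD, implied by $2.18 / 1.75M tokens

total_tokens = examples * epochs * avg_tokens_per_example
cost = total_tokens / 1_000_000 * implied_rate_per_million
print(f"{total_tokens:,} training tokens, about ${cost:.2f}")  # 1,747,278 training tokens, about $2.18
```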
&lt;p&gt;&lt;strong&gt;Deploying and testing the fine-tuned model&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Consider on-demand inference for unpredictable or low-volume workloads. Use the more expensive provisioned throughput when needed for consistent, high-volume production workloads requiring guaranteed performance and lower per-token costs.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Model security considerations:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Restrict model invocation using IAM resource policies to control which users and applications can invoke your custom model.&lt;/li&gt; 
 &lt;li&gt;Implement authentication/authorization for API callers accessing the on-demand inference endpoint through IAM roles and policies.&lt;/li&gt; 
&lt;/ul&gt; 
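&lt;p&gt;As a sketch of the first point, an identity-based IAM policy can scope invocation to a single custom model deployment. The ARN below is a placeholder, and the exact actions and resources will depend on your account setup.&lt;/p&gt;

```python
import json

# Illustrative identity-based policy scoping invocation to one custom model
# deployment. The ARN is a placeholder; adapt actions and resources to your account.
custom_model_deployment_arn = (
    "arn:aws:bedrock:us-east-1:123456789012:custom-model-deployment/EXAMPLE"
)
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowInvokeCustomModelOnly",
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
            "Resource": custom_model_deployment_arn,
        }
    ],
}
print(json.dumps(policy, indent=2))
```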
&lt;p&gt;&lt;strong&gt;Network security: &lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/vpc-interface-endpoints.html" target="_blank" rel="noopener noreferrer"&gt;Configure VPC endpoints&lt;/a&gt; for Amazon Bedrock to keep traffic within your AWS network.&lt;/li&gt; 
 &lt;li&gt;Restrict network access to training and inference pipelines using security groups and network ACLs.&lt;/li&gt; 
 &lt;li&gt;Consider deploying resources within a VPC for additional network-level controls.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;To deploy the model, enter a unique deployment name and a description that explains what the custom model is used for, then choose &lt;strong&gt;Create&lt;/strong&gt; (Figure 6).&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127717" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/ML-20219-6.png" alt="Custom model on-demand deployment interface showing a three-step workflow and a table of model deployments with status tracking" width="1426" height="534"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Figure 6: Deploying a custom model with on-demand inference&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;After the status changes to “Active”, the model is ready for use by your application and can be tested in the Amazon Bedrock playground. Choose &lt;strong&gt;Test in playground&lt;/strong&gt; (Figure 7).&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127716" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/ML-20219-7.png" alt="AWS Bedrock console screenshot showing the Custom Model Deployment Overview page for &amp;quot;nova-micro-atis-eval&amp;quot; deployment with active status, creation timestamp, and associated custom model details." width="825" height="286"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Figure 7: &lt;/em&gt;&lt;em&gt;Testing the model from the deployed inference endpoint&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Logging and monitoring:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Enable the following for security auditing and incident response:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;AWS CloudTrail for Amazon Bedrock API call logging&lt;/li&gt; 
 &lt;li&gt;Amazon CloudWatch for model invocation metrics and performance monitoring&lt;/li&gt; 
 &lt;li&gt;S3 access logs for tracking data access patterns&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Testing the model in the playground:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;To test inference with the custom model, we use the Amazon Bedrock playground with the following example prompt (the system prompt followed by the user query):&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;Classify the intent of airline queries. Choose one intent from this list: abbreviation, aircraft, aircraft+flight+flight_no, airfare, airfare+flight_time, airline, airline+flight_no, airport, capacity, cheapest, city, distance, flight, flight+airfare, flight_no, flight_time, ground_fare, ground_service, ground_service+ground_fare, meal, quantity, restriction\n\nRespond with only the intent name, nothing else. I would like to find a flight from charlotte to las vegas that makes a stop in st. louis&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;If called on the base model, the same prompt returns a less accurate answer.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: Note that the system prompt provided with the training data for fine-tuning must be included with your prompt during invocation for best results. Because the playground does not provide a separate place to put the system prompt for our custom model, we include it in the preceding prompt string.&lt;/p&gt; 
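&lt;p&gt;Outside the playground, the same rule applies programmatically: send the training-time system prompt with every request. The following sketch builds a Converse API request with boto3; the deployment ARN is a placeholder, and the abbreviated system prompt stands in for the full prompt shown above.&lt;/p&gt;

```python
# Sketch of invoking the fine-tuned model with the Converse API. The deployment
# ARN is a placeholder, and the system prompt is abbreviated; send the full
# training-time system prompt with every call.

SYSTEM_PROMPT = (
    "Classify the intent of airline queries. Choose one intent from this list: "
    "abbreviation, aircraft, airfare, airline, airport, flight, flight_time, "
    "ground_service, meal, quantity, restriction. "
    "Respond with only the intent name, nothing else."
)

def build_converse_request(model_id, query):
    """Assemble kwargs for bedrock-runtime converse(); the system prompt rides along."""
    return {
        "modelId": model_id,
        "system": [{"text": SYSTEM_PROMPT}],
        "messages": [{"role": "user", "content": [{"text": query}]}],
        "inferenceConfig": {"maxTokens": 32, "temperature": 0},
    }

request = build_converse_request(
    "arn:aws:bedrock:us-east-1:123456789012:custom-model-deployment/EXAMPLE",  # placeholder
    "i would like to find a flight from charlotte to las vegas that makes a stop in st. louis",
)
# import boto3
# reply = boto3.client("bedrock-runtime").converse(**request)
# print(reply["output"]["message"]["content"][0]["text"])
```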
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127715" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/ML-20219-8.png" alt="Screenshot of the Amazon Bedrock Chat/Text Playground interface demonstrating an airline query intent classification system with performance metrics and a sample user query." width="1068" height="451"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Figure 8: &lt;/em&gt;&lt;em&gt;Manually evaluating a customized model in the test playground&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Evaluating your customized model&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;After you have trained your model, you must evaluate its real-world performance. A common evaluation is “LLM as a judge,” where a larger, more capable model with access to a full RAG database scores the trained model’s responses against the expected responses. Amazon Bedrock provides the Amazon Bedrock Evaluations service for this purpose (or you can use your own framework). For guidance, refer to the blog post &lt;a href="https://aws.amazon.com/blogs/machine-learning/llm-as-a-judge-on-amazon-bedrock-model-evaluation/" target="_blank" rel="noopener noreferrer"&gt;LLM-as-a-judge on Amazon Bedrock Model Evaluation&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Your evaluation should use a test set of questions and answers, prepared using the same method as your training data, but kept separate so the model has not seen the exact questions. Figure 9 shows that the fine-tuned model achieves 97% accuracy on the test data set, compared with 41.4% for the base Nova Micro model, an improvement of 55.6 percentage points.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127714" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/ML-20219-9.png" alt="Bar chart comparing ATIS intent classification accuracy between base Nova Micro (41.4%) and fine-tuned Nova Micro (97.0%), showing a 55.6% improvement through fine-tuning at $2.18 training cost" width="2969" height="2089"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Figure 9: Evaluation of fine-tuning results vs. base model&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
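&lt;p&gt;If you want a lightweight check before running a full evaluation job, exact-match accuracy on a held-out set is straightforward to compute. The queries and the stubbed predictor below are illustrative; in practice &lt;code&gt;predict()&lt;/code&gt; would call your deployed model.&lt;/p&gt;

```python
# Minimal exact-match scoring for a held-out test set. The queries and the
# stubbed predictor are illustrative; in practice predict() calls the deployed model.

test_set = [
    {"query": "what does fare code qo mean", "intent": "abbreviation"},
    {"query": "show flights from boston to denver", "intent": "flight"},
    {"query": "how much is a first class ticket to dallas", "intent": "airfare"},
]

def exact_match_accuracy(examples, predict):
    hits = sum(1 for ex in examples if predict(ex["query"]).strip().lower() == ex["intent"])
    return hits / len(examples)

# Stub standing in for the fine-tuned model's responses (one wrong on purpose).
canned = {
    "what does fare code qo mean": "abbreviation",
    "show flights from boston to denver": "flight",
    "how much is a first class ticket to dallas": "ground_fare",
}
print(round(exact_match_accuracy(test_set, lambda q: canned[q]), 2))  # 0.67
```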
&lt;h2&gt;&lt;strong&gt;Beyond Amazon Bedrock customization&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Amazon Bedrock’s simplified customization experience will meet many customer needs. If you need more extensive control over customization, Amazon SageMaker AI provides a broader range of customization types and more detailed control over hyperparameters; see the blog post &lt;a href="https://aws.amazon.com/blogs/aws/announcing-amazon-nova-customization-in-amazon-sagemaker-ai/" target="_blank" rel="noopener noreferrer"&gt;Announcing Amazon Nova customization in Amazon SageMaker AI&lt;/a&gt; for more detail.&lt;/p&gt; 
&lt;p&gt;For cases where even more extensive customization is needed, &lt;a href="https://aws.amazon.com/nova/forge/" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova Forge&lt;/a&gt; provides a strategic alternative to building foundation models from scratch. While fine-tuning teaches specific task behaviors through labeled examples, Nova Forge uses continued pre-training to build comprehensive domain knowledge by immersing the model in millions to billions of tokens of unlabeled, proprietary data. This approach is ideal for organizations with massive proprietary datasets, highly specialized domains requiring deep expertise, or those building long-term strategic foundational models that will serve as organizational assets.&lt;/p&gt; 
&lt;p&gt;Nova Forge goes beyond standard fine-tuning by offering advanced capabilities including data mixing to mitigate catastrophic forgetting during full-rank supervised fine-tuning, checkpoint selection for optimal model performance, and bring-your-own-optimizer (BYOO) for multi-turn reinforcement fine-tuning. While requiring greater investment through an annual subscription and longer training cycles, Forge can deliver a significantly more cost-effective path than training foundation models from scratch. This approach is ideal for building strategic AI assets that serve as long-term competitive advantages. For Nova Forge customization examples, see the &lt;a href="https://github.com/aws-samples/amazon-nova-samples/tree/main/customization" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova Customization Hub&lt;/a&gt; on GitHub.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;As we have demonstrated through our intent classifier example, the Amazon Bedrock managed fine-tuning capabilities, together with the Nova and Nova 2 models, make AI customization accessible at low cost and with low effort. This simplified approach requires minimal data preparation and hyperparameter management, minimizing the need for dedicated data science skills. You can customize models to improve latency and reduce inference cost by reducing the tokens of contextual information that the model must process. Fine-tuning Nova models on Amazon Bedrock transforms generic foundation models into powerful, domain-specific tools that deliver higher accuracy and reduced latency, at low training cost. The ability of Amazon Bedrock to host the Nova models using &lt;a href="https://docs.aws.amazon.com/nova/latest/nova2-userguide/on-demand-inference.html" target="_blank" rel="noopener noreferrer"&gt;On-Demand inference&lt;/a&gt; allows you to run the model at the same per-token pricing as the base Nova model. See the &lt;a href="https://aws.amazon.com/bedrock/pricing/" target="_blank" rel="noopener noreferrer"&gt;Bedrock pricing page&lt;/a&gt; for current rates.&lt;/p&gt; 
&lt;p&gt;To get started with your own fine-tuning project using Amazon Bedrock, explore the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/custom-models.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock fine-tuning documentation&lt;/a&gt; and review sample notebooks in the &lt;a href="https://github.com/aws-samples/amazon-nova-samples/tree/main/customization" target="_blank" rel="noopener noreferrer"&gt;AWS Samples GitHub repository&lt;/a&gt;.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-128080" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/08/bhavya.jpg" alt="" width="2229" height="2326"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Bhavya Sruthi Sode&lt;/h3&gt; 
  &lt;p&gt;&lt;b&gt;Bhavya Sruthi Sode&lt;/b&gt; is a Technical Account Manager at Amazon Web Services, focused on AI/ML. She helps customers design resilient, scalable, and secure cloud architectures while driving successful outcomes in their enterprise cloud environments. With a background in Machine Learning, she is passionate about helping organizations transform their AI aspirations into practical solutions.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-29797 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/06/david-rostcheck-photo.png" alt="David Rostcheck" width="100" height="140"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;David Rostcheck&lt;/h3&gt; 
  &lt;p&gt;&lt;b&gt;David Rostcheck&lt;/b&gt; is a Sr. Specialist Solutions Architect at Amazon Web Services, focused on AI/ML, Bedrock, and agent solutions. He enjoys helping our customers deliver effective AI-based solutions to production.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Human-in-the-loop constructs for agentic workflows in healthcare and life sciences</title>
		<link>https://aws.amazon.com/blogs/machine-learning/human-in-the-loop-constructs-for-agentic-workflows-in-healthcare-and-life-sciences/</link>
					
		
		<dc:creator><![CDATA[Pierre de Malliard]]></dc:creator>
		<pubDate>Wed, 08 Apr 2026 19:48:07 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Bedrock AgentCore]]></category>
		<category><![CDATA[Amazon Machine Learning]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Healthcare]]></category>
		<category><![CDATA[Life Sciences]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[AI/ML]]></category>
		<guid isPermaLink="false">7fa91e859bb6495386818795e261253aa02631ed</guid>

					<description>In healthcare and life sciences, AI agents help organizations process clinical data, submit regulatory filings, automate medical coding, and accelerate drug development and commercialization. However, the sensitive nature of healthcare data and regulatory requirements like Good Practice (GxP) compliance require human oversight at key decision points. This is where human-in-the-loop (HITL) constructs become essential. In this post, you will learn four practical approaches to implementing human-in-the-loop constructs using AWS services.</description>
										<content:encoded>&lt;p&gt;In healthcare and life sciences, AI agents help organizations process clinical data, submit regulatory filings, automate medical coding, and accelerate drug development and commercialization. However, the sensitive nature of healthcare data and regulatory requirements like Good Practice (GxP) compliance require human oversight at key decision points. This is where human-in-the-loop (HITL) constructs become essential. In this post, you will learn four practical approaches to implementing human-in-the-loop constructs using AWS services.&lt;/p&gt; 
&lt;h2&gt;Why human-in-the-loop matters in healthcare&lt;/h2&gt; 
&lt;p&gt;Healthcare and life sciences organizations face unique challenges when deploying AI agents:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Regulatory compliance –&lt;/strong&gt; GxP regulations require human oversight for sensitive operations. For example, deleting patient records or modifying clinical trial protocols can’t proceed without documented authorization.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Patient safety –&lt;/strong&gt; Medical decisions affecting patient care must have clinical validation before execution.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Audit requirements –&lt;/strong&gt; Healthcare systems need complete traceability of who approved what actions and when.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Data sensitivity –&lt;/strong&gt; Protected Health Information (PHI) requires explicit authorization before access or modification.&lt;/p&gt; 
&lt;p&gt;HITL constructs provide the necessary control points while maintaining the efficiency gains of agentic automation to meet these requirements.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;We present four complementary approaches to implementing HITL in agentic workflows. Each workflow is suited for different scenarios and risk profiles as described in our &lt;a href="https://aws.amazon.com/blogs/machine-learning/a-guide-to-building-ai-agents-in-gxp-environments/" target="_blank" rel="noopener noreferrer"&gt;guide to building AI agents in GxP Environments&lt;/a&gt;. We build these patterns using the &lt;a href="https://strandsagents.com/" target="_blank" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; framework, &lt;a href="https://aws.amazon.com/bedrock/agentcore/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt; Runtime, and the Model Context Protocol (MCP), with code examples that you can adapt for your own use cases.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Agentic Loop Interrupt (Agent Framework Hook System) –&lt;/strong&gt; We use the Strands Agent Framework Hooks to enforce the human-in-the-loop policy. With the hooks, we can intercept tool calls before their execution.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Tool Context Interrupt –&lt;/strong&gt; The human-in-the-loop approval logic can also be implemented within the tool logic directly for fine-grained, tool-specific control and flexibility. The session context can be used for custom approval logic.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Remote Tool Interrupt (AWS Step Functions) –&lt;/strong&gt; In some cases, one might want to send an approval request to a third party system or person asynchronously. We demonstrate this pattern by sending a notification to an external approver using Amazon Simple Notification Service (Amazon SNS). The agent session continues without blocking while approval proceeds in the background.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;MCP Elicitation –&lt;/strong&gt; The MCP protocol recently introduced elicitation, which is used by servers to request additional information from users through the client during interactions. The MCP’s native elicitation protocol allows for real-time, interactive approval using server-sent events (SSE) for stateful, two-way communication.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Architecture&lt;/h2&gt; 
&lt;p&gt;The solution architecture uses the Strands Agents Framework for agent lifecycle management and interrupt handling, deployed on Amazon Bedrock AgentCore Runtime for serverless scalability and session isolation. AWS Step Functions orchestrates asynchronous approval workflows with Amazon SNS, while MCP servers expose tools to the agent through the MCP—also deployed on AgentCore Runtime.&lt;/p&gt; 
&lt;h2&gt;Implementation details&lt;/h2&gt; 
&lt;p&gt;All the code for these architecture patterns is available publicly in the &lt;a href="https://github.com/aws-samples/sample-human-in-the-loop-patterns" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Each of the following methods demonstrates a self-contained approach. The agent deploys on Amazon Bedrock AgentCore Runtime with access to healthcare tools at different sensitivity levels. Low-risk operations, like looking up a patient’s name, execute without approval, while high-risk actions, like retrieving vitals or medical conditions, require human authorization. Operations such as patient discharge require external supervisor approval through email notification.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Method 1: Agentic loop hook local tool interrupt&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The Strands Agent Framework provides a &lt;strong&gt;hook system&lt;/strong&gt; that intercepts tool calls &lt;strong&gt;before&lt;/strong&gt; execution at the agent loop level. This enforces a blanket HITL policy across sensitive tools without modifying the tools themselves.&lt;/p&gt; 
&lt;p&gt;A &lt;code&gt;HookProvider&lt;/code&gt; registers a callback on &lt;code&gt;BeforeToolCallEvent&lt;/code&gt;. When a sensitive tool is invoked, the hook fires an &lt;code&gt;interrupt&lt;/code&gt;, pausing the agent loop until the human responds. The user can reply with “y” (approve once), “n” (deny), or “t” (trust—approve this tool for the rest of the session):&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;class ApprovalHook(HookProvider):
    SENSITIVE_TOOLS = ["get_patient_condition", "get_patient_vitals"]

    def register_hooks(self, registry: HookRegistry, **kwargs: Any) -&amp;gt; None:
        registry.add_callback(BeforeToolCallEvent, self.approve)

    def approve(self, event: BeforeToolCallEvent) -&amp;gt; None:
        tool_name = event.tool_use["name"]
        if tool_name not in self.SENSITIVE_TOOLS:
            return

        # Skip if user previously chose "trust always" for this tool
        approval_key = f"{tool_name}-approval"
        if event.agent.state.get(approval_key) == "t":
            return

        approval = event.interrupt(
            approval_key,
            reason={"reason": f"Authorize {tool_name} with args: {event.tool_use.get('input', {})}"},
        )
        if approval.lower() not in ["y", "yes", "t"]:
            event.cancel_tool = f"User denied permission to run {tool_name}"
            return

        if approval.lower() == "t":
            event.agent.state.set(approval_key, "t")  # trust tool for the rest of the session
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The hook is attached to the agent at construction—tools remain completely unaware of the approval logic:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;agent = Agent(
    hooks=[ApprovalHook()],
    tools=[get_patient_name, get_patient_condition, get_patient_vitals],
)
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Method 2: Tool context interrupt&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Instead of a centralized hook, the approval logic is embedded directly inside each tool using&amp;nbsp;&lt;code&gt;tool_context.interrupt()&lt;/code&gt;. This gives fine-grained, per-tool control: each tool can implement its own access rules based on session context. In this example, the agent session carries a&amp;nbsp;&lt;code&gt;user_role&lt;/code&gt;, and a shared&amp;nbsp;&lt;code&gt;check_access&lt;/code&gt; function enforces role-based access: non-physicians are denied outright, while physicians are prompted for approval. As in Method 1, the trust option caches the approval for the rest of the session:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;def check_access(tool_context, patient_id: str, action: str):
    user_role = tool_context.agent.state.get("user_role") or "Non-Physician"

    if user_role != "Physician":
        return f"Access denied: {action} requires Physician role (current: {user_role})"

    approval_key = f"{action}-{patient_id}-approval"
    if tool_context.agent.state.get(approval_key) == "t":
        return None  # previously trusted

    approval = tool_context.interrupt(
        approval_key,
        reason={"reason": f"[{user_role}] Authorize {action} for patient {patient_id}"},
    )
    if approval.lower() not in ["y", "yes", "t"]:
        return f"Physician denied access to {action} for patient {patient_id}"

    if approval.lower() == "t":
        tool_context.agent.state.set(approval_key, "t")
    return None  # approved
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Method 3: Asynchronous tool approval using AWS Step Functions&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;In many enterprise scenarios, the approval flow requires authorization from a third-party approver who is not the person invoking the agent. This necessitates an asynchronous approval workflow that can operate independently of the agent session. One effective approach uses &lt;strong&gt;AWS Step Functions&lt;/strong&gt; to orchestrate these external approval processes.&lt;/p&gt; 
&lt;p&gt;In this pattern, the agent tool triggers a Step Functions workflow that sends an approval request to an external approver by email through Amazon SNS. The tool polls for the approval result and updates the agent session state accordingly. The user can also check the approval status later using a separate &lt;code&gt;check_discharge_status&lt;/code&gt; tool. The &lt;code&gt;discharge_patient&lt;/code&gt; tool starts the Step Functions execution and polls for the result:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;@tool(context=True)
def discharge_patient(tool_context, patient_id: str, reason: str) -&amp;gt; str:
    # Skip workflow if already approved in this session
    if tool_context.agent.state.get("external-approver-state") == "approved":
        return f"Patient {patient_id} discharged (pre-approved). Reason: {reason}"

    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"patient_id": patient_id, "action": "discharge", "reason": reason}),
    )
    return f"Waiting for approval. Execution ARN: {response['executionArn']}"
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;This asynchronous approach enables non-blocking operations: users aren’t forced to wait for approvals that can take hours or days, and agent execution continues independently. Step Functions maintains a detailed audit trail with complete execution history, persists state across session timeouts, and integrates with existing enterprise communication channels like email, Slack, or Microsoft Teams. When a user starts a sensitive workflow, the tool triggers a Step Functions execution and the agent returns a confirmation that the workflow was launched. At any time, the user can check for a state update to confirm that the workflow completed.&lt;/p&gt; 
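&lt;p&gt;The companion status check can be a thin wrapper over the Step Functions &lt;code&gt;DescribeExecution&lt;/code&gt; API. In this sketch the boto3 call is commented out so the message-mapping helper can be exercised offline; the message wording is illustrative.&lt;/p&gt;

```python
# Sketch of a companion check_discharge_status helper. summarize_execution is a
# pure mapping over the Step Functions DescribeExecution response, so it can be
# exercised offline; the message wording is illustrative.

def summarize_execution(resp):
    status = resp["status"]  # RUNNING | SUCCEEDED | FAILED | TIMED_OUT | ABORTED
    if status == "RUNNING":
        return "Discharge approval is still pending with the external approver."
    if status == "SUCCEEDED":
        return "Discharge approved and completed."
    return f"Discharge workflow ended with status {status}; escalate to a supervisor."

# In the real tool:
# import boto3
# resp = boto3.client("stepfunctions").describe_execution(executionArn=execution_arn)
print(summarize_execution({"status": "RUNNING"}))
```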
&lt;p&gt;&lt;strong&gt;Method 4: MCP elicitation &lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;MCP recently introduced elicitation, which allows MCP servers to request additional information or approval from users during tool execution. This approach follows protocol standards and provides a dynamic mechanism for prompting users at runtime without requiring parameters to be hardwired upfront. It can be used to authorize a tool call and include some business justification.&lt;/p&gt; 
&lt;p&gt;When a sensitive tool is called, the MCP server pauses execution and sends an approval prompt back through the MCP client to the end user. The user sees the prompt, makes a decision, and the server resumes—either proceeding with the operation or denying access. This two-way communication is enabled by MCP’s streamable HTTP transport, which maintains a stateful connection between client and server.&lt;/p&gt; 
&lt;p&gt;On the MCP server, the approval logic is a single &lt;code&gt;ctx.elicit()&lt;/code&gt; call inside each sensitive tool:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;@server.tool
async def get_patient_condition(patient_id: str, ctx: Context) -&amp;gt; str:
    """Get patient condition. Sensitive — requires approval via MCP elicitation."""
    result = await ctx.elicit(
        f"⚠ Approve access to SENSITIVE condition data for patient {patient_id}?"
    )
    if result.action != "accept":
        return f"Access to condition data for patient {patient_id} DENIED."
    return f"Patient {patient_id} condition: Hypertension Stage 2, Type 2 Diabetes"
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;On the agent side, an &lt;code&gt;elicitation_callback&lt;/code&gt; is registered with the MCP client. When the server calls &lt;code&gt;ctx.elicit()&lt;/code&gt;, this callback fires, relaying the approval prompt to the user and returning their decision back to the server. For local agents, this is a terminal prompt. For agents deployed on AgentCore Runtime, we use a WebSocket connection to relay the elicitation to the remote end user in real time:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-127007 " src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/Screenshot-2026-03-24-at-1.45.21 PM.png" alt="" width="504" height="212"&gt;&lt;/p&gt; 
&lt;p&gt;This approach keeps the approval logic entirely within the MCP server’s tool definitions. The agent itself has no knowledge of which tools require approval, so you can add or modify approval requirements independently.&lt;/p&gt; 
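As a concrete illustration, the relay logic can be sketched as a plain Python callback. This is a local simulation, not the actual MCP client API: the names `make_elicitation_callback` and `ask_user` are illustrative stand-ins.

```python
# Local simulation of an agent-side elicitation callback. The names here
# (make_elicitation_callback, ask_user) are illustrative stand-ins, not the
# actual MCP client API.
def make_elicitation_callback(ask_user):
    """Build a callback that relays server approval prompts to the user.

    ask_user is whatever channel the agent has: input() for a local
    terminal agent, or a WebSocket send/receive pair on AgentCore Runtime.
    """
    def elicitation_callback(prompt: str) -> dict:
        answer = ask_user(prompt)
        action = "accept" if answer.strip().lower() in ("y", "yes", "approve") else "decline"
        return {"action": action}
    return elicitation_callback

# Simulate an approving user instead of a live terminal or WebSocket.
callback = make_elicitation_callback(lambda prompt: "yes")
print(callback("Approve access to SENSITIVE condition data for patient 1234?"))
# prints {'action': 'accept'}
```

Because the callback only relays prompts and decisions, the same function works unchanged whether the user sits at a terminal or behind a WebSocket.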
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;You can use these human-in-the-loop (HITL) constructs to build safe, compliant AI agent deployments in healthcare and life sciences. By implementing the appropriate HITL pattern for your use case, you can deploy production-ready workflows that scale from pilot projects to enterprise-wide deployments. Start by identifying which operations in your workflow require human oversight. Then, select the HITL pattern that matches your approval requirements—centralized (Method 1), tool-specific (Method 2), asynchronous (Method 3), or real-time (Method 4).&lt;/p&gt; 
&lt;p&gt;For more information about Amazon Bedrock AgentCore, visit the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;hr&gt; 
&lt;h3&gt;About the author&lt;/h3&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone wp-image-127008 size-thumbnail" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/profile_picture-100x150.jpeg" alt="" width="100" height="150"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Pierre de Malliard&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Pierre de Malliard&lt;/strong&gt; is a Senior AI/ML Solutions Architect at Amazon Web Services and supports customers in the Healthcare and Life Sciences Industry. Pierre has 10+ years of experience building machine learning applications and platforms. In his spare time, he enjoys playing the piano and spending time in nature.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Building intelligent audio search with Amazon Nova Embeddings: A deep dive into semantic audio understanding</title>
		<link>https://aws.amazon.com/blogs/machine-learning/building-intelligent-audio-search-with-amazon-nova-embeddings-a-deep-dive-into-semantic-audio-understanding/</link>
					
		
		<dc:creator><![CDATA[Madhavi Evana]]></dc:creator>
		<pubDate>Wed, 08 Apr 2026 19:45:13 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Nova]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">42f7a3cdb13904d196748f642bd12a9fc2e82cc7</guid>

					<description>This post walks you through understanding audio embeddings, implementing Amazon Nova Multimodal Embeddings, and building a practical search system for your audio content. You'll learn how embeddings represent audio as vectors, explore the technical capabilities of Amazon Nova, and see hands-on code examples for indexing and querying your audio libraries. By the end, you'll have the knowledge to deploy production-ready audio search capabilities.</description>
										<content:encoded>&lt;p&gt;If you’re looking to enhance your content understanding and search capabilities, audio embeddings offer a powerful solution. In this post, you’ll learn how to use &lt;a href="https://aws.amazon.com/ai/generative-ai/nova/" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova Multimodal Embeddings&lt;/a&gt; to transform your audio content to searchable, intelligent data that captures acoustic features like tone, emotion, musical characteristics, and environmental sounds.&lt;/p&gt; 
&lt;p&gt;Finding specific content in large audio libraries presents real technical challenges. Traditional search methods like manual transcription, metadata tagging, and speech-to-text conversion work well for capturing and searching spoken words. However, these text-based approaches focus on linguistic content rather than acoustic properties like tone, emotion, musical characteristics, and environmental sounds. Audio embeddings address this gap. They represent your audio as dense numerical vectors in high-dimensional space that encode both semantic and acoustic properties. These representations let you perform semantic search using natural language queries, match similar-sounding audio, and automatically categorize content based on what it sounds like rather than just metadata tags. Amazon Nova Multimodal Embeddings, announced on October 28, 2025, is a multimodal embedding model available in Amazon Bedrock [1]. It’s a unified model that supports text, documents, images, video, and audio, enabling accurate cross-modal retrieval through a single model.&lt;/p&gt; 
&lt;p&gt;This post walks you through understanding audio embeddings, implementing Amazon Nova Multimodal Embeddings, and building a practical search system for your audio content. You’ll learn how embeddings represent audio as vectors, explore the technical capabilities of Amazon Nova, and see hands-on code examples for indexing and querying your audio libraries. By the end, you’ll have the knowledge to deploy production-ready audio search capabilities.&lt;/p&gt; 
&lt;h2&gt;Understanding Audio Embeddings: Core Concepts&lt;/h2&gt; 
&lt;h3&gt;Vector Representations for Audio Content&lt;/h3&gt; 
&lt;p&gt;Think of audio embeddings as a coordinate system for sound. Just as GPS coordinates pinpoint locations on Earth, embeddings map your audio content to specific points in high-dimensional space. Amazon Nova Multimodal Embeddings gives you four output dimension options: 3,072 (default), 1,024, 384, or 256 [1]. Each embedding is a float32 array. Individual dimensions encode acoustic and semantic features—rhythm, pitch, timbre, emotional tone, and semantic meaning—all learned through the model’s neural network architecture during training. Amazon Nova uses Matryoshka Representation Learning (MRL), a technique that structures embeddings hierarchically [1]. Think of MRL like Russian nesting dolls. A 3,072-dimension embedding contains all the information, but you can extract just the first 256 dimensions and still get accurate results. Generate embeddings once, then choose the size that balances accuracy with storage costs. There is no need to reprocess your audio when trying different dimensions—the hierarchical structure lets you truncate to your preferred size.&lt;/p&gt; 
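The truncation idea can be sketched in a few lines of NumPy. A random vector stands in for a real Nova embedding here, and renormalizing after truncation (so cosine comparisons stay well scaled) is an assumption of this sketch rather than something the API does for you.

```python
import numpy as np

# A random vector stands in for a real 3,072-dimension Nova embedding.
full = np.random.default_rng(0).standard_normal(3072).astype(np.float32)

def truncate(embedding, dim):
    """Keep the first `dim` dimensions and renormalize (sketch assumption)."""
    prefix = embedding[:dim]
    return prefix / np.linalg.norm(prefix)

small = truncate(full, 256)
print(small.shape)  # (256,)
```

The same stored 3,072-dimension vector can thus serve 1,024-, 384-, or 256-dimension indexes without regenerating embeddings.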
&lt;p&gt;&lt;strong&gt;How you measure similarity:&lt;/strong&gt; When you want to find similar audio, you compute cosine similarity between two embeddings v₁ and v₂ [1]:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;similarity = (v₁ · v₂) / (||v₁|| × ||v₂||)&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;Cosine similarity measures the angle between vectors, giving you values from -1 to 1. Values closer to 1 indicate higher semantic similarity. When you store embeddings in a vector database, it uses distance metrics (distance = 1 – similarity) to perform k-nearest neighbor (k-NN) searches, retrieving the top-k most similar embeddings for your query.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Real-world example:&lt;/strong&gt; Suppose you have two audio clips—”a violin playing a melody” and “a cello playing a similar melody”—that generate embeddings v₁ and v₂. If their cosine similarity is 0.87, they cluster near each other in vector space, indicating strong acoustic and semantic relatedness. A different audio clip like “rock music with drums” generates v₃ with cosine similarity 0.23 to v₁, placing it far away in the embedding space.&lt;/p&gt; 
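The similarity formula above can be computed directly with NumPy. The three-dimensional vectors below are toy values for illustration only; real Nova embeddings have 256 to 3,072 dimensions.

```python
import numpy as np

def cosine_similarity(v1, v2):
    # similarity = (v1 . v2) / (||v1|| * ||v2||)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

violin = np.array([0.9, 0.3, 0.1])   # toy embedding for "violin melody"
cello  = np.array([0.8, 0.4, 0.2])   # toy embedding for "cello melody"
drums  = np.array([-0.2, 0.9, 0.4])  # toy embedding for "rock with drums"

print(round(cosine_similarity(violin, cello), 2))  # high: acoustically related
print(round(cosine_similarity(violin, drums), 2))  # low: unrelated
```

A vector database performs the same computation, just over millions of stored vectors with an index that avoids comparing against every one.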
&lt;h3&gt;Audio Processing Architecture and Modalities&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Understanding the end-to-end workflow:&lt;/strong&gt; Before diving into technical details, let’s look at how audio embeddings work in practice. There are two main workflows:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127391" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ML-20119-image1.png" alt="" width="1261" height="681"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 1 – End-to-end audio embedding workflow&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Data ingestion and indexing flow:&lt;/strong&gt; During the ingestion phase, you process your audio library in bulk. You upload audio files to Amazon S3, then use the asynchronous API to generate embeddings. For long audio files (over 30 seconds), the model automatically segments them into smaller chunks with temporal metadata. You store these embeddings in a vector database along with metadata like filename, duration, and genre. This happens once for your entire audio library.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Runtime search flow:&lt;/strong&gt; When a user searches, you use the synchronous API to generate an embedding for their query—whether it’s text like “upbeat jazz piano” or another audio clip. Because queries are short, and users expect fast results, the synchronous API provides low-latency responses. The vector database performs a k-NN search to find the most similar audio embeddings, returning results with their associated metadata. This entire search happens in milliseconds.&lt;/p&gt; 
&lt;p&gt;When you submit audio-only inputs, temporal convolutional networks or transformer-based architectures analyze your acoustic signals for spectro-temporal patterns. Rather than working with raw waveforms, Amazon Nova operates on audio representations like mel-spectrograms or learned audio features, which allows efficient processing of high-sample-rate audio [1]. Audio is sequential data that requires temporal context. Your audio segments (up to 30 seconds) pass through architectures with temporal receptive fields that capture acoustic patterns across time [1]. This approach captures rhythm, cadence, prosody, and long-range acoustic dependencies spanning multiple seconds—preserving the full richness of your audio content.&lt;/p&gt; 
&lt;h3&gt;API Operations and Request Structures&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;When to use synchronous embedding generation:&lt;/strong&gt; Use the &lt;code&gt;invoke_model&lt;/code&gt; API for runtime search when you need embeddings for real-time applications where latency matters [1]. For example, when a user submits a search query, the query text is short, and you want to provide a fast user experience—the synchronous API is ideal for this:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import boto3
import json
 
# Create the Bedrock Runtime client.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
 
# Define the request body for a search query.
request_body = {
    "taskType": "SINGLE_EMBEDDING",  # Use for single items
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_RETRIEVAL",  # Use GENERIC_RETRIEVAL for queries
        "embeddingDimension": 1024,  # Choose dimension size
        "text": {
            "truncationMode": "END",  # How to handle long inputs
            "value": "jazz piano music"  # Your search query
        }
    }
}
 
# Invoke the Nova Embeddings model.
response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId="amazon.nova-2-multimodal-embeddings-v1:0",
    contentType="application/json"
)
 
# Extract the embedding from response.
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]  # float32 array
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Understanding request parameters:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;taskType&lt;/strong&gt;: Choose &lt;code&gt;SINGLE_EMBEDDING&lt;/code&gt; for single items or &lt;code&gt;SEGMENTED_EMBEDDING&lt;/code&gt; for chunked processing [1, 2]&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;embeddingPurpose&lt;/strong&gt;: Optimizes embeddings for your use case—&lt;code&gt;GENERIC_INDEX&lt;/code&gt; for indexing your content, &lt;code&gt;GENERIC_RETRIEVAL&lt;/code&gt; for queries, &lt;code&gt;DOCUMENT_RETRIEVAL&lt;/code&gt; for document search [1]&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;embeddingDimension&lt;/strong&gt;: Your output dimension choice (3072, 1024, 384, 256) [1]&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;truncationMode&lt;/strong&gt;: How to handle inputs exceeding context length—&lt;code&gt;END&lt;/code&gt; truncates at the end, START at beginning [1]&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;What you get back:&lt;/strong&gt; The API returns a JSON object containing your embedding:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-json"&gt;{
  "embeddings": [
    {
      "embedding": [0.123, -0.456, 0.789, ...],  // float32 array
      "embeddingLength": 1024
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;When to use asynchronous processing:&lt;/strong&gt; Amazon Nova Multimodal Embeddings supports two approaches for processing large volumes of content: the asynchronous API and the batch API. Understanding when to use each helps you optimize your workflow.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Asynchronous API:&lt;/strong&gt; Use the &lt;code&gt;start_async_invoke&lt;/code&gt; API when you need to process large individual audio or video files that exceed the synchronous API limits [1]. This is ideal for:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Processing single large files (multi-hour recordings, full-length videos)&lt;/li&gt; 
 &lt;li&gt;Files requiring segmentation (over 30 seconds)&lt;/li&gt; 
 &lt;li&gt;When you need results within hours but not immediately&lt;/li&gt; 
&lt;/ul&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# model_input is a SEGMENTED_EMBEDDING request body for your audio file
# (see the segmentationConfig example later in this post for the parameters).
response = bedrock_runtime.start_async_invoke(
    modelId="amazon.nova-2-multimodal-embeddings-v1:0",
    modelInput=model_input,
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket/output/"}
    }
)
invocation_arn = response["invocationArn"]
# Poll job status
job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
status = job["status"]  # "InProgress" | "Completed" | "Failed"
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;When your job completes, it writes output to Amazon S3 in JSONL format (one JSON object per line). For AUDIO_VIDEO_COMBINED mode, you’ll find the output in &lt;code&gt;embedding-audio-video.jsonl&lt;/code&gt; [1].&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Batch API:&lt;/strong&gt; Use the batch inference API when you need to process thousands of audio files in a single job [3].&lt;/p&gt; 
&lt;p&gt;This is ideal for:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Bulk processing of your entire audio library (thousands to millions of files)&lt;/li&gt; 
 &lt;li&gt;Cost optimization through batch pricing&lt;/li&gt; 
 &lt;li&gt;Non-time-sensitive indexing operations where you can wait 24-48 hours&lt;/li&gt; 
 &lt;li&gt;Processing many small-to-medium sized files efficiently&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The batch API offers better cost efficiency for large-scale operations and handles job management automatically. You submit a manifest file with all your input files, and the service processes them in parallel, writing results to S3.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Choosing between async and batch:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Single large file or real-time segmentation needs?&lt;/strong&gt; → Use async API&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Thousands of files to process in bulk?&lt;/strong&gt; → Use batch API&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Need results within hours?&lt;/strong&gt; → Use async API&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Can wait 24-48 hours for cost savings?&lt;/strong&gt; → Use batch API&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Learn more about batch inference in the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference-supported.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock batch inference documentation&lt;/a&gt; [3].&lt;/p&gt; 
&lt;h3&gt;Segmentation and Temporal Metadata&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Why you need segmentation:&lt;/strong&gt; If your audio files exceed 30 seconds, you need to segment them [1]. Imagine you have a 2-hour podcast and want to find the specific 30-second segment where the host discusses AI—segmentation makes this possible.&lt;/p&gt; 
&lt;p&gt;You control chunking with the &lt;code&gt;segmentationConfig&lt;/code&gt; parameter:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;"segmentationConfig": {
    "durationSeconds": 15  # Generate one embedding every 15 seconds
}
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;This configuration processes a 5-minute audio file (300 seconds) into 20 segments (300 ÷ 15 = 20), generating 20 embeddings [1]. Each segment receives temporal metadata marking its position in your original file.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Understanding segmented output:&lt;/strong&gt; The asynchronous API writes your segmented embeddings to JSONL with temporal metadata [1]:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-json"&gt;{"startTime": 0.0, "endTime": 15.0, "embedding": [...]}
{"startTime": 15.0, "endTime": 30.0, "embedding": [...]}
{"startTime": 30.0, "endTime": 45.0, "embedding": [...]}&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;How to parse segmented output:&lt;/strong&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import json
from boto3 import client
s3 = client("s3", region_name="us-east-1")
# Read JSONL file from S3
response = s3.get_object(Bucket="bucket", Key="output/embedding-audio-video.jsonl")
content = response['Body'].read().decode('utf-8')
segments = []
for line in content.strip().split('\n'):
    if line:
        segment = json.loads(line)
        segments.append({
            'start': segment['startTime'],
            'end': segment['endTime'],
            'embedding': segment['embedding'],
            'duration': segment['endTime'] - segment['startTime']
        })
print(f"Processed {len(segments)} segments")
print(f"First segment: {segments[0]['start']:.1f}s - {segments[0]['end']:.1f}s")
print(f"Embedding dimension: {len(segments[0]['embedding'])}")
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Real-world use case—temporal search:&lt;/strong&gt; You can store segmented embeddings with their temporal metadata in a vector database. When someone searches for “customer complaint about billing,” you retrieve the specific 15-second segments with timestamps, giving you precise navigation to relevant moments within multi-hour call recordings. There is no need to listen to the entire recording.&lt;/p&gt; 
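The temporal search idea can be sketched with toy data: given pre-computed segment embeddings with timestamps, find the segment most similar to a query embedding. The two-dimensional vectors are illustrative stand-ins for real Nova embeddings.

```python
import numpy as np

# Toy temporal search over pre-computed segment embeddings (illustrative data;
# real segment embeddings come from the asynchronous API output above).
segments = [
    {"start": 0.0,  "end": 15.0, "embedding": np.array([0.1, 0.9])},
    {"start": 15.0, "end": 30.0, "embedding": np.array([0.9, 0.1])},
    {"start": 30.0, "end": 45.0, "embedding": np.array([0.5, 0.5])},
]
query = np.array([0.95, 0.05])  # stand-in for the embedded text query

def best_segment(query, segments):
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(segments, key=lambda s: cos(query, s["embedding"]))

hit = best_segment(query, segments)
print(f"Jump to {hit['start']:.0f}s-{hit['end']:.0f}s")  # the 15s-30s segment
```

In production, a vector database performs this maximum-similarity step, and the temporal metadata stored alongside each vector gives you the playback offset.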
&lt;h3&gt;Vector Storage and Indexing Strategies&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Referring to the architecture:&lt;/strong&gt; The workflow diagram in Figure 1 showed the end-to-end flow. Now we’re diving deeper into the &lt;strong&gt;Vector Database&lt;/strong&gt; component—the storage layer where your embeddings live during both the ingestion phase and the runtime search phase. This is the critical component that connects your indexed audio embeddings to fast search queries.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Understanding your storage requirements:&lt;/strong&gt; Embeddings are float32 arrays requiring 4 bytes per dimension. Here’s what you’ll need:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;3,072 dimensions&lt;/strong&gt;: 12,288 bytes (12 KB) per embedding&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;1,024 dimensions&lt;/strong&gt;: 4,096 bytes (4 KB) per embedding&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;384 dimensions&lt;/strong&gt;: 1,536 bytes (1.5 KB) per embedding&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;256 dimensions&lt;/strong&gt;: 1,024 bytes (1 KB) per embedding&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Example calculation:&lt;/strong&gt; For 1 million audio clips with 1,024-dimensional embeddings, you need 4 GB of vector storage (excluding metadata and index structures).&lt;/p&gt; 
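The sizing arithmetic above is easy to reproduce with a small helper (a sketch of the 4-bytes-per-float32-dimension rule, excluding metadata and index overhead):

```python
# Storage sizing for float32 embeddings: 4 bytes per dimension.
def storage_bytes(dimension, count=1):
    return dimension * 4 * count

for dim in (3072, 1024, 384, 256):
    print(f"{dim} dims: {storage_bytes(dim)} bytes per embedding")

# 1 million clips at 1,024 dimensions:
print(f"{storage_bytes(1024, 1_000_000) / 1e9:.3f} GB")  # 4.096 GB
```

Run this with your own clip counts and candidate dimensions to compare storage footprints before committing to an index configuration.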
&lt;p&gt;&lt;strong&gt;Choosing your dimension size:&lt;/strong&gt; Larger dimensions give you more detailed representations but require more storage and computation. Smaller dimensions offer a practical balance between retrieval performance and resource efficiency. Start with 1,024 dimensions—it provides excellent accuracy for most applications while keeping costs manageable.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Using Amazon S3 Vectors:&lt;/strong&gt; You can store and query your embeddings using Amazon S3 Vectors [2]:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

s3vectors = boto3.client("s3vectors", region_name="us-east-1")
# Create vector index
s3vectors.create_index(
    vectorBucketName="audio-vectors",
    indexName="audio-embeddings",
    dimension=1024,
    dataType="float32",
    distanceMetric="cosine"
)
# Store embedding with metadata
s3vectors.put_vectors(
    vectorBucketName="audio-vectors",
    indexName="audio-embeddings",
    vectors=[{
        "key": "audio:track_12345",
        "data": {"float32": embedding},
        "metadata": {
            "filename": "track_12345.mp3",
            "duration": 180.5,
            "genre": "jazz",
            "upload_date": "2025-10-28"
        }
    }]
)
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;How metadata enhances your search:&lt;/strong&gt; Metadata attributes work alongside embeddings to provide richer search results. When you retrieve results from the vector database, the metadata helps you filter, sort, and display information to users. For example, the genre field lets you filter results to only jazz recordings, duration helps you find tracks within a specific length range, and filename provides the path to the actual audio file for playback. The &lt;code&gt;upload_date&lt;/code&gt; can help you prioritize recent content or track data freshness. This combination of semantic similarity (from embeddings) and structured metadata creates a powerful search experience.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Querying your vectors:&lt;/strong&gt; k-NN search retrieves the top-k most similar vectors [2]:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;response = s3vectors.query_vectors(
    vectorBucketName="audio-vectors",
    indexName="audio-embeddings",
    queryVector={"float32": query_embedding},
    topK=10,  # Return 10 most similar results
    returnDistance=True,
    returnMetadata=True
)
for result in response["vectors"]:
    print(f"Key: {result['key']}")
    print(f"Distance: {result['distance']:.4f}")  # Lower = more similar
    print(f"Metadata: {result['metadata']}")
 
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Using Amazon OpenSearch Service:&lt;/strong&gt; OpenSearch provides native k-NN search with HNSW (Hierarchical Navigable Small World) indexes for sub-linear query time complexity [1]. This means your searches stay fast even as your audio library grows to millions of files.&lt;/p&gt; 
&lt;p&gt;Index configuration:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-json"&gt;{
  "mappings": {
    "properties": {
      "audio_embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib",
          "parameters": {"ef_construction": 512, "m": 16}
        }
      },
      "metadata": {"type": "object"}
    }
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;h3&gt;Batch Optimization and Production Patterns&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Why batch processing matters:&lt;/strong&gt; When you process multiple audio files, batching reduces network latency overhead and improves throughput [1]. In the pattern below, embeddings are generated per item, but the vector writes are combined into a single batch call instead of one write per file.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Example batch pattern:&lt;/strong&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;texts = ["jazz music", "rock music", "classical music"]
vectors = []
for text in texts:
    response = bedrock_runtime.invoke_model(
        body=json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingDimension": 1024,
                "text": {"truncationMode": "END", "value": text}
            }
        }),
        modelId="amazon.nova-2-multimodal-embeddings-v1:0",
        contentType="application/json"
    )
    embedding = json.loads(response["body"].read())["embeddings"][0]["embedding"]
    vectors.append(embedding)
# Batch write to vector store
s3vectors.put_vectors(
    vectorBucketName="audio-vectors",
    indexName="audio-embeddings",
    vectors=[
        {"key": f"text:{text}", "data": {"float32": emb}}
        for text, emb in zip(texts, vectors)
    ]
)
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Multilingual support:&lt;/strong&gt; The model supports text inputs in 200+ languages [1]. This enables powerful cross-lingual search scenarios: your customers can search in Spanish for audio content indexed in English, or vice versa. The embeddings capture semantic meaning across languages.&lt;/p&gt; 
&lt;h2&gt;Amazon Nova Audio Multimodal Embeddings Deep Dive&lt;/h2&gt; 
&lt;h3&gt;Technical Specifications&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Model architecture:&lt;/strong&gt; Amazon Nova Multimodal Embeddings is built on a foundation model trained to understand relationships across different modalities—text, images, documents, video, and audio—within a unified embedding space.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Flexible embedding dimensions:&lt;/strong&gt; You get four output dimension options: 3,072, 1,024, 384, and 256. Larger dimensions provide more detailed representations but require more storage and computation. Smaller dimensions offer a practical balance between retrieval performance and resource efficiency. This flexibility helps you optimize for your specific application and cost requirements.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Media processing capabilities:&lt;/strong&gt; For video and audio inputs, the model supports segments of up to 30 seconds, and automatically segments longer files [1]. This segmentation capability is particularly useful when you work with large media files—the model splits them into manageable pieces and creates embeddings for each segment. The output includes embeddings for your video and audio files with temporal metadata.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;API flexibility:&lt;/strong&gt; You can access the model through both synchronous and asynchronous APIs. Use synchronous APIs for querying where latency matters. Use asynchronous APIs for data ingestion and indexing where you can tolerate longer processing times. The asynchronous API supports batch segmentation/chunking for text, audio, and video files. Segmentation refers to splitting a long file into smaller chunks, each of which creates a unique embedding, allowing for fine-grained and more accurate retrieval.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Input methods:&lt;/strong&gt; You can pass content to embed by specifying an S3 URI or inline as a base64 encoding. This gives you flexibility in how you integrate embeddings into your workflow.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;How the workflow works:&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;You use Amazon Nova Multimodal Embeddings to generate embeddings for your video or audio clips&lt;/li&gt; 
 &lt;li&gt;You store the embeddings in a vector database&lt;/li&gt; 
 &lt;li&gt;When your end-user searches for content, you use Amazon Nova to generate an embedding for their search query&lt;/li&gt; 
 &lt;li&gt;Your application compares how similar the search query embedding is to your indexed content embeddings&lt;/li&gt; 
 &lt;li&gt;Your application retrieves the content that best matches the search query based on a similarity metric (such as cosine similarity)&lt;/li&gt; 
 &lt;li&gt;You show the corresponding content to your end-user&lt;/li&gt; 
&lt;/ol&gt; 
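The six steps above can be sketched as a tiny in-memory index. This is a local stand-in only: in the sketch, hand-written vectors play the role of Nova embeddings, and a Python list plays the role of the vector database.

```python
import numpy as np

# Minimal in-memory stand-in for the workflow above. The hand-written vectors
# stand in for real Nova embeddings, and the list plays the vector database.
class TinyAudioIndex:
    def __init__(self):
        self.items = []  # (key, unit-normalized embedding)

    def add(self, key, embedding):
        # Step 1-2: embed the clip, store it in the "database".
        v = np.asarray(embedding, dtype=np.float32)
        self.items.append((key, v / np.linalg.norm(v)))

    def search(self, query_embedding, top_k=3):
        # Steps 3-5: embed the query, compare by cosine similarity, rank.
        q = np.asarray(query_embedding, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scored = [(key, float(np.dot(q, v))) for key, v in self.items]
        return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_k]

index = TinyAudioIndex()
index.add("jazz_clip", [0.9, 0.1, 0.0])
index.add("rock_clip", [0.1, 0.9, 0.1])
print(index.search([0.8, 0.2, 0.0], top_k=1)[0][0])  # jazz_clip
```

Swapping the list for Amazon S3 Vectors or OpenSearch and the hand-written vectors for `invoke_model` output turns this sketch into the production flow described above.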
&lt;p&gt;&lt;strong&gt;Supported inputs:&lt;/strong&gt; Your inputs to generate embeddings can be in text, image, document image, video, or audio form. The inputs refer to both the items you use to create the index and the end-user search queries. The model outputs embeddings which you use to retrieve the assets that best match the query to display to your end-user.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Audio format support:&lt;/strong&gt; Amazon Nova Multimodal Embeddings currently supports MP3, WAV, and OGG as input formats. These formats cover most common audio use cases from music to speech recordings.&lt;/p&gt; 
&lt;h3&gt;Key Capabilities&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Audio-to-Audio search:&lt;/strong&gt; Find acoustically similar content in your library. For example, find all recordings with similar musical characteristics or speaking styles.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Text-to-Audio search:&lt;/strong&gt; Use natural language queries to retrieve relevant audio segments. Search for “upbeat jazz piano” or “customer expressing frustration” and get back matching audio clips.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Cross-modal retrieval:&lt;/strong&gt; Search across images, audio, video, and text simultaneously. This unified approach means you can use one query to search your entire content library regardless of format.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Temporal understanding:&lt;/strong&gt; The model recognizes actions and events within audio over time. This lets you search for specific moments within long recordings.&lt;/p&gt; 
&lt;h3&gt;When to Choose Amazon Nova&lt;/h3&gt; 
&lt;p&gt;Amazon Nova Multimodal Embeddings is designed for production applications requiring scalable performance, rapid deployment, and minimal operational overhead.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Why choose Amazon Nova:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Speed to market&lt;/strong&gt;: Deploy in hours or days, not months&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Managed service&lt;/strong&gt;: No infrastructure to maintain or models to train&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cross-modal capabilities&lt;/strong&gt;: One model for all your content types with enterprise level deployment support&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Continuous improvements&lt;/strong&gt;: Benefit from model updates without migration work&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Decision factors to consider:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Scale requirements&lt;/strong&gt;: How many audio files and queries do you need to handle?&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Time-to-market&lt;/strong&gt;: How quickly do you need a working solution?&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Expertise availability&lt;/strong&gt;: Do you have an engineering team to maintain custom models?&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Integration needs&lt;/strong&gt;: Do you need seamless AWS service integration?&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Core application domains:&lt;/strong&gt; Amazon Nova Multimodal Embeddings serves a wide range of applications optimized for multimodal RAG, semantic search, and clustering:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Agentic Retrieval-Augmented Generation (RAG):&lt;/strong&gt; You can use Amazon Nova Multimodal Embeddings for RAG-based applications where the model serves as the embedding for the retrieval task. Your input can be text from documents, images, or document images that interleave text with infographics, video, and audio. The embedding lets you retrieve the most relevant information from your knowledge base that you can provide to an LLM system for improved responses.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Semantic Search:&lt;/strong&gt; You can generate embeddings from text, images, document images, video, and audio to power search applications stored in a vector index. A vector index is a specialized embedding space that reduces the number of comparisons needed to return effective results. Because the model captures the nuance of your user’s query within the embedding, it supports advanced search queries that don’t rely on keyword matching. Your users can search for concepts, not just exact words.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Clustering:&lt;/strong&gt; You can use Amazon Nova Multimodal Embeddings to generate embeddings from text, images, document images, video, and audio. Clustering algorithms can group together items that are close to each other based on distance or similarity. For example, if you work in media management and want to categorize your media assets across similar themes, you can use the embeddings to cluster similar assets together without needing metadata for each asset. The model understands content similarity automatically.&lt;/li&gt; 
&lt;/ul&gt; 
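Semantic search and clustering both reduce to nearest-neighbor comparisons in embedding space. As a rough, self-contained illustration (toy vectors stand in for real Amazon Nova Multimodal Embeddings output, and a Python dict stands in for a vector index):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def knn_search(query_vec, index, k=2):
    # Rank indexed items by similarity to the query embedding.
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

# Toy "embeddings": in practice these come from the embedding model.
index = {
    "angry_call.wav":   [0.9, 0.1, 0.0],
    "billing_call.wav": [0.1, 0.9, 0.1],
    "chitchat.wav":     [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # embedding of the text query
top = knn_search(query, index, k=1)
```

A production system would store the vectors in a purpose-built vector database and let its approximate k-NN index do this ranking at scale.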
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we explored how Amazon Nova Multimodal Embeddings enables semantic audio understanding beyond traditional text-based approaches. By representing audio as high-dimensional vectors that capture both acoustic and semantic properties, you can build search systems that understand tone, emotion, and context, not just spoken words. We covered the end-to-end workflow for building an audio search system, including:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Generating embeddings using synchronous and asynchronous APIs&lt;/li&gt; 
 &lt;li&gt;Segmenting long audio files with temporal metadata&lt;/li&gt; 
 &lt;li&gt;Storing embeddings in a vector database&lt;/li&gt; 
 &lt;li&gt;Performing k-NN search to retrieve relevant audio segments&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;This approach allows you to transform large audio libraries into searchable, intelligent datasets that support use cases such as call center analysis, media search, and content discovery.&lt;/p&gt; 
&lt;p&gt;In our implementation, we took a real-world scenario: embedding call center recordings with the Amazon Nova Multimodal Embeddings model to make them searchable by both sentiment and content. Instead of manually tagging calls, we used text queries such as “Find a call where the speaker sounds angry” or “Show me a conversation about billing issues,” and the system pulled out the right audio clips on demand. In other words, we turned audio archives into an experience that is searchable by both tone and topic, without the manual effort. To dive deeper, see the code samples and snippets linked in the References section.&lt;/p&gt; 
&lt;h2&gt;References&lt;/h2&gt; 
&lt;p&gt;[1] &lt;a href="https://aws.amazon.com/blogs/aws/amazon-nova-multimodal-embeddings-now-available-in-amazon-bedrock/" target="_blank" rel="noopener noreferrer"&gt;Blog on Amazon Nova Multimodal Embeddings&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;[2] &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference-supported.html" target="_blank" rel="noopener noreferrer"&gt;Supported Regions and models for batch inference&lt;/a&gt;&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127170 alignnone" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/26/ML-19828-image-5.png" alt="" width="150" height="147"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Madhavi Evana&lt;/h3&gt; 
  &lt;p&gt;Madhavi Evana is a Solutions Architect at Amazon Web Services, where she guides enterprise banking customers through their cloud transformation journeys. She specializes in artificial intelligence and machine learning, with a focus on speech-to-speech translation, video analysis and synthesis, and natural language processing (NLP) technologies.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;h3 class="lb-h4"&gt;&lt;img loading="lazy" class="size-full wp-image-127406 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/31/dan-kolodny.jpeg" alt="" width="100" height="133"&gt;&lt;/h3&gt; 
  &lt;h3 class="lb-h4"&gt;Dan Kolodny&lt;/h3&gt; 
  &lt;p&gt;Dan Kolodny is an AWS Solutions Architect specializing in big data, analytics, and GenAI. He is passionate about helping customers adopt best practices, discover insights from their data, and embrace new GenAI technologies.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="wp-image-127410 alignnone" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/31/fahisajj-1-100x133.jpg" alt="" width="108" height="144"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Fahim Sajjad&lt;/h3&gt; 
  &lt;p&gt;Fahim is a Solutions Architect at Amazon Web Services (AWS), working with enterprise AWS customers to provide technical guidance and help them achieve their business goals. He specializes in AI/ML technology, data strategy, and advertising and marketing.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Reinforcement fine-tuning on Amazon Bedrock: Best practices</title>
		<link>https://aws.amazon.com/blogs/machine-learning/reinforcement-fine-tuning-on-amazon-bedrock-best-practices/</link>
					
		
		<dc:creator><![CDATA[Nick McCarthy]]></dc:creator>
		<pubDate>Wed, 08 Apr 2026 19:43:28 +0000</pubDate>
				<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<guid isPermaLink="false">838d0a2843a9c43701783ddd1c41c77523df9a8b</guid>

					<description>In this post, we explore where RFT is most effective, using the GSM8K mathematical reasoning dataset as a concrete example. We then walk through best practices for dataset preparation and reward function design, show how to monitor training progress using Amazon Bedrock metrics, and conclude with practical hyperparameter tuning guidelines informed by experiments across multiple models and use cases.</description>
					<content:encoded>&lt;p&gt;You can use Reinforcement Fine-Tuning (RFT) in &lt;a href="https://aws.amazon.com/bedrock/?trk=7ecf60df-6136-414c-a7c3-6aa4d2d6019f&amp;amp;sc_channel=ps&amp;amp;ef_id=CjwKCAiAnoXNBhAZEiwAnItcG_quu7odGWcZPLfH1XE3QJu1ybzUZZ6RDd9R5rmqzjyIE5KnOvhfKxoCTtwQAvD_BwE:G:s&amp;amp;s_kwcid=AL!4422!3!795877020842!e!!g!!amazon%20bedrock!23532472972!194311072004&amp;amp;gad_campaignid=23532472972&amp;amp;gbraid=0AAAAADjHtp8BzKFnYuFMrdXAUbbzIgUDa&amp;amp;gclid=CjwKCAiAnoXNBhAZEiwAnItcG_quu7odGWcZPLfH1XE3QJu1ybzUZZ6RDd9R5rmqzjyIE5KnOvhfKxoCTtwQAvD_BwE" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; to customize Amazon Nova and supported open source models by defining what “good” looks like, with no large labeled datasets required. By learning from reward signals rather than static examples, RFT delivers up to 66% accuracy gains over base models at reduced customization cost and complexity. This post covers best practices for RFT on Amazon Bedrock, from dataset design and reward function strategy to hyperparameter tuning, for use cases like code generation, structured extraction, and content moderation.&lt;/p&gt; 
&lt;p&gt;In this post, we explore where RFT is most effective, using the &lt;a href="https://huggingface.co/datasets/openai/gsm8k" target="_blank" rel="noopener noreferrer"&gt;GSM8K&lt;/a&gt; mathematical reasoning dataset as a concrete example. We then walk through best practices for dataset preparation and reward function design, show how to monitor training progress using Amazon Bedrock metrics, and conclude with practical hyperparameter tuning guidelines informed by experiments across multiple models and use cases.&lt;/p&gt; 
&lt;h2&gt;RFT use cases: Where does RFT shine?&lt;/h2&gt; 
&lt;p&gt;Reinforcement Fine-Tuning (RFT) is a model customization technique that improves foundation model (FM) behavior using reward signals. Unlike supervised fine-tuning (SFT), it doesn’t train directly on correct responses (labeled input/output pairs). Instead, RFT uses a dataset of inputs and a reward function. The reward function can be rule-based, a trained grader model, or a large language model (LLM) acting as a judge. During training, the model generates candidate responses and the reward function scores each one. Based on the reward, the model weights are updated to increase the probability of generating responses that receive a high reward. This iterative cycle of sampling responses, scoring them, and updating weights steers the model toward behaviors that lead to better outcomes. RFT is particularly valuable when the desired behavior can be evaluated but is difficult to demonstrate, whether because labeled data is impractical to curate or because static examples alone can’t capture the reasoning a task demands. It excels in two primary areas:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Tasks where a rule or test can verify correctness automatically&lt;/li&gt; 
 &lt;li&gt;Subjective tasks where another model can effectively evaluate response quality&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Tasks in the first category include code generation that must pass tests, math reasoning with verifiable answers, structured data extraction that must match strict schemas, and API/tool calls that must parse and execute correctly. Because success criteria can be translated directly into reward signals, the model can discover stronger strategies than what a small set of labeled examples could teach. This pattern is known as &lt;a href="https://www.emergentmind.com/topics/rl-with-verifiable-rewards-rlvr" target="_blank" rel="noopener noreferrer"&gt;Reinforcement Learning with Verifiable Rewards (RLVR)&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;In addition, RFT suits subjective tasks such as content moderation, chatbots, creative writing, or summarization that lack easily quantifiable correctness. A judge model, guided by a detailed evaluation rubric, can serve as the reward function. It scores outputs against criteria that would be impractical to encode as static training pairs. This approach is known as &lt;a href="https://aws.amazon.com/blogs/machine-learning/fine-tune-large-language-models-with-reinforcement-learning-from-human-or-ai-feedback/" target="_blank" rel="noopener noreferrer"&gt;Reinforcement Learning with AI Feedback (RLAIF)&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;For RFT in Amazon Bedrock, you can implement both rule-based and model-based approaches as a &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/reward-functions-open-weight.html" target="_blank" rel="noopener noreferrer"&gt;custom AWS Lambda function&lt;/a&gt;, which is the reward function that Amazon Bedrock calls during the training loop.&lt;/p&gt; 
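As a loose sketch of what such a Lambda reward function might look like, the handler below scores a completion against a ground-truth answer. The field names used here (`completion`, `ground_truth`, `reward`) are illustrative assumptions; the actual event and response contract is defined in the linked Amazon Bedrock documentation.

```python
import re

def lambda_handler(event, context):
    # Illustrative rule-based reward. The "completion" and "ground_truth"
    # keys are assumed field names for this sketch, not the official
    # Bedrock event schema.
    completion = event.get("completion", "")
    ground_truth = event.get("ground_truth", "")

    # Extract the first number from the completion, ignoring thousands
    # separators, and compare it to the reference answer.
    match = re.search(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    predicted = match.group(0) if match else None
    reward = 1.0 if predicted == ground_truth else 0.0
    return {"reward": reward}
```

A model-based (RLAIF) variant would call a judge model inside the handler instead of applying a rule, but would return a score in the same way.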
&lt;p&gt;A comparison of these two approaches is depicted in the following diagram:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-126417" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/16/ml-20097-image-1.jpg" alt="" width="1668" height="524"&gt;&lt;/p&gt; 
&lt;p&gt;The following are a few common use cases that can be tackled through RLVR, RLAIF, or a combination of both.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Use Case&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Reward Signal&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Code generation for production services&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Unit-test pass rates, linting, and runtime checks&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Tool and API orchestration&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Successful end-to-end task completion (like, booking flows, data retrieval pipelines)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Complex math and algorithmic reasoning&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Correct final answers and/or intermediate verification steps&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Structured data extraction and transformation&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Schema validation, exact matches, penalties for malformed outputs&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;SQL / query synthesis over databases&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Query results matching expected answers or satisfying runtime properties&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Agentic workflows&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Combination of RLVR and RLAIF; RLVR for tool calling correctness; RLAIF for final task completion, for example, measured as usefulness, correctness, or robustness&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h2&gt;GSM8K: Using RFT to improve solutions to mathematical calculations&lt;/h2&gt; 
&lt;p&gt;To illustrate how reinforcement fine-tuning works in practice, we can examine a concrete example: improving a model’s ability to solve mathematical reasoning problems. RFT is useful for mathematical problems because solutions can often be objectively verified, making it possible to design clear reward signals that guide the model toward correct reasoning and structured outputs. Let’s look at an example&amp;nbsp;from the &lt;a href="https://huggingface.co/datasets/openai/gsm8k" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;GSM8K (Grade School Math 8K)&lt;/strong&gt;&lt;/a&gt; dataset:&lt;/p&gt; 
&lt;p&gt;Tina makes $18.00 an hour. If she works more than 8 hours per shift, she is eligible for overtime, which is paid by your hourly wage + 1/2 your hourly wage. If she works 10 hours every day for 5 days, how much money does she make?&lt;/p&gt; 
&lt;p&gt;Let’s look at what an ideal response might look like:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;&amp;lt;begin_internal_thought&amp;gt;
I need to find total pay for 5 days of 10-hour shifts. Because she works over 8 hours daily, I'll need to split each day into regular and overtime hours, calculate the overtime rate (1.5x regular), then multiply by 5 days.
&amp;lt;/end_internal_thought&amp;gt;

&amp;lt;begin_of_solution&amp;gt;
Overtime rate: $18.00 + (1/2 × $18.00) = $27.00/hour

Daily earnings (10 hours):
&amp;nbsp;&amp;nbsp;Regular (8 hours): &amp;nbsp;8 × $18 = $144
&amp;nbsp;&amp;nbsp;Overtime (2 hours): 2 × $27 = $54
&amp;nbsp;&amp;nbsp;Daily total: $198

Total for 5 days: 5 × $198 = $990

\boxed{990}
&amp;lt;/end_of_solution&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Here, we see that the problem is broken down into logical steps with clear reasoning paths, not only a final answer. Additionally, we would like the model to respond in this specific format and have the answer exactly match the ground truth solution. Other fine-tuning methods like SFT struggle with mathematical reasoning because they primarily learn to pattern-match training data rather than truly reason. These models can memorize solution templates but often fail when presented with novel variations of a problem.&lt;/p&gt; 
&lt;p&gt;Because RFT lets us define reward functions, exact answers like the &lt;code&gt;$990&lt;/code&gt; above can be objectively evaluated, while partial credit can be assigned for correct intermediate reasoning steps. This enables the model to discover valid solution approaches while learning to follow the required structure, and in many cases it achieves strong performance with relatively small datasets (around 100–1,000 examples).&lt;/p&gt; 
&lt;h2&gt;Best practices for preparing your dataset&lt;/h2&gt; 
&lt;p&gt;RFT requires carefully prepared datasets to achieve effective results. On Amazon Bedrock, RFT training data is provided as a JSONL file, with each record following the OpenAI chat completion format.&lt;/p&gt; 
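A single training record in that JSONL file might be assembled as shown below. The `reference_answer` key is an illustrative assumption for carrying the ground truth to the reward function; consult the linked data-preparation documentation for the exact required schema.

```python
import json

# One JSON object per line, using the OpenAI-style chat message format.
record = {
    "messages": [
        {"role": "system", "content": "Solve the problem. Put the final answer in \\boxed{}."},
        {"role": "user", "content": "Tina makes $18.00 an hour ..."},
    ],
    # Hypothetical field: a reference answer the reward function can check.
    "reference_answer": "990",
}

# Append the record as one line of the JSONL training file.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```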
&lt;h3&gt;Dataset size guidelines&lt;/h3&gt; 
&lt;p&gt;RFT supports dataset sizes between 100 and 10,000 training samples, though requirements vary depending on task complexity and reward function design. Tasks involving complex reasoning, specialized domains, or broad application scopes generally benefit from larger datasets and a sophisticated reward function. For initial experimentation, start with a small dataset (100–200 examples) to validate that your prompts and reward function produce meaningful learning signals and that the base model can achieve measurable reward improvements. Note that for certain domains, customizing only on small datasets can yield limited generalization and inconsistent results across prompt variations. Typical implementations using 200–5,000 examples provide stronger generalization and more consistent performance across prompt variations. For more complex reasoning tasks, specialized domains, or sophisticated reward functions, 5,000–10,000 examples can improve robustness across diverse inputs.&lt;/p&gt; 
&lt;p&gt;For more information about the dataset requirements, see the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/rft-prepare-data.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Dataset quality principles&lt;/h3&gt; 
&lt;p&gt;The quality of your training data fundamentally determines RFT outcomes. Consider the following principles when preparing your dataset:&lt;/p&gt; 
&lt;p style="padding-left: 40px"&gt;&lt;strong&gt;1. Prompt distribution&lt;/strong&gt;&lt;br&gt; Make sure that the dataset reflects the full range of prompts that the model will encounter in production. A skewed dataset can lead to poor generalization or unstable training behavior.&lt;/p&gt; 
&lt;p style="padding-left: 40px"&gt;&lt;strong&gt;2. Base model capability&lt;/strong&gt;&lt;br&gt; RFT assumes that the base model demonstrates basic task understanding. If the model can’t achieve a non-zero reward on your prompts, the learning signal will be too weak for effective training. A simple validation step is generating several responses from the base model (like,&amp;nbsp;&lt;code&gt;temperature ≈ 0.6&lt;/code&gt;) and confirming that the outputs produce meaningful reward signals.&lt;/p&gt; 
&lt;p style="padding-left: 40px"&gt;&lt;strong&gt;3. Clear prompt design&lt;/strong&gt;&lt;br&gt; Prompts should clearly communicate expectations and constraints. Ambiguous instructions lead to inconsistent reward signals and degraded learning. Prompt structure should also align with reward function parsing. For example, requiring final answers after a specific marker or enforcing code blocks for programming tasks, as well as the prompt structure that the base model is familiar with from pre-training.&lt;/p&gt; 
&lt;p style="padding-left: 40px"&gt;&lt;strong&gt;4. Reliable reference answers&lt;/strong&gt;&lt;br&gt; When possible, include a reference answer that represents the desired output pattern, formatting, and correctness criteria. Reference answers anchor reward computation and reduce noise in the learning signal. For example, mathematical tasks might include a correct numerical answer, while coding tasks might include unit tests or input-output pairs.&lt;/p&gt; 
&lt;p style="padding-left: 40px"&gt;It’s also good practice to validate reference answers by confirming that a response aligned with the ground truth receives the maximum reward score.&lt;/p&gt; 
&lt;p style="padding-left: 40px"&gt;&lt;strong&gt;5. Consistent reward signals within the data&lt;/strong&gt;&lt;/p&gt; 
&lt;p style="padding-left: 40px"&gt;Because RFT relies entirely on reward signals to guide learning, the quality of those signals is critical. Your dataset and reward function should work together to produce consistent, well-differentiated scores. This means that strong responses reliably score higher than weak ones across similar inputs. If the reward function can’t clearly distinguish between good and poor responses, or if similar outputs receive widely varying scores, the model might learn the wrong patterns or fail to improve altogether.&lt;/p&gt; 
&lt;p&gt;In the next section, you will learn what to keep in mind when writing your reward function.&lt;/p&gt; 
&lt;h3&gt;Preparing your reward function&lt;/h3&gt; 
&lt;p&gt;Reward functions are central to RFT because they evaluate and score model responses, assigning higher rewards to preferred outputs and lower rewards to less desirable ones. This feedback guides the model toward improved behavior during training. For objective tasks like mathematical reasoning, a candidate response that produces the correct answer might receive a reward of &lt;strong&gt;1&lt;/strong&gt;, while an incorrect answer receives &lt;strong&gt;0&lt;/strong&gt;. A response with a partially correct reasoning trace and an incorrect final answer might get a reward of &lt;strong&gt;0.8&lt;/strong&gt;&amp;nbsp;(depending on how much you want to penalize an incorrect final response). For subjective tasks, the reward function encodes desired qualities. For example, in summarization it might capture faithfulness, coverage, and clarity. For more information about setting up your reward function, see&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/reward-functions.html" target="_blank" rel="noopener noreferrer"&gt;setting up reward functions for Amazon Nova models&lt;/a&gt;.&lt;/p&gt; 
&lt;h4&gt;Reward design for verifiable tasks&lt;/h4&gt; 
&lt;p&gt;For tasks that can be deterministically verified, like math reasoning or coding, the simplest approach is to programmatically check correctness. Effective reward functions typically evaluate both format constraints and performance objectives. Format checks make sure that the responses can be reliably parsed and evaluated. Performance metrics determine whether the result is correct. Rewards can be implemented using binary signals (correct compared to incorrect) or continuous scoring depending on the task.&lt;/p&gt; 
&lt;p&gt;For GSM8K-style mathematical reasoning tasks, reward functions must also account for how models express numerical answers. Models can format numbers with commas, currency symbols, percentages, or embed answers within explanatory text. To address this, answers should be normalized by stripping formatting characters and applying flexible extraction that prioritizes structured formats before falling back to pattern matching. This approach makes sure that the models are rewarded for correct reasoning rather than penalized for stylistic formatting choices.&amp;nbsp;You can find the full reward function implementation for GSM8K in the &lt;a href="https://github.com/aws-samples/amazon-bedrock-samples/blob/main/custom-models/bedrock-reinforcement-fine-tuning/reward-functions/gsm8k_rew_func.py" target="_blank" rel="noopener noreferrer"&gt;amazon-bedrock-samples GitHub repository&lt;/a&gt;.&lt;/p&gt; 
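A simplified version of that normalization and flexible-extraction logic (a sketch, not the exact code from the linked repository) might look like:

```python
import re

def normalize(ans):
    # Strip formatting characters the model may emit around numbers.
    return ans.replace(",", "").replace("$", "").replace("%", "").strip()

def extract_answer(text):
    # Prefer the structured \boxed{...} format, then fall back to the
    # last number that appears anywhere in the response.
    boxed = re.search(r"\\boxed\{([^}]*)\}", text)
    if boxed:
        return normalize(boxed.group(1))
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return normalize(numbers[-1]) if numbers else None

def reward(response, ground_truth):
    # Binary reward: 1.0 for a correct (normalized) final answer.
    predicted = extract_answer(response)
    return 1.0 if predicted == normalize(ground_truth) else 0.0
```

This way the model earns the reward whether it writes `990`, `$990`, or `1,000`-style formatted numbers, as long as the underlying value matches.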
&lt;h4&gt;Reward design for non-verifiable tasks&lt;/h4&gt; 
&lt;p&gt;Tasks like summarization, creative writing, or semantic alignment require an LLM-based judge to approximate subjective preferences. In this setting, the judge prompt effectively acts as the reward function, defining what behaviors are rewarded and how responses are scored. A practical judge prompt should clearly define the evaluation goal and include a concise scoring rubric with numeric scales reflecting the qualities the model should improve for.&lt;/p&gt; 
&lt;p&gt;Judge prompts should also return structured outputs, for example JSON or tagged formats containing the final score and optional reasoning, so reward values can be reliably extracted during training while maintaining observability into how each response was evaluated. An example of a reward function that utilizes AI feedback can be seen in this &lt;a href="https://github.com/aws-samples/amazon-bedrock-samples/blob/main/custom-models/bedrock-reinforcement-fine-tuning/reward-functions/pandalm_rew_func.py" target="_blank" rel="noopener noreferrer"&gt;PandaLM reward function script in GitHub&lt;/a&gt;.&lt;/p&gt; 
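Because a judge model may wrap its JSON verdict in extra prose, the reward function needs a tolerant parser. A minimal sketch, assuming our own `{"score": ..., "reasoning": ...}` convention on a 5-point scale (not a fixed Bedrock format):

```python
import json
import re

def parse_judge_score(judge_output, scale=5.0):
    # Pull the first JSON object out of the judge's reply and normalize
    # its "score" field to the [0, 1] reward range. Any parsing failure
    # yields a zero reward rather than crashing the training loop.
    match = re.search(r"\{.*\}", judge_output, re.DOTALL)
    if not match:
        return 0.0
    try:
        payload = json.loads(match.group(0))
        return max(0.0, min(1.0, float(payload["score"]) / scale))
    except (json.JSONDecodeError, KeyError, ValueError):
        return 0.0
```

Keeping the optional `reasoning` field in the judge output costs nothing at training time but makes individual reward decisions auditable afterward.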
&lt;h4&gt;Combining verifiable rewards with AI feedback&lt;/h4&gt; 
&lt;p&gt;Reward functions for verifiable tasks can also be augmented with AI feedback to evaluate solution quality beyond numerical correctness. For example, an LLM-as-a-judge can assess the reasoning chain, verify intermediate calculations, or evaluate the clarity of explanations, providing a reward signal that captures both correctness and reasoning quality.&lt;/p&gt; 
&lt;h4&gt;Iterating on reward design&lt;/h4&gt; 
&lt;p&gt;Reward functions often require iteration. Early versions might produce noisy signals or during the training loop the model might learn to exploit the reward function to generate a high reward without learning the desired behavior. Refining the reward logic based on observed training behavior is essential. Before launching full training jobs, it’s also good practice to test reward functions independently using sample prompts and known outputs to ensure that the scoring logic produces stable and meaningful reward signals.&lt;/p&gt; 
&lt;h3&gt;Evaluating training progress: signals that the model is learning&lt;/h3&gt; 
&lt;p&gt;After your dataset and reward function are ready, you can launch RFT training using either the Amazon Bedrock API or the console. The exact workflow depends on your preferred development environment. The &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/rft-submit-job.html" target="_blank" rel="noopener noreferrer"&gt;Create and manage fine-tuning jobs for Amazon Nova models&lt;/a&gt; topic in the Amazon Bedrock User Guide provides step-by-step instructions for both approaches. After training begins, monitoring the training metrics is critical. These signals indicate whether the reward function is meaningful and whether the model is learning useful behaviors rather than overfitting or collapsing to trivial strategies. The following image shows the training metrics of one of our GSM8K training runs, which exhibits healthy training dynamics.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-126418" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/16/ml-20097-image-2.jpg" alt="" width="2160" height="859"&gt;&lt;/p&gt; 
&lt;p&gt;Training rewards plots the average reward score at each training step. Variance is expected because the input prompts in a batch are sampled randomly, so difficulty differs between batches; in addition, the model is exploring different strategies, which adds variance. What matters is the overall trend: rewards increase from roughly 0.5 to around 0.8–0.9, indicating that the model is converging on receiving higher rewards. Validation rewards provide a clearer signal because they are computed on a held-out dataset. Here we see a steep improvement during the first ~40 steps followed by a plateau around 0.88, suggesting the model is generalizing rather than memorizing training examples. Validation rewards that track closely with training rewards are typically a sign that overfitting isn’t occurring.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-126419" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/16/ml-20097-image-3.jpg" alt="" width="2158" height="696"&gt;&lt;/p&gt; 
&lt;p&gt;Training episode length measures the average response length. The drop from roughly 625 tokens to ~400 tokens suggests that the model is learning to reach correct answers more efficiently, producing less redundant reasoning as training progresses. Policy entropy measures how much the model is exploring different response strategies during training. Values in the 0.8–1.1 range indicate healthy exploration. If entropy collapsed toward zero it would suggest the model had prematurely converged, but sustained entropy implies the model is still exploring and improving.&lt;/p&gt; 
&lt;h2&gt;Hyperparameter tuning guidelines&lt;/h2&gt; 
&lt;p&gt;In this section, we cover practical hyperparameter tuning guidelines for Amazon Bedrock RFT. These recommendations are informed by a series of internal experiments that we ran across multiple models and use cases. This includes reasoning tasks like GSM8K and other structured and generative workloads. While effective values will vary by task, the patterns observed across these experiments provide useful starting points when configuring RFT jobs. For more information about the hyperparameters that you can configure before launching an RFT customization job, see the &lt;a href="https://docs.aws.amazon.com/boto3/latest/reference/services/bedrock/client/create_model_customization_job.html" target="_blank" rel="noopener noreferrer"&gt;official boto3 docs&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;EpochCount&lt;/h3&gt; 
&lt;p&gt;Training duration and &lt;code&gt;epochCount&lt;/code&gt; require adjustment based on dataset size and model behavior. Smaller datasets often show continued improvement through 6–12 epochs, while larger datasets may achieve optimal performance in 3–6 epochs. This relationship isn’t linear, and careful monitoring of validation metrics remains essential to prevent overfitting while ensuring sufficient model adaptation.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-126420" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/16/ml-20097-image-4.jpg" alt="" width="2560" height="1025"&gt;&lt;/p&gt; 
&lt;h3&gt;BatchSize&lt;/h3&gt; 
&lt;p&gt;This parameter controls how many prompts are processed before the updated model generates a new round of candidate responses (rollouts). For example, with a &lt;code&gt;batchSize&lt;/code&gt; of 128, the model processes, updates, and generates new rollouts for 128 prompts at a time until it has worked through the full dataset. The total number of rollout rounds equals the (filtered) dataset size divided by batchSize.&lt;br&gt; A &lt;code&gt;batchSize&lt;/code&gt; of 128 works well for most use cases and models. Increase it if loss is erratic or reward isn’t improving. Decrease it if iterations take too long.&lt;/p&gt; 
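The rollout-round arithmetic described above can be expressed directly:

```python
import math

def rollout_rounds(filtered_dataset_size, batch_size):
    # Rounds of rollouts per epoch: the model processes batch_size
    # prompts per round until the (filtered) dataset is exhausted.
    return math.ceil(filtered_dataset_size / batch_size)
```

For example, a filtered dataset of 1,000 prompts with a `batchSize` of 128 yields 8 rollout rounds per epoch.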
&lt;h3&gt;LearningRate&lt;/h3&gt; 
&lt;p&gt;In Amazon Bedrock RFT, we perform parameter-efficient RFT using Low Rank Adaptation (LoRA) adapters with a rank of 32. Across a range of use cases, a learning rate of 1e-4 has consistently produced strong results. In the following experiment, we swept learning rates across seven orders of magnitude on Qwen3-1.7B using the GSM8K dataset (1K training samples, 256 test samples), running a single epoch with batch size 64, group size 16, and LoRA rank 1. As shown in the following figure, LoRA’s optimal learning rate peaks around 1e-4 to 1e-3, approximately one order of magnitude higher than full fine-tuning (FFT). Even with a rank of 1, LoRA achieves within ~5.5% of FFT’s best validation reward at roughly the same wall-clock time. In practice, LoRA-based RFT tends to be more forgiving and performs well across a wider range of learning rates than FFT, though both approaches can collapse outside their optimal ranges. We recommend monitoring reward curves closely and lowering the learning rate if they begin to oscillate or collapse.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-126421" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/16/ml-20097-image-5.jpg" alt="" width="2560" height="1049"&gt;&lt;/p&gt; 
&lt;h3&gt;Prompt length and response length&lt;/h3&gt; 
&lt;p&gt;The&amp;nbsp;&lt;code&gt;maxPromptLength&lt;/code&gt; defines the maximum allowed length for input prompts in the dataset. Prompts exceeding this limit are filtered out during training. If your dataset contains unusually long prompts or other outliers, set a value that excludes the outliers while retaining most samples. Otherwise, you can set it to the length of the longest prompt in your dataset. On the other hand, &lt;code&gt;inferenceMaxTokens&lt;/code&gt; defines the maximum response length for any rollout or response generated during RL training. You can use this argument to control whether the resulting model generates detailed outputs or concise answers. We recommend choosing a value based on the requirements of your task: an excessively large value can increase training time, while too small a value can degrade model performance. For tasks that don’t require complex reasoning, setting the maximum response length to 1,024 is typically sufficient. In contrast, for challenging tasks like coding or long-form generation, a larger upper bound (more than 4,096) is preferable.&lt;/p&gt; 
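The prompt-length filtering described above can be illustrated with a small sketch; the word-count tokenizer here is a stand-in for the model's real tokenizer:

```python
def filter_long_prompts(prompts, max_prompt_length,
                        count_tokens=lambda p: len(p.split())):
    """Keep only samples whose prompt fits within maxPromptLength;
    longer prompts are dropped before training, mirroring the behavior
    described for the dataset filter."""
    return [p for p in prompts if count_tokens(p) <= max_prompt_length]
```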
&lt;h3&gt;Early stopping and evaluation interval&lt;/h3&gt; 
&lt;p&gt;Our RFT service provides two features that optimize training efficiency and model quality. &lt;code&gt;EarlyStopping&lt;/code&gt;&amp;nbsp;(enabled by default) automatically stops training when performance improvements plateau, preventing overfitting and reducing unnecessary computation costs. The system continuously monitors validation metrics and terminates training after it detects that further iterations are unlikely to yield meaningful improvements. Meanwhile, &lt;code&gt;evalInterval&lt;/code&gt; determines how frequently the model evaluates its performance on the validation dataset during training. This hyperparameter is automatically calculated as &lt;code&gt;min(10, data_size/batch_size)&lt;/code&gt;, ensuring at least one evaluation per epoch while keeping the evaluation frequency reasonable. For datasets where &lt;code&gt;data_size&lt;/code&gt; significantly exceeds &lt;code&gt;10×batch_size&lt;/code&gt;, evaluations typically occur every 10 steps, providing sufficient monitoring granularity without excessive overhead.&lt;/p&gt; 
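The evalInterval calculation can be written out directly (a sketch of the formula as described; the exact rounding behavior inside the service is an assumption):

```python
def eval_interval(data_size: int, batch_size: int) -> int:
    """min(10, data_size / batch_size): never wait more than 10 steps
    between validation evaluations, and evaluate at least once per epoch."""
    steps_per_epoch = max(1, data_size // batch_size)
    return min(10, steps_per_epoch)

# Large dataset: evaluate every 10 steps. Tiny dataset: every epoch.
```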
&lt;h2&gt;RFT metrics and their meaning&lt;/h2&gt; 
&lt;p&gt;Amazon Bedrock&amp;nbsp;exposes several training metrics through Amazon CloudWatch and the Amazon Bedrock console that give you a clear picture of whether your RFT job is progressing as expected. Understanding what each metric represents and what anomalies to watch for makes the difference between catching a problem early and waiting hours for a failed run to finish.&lt;/p&gt; 
&lt;h3&gt;Training and validation rewards&lt;/h3&gt; 
&lt;p&gt;The training reward is the average reward on the episodes that you’re training on. The validation reward is the same metric on a held-out set of prompts that don’t contribute gradients. In a healthy run, train reward should climb steadily early on, with validation reward rising more slowly but in the same general direction.&lt;/p&gt; 
&lt;h3&gt;Train and validation episode lengths&lt;/h3&gt; 
&lt;p&gt;These track the average number of tokens generated per response. Use them to detect verbosity hacking: if lengths explode while rewards increase, the model has learned that longer means better regardless of quality. In reasoning tasks (such as chain-of-thought (CoT) reasoning), a gradual increase is healthy (the model is learning to think), but a sudden vertical spike usually indicates a loop or failure. In some cases you will see a gradual decrease, and that is fine too: it could mean the model was initially exploring more to get to the answer, but later found shorter yet rewarding trajectories.&lt;/p&gt; 
&lt;h3&gt;Policy entropy&lt;/h3&gt; 
&lt;p&gt;Policy entropy measures how confident the model is in its outputs. High entropy means the model is uncertain and still exploring, while low entropy means it’s converging on consistent responses. Over a healthy training run, you’d expect a gentle decline from the initial baseline to a stable plateau as the model learns. A sharp drop to near zero is a warning sign: it typically means that the model has collapsed into repeating a single response rather than reasoning through problems. On the other end, a flat line at a persistently high value suggests the model is ignoring the reward signal entirely and not learning from feedback.&lt;/p&gt; 
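For intuition, this metric corresponds to Shannon entropy over the policy's output distribution: a uniform distribution gives maximum entropy, while a collapsed policy gives zero (an illustrative calculation, not the service's exact implementation):

```python
import math

def policy_entropy(probs):
    """Shannon entropy of a probability distribution over next tokens.
    High: the policy is uncertain and still exploring.
    Near zero: it has collapsed onto a single response."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Uniform over 4 tokens -> ln(4) ~= 1.386; fully confident -> 0.0
```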
&lt;h3&gt;Gradient norm&lt;/h3&gt; 
&lt;p&gt;The magnitude (L2 norm) of the gradients applied to the model at each update. In a stable run it fluctuates within a reasonable band, with occasional spikes; sustained growth or extreme spikes can indicate issues with learning rate, reward scaling, or numeric stability.&lt;/p&gt; 
&lt;h2&gt;Common pitfalls&lt;/h2&gt; 
&lt;p&gt;Even well-configured RFT jobs can run into failure modes that aren’t always obvious from the metrics alone. The two most common are reward hacking—where the model learns to game the reward function rather than improve at the actual task—and reward instability, where high variance in the reward signal undermines the learning process. Both are recoverable, but easier to address if you know what to look for.&lt;/p&gt; 
&lt;h3&gt;Reward hacking&lt;/h3&gt; 
&lt;p&gt;This occurs when the policy learns to exploit weaknesses in the reward function to maximize scores without improving quality. You will see training rewards climb steadily while human evaluation scores degrade or plateau. To mitigate this, ensure that the reward function captures all aspects of the behavior you want encoded through fine-tuning. If not, observe the model generations, and iterate on the reward function. Use strict length penalties in the reward function if needed.&lt;/p&gt; 
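A strict length penalty can be folded into the reward function along these lines (a hypothetical shaping term; the target length and penalty rate are assumptions to tune per task):

```python
def length_penalized_reward(base_reward: float, n_tokens: int,
                            target_tokens: int = 1024,
                            penalty_per_token: float = 0.001) -> float:
    """Subtract a per-token penalty for output beyond the target length,
    discouraging the policy from padding responses to farm reward."""
    excess = max(0, n_tokens - target_tokens)
    return base_reward - penalty_per_token * excess
```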
&lt;h3&gt;Reward variance and instability&lt;/h3&gt; 
&lt;p&gt;Even with a good average reward, high fluctuation in scores for similar inputs creates a noisy signal that destabilizes training. This manifests as jittery reward curves and wildly oscillating loss metrics. The first line of defense is rigorous normalization: standardize rewards (zero mean, unit variance) within every batch, clip extreme outliers, and ensure your reward inference is deterministic (no dropout), so the optimizer receives a consistent and stable learning signal.&lt;/p&gt; 
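The normalization described above can be sketched as follows (illustrative; apply it per batch before the optimizer consumes the rewards):

```python
import statistics

def normalize_rewards(rewards, clip=3.0):
    """Standardize a batch of rewards to zero mean and unit variance,
    then clip extreme outliers so no single sample dominates the update."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [max(-clip, min(clip, (r - mean) / std)) for r in rewards]
```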
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we demonstrated how to apply Reinforcement Fine-Tuning (RFT) in Amazon Bedrock to improve model performance using feedback-driven training. Using the GSM8K mathematical reasoning dataset as a concrete example, we showed where RFT is most effective, how to structure training datasets, and how to design reward functions that reliably evaluate model outputs. We also explored how to monitor training progress using the Amazon Bedrock training metrics and provided practical hyperparameter tuning guidelines informed by experiments across multiple models and use cases. Together, these components form the foundation for running successful RFT workflows. When datasets are well structured, reward functions capture the right notion of quality, and training metrics are monitored carefully, RFT can significantly improve model performance across both verifiable tasks (such as reasoning, coding, and structured extraction) and subjective tasks using AI feedback.&lt;/p&gt; 
&lt;h2&gt;Next steps&lt;/h2&gt; 
&lt;p&gt;Ready to start customizing with RFT in Amazon Bedrock? Log in to the &lt;a href="https://console.aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock console&lt;/a&gt;&amp;nbsp;or review the official AWS API docs, and create your first RFT training job using the supported open source models.&lt;/p&gt; 
&lt;p&gt;To begin:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Explore the Documentation&lt;/strong&gt;: Visit the comprehensive guides and tutorials: &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/rft-submit-job.html" target="_blank" rel="noopener noreferrer"&gt;Create a reinforcement fine-tuning job&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Try the Sample Notebooks&lt;/strong&gt;: Access ready-to-run examples in the &lt;a href="https://github.com/aws-samples/amazon-bedrock-samples/tree/main/custom-models/bedrock-reinforcement-fine-tuning" target="_blank" rel="noopener noreferrer"&gt;AWS Samples GitHub repository&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Experiment with your own workloads&lt;/strong&gt;: Apply the dataset preparation, reward design, and hyperparameter tuning practices covered in this post to your own use cases.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Acknowledgement&lt;/h3&gt; 
&lt;p&gt;Thank you to Zhe Wang and Wei Zhu of the Amazon Bedrock Applied Science team, whose experimental work served as the foundation for many of the best practices listed in this blog post.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone wp-image-125538 size-full" style="font-size: 16px" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/06/Nick_blog_pic.jpeg" alt="" width="100" height="122"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Nick McCarthy&lt;/h3&gt; 
  &lt;p&gt;Nick McCarthy is a Senior Generative AI Specialist Solutions Architect on the Amazon Bedrock team, based out of the AWS New York office. He helps customers customize their GenAI models on AWS. He has worked with clients across a wide range of industries — including healthcare, finance, sports, telecommunications, and energy — helping them accelerate business outcomes through the use of AI and machine learning. He holds a Bachelor’s degree in Physics and a Master’s degree in Machine Learning from UCL, London.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-20108 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2020/12/19/Shreyas-Subramanian.png" alt="" width="100" height="134"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Shreyas Subramanian&lt;/h3&gt; 
&lt;p&gt;Shreyas Subramanian&amp;nbsp;is a Principal Data Scientist who helps customers use generative AI and deep learning to solve their business challenges with AWS services like Amazon Bedrock and AgentCore. Dr. Subramanian contributes to cutting-edge research in deep learning, agentic AI, foundation models, and optimization techniques, with several books, papers, and patents to his name. In his current role at Amazon, Dr. Subramanian works with various science leaders and research teams within and outside Amazon, helping customers apply state-of-the-art algorithms and techniques to solve business-critical problems. Outside AWS, Dr. Subramanian is an expert reviewer for AI papers and funding through organizations like NeurIPS, ICML, ICLR, NASA, and NSF.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-126449" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/17/sapana.jpeg" alt="" width="1365" height="2048"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Sapana Chaudhary&lt;/h3&gt; 
  &lt;p&gt;Sapana Chaudhary is an Applied Scientist II at Amazon Web Services (AWS), where she works on reinforcement learning post-training of large language models. Her research sits at the intersection of reinforcement learning, robustness, and language models — with the goal to make AI systems more reliable and dependable for downstream tasks — whether through constrained optimization, risk-aware finetuning, or verifiable reasoning. Sapana holds a PhD from Texas A&amp;amp;M University (TAMU). Outside of work, she likes to hike, cook, paint, and photograph.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-126450" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/17/jennifer.jpeg" alt="" width="1612" height="2417"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Jennifer Zhu&lt;/h3&gt; 
&lt;p&gt;Jennifer Zhu is an Applied Science Manager at AWS, where she leads model customization services including Reinforcement Fine-tuning on Amazon Bedrock. At AWS, Jennifer works on LLM fine-tuning and distillation, with a focus on building production-grade infrastructure for model post-training at scale. Jennifer holds a PhD from Cornell University and a master’s degree from the University of San Francisco. Outside of work, she enjoys reading books and watching tennis.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Manage AI costs with Amazon Bedrock Projects</title>
		<link>https://aws.amazon.com/blogs/machine-learning/manage-ai-costs-with-amazon-bedrock-projects/</link>
					
		
		<dc:creator><![CDATA[Ba'Carri Johnson]]></dc:creator>
		<pubDate>Tue, 07 Apr 2026 23:32:00 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[AWS Cost and Usage Report]]></category>
		<category><![CDATA[AWS Cost Explorer]]></category>
		<category><![CDATA[Billing & Account Management]]></category>
		<category><![CDATA[Customer Solutions]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">105134ae0a251deebb4b6f30757ebf9139378b27</guid>

					<description>With Amazon Bedrock Projects, you can attribute inference costs to specific workloads and analyze them in AWS Cost Explorer and AWS Data Exports. In this post, you will learn how to set up Projects end-to-end, from designing a tagging strategy to analyzing costs.</description>
										<content:encoded>&lt;p&gt;As organizations scale their AI workloads on Amazon Bedrock, understanding what’s driving spending becomes critical. Teams might need to perform chargebacks, investigate cost spikes, and guide optimization decisions, all of which require cost attribution at the workload level.&lt;/p&gt; 
&lt;p&gt;With &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/projects.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Projects&lt;/a&gt;, you can attribute inference costs to specific workloads and analyze them in AWS Cost Explorer and AWS Data Exports. In this post, you will learn how to set up Projects end-to-end, from designing a tagging strategy to analyzing costs.&lt;/p&gt; 
&lt;h2&gt;How Amazon Bedrock Projects and cost allocation work&lt;/h2&gt; 
&lt;p&gt;A project on Amazon Bedrock is a logical boundary that represents a workload, such as an application, environment, or experiment. To attribute the cost of a project, you attach resource tags and pass the project ID in your API calls. You can then activate the &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html" target="_blank" rel="noopener noreferrer"&gt;cost allocation tags&lt;/a&gt; in AWS Billing to filter, group, and analyze spend in AWS Cost Explorer and AWS Data Exports.&lt;/p&gt; 
&lt;p&gt;The following diagram illustrates the end-to-end flow:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-127921 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/ML-20677-image-1.png" alt="Amazon Bedrock Projects cost attribution architecture showing flow from user API calls through tagged projects to AWS billing and cost management tools" width="2280" height="740"&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;&lt;em&gt;Figure 1: End-to-end cost attribution flow with Amazon Bedrock Projects&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Amazon Bedrock Projects support the OpenAI-compatible APIs: &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html#bedrock-mantle-responses" target="_blank" rel="noopener noreferrer"&gt;Responses API&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/inference-chat-completions.html" target="_blank" rel="noopener noreferrer"&gt;Chat Completions API&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Requests without a project ID are automatically associated with the default project in your AWS account.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;To follow along with the steps in this post, you need:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Access to Amazon Bedrock with the OpenAI SDK. See &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Quickstart&lt;/a&gt; to get started.&lt;/li&gt; 
 &lt;li&gt;IAM permissions for Amazon Bedrock Projects, inference, and tagging. For this example, you can attach the AWS managed policy &lt;a href="https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonBedrockMantleFullAccess.html" target="_blank" rel="noopener noreferrer"&gt;AmazonBedrockMantleFullAccess&lt;/a&gt;. For production, see &lt;a href="https://aws.amazon.com/blogs/security/implementing-least-privilege-access-for-amazon-bedrock/" target="_blank" rel="noopener noreferrer"&gt;Implementing least privilege for Amazon Bedrock&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Access to the &lt;a href="https://console.aws.amazon.com/costmanagement/" target="_blank" rel="noopener noreferrer"&gt;AWS Billing and Cost Management console.&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Define your tagging strategy&lt;/h2&gt; 
&lt;p&gt;The tags that you attach to projects become the dimensions that you can filter and group by in your cost reports. We recommend that you plan these before creating your first project. A common approach is to tag by application, environment, team, and cost center:&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Tag key&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Example values&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Application&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Which workload or service&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CustomerChatbot, Experiments, DataAnalytics&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Environment&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Lifecycle stage&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Production, Development, Staging, Research&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Team&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Ownership&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CustomerExperience, PlatformEngineering, DataScience&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CostCenter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Finance mapping&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CC-1001, CC-2002, CC-3003&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;For more guidance on building a cost allocation strategy, see &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/tagging-best-practices/building-a-cost-allocation-strategy.html" target="_blank" rel="noopener noreferrer"&gt;Best Practices for Tagging AWS Resources&lt;/a&gt;. With your tagging strategy defined, you’re ready to create projects and start attributing costs.&lt;/p&gt; 
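One way to keep the taxonomy consistent across projects is a small helper that validates tag values before any project is created. This is a sketch: the tag keys and allowed environment values mirror the example table above and are conventions of this post, not requirements of the API.

```python
def build_tags(application: str, environment: str,
               team: str, cost_center: str) -> dict:
    """Assemble the tag set for a project, rejecting environments outside
    the agreed taxonomy so cost reports stay clean and consistent."""
    allowed_environments = {"Production", "Development", "Staging", "Research"}
    if environment not in allowed_environments:
        raise ValueError(f"Environment {environment!r} is not in the taxonomy")
    return {
        "Application": application,
        "Environment": environment,
        "Team": team,
        "CostCenter": cost_center,
    }
```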
&lt;h2&gt;Create a project&lt;/h2&gt; 
&lt;p&gt;With your tagging strategy and permissions in place, you can create your first project. Each project has its own set of cost allocation tags that flow into your billing data. The following example shows how to create a project using the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/projects.html" target="_blank" rel="noopener noreferrer"&gt;Projects API&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;First, install the required dependencies:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;$ pip3 install openai requests&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Create a project with your tag taxonomy:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The OpenAI SDK uses the &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable. Set this to your Bedrock API key.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import os
import requests

# Configuration
BASE_URL = "https://bedrock-mantle.&amp;lt;YOUR-REGION-HERE&amp;gt;.api.aws/v1"
API_KEY  = os.environ.get("OPENAI_API_KEY")  # Your Amazon Bedrock API key

def create_project(name: str, tags: dict) -&amp;gt; dict:
    """Create a Bedrock project with cost allocation tags."""
    response = requests.post(
        f"{BASE_URL}/organization/projects",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json"
        },
        json={"name": name, "tags": tags}
    )

    if response.status_code != 200:
        raise Exception(
            f"Failed to create project: {response.status_code} - {response.text}"
        )

    return response.json()

# Create a production project with full tag taxonomy
project = create_project(
    name="CustomerChatbot-Prod",
    tags={
        "Application": "CustomerChatbot",
        "Environment": "Production",
        "Team":        "CustomerExperience",
        "CostCenter":  "CC-1001",
        "Owner":       "alice"
    }
)
print(f"Created project: {project['id']}")&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The API returns the project details, including the project ID and ARN:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;{
  "id": "proj_123",
  "arn": "arn:aws:bedrock-mantle:&amp;lt;YOUR-REGION-HERE&amp;gt;:&amp;lt;YOUR-ACCOUNT-ID-HERE&amp;gt;:project/&amp;lt;YOUR-PROJECT-ID&amp;gt;"
}&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Save the project ID. You will use it to associate inference requests in the next step. The ARN is used for IAM policy attachment if you need to restrict access to this project. Repeat this for each workload. The following table shows a sample project structure for an organization with three applications:&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Project name&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Application&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Environment&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Team&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Cost Center&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CustomerChatbot-Prod&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CustomerChatbot&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Production&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CustomerExperience&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CC-1001&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CustomerChatbot-Dev&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CustomerChatbot&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Development&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CustomerExperience&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CC-1001&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Experiments-Research&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Experiments&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Production&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;PlatformEngineering&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CC-2002&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DataAnalytics-Prod&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DataAnalytics&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Production&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;DataScience&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CC-3003&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;You can create up to 1,000 projects per AWS account to fit your organization’s needs.&lt;/p&gt; 
&lt;h2&gt;Associate inference requests with your project&lt;/h2&gt; 
&lt;p&gt;With your projects created, you can associate inference requests by passing the project ID in your API calls. The following example uses the Responses API:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;from openai import OpenAI

client = OpenAI(
    base_url="https://bedrock-mantle.&amp;lt;YOUR-REGION-HERE&amp;gt;.api.aws/v1",
    project="&amp;lt;YOUR-PROJECT-ID&amp;gt;", # ID returned when you created the project
)
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Summarize the key findings from our Q4 earnings report."
)
print(response.output_text)&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;To maintain clean cost attribution, always specify a project ID in your API calls rather than relying on the default project.&lt;/p&gt; 
&lt;h2&gt;Activate cost allocation tags&lt;/h2&gt; 
&lt;p&gt;Before your project tags appear in cost reports, you must activate them as cost allocation tags in AWS Billing. This one-time setup connects your project tags to the billing pipeline. For more information about &lt;a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/custom-tags.html" target="_blank" rel="noopener noreferrer"&gt;activating cost allocation tags&lt;/a&gt;, see the AWS Billing documentation.&lt;/p&gt; 
&lt;p&gt;It can take up to 24 hours for tags to propagate to AWS Cost Explorer and AWS Data Exports. You can activate your tags immediately after creating your first project to avoid gaps in cost data.&lt;/p&gt; 
&lt;h2&gt;View project costs&lt;/h2&gt; 
&lt;p&gt;With projects created, inference requests tagged, and cost allocation tags activated, you can see exactly where your Amazon Bedrock spend is going. Every dimension that you defined in your taxonomy is now available as a filter or grouping in your AWS Billing cost reports.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;AWS Cost Explorer&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;AWS Cost Explorer provides the fastest way to visualize your costs by project. Complete the following steps to review your costs by project:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the AWS Billing and Cost Management console and choose &lt;strong&gt;Cost Explorer&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;In the Filters pane, expand &lt;strong&gt;Service&lt;/strong&gt; and select &lt;strong&gt;Amazon&lt;/strong&gt; &lt;strong&gt;Bedrock&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Under &lt;strong&gt;Group by&lt;/strong&gt;, select &lt;strong&gt;Tag&lt;/strong&gt; and choose your tag key (for example, &lt;strong&gt;Application&lt;/strong&gt;).&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127922" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/ML-20677-image-2.png" alt="Amazon Bedrock AWS Cost Explorer projects view" width="3026" height="2236"&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;&lt;em&gt;Figure 2: Cost Explorer showing daily Amazon Bedrock spending grouped by the Application tag&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;For more ways to refine your view, see &lt;a href="https://docs.aws.amazon.com/cost-management/latest/userguide/ce-what-is.html" target="_blank" rel="noopener noreferrer"&gt;Analyzing your costs and usage with AWS Cost Explorer&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;For more granular analysis and line-item detail with your project tags, see &lt;a href="https://docs.aws.amazon.com/cur/latest/userguide/dataexports-create.html" target="_blank" rel="noopener noreferrer"&gt;Creating Data Exports&lt;/a&gt; in the AWS Billing documentation.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;With Amazon Bedrock Projects, you can attribute costs to individual workloads and track spending using the AWS tools that your organization already relies on. As your workloads scale, use the tagging strategy and cost visibility patterns covered in this post to maintain accountability across teams and applications.&lt;/p&gt; 
&lt;p&gt;For more information, see &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/projects.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Projects&lt;/a&gt; documentation and the &lt;a href="https://docs.aws.amazon.com/cost-management/latest/userguide/what-is-costmanagement.html" target="_blank" rel="noopener noreferrer"&gt;AWS Cost Management User Guide&lt;/a&gt;.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-thumbnail wp-image-127920 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/ML-20677-bacarri-johnson-100x130.png" alt="Portrait of Ba'Carri Johnson, author and AWS expert" width="100" height="130"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ba’Carri Johnson&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Ba’Carri Johnson&lt;/strong&gt; is a Sr. Technical Product Manager on the Amazon Bedrock team, focusing on cost management and governance for AWS AI. With a background in AI infrastructure, computer science, and strategy, she is passionate about product innovation and helping organizations scale AI responsibly. In her spare time, she enjoys traveling and exploring the great outdoors.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-127924 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/ML-20677-vadim-omeltchenko.png" alt="Portrait of Vadim Omeltchenko, author and AWS expert" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Vadim Omeltchenko&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Vadim Omeltchenko&lt;/strong&gt; is a Sr. Amazon Bedrock Go-to-Market Solutions Architect who is passionate about helping AWS customers innovate in the cloud.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127919 alignnone alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/ML-20677-ajit-mahareddy.png" alt="Portrait of Ajit Mahareddy, author and AWS expert" width="100" height="116"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ajit Mahareddy&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Ajit Mahareddy&lt;/strong&gt; is an experienced Product and Go-To-Market (GTM) leader with over 20 years of experience in product management, engineering, and go-to-market. Prior to his current role, Ajit led product management building AI/ML products at leading technology companies, including Uber, Turing, and eHealth. He is passionate about advancing generative AI technologies and driving real-world impact with generative AI.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127923 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/07/ML-20677-sofian-hamiti.png" alt="Portrait of Sofian Hamiti, author and AWS expert" width="100" height="116"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Sofian Hamiti&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Sofian Hamiti&lt;/strong&gt; is a technology leader with over 12 years of experience building AI solutions and leading high-performing teams to maximize customer outcomes. He is passionate about empowering diverse talent to drive global impact and achieve their career aspirations.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Building real-time conversational podcasts with Amazon Nova 2 Sonic</title>
		<link>https://aws.amazon.com/blogs/machine-learning/building-real-time-conversational-podcasts-with-amazon-nova-2-sonic/</link>
					
		
		<dc:creator><![CDATA[Madhavi Evana]]></dc:creator>
		<pubDate>Tue, 07 Apr 2026 16:29:11 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Nova]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">7e1604bebed8710613b18f0a0f01ca7c53914508</guid>

					<description>This post walks through building an automated podcast generator that creates engaging conversations between two AI hosts on any topic, demonstrating the streaming capabilities of Nova Sonic, stage-aware content filtering, and real-time audio generation.</description>
					<content:encoded>&lt;p&gt;Content creators and organizations today face a persistent challenge: producing high-quality audio content at scale. Traditional podcast production requires significant time investment (research, scheduling, recording, editing) and substantial resources, including studio space, equipment, and voice talent. These constraints limit how quickly organizations can respond to new topics or scale their content production. Amazon Nova 2 Sonic is a state-of-the-art speech understanding and generation model that delivers natural, human-like conversational AI with low latency and industry-leading price-performance. It provides streaming speech understanding, instruction following, tool invocation, and cross-modal interaction that seamlessly switches between voice and text. With support for seven languages and context windows of up to 1M tokens, Amazon Nova 2 Sonic lets developers build voice-first applications for customer support, interactive learning, and voice-enabled assistants.&lt;/p&gt; 
&lt;p&gt;This post walks through building an automated podcast generator that creates engaging conversations between two AI hosts on any topic, demonstrating the streaming capabilities of Nova Sonic, stage-aware content filtering, and real-time audio generation.&lt;/p&gt; 
&lt;h2&gt;What is Amazon Nova 2 Sonic?&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/nova/models/" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova 2 Sonic &lt;/a&gt;processes speech input and delivers speech output and text transcriptions, creating human-like conversations with rich contextual understanding. Amazon Nova 2 Sonic provides a streaming API for real-time, low-latency multi-turn conversations, so developers can build voice-first applications where speech drives app navigation, workflow automation, and task completion.&lt;/p&gt; 
&lt;p&gt;The model is accessible through &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; and can be integrated with key Amazon Bedrock features, including Guardrails, Agents, multimodal RAG, and Knowledge Bases for seamless interoperability across the platform.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Key capabilities:&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Streaming Speech Understanding –&lt;/strong&gt; Process and respond to speech in real-time with low latency&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Instruction Following –&lt;/strong&gt; Execute complex multi-step voice commands&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Tool Invocation –&lt;/strong&gt; Call external functions and APIs during conversations&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cross-Modal Interaction –&lt;/strong&gt; Seamlessly switch between voice and text I/O&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Multilingual Support –&lt;/strong&gt; Native support for English, French, Italian, German, Spanish, Portuguese, and Hindi&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Large Context Window –&lt;/strong&gt; Up to 1M tokens for maintaining extended conversation context&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Understanding the challenge&lt;/h2&gt; 
&lt;p&gt;Podcasts have experienced explosive growth, evolving from a niche medium to a mainstream content format. This surge comes from podcasts’ unique ability to deliver information during multitasking activities (commuting, exercising, household tasks), providing an accessibility advantage that visual content can’t match.&lt;/p&gt; 
&lt;p&gt;However, traditional podcast production faces structural challenges:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Content Scalability:&lt;/strong&gt; Human hosts require extensive time for research, scheduling, recording, and post-production, limiting output frequency and volume.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Consistency:&lt;/strong&gt; Human hosts face scheduling conflicts, illness, varying energy levels, and availability constraints that create irregular publishing schedules.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Personalization:&lt;/strong&gt; Traditional podcasts follow a one-size-fits-all model, unable to tailor content to individual listeners’ interests or knowledge levels in real time.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Resource Efficiency:&lt;/strong&gt; Quality production requires significant ongoing investment in talent, equipment, editing software, and operational overhead.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Expert Access:&lt;/strong&gt; Securing knowledgeable hosts across diverse topics remains challenging and expensive, restricting content breadth and depth.&lt;/p&gt; 
&lt;p&gt;By using the conversational AI capabilities of Amazon Nova Sonic, organizations can address these limitations and enable new interactive and personalized audio content formats that scale globally without traditional human resource constraints.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;The Nova Sonic Live Podcast Generator demonstrates how to create natural conversations between AI hosts about any topic using the speech-to-speech model of Amazon Nova Sonic. Users enter a topic through a web interface, and the application generates a multi-round dialogue with alternating speakers streamed in real-time.&lt;/p&gt; 
&lt;h3&gt;Key features&lt;/h3&gt; 
&lt;ol&gt; 
 &lt;li&gt;Real-time streaming audio generation with low latency&lt;/li&gt; 
 &lt;li&gt;Natural back-and-forth dialogue across multiple conversational turns&lt;/li&gt; 
 &lt;li&gt;Stage-aware content filtering that removes duplicate audio&lt;/li&gt; 
 &lt;li&gt;Simple web interface with live conversation updates&lt;/li&gt; 
 &lt;li&gt;Concurrent user support through AsyncIO architecture&lt;/li&gt; 
 &lt;li&gt;Multiple voice personas for different use cases&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Prerequisites&lt;/h3&gt; 
&lt;p&gt;To implement this solution, the following requirements must be met:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;AWS account with access to Amazon Bedrock and Amazon Nova 2 Sonic model&lt;/li&gt; 
 &lt;li&gt;Python 3.8 or later&lt;/li&gt; 
 &lt;li&gt;Flask web framework and AsyncIO&lt;/li&gt; 
 &lt;li&gt;AWS credentials configured (access key, secret key, AWS Region)&lt;/li&gt; 
 &lt;li&gt;Development environment with pip package manager&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Implementation details&lt;/h3&gt; 
&lt;p&gt;For detailed code samples and complete implementation guidance, see the &lt;a href="https://github.com/aws-samples/genai-quickstart-pocs/tree/main/genai-quickstart-pocs-python/amazon-bedrock-nova-s2s-live-podcasting-poc" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Architecture overview&lt;/h2&gt; 
&lt;p&gt;The solution follows a Flask-based architecture with streaming and reactive event processing, designed to demonstrate the capabilities of Amazon Nova Sonic for proof-of-concept and educational purposes.&lt;/p&gt; 
&lt;h3&gt;System architecture diagram&lt;/h3&gt; 
&lt;p&gt;The following diagram illustrates the real-time streaming architecture:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127174" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/26/ML-19828-image-1.png" alt="Real-time streaming architecture diagram showing the client application components, Amazon Bedrock, and Amazon Nova Sonic" width="896" height="526"&gt;&lt;/p&gt; 
&lt;h3&gt;Architecture components&lt;/h3&gt; 
&lt;p&gt;The architecture follows a layered approach with clear separation of concerns:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Client Application&lt;/strong&gt; hosts three tightly coupled components that manage the full audio lifecycle:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;PyAudio Engine&lt;/strong&gt; captures microphone input at &lt;strong&gt;16kHz PCM&lt;/strong&gt; and streams it to Amazon Bedrock. It also receives playback-ready audio from the Audio Output Queue at &lt;strong&gt;24kHz PCM&lt;/strong&gt;, handling speaker output in real time.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Response Processor&lt;/strong&gt; receives the raw response stream returned by Amazon Nova Sonic, decodes the &lt;strong&gt;Base64-encoded audio payload&lt;/strong&gt;, and forwards the decoded audio to the Audio Output Queue.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Audio Output Queue&lt;/strong&gt; acts as a buffer between the Response Processor and the PyAudio Engine, absorbing variable-latency responses and ensuring smooth, uninterrupted audio playback at 24kHz PCM.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;AWS Cloud&lt;/strong&gt; – all model communication runs through Amazon Bedrock, which brokers a &lt;strong&gt;bidirectional event stream&lt;/strong&gt; with Amazon Nova Sonic:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon Bedrock&lt;/strong&gt; receives the outbound 16kHz PCM audio stream from the PyAudio Engine and routes it to the model. It also carries the model’s response stream back to the client.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon Nova Sonic&lt;/strong&gt; receives the audio input through the bidirectional stream, performs real-time speech-to-speech inference, and returns a response stream containing synthesized audio encoded as Base64 PCM at 24kHz.&lt;/li&gt; 
&lt;/ul&gt; 
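&lt;p&gt;&lt;em&gt;The decode-and-buffer pattern above can be sketched in a few lines. This is an illustrative stand-in only: the function names are hypothetical, and the PyAudio playback side is replaced with a stub that drains the queue.&lt;/em&gt;&lt;/p&gt;

```python
# Minimal sketch of the Response Processor / Audio Output Queue pattern:
# Base64 audio payloads are decoded into PCM bytes and enqueued, decoupling
# variable-latency model responses from steady playback. Names are
# illustrative; real playback would use PyAudio at 24kHz.
import base64
import queue

audio_output_queue = queue.Queue()  # buffers decoded 24kHz PCM chunks

def on_audio_event(event):
    """Response Processor stand-in: decode the Base64 payload and enqueue it."""
    b64_payload = event["event"]["audioOutput"]["content"]
    audio_output_queue.put(base64.b64decode(b64_payload))

def drain_playback_queue():
    """PyAudio Engine stand-in: pull buffered chunks in arrival order."""
    chunks = []
    while not audio_output_queue.empty():
        chunks.append(audio_output_queue.get())
    return b"".join(chunks)

# Two streamed chunks arrive and are decoded in order
on_audio_event({"event": {"audioOutput": {"content": base64.b64encode(b"chunk1").decode()}}})
on_audio_event({"event": {"audioOutput": {"content": base64.b64encode(b"chunk2").decode()}}})
```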
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Production Architecture Note:&lt;/em&gt;&lt;/strong&gt;&lt;em&gt; This implementation uses Flask with PyAudio for demonstration purposes. PyAudio does not provide built-in echo cancellation and is best suited for server-side audio playback. For production web-based client applications, JavaScript-based audio libraries (Web Audio API) or WebRTC are recommended for browser-native audio handling with better echo cancellation and lower latency. See the GitHub repository for production architecture patterns.&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;Key technical innovations&lt;/h2&gt; 
&lt;h3&gt;Amazon Bedrock integration&lt;/h3&gt; 
&lt;p&gt;At the heart of the system is the &lt;code&gt;BedrockStreamManager&lt;/code&gt;, a custom component that manages persistent connections to the Amazon Nova 2 Sonic model. This manager handles the complexities of streaming API interactions, including initialization, message sending, and response processing. AWS credentials configured through environment variables maintain secure access to the foundation model (FM). The full code is in the &lt;a href="https://github.com/aws-samples/genai-quickstart-pocs/tree/main/genai-quickstart-pocs-python/amazon-bedrock-nova-s2s-live-podcasting-poc" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Initialize BedrockStreamManager for each conversation turn
manager = BedrockStreamManager(
    model_id='amazon.nova-sonic-v1:0',
    region='us-east-1'
)

# Configure voice persona (Matthew or Tiffany)
manager.START_PROMPT_EVENT = manager.START_PROMPT_EVENT.replace(
    '"matthew"', f'"{voice}"'
)

# Initialize streaming connection
await manager.initialize_stream()&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Reactive streaming pipeline&lt;/h3&gt; 
&lt;p&gt;The application employs RxPy (Reactive Extensions for Python) to implement an observable pattern for handling real-time data streams. This reactive architecture processes audio chunks and text tokens as they arrive from Amazon Nova Sonic, rather than waiting for complete responses.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Subscribe to streaming events from BedrockStreamManager
manager.output_subject.subscribe(on_next=capture)

# Capture function processes events in real time
def capture(event):
    if 'textOutput' in event['event']:
        text = event['event']['textOutput']['content']
        text_parts.append(text)
    if 'audioOutput' in event['event']:
        audio_chunks.append(event['event']['audioOutput']['content'])&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The &lt;code&gt;output_subject&lt;/code&gt; in the &lt;code&gt;BedrockStreamManager&lt;/code&gt; acts as the central event bus, so multiple subscribers can react to streaming events simultaneously. This design choice reduces latency and improves the user experience by providing immediate feedback.&lt;/p&gt; 
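&lt;p&gt;&lt;em&gt;The multi-subscriber event bus can be illustrated without the RxPy dependency. The minimal Subject below mirrors the subscribe/on_next pattern the application relies on; it is a sketch of the idea, not the library implementation.&lt;/em&gt;&lt;/p&gt;

```python
# Minimal stand-in for the RxPy Subject used as the central event bus:
# every subscriber receives each streamed event. Illustrative only; the
# actual implementation uses reactivex (RxPy).
class Subject:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, on_next):
        self._subscribers.append(on_next)

    def on_next(self, event):
        # Fan the event out to all registered callbacks
        for callback in self._subscribers:
            callback(event)

output_subject = Subject()
texts, audio = [], []

# Two independent subscribers react to the same event stream
output_subject.subscribe(lambda e: texts.append(e["event"].get("textOutput")))
output_subject.subscribe(lambda e: audio.append(e["event"].get("audioOutput")))

output_subject.on_next({"event": {"textOutput": "Hello"}})
```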
&lt;h3&gt;Stage-aware content filtering&lt;/h3&gt; 
&lt;p&gt;One of the key technical innovations in this implementation is the stage-aware filtering mechanism. Amazon Nova 2 Sonic generates content in multiple stages: SPECULATIVE (preliminary) and FINAL (polished). The application implements an intelligent filtering logic that monitors &lt;code&gt;contentStart&lt;/code&gt; events for generation stage metadata. It captures only FINAL stage content to remove duplicate or preliminary audio, and prevents audio artifacts for clean, natural-sounding output.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def capture(event):
    nonlocal is_final_stage
    if 'event' in event:
        # Detect generation stage from contentStart event
        if 'contentStart' in event['event']:
            content_start = event['event']['contentStart']
            if 'additionalModelFields' in content_start:
                additional_fields = json.loads(content_start['additionalModelFields'])
                stage = additional_fields.get('generationStage', 'FINAL')
                is_final_stage = (stage == 'FINAL')

        # Only capture content in FINAL stage
        if is_final_stage:
            if 'textOutput' in event['event']:
                text = event['event']['textOutput']['content']
                if text and '{ "interrupted" : true }' not in text:
                    text_parts.append(text)
            if 'audioOutput' in event['event']:
                audio_chunks.append(event['event']['audioOutput']['content'])&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The filtering operates at three levels:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Interrupted Content Filter&lt;/strong&gt; – Removes canceled content by checking for interruption markers.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Text Deduplication&lt;/strong&gt; – Filters exact duplicate text across SPECULATIVE and FINAL stages.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Audio Hash Deduplication&lt;/strong&gt; – Filters duplicate audio chunks using hash fingerprinting.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;This filtering happens in real-time within the capture callback function, which subscribes to the output stream and selectively processes events based on generation stage.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note:&lt;/em&gt;&lt;/strong&gt;&lt;em&gt; The code snippets shown are simplified for clarity. The &lt;code&gt;is_final_stage&lt;/code&gt; variable must be defined in the enclosing scope. See the GitHub repository for complete, production-ready implementations.&lt;/em&gt;&lt;/p&gt; 
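&lt;p&gt;&lt;em&gt;The three filter levels can be condensed into a small sketch. Function names are hypothetical, and SHA-256 is used here as one possible choice of hash fingerprint.&lt;/em&gt;&lt;/p&gt;

```python
# Sketch of the three filtering levels: interruption-marker removal, exact
# text deduplication, and hash-based audio deduplication. Illustrative only.
import hashlib

seen_texts = set()
seen_audio_hashes = set()

def accept_text(text):
    """Drop interrupted or duplicate text; keep everything else."""
    if '{ "interrupted" : true }' in text:
        return False  # level 1: interrupted content filter
    if text in seen_texts:
        return False  # level 2: exact text deduplication
    seen_texts.add(text)
    return True

def accept_audio(chunk_b64):
    """Drop audio chunks whose hash fingerprint was already seen."""
    fingerprint = hashlib.sha256(chunk_b64.encode()).hexdigest()
    if fingerprint in seen_audio_hashes:
        return False  # level 3: audio hash deduplication
    seen_audio_hashes.add(fingerprint)
    return True
```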
&lt;h3&gt;Conversation management&lt;/h3&gt; 
&lt;p&gt;The system implements a turn-based conversation model with multiple rounds of dialogue. Each turn follows a consistent pattern for natural conversation flow:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Conversation History –&lt;/strong&gt; The application maintains conversation context through speaker-specific variables, so each speaker can reference what was previously said.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Dynamic Prompt Generation –&lt;/strong&gt; Prompts are constructed dynamically based on speaker role and conversation context. For example, Matthew (host) introduces topics and asks follow-up questions, while Tiffany (expert) provides informed responses.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Fresh Stream Per Turn –&lt;/strong&gt; The application creates a fresh &lt;code&gt;BedrockStreamManager&lt;/code&gt; instance for each speaker turn, preventing state contamination between turns for clean audio streams.&lt;/li&gt; 
&lt;/ol&gt; 
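&lt;p&gt;&lt;em&gt;The turn loop described above can be sketched as follows. The speaker roles follow the post; everything else is illustrative, with a stub standing in for a fresh &lt;code&gt;BedrockStreamManager&lt;/code&gt; round trip.&lt;/em&gt;&lt;/p&gt;

```python
# Sketch of the turn-based conversation model: alternating speakers, shared
# history, and a dynamically built prompt per turn. generate_turn is a
# placeholder for a fresh per-turn stream to the model.
def generate_turn(prompt):
    # Stub standing in for the Nova Sonic call; returns a canned line
    return f"[response to: {prompt[:40]}]"

def run_podcast(topic, rounds=2):
    history = []
    speakers = ["Matthew (host)", "Tiffany (expert)"]
    for turn in range(rounds * 2):
        speaker = speakers[turn % 2]
        context = " ".join(history[-4:])  # recent turns only
        prompt = f"{speaker} discusses '{topic}'. Context: {context}"
        reply = generate_turn(prompt)
        history.append(f"{speaker}: {reply}")
    return history
```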
&lt;h3&gt;Asynchronous execution model&lt;/h3&gt; 
&lt;p&gt;To handle the blocking nature of audio playback and model API calls, the application creates a new asyncio event loop for each podcast generation request. This way, multiple users can generate podcasts simultaneously without blocking each other. The loop manages stream initialization, prompt sending, audio playback coordination, and cleanup, supporting concurrent usage while maintaining clean separation between user sessions.&lt;/p&gt; 
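&lt;p&gt;&lt;em&gt;A minimal sketch of the per-request event loop pattern follows; the coroutine body is a placeholder for the real stream initialization, prompt sending, playback coordination, and cleanup.&lt;/em&gt;&lt;/p&gt;

```python
# Each podcast generation request gets its own asyncio event loop, so
# concurrent requests do not block one another. Names are illustrative.
import asyncio

async def generate_podcast(topic):
    # Placeholder for stream init, prompting, playback, and cleanup
    await asyncio.sleep(0)
    return f"podcast about {topic}"

def handle_request(topic):
    """Flask view helper: run the async pipeline on a dedicated loop."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(generate_podcast(topic))
    finally:
        loop.close()
```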
&lt;h3&gt;Data flow overview&lt;/h3&gt; 
&lt;p&gt;The system follows a streamlined flow from user input to audio output. Users enter a topic, the backend orchestrates conversation turns with dynamic prompt generation, Amazon Nova 2 Sonic generates speech responses through a streaming API, and stage-aware filtering makes sure that only polished FINAL content reaches the audio pipeline for playback.&lt;/p&gt; 
&lt;p&gt;For detailed code samples and complete implementation guidance, see the &lt;a href="https://github.com/aws-samples/genai-quickstart-pocs/tree/main/genai-quickstart-pocs-python/amazon-bedrock-nova-s2s-live-podcasting-poc" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Use cases&lt;/h2&gt; 
&lt;p&gt;The Amazon Nova 2 Sonic architecture enables automated, interactive audio content creation across multiple industries. By orchestrating conversational AI instances in dialogue, organizations can generate engaging, natural-sounding content at scale.&lt;/p&gt; 
&lt;h3&gt;Interactive learning and knowledge sharing&lt;/h3&gt; 
&lt;p&gt;Organizations struggle to create engaging content that helps people learn and retain information, whether for student education or employee training. Amazon Nova 2 Sonic instances can simulate classroom discussions or Socratic dialogues, with one instance posing questions while the other provides explanations and examples.&lt;/p&gt; 
&lt;p&gt;For educational institutions, this creates dynamic learning experiences that accommodate different learning styles and paces. For enterprises, it transforms internal communications (policies, procedures, organizational changes) into conversational formats that employees can consume while multitasking. Integration with Retrieval Augmented Generation (RAG) and Amazon Bedrock Knowledge Bases keeps content current and aligned with curriculum or organizational requirements, while the conversational format increases information retention and reduces follow-up questions.&lt;/p&gt; 
&lt;h3&gt;Multilingual content localization&lt;/h3&gt; 
&lt;p&gt;Global organizations need consistent messaging across markets while respecting cultural nuances. The Amazon Nova Sonic support for &lt;strong&gt;English, French, Italian, German, Spanish, Portuguese, and Hindi&lt;/strong&gt; enables creation of localized audio content with native-sounding conversations. The model can generate market-specific discussions that adapt language, cultural references, and communication styles, going beyond simple translation to produce culturally relevant content that resonates with local audiences.&lt;/p&gt; 
&lt;p&gt;Polyglot voices (individual voices that can switch between languages within the same conversation) enable natural code-switching that handles mixed-language sentences smoothly. This is particularly valuable for multilingual customer support and global team collaboration.&lt;/p&gt; 
&lt;h3&gt;Product commentary and reviews&lt;/h3&gt; 
&lt;p&gt;Ecommerce platforms need engaging ways to help customers understand complex products. Amazon Nova 2 Sonic instances can generate conversational product reviews, with one asking common customer questions while the other provides answers based on specifications, user reviews, and technical documentation. This creates accessible content that helps customers evaluate products through natural dialogue, with integration to product catalogs ensuring accuracy.&lt;/p&gt; 
&lt;h3&gt;Thought leadership and industry analysis&lt;/h3&gt; 
&lt;p&gt;Professional services firms need to establish thought leadership through regular content, but producing analysis requires significant time investment. Amazon Nova 2 Sonic instances can engage in expert-level discussions about industry trends or market analysis, with one challenging assumptions while the other defends positions with data. This allows organizations to repurpose existing research into accessible audio content that reaches busy executives who prefer audio formats.&lt;/p&gt; 
&lt;h2&gt;Performance characteristics&lt;/h2&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Latency:&lt;/strong&gt; Low-latency streaming with immediate audio playback&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Podcast Duration:&lt;/strong&gt; Flexible duration based on conversational turns (typically 2–5 minutes)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Concurrent Users:&lt;/strong&gt; Supports multiple simultaneous podcast generations through AsyncIO&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Audio Quality:&lt;/strong&gt; Professional-grade speech synthesis with natural intonation and pacing&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Language Support:&lt;/strong&gt; English, French, Italian, German, Spanish, Portuguese, and Hindi&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Context Window:&lt;/strong&gt; Up to 1M tokens for extended conversation context&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Amazon Nova 2 Sonic is a state-of-the-art speech understanding and generation model that enables natural, human-like conversational AI experiences. The architecture outlined in this post provides a practical foundation for building conversational AI applications. Whether streamlining customer support, creating educational content, or generating thought leadership materials, the patterns demonstrated here apply across use cases.&lt;/p&gt; 
&lt;p&gt;With expanded language support, polyglot voice capabilities, enhanced telephony integration, and cross-modal interaction, Amazon Nova 2 Sonic provides organizations with tools for building global, voice-first applications at scale.&lt;/p&gt; 
&lt;p&gt;To get started building with Amazon Nova Sonic, visit the &lt;a href="https://aws.amazon.com/nova/models/" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova product page&lt;/a&gt;. For comprehensive documentation, explore the &lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/speech.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova 2 Sonic User Guide&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Learn more&lt;/h2&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/nova/models/" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova 2 Sonic Product Page&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Documentation&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/speech.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova 2 Sonic User Guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/introducing-amazon-nova-sonic-human-like-voice-conversations-for-generative-ai-applications/" target="_blank" rel="noopener noreferrer"&gt;AWS Blog: Introducing Amazon Nova Sonic&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://github.com/aws-samples/genai-quickstart-pocs/tree/main/genai-quickstart-pocs-python/amazon-bedrock-nova-s2s-live-podcasting-poc" target="_blank" rel="noopener noreferrer"&gt;GitHub Repository: Official AWS samples&lt;/a&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;hr&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-127170" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/26/ML-19828-image-5-100x98.png" alt="Portrait of Madhavi Evana, author and AWS expert" width="101" height="99"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Madhavi Evana&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Madhavi Evana&lt;/strong&gt; is a Solutions Architect at Amazon Web Services,&amp;nbsp;where she guides Enterprise banking customers through their cloud transformation journeys. She specializes in Artificial Intelligence and Machine Learning, with&amp;nbsp;a&amp;nbsp;focus&amp;nbsp;on&amp;nbsp;Speech-to-speech translation, video analysis and synthesis,&amp;nbsp;and&amp;nbsp;natural language processing (NLP) technologies.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-127171 size-thumbnail" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/26/ML-19828-image-4-100x107.png" alt="Portrait of Jeremiah Flom, author and AWS expert" width="100" height="107"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Jeremiah Flom&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Jeremiah Flom&lt;/strong&gt; is a Solutions Architect at AWS, where he helps customers design and build scalable cloud solutions.&amp;nbsp;He’s&amp;nbsp;passionate about exploring how intelligent systems can interact with and navigate the real world through Physical and Embodied AI.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-127173 size-thumbnail" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/26/ML-19828-image-2-100x100.png" alt="Portrait of Dexter Doyle, author and AWS expert" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Dexter Doyle&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Dexter Doyle&lt;/strong&gt; is a Senior Solutions Architect at Amazon Web Services, where he guides customers in designing secure, efficient, and high-quality cloud architectures. A lifelong music enthusiast, he loves helping customers unlock new possibilities with AWS services, with a particular focus on audio workflows.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-127172 size-thumbnail" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/26/ML-19828-image-3-100x100.png" alt="Portrait of Kalindi Vijesh Parekh, author and AWS expert" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Kalindi Vijesh Parekh&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Kalindi Vijesh Parekh&lt;/strong&gt; is a Solutions Architect at Amazon Web Services, where she combines her expertise in analytics, data streaming, and AI engineering with a commitment to helping customers realize their AWS potential.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Text-to-SQL solution powered by Amazon Bedrock</title>
		<link>https://aws.amazon.com/blogs/machine-learning/text-to-sql-solution-powered-by-amazon-bedrock/</link>
					
		
		<dc:creator><![CDATA[Monica Jain]]></dc:creator>
		<pubDate>Tue, 07 Apr 2026 16:28:20 +0000</pubDate>
				<category><![CDATA[Amazon Bedrock AgentCore]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">018bc0596e0f48845bb01ac8e0b7e4d08fb6602a</guid>

					<description>In this post, we show you how to build a natural text-to-SQL solution using Amazon Bedrock that transforms business questions into database queries&amp;nbsp;and returns actionable answers.</description>
										<content:encoded>&lt;p&gt;Building a text-to-SQL solution using Amazon Bedrock can alleviate one of the most persistent bottlenecks in data-driven organizations: the delay between asking a business question and getting&amp;nbsp;a clear, data-backed answer. You might be familiar with the challenge of navigating competing priorities when your one-time question is waiting in the queue behind higher-impact work. A text-to-SQL solution augments your existing team—business users self-serve routine analytical questions, freeing up technical capacity across the organization for complex, high-value initiatives. Questions like “What is our year-over-year revenue growth by customer segment?” become accessible to anyone, without creating an additional workload for technical teams.&lt;/p&gt; 
&lt;p&gt;Many organizations find that accessing data insights remains a significant bottleneck in business decision-making processes. The traditional approach requires either learning SQL syntax, waiting for technical resources, or settling for pre-built dashboards that might not answer your specific questions.&lt;/p&gt; 
&lt;p&gt;In this post, we show you how to build a natural text-to-SQL solution using &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; that transforms business questions into database queries&amp;nbsp;and returns actionable answers. The model returns not only raw SQL, but executed results synthesized into clear, natural language narratives&amp;nbsp;in seconds rather than hours. We walk you through the architecture, implementation strategies, and lessons learned from deploying this solution at scale. By the end, you will understand how to create your own text-to-SQL system that bridges the gap between business questions and data accessibility.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Why traditional business intelligence falls short&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;It’s worth noting that tools like Amazon Quick already address many self-service analytics needs effectively, including natural language querying of dashboards and automated insight generation. These tools are an excellent fit when your analytics requirements align with structured dashboards, curated datasets, and governed reporting workflows. A custom text-to-SQL solution becomes valuable when users must query across complex, multi-table schemas with deep organizational business logic, domain-specific terminology, and one-time questions beyond what pre-configured dashboard datasets support.&lt;/p&gt; 
&lt;p&gt;Building a text-to-SQL solution surfaces three fundamental challenges that drive the need beyond traditional Business Intelligence (BI) tools:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;The SQL expertise barrier blocks rapid analysis.&lt;/strong&gt; Most business users lack the technical SQL knowledge needed to access complex data. Simple questions often require multi-table joins, temporal calculations, and hierarchical aggregations. This dependency creates bottlenecks where business users wait extended periods for custom reports, while analysts spend valuable time on repetitive query requests rather than strategic analysis.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Even modern BI systems have flexibility boundaries&lt;/strong&gt;. Modern BI tools have made significant strides in natural language querying and self-service analytics. However, these capabilities typically work best within pre-curated semantic layers, governed datasets, or pre-modeled dashboards. When business users need to explore beyond curated boundaries, one-time joins, on-the-fly organization-specific calculations, or querying raw warehouse tables outside the semantic layer, they still face constraints that require technical intervention. A custom text-to-SQL solution fills this gap by operating directly against your data warehouse schema with dynamically retrieved business context, rather than depending on pre-configured semantic models.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Context and semantic understanding create translation gaps.&lt;/strong&gt; Even with SQL access, translating business terminology into correct database queries proves to be challenging. Terms like &lt;em&gt;attainment&lt;/em&gt;, &lt;em&gt;pipeline&lt;/em&gt;, and &lt;em&gt;forecast&lt;/em&gt; each have unique calculation logic, specific data source requirements, and business rules that vary across organizations. Understanding which tables to join, how metrics are defined, and which filters to apply requires deep institutional knowledge that isn’t readily accessible to most users.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;When building your own solution, consider how your system will encode this deep business context (strategic principles, customer segmentation rules, and operational processes), so users can make faster, data-driven decisions without understanding complex database schemas or SQL syntax.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;How it works: The experience&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Before diving into architecture, here’s what the experience looks like from a user’s perspective.&lt;/p&gt; 
&lt;p&gt;A business user enters a question into a conversational interface asking something like,&amp;nbsp;&lt;em&gt;“How is revenue trending this year compared to last year across our top customer segments?”&lt;/em&gt;&amp;nbsp;Behind the scenes, the system does the following in a matter of seconds:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Understands the question.&lt;/strong&gt;&amp;nbsp;It determines whether this is a single-step lookup or a complex question that must be broken into parts. In this case, it recognizes that “revenue trending,” “year-over-year comparison,” and “top customer segments” each require distinct data retrieval steps.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Retrieves business context.&lt;/strong&gt;&amp;nbsp;The system searches a knowledge graph that encodes your organization’s specific metric definitions, business terminology, table relationships, and data rules. It knows what &lt;em&gt;revenue&lt;/em&gt; means in your environment, which tables contain it, and how &lt;em&gt;customer segment&lt;/em&gt; is defined.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Generates and validates SQL.&lt;/strong&gt;&amp;nbsp;The system produces a structured SQL query, validates it for correctness and safety using deterministic checks, and executes it against your data warehouse. If validation catches an issue, it automatically revises and retries without requiring human intervention.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Synthesizes the answer.&lt;/strong&gt;&amp;nbsp;Raw query results are translated back into a natural language narrative with supporting data, giving users both the insight and the transparency to trust it.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;The result is that business users get answers to complex analytical questions in seconds to minutes, with full visibility into the underlying logic. Analysts are relieved from repetitive query work to focus on higher-value strategic analysis.&lt;/p&gt; 
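&lt;p&gt;The four-step flow above can be sketched as a single pipeline function. Everything in the following sketch is an illustrative stand-in: the helper names, the splitting heuristic, and the sample context are invented for the example and are not part of any AWS API.&lt;/p&gt;

```python
# Illustrative sketch of the four-step workflow; every name here is
# hypothetical stand-in code, not part of any AWS API.

def decompose(question):
    # Step 1: split multi-part questions; a single-part question passes through.
    return [q.strip() for q in question.split(" and ")]

def retrieve_context(sub_question):
    # Step 2: stand-in for the knowledge graph (GraphRAG) search.
    return {"table": "revenue_facts", "metric_definition": "SUM(net_revenue)"}

def generate_and_validate_sql(sub_question, context):
    # Step 3: stand-in for structured SQL generation plus deterministic checks.
    return f"SELECT {context['metric_definition']} FROM {context['table']}"

def synthesize(question, results):
    # Step 4: turn raw results into a narrative answer.
    return f"For '{question}', the pipeline produced {len(results)} result set(s)."

def answer_question(question):
    subs = decompose(question)
    results = [generate_and_validate_sql(s, retrieve_context(s)) for s in subs]
    return synthesize(question, results)

print(answer_question("revenue trend this year and top customer segments"))
```

&lt;p&gt;In the real system, each of these stubs maps to a component described in the sections that follow; the point of the sketch is only the control flow.&lt;/p&gt;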
&lt;h2&gt;&lt;strong&gt;Solution overview&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;To deliver this experience, the solution combines three core capabilities:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Foundation models (FMs) in &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt;&amp;nbsp;for natural language understanding and SQL generation&lt;/li&gt; 
 &lt;li&gt;Graph Retrieval-Augmented Generation (GraphRAG)&amp;nbsp;for business context retrieval&lt;/li&gt; 
 &lt;li&gt;High-performance data warehouses&amp;nbsp;for fast query execution&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; plays a central role in this architecture by providing both the large language model (LLM) inference layer and the agent orchestration runtime. &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; offers access to a broad selection of FMs, so teams can choose and swap models based on evolving performance, cost, and latency requirements without re-architecting the system.&lt;/p&gt; 
&lt;p&gt;As shown in the architecture diagram:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/bedrock/agentcore/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore Runtime&lt;/a&gt;&amp;nbsp;serves as the central orchestration layer, hosting a&amp;nbsp;supervisor Agent&amp;nbsp;that coordinates the end-to-end workflow. It routes user questions, invoking the GraphRAG Search Tool for context retrieval, enforcing Row-Level Security, triggering SQL generation and validation, and executing queries against a database (Amazon Redshift). The runtime supports multiple entry points, including MCP and HTTP protocols, enabling integration with both embedded analytics surfaces like AWS Quick Sight and custom web interfaces.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/bedrock/agentcore/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt; also provides built-in&amp;nbsp;observability, feeding agent execution traces and performance metrics into &lt;a href="https://aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; for monitoring, debugging, and continuous optimization. This managed runtime alleviates the undifferentiated heavy lifting of building custom agent infrastructure, so teams can focus on business logic, prompt tuning, and domain knowledge enrichment.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The following diagram illustrates how this workflow operates:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignleft size-full wp-image-127517" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/31/image-3-11.png" alt="" width="1858" height="720"&gt;&lt;/p&gt; 
&lt;p&gt;The architecture operates as an&amp;nbsp;orchestrated multi-agent system&amp;nbsp;with five key stages:&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Stage 1: Question analysis and decomposition&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;When a question arrives, the question processor&amp;nbsp;first classifies it. Straightforward, atomic, fact-based questions like&amp;nbsp;&lt;em&gt;“What was total revenue in Q4?”&lt;/em&gt; are routed directly to the data retrieval pipeline. Complex or multi-part questions are decomposed into self-contained, independent subquestions that can be processed in parallel by separate agent teams. This decomposition step is what allows the system to handle sophisticated analytical questions that span multiple data domains, time periods, or business dimensions.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Stage 2: Knowledge graph and GraphRAG context retrieval&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;This is where the system solves the&amp;nbsp;context barrier, and it’s the most critical differentiator from naive text-to-SQL approaches.&lt;/p&gt; 
&lt;p&gt;A knowledge graph built on&amp;nbsp;&lt;a href="https://aws.amazon.com/neptune/" target="_blank" rel="noopener noreferrer"&gt;Amazon Neptune&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="https://aws.amazon.com/opensearch-service/" target="_blank" rel="noopener noreferrer"&gt;Amazon OpenSearch Service&lt;/a&gt;&amp;nbsp;serves as the semantic foundation. It stores your organization’s table ontology and captures the relationships between business entities, metrics, terminology, and organizational hierarchies. Crucially, this graph is enriched with&amp;nbsp;domain knowledge from table owners and subject matter experts for business-specific descriptions, metric definitions, terminology mappings, and classification tags loaded from structured configuration files.&lt;/p&gt; 
&lt;p&gt;When the system processes a question, it performs a&amp;nbsp;lightweight GraphRAG search&amp;nbsp;that works in three phases:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Vector search&amp;nbsp;(using &lt;a href="https://aws.amazon.com/opensearch-service/" target="_blank" rel="noopener noreferrer"&gt;Amazon OpenSearch Service&lt;/a&gt;): Finds semantically relevant column values, column names, and table descriptions that match the concepts in the user’s question.&lt;/li&gt; 
 &lt;li&gt;Graph traversal&amp;nbsp;(using &lt;a href="https://aws.amazon.com/neptune/" target="_blank" rel="noopener noreferrer"&gt;Amazon Neptune&lt;/a&gt;): Follows the relationships in the knowledge graph, from matched values to their parent columns to their parent tables, to build a complete picture of which data assets are relevant and how they connect.&lt;/li&gt; 
 &lt;li&gt;Relevance scoring and filtering: Ranks and structures the retrieved context so the SQL generator receives precisely the information it needs: the right tables, the right columns, the right join paths, and the right business logic.&lt;/li&gt; 
&lt;/ul&gt; 
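&lt;p&gt;As a toy illustration of the three phases, the sketch below substitutes keyword overlap for embedding similarity and an in-memory dict for the Neptune graph. The schema, descriptions, and scoring are invented for the example; a real deployment would use OpenSearch vector queries and Neptune traversals instead.&lt;/p&gt;

```python
# Toy GraphRAG search: (1) semantic match, (2) graph traversal, (3) filtering.

# Phase 1 stand-in: keyword overlap instead of embedding similarity.
def match_score(query_terms, description):
    terms = set(description.lower().split())
    return len(query_terms.intersection(terms)) / max(len(query_terms), 1)

# Knowledge-graph stand-in: column to parent-table edges, plus descriptions.
COLUMNS = {
    "net_revenue": ("sales_facts", "monthly net revenue by customer"),
    "segment_name": ("customer_dim", "customer segment classification"),
    "ship_date": ("logistics_facts", "order shipment date"),
}

def graph_rag_search(question, threshold=0.2):
    query_terms = set(question.lower().split())
    hits = []
    for column, (table, description) in COLUMNS.items():
        score = match_score(query_terms, description)
        if score >= threshold:  # Phase 3: relevance filter
            # Phase 2 analogue: follow the column-to-table edge.
            hits.append({"table": table, "column": column, "score": score})
    return sorted(hits, key=lambda h: -h["score"])

context = graph_rag_search("revenue by customer segment")
print([h["table"] for h in context])  # most relevant tables first
```

&lt;p&gt;The output hands the SQL generator a ranked, filtered view of which tables and columns matter for the question, which is the essential contract of this stage.&lt;/p&gt;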
&lt;p&gt;The knowledge graph and its associated data are&amp;nbsp;refreshed regularly&amp;nbsp;to reflect schema changes, new tables, and evolving business definitions. The richer this contextual layer, the more accurate the downstream SQL generation becomes.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Stage 3: Structured SQL generation and validation&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;The system uses the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/tool-use.html" target="_blank" rel="noopener noreferrer"&gt;function calling&lt;/a&gt; capabilities of Amazon Bedrock&amp;nbsp;to produce SQL queries as structured data. This enforces strict output formats, alleviates the need for fragile post-processing or complex regular expressions, and significantly improves reliability.&lt;/p&gt; 
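&lt;p&gt;A minimal sketch of such a tool definition is shown below. The &lt;code&gt;toolSpec&lt;/code&gt;/&lt;code&gt;inputSchema&lt;/code&gt; shape follows the Amazon Bedrock Converse API documentation for tool use; the tool name and its fields are our own illustrative choices, so verify the exact structure against the current API reference before relying on it.&lt;/p&gt;

```python
# Tool definition forcing the model to emit SQL as structured JSON via the
# Bedrock Converse API's tool-use feature. The outer toolSpec/inputSchema
# shape follows the Converse API docs; the tool name and fields are
# illustrative assumptions.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "emit_sql",
            "description": "Return the generated SQL query as structured data.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {
                    "sql": {"type": "string",
                            "description": "Executable SELECT statement"},
                    "tables_used": {"type": "array",
                                    "items": {"type": "string"}},
                    "explanation": {"type": "string"},
                },
                "required": ["sql", "tables_used"],
            }},
        }
    }],
    # Force the model to call this tool rather than reply in free text.
    "toolChoice": {"tool": {"name": "emit_sql"}},
}

print(tool_config["tools"][0]["toolSpec"]["name"])
```

&lt;p&gt;Passed as the &lt;code&gt;toolConfig&lt;/code&gt; parameter of a Converse request, this guarantees the response arrives as parseable JSON matching the schema, which is what removes the need for fragile post-processing.&lt;/p&gt;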
&lt;p&gt;Generated queries then pass through&amp;nbsp;deterministic SQL validators&amp;nbsp;operating at the Abstract Syntax Tree (AST) level. These validators proactively flag potentially risky operations: queries that are syntactically correct but semantically dangerous (for example, unbounded scans, missing filters, or incorrect aggregation logic). When a validator flags an issue, it returns detailed feedback explaining the problem and suggesting a revision.&lt;/p&gt; 
&lt;p&gt;To further enhance robustness, the entire cycle is wrapped in a&amp;nbsp;lightweight SQL generation agent&amp;nbsp;that automatically iterates until it produces a valid, executable query or exhausts a configurable retry limit. This approach aims to deliver&amp;nbsp;significantly better reliability than prompt engineering alone.&lt;/p&gt; 
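&lt;p&gt;The validate-feedback-retry pattern can be sketched as follows. Here the stdlib &lt;code&gt;sqlite3&lt;/code&gt; parser stands in for the warehouse: real validators would inspect a full AST in your warehouse dialect rather than rely on &lt;code&gt;EXPLAIN&lt;/code&gt;, and the list of drafts stands in for successive model outputs after feedback, but the loop structure is the same.&lt;/p&gt;

```python
import sqlite3

# Deterministic validation plus a bounded retry loop. sqlite3 stands in
# for the warehouse's parser; the pattern is validate, feed back, retry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (segment TEXT, fiscal_year INT, amount REAL)")

def validate(sql):
    """Return None if the query passes, else a feedback string."""
    if not sql.lstrip().upper().startswith("SELECT"):
        return "only read-only SELECT statements are allowed"
    try:
        conn.execute("EXPLAIN QUERY PLAN " + sql)  # syntax and schema check
    except sqlite3.Error as err:
        return str(err)
    return None

def generate_with_retries(drafts, max_attempts=3):
    # `drafts` stands in for the model's successive outputs after feedback.
    for sql in drafts[:max_attempts]:
        if validate(sql) is None:
            return sql
    raise RuntimeError("no valid query within the retry budget")

drafts = [
    "SELECT segment, SUM(amount) FROM revenues",  # wrong table name: rejected
    "SELECT segment, SUM(amount) FROM revenue GROUP BY segment",
]
print(generate_with_retries(drafts))
```

&lt;p&gt;Because the first draft references a nonexistent table, the validator rejects it with concrete feedback and the loop proceeds to the corrected draft, all without human intervention.&lt;/p&gt;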
&lt;h3&gt;&lt;strong&gt;Stage 4: Test-time parallel compute&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;For ambiguous or complex questions, the system can generate&amp;nbsp;multiple potential answers or reasoning paths simultaneously&amp;nbsp;by submitting the same question to parallel agents. Results are synthesized through majority voting, selecting the most reliable output. This is particularly valuable for questions that can be interpreted in multiple ways, and it meaningfully improves both accuracy and robustness.&lt;/p&gt; 
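&lt;p&gt;A minimal sketch of this pattern is shown below; &lt;code&gt;run_agent&lt;/code&gt; is a deterministic stub, whereas in the real system each call would be an independent agent invocation through Amazon Bedrock.&lt;/p&gt;

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Test-time parallel compute: submit the same question to several agents
# concurrently and pick the most common answer by majority vote.

def run_agent(question, seed):
    # Stub standing in for an independent agent run: most agree, one path dissents.
    return "up 12% YoY" if seed % 3 else "up 9% YoY"

def majority_vote(question, n_agents=5):
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda s: run_agent(question, s), range(n_agents)))
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes

answer, votes = majority_vote("How is revenue trending this year?")
print(answer, votes)
```

&lt;p&gt;Running the agents in a thread pool keeps wall-clock latency close to a single agent’s latency while the vote filters out the occasional divergent interpretation.&lt;/p&gt;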
&lt;h3&gt;&lt;strong&gt;Stage 5: Response synthesis&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Finally, raw query results including numbers, data frames, and execution logs are synthesized into&amp;nbsp;natural language narratives&amp;nbsp;that users receive as actionable answers. Full query transparency is maintained: users can inspect the generated SQL and underlying data at any time, building trust in the system’s outputs.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Key strategies for production-quality results&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Architecture alone isn’t enough. The following strategies, learned from deploying this solution at scale, are essential for achieving the accuracy, safety, and responsiveness that production use demands.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Let end users shape the prompts&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Even among experienced users, individuals often have differing default interpretations of ambiguous terms and varying expectations regarding responses to vague questions. We recommend building a&amp;nbsp;&lt;strong&gt;customization interface&lt;/strong&gt;, such as a web application, so table owners and designated power users can customize prompts within governed boundaries. Customizations should pass through validation guardrails that enforce content policies, restrict prompt injection attempts, and make sure modifications stay within approved templates and parameters. This helps prevent unrestricted free-text modifications while still incorporating domain knowledge and preferences into the system. This customization capability proves essential for achieving the nuanced understanding that different business domains require. Your solution should accommodate these variations rather than enforcing a one-size-fits-all approach.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Treat SQL validation as a safety-critical layer&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Prompt engineering alone can’t remove errors that produce syntactically valid but semantically incorrect SQL. These errors are particularly dangerous because they return plausible-looking results that can silently erode user trust or drive incorrect decisions. Because SQL is a well-defined language,&amp;nbsp;deterministic validators&amp;nbsp;can catch a broad class of these errors before the query reaches your database. In internal testing, this validation layer caught serious errors in generated queries before they could execute. Prioritize it as a non-negotiable safety mechanism.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Optimize aggressively for latency&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Users accustomed to conversational AI expect near-instant responses. While retrieving live data and performing calculations inherently takes longer than answering from a static knowledge base, latency must still be actively managed as a first-class user experience concern. Performance analysis reveals that the cumulative time spent across the workflow’s many steps (context retrieval, SQL generation, validation, and synthesis), rather than SQL execution alone, dominates end-to-end response time and represents the largest optimization opportunity.&lt;/p&gt; 
&lt;p&gt;To optimize, focus on:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Parallel agent execution&lt;/strong&gt;&amp;nbsp;– Process multi-part questions concurrently rather than sequentially. This can dramatically reduce total time for complex queries.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;High-performance analytical storage&lt;/strong&gt;&amp;nbsp;– Use column-oriented databases that excel at the aggregation-heavy workloads typical in business intelligence.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Token optimization&lt;/strong&gt;&amp;nbsp;– Minimize input and output tokens per agent interaction through prompt optimization and response format standardization. Reduce reliance on tool-calling agentic frameworks where each call forces the agent to re-ingest growing context.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;With these optimizations, in our deployment, simple SQL queries are typically generated in approximately 3–5 seconds. Actual response times will vary based on factors such as data warehouse performance, query complexity, model selection, and knowledge graph size. We recommend benchmarking against your own environment to establish realistic latency targets for interactive business analysis.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Build security and governance in from the start&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;Implement&amp;nbsp;Row-Level Security (RLS)&amp;nbsp;integration so that users only ever see data they are authorized to access. The system maintains composite entitlement tables that enforce access control policies from your existing organizational systems. When a user submits a query, appropriate RLS filters are&amp;nbsp;automatically injected&amp;nbsp;into the generated SQL before execution. They’re transparent to the user, but rigorous in enforcement. Design this layer to uphold strict data governance standards without adding friction to the user experience.&lt;/p&gt; 
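&lt;p&gt;One simple, dialect-agnostic way to inject such a filter is to wrap the generated query in a filtered subquery, as in the sketch below. A production implementation would typically rewrite the query’s AST instead, and the entitlement table here is a hypothetical stand-in for entitlements synced from your organizational systems.&lt;/p&gt;

```python
# Illustrative row-level security injection: wrap the generated SQL in a
# filtered subquery so the entitlement filter applies to its full output.
# ENTITLEMENTS is a toy stand-in for composite entitlement tables.

ENTITLEMENTS = {
    "alice": ["ENTERPRISE", "STARTUP"],  # segments alice may see
    "bob": ["STARTUP"],
}

def apply_rls(sql, user, column="segment"):
    allowed = ENTITLEMENTS.get(user, [])
    if not allowed:
        raise PermissionError(f"{user} has no entitlements")
    quoted = ", ".join(f"'{seg}'" for seg in allowed)
    # The subquery wrapper keeps the rewrite independent of the inner
    # query's structure (joins, GROUP BY, and so on).
    return f"SELECT * FROM ({sql}) AS q WHERE q.{column} IN ({quoted})"

secured = apply_rls(
    "SELECT segment, SUM(amount) AS amount FROM revenue GROUP BY segment",
    "bob",
)
print(secured)
```

&lt;p&gt;Because the filter is appended programmatically rather than by the model, the entitlement check is deterministic and cannot be bypassed by a cleverly worded question.&lt;/p&gt;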
&lt;h2&gt;&lt;strong&gt;Implementation results and impact&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;After you follow the architecture and strategies outlined in this post, a text-to-SQL solution can deliver significant improvements in data accessibility and analytical productivity:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Speed improvements&lt;/strong&gt; deliver answers to complex business questions in minutes, compared to hours or days with traditional approaches. Questions requiring multi-table joins, temporal calculations, and hierarchical aggregations that previously required custom SQL development become accessible through natural language.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Analytical democratization&lt;/strong&gt; helps non-technical business users across sales operations, financial planning, and executive leadership perform sophisticated data analysis without SQL expertise. This typically reduces analytical workload on data engineering teams, allowing them to focus on strategic initiatives rather than repetitive query requests.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Complex query handling&lt;/strong&gt; supports multi-dimensional revenue analysis with the following capabilities: 
  &lt;ul&gt; 
    &lt;li&gt;Automatic segmentation&lt;/li&gt; 
    &lt;li&gt;Year-over-year and month-over-month trending with variance explanations&lt;/li&gt; 
    &lt;li&gt;Customer intelligence at granular levels with usage patterns&lt;/li&gt; 
    &lt;li&gt;Forecast variance analysis with target comparisons&lt;/li&gt; 
    &lt;li&gt;Cross-functional benchmarking across time periods and business units&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;&lt;strong&gt;Looking forward&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Text-to-SQL solutions powered by &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; represent a significant step forward in making data analytics accessible to business users. The multi-agent architecture using &lt;a href="https://aws.amazon.com/bedrock/agents/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Agents&lt;/a&gt; supports complex query decomposition and parallel processing, while knowledge graphs provide business context and semantic understanding. Together, these components deliver accurate, fast, and accessible analytics that empower business users to make data-driven decisions without technical barriers.&lt;/p&gt; 
&lt;p&gt;As you build your own solution, consider expanding knowledge graph coverage to additional business domains, optimizing response latency through advanced caching strategies, and integrating with more enterprise data sources. &lt;a href="https://aws.amazon.com/bedrock/guardrails/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Guardrails&lt;/a&gt; offer enhanced output validation and safety capabilities worth exploring, while &lt;a href="https://aws.amazon.com/bedrock/flows/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Flows&lt;/a&gt; provide sophisticated orchestration patterns for agentic workflows.&lt;/p&gt; 
&lt;p&gt;The FM flexibility, agent orchestration capabilities, and knowledge base integration available through &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; continue to evolve, making data analysis increasingly intuitive and powerful for business users across organizations.&lt;/p&gt; 
&lt;p&gt;To build your own text-to-SQL solution, explore the &lt;a href="https://docs.aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock User Guide&lt;/a&gt;, participate in an &lt;a href="https://builder.aws.com/build/workshops?trk=aca14daf-abad-48ab-b076-80aef7f8194d&amp;amp;sc_channel=el" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Workshop&lt;/a&gt;, and review our guide on&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/machine-learning/building-generative-ai-agents-with-amazon-bedrock/" target="_blank" rel="noopener noreferrer"&gt;Building generative AI agents with Amazon Bedrock&lt;/a&gt;. For the latest developments, see &lt;a href="https://aws.amazon.com/new/" target="_blank" rel="noopener noreferrer"&gt;What’s New with AWS&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Acknowledgments&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;We extend our sincere gratitude to our executive sponsors and mentors whose vision and guidance made this initiative possible:&amp;nbsp;&lt;a href="https://www.linkedin.com/in/aizazmanzar/" target="_blank" rel="noopener noreferrer"&gt;Aizaz Manzar&lt;/a&gt;, Director of AWS Global Sales;&amp;nbsp;&lt;a href="https://www.linkedin.com/in/aliimam27/" target="_blank" rel="noopener noreferrer"&gt;Ali Imam&lt;/a&gt;, Head of Startup Segment; and&amp;nbsp;&lt;a href="https://www.linkedin.com/in/akhand17/" target="_blank" rel="noopener noreferrer"&gt;Akhand Singh&lt;/a&gt;, Head of Data Engineering.&lt;/p&gt; 
&lt;hr&gt; 
&lt;footer&gt; 
 &lt;h2&gt;&lt;strong&gt;About the Authors&lt;/strong&gt;&lt;/h2&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-thumbnail wp-image-127507" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/31/ML-20289-image-2-1-100x150.jpeg" alt="" width="100" height="150"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Monica Jain&lt;/h3&gt; 
  &lt;p&gt;Monica Jain is a Senior Technical Product Manager at AWS Global Sales and an analytics professional driving AI-powered sales intelligence at scale. She leads the development of generative AI and ML-powered data products, including knowledge graphs, AI-augmented analytics, natural language query systems, and recommendation engines, that improve seller productivity and decision-making. Her work enables AWS executives and sellers worldwide to access real-time insights and accelerate data-driven customer engagement and revenue growth.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-thumbnail wp-image-127506" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/31/ML-20289-image-3-1-100x124.jpeg" alt="" width="100" height="124"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Damien Forthomme&lt;/h3&gt; 
  &lt;p&gt;Damien Forthomme is a Senior Applied Scientist at AWS, leading a Data Science team in the AWS Sales, Marketing, and Global Services (SMGS) org. With 10+ years of experience and a PhD in Physics, he focuses on leveraging and building advanced machine learning and GenAI tools to surface the right data to the right people at the right time. His work encompasses initiatives such as forecasting, recommendation systems, core foundational datasets creation, and building GenAI products that enhance sales productivity for our org.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-thumbnail wp-image-127505" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/31/ML-20289-image-4-1-100x132.png" alt="" width="100" height="132"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Matheus Cachoeira&lt;/h3&gt; 
  &lt;p&gt;Matheus Cachoeira is a Senior Product Manager in the AWS Sales, Marketing, and Global Services (SMGS) org. He has been with AWS for over 7 years, focusing on Sales and Revenue Planning. Passionate about solving complex problems at the intersection of data, AI, and business, he specializes in creating solutions that require deep business context and comprehensive domain knowledge.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-thumbnail wp-image-127511" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/31/ML-20289-image-5-2-100x133.png" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Meng Feng&lt;/h3&gt; 
  &lt;p&gt;Meng Feng is an Applied Scientist at AWS, where he develops automated solutions for data query, forecasting, and analysis, leveraging artificial intelligence and machine learning. He has a background in robotics, reinforcement learning, and planning. At AWS, he is passionate about applying cutting-edge technology to solve real-world challenges, focusing on selecting the most effective tools for the job to deliver impactful results.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-full wp-image-127503" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/31/ML-20289-image-6-1.jpeg" alt="" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Norman Braddock&lt;/h3&gt; 
  &lt;p&gt;Norman Braddock, Senior Manager of AI Product Management at AWS, is a product leader driving the transformation of business intelligence through agentic AI. He leads the Analytics &amp;amp; Insights Product Management team within Sales, Marketing, and Global Services (SMGS), delivering products that bridge AI model performance with measurable business impact. With a background spanning procurement, manufacturing, and sales operations, he combines deep operational expertise with product innovation to shape the future of autonomous business management.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-thumbnail wp-image-127502" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/31/ML-20289-image-7-1-100x113.png" alt="" width="100" height="113"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Terry Ding&lt;/h3&gt; 
  &lt;p&gt;Terry Ding is a Senior Applied Scientist at AWS, working within the AWS Sales, Marketing, and Global Services (SMGS) organization. With deep expertise in Large Language Models (LLMs) and Generative AI, he specializes in designing, developing, and productionizing GenAI applications at scale. His work spans the full lifecycle of AI solutions—from conducting rapid proof-of-concepts (POCs) to deploying production-ready systems that drive measurable business impact.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-thumbnail wp-image-127501" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/31/ML-20289-image-8-1-100x131.png" alt="" width="100" height="131"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Sujit Narapareddy&lt;/h3&gt; 
   &lt;p&gt;Sujit Narapareddy, Head of Data &amp;amp; Analytics at AWS Global Sales, is a technology leader driving global enterprise transformation. He leads data product and system teams that power AWS’s Go-to-Market through AI-augmented analytics and intelligent automation. With a proven track record in enterprise solutions, he has transformed sales productivity, data governance, and operational excellence. Previously at JPMorgan Chase Business Banking, he shaped next-generation FinTech capabilities through data innovation.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Build AI-powered employee onboarding agents with Amazon Quick</title>
		<link>https://aws.amazon.com/blogs/machine-learning/build-ai-powered-employee-onboarding-agents-with-amazon-quick/</link>
					
		
		<dc:creator><![CDATA[Pegah Ojaghi]]></dc:creator>
		<pubDate>Mon, 06 Apr 2026 18:00:06 +0000</pubDate>
				<category><![CDATA[Amazon Quick Suite]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">c4b3cc442350a2b8fa21d74a633d942bbc24da1d</guid>

					<description>In this post, we walk through building a custom HR onboarding agent with Quick. We show how to configure an agent that understands your organization’s processes, connects to your HR systems, and automates common tasks, such as answering new-hire questions and tracking document completion.</description>
										<content:encoded>&lt;p&gt;Enterprises often struggle to onboard new team members at scale. Human resources (HR) teams spend time on manual tasks that delay productivity, from processing documents to answering repeated questions about benefits and policies. For organizations with many new hires, these steps make it harder to keep onboarding consistent and compliant. Organizations lose substantial amounts of time per day per new hire during onboarding, with new employees typically reaching only a fraction of their potential productivity in the first month. &lt;a href="https://aws.amazon.com/quick" target="_blank" rel="noopener"&gt;Amazon Quick&lt;/a&gt; is a fully managed agentic service. With it, HR departments can create no-code onboarding agents that answer new-hire questions, track compliance across existing tools, and clear tickets automatically so that new hires can ramp faster with less manual work.&lt;/p&gt; 
&lt;p&gt;In this post, we walk through building a custom HR onboarding agent with Quick. We show how to configure an agent that understands your organization’s processes, connects to your HR systems, and automates common tasks, such as answering new-hire questions and tracking document completion. You can adapt this solution to your onboarding workflow so new hires get consistent answers and HR teams reclaim time previously spent on routine inquiries.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Key components of Amazon Quick&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Quick transforms employee onboarding from scattered documents and manual processes into an intelligent, connected experience through the following integrated components:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Knowledge&lt;/strong&gt; &lt;strong&gt;bases&lt;/strong&gt; – Indexed content from external sources like SharePoint, OneDrive, and Confluence, as well as internal content including internal websites, file uploads, and &lt;a href="http://aws.amazon.com/s3" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service&lt;/a&gt; (Amazon S3) buckets. A knowledge base serves as a single searchable repository, so new hires get comprehensive answers from multiple sources instead of searching through disconnected files.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Actions (action connectors) &lt;/strong&gt;– Secure, permission-aware integrations that enable AI agents to take real action in HR onboarding scenarios—creating ServiceNow IT equipment requests, sending Slack welcome messages to team channels, or updating onboarding workflows in project management tools—rather than just providing links to forms.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Spaces&lt;/strong&gt; – Focused environments that organize team-centered assets including files, business intelligence artifacts (such as dashboards and topics), knowledge bases, and actions with sharing controls for team collaboration.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Quick can help HR teams create specialized onboarding assistants that combine knowledge access with automated tasks. You can use the built-in system agent (“My assistant”) for immediate help or create custom chat agents tailored to your organization’s specific onboarding needs, such as a dedicated HR onboarding assistant that knows your company policies and can automatically handle common requests like IT setup or benefits enrollment.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;This solution uses a custom chat agent in Quick for employee onboarding. Without an agent, HR might switch between wikis, SharePoint, ticketing, chat, and email to coordinate each step. With Quick, the agent presents the latest checklist from the HR space, answers with approved language, opens requests through actions, notifies stakeholders, and points the employee to the next step. Confirmations and status remain in the HR tools, and the agent reads or updates them through actions or flows. The following diagram illustrates the solution architecture.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="wp-image-123113 size-full aligncenter" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/01/20/Screenshot-2026-01-20-at-9.50.49 AM.png" alt="" width="2038" height="1016"&gt;&lt;/p&gt; 
&lt;p&gt;Implementing the solution consists of the following high-level steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create the chat agent in Quick.&lt;/li&gt; 
 &lt;li&gt;Attach the HR space and link knowledge sources.&lt;/li&gt; 
 &lt;li&gt;Add actions.&lt;/li&gt; 
 &lt;li&gt;Test with real questions and tasks, then share with employees.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Quick provides two types of chat agents that facilitate this onboarding solution: the system chat agent (“My assistant”) and custom chat agents. “My assistant” appears on the Amazon Quick console by default and helps users ask questions and complete tasks using resources they are allowed to access. Users can interact with the system agent in multiple ways:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Ask general questions using the agent’s built-in knowledge by choosing &lt;strong&gt;General knowledge&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Upload their own files directly in chat (up to 20 files per conversation) for analysis and questions.&lt;/li&gt; 
 &lt;li&gt;Control the conversation scope by choosing from three modes: &lt;strong&gt;All data &amp;amp; apps&lt;/strong&gt; (searches across all accessible resources), &lt;strong&gt;General knowledge&lt;/strong&gt; (uses only built-in knowledge), or &lt;strong&gt;Specific data &amp;amp; apps&lt;/strong&gt; (targets particular spaces, &lt;a href="https://docs.aws.amazon.com/quicksuite/latest/userguide/working-with-dashboards.html" target="_blank" rel="noopener noreferrer"&gt;dashboards&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/quicksuite/latest/userguide/topics.html" target="_blank" rel="noopener noreferrer"&gt;topics&lt;/a&gt;, knowledge bases, or actions).&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;For example, a user might upload their employee handbook and ask, “What’s our remote work policy?” or select the HR space and ask, “How do I enroll in the health insurance plan?” The system agent is available immediately with no configuration required and adapts its responses based on the selected scope and available resources.&lt;/p&gt; 
&lt;p&gt;Custom agents help you build specialized assistants for your business needs. You configure behavior (purpose, tone, response format); attach spaces with dashboards, topics, and knowledge bases for grounded answers; and link action connectors so the agent can perform tasks in tools like Jira, Slack, ServiceNow, Salesforce, Outlook, or Teams. You can share custom agents with specific users or groups. Custom agents offer the following capabilities:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Use case-specific responses &lt;/strong&gt;– Define the agent’s persona and response style tailored to specific business workflows and requirements.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Guidance through reference documents&lt;/strong&gt; – Upload specific documents that serve as response templates for consistent messaging and process guides for following specific steps.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Comprehensive data integration&lt;/strong&gt; – Link spaces to the agent to give it access to different types of searchable content and knowledge sources, including dashboards for analytics, topics for structured datasets, knowledge bases for external, unstructured document repositories, and local files uploaded directly to the space for additional information. This helps the agent answer questions using different relevant data within the organization’s permission structure.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Automated actions&lt;/strong&gt; – Add action connectors so users can create Jira tickets, send Slack messages, update Salesforce, or open ServiceNow requests directly from chat.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Collaboration&lt;/strong&gt; – Test, refine, and share agents with teammates. Administrators can control who can create and customize agents through user subscriptions and custom permissions.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;You can use the system chat agent for general assistance across Quick, or create a custom agent tailored to a workflow such as HR onboarding. In that case, you define instructions, attach the HR space or knowledge base, and enable actions for requests and notifications.&lt;/p&gt; 
&lt;p&gt;In the following sections, we walk through the steps to implement this solution using two personas: the HR administrator who sets up and shares the agent, and the employee who completes onboarding tasks with the agent.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;Before you begin, make sure you have completed the following steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create an AWS account. For more information, see &lt;a href="https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-creating.html" target="_blank" rel="noopener noreferrer"&gt;Create an AWS account&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Confirm you have access to Quick.&lt;/li&gt; 
 &lt;li&gt;Have at least one Amazon Quick Enterprise subscription to configure actions and create knowledge bases. Users who only use the shared agent can be on the Amazon Quick Professional subscription.&lt;/li&gt; 
 &lt;li&gt;Go to &lt;a href="https://www.atlassian.com/get-started" target="_blank" rel="noopener noreferrer"&gt;Get started with Atlassian Cloud&lt;/a&gt; and create a free site, selecting both &lt;strong&gt;Confluence&lt;/strong&gt; and &lt;strong&gt;Jira&lt;/strong&gt; on the Free plan (up to 10 users). 
  &lt;ol type="a"&gt; 
   &lt;li&gt;In Confluence, create an “HR Onboarding” space to store your HR content.&lt;/li&gt; 
   &lt;li&gt;In Jira, create a simple HR onboarding project that the agent can use for access or equipment requests in the &lt;strong&gt;Add actions&lt;/strong&gt; section.&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;Download the ZIP file from the &lt;a href="https://catalog.us-east-1.prod.workshops.aws/workshops/119307ce-4c43-4e96-887c-cd8454b3d229/en-US/0100-introduction/0110-workshop-materials" target="_blank" rel="noopener noreferrer"&gt;HR onboarding workshop materials page&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;From the &lt;strong&gt;HR documents&lt;/strong&gt; folder in the ZIP file, upload the following files into your HR Onboarding Confluence space: 
  &lt;ol type="a"&gt; 
   &lt;li&gt;&lt;code&gt;employee_handbook.pdf&lt;/code&gt;&lt;/li&gt; 
   &lt;li&gt;&lt;code&gt;leave_policy.pdf&lt;/code&gt;&lt;/li&gt; 
   &lt;li&gt;&lt;code&gt;onboarding_checklist.pdf&lt;/code&gt;&lt;/li&gt; 
   &lt;li&gt;&lt;code&gt;performance_review_guidelines.pdf&lt;/code&gt;&lt;/li&gt; 
   &lt;li&gt;&lt;code&gt;public_holidays.csv&lt;/code&gt; (optional, used later for reporting or analytics)&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;If your organization already uses a corporate Confluence site, you might not have permission to create spaces or upload sample files unless you request additional access from your Confluence administrator. To experience the value of Quick without waiting on admin changes, use a separate Atlassian Cloud site to follow this post.&lt;/p&gt; 
&lt;h2&gt;Implementation steps&lt;/h2&gt; 
&lt;h2&gt;HR administrator&lt;/h2&gt; 
&lt;p&gt;The following sequence diagram shows how the HR administrator creates, configures, and shares the HR onboarding agent in Quick.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-123129 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/01/20/image-2-5.png" alt="" width="1428" height="662"&gt;&lt;/p&gt; 
&lt;h3&gt;Create chat agent&lt;/h3&gt; 
&lt;p&gt;First, you create the chat agent itself, which becomes the single place where new hires ask questions and get guided through onboarding:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;On the Quick console, choose &lt;strong&gt;Chat agents&lt;/strong&gt; in the navigation pane, then choose &lt;strong&gt;Create&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a simple natural language prompt describing what you want your agent to do (for example, “Help new employees with HR onboarding questions and equipment requests”).&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Quick will automatically expand your prompt into a detailed persona and response instructions and scan your available resources to link relevant spaces and action connectors to the agent.&lt;/p&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Review the generated agent configuration and refine it as needed; updating the preview saves your versions within the session.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Launch chat agent&lt;/strong&gt; when you are satisfied.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-123130 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/01/20/image-3-8.png" alt="" width="936" height="596"&gt;&lt;/p&gt; 
&lt;h3&gt;Configure behavior&lt;/h3&gt; 
&lt;p&gt;Next, you shape how the agent should respond so its tone, scope, and guardrails match your HR policies and brand voice:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Agent metadata &lt;/strong&gt;– Update the agent’s name, description, welcome message, and starter prompts to help users discover and use the chat agent properly. These elements serve as the first impression and guide users on how to interact effectively with your HR assistant.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Agent instructions &lt;/strong&gt;– Review and update the automatically generated persona instructions, response format, tone, and length settings from the previous step. The system-generated inputs provide a solid foundation, but you can fine-tune to match your organization’s specific HR communication style and requirements.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Reference documents &lt;/strong&gt;– Upload specific guidance documents that provide the highest priority instructions for agent behavior. These reference documents will be followed as prescribed while you can use the instruction fields to provide high-level guidance on behavior and goals.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-123131 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/01/20/image-4-7.png" alt="" width="1328" height="916"&gt;&lt;/p&gt; 
&lt;h3&gt;Connect HR knowledge&lt;/h3&gt; 
&lt;p&gt;Now you connect your HR knowledge sources so the agent answers from approved handbooks and policies instead of inventing its own language:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create or choose an existing HR space that holds handbooks, policies, and checklists. By configuring the agent’s knowledge scope to focus specifically on HR-related content, you make sure responses stay within appropriate boundaries and don’t access unrelated organizational data.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Upload files&lt;/strong&gt; to upload files to the space, including: 
  &lt;ol type="a"&gt; 
   &lt;li&gt;Employee handbooks and policy documents&lt;/li&gt; 
   &lt;li&gt;Benefits information and FAQ documents&lt;/li&gt; 
   &lt;li&gt;Training materials and guides&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;Link knowledge sources such as SharePoint or a wiki.&lt;/li&gt; 
 &lt;li&gt;Link the configured space to your agent so it can access this approved searchable content for grounded responses.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-123137 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/01/20/Screenshot-2026-01-20-at-12.14.27 PM.png" alt="" width="2146" height="1372"&gt;&lt;/p&gt; 
&lt;h3&gt;Add actions&lt;/h3&gt; 
&lt;p&gt;After the agent can answer questions, you add actions so it can also trigger work in your HR tools, such as tickets, requests, and notifications:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the &lt;strong&gt;Actions&lt;/strong&gt; card and choose &lt;strong&gt;Link actions&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select from available action connectors that you have already configured. For the HR onboarding use case, this could include tools such as Jira (to create and update tickets), ServiceNow (to manage incidents), or Microsoft Outlook (to send emails).&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Only action connectors configured with the necessary OAuth details can be linked to the agent, so end users can authenticate individually during their chat. Update your reference documents and persona instructions to specify when to invoke specific action connectors. For example: “When an employee requests equipment, use the ServiceNow connector to create a hardware request ticket,” or “For access requests, create a Jira ticket in the IT-Access project with priority set to ‘Normal.’”&lt;/p&gt; 
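The connector is configured in the console rather than in code, but it can help to see the shape of the request such a connector issues on the employee's behalf. The following sketch builds a create-issue payload for Jira Cloud's REST API v3; the project key, summary, and field values are illustrative and not taken from this walkthrough.

```python
import json

def build_issue_payload(project_key, summary, description, issue_type="Task"):
    """Build the JSON body for Jira Cloud's POST /rest/api/3/issue endpoint."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": summary,
            # Jira Cloud API v3 expects descriptions in Atlassian Document Format
            "description": {
                "type": "doc",
                "version": 1,
                "content": [{"type": "paragraph",
                             "content": [{"type": "text", "text": description}]}],
            },
            "issuetype": {"name": issue_type},
        }
    }

# Hypothetical HR onboarding project key and request details
payload = build_issue_payload(
    project_key="HR",
    summary="Laptop request for new hire",
    description="New hire needs a standard laptop by Day 1.",
)
print(json.dumps(payload, indent=2))

# Sending the request requires a real site URL and API token, for example:
# import requests
# requests.post("https://your-site.atlassian.net/rest/api/3/issue",
#               json=payload, auth=("you@example.com", "API_TOKEN"))
```

The agent handles this request construction for you; the sketch only shows why the connector needs a project key and OAuth credentials up front.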
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-123133 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/01/20/image-6-5.png" alt="" width="1429" height="374"&gt;&lt;/p&gt; 
&lt;h3&gt;Customize, test, and share&lt;/h3&gt; 
&lt;p&gt;Finally, customize the agent with a welcome message and suggested prompts. Test the agent with real questions and tasks in the preview chat, tune the experience, and share it with a pilot group so HR can validate the workflow before broad rollout.&lt;/p&gt; 
&lt;p&gt;When you’re ready, launch the agent, and it will be available in your personal library for private use. To share it with others, choose &lt;strong&gt;Share&lt;/strong&gt; and add users and user groups as viewers so they can use the agent. You can also make other users on your team owners so they can edit and test the agent with you. HR managers can share the custom agent with new employees by using the sharing options in the navigation pane to grant access to specific team members or groups.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-123134 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/01/20/image-7-3.png" alt="" width="936" height="458"&gt;&lt;/p&gt; 
&lt;h2&gt;Employee&lt;/h2&gt; 
&lt;p&gt;The following sequence diagram shows how an employee uses the onboarding agent to complete required tasks and track their Day 1 progress in one place.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-123135 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/01/20/image-8-3.png" alt="" width="1432" height="556"&gt;&lt;/p&gt; 
&lt;h3&gt;Use the onboarding agent&lt;/h3&gt; 
&lt;p&gt;After the agent is published and shared with employees as viewers, they can open it from the link HR provides (for example, in their Day 1 email or HR portal) or from the chat agents list in Quick, and then use it as follows:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;The employee opens the shared HR onboarding agent from the link or from the chat agents list and starts a new Day 1 conversation.&lt;/li&gt; 
 &lt;li&gt;The agent shows the latest onboarding checklist from the HR Onboarding space and provides links to required forms, training, and internal pages so the employee can move through the steps in order.&lt;/li&gt; 
 &lt;li&gt;The employee asks policy or benefits questions in plain language, and the agent answers using content from the HR Onboarding space and connected HR knowledge sources so responses match HR-approved language.&lt;/li&gt; 
 &lt;li&gt;In this setup, when the employee requests equipment or application access, the agent uses a Jira action connector to create an issue in the HR onboarding project and returns the issue key and link, so the request is visible end to end without touching production HR systems.&lt;/li&gt; 
 &lt;li&gt;For sensitive steps such as I-9 verification, tax forms, or direct deposit, the agent directs the employee to the appropriate HR system or secure portal instead of collecting documents in chat so sensitive data stays in the right place.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-123136 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/01/20/image-9-3.png" alt="" width="936" height="612"&gt;&lt;/p&gt; 
&lt;p&gt;As an employee, the experience is simple: they open a single chat, see their Day 1 checklist, ask questions in natural language, and let the agent open requests and point them to the right systems. Instead of juggling emails, portals, and tickets, onboarding feels like a guided conversation where each next step is clear.&lt;/p&gt; 
&lt;p&gt;You have now set up the HR Onboarding Confluence space with sample HR documents, created a custom onboarding agent in Quick, configured its behavior, connected HR knowledge, and added Jira actions for requests. You can use this setup as a proof of concept with a small group of new hires or HR partners, then extend it by adding more content, additional actions, or new spaces for other HR workflows such as performance reviews or policy updates.&lt;/p&gt; 
&lt;h2&gt;Guardrails and safety&lt;/h2&gt; 
&lt;p&gt;Quick includes built-in safety and content controls for chat agents, so you can follow along with this post using the default settings in your account. If you want to experiment with policy controls as part of this proof of concept, you can also add a small list of blocked words or phrases so the agent avoids specific terms in HR responses (for example, informal slang or discouraged wording). Blocked terms are configured on the Quick console and applied across agents in your account. For step-by-step instructions and additional security options such as access control and encryption, see the &lt;a href="https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/what-is.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Quick User Guide&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Quick tiers&lt;/h2&gt; 
&lt;p&gt;Quick offers two user subscriptions: Professional and Enterprise. Professional supports everyday use of chat agents and spaces, running &lt;a href="https://aws.amazon.com/quicksuite/flows/" target="_blank" rel="noopener noreferrer"&gt;Amazon Quick Flows&lt;/a&gt; and &lt;a href="https://aws.amazon.com/quicksuite/research/" target="_blank" rel="noopener noreferrer"&gt;Amazon Quick Research&lt;/a&gt;, and viewing &lt;a href="https://aws.amazon.com/quicksight" target="_blank" rel="noopener noreferrer"&gt;Amazon Quick Sight&lt;/a&gt; dashboards, with the ability to create and share custom agents and spaces. Enterprise includes everything in Professional plus advanced authoring features such as configuring actions, creating knowledge bases, building automations in &lt;a href="https://aws.amazon.com/quicksuite/automate/" target="_blank" rel="noopener noreferrer"&gt;Amazon Quick Automate&lt;/a&gt;, and authoring dashboards in Quick Sight, with larger monthly usage allowances. A 30‑day free trial is available for up to 25 users per account. For details, refer to &lt;a href="https://aws.amazon.com/quicksuite/pricing/" target="_blank" rel="noopener noreferrer"&gt;Amazon Quick pricing&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;This post showed how to build an HR onboarding chat agent in Quick, attach HR content, add actions and optional flows, and share it with employees. Start with a pilot that covers your most frequent questions and two or three requests, review usage, and refine the agent’s instructions and content. For next steps, expand the HR space, add additional actions as needed, and review &lt;a href="https://docs.aws.amazon.com/quicksuite/latest/userguide/managing-spaces.html" target="_blank" rel="noopener noreferrer"&gt;the Quick documentation&lt;/a&gt; for advanced configuration. Beyond onboarding, HR teams can explore building agents for employee self-service, performance management, talent acquisition, learning and development, analytics, and off-boarding processes to transform their entire HR operations.&lt;/p&gt; 
&lt;p&gt;Ready to transform your workplace productivity? Get started with Quick and explore pricing options that fit your needs. &lt;a href="http://aws.amazon.com/quicksuite/getting-started" target="_blank" rel="noopener noreferrer"&gt;Begin building your own HR agent&lt;/a&gt;, explore our official &lt;a href="http://aws.amazon.com/quicksuite/getting-started" target="_blank" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for detailed implementation guidance, or contact your AWS account team to discuss how Quick can transform your organization’s approach to data-driven decision-making.&lt;/p&gt; 
&lt;hr&gt; 
&lt;h3&gt;About the authors&lt;/h3&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-124466 size-thumbnail" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/02/17/IMG_8068-100x141.jpg" alt="" width="100" height="141"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Pegah Ojaghi&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Pegah Ojaghi&lt;/strong&gt; is a Generative AI Applied Architect at AWS with a PhD in Computer Science focused on large language models, generative AI, and reinforcement learning. Her expertise and research span foundation model development, RLHF techniques, and novel optimization methods for LLMs. Her passion is translating cutting-edge research into production systems across healthcare, financial services, and insurance industries.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-99978" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2025/02/18/chinrane-1-79x100.png" alt="" width="79" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Chinmayee Rane&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Chinmayee Rane&lt;/strong&gt; is a Generative AI Specialist Solutions Architect at AWS, with a core focus on generative AI. She helps ISVs accelerate the adoption of generative AI by designing scalable and impactful solutions. With a strong background in applied mathematics and machine learning, she specializes in intelligent document processing and AI-driven innovation. Outside of work, she enjoys salsa and bachata dancing.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="wp-image-124553 size-full alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/02/17/author-ebbey.png" alt="" width="100" height="96"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ebbey Thomas&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Ebbey Thomas&lt;/strong&gt; is a Senior Generative AI Specialist Solutions Architect at AWS. He holds a BS in Computer Engineering and an MS in Information Systems from Syracuse University. Outside of work, he enjoys coffee, the outdoors, workouts, road trips, and time with his family.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="wp-image-111412 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2025/07/10/sonali-sahu-100.jpg" alt="" width="83" height="111"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Sonali Sahu&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Sonali Sahu&lt;/strong&gt; is leading the Generative AI Specialist Solutions Architecture team in AWS. She is an author, thought leader, and passionate technologist. Her core area of focus is AI and ML, and she frequently speaks at AI and ML conferences and meetups around the world. She has both breadth and depth of experience in technology and the technology industry, with industry expertise in healthcare, the financial sector, and insurance.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Accelerate agentic tool calling with serverless model customization in Amazon SageMaker AI</title>
		<link>https://aws.amazon.com/blogs/machine-learning/accelerate-agentic-tool-calling-with-serverless-model-customization-in-amazon-sagemaker-ai/</link>
					
		
		<dc:creator><![CDATA[Lauren Mullennex]]></dc:creator>
		<pubDate>Mon, 06 Apr 2026 17:54:00 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon SageMaker AI]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">a3099ac05668413af10c59fe313b0a0cb8a0d4aa</guid>

					<description>In this post, we walk through how we fine-tuned Qwen 2.5 7B Instruct for tool calling using RLVR. We cover dataset preparation across three distinct agent behaviors, reward function design with tiered scoring, training configuration and results interpretation, evaluation on held-out data with unseen tools, and deployment.</description>
										<content:encoded>&lt;p&gt;Agentic tool calling is what makes AI agents useful in production. It’s how they query databases, trigger workflows, retrieve real-time data, and act on a user’s behalf. But base models frequently hallucinate tools, pass bad parameters, and attempt actions when they should ask for clarification. These failures erode trust and block production deployment.&lt;/p&gt; 
&lt;p&gt;You can use &lt;a href="https://aws.amazon.com/sagemaker/ai/model-customization/" target="_blank" rel="noopener noreferrer"&gt;Serverless model customization&lt;/a&gt; in Amazon SageMaker AI to fix these problems without managing infrastructure. With Reinforcement Learning with Verifiable Rewards (RLVR), the model generates its own candidate responses, receives a reward signal indicating quality, and updates its behavior to favor what works. You select a model, configure a technique, point to your data and reward function, and SageMaker AI handles the rest. In this post, we walk through how we fine-tuned Qwen 2.5 7B Instruct for tool calling using RLVR. We cover dataset preparation across three distinct agent behaviors, reward function design with tiered scoring, training configuration and results interpretation, evaluation on held-out data with unseen tools, and deployment. By the end, our fine-tuned model improved tool call reward by 57% over the base model on scenarios that it didn’t see during training.&lt;/p&gt; 
&lt;p&gt;Because tool calling has a naturally verifiable objective, whether the model called the right function with the right parameters, it maps well to RLVR. The challenge with self-managed reinforcement learning (RL) is the operational overhead. GPU procurement, memory orchestration between rollout and training phases, reward infrastructure, and checkpointing add up quickly. Hyperparameter sensitivity adds another layer of complexity. SageMaker AI takes on that work so you can focus on your model, your data, and your reward function.&lt;/p&gt; 
&lt;p&gt;SageMaker AI supports model families including&amp;nbsp;&lt;a href="https://aws.amazon.com/nova/?trk=769a1a2b-8c19-4976-9c45-b6b1226c7d20&amp;amp;sc_channel=el" target="_blank" rel="noopener noreferrer"&gt;Amazon Nova&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/bedrock/openai/?trk=769a1a2b-8c19-4976-9c45-b6b1226c7d20&amp;amp;sc_channel=el" target="_blank" rel="noopener noreferrer"&gt;GPT-OSS&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/bedrock/meta/?trk=769a1a2b-8c19-4976-9c45-b6b1226c7d20&amp;amp;sc_channel=el" target="_blank" rel="noopener noreferrer"&gt;Llama&lt;/a&gt;,&amp;nbsp;&lt;a href="https://aws.amazon.com/bedrock/qwen/?trk=769a1a2b-8c19-4976-9c45-b6b1226c7d20&amp;amp;sc_channel=el" target="_blank" rel="noopener noreferrer"&gt;Qwen&lt;/a&gt;, and&amp;nbsp;&lt;a href="https://aws.amazon.com/bedrock/deepseek/?trk=769a1a2b-8c19-4976-9c45-b6b1226c7d20&amp;amp;sc_channel=el" target="_blank" rel="noopener noreferrer"&gt;DeepSeek&lt;/a&gt;, with techniques including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), RLVR, and Reinforcement Learning from AI Feedback (RLAIF). Training and validation metrics are tracked through integrated MLflow.&lt;/p&gt; 
&lt;h2&gt;Why RLVR for tool calling&lt;/h2&gt; 
&lt;p&gt;SFT requires labeled examples of each behavior that you want the model to learn. For tool calling, that means examples of calling a tool, asking for clarification, and refusing. But tool calling also requires the model to decide between those behaviors, and SFT can struggle to generalize that decision-making beyond the specific patterns in its training data.&lt;/p&gt; 
&lt;p&gt;RLVR works differently. For each prompt, the model generates multiple candidate responses (we use eight). A reward function verifies which ones are correct. The model then updates its policy to favor what worked, using Group Relative Policy Optimization (GRPO). GRPO compares each candidate’s reward score against the mean score of the group and reinforces responses that score above average. Over time, the model learns the format of a tool call and when to call compared to when to ask.&lt;/p&gt; 
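&lt;p&gt;The group-relative part of GRPO is easy to sketch numerically. The following is an illustrative computation, not the SageMaker AI internals: rewards for the candidates in a group are centered on the group mean and scaled by the group standard deviation, and candidates with a positive advantage are reinforced.&lt;/p&gt;

```python
# Illustrative GRPO-style advantage computation (a sketch, not the
# SageMaker AI implementation): rewards are centered on the group mean.

def group_relative_advantages(rewards):
    """Center each candidate's reward on the group mean, scaled by group std."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    # Guard against division by zero when every candidate scores identically
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Eight rollouts for one prompt, scored by a tiered reward function
rewards = [1.0, 0.5, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0]
advantages = group_relative_advantages(rewards)
# Candidates above the group mean (0.375 here) get positive advantages and
# are reinforced; those below get negative advantages and are discouraged.
```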
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;To use serverless model customization in SageMaker AI, you must have the following prerequisites:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An AWS account&lt;/li&gt; 
 &lt;li&gt;An&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/model-customize-open-weight-prereq.html" target="_blank" rel="noopener noreferrer"&gt;AWS IAM role&amp;nbsp;&lt;/a&gt;with the required permissions&lt;/li&gt; 
 &lt;li&gt;A SageMaker AI domain with Studio access&lt;/li&gt; 
 &lt;li&gt;An&amp;nbsp;&lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service (Amazon S3)&amp;nbsp;&lt;/a&gt;bucket&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Fine-tune Qwen 2.5 7B Instruct in SageMaker AI&lt;/h2&gt; 
&lt;p&gt;To get started, we open Amazon SageMaker AI Studio and choose&amp;nbsp;&lt;strong&gt;Models&lt;/strong&gt;&amp;nbsp;in the left navigation pane to browse the foundation models (FMs) that are available for customization.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127279" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-20015-image1.jpg" alt="Amazon SageMaker Studio Models page showing featured foundation models from Amazon, Meta, and Qwen with a Customize model dropdown menu expanded, revealing options to Customize with UI, AI Agent (Preview), and Code." width="2560" height="1393"&gt;&lt;/p&gt; 
&lt;p&gt;In the&amp;nbsp;&lt;strong&gt;Customize model&lt;/strong&gt;&amp;nbsp;menu, select&amp;nbsp;&lt;strong&gt;Qwen 2.5 7B Instruct&lt;/strong&gt;, and choose&amp;nbsp;&lt;strong&gt;Customize with UI&lt;/strong&gt;. This opens the customization configuration page where you select your technique, point to your training data and reward function, and configure hyperparameters. We selected&amp;nbsp;&lt;strong&gt;Reinforcement Learning with Verifiable Rewards (RLVR)&lt;/strong&gt;&amp;nbsp;as our customization technique.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127278" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-20015-image2.jpg" alt="Amazon SageMaker Studio model customization form for Qwen2.5-7B-Instruct showing the Customization technique dropdown with Reinforcement Learning with Verifiable Rewards (RLVR) selected, along with options for reward functions, dataset upload, S3 output location, and batch size." width="936" height="664"&gt;&lt;/p&gt; 
&lt;h2&gt;Prepare your training data&lt;/h2&gt; 
&lt;p&gt;A tool calling dataset needs to teach more than correct API invocations. Production agents face three distinct situations:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;The user provides enough information, and the model should call a tool.&lt;/li&gt; 
 &lt;li&gt;The user’s request is missing required parameters, and the model should ask for clarification.&lt;/li&gt; 
 &lt;li&gt;The request is harmful or out of scope, and the model should refuse.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;We generated 1,500 synthetic training examples from our tool schemas (weather, flights, translation, currency conversion, statistics) using &lt;a href="https://kiro.dev/" target="_blank" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt;, the Amazon AI-powered IDE, to produce prompts with realistic variation in phrasing and specificity across the three behaviors. Here’s an example of the prompt we used:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;Generate 1,500 JSONL training examples for RLVR tool-calling&lt;/code&gt;&lt;br&gt; &lt;code&gt;fine-tuning across 5 tool schemas: get_weather_forecast,&lt;/code&gt;&lt;br&gt; &lt;code&gt;search_flights, translate_text, currency_convert, and&lt;/code&gt;&lt;br&gt; &lt;code&gt;get_statistics.&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;Each line must follow this format:&lt;/code&gt;&lt;br&gt; &lt;code&gt;{"prompt": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}], "reward_model": {"ground_truth": "..."}}&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;Distribute examples across three behaviors:&lt;/code&gt;&lt;br&gt; &lt;code&gt;1. Execute (60%): User provides all required params → ground_truth is the tool call JSON&lt;/code&gt;&lt;br&gt; &lt;code&gt;2. Clarify (25%): User is missing required params → ground_truth is a clarifying question&lt;/code&gt;&lt;br&gt; &lt;code&gt;3. Refuse (15%): Request is harmful or out of scope → ground_truth is a polite refusal&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;Vary phrasing between formal, casual, and terse.&lt;/code&gt;&lt;br&gt; &lt;code&gt;Output valid JSONL only, no commentary.&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;This is a practical path for teams that don’t yet have production logs to draw from. For organizations already running agentic workflows, real user prompts and tool calls from production will yield even higher-quality training data.&lt;/p&gt; 
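&lt;p&gt;Before uploading, it helps to sanity-check the generated file. The following sketch (our own helper, not part of the SageMaker AI workflow) verifies that each line has the fields shown in the prompt above and tallies the behavior mix, treating a ground truth that parses as a JSON list as a tool call and anything else as natural language:&lt;/p&gt;

```python
import json
from collections import Counter

def validate_jsonl(lines):
    """Check each training example has the expected fields and tally behaviors.

    Behavior is inferred heuristically: a ground truth that parses as a JSON
    list counts as a tool call ('execute'); anything else is natural language
    (a clarification or refusal).
    """
    counts = Counter()
    for i, line in enumerate(lines, 1):
        ex = json.loads(line)
        assert "prompt" in ex and "reward_model" in ex, f"line {i}: missing field"
        assert any(m["role"] == "user" for m in ex["prompt"]), f"line {i}: no user turn"
        gt = ex["reward_model"]["ground_truth"]
        try:
            parsed = json.loads(gt)
            counts["execute" if isinstance(parsed, list) else "other"] += 1
        except json.JSONDecodeError:
            counts["natural_language"] += 1
    return counts
```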
&lt;p&gt;Each training example contains a prompt (a system instruction and user request) and a ground truth in the &lt;code&gt;reward_model&lt;/code&gt; field that the reward function scores against. Here are examples of each behavior.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Execute&lt;/strong&gt;&amp;nbsp;when the user provides everything the tool needs:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;{
&amp;nbsp;&amp;nbsp;"prompt": [
&amp;nbsp;&amp;nbsp; &amp;nbsp;{"role": "system", "content": "You are a helpful assistant. When using tools, respond with: [...]"},
&amp;nbsp;&amp;nbsp; &amp;nbsp;{"role": "user", "content": "Get weather for San Francisco"}
&amp;nbsp;&amp;nbsp;],
&amp;nbsp;&amp;nbsp;"reward_model": {
&amp;nbsp;&amp;nbsp; &amp;nbsp;"ground_truth": "[{"name": "get_weather_forecast", "arguments": {"city": "San Francisco"}}]"
&amp;nbsp;&amp;nbsp;}
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;Clarify&lt;/strong&gt;&amp;nbsp;when a required parameter is missing:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;{
&amp;nbsp;&amp;nbsp;"prompt": [
&amp;nbsp;&amp;nbsp; &amp;nbsp;{"role": "system", "content": "You are a helpful assistant. When using tools, respond with: [...]"},
&amp;nbsp;&amp;nbsp; &amp;nbsp;{"role": "user", "content": "Get the weather"}
&amp;nbsp;&amp;nbsp;],
&amp;nbsp;&amp;nbsp;"reward_model": {
&amp;nbsp;&amp;nbsp; &amp;nbsp;"ground_truth": "To provide you with the weather information, could you please specify the location?"
&amp;nbsp;&amp;nbsp;}
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;Execute with multiple parameters:&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;{
&amp;nbsp;&amp;nbsp;"prompt": [
&amp;nbsp;&amp;nbsp; &amp;nbsp;{"role": "system", "content": "You are a helpful assistant. When using tools, respond with: [...]"},
&amp;nbsp;&amp;nbsp; &amp;nbsp;{"role": "user", "content": "Convert 50 EUR to USD"}
&amp;nbsp;&amp;nbsp;],
&amp;nbsp;&amp;nbsp;"reward_model": {
&amp;nbsp;&amp;nbsp; &amp;nbsp;"ground_truth": "[{"name": "currency_convert", "arguments": {"amount": 50, "from": "EUR", "to": "USD"}}]"
&amp;nbsp;&amp;nbsp;}
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice the difference between “Get weather for San Francisco” (tool call) and “Get the weather” (clarification). This is the kind of distinction GRPO learns well. For each prompt, the model generates eight candidates, the reward function scores them, and the scores are averaged across the group. Candidates above the mean get reinforced, and over time the model picks up when to call and when to ask.&lt;/p&gt; 
&lt;h2&gt;Define your reward function&lt;/h2&gt; 
&lt;p&gt;The reward function defines what &lt;em&gt;correct&lt;/em&gt; means for our use case. We write it as a Python function that receives the model’s response and the ground truth from the training data and returns a numerical score. Ours extracts tool calls from the model’s response, parses them as JSON, and compares against the ground truth.&lt;/p&gt; 
&lt;p&gt;The full function handles response extraction, flexible parsing for alternative formats during early training, and edge cases around JSON type mismatches. Here is the core scoring logic:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# After extracting and parsing tool calls from model response and ground truth:

# Compare tool names
pred_names = {tool.get('name', '') for tool in pred_tools}
gt_names = {tool.get('name', '') for tool in gt_tools}

if pred_names == gt_names:
&amp;nbsp;&amp;nbsp; &amp;nbsp;# Right function(s) - check if arguments also match
&amp;nbsp;&amp;nbsp; &amp;nbsp;perfect_match = True
&amp;nbsp;&amp;nbsp; &amp;nbsp;for pred_tool in pred_tools:
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;for gt_tool in gt_tools:
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;if pred_tool.get('name') == gt_tool.get('name'):
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;if pred_tool.get('arguments') != gt_tool.get('arguments'):
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;perfect_match = False
&amp;nbsp;&amp;nbsp; &amp;nbsp;score = 1.0 if perfect_match else 0.5
elif pred_names &amp;amp; gt_names:
&amp;nbsp;&amp;nbsp; &amp;nbsp;# Partial overlap in function names
&amp;nbsp;&amp;nbsp; &amp;nbsp;score = 0.5
else:
&amp;nbsp;&amp;nbsp; &amp;nbsp;# Wrong function entirely
&amp;nbsp;&amp;nbsp; &amp;nbsp;score = 0.0&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The three tiers (1.0, 0.5, and 0.0) give GRPO a richer learning signal. If several of the eight candidates get the function right but miss a parameter, the 0.5 score distinguishes them from completely wrong answers. This helps the model recognize that it’s on the right track.&lt;/p&gt; 
&lt;p&gt;For clarification and refusal cases where the ground truth is natural language (no &lt;code&gt;TOOLCALL&lt;/code&gt; tags), the reward function checks whether the model also avoided calling a tool. An unnecessary API call when the model should have asked a question earns 0.0.&lt;/p&gt; 
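&lt;p&gt;The logic for these natural-language cases can be sketched as follows. The tag check and the flat score here are simplifications of our full function, which also compares the response content against the ground truth:&lt;/p&gt;

```python
def score_natural_language_case(response: str, ground_truth: str) -> float:
    """Score a clarification/refusal case where the ground truth is plain text.

    Simplified sketch: any attempt to call a tool when the model should have
    asked or refused earns 0.0. The full reward function also compares the
    response content against the ground truth (unused here).
    """
    # Our training format wraps tool calls in TOOLCALL tags; a bare JSON
    # tool-call list is treated the same way.
    if "TOOLCALL" in response or response.strip().startswith("[{"):
        return 0.0
    # Credit non-empty natural-language answers
    return 1.0 if response.strip() else 0.0
```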
&lt;h2&gt;Configure and launch training&lt;/h2&gt; 
&lt;p&gt;On the customization configuration page, we point to our training dataset and reward function, then set our hyperparameters. We use a batch size of 128, learning rate of 5e-6, 3 epochs, and 8 rollouts per prompt.&lt;/p&gt; 
&lt;p&gt;The rollouts setting is the core GRPO mechanism. For each training prompt, the model generates eight different responses, the reward function scores each one, and responses that score above the group average get reinforced. Training and validation metrics are logged to MLflow. In this example, training takes approximately 40 minutes.&lt;/p&gt; 
&lt;h2&gt;Training results&lt;/h2&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127277" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-20015-image3.jpg" alt="Performance dashboard displaying five RLVR training metric charts: Train Reward Statistics trending upward from 0.28 to 0.70, Train Episode Length Distribution fluctuating between 30 and 35, Policy Entropy declining from 0.19 to 0.12, Gradient Norm decreasing from 0.10 to near 0.00, and Mean Advantage Estimate recovering from -0.08 to near 0.00 over 30 training steps. Long description: Screenshot of a dark-themed" width="2560" height="1284"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Train Reward Statistics&lt;/strong&gt;&amp;nbsp;(top left) is the chart to focus on. The mean reward across the rollouts started around 0.28 and climbed to 0.65–0.68 over 30 steps, more than doubling. The steepest gains happen in the first 10 steps as the model learns the basic tool calling format and decision structure. The curve then flattens after step 20 as training converges.&lt;/p&gt; 
&lt;p&gt;The other charts confirm healthy training:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Policy Entropy&lt;/strong&gt;&amp;nbsp;decreases, meaning the model is getting more confident rather than guessing.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Gradient Norm&lt;/strong&gt;&amp;nbsp;stabilizes, meaning updates are getting smaller and more refined.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Mean Advantage Estimate&lt;/strong&gt;&amp;nbsp;converges toward zero, indicating that the model’s policy is stabilizing and the average response quality is aligning with the reward baseline.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Evaluate the fine-tuned model&lt;/h2&gt; 
&lt;p&gt;After the training job is complete, you can see the models that you created in the&amp;nbsp;&lt;strong&gt;My Models&lt;/strong&gt;&amp;nbsp;tab. To expand the details, choose&amp;nbsp;&lt;strong&gt;View details&lt;/strong&gt;&amp;nbsp;on one of your models.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127276" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-20015-image4.jpg" alt="Amazon SageMaker Studio My Models page showing the Logged tab with two fine-tuned model cards: example-name-2lt4op at version v3 and example-name-2lt4o with no versions found, both created 4 days ago with View details buttons." width="2076" height="1396"&gt;&lt;/p&gt; 
&lt;p&gt;You can choose&amp;nbsp;&lt;strong&gt;Continue customization&lt;/strong&gt;&amp;nbsp;to iterate further by adjusting hyperparameters or training with a different technique. Choose&amp;nbsp;&lt;strong&gt;Evaluate&lt;/strong&gt;&amp;nbsp;to compare your customized model against the base model.&lt;/p&gt; 
&lt;p&gt;We evaluate on a separate test set of 300 examples that were excluded from training. The evaluation dataset covers the same three behaviors but includes tools, phrasings, and scenarios that the model hasn’t seen. It tests &lt;code&gt;search_restaurants&lt;/code&gt;,&amp;nbsp;&lt;code&gt;get_stock_price&lt;/code&gt;, and&amp;nbsp;&lt;code&gt;calculate_standard_deviation&lt;/code&gt;, none of which appeared during training. It also includes refusal cases for harmful requests like generating violent content or creating malware, testing whether the model generalizes safe behavior to new threats.&lt;/p&gt; 
&lt;p&gt;The evaluation runs standard NLP metrics alongside our custom reward function against the held-out set.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127275" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-20015-image5.jpg" alt="Evaluation metrics comparison table showing the custom RLVR-trained model outperforming the base model across all metrics: Rouge1 (65.21% vs 49.48%), Rouge2 (51.45% vs 35.12%), RougeL (59.19% vs 45.78%), Em (21% vs 11%), F1 (56.63% vs 42.19%), F1 Score Quasi (64.60% vs 45.98%), Bleu (100.00 vs 92.58), Tool Call Reward (0.55 vs 0.35), and Aggregate Reward Score (0.55 vs 0.35), evaluated on 300 documents." width="1524" height="760"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Tool Call Reward&amp;nbsp;&lt;/strong&gt;is our custom metric and the most direct measure of what we trained for. It jumped from 0.35 to 0.55, a 57% improvement. In practical terms, this means that the fine-tuned model makes the correct tool calling decision significantly more often. It calls the right function with the right parameters when it should, asks for clarification when information is missing, and refuses when appropriate.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;F1 Score Quasi&lt;/strong&gt;,&amp;nbsp;&lt;strong&gt;Rouge1&lt;/strong&gt;, and&amp;nbsp;&lt;strong&gt;RougeL&lt;/strong&gt;&amp;nbsp;all improved by 14–19 percentage points, reflecting better generation of correct function names, parameter keys, and values across the board.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Exact Match&lt;/strong&gt;&amp;nbsp;doubled from 11% to 21%. This metric requires character-for-character output matching, so even small formatting differences count as a miss. The 21% exact match alongside 64.6% F1 suggests that the model often gets the semantics right even when surface formatting differs slightly.&lt;/p&gt; 
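&lt;p&gt;The gap between the two metrics is visible on a single prediction pair. In this sketch (a simple whitespace-token F1, not the exact implementation the evaluation job uses), one stray space breaks exact match while token overlap stays high:&lt;/p&gt;

```python
def exact_match(pred: str, ref: str) -> bool:
    """Character-for-character comparison: any formatting difference fails."""
    return pred == ref

def token_f1(pred: str, ref: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_toks, ref_toks = pred.split(), ref.split()
    common = 0
    remaining = list(ref_toks)
    for t in pred_toks:
        if t in remaining:
            common += 1
            remaining.remove(t)
    if common == 0:
        return 0.0
    precision = common / len(pred_toks)
    recall = common / len(ref_toks)
    return 2 * precision * recall / (precision + recall)

# One extra space before the closing brackets breaks exact match only
pred = '[{"name": "get_stock_price", "arguments": {"ticker": "AMZN"} }]'
ref = '[{"name": "get_stock_price", "arguments": {"ticker": "AMZN"}}]'
```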
&lt;p&gt;These results are all on held-out data with tools and scenarios that the model didn’t see during training. The improvements carry over to new tools, which means the model picked up general tool calling patterns rather than memorizing the training set.&lt;/p&gt; 
&lt;p&gt;Here are responses from the fine-tuned model on prompts that it didn’t see during training:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Tool execution:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&amp;nbsp;“What’s the stock price of AMZN?”&lt;br&gt; &lt;strong&gt;Model response:&lt;/strong&gt;&amp;nbsp;&lt;code&gt;[{"name": "get_stock_price", "arguments": {"ticker": "AMZN"}}]&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Clarification:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&amp;nbsp;“Find me a restaurant nearby”&lt;br&gt; &lt;strong&gt;Model response:&lt;/strong&gt;&amp;nbsp;“I’d be happy to help you find a restaurant! Could you please provide the cuisine type or location you’re interested in?”&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Near-miss:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&amp;nbsp;“What’s the standard deviation of 10, 20, 30, 40, 50?”&lt;br&gt; &lt;strong&gt;Model response:&lt;/strong&gt;&amp;nbsp;&lt;code&gt;[{"name": "calculate_standard_deviation", "arguments": {"numbers": "10, 20, 30, 40, 50"}}]&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;In the near-miss case, the model selected the correct tool but passed the numbers as a string instead of an array. This earns a 0.5 reward score (right function, wrong parameter format) and represents the kind of error that you’d target in the next iteration through additional training data or reward function refinement.&lt;/p&gt; 
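&lt;p&gt;One way to surface this failure mode systematically is to validate predicted arguments against a lightweight tool schema. The schema and helper below are hypothetical illustrations, not part of the SageMaker AI tooling:&lt;/p&gt;

```python
# Sketch: validate predicted tool-call arguments against a lightweight,
# hypothetical schema so type errors (string vs. array) surface in evaluation.

TOOL_SCHEMAS = {
    "calculate_standard_deviation": {"numbers": list},
    "get_stock_price": {"ticker": str},
}

def argument_type_errors(tool_call):
    """Return a list of (param, expected, actual) type mismatches for one call."""
    schema = TOOL_SCHEMAS.get(tool_call["name"], {})
    errors = []
    for param, expected in schema.items():
        value = tool_call["arguments"].get(param)
        if value is not None and not isinstance(value, expected):
            errors.append((param, expected.__name__, type(value).__name__))
    return errors

# The near-miss above: numbers passed as a string instead of an array
near_miss = {"name": "calculate_standard_deviation",
             "arguments": {"numbers": "10, 20, 30, 40, 50"}}
```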
&lt;h2&gt;Deploy the fine-tuned model&lt;/h2&gt; 
&lt;p&gt;With evaluation confirming improvement, deploy the fine-tuned model directly from the model details page. Choose&amp;nbsp;&lt;strong&gt;Deploy&lt;/strong&gt;, and select your deployment target: either a SageMaker AI endpoint or&amp;nbsp;&lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt;. You can also download the model weights from Amazon S3 for self-managed deployment.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127274" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-20015-image6.jpg" alt="Amazon SageMaker Studio training details page for an RLVR Tool Calling model (v1) based on Qwen2.5-7B-Instruct, showing completed training status with RLVR customization technique, a Deploy dropdown menu with SageMaker AI and Bedrock options, and hyperparameters including batch size 128, max epochs 3, and learning rate 0.000005." width="2560" height="615"&gt;&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we fine-tuned Qwen 2.5 7B Instruct for agentic tool calling using RLVR and GRPO through serverless model customization in Amazon SageMaker AI. We prepared a dataset spanning three tool-calling behaviors (execute, clarify, refuse), defined a tiered reward function, trained the model in about 40 minutes, evaluated on held-out data with unseen tools and scenarios, and deployed. The fine-tuned model improved tool call reward by 57% over the base model.&lt;/p&gt; 
&lt;p&gt;To push accuracy further, you can expand your training data with additional tools, edge cases, and multi-turn conversations to cover more of the scenarios that your agents encounter in production. You can also refine your reward function to penalize specific failure modes, like the string-vs-array parameter issue shown in the previous section, or add partial credit for other near-miss patterns. If you’re running agentic workflows, your production logs are a high-quality source of training data that can make the model even more effective for your specific use case. Beyond tool calling, RLVR applies to other reasoning tasks where correctness is verifiable, such as multi-step planning, structured data extraction, or code generation.&lt;/p&gt; 
&lt;p&gt;While this post walks through the UI workflow, an&amp;nbsp;&lt;a href="https://sagemaker.readthedocs.io/en/stable/model_customization/index.html" target="_blank" rel="noopener noreferrer"&gt;SDK for programmatic access&lt;/a&gt;&amp;nbsp;is also available. To learn more, see the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/customize-model.html" target="_blank" rel="noopener noreferrer"&gt;SageMaker AI model customization documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;To get started, try serverless AI model customization&amp;nbsp;in Amazon SageMaker AI with your own use cases.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-35407" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/04/19/lemull.jpg" alt="Lauren Mullennex" width="99" height="127"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Lauren Mullennex&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/laurenmull/"&gt;Lauren&lt;/a&gt; is a Senior GenAI/ML Specialist Solutions Architect at AWS. She has over a decade of experience in ML, DevOps, and infrastructure. She is a published author of a book on computer vision. Outside of work, you can find her traveling and hiking with her two dogs.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-thumbnail wp-image-127280" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/A88ECF8C-6F54-4D6C-B221-83E4EE609543-1-100x133.jpeg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Eric Saleh&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/eric-saleh/" target="_blank" rel="noopener"&gt;Eric&lt;/a&gt; is a&amp;nbsp;Senior GenAI Specialist at AWS, focusing on foundation model training and inference. He is partnering with top foundation model builders and AWS service teams to enable distributed training and inference at scale on AWS and lead joint GTM motions with strategic customers. Before joining AWS, Eric led product teams building enterprise AI/ML solutions, which included frontier GenAI services for fine-tuning, RAG, and managed inference. He holds a master’s degree in Business Analytics from UCLA Anderson.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-thumbnail wp-image-94290" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2024/11/26/karisury-1-80x100.jpg" alt="" width="80" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Surya Kari&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/suryakari/"&gt;Surya&lt;/a&gt; is a Senior Generative AI Data Scientist at AWS, specializing in developing solutions leveraging state-of-the-art foundation models. He has extensive experience working with advanced language models including DeepSeek-R1, the LLama family, and Qwen, focusing on their fine-tuning and optimization for specific scientific applications. His expertise extends to implementing efficient training pipelines and deployment strategies using AWS SageMaker, enabling the scaling of foundation models from development to production. He collaborates with customers to design and implement generative AI solutions, helping them navigate model selection, fine-tuning approaches, and deployment strategies to achieve optimal performance for their specific use cases.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Building Intelligent Search with Amazon Bedrock and Amazon OpenSearch for hybrid RAG solutions</title>
		<link>https://aws.amazon.com/blogs/machine-learning/building-intelligent-search-with-amazon-bedrock-and-amazon-opensearch-for-hybrid-rag-solutions/</link>
					
		
		<dc:creator><![CDATA[Arpit Gupta]]></dc:creator>
		<pubDate>Mon, 06 Apr 2026 17:49:32 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Bedrock AgentCore]]></category>
		<category><![CDATA[Amazon OpenSearch Service]]></category>
		<category><![CDATA[Foundation models]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Strands Agents]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">51a43e05f36531e4c51d1b24bd1c2f6e70a1f90f</guid>

					<description>In this post, we show how to implement a generative AI agentic assistant that uses both semantic and text-based search using Amazon Bedrock, Amazon Bedrock AgentCore, Strands Agents and Amazon OpenSearch.</description>
										<content:encoded>&lt;p&gt;Agentic generative AI assistants represent a significant advancement in artificial intelligence, featuring dynamic systems powered by large language models (LLMs) that engage in open-ended dialogue and tackle complex tasks. Unlike basic chatbots, these implementations possess broad intelligence, maintaining multi-step conversations while adapting to user needs and executing necessary backend tasks.&lt;/p&gt; 
&lt;p&gt;These systems retrieve business-specific data in real-time through API calls and database lookups, incorporating this information into LLM-generated responses or providing it alongside them using predefined standards. This combination of LLM capabilities with dynamic data retrieval is known as Retrieval-Augmented Generation (RAG).&lt;/p&gt; 
&lt;p&gt;For example, an agentic assistant handling hotel booking would first query a database to find properties that match the guest’s specific requirements. The assistant would then make API calls to retrieve real-time information about room availability and current rates. This retrieved data can be handled in two ways: either the LLM can process it to generate a comprehensive response, or it can be displayed alongside an LLM-generated summary. Both approaches allow guests to receive precise, current information that’s integrated into their ongoing conversation with the assistant.&lt;/p&gt; 
&lt;p&gt;In this post, we show how to implement a generative AI agentic assistant that uses both semantic and text-based search using Amazon Bedrock, Amazon Bedrock AgentCore, Strands Agents and Amazon OpenSearch.&lt;/p&gt; 
&lt;h3&gt;Information retrieval approaches in RAG systems&lt;/h3&gt; 
&lt;p&gt;Generally speaking, information retrieval supporting RAG capabilities in agentic generative AI implementations revolves around real-time querying of the backend data sources or communicating with an API. The responses are then factored into the subsequent steps performed by the implementation. From a high-level system design and implementation perspective, this step is not specific to generative AI-based solutions: databases, APIs, and systems relying on integration with them have been around for a long time.&lt;/p&gt; 
&lt;p&gt;Certain information retrieval approaches have, however, emerged alongside agentic AI implementations, most notably semantic search-based data lookups. These retrieve data based on the meaning of the search phrase as opposed to keyword or pattern lexical similarity. Vector embeddings are precomputed and stored in vector databases, enabling efficient similarity calculations at query time. The core principle of Vector Similarity Search (VSS) involves finding the closest matches between these numerical representations using mathematical distance metrics such as cosine similarity or Euclidean distance. These functions are particularly efficient when searching through large corpora of data because the vector representations are precomputed.&lt;/p&gt; 
&lt;p&gt;Bi-encoder models are commonly used in this process. They separately encode the query and documents into vectors, enabling efficient similarity comparisons at scale without requiring the model to process query-document pairs together. When a user submits a query, the system converts it into a vector and searches for content vectors positioned closest to it in the high-dimensional space. This means that even if exact keywords don’t match, the search can find relevant results based on conceptual semantic similarity. Moreover, in situations where search terms are lexically but not semantically close to entries in the dataset, semantic similarity search will “prefer” semantically similar entries.&lt;/p&gt; 
&lt;p&gt;For example, given the vectorized dataset [“building materials”, “plumbing supplies”, “2×2 multiplication result”], the search string “2×4 lumber board” will most likely produce “building materials” as the top matching candidate. Combining semantic search with LLM-driven agents supports natural language alignment across the user-facing and backend data retrieval components of the solution: LLMs process the natural language input provided by the user, while semantic search retrieves data based on the natural language queries the LLM formulates over the course of the end user–agent conversation.&lt;/p&gt; 
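&lt;p&gt;The mechanics can be shown with the toy dataset above. The embeddings here are hand-picked three-dimensional stand-ins (real bi-encoder vectors have hundreds of dimensions), so this sketch only illustrates the ranking step:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product of the vectors over the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings"; in a real system these come from a bi-encoder model
corpus = {
    "building materials":        [0.9, 0.1, 0.0],
    "plumbing supplies":         [0.2, 0.9, 0.1],
    "2x2 multiplication result": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in embedding for "2x4 lumber board"
best = max(corpus, key=lambda k: cosine_similarity(query, corpus[k]))
# The semantically related entry ranks first despite no keyword overlap
```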
&lt;h3&gt;The challenge: When semantic search alone isn’t enough&lt;/h3&gt; 
&lt;p&gt;Consider a real-world scenario: a customer is searching for a hotel property and wants to find “a luxury hotel with ocean views in Miami, Florida.” While semantic search excels at understanding concepts like “luxury” and “ocean views,” it may struggle with precise location matching. The search might return highly relevant luxury oceanfront properties based on semantic similarity, but these could be in California, the Caribbean, or anywhere else with ocean access, not specifically in Miami as requested. This limitation arises because semantic search prioritizes conceptual similarity over exact attribute matching. In cases where users need both semantic understanding (luxury, ocean views) and precise filtering (Miami, Florida), relying solely on semantic search produces suboptimal results. This is where hybrid search becomes essential: it combines the semantic understanding of natural language descriptions with the precision of text-based filtering on structured attributes like location, dates, or other metadata. To address this, we introduce a hybrid search approach that performs both:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Semantic search &lt;/strong&gt;to understand natural language descriptions and find semantically similar content&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Text-based search&lt;/strong&gt; to facilitate precise matching on structured attributes like locations, dates, or identifiers&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;When a user provides a search phrase, an LLM first analyzes the query to identify specific attributes (such as location) and maps them to searchable values (for example, “Northern Michigan” → “MI”). These extracted attributes are then used as filters in conjunction with semantic similarity scoring, making sure that results are both conceptually relevant and precisely matched to the user’s requirements. The following tables provide a simplified view of the semantic search flow with clear text hotel descriptions provided for context:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Vector store data:&lt;/strong&gt;&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;hotel-1&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt; &lt;p&gt;&lt;strong&gt;Description&lt;/strong&gt;: The Artisan Loft hotel anchors the corner of Green and Randolph Streets in Big City’s bustling Southwest Loop, occupying a thoughtfully renovated 1920s brick warehouse that celebrates the neighborhood’s industrial heritage. Guests find themselves mere steps from the famed Restaurant Row, with acclaimed dining spots and trendy boutiques dotting the surrounding blocks.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Description Vector:&lt;/strong&gt; […]&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Location&lt;/strong&gt;: Big City, USA&lt;/p&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;hotel-2&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt; &lt;p&gt;&lt;strong&gt;Description&lt;/strong&gt;: Perched on a rugged cliff overlooking the dramatic coastline of Big Sur, The Cypress Haven emerges from the landscape as if it were carved from the earth itself. This intimate 42-room sanctuary seamlessly integrates into its surroundings with living roof gardens, floor-to-ceiling windows, and natural materials including local stone and reclaimed redwood. Each spacious suite features a private terrace suspended over the Pacific, where guests can spot migrating whales while soaking in Japanese cedar ofuro tubs.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Description Vector&lt;/strong&gt;: […]&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Location&lt;/strong&gt;: Beach City, USA&lt;/p&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;hotel-3&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt; &lt;p&gt;&lt;strong&gt;Description&lt;/strong&gt;: Nestled in a centuries-old maple forest just outside the Berkshires, Woodland Haven Lodge offers an intimate escape where luxury meets mindful simplicity. This converted 19th-century estate features 28 thoughtfully appointed rooms spread across the main house and four separate cottages, each with wraparound porches and floor-to-ceiling windows that frame the surrounding woodlands.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Description Vector:&lt;/strong&gt; […]&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Location&lt;/strong&gt;: Quiet City, USA&lt;/p&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;hotel-4&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt; &lt;p&gt;&lt;strong&gt;Description&lt;/strong&gt;: Nestled in the heart of Central City’s bustling downtown district, the Skyline Oasis hotel stands as a beacon of luxury and modernity. This 45-story glass and steel tower offers breathtaking panoramic views of the city’s iconic skyline and the nearby Central River. With 500 elegantly appointed rooms and suites, the Skyline Oasis caters to both business travelers and tourists seeking a premium urban experience. The hotel boasts a rooftop infinity pool, a Michelin-starred restaurant, and a state-of-the-art fitness center. Its prime location puts guests within walking distance of Central City’s major attractions, including the Museum of Modern Art, the Central City Opera House, and the vibrant Riverfront District.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Description Vector&lt;/strong&gt;: […]&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Location&lt;/strong&gt;: Central City, USA&lt;/p&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;table class="styled-table" style="height: 93px" border="1px" width="540" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Search Phrase&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Looking for a hotel by the ocean&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Search Results&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;hotel-2&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;&lt;strong&gt;Search example:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Search phrase: &lt;/strong&gt;“Looking for a hotel by the ocean”&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Semantic search result: &lt;/strong&gt;hotel-2 (The Cypress Haven)&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Hybrid search example:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Search phrase: &lt;/strong&gt;“Looking for a hotel with a nice restaurant in downtown Central City”&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Hybrid search &lt;/strong&gt;result: hotel-4 (best match considering both semantic relevance and precise location)&lt;/li&gt; 
&lt;/ul&gt; 
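&lt;p&gt;The attribute extraction step described earlier (mapping a phrase like “Northern Michigan” to a state filter) can be sketched as follows. The JSON shape and field names are illustrative assumptions; in the real flow an LLM on Amazon Bedrock would produce the structured output from the user’s query:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;import json

# Hard-coded here for illustration; an LLM would generate this JSON.
llm_response = '{"query_text": "beachfront hotel", "state": "MI", "city": null}'

def build_filters(llm_json):
    """Turn the LLM's extracted attributes into OpenSearch term filters."""
    attrs = json.loads(llm_json)
    filters = []
    for field in ("state", "city"):
        value = attrs.get(field)
        if value:  # skip attributes the LLM could not extract
            filters.append({"term": {field: value}})
    return filters

print(build_filters(llm_response))  # prints: [{'term': {'state': 'MI'}}]
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The resulting filter list can then be attached to the vector query so that semantic scoring only runs over documents matching the extracted attributes.&lt;/p&gt; 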
&lt;p&gt;For more details on hybrid search implementations, refer to the &lt;a href="https://aws.amazon.com/blogs/machine-learning/amazon-bedrock-knowledge-bases-now-supports-hybrid-search/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Knowledge Bases hybrid search blog post&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-127331 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-18738-image-1.png" alt="Process flow diagram showing natural language query conversion to hybrid search terms using an LLM, resulting in vector store search results." width="731" height="421"&gt;&lt;/p&gt; 
&lt;h3&gt;Introducing an agent-based solution&lt;/h3&gt; 
&lt;p&gt;Consider a hotel search scenario where users have diverse needs. One user might ask “find me a cozy hotel,” requiring semantic understanding of “cozy.” Another might request “find hotels in Miami,” needing precise location filtering. A third might want “a luxury beachfront hotel in Miami,” requiring both approaches simultaneously. Traditional RAG implementations with fixed workflows cannot adapt dynamically to these varying requirements. Our scenario demands custom search logic that can combine multiple data sources and dynamically adapt retrieval strategies based on query characteristics. An agent-based approach provides this flexibility. The LLM itself determines the optimal search strategy by analyzing each query and selecting the appropriate tools.&lt;/p&gt; 
&lt;h3&gt;Why agents?&lt;/h3&gt; 
&lt;p&gt;Agent-based systems offer superior adaptability because the LLM determines the sequence of actions needed to solve problems, enabling dynamic decision routing, intelligent tool selection, and quality control through self-evaluation. The following sections show how to implement a generative AI agentic assistant that uses both semantic and text-based search with Amazon Bedrock, Amazon Bedrock AgentCore, Strands Agents, and Amazon OpenSearch Serverless.&lt;/p&gt; 
&lt;h3&gt;Architecture overview&lt;/h3&gt; 
&lt;p&gt;Figure 1 shows a modern, serverless architecture that you can use for an intelligent search assistant. It combines the foundation models in Amazon Bedrock, Amazon Bedrock AgentCore (for agent orchestration), and Amazon OpenSearch Serverless (for hybrid search capabilities).&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Client interaction layer&lt;/strong&gt;&lt;br&gt; Client applications interact with the system through Amazon API Gateway, which provides a secure, scalable entry point for user requests. When a user asks a question like “Find me a beachfront hotel in Northern Michigan,” the request flows through API Gateway to Amazon Bedrock AgentCore.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Agent orchestration with Amazon Bedrock AgentCore&lt;/strong&gt;&lt;br&gt; Amazon Bedrock AgentCore serves as the orchestration engine, managing the complete agent lifecycle and coordinating interactions between the user, the LLM, and available tools. AgentCore implements the agentic loop—a continuous cycle of reasoning, action, and observation—where the agent:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Analyzes&lt;/strong&gt; the user’s query using Bedrock’s foundation models&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Decides&lt;/strong&gt; which tools to invoke based on the query requirements&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Executes&lt;/strong&gt; the appropriate hybrid search tool with extracted parameters&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Evaluates&lt;/strong&gt; the results and determines if additional actions are needed&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Responds&lt;/strong&gt; to the user with synthesized information&lt;/li&gt; 
&lt;/ol&gt; 
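&lt;p&gt;The agentic loop above can be sketched with stub functions standing in for the Amazon Bedrock model call and the search tool (the decision logic, function names, and return shapes here are illustrative, not a specific AgentCore API):&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;def llm_decide(query, observations):
    """Stub for the model call: decide the next action given what we know."""
    if not observations:
        return {"action": "search", "args": {"city": "Central City"}}
    return {"action": "respond"}

def search_tool(city):
    """Stub for the hybrid search tool."""
    return ["hotel in " + city]

def agentic_loop(query, max_steps=5):
    observations = []
    for _ in range(max_steps):  # reason, act, observe, repeated
        decision = llm_decide(query, observations)
        if decision["action"] == "search":
            observations.extend(search_tool(**decision["args"]))
        else:  # the agent judges the results sufficient
            return "Found: " + ", ".join(observations)
    return "Gave up after max_steps"

print(agentic_loop("hotel with a nice restaurant downtown"))
&lt;/code&gt;&lt;/pre&gt; 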
&lt;p&gt;Throughout this process, Amazon Bedrock Guardrails enforce content safety and policy adherence, maintaining appropriate responses.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Hybrid search with OpenSearch Serverless&lt;/strong&gt;&lt;br&gt; The architecture integrates Amazon OpenSearch Serverless as the vector store and search engine. OpenSearch stores both vectorized embeddings (for semantic understanding) and structured text fields (for precise filtering); this dual representation is what makes the hybrid search approach possible. When the agent invokes the hybrid search tool, OpenSearch executes queries that combine:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Semantic matching&lt;/strong&gt; using vector similarity for conceptual understanding&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Text-based filtering&lt;/strong&gt; for precise constraints like location or amenities&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Monitoring and security&lt;/strong&gt;&lt;br&gt; The architecture includes Amazon CloudWatch for monitoring system performance and usage patterns. AWS IAM manages access control and security policies across components.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Why this architecture?&lt;/strong&gt;&lt;br&gt; This serverless design provides several key advantages:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Low-latency responses&lt;/strong&gt; for real-time conversational interactions&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Auto-scaling&lt;/strong&gt; to handle varying workloads without manual intervention&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cost-effectiveness&lt;/strong&gt; through pay-as-you-go pricing with no idle infrastructure&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Production-ready&lt;/strong&gt; with built-in monitoring, logging, and security features&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The combination of AgentCore’s orchestration capabilities with the hybrid search functionality of OpenSearch allows our assistant to dynamically adapt its search strategy based on user intent, something that rigid RAG pipelines cannot achieve.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-127330 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-18738-image-2.png" alt="AWS Cloud architecture diagram showing an agentic loop system using Amazon Bedrock, API Gateway, OpenSearch Serverless, and various AWS services for intelligent search processing." width="1205" height="691"&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 1&lt;/p&gt; 
&lt;p&gt;Figure Note: The code samples and architecture artifacts provided in this document are intended for demonstration and reference purposes only and are not production-ready.&lt;/p&gt; 
&lt;h3&gt;Implementation with Strands and Amazon Bedrock AgentCore&lt;/h3&gt; 
&lt;p&gt;To build our hybrid search agent, we use Strands, an open-source AI agent framework that simplifies developing LLM-powered applications with tool-calling capabilities. Strands allows us to define our hybrid search function as a “tool” that the agent can intelligently invoke based on user queries. For comprehensive details on Strands architecture and patterns, see the &lt;a href="https://strandsagents.com/" target="_blank" rel="noopener noreferrer"&gt;Strands documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Here’s how we define our hybrid search tool:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;from typing import Optional

from strands import tool

@tool
def hybrid_search(query_text: str, country: Optional[str] = None, city: Optional[str] = None):
    """
    Performs hybrid search combining semantic understanding with location filtering.
    The agent calls this when users provide both descriptive preferences and location.
    
    Args:
        query_text: Natural language description of what to search for
        country: Optional country filter
        city: Optional city filter
    """
    # Generate embeddings for semantic search
    vector = generate_embeddings(query_text)
    
    # Build hybrid query combining vector similarity and text filters
    query = {
        "bool": {
            "must": [
                {"knn": {"embedding_field": {"vector": vector, "k": 10}}}
            ],
            "filter": []
        }
    }
    
    # Add location filters if provided
    if country:
        query["bool"]["filter"].append({"term": {"country": country}})
    if city:
        query["bool"]["filter"].append({"term": {"city": city}})
    
    # Execute search in OpenSearch
    response = opensearch_client.search(index="hotels", body={"query": query})
    
    return format_results(response)
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Once we’ve defined our tools, we integrate them with Amazon Bedrock AgentCore for deployment and runtime orchestration. Amazon Bedrock AgentCore enables you to deploy and operate highly effective agents securely at scale, using any framework and model, and provides purpose-built infrastructure and controls for running trustworthy agents.&lt;/p&gt; 
&lt;p&gt;For detailed information about integrating Strands with Amazon Bedrock AgentCore, see the &lt;a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples/blob/main/01-tutorials/01-AgentCore-runtime/01-hosting-agent/01-strands-with-bedrock-model/runtime_with_strands_and_bedrock_models.ipynb" target="_blank" rel="noopener noreferrer"&gt;AgentCore-Strands integration tutorial.&lt;/a&gt;&lt;/p&gt; 
&lt;h3&gt;Hybrid search implementation deep dive&lt;/h3&gt; 
&lt;p&gt;A key differentiator of our AI assistant solution is its advanced hybrid search capability. While many RAG implementations rely solely on semantic search, our architecture extends beyond this by using the full potential of OpenSearch, enabling semantic, text-based, and hybrid searches, all within a single, efficient query. The following sections explore the technical details of this implementation.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;The two-pronged implementation&lt;/strong&gt;&lt;br&gt; Our hybrid search implementation is built on two fundamental components: optimized data storage and versatile query handling.&lt;/p&gt; 
&lt;h4&gt;1. Optimized data storage&lt;/h4&gt; 
&lt;p&gt;The approach to data storage is important for efficient hybrid search.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Data categorization&lt;/strong&gt;: We systematically categorize our data into two main types: 
  &lt;ul&gt; 
   &lt;li&gt;Semantic search candidates: This includes detailed descriptions, contexts, and explanations – content that benefits from understanding meaning beyond keywords.&lt;/li&gt; 
   &lt;li&gt;Text search candidates: This encompasses metadata, product identifiers, dates, and other structured fields.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Vector embedding&lt;/strong&gt;: For our semantic data, we use Amazon Bedrock’s embedding models. These transform text into high-dimensional vectors that capture semantic meaning effectively.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Text data optimization&lt;/strong&gt;: Text data is stored in its original format, optimized for rapid traditional queries.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Unified index structure&lt;/strong&gt;: Our OpenSearch index is designed to accommodate both vector embeddings and text fields concurrently, enabling flexible querying capabilities.&lt;/li&gt; 
&lt;/ul&gt; 
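&lt;p&gt;A unified index of this kind might be defined with a mapping along the following lines. The field names and the embedding dimension are assumptions; the dimension must match the output size of your chosen embedding model:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# Example OpenSearch index body combining a knn_vector field (semantic
# search) with keyword and text fields (exact filtering). Illustrative only.
index_body = {
    "settings": {"index": {"knn": True}},  # enable k-NN search on the index
    "mappings": {
        "properties": {
            "description": {"type": "text"},
            "description_vector": {"type": "knn_vector", "dimension": 1024},
            "city": {"type": "keyword"},  # exact-match filter field
            "country": {"type": "keyword"},
        }
    },
}

# With the opensearch-py client, the index could then be created with:
# client.indices.create(index="hotels", body=index_body)
&lt;/code&gt;&lt;/pre&gt; 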
&lt;h4&gt;2. Versatile search functionality&lt;/h4&gt; 
&lt;p&gt;Building on our optimized data storage, we’ve developed a comprehensive search function that our AI agent can utilize effectively:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Adaptive search types&lt;/strong&gt;: Our search function is designed to perform semantic, text, or hybrid searches as required by the agent.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Semantic search implementation&lt;/strong&gt;: For meaning-focused queries, we generate query embeddings using Amazon Bedrock and perform a k-NN (k-Nearest Neighbors) search in the vector space.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Text search capabilities&lt;/strong&gt;: When precise matching is necessary, we use OpenSearch’s robust text query functionalities, including exact and fuzzy matching options.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Hybrid search execution&lt;/strong&gt;: This is where we combine vector similarity with text matching in a unified query. Using OpenSearch’s bool query, we can adjust the balance between semantic and text relevance as needed.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Result integration&lt;/strong&gt;: Regardless of the search type, our system consolidates and ranks results based on overall relevance, combining semantic understanding with precise text matching.&lt;/li&gt; 
&lt;/ul&gt; 
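&lt;p&gt;As a sketch of how that balance between semantic and text relevance can be adjusted (the field names, weights, and use of a should clause are assumptions, not the exact production query), an OpenSearch bool query can boost the text clause relative to the vector clause:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;def build_weighted_query(vector, city, text_boost=1.5):
    """Bool query mixing knn scoring with a boosted match clause.
    Raising text_boost shifts the ranking toward exact text relevance."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"knn": {"description_vector": {"vector": vector, "k": 10}}},
                    {"match": {"city": {"query": city, "boost": text_boost}}},
                ]
            }
        }
    }

q = build_weighted_query([0.1, 0.2, 0.3], "Central City", text_boost=2.0)
print(q["query"]["bool"]["should"][1]["match"]["city"]["boost"])  # prints: 2.0
&lt;/code&gt;&lt;/pre&gt; 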
&lt;p&gt;Reference pseudo code for hybrid search implementation:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;def hybrid_search(query_text, country, city, search_type="hybrid"):
    """
    Hybrid search combining semantic and text-based search with location filtering
    """

    # 1. Generate embeddings for semantic search
    if search_type in ["semantic", "hybrid"]:
        vector = generate_embeddings(query_text)
    
    # 2. Build search query based on type
    if search_type == "semantic":
        query = build_semantic_query(vector)
    elif search_type == "text":
        query = build_text_query(country, city)
    else:  # hybrid search
        query = build_hybrid_query(vector, country, city)
    
    # 3. Execute search
    response = search_opensearch(query)
    
    # 4. Process and return results
    return format_results(response)

# Example usage:
results = hybrid_search(
    query_text="luxury hotel",
    country="USA",
    city="Miami"
)
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;OpenSearch supports multiple query types including text-based search, vector search (knn), and hybrid approaches that combine both methods. For detailed information about available query types and their implementations, refer to the &lt;a href="https://docs.opensearch.org/latest/query-dsl/" target="_blank" rel="noopener noreferrer"&gt;OpenSearch query documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Significance of the hybrid approach&lt;/h3&gt; 
&lt;p&gt;The hybrid approach significantly enhances our AI assistant’s capabilities:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;It supports highly accurate information retrieval, considering both context and content.&lt;/li&gt; 
 &lt;li&gt;It adapts to various query types, maintaining consistent performance.&lt;/li&gt; 
 &lt;li&gt;It provides more relevant and comprehensive responses to user inquiries.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;In the domain of AI-powered search, our hybrid approach represents a significant advancement. It offers a level of flexibility and accuracy that substantially improves our assistant’s ability to retrieve and process information effectively.&lt;/p&gt; 
&lt;h3&gt;Real-life use cases&lt;/h3&gt; 
&lt;p&gt;Hybrid search is applicable across many domains, including:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Real estate and property: Property search combining lifestyle preference understanding (“family-friendly”) with exact location and amenity filtering.&lt;/li&gt; 
 &lt;li&gt;Legal and professional services: Case law research combining conceptual legal similarity with precise jurisdiction and date filtering for comprehensive legal research.&lt;/li&gt; 
 &lt;li&gt;Healthcare and medical: Care teams can ask for “patients with chronic conditions requiring similar treatment protocols as John Doe,” combining semantic understanding of treatment complexity with exact medical record matching.&lt;/li&gt; 
 &lt;li&gt;Media and entertainment: Content discovery systems combining exact genre filtering with semantic plot understanding.&lt;/li&gt; 
 &lt;li&gt;E-commerce and retail: Natural language product discovery with filter precision – “comfortable winter shoes” finds semantic matches while applying exact size, price, or brand filters.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;These use cases demonstrate how hybrid search bridges the gap between natural language understanding and precise data filtering, enabling more intuitive and accurate information retrieval.&lt;/p&gt; 
&lt;h3&gt;Conclusion&lt;/h3&gt; 
&lt;p&gt;The integration of Amazon Bedrock, Amazon Bedrock AgentCore, Strands Agents, and Amazon OpenSearch Serverless represents a significant advancement in building intelligent search applications that combine the power of LLMs with sophisticated information retrieval techniques. This architecture blends semantic, text-based, and hybrid search capabilities to deliver more accurate and contextually relevant results than traditional approaches. By implementing an agent-based system using Amazon Bedrock AgentCore, state management and Strands tool abstractions, developers can create dynamic, conversational AI assistants that intelligently determine the most appropriate search strategies based on user queries. The hybrid search approach, which combines vector similarity with precise text matching, offers flexibility and accuracy in information retrieval, enabling AI systems to better understand user intent and deliver more comprehensive responses. As organizations continue to build AI solutions, this architecture provides a scalable, secure foundation that uses the full potential of AWS services while maintaining the adaptability needed for complex, real-world applications.&lt;/p&gt; 
&lt;hr&gt; 
&lt;h2&gt;&lt;strong&gt;About the authors&lt;/strong&gt;&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127334 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-18738-image-3.png" alt="" width="100" height="104"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Arpit Gupta&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Arpit Gupta&lt;/strong&gt; is a Data Architect at AWS Professional Services with a focus on data analytics. He specializes in developing data lakes, analytics solutions, and generative AI applications in the cloud, helping organizations transform their data into actionable business insights. His passions extend from the digital to the physical realm – from tennis courts to the kitchen and exploring new destinations with family.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127333 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-18738-image-4.jpeg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ashish Bhagam&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Ashish Bhagam &lt;/strong&gt;is a Data Architect with AWS Professional Services Analytics Practice. He helps customers design and implement scalable data solutions and modernize their data architectures. Outside of work, he enjoys watching cricket matches and spending quality time with his family.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127332 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/30/ml-18738-image-5.jpeg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ross Gabay&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Ross Gabay&lt;/strong&gt; was a Principal Data Architect in AWS Professional Services with a focus on Graph Databases and GenAI data analytics. He specializes in developing Graph DB – centric and GenAI solutions.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>From isolated alerts to contextual intelligence: Agentic maritime anomaly analysis with generative AI</title>
		<link>https://aws.amazon.com/blogs/machine-learning/from-isolated-alerts-to-contextual-intelligence-agentic-maritime-anomaly-analysis-with-generative-ai/</link>
					
		
		<dc:creator><![CDATA[Nikita Kozodoi]]></dc:creator>
		<pubDate>Mon, 06 Apr 2026 17:48:54 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[AWS Step Functions]]></category>
		<category><![CDATA[Customer Solutions]]></category>
		<guid isPermaLink="false">7216c8112ee88a842ea7b4265928b7017ae86260</guid>

					<description>This blog post demonstrates how Windward helps enhance and accelerate alert investigation processes by combining geospatial intelligence with generative AI, enabling analysts to focus on decision-making rather than data collection.</description>
										<content:encoded>&lt;p&gt;&lt;em&gt;This post is co-written with Arad Ben Haim and Hannah Danan Moise from Windward.&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://windward.ai/" target="_blank" rel="noopener noreferrer"&gt;Windward&lt;/a&gt; is a leading Maritime AI&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt; company, delivering mission-grade, multi-source intelligence for maritime-based operations. By fusing Automatic Identification System (AIS) data, remote sensing signals, proprietary AI models, and generative AI, Windward provides a 360° view of global maritime activity so defense and intelligence agencies, law enforcement, and commercial leaders can anticipate threats, protect critical assets, and stay in control at sea.&lt;/p&gt; 
&lt;p&gt;This blog post demonstrates how Windward helps enhance and accelerate alert investigation processes by combining geospatial intelligence with generative AI, enabling analysts to focus on decision-making rather than data collection. Prior to using Windward, maritime analysts spent hours manually gathering and correlating complex data to understand vessel behavior anomalies: unusual activity spikes, unexpected movements, deviations from known patterns. It required significant time and deep domain expertise. Windward’s Maritime AI&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt; automates this process, surfacing context and implications so analysts and companies can make informed decisions about maritime risks and opportunities with speed and precision.&lt;/p&gt; 
&lt;h2&gt;Challenge&lt;/h2&gt; 
&lt;p&gt;Maritime analysts rely on Windward’s system to stay ahead of complex global threats. As part of Windward’s ongoing commitment to facilitate a “mission-ready” user experience, the company continuously evolves how users move from detection to decision-making. While &lt;a href="https://windward.ai/solutions/early-detection/" target="_blank" rel="noopener noreferrer"&gt;Windward Early Detection&lt;/a&gt; successfully identifies suspicious patterns, Windward further accelerated situational awareness by making the investigative process more fluid and automated.&lt;/p&gt; 
&lt;p&gt;To optimize the analytical workflow, Windward sought to enhance the correlation of external context through three key strategic improvements:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Unified Workflow:&lt;/strong&gt; Minimizing the need to consult external data sources, facilitating a continuous and focused analytical environment.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Expertise Optimization:&lt;/strong&gt; Automating the collection of weather, news, and alert data to allow domain experts to dedicate more time to strategic interpretation.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Comprehensive Coverage:&lt;/strong&gt; Streamlining the synthesis of information to enable more rapid and in-depth investigation of multiple alerts simultaneously.&lt;/p&gt; 
&lt;p&gt;As a core component of MAI Expert&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt;, the first generative AI maritime agent, Windward partnered with the AWS Generative AI Innovation Center to deliver a solution that automatically contextualizes maritime anomalies. This collaboration helped enhance the user experience by correlating alerts with relevant public and proprietary data, integrating these findings seamlessly with Windward’s internal models, and using generative AI to help deliver comprehensive, actionable risk assessments.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;In collaboration with AWS, Windward developed a multi-step AI-powered solution that automatically fetches relevant data from a variety of internal and external data sources and uses this information to generate a textual description that contextualizes maritime anomaly events. Figure 1 depicts the end-to-end architecture of the solution deployed to AWS.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-127029 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/ML-18948-image-1.png" alt="Architecture diagram for windward aws blog" width="851" height="547"&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 1. Solution architecture&lt;/p&gt; 
&lt;p&gt;Given an anomaly identified in the &lt;a href="https://windward.ai/solutions/early-detection/" target="_blank" rel="noopener noreferrer"&gt;Windward Early Detection&lt;/a&gt; system, the solution extracts relevant metadata from the anomaly event using Windward’s internal database. The metadata includes the anomaly timestamp, region coordinates, anomaly type, vessel class, and other relevant information.&lt;/p&gt; 
&lt;p&gt;Next, the anomaly metadata is passed to the agentic analysis system powered by large language models (LLMs) on Amazon Bedrock. The multi-step anomaly analysis pipeline is orchestrated using &lt;a href="https://aws.amazon.com/step-functions/" target="_blank" rel="noopener noreferrer"&gt;AWS Step Functions&lt;/a&gt;. In the first step, the system queries multiple, diverse external data sources to provide relevant background on the anomaly, which is a key part of creating new value for our customers. These sources include:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Real-time news feed:&lt;/strong&gt; Alerts and event signals discovered from public data are fetched and filtered based on the maritime anomaly’s time and location.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Intelligent web search:&lt;/strong&gt; The system uses large language models to generate precise search queries, retrieving real-time web search results that provide up-to-date context for the anomaly.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Weather data:&lt;/strong&gt; An external API is used to retrieve relevant weather data, such as temperature, wind speed, and precipitation, for the anomaly’s location and time.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Each data source is queried using a separate &lt;a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda function&lt;/a&gt;. After retrieving the data from the three sources, the pipeline moves to the second step, in which a separate LLM, powered by Anthropic’s Claude through Amazon Bedrock, examines the data items and decides whether there is a need to fetch additional web search results. The LLM is instructed to make the decision after cross-checking the anomaly data against the retrieved data items and judging whether the data retrieved so far is sufficient to explain the anomaly or if some aspects related to the event are missing. The LLM generates either a new search query or a command to move to the next step of the pipeline. The Lambda function parses the LLM output and optionally triggers the web search function again to retrieve additional news that might provide important context about the anomaly, appending it to the previous search results. If there are no new search queries, the Step Functions workflow proceeds to the next Lambda function in the pipeline.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-127030 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/ML-18948-image-2.png" alt="Diagram of flow through self-reflection" width="847" height="75"&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 2. Self-reflection logic&lt;/p&gt; 
&lt;p&gt;After running self-reflection and additional data retrieval, the system performs two filtering and ranking steps to remove news items that are not related to the considered anomaly. First, it uses a re-ranking AI model, Amazon Rerank, which sorts the data items according to their relevance to the anomaly. This step is geared toward maintaining high recall, focusing on removing the most irrelevant data items to reduce the set of candidate items to process in the next stage. Second, each of the top-ranked items is further scored by the LLM across multiple dimensions, including time, location, matching vessel type, and others. The system assigns relevance scores between 0 and 100 and only keeps data items with a relevance score above a threshold determined by the solution developers. This second step is geared toward high precision, making sure only the most relevant news items are kept. The top-ranked data and news items are passed to the next step of the solution pipeline.&lt;/p&gt; 
&lt;p&gt;Finally, the pipeline uses another LLM to generate, from the top-ranked data items, a contextualized report on the anomaly that summarizes its potential causes, risks, and implications. The concise report is written for Windward’s customers and directly cites the data sources used, which allows users to verify the information and learn additional details by following the links. Figure 3 provides an example of what the generated report looks like for one of the vessel activity anomalies.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-127031 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/ML-18948-image-3.png" alt="Maritime intelligence product" width="833" height="552"&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 3. Example Anomaly Report&lt;/p&gt; 
&lt;h2&gt;Evaluation&lt;/h2&gt; 
&lt;p&gt;The end-to-end system is evaluated on a set of historical maritime anomalies. The evaluation consists of several stages. First, the summaries are automatically evaluated using an LLM-as-a-judge approach, a method that included human-alignment work for the LLM judges. The judge uses a set of six predefined criteria, including credibility, data quality, source diversity, coherence, and ethical bias. The judge evaluates each dimension on a scale between 1 and 100 and assigns the scores to each report. Figure 4 depicts example scores assigned to one of the generated reports by the LLM judge.&lt;/p&gt; 
&lt;p&gt;Second, we calculate several deterministic metrics on the report quality, including the length of the report in characters and the number of data sources explicitly cited in the text. These metrics help to judge the size and the credibility of the generated explanation. Finally, the selected summaries are also evaluated by human experts, who cross-check the generated summaries and retrieved data sources against their own search results and domain understanding.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-127032 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/ML-18948-image-4.png" alt="Explaination of LLM Judge outputs" width="355" height="434"&gt;&lt;/p&gt; 
&lt;p style="text-align: center"&gt;Figure 4. Example LLM-as-a-judge scores&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;The initial agentic solution presented in this blog marked an important milestone in the development of Windward’s MAI Expert&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt;. Building on Windward’s already powerful system, this enhancement helped accelerate maritime alert investigation and enabled analysts to focus even more on decision-making rather than data collection.&lt;/p&gt; 
&lt;p&gt;This approach combined geospatial intelligence with generative AI to streamline what was previously a manual, time-intensive process. High-quality anomaly summaries generated by the system helped analysts better understand the context of maritime events—unusual activity spikes, unexpected movements, deviations from known patterns—and make informed decisions about corresponding risks and opportunities.&lt;/p&gt; 
&lt;p&gt;These capabilities expanded Windward’s value proposition across user segments. For existing users with deep maritime expertise, they further helped streamline workflows and reduce the time needed to derive relevant context. For users with limited maritime expertise, they opened new possibilities by surfacing critical insights without requiring manual correlation of complex datasets.&lt;/p&gt; 
&lt;hr&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127047 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/nikita.jpg" alt="" width="100" height="150"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Nikita Kozodoi&lt;/h3&gt; 
&lt;p&gt;Nikita Kozodoi, PhD is a Senior Applied Scientist at the AWS Generative AI Innovation Center working on the frontier of AI research and business. Nikita builds custom generative AI solutions to solve real-world business problems for AWS customers across industries and holds a PhD in Machine Learning.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127046 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/jack.png" alt="" width="100" height="135"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Jack Butler&lt;/h3&gt; 
&lt;p&gt;Jack Butler is an Applied Scientist at Amazon Web Services (AWS), leading innovative projects at the AWS Generative AI Innovation Centre. He has a strong background in language modeling and applied AI research across a wide variety of enterprise and startup customers.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127035 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/ML-18948-image-7.png" alt="Headshot of Marion, Principal AI Strategist at AWS, specializing in enterprise AI implementation" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Marion Eigner&lt;/h3&gt; 
  &lt;p&gt;Marion is Principal AI Strategist at AWS with a decade of experience taking enterprise AI from idea to production across Financial Services, Healthcare, Manufacturing, Media &amp;amp; Entertainment, and Public Sector with both Fortune 500s and fast-growing startups.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127045 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/Adobe-Express-file-1.jpg" alt="" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Hannah Danan Moise&lt;/h3&gt; 
  &lt;p&gt;Hannah Danan Moise is a Data Science Team Leader with nearly a decade of experience at the frontier of applied AI and maritime intelligence. Having spent eight years architecting and scaling Windward’s core predictive systems, Hannah specializes in transforming high-velocity, multi-source behavioral data into actionable strategic insights. Her expertise lies in deploying advanced machine learning frameworks and agentic AI to solve intricate real-world challenges, consistently driving measurable business impact for global industries.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-full wp-image-127049 alignleft" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/24/Adobe-Express-file-1-1.jpg" alt="" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Arad Ben Haim&lt;/h3&gt; 
  &lt;p&gt;Arad Ben Haim is a Senior Data Scientist at Windward, working at the frontier of applied AI and maritime intelligence. Arad designs and deploys advanced machine learning and predictive systems that transform large-scale behavioral data into actionable insights, solving complex real-world problems and driving measurable business impact for global customers across industries.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Connecting MCP servers to Amazon Bedrock AgentCore Gateway using Authorization Code flow</title>
		<link>https://aws.amazon.com/blogs/machine-learning/connecting-mcp-servers-to-amazon-bedrock-agentcore-gateway-using-authorization-code-flow/</link>
					
		
		<dc:creator><![CDATA[Arko Dutta]]></dc:creator>
		<pubDate>Mon, 06 Apr 2026 14:41:46 +0000</pubDate>
				<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Bedrock AgentCore]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<guid isPermaLink="false">8c4b3e236eb8bf594f5e9442a360b6265fc50004</guid>

					<description>Amazon Bedrock AgentCore Gateway provides a centralized layer for managing how AI agents connect to tools and MCP servers across your organization. In this post, we walk through how to configure AgentCore Gateway to connect to an OAuth-protected MCP server using the Authorization Code flow.</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway.html" target="_blank" rel="noopener"&gt;Amazon Bedrock AgentCore Gateway&lt;/a&gt; provides a centralized layer for managing how AI agents connect to tools and MCP servers across your organization. It consolidates authentication, observability, and policy enforcement into a single endpoint, removing the need to configure and secure each MCP server connection individually.&lt;/p&gt; 
&lt;p&gt;In this post, we walk through how to configure AgentCore Gateway to connect to an OAuth-protected MCP server using the Authorization Code flow.&lt;/p&gt; 
&lt;h2&gt;Using AgentCore Gateway as an MCP server endpoint&lt;/h2&gt; 
&lt;p&gt;As organizations scale their AI agent deployments, the number of MCP servers that each team relies on grows quickly. Developers are adopting Amazon Bedrock AgentCore Gateway as a single endpoint for accessing multiple MCP servers. Instead of configuring each MCP server individually per IDE, teams point to one Gateway URL for consistent access to their full MCP toolset across tools.&lt;/p&gt; 
&lt;p&gt;This pattern is accelerating as teams move beyond custom MCP servers and adopt production-grade third-party ones, like those from &lt;a href="https://docs.aws.amazon.com/aws-mcp/latest/userguide/what-is-mcp-server.html" target="_blank" rel="noopener"&gt;AWS&lt;/a&gt;, &lt;a href="https://github.com/github/github-mcp-server" target="_blank" rel="noopener"&gt;GitHub&lt;/a&gt;, &lt;a href="https://developer.salesforce.com/blogs/2025/06/introducing-mcp-support-across-salesforce" target="_blank" rel="noopener"&gt;Salesforce&lt;/a&gt;, and &lt;a href="https://docs.databricks.com/aws/en/generative-ai/mcp/" target="_blank" rel="noopener"&gt;Databricks&lt;/a&gt;. Many of these MCP servers are protected by their primary identity provider through federation, while others are secured by their own authorization servers. As the number of MCP servers per organization grows, managing connections, authentication, and routing at the IDE level becomes unsustainable. AgentCore Gateway centralizes this complexity, giving teams a single control plane for MCP access while giving developers a frictionless experience.&lt;/p&gt; 
&lt;p&gt;Many enterprise MCP servers require OAuth 2.0 authorization, where the agent must authenticate on behalf of a user before invoking tools. AgentCore Gateway now supports the OAuth 2.0 Authorization Code flow through &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/identity.html" target="_blank" rel="noopener"&gt;Amazon Bedrock AgentCore Identity&lt;/a&gt;. With this, your agents can securely access protected MCP servers without embedding credentials in application code or managing the token lifecycle manually.&lt;/p&gt; 
&lt;h2&gt;Key terms&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;AgentCore Gateway user&lt;/strong&gt; – The end user who consumes the tools in Amazon Bedrock AgentCore Gateway with MCP clients. Gateway users don’t manage the AgentCore Gateway itself. They use the single AgentCore Gateway URL to access the tools available to them.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Admin user&lt;/strong&gt; – The user that manages and maintains Amazon Bedrock AgentCore Gateway. This user is responsible for attaching MCP servers, tools, or APIs to the AgentCore Gateway so that AgentCore gateway users can consume them.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;MCP server&lt;/strong&gt; – In this post, we assume that the MCP server is protected by an OAuth 2.0 Authorization Code flow, which requires user interaction to complete authentication. This is distinct from machine-to-machine authentication methods such as Client Credentials or Token Exchange, where no user intervention is required. The patterns described in this post apply specifically to MCP servers that require user-delegated authorization.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;How Authorization Code flow works&lt;/h2&gt; 
&lt;p&gt;To support the Authorization Code grant type, AgentCore Gateway provides two methods for creating MCP server targets:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Implicit sync during MCP Server target creation&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p style="padding-left: 40px"&gt;In this method, the admin user completes the authorization code flow during &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore-control/latest/APIReference/API_CreateGatewayTarget.html" target="_blank" rel="noopener"&gt;CreateGatewayTarget&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore-control/latest/APIReference/API_UpdateGatewayTarget.html" target="_blank" rel="noopener"&gt;UpdateGatewayTarget&lt;/a&gt;, or &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore-control/latest/APIReference/API_SynchronizeGatewayTargets.html" target="_blank" rel="noopener"&gt;SynchronizeGatewayTargets &lt;/a&gt;operations. This allows AgentCore Gateway to discover and cache the MCP server’s tools upfront.&lt;/p&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;&lt;strong&gt;Provide schema upfront during MCP Server targets creation&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p style="padding-left: 40px"&gt;With this method, admin users provide the tool schema directly during &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore-control/latest/APIReference/API_CreateGatewayTarget.html" target="_blank" rel="noopener"&gt;CreateGatewayTarget &lt;/a&gt;or &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore-control/latest/APIReference/API_UpdateGatewayTarget.html" target="_blank" rel="noopener"&gt;UpdateGatewayTarget &lt;/a&gt;operations, rather than AgentCore Gateway fetching them dynamically from the MCP server. AgentCore Gateway parses the provided schema and caches the tool definitions. This removes the need for the admin user to complete the authorization code flow during target creation or update. This is the recommended approach when human intervention isn’t possible during create/update operations. This method is beneficial when you don’t want to expose all the tools provided by the MCP server target.&lt;/p&gt; 
&lt;p style="padding-left: 40px"&gt;&lt;strong&gt;Note&lt;/strong&gt;: Because tool schemas are provided upfront with this method, the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore-control/latest/APIReference/API_SynchronizeGatewayTargets.html" target="_blank" rel="noopener"&gt;SynchronizeGatewayTargets&lt;/a&gt; operation isn’t supported. You can switch a target between Method 1 and Method 2 by updating the target configuration.&lt;/p&gt; 
&lt;p&gt;This means that AgentCore Gateway users can call &lt;code&gt;tools/list&lt;/code&gt; without being prompted to authenticate with the MCP server’s authorization server, because this operation returns the cached tools. The authorization code flow is only triggered when a Gateway user invokes a tool on that MCP server. This is particularly beneficial when multiple MCP servers are attached to a single Gateway. Users can browse the full tool catalog (cached tools) without authenticating to every MCP server and only complete the flow for the specific server whose tool they invoke.&lt;/p&gt; 
&lt;h3&gt;URL Session Binding&lt;/h3&gt; 
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/oauth2-authorization-url-session-binding.html" target="_blank" rel="noopener"&gt;URL session binding&lt;/a&gt; verifies that the user who initiated the OAuth authorization request is the same user who granted consent. When AgentCore Identity generates an authorization URL, it also returns a session-URI. After the user completes consent, the browser redirects back to a callback URL with the session-URI. The application is then responsible for calling the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/APIReference/API_CompleteResourceTokenAuth.html" target="_blank" rel="noopener"&gt;CompleteResourceTokenAuth&lt;/a&gt; API, presenting both the user’s identity and the session-URI. AgentCore Identity validates that the user who started the flow is the same user who completed it before exchanging the authorization code for an access token. This helps avoid a scenario where a user accidentally shares the authorization URL, and someone else completes the consent, which would grant access tokens to the wrong party. The authorization URL and session URI are only valid for 10 minutes, further limiting the window for misuse. Session binding applies during admin target creation (implicit sync) and during tool invocation.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;In this post, we show how to attach the GitHub MCP server to Amazon Bedrock AgentCore Gateway using Method 1 (admin-initiated sync during target creation) and Method 2 (providing the tool schema upfront during target creation). The accompanying code is available in &lt;a href="https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/02-AgentCore-gateway/05-mcp-server-as-a-target/03-authorization-code-flow/" target="_blank" rel="noopener"&gt;this repository&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Prerequisites&lt;/h3&gt; 
&lt;p&gt;Complete the following prerequisites before working through this post.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;GitHub &lt;a href="https://docs.github.com/en/apps/oauth-apps/using-oauth-apps" target="_blank" rel="noopener noreferrer"&gt;OAuth Apps&lt;/a&gt; setup&lt;/strong&gt; 
  &lt;ul&gt; 
   &lt;li&gt;Go to &lt;a href="https://github.com/settings/apps" target="_blank" rel="noopener noreferrer"&gt;https://github.com/settings/apps&lt;/a&gt; → New GitHub App&lt;br&gt; &lt;img loading="lazy" class="alignnone wp-image-128161 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/09/Screenshot-2026-04-08-at-23.25.57.png" alt="" width="528" height="210"&gt;&lt;/li&gt; 
   &lt;li&gt;Fill in details: 
    &lt;ol type="a"&gt; 
     &lt;li&gt;&lt;strong&gt;GitHub App name&lt;/strong&gt;: AgentCore Gateway GitHub MCP&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Homepage URL &lt;/strong&gt;(&lt;em&gt;The full URL to your GitHub App’s website&lt;/em&gt;): The Homepage URL appears as a clickable link when users see your OAuth app, letting them learn more about your app. It helps users verify the legitimacy of the app requesting access to their GitHub account.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Authorization callback URL&lt;/strong&gt;: The Authorization callback URL (redirect URI) is the URL GitHub redirects the user to after they authorize (or deny) your OAuth app. For now, enter &lt;code&gt;https://example.com/auth&lt;/code&gt;; we will come back and change this value later.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Advanced Settings: &lt;/strong&gt;Here we cover the recommended defaults; however, make sure to follow security best practices based on your organization’s policies. 
      &lt;ol type="i"&gt; 
       &lt;li&gt;&lt;strong&gt;Expire user authorization tokens:&lt;/strong&gt; Disable – If enabled, this will allow AgentCore Identity to automatically refresh tokens for the user.&lt;/li&gt; 
       &lt;li&gt;&lt;strong&gt;Request user authorization (OAuth) during installation:&lt;/strong&gt; Disable.&lt;/li&gt; 
       &lt;li&gt;&lt;strong&gt;Device Flow: &lt;/strong&gt;Disable –&amp;nbsp;Allows authorization on devices that don’t have a browser (for example, CLI tools, smart TVs, CI environments).&lt;/li&gt; 
       &lt;li&gt;&lt;strong&gt;Webhook: &lt;/strong&gt;Disable.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;User permissions:&lt;/strong&gt; Use case dependent; keep the defaults for now. These are granted when the user goes through the OAuth authorization flow. Only request what you need: users see these permissions on the consent screen, and excessive permissions reduce trust.&lt;/li&gt; 
      &lt;/ol&gt; &lt;/li&gt; 
    &lt;/ol&gt; &lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;ul&gt; 
 &lt;li style="list-style-type: none"&gt; 
  &lt;ul&gt; 
   &lt;li style="list-style-type: none"&gt;&lt;/li&gt; 
   &lt;li&gt;Choose &lt;strong&gt;Create GitHub App&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Make sure to note down the app &lt;strong&gt;Client ID&lt;/strong&gt; (different from the App ID).&lt;/li&gt; 
 &lt;li&gt;Under your OAuth app’s general settings, choose &lt;strong&gt;Generate a new client secret&lt;/strong&gt;. Make sure to note down the client secret, as GitHub only shows it once upon creation.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ul&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;&lt;strong&gt;IAM permissions:&lt;/strong&gt; You need appropriate IAM permissions to run the code from this blog post. These are the minimum &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy-permissions.html" target="_blank" rel="noopener noreferrer"&gt;IAM permissions&lt;/a&gt; required.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Code repository:&lt;/strong&gt; First clone the &lt;a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples.git" target="_blank" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;, and then open &lt;code&gt;github-mcp-server.ipynb&lt;/code&gt;. We recommend following the console instructions in this blog post to understand the concepts and then looking at the code walkthrough. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-bash"&gt;git clone https://github.com/awslabs/amazon-bedrock-agentcore-samples.git

cd amazon-bedrock-agentcore-samples/01-tutorials/02-AgentCore-gateway/05-mcp-server-as-a-target/03-authorization-code-flow&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;GitHub credential provider:&lt;/strong&gt; In this step, we will set up the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/resource-providers.html" target="_blank" rel="noopener noreferrer"&gt;AgentCore Identity Credential Provider&lt;/a&gt;. On the Amazon Bedrock AgentCore console, go to AgentCore Identity and create an OAuth client.&lt;br&gt; &lt;img loading="lazy" class="alignnone size-full wp-image-127594" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-2.png" alt="" width="3456" height="1400"&gt;&lt;p&gt;&lt;/p&gt; 
  &lt;ol&gt; 
   &lt;li&gt;Provide a name for the OAuth Client, choose the &lt;strong&gt;included GitHub provider&lt;/strong&gt;, and fill in the GitHub OAuth App client ID and client secret.&lt;br&gt; &lt;img loading="lazy" class="alignnone size-full wp-image-127595" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-3.png" alt="" width="1157" height="587"&gt;&lt;/li&gt; 
   &lt;li&gt;Copy the AgentCore Identity OAuth client callback URL, and make sure to go back to &lt;a href="https://github.com/settings/apps" target="_blank" rel="noopener noreferrer"&gt;GitHub OAuth provider&lt;/a&gt; you created and update the Authorization callback URL.&lt;br&gt; &lt;img loading="lazy" class="alignnone size-full wp-image-127596" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-4.png" alt="" width="1459" height="562"&gt;&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Implicit sync during MCP Server target creation&lt;/h3&gt; 
&lt;p&gt;In this section, we describe how implicit sync works during MCP server target creation. Make sure that the AgentCore Gateway execution role has &lt;code&gt;GetWorkloadAccessTokenForUserId&lt;/code&gt; and &lt;code&gt;CompleteResourceTokenAuth&lt;/code&gt; permissions. First, let’s start by understanding the flow.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127597" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-5.png" alt="" width="1612" height="917"&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;The admin user calls &lt;code&gt;CreateGatewayTarget&lt;/code&gt;, providing the MCP server endpoint, the AgentCore Identity Credential Provider, and return URL. This tells AgentCore Gateway which MCP server to connect to and which credential provider to use for obtaining OAuth 2.0 tokens. This same flow also applies to &lt;code&gt;UpdateGatewayTarget&lt;/code&gt; and &lt;code&gt;SynchronizeGatewayTargets&lt;/code&gt; operations.&lt;/li&gt; 
 &lt;li&gt;AgentCore Gateway &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/APIReference/API_GetWorkloadAccessTokenForUserId.html" target="_blank" rel="noopener"&gt;requests&lt;/a&gt; a workload access token from the AgentCore Identity Credential Provider, passing the AgentCore Gateway workload identity and a user ID in the format &lt;code&gt;{gatewayId}{targetId}{uuid}&lt;/code&gt;. This workload access token identifies the AgentCore Gateway as an authorized caller for subsequent credential operations.&lt;/li&gt; 
 &lt;li&gt;Using the workload access token, AgentCore Gateway &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/APIReference/API_GetResourceOauth2Token.html" target="_blank" rel="noopener"&gt;requests&lt;/a&gt; an OAuth 2.0 access token from the AgentCore Identity Credential Provider. This provides the admin user with an authorization URL and a session-URI. At this stage, the target is in &lt;strong&gt;Needs Authorization&lt;/strong&gt; status.&lt;/li&gt; 
 &lt;li&gt;The admin opens the authorization URL in their browser, signs in, and grants the requested permissions to the AgentCore Gateway.&lt;/li&gt; 
 &lt;li&gt;After the admin grants consent, the OAuth 2.0 authorization server sends an authorization code to the AgentCore Identity Credential Provider’s registered callback endpoint.&lt;/li&gt; 
 &lt;li&gt;The credential provider redirects the admin browser to the return URL, with the session-URI. The admin application calls &lt;code&gt;CompleteResourceTokenAuth&lt;/code&gt;, presenting the user ID and the session-URI returned in step 3. The credential provider validates that the user who initiated the authorization flow (step 3) is the same user who completed consent. This prevents token hijacking if the authorization URL was accidentally shared. If the flow was initiated from the AWS Management Console, this step is handled automatically. If initiated from another context, the admin is responsible for calling the &lt;code&gt;CompleteResourceTokenAuth&lt;/code&gt; API directly.&lt;/li&gt; 
 &lt;li&gt;After successful session binding validation, the credential provider exchanges the authorization code with the OAuth 2.0 authorization server for an OAuth 2.0 access token.&lt;/li&gt; 
 &lt;li&gt;This access token is used to list the tools on MCP server target; returned tool definitions from the target are cached at AgentCore Gateway.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Note that a subsequent update or synchronization to the target won’t reuse the access token. Instead, AgentCore Identity will get a new access token from the authorization server.&lt;/p&gt; 
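&lt;p&gt;The first two steps of this flow can be sketched with the AWS SDK for Python (Boto3). The API names come from the references linked above; the parameter names, the credential provider name, and the helper that builds the gateway user ID are assumptions for illustration only, so the service calls are shown commented out.&lt;/p&gt;

```python
import uuid

def make_gateway_user_id(gateway_id: str, target_id: str) -> str:
    """Build the user ID AgentCore Gateway passes to AgentCore Identity.

    Format per the flow above: {gatewayId}{targetId}{uuid}. The exact
    concatenation is an assumption for illustration.
    """
    return f"{gateway_id}{target_id}{uuid.uuid4().hex}"

# Hypothetical sketch of steps 1-2; method and parameter names are
# assumptions based on the API references linked above.
# import boto3
# identity = boto3.client("bedrock-agentcore")
# workload_token = identity.get_workload_access_token_for_user_id(
#     workloadName="my-gateway-workload",            # gateway workload identity
#     userId=make_gateway_user_id("gw-123", "tgt-456"),
# )["workloadAccessToken"]
# resp = identity.get_resource_oauth2_token(
#     workloadIdentityToken=workload_token,
#     resourceCredentialProviderName="my-oauth-provider",
# )
# # Before the admin consents, resp carries an authorization URL and a
# # session URI instead of an access token (Needs Authorization status).

print(make_gateway_user_id("gw-123", "tgt-456"))
```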
&lt;h3&gt;Target creation&lt;/h3&gt; 
&lt;p&gt;First, let’s create an Amazon Bedrock AgentCore Gateway and target and see how implicit sync works during MCP server target creation.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127598" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-6.png" alt="" width="1925" height="770"&gt;&lt;/p&gt; 
&lt;p&gt;When creating an AgentCore Gateway, you must use MCP version &lt;code&gt;2025-11-25&lt;/code&gt; or later. Keep everything else default and select &lt;strong&gt;MCP server target&lt;/strong&gt;. Provide the MCP server endpoint, and for OAuth client, select the AgentCore Identity OAuth Client created during the prerequisites section.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127599" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-7.png" alt="" width="1980" height="1484"&gt;&lt;/p&gt; 
&lt;p&gt;Under additional configuration, make sure to select &lt;strong&gt;Authorization code grant (3LO)&lt;/strong&gt;. The Authorization code grant (3LO) option will be disabled if the AgentCore Gateway wasn’t created with MCP version &lt;code&gt;2025-11-25&lt;/code&gt; or later. Here, you must also provide the return URL. During the session binding process after the authorization code flow, users will be returned to this URL, both during implicit sync and tool invocation. You can override the return URL value during invocation. For more information, see &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-using-auth-ex-3lo.html" target="_blank" rel="noopener"&gt;Example: Authorization code grant&lt;/a&gt; in the Amazon Bedrock AgentCore Developer Guide. You can provide scopes and additional parameters such as audience when configuring the target. These parameters are included in the request when AgentCore Identity reaches out to the authorization server’s &lt;code&gt;/authorize&lt;/code&gt; endpoint.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127600" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-8.png" alt="" width="2210" height="1514"&gt;&lt;/p&gt; 
&lt;p&gt;After creating the target, the target will be in &lt;strong&gt;Needs authorization&lt;/strong&gt; status. At this point, admin users are required to complete the authorization request, either from the AWS console or by navigating to the authorization URL directly. It’s important to note that if the flow is completed from the AWS console, session binding is handled automatically. If initiated from another context, the admin is responsible for calling the &lt;code&gt;CompleteResourceTokenAuth&lt;/code&gt; API directly. For more information, see the code sample in &lt;a href="https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/02-AgentCore-gateway/05-mcp-server-as-a-target/03-authorization-code-flow/" target="_blank" rel="noopener"&gt;GitHub&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127601" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-9.png" alt="" width="3206" height="428"&gt;&lt;/p&gt; 
&lt;p&gt;This is what the consent flow looks like when initiated from the AWS Console.&lt;/p&gt; 
&lt;div style="width: 640px;" class="wp-video"&gt;
 &lt;video class="wp-video-shortcode" id="video-127589-1" width="640" height="360" preload="metadata" controls="controls"&gt;
  &lt;source type="video/mp4" src="https://d2908q01vomqb2.cloudfront.net/artifacts/DBSBlogs/FLASH-3089/Consent+Screen.mp4?_=1"&gt;
 &lt;/video&gt;
&lt;/div&gt; 
&lt;p&gt;After a few seconds, you will see that the target is in &lt;strong&gt;Ready&lt;/strong&gt; status with authorization status &lt;strong&gt;Authorized&lt;/strong&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127602" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-10.png" alt="" width="3128" height="464"&gt;&lt;/p&gt; 
&lt;h3&gt;Provide the schema upfront during MCP server target creation&lt;/h3&gt; 
&lt;p&gt;In this section, we show how to provide the schema upfront during MCP server target creation. This is the recommended approach when human intervention isn’t possible during create or update operations.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-127603 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-11.png" alt="" width="2706" height="1086"&gt;&lt;/p&gt; 
&lt;p&gt;In this step, we create an Amazon Bedrock AgentCore Gateway and target and provide the schema upfront during MCP server target creation. The process remains the same, except that during target creation you select &lt;strong&gt;Use pre-defined list tools&lt;/strong&gt; and paste the GitHub tool definitions. You can copy the tool definition from the &lt;a href="https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/02-AgentCore-gateway/05-mcp-server-as-a-target/03-authorization-code-flow/" target="_blank" rel="noopener"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127604" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-12.png" alt="" width="1978" height="1560"&gt;&lt;/p&gt; 
&lt;p&gt;The target in this case becomes immediately ready, with authorization status &lt;strong&gt;No authorization required&lt;/strong&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127605" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-13.png" alt="" width="3198" height="424"&gt;&lt;/p&gt; 
&lt;h2&gt;Demo&lt;/h2&gt; 
&lt;p&gt;After successful target creation, either using the implicit sync method or by providing the schema upfront, AgentCore Gateway users can discover and invoke tools using the MCP protocol. In this section, we look at the &lt;code&gt;tools/list&lt;/code&gt; and &lt;code&gt;tools/call&lt;/code&gt; flows from AgentCore Gateway.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-127606" style="margin: 10px 0px 10px 0px;border: 1px solid #CCCCCC" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-14.png" alt="" width="1028" height="934"&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;The gateway user sends a &lt;code&gt;tools/list&lt;/code&gt; request to AgentCore Gateway with their inbound authorization token. Because tool definitions were cached during target creation, AgentCore Gateway returns the cached tool definitions immediately.&lt;/li&gt; 
 &lt;li&gt;The gateway user sends a &lt;code&gt;tools/call&lt;/code&gt; request to AgentCore Gateway with their inbound authorization token. This triggers the OAuth authorization code flow for the specific MCP server target, because AgentCore Gateway needs an access token to call the MCP server on behalf of this user.&lt;/li&gt; 
 &lt;li&gt;AgentCore Gateway &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/APIReference/API_GetWorkloadAccessTokenForJWT.html" target="_blank" rel="noopener"&gt;requests&lt;/a&gt; a workload access token from AgentCore Identity, passing the workload identity and the user’s JWT from the inbound authorization header.&lt;/li&gt; 
 &lt;li&gt;Using the workload access token, AgentCore Gateway &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/APIReference/API_GetResourceOauth2Token.html" target="_blank" rel="noopener"&gt;requests&lt;/a&gt; an OAuth 2.0 access token from the credential provider. Because no valid token exists yet for this user, the credential provider returns an authorization URL and a session-URI instead.&lt;/li&gt; 
 &lt;li&gt;AgentCore Gateway passes the authorization URL and session URI back to the gateway user. The user opens the authorization URL in their browser, signs in to the OAuth 2.0 authorization server, and grants the requested permissions. The sample &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25/client/elicitation#url-mode-flow" target="_blank" rel="noopener"&gt;URL elicitation&lt;/a&gt; response from AgentCore Gateway is as follows:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;{
  "jsonrpc": "2.0",
  "id": 3,
  "error": {
    "code": -32042,
    "message": "This request requires more information.",
    "data": {
      "elicitations": [{
        "mode": "url",
        "elicitationId": "&amp;lt;ID&amp;gt;",
        "url": "&amp;lt;identity_url&amp;gt;/?request_uri=urn%3Aietf%3A...",
        "message": "Please login to this URL for authorization."
      }]
    }
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="6"&gt; 
 &lt;li&gt;After the user grants consent, the OAuth 2.0 authorization server sends an authorization code to the AgentCore Identity Credential Provider’s registered callback endpoint.&lt;/li&gt; 
 &lt;li&gt;The credential provider redirects the user’s browser to the return URL with the session URI. The user’s application calls &lt;code&gt;CompleteResourceTokenAuth&lt;/code&gt;, presenting the user’s JWT and the session-URI. The credential provider validates that the user who initiated the authorization flow (step 4) is the same user who completed consent.&lt;/li&gt; 
 &lt;li&gt;After successful session binding validation, the credential provider exchanges the authorization code with the OAuth 2.0 authorization server for an OAuth 2.0 access token. The credential provider caches this token in the Token Vault under the workload identity and user identity.&lt;/li&gt; 
 &lt;li&gt;When the gateway user issues a &lt;code&gt;tools/call&lt;/code&gt; request again, AgentCore Gateway retrieves the cached token from AgentCore Identity, keyed by workload identity and user identity, and uses it to call the MCP server.&lt;/li&gt; 
&lt;/ol&gt; 
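&lt;p&gt;On the client side, an MCP client has to recognize the URL elicitation error and surface the authorization URL to the user. The following minimal sketch parses the JSON-RPC error shape shown in the sample response earlier; the URL used here is a placeholder.&lt;/p&gt;

```python
import json

def extract_authorization_urls(response_text: str):
    """Pull authorization URLs out of a URL-elicitation error response.

    Matches the sample elicitation payload shown earlier: a JSON-RPC error
    with code -32042 carrying a list of url-mode elicitations.
    """
    payload = json.loads(response_text)
    error = payload.get("error") or {}
    if error.get("code") != -32042:
        return []  # not an elicitation; nothing for the user to do
    elicitations = error.get("data", {}).get("elicitations", [])
    return [e["url"] for e in elicitations if e.get("mode") == "url"]

# Sample response in the shape shown above, with a placeholder URL.
sample = json.dumps({
    "jsonrpc": "2.0",
    "id": 3,
    "error": {
        "code": -32042,
        "message": "This request requires more information.",
        "data": {"elicitations": [{
            "mode": "url",
            "elicitationId": "abc",
            "url": "https://identity.example/authorize?request_uri=urn...",
            "message": "Please login to this URL for authorization.",
        }]},
    },
})
print(extract_authorization_urls(sample))
```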
&lt;p&gt;Let us now look at a demo of the end-to-end flow where we send &lt;code&gt;tools/list&lt;/code&gt; and &lt;code&gt;tools/call&lt;/code&gt; requests to AgentCore Gateway.&lt;/p&gt; 
&lt;div style="width: 640px;" class="wp-video"&gt;
 &lt;video class="wp-video-shortcode" id="video-127589-2" width="640" height="360" preload="metadata" controls="controls"&gt;
  &lt;source type="video/mp4" src="https://d2908q01vomqb2.cloudfront.net/artifacts/DBSBlogs/FLASH-3089/Gateway+Inspector.mp4?_=2"&gt;
 &lt;/video&gt;
&lt;/div&gt; 
&lt;h2&gt;Clean up&lt;/h2&gt; 
&lt;p&gt;When you’re done using this solution, make sure to clean up all the resources. Follow the instructions in the &lt;a href="https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/02-AgentCore-gateway/05-mcp-server-as-a-target/03-authorization-code-flow/" target="_blank" rel="noopener"&gt;code repository&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we demonstrated how to connect an OAuth-protected MCP server to Amazon Bedrock AgentCore Gateway using the authorization code flow. By centralizing authentication through AgentCore Gateway, teams can manage credentials securely using Amazon Bedrock AgentCore Identity while giving developers seamless access to protected tools from MCP clients.&lt;/p&gt; 
&lt;p&gt;While this example focuses on the GitHub MCP server, the &lt;a href="https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/02-AgentCore-gateway/05-mcp-server-as-a-target/03-authorization-code-flow/" target="_blank" rel="noopener"&gt;code&lt;/a&gt; repository includes integration examples for other popular third-party MCP servers, and a guide for hosting your own MCP server with authorization code flow support on AgentCore Runtime as an AgentCore Gateway target. We encourage you to explore these examples and adapt them to your organization’s MCP server landscape.&lt;/p&gt; 
&lt;h2&gt;Resources&lt;/h2&gt; 
&lt;p&gt;To learn more, refer to the following resources:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/introducing-amazon-bedrock-agentcore-gateway-transforming-enterprise-ai-agent-tool-development/" target="_blank" rel="noopener"&gt;Introducing Amazon Bedrock AgentCore Gateway: Transforming enterprise AI agent tool development&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/introducing-amazon-bedrock-agentcore-identity-securing-agentic-ai-at-scale/" target="_blank" rel="noopener"&gt;Introducing Amazon Bedrock AgentCore Identity: Securing agentic AI at scale&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples/" target="_blank" rel="noopener"&gt;Amazon Bedrock AgentCore Samples&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/oauth2-authorization-url-session-binding.html" target="_blank" rel="noopener"&gt;OAuth 2.0 authorization URL session binding&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-127607" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-15.jpeg" alt="" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Arko Dutta&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Arko Dutta &lt;/strong&gt;is a Software Engineer at Amazon Web Services, currently working on the AgentCore Gateway team. During his time at Amazon, he has contributed across several organizations, including Alexa Skills, Seller Flex, and API Gateway, before joining the Bedrock AgentCore Gateway team. Outside of work, he enjoys hiking and traveling.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-127608" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-16.png" alt="" width="177" height="177"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;&lt;strong&gt;Eashan Kaushik&lt;/strong&gt;&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Eashan Kaushik&lt;/strong&gt; is a Specialist Solutions Architect AI/ML at Amazon Web Services. He is driven by creating cutting-edge generative AI solutions while prioritizing a customer-centric approach to his work. Before this role, he obtained an MS in Computer Science from NYU Tandon School of Engineering. Outside of work, he enjoys sports, lifting, and running marathons.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-127609" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-17.png" alt="" width="512" height="512"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;&lt;strong&gt;Sheetal Mohite&lt;/strong&gt;&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Sheetal Mohite &lt;/strong&gt;is a Software Engineer at Amazon Web Services on the AgentCore Gateway team. Over the course of her tenure at Amazon, she has worked across multiple organizations, including Consumer Robotics, and now contributes towards building scalable infrastructure for Agentic AI systems. Outside of work, she enjoys CrossFit, occasional trail runs and hiking.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-127610" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/04/02/flash-3089-image-18.png" alt="" width="1082" height="1178"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Tanuja Joshi&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Tanuja Joshi &lt;/strong&gt;is a Software Engineer at Amazon Web Services on the AgentCore Gateway team. Since the start of her tenure, she has been working in the agentic AI space, contributing to services such as Bedrock Agents. When not at work, she enjoys reading and rock climbing.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		<enclosure length="9465871" type="video/mp4" url="https://d2908q01vomqb2.cloudfront.net/artifacts/DBSBlogs/FLASH-3089/Consent+Screen.mp4"/>
<enclosure length="12303046" type="video/mp4" url="https://d2908q01vomqb2.cloudfront.net/artifacts/DBSBlogs/FLASH-3089/Gateway+Inspector.mp4"/>

			</item>
		<item>
		<title>Simulate realistic users to evaluate multi-turn AI agents in Strands Evals</title>
		<link>https://aws.amazon.com/blogs/machine-learning/simulate-realistic-users-to-evaluate-multi-turn-ai-agents-in-strands-evals/</link>
					
		
		<dc:creator><![CDATA[Ishan Singh]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 17:34:02 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Strands Agents]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">fe875468636d5ac9e36df34c2c854b67105f5912</guid>

					<description>In this post, we explore how ActorSimulator in Strands Evaluations SDK addresses the challenge with structured user simulation that integrates into your evaluation pipeline.</description>
										<content:encoded>&lt;p&gt;Evaluating single-turn agent interactions follows a pattern that most teams understand well. You provide an input, collect the output, and judge the result. Frameworks like &lt;a href="https://strandsagents.com/docs/user-guide/evals-sdk/quickstart/" target="_blank" rel="noopener noreferrer"&gt;Strands Evaluation SDK&lt;/a&gt; make this process systematic through evaluators that assess &lt;a href="https://strandsagents.com/docs/user-guide/evals-sdk/evaluators/helpfulness_evaluator/"&gt;helpfulness&lt;/a&gt;, &lt;a href="https://strandsagents.com/docs/user-guide/evals-sdk/evaluators/faithfulness_evaluator/"&gt;faithfulness&lt;/a&gt;, and &lt;a href="https://strandsagents.com/docs/user-guide/evals-sdk/evaluators/tool_selection_evaluator/"&gt;tool usage&lt;/a&gt;. In a previous blog post, we covered &lt;a href="https://aws.amazon.com/blogs/machine-learning/evaluating-ai-agents-for-production-a-practical-guide-to-strands-evals/"&gt;how to build comprehensive evaluation suites for AI agents&lt;/a&gt; using these capabilities. However, production conversations rarely stop at one turn.&lt;/p&gt; 
&lt;p&gt;Real users engage in exchanges that unfold over multiple turns. They ask follow-up questions when answers are incomplete, change direction when new information surfaces, and express frustration when their needs go unmet. A travel assistant that handles “Book me a flight to Paris” well in isolation might struggle when the same user follows up with “Actually, can we look at trains instead?” or “What about hotels near the Eiffel Tower?” Testing these dynamic patterns requires more than static test cases with fixed inputs and expected outputs.&lt;/p&gt; 
&lt;p&gt;The core difficulty is scale because you can’t manually conduct hundreds of multi-turn conversations every time your agent changes, and writing scripted conversation flows locks you into predetermined paths that miss how real users behave. What evaluation teams need is a way to generate realistic, goal-driven users programmatically and let them converse naturally with an agent across multiple turns. In this post, we explore how &lt;a href="https://strandsagents.com/docs/user-guide/evals-sdk/simulators/user_simulation/" target="_blank" rel="noopener noreferrer"&gt;ActorSimulator&lt;/a&gt; in Strands Evaluations SDK addresses this challenge with structured user simulation that integrates into your evaluation pipeline.&lt;/p&gt; 
&lt;h2&gt;Why multi-turn evaluation is fundamentally harder&lt;/h2&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-126859" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/ml-20566-image-1.png" alt="" width="1060" height="1020"&gt;&lt;/p&gt; 
&lt;p&gt;Single-turn evaluation has a straightforward structure. The input is known ahead of time, the output is self-contained, and the evaluation context is limited to that single exchange. Multi-turn conversations break every one of these assumptions.&lt;/p&gt; 
&lt;p&gt;In a multi-turn interaction, each message depends on everything that came before it. The user’s second question is shaped by how the agent answered the first. A partial answer draws a follow-up about whatever was left out, a misunderstanding leads the user to restate their original request, and a surprising suggestion can send the conversation in a new direction.&lt;/p&gt; 
&lt;p&gt;These adaptive behaviors create conversation paths that can’t be predicted at test-design time. A static dataset of I/O pairs, no matter how large, can’t capture this dynamic quality because the “correct” next user message depends on what the agent just said.&lt;/p&gt; 
&lt;p&gt;Manual testing covers this gap in theory but fails in practice. Testers can conduct realistic multi-turn conversations, but doing so for every scenario, across every persona type, after every agent change is not sustainable. As the agent’s capabilities grow, the number of conversation paths grows combinatorially, well beyond what teams can explore manually.&lt;/p&gt; 
&lt;p&gt;Some teams turn to prompt engineering as a shortcut, asking a large language model (LLM) to “act like a user” during testing. Without structured persona definitions and explicit goal tracking, these approaches produce inconsistent results. The simulated user’s behavior drifts between runs, making it difficult to compare evaluations over time or identify genuine regressions versus random variation. A structured approach to user simulation can bridge this gap by combining the realism of human conversation with the repeatability and scale of automated testing.&lt;/p&gt; 
&lt;h2&gt;What makes a good simulated user&lt;/h2&gt; 
&lt;p&gt;Simulation-based testing is well established in other engineering disciplines. Flight simulators test pilot responses to scenarios that would be dangerous or impossible to reproduce in the real world. Game engines use AI-driven agents to explore millions of player behavior paths before release. The same principle applies to conversational AI. You create a controlled environment where realistic actors interact with your system under conditions you define, then measure the outcomes.&lt;/p&gt; 
&lt;p&gt;For AI agent evaluation, a useful simulated user starts with a consistent persona. A persona that behaves like a technical expert in one turn and a confused novice in the next produces unreliable evaluation data. Consistency means maintaining the same communication style, expertise level, and personality traits through every exchange, just as a real person would.&lt;/p&gt; 
&lt;p&gt;Equally important is goal-driven behavior. Real users come to an agent with something they want to accomplish. They persist until they achieve it, adjust their approach when something is not working, and recognize when their goal has been met. Without explicit goals, a simulated user tends to either end conversations too early or continue asking questions indefinitely, neither of which reflects real usage.&lt;/p&gt; 
&lt;p&gt;The simulated user must also respond adaptively to what the agent says, not follow a predetermined script. When the agent asks a clarifying question, the actor should answer it in character. If the response is incomplete, the actor follows up on whatever was left out rather than moving on. If the conversation drifts off topic, the actor steers it back toward the original goal. These adaptive behaviors make simulated conversations valuable as evaluation data because they exercise the same conversation dynamics your agent faces in production.&lt;/p&gt; 
&lt;p&gt;Building persona consistency, goal tracking, and adaptive behavior into a simulation framework is what differentiates structured user simulation from ad-hoc prompting. ActorSimulator in Strands Evals is designed around exactly these principles.&lt;/p&gt; 
&lt;h2&gt;How ActorSimulator works&lt;/h2&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-126861" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/ml-20566-image-2.png" alt="" width="1020" height="820"&gt;&lt;/p&gt; 
&lt;p&gt;ActorSimulator implements these simulation qualities through a system that wraps a Strands Agent configured to behave as a realistic user persona. The process begins with profile generation. Given a test case containing an input query and an optional task description, ActorSimulator uses an LLM to create a complete actor profile. A test case with input “I need help booking a flight to Paris” and task description “Complete flight booking under budget” might produce a budget-conscious traveler with beginner-level experience and a casual communication style. Profile generation gives each simulated conversation a distinct, consistent character.&lt;/p&gt; 
&lt;p&gt;With the profile established, the simulator manages the conversation turn by turn. It maintains the full conversation history and generates each response in context, keeping the simulated user’s behavior aligned with their profile and goals throughout. When your agent addresses only part of the request, the simulated user naturally follows up on the gaps. A clarifying question from your agent gets a response that stays consistent with the persona. The conversation feels organic because every response reflects both the actor’s persona and everything said so far.&lt;/p&gt; 
&lt;p&gt;Goal tracking runs alongside the conversation. ActorSimulator includes a built-in goal completion assessment tool that the simulated user can invoke to evaluate whether their original objective has been met. When the goal is satisfied or the simulated user determines that the agent cannot complete their request, the simulator emits a stop signal and the conversation ends. If the maximum turn count is reached before the goal is met, the conversation also stops. This gives you a signal that the agent might not be resolving user needs efficiently. This mechanism makes sure conversations have a natural endpoint rather than running indefinitely or cutting off arbitrarily.&lt;/p&gt; 
&lt;p&gt;Each response from the simulated user also includes structured reasoning alongside the message text. You can inspect why the simulated user chose to say what they said, whether they were following up on missing information, expressing confusion, or redirecting the conversation. This transparency is valuable during evaluation development because you can see the reasoning behind each turn, making it more straightforward to trace where conversations succeed or go off track.&lt;/p&gt; 
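&lt;p&gt;Inspecting that reasoning programmatically might look like the following sketch. The &lt;code&gt;reasoning&lt;/code&gt; attribute name and the stub result classes are assumptions for illustration; the code samples later in this post rely only on &lt;code&gt;structured_output.message&lt;/code&gt;.&lt;/p&gt;

```python
from dataclasses import dataclass

# Stand-ins for the simulator's result objects; attribute names other
# than `message` are assumptions for illustration.
@dataclass
class StructuredOutput:
    message: str
    reasoning: str

@dataclass
class ActorResult:
    structured_output: StructuredOutput

def log_turn(result: ActorResult) -> str:
    """Format one simulated-user turn with its reasoning for inspection."""
    out = result.structured_output
    return f"user said: {out.message!r} because: {out.reasoning!r}"

turn = ActorResult(StructuredOutput(
    message="Can you also find a hotel near the station?",
    reasoning="The agent only answered the flight part of my request.",
))
print(log_turn(turn))
```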
&lt;h2&gt;Getting started with ActorSimulator&lt;/h2&gt; 
&lt;p&gt;To get started, you will need to install the Strands Evaluation SDK using: &lt;code&gt;pip install strands-agents-evals&lt;/code&gt;. For a step-by-step setup, you can refer to our &lt;a href="https://strandsagents.com/docs/user-guide/evals-sdk/quickstart/" target="_blank" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; or our &lt;a href="https://aws.amazon.com/blogs/machine-learning/evaluating-ai-agents-for-production-a-practical-guide-to-strands-evals/"&gt;previous blog&lt;/a&gt; for more details. Putting these concepts into practice requires minimal code. You define a test case with an input query and a task description that captures the user’s goal. ActorSimulator handles profile generation, conversation management, and goal tracking automatically.&lt;/p&gt; 
&lt;p&gt;The following example evaluates a travel assistant agent through a multi-turn simulated conversation.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;from strands import Agent
from strands_evals import ActorSimulator, Case, Experiment

# Define your test case
case = Case(
    input="I want to plan a trip to Tokyo with hotel and activities",
    metadata={"task_description": "Complete travel package arranged"}
)

# Create the agent you want to evaluate
agent = Agent(
    system_prompt="You are a helpful travel assistant.",
    callback_handler=None
)

# Create user simulator from test case
user_sim = ActorSimulator.from_case_for_user_simulator(
    case=case,
    max_turns=5
)

# Run the multi-turn conversation
user_message = case.input
conversation_history = []

while user_sim.has_next():
    # Agent responds to user
    agent_response = agent(user_message)
    agent_message = str(agent_response)
    conversation_history.append({
        "role": "assistant",
        "content": agent_message
    })

    # Simulator generates next user message
    user_result = user_sim.act(agent_message)
    user_message = str(user_result.structured_output.message)
    conversation_history.append({
        "role": "user",
        "content": user_message
    })

print(f"Conversation completed in {len(conversation_history) // 2} turns")&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The conversation loop continues until &lt;code&gt;has_next()&lt;/code&gt; returns &lt;code&gt;False&lt;/code&gt;, which happens when the simulated user’s goals are met, when the simulated user determines that the agent cannot complete the request, or when the maximum turn limit is reached. The resulting &lt;code&gt;conversation_history&lt;/code&gt; contains the full multi-turn transcript, ready for evaluation.&lt;/p&gt; 
&lt;h2&gt;Integration with evaluation pipelines&lt;/h2&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-126863" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/ml-20566-image-3.png" alt="" width="1060" height="530"&gt;&lt;/p&gt; 
&lt;p&gt;A standalone conversation loop is useful for quick experiments, but production evaluation requires capturing traces and feeding them into your evaluator pipeline. The next example combines ActorSimulator with &lt;a href="https://opentelemetry.io/blog/2025/ai-agent-observability/"&gt;OpenTelemetry telemetry collection&lt;/a&gt; and Strands Evals session mapping. The task function runs a simulated conversation and collects spans from each turn, then maps them into a structured session for evaluation.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from strands import Agent
from strands_evals import ActorSimulator, Case, Experiment
from strands_evals.evaluators import HelpfulnessEvaluator
from strands_evals.telemetry import StrandsEvalsTelemetry
from strands_evals.mappers import StrandsInMemorySessionMapper

# Setup telemetry for capturing agent traces
telemetry = StrandsEvalsTelemetry()
memory_exporter = InMemorySpanExporter()
span_processor = BatchSpanProcessor(memory_exporter)
telemetry.tracer_provider.add_span_processor(span_processor)

def evaluation_task(case: Case) -&amp;gt; dict:
    # Create simulator
    user_sim = ActorSimulator.from_case_for_user_simulator(
        case=case,
        max_turns=3
    )

    # Create agent
    agent = Agent(
        system_prompt="You are a helpful travel assistant.",
        callback_handler=None
    )

    # Accumulate spans across conversation
    all_target_spans = []
    user_message = case.input

    while user_sim.has_next():
        memory_exporter.clear()
        agent_response = agent(user_message)
        agent_message = str(agent_response)

        # Capture telemetry
        turn_spans = list(memory_exporter.get_finished_spans())
        all_target_spans.extend(turn_spans)

        # Generate next user message
        user_result = user_sim.act(agent_message)
        user_message = str(user_result.structured_output.message)

    # Map to session for evaluation
    mapper = StrandsInMemorySessionMapper()
    session = mapper.map_to_session(
        all_target_spans,
        session_id="test-session"
    )

    return {"output": agent_message, "trajectory": session}

# Create evaluation dataset
test_cases = [
    Case(
        name="booking-simple",
        input="I need to book a flight to Paris next week",
        metadata={
            "category": "booking",
            "task_description": "Flight booking confirmed"
        }
    )
]

evaluator = HelpfulnessEvaluator()
experiment = Experiment(cases=test_cases, evaluator=evaluator)

# Run the evaluation task against every case in the experiment
report = experiment.run_evaluations(evaluation_task)
report.run_display()
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;This approach captures complete traces of your agent’s behavior across conversation turns. The spans include tool calls, model invocations, and timing information for every turn in the simulated conversation. By mapping these spans into a structured session, you make the full multi-turn interaction available to evaluators like &lt;a href="https://strandsagents.com/docs/user-guide/evals-sdk/evaluators/goal_success_rate_evaluator/"&gt;GoalSuccessRateEvaluator&lt;/a&gt; and &lt;a href="https://strandsagents.com/docs/user-guide/evals-sdk/evaluators/helpfulness_evaluator/"&gt;HelpfulnessEvaluator&lt;/a&gt;, which can then assess the conversation as a whole, rather than isolated turns.&lt;/p&gt; 
&lt;h2&gt;Custom actor profiles for targeted testing&lt;/h2&gt; 
&lt;p&gt;Automatic profile generation covers most evaluation scenarios well, but some testing goals require specific personas. You might want to verify that your agent handles an impatient expert user differently from a patient beginner, or that it responds appropriately to a user with domain-specific needs. For these cases, ActorSimulator accepts a fully defined actor profile that you control.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;from strands_evals.types.simulation import ActorProfile
from strands_evals import ActorSimulator
from strands_evals.simulation.prompt_templates.actor_system_prompt import (
    DEFAULT_USER_SIMULATOR_PROMPT_TEMPLATE
)

# Define a custom actor profile
actor_profile = ActorProfile(
    traits={
        "personality": "analytical and detail-oriented",
        "communication_style": "direct and technical",
        "expertise_level": "expert",
        "patience_level": "low"
    },
    context="Experienced business traveler with elite status who values efficiency",
    actor_goal="Book business class flight with specific seat preferences and lounge access"
)

# Initialize simulator with custom profile
user_sim = ActorSimulator(
    actor_profile=actor_profile,
    initial_query="I need to book a business class flight to London next Tuesday",
    system_prompt_template=DEFAULT_USER_SIMULATOR_PROMPT_TEMPLATE,
    max_turns=10
)
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;By defining traits like patience level, communication style, and expertise, you can systematically test how your agent performs across different user segments. An agent that scores well with patient, non-technical users but poorly with impatient experts reveals a specific quality gap that you can address. Running the same goal across multiple persona configurations turns user simulation into a tool for understanding your agent’s strengths and weaknesses by user type.&lt;/p&gt; 
&lt;h2&gt;Best practices for simulation-based evaluation&lt;/h2&gt; 
&lt;p&gt;These best practices help you get the most out of simulation-based evaluation:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Set &lt;code&gt;max_turns&lt;/code&gt; based on task complexity, using 3-5 for focused tasks and 8-10 for multi-step workflows. If most conversations reach the limit without completing the goal, increase it.&lt;/li&gt; 
 &lt;li&gt;Write specific task descriptions that the simulator can evaluate against. “Help the user book a flight” is too vague to judge completion reliably, while “flight booking confirmed with dates, destination, and price” gives a concrete target.&lt;/li&gt; 
 &lt;li&gt;Use auto-generated profiles for broad coverage across user types and custom profiles to reproduce specific patterns from your production logs, such as an impatient expert or a first-time user.&lt;/li&gt; 
 &lt;li&gt;Focus on patterns across your test suite rather than individual transcripts. Consistent redirects from the simulated user suggest that the agent is drifting off topic, and declining goal completion rates after an agent change point to a regression.&lt;/li&gt; 
 &lt;li&gt;Start with a small set of test cases covering your most common scenarios and expand to edge cases and additional personas as your evaluation practice matures.&lt;/li&gt; 
&lt;/ul&gt; 
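&lt;p&gt;The pattern-focused practices above reduce to simple aggregation over your simulation results. The following stdlib sketch is illustrative only: the result records and field names are hypothetical, not part of the Strands Evals API.&lt;/p&gt;

```python
from collections import defaultdict

def completion_rates(results):
    """Group simulation results by category and compute goal completion rates."""
    totals = defaultdict(lambda: [0, 0])  # maps category to [completed, total]
    for r in results:
        bucket = totals[r["category"]]
        bucket[0] += r["goal_completed"]
        bucket[1] += 1
    return {cat: completed / total for cat, (completed, total) in totals.items()}

# Hypothetical records from two personas running the same booking goal
results = [
    {"category": "patient-beginner", "goal_completed": True},
    {"category": "patient-beginner", "goal_completed": True},
    {"category": "impatient-expert", "goal_completed": True},
    {"category": "impatient-expert", "goal_completed": False},
]
rates = completion_rates(results)
```

Grouping results by persona or agent version in this way surfaces quality gaps that reading individual transcripts tends to hide.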
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;We showed how &lt;a href="https://strandsagents.com/docs/user-guide/evals-sdk/simulators/user_simulation/" target="_blank" rel="noopener noreferrer"&gt;ActorSimulator&lt;/a&gt; in &lt;a href="https://strandsagents.com/docs/user-guide/evals-sdk/quickstart/"&gt;Strands Evals&lt;/a&gt; enables systematic, multi-turn evaluation of conversational AI agents through realistic user simulation. Rather than relying on static test cases that capture only single exchanges, you can define goals and personas and let simulated users interact with your agent across natural, adaptive conversations. The resulting transcripts feed directly into the same evaluation pipeline that you use for single-turn testing, giving you helpfulness scores, goal success rates, and detailed traces across every conversation turn.&lt;/p&gt; 
&lt;p&gt;To get started, explore the working examples in the &lt;a href="https://github.com/strands-agents/samples/tree/main/07-evals" target="_blank" rel="noopener noreferrer"&gt;Strands Agents samples repository&lt;/a&gt;. For teams evaluating agents deployed through &lt;a href="https://aws.amazon.com/bedrock/agentcore/?trk=2bc12158-bb93-427c-a19a-1c398faebbc8&amp;amp;sc_channel=ps&amp;amp;ef_id=Cj0KCQjwsdnNBhC4ARIsAA_3heh_4Q-3loHC_p8uMMAejTQt0u4gEE60U9aof3U1kdfNflYc9-6z7pEaAgtGEALw_wcB:G:s&amp;amp;s_kwcid=AL!4422!3!798517281045!e!!g!!agentcore!23606216570!196197897240&amp;amp;gad_campaignid=23606216570&amp;amp;gbraid=0AAAAADjHtp8Xb3vFSacq1jBPqDhevd0Az&amp;amp;gclid=Cj0KCQjwsdnNBhC4ARIsAA_3heh_4Q-3loHC_p8uMMAejTQt0u4gEE60U9aof3U1kdfNflYc9-6z7pEaAgtGEALw_wcB" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt;, the &lt;a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples/tree/main/01-tutorials/07-AgentCore-evaluations/03-advanced/02-simulating-agent-interactions" target="_blank" rel="noopener noreferrer"&gt;AgentCore evaluations sample&lt;/a&gt; demonstrates how to simulate interactions with deployed agents. Start with a handful of test cases representing your most common user scenarios, run them through ActorSimulator, and evaluate the results. As your evaluation practice matures, expand to cover more personas, edge cases, and conversation patterns.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone wp-image-126879" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/ishansin-headshot.png" alt="" width="149" height="139"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ishan Singh&lt;/h3&gt; 
  &lt;p&gt;Ishan is a Sr. Applied Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-126878" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/jb.jpg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Jonathan Buck&lt;/h3&gt; 
  &lt;p&gt;Jonathan is a Senior Software Engineer at Amazon Web Services. His work focuses on building agent environments, evaluation, and post-training infrastructure to support the productization of agentic systems.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone wp-image-126877" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/varannil-2.jpg" alt="" width="116" height="155"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Vinayak Arannil&lt;/h3&gt; 
   &lt;p&gt;Vinayak is a Sr. Applied Scientist on the Amazon Bedrock AgentCore team. With several years of experience, he has worked across AI domains such as computer vision, natural language processing, and recommendation systems. Currently, Vinayak helps build new capabilities in AgentCore and Strands, enabling customers to evaluate their agentic applications with ease, accuracy, and efficiency.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-126876" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/abhishek_pic.jpeg" alt="" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Abhishek Kumar&lt;/h3&gt; 
  &lt;p&gt;Abhishek is an Applied Scientist at AWS, working at the intersection of artificial intelligence and machine learning, with a focus on agent observability, simulation, and evaluation. His primary research interests center on agentic conversational systems. Prior to his current role, Abhishek spent two years at Alexa, Amazon, where he contributed to building and training models that powered Alexa’s core capabilities.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Scaling seismic foundation models on AWS: Distributed training with Amazon SageMaker HyperPod and expanding context windows</title>
		<link>https://aws.amazon.com/blogs/machine-learning/scaling-seismic-foundation-models-on-aws-distributed-training-with-amazon-sagemaker-hyperpod-and-expanding-context-windows/</link>
					
		
		<dc:creator><![CDATA[Haotian An]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 13:30:57 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon SageMaker HyperPod]]></category>
		<category><![CDATA[Customer Solutions]]></category>
		<category><![CDATA[Energy]]></category>
		<category><![CDATA[Experience-Based Acceleration]]></category>
		<guid isPermaLink="false">1c65854b644a83432e2546e6f2562a515a630b51</guid>

					<description>This post describes how TGS achieved near-linear scaling for distributed training and expanded context windows for their Vision Transformer-based SFM using Amazon SageMaker HyperPod. This joint solution cut training time from 6 months to just 5 days while enabling analysis of seismic volumes larger than previously possible.</description>
										<content:encoded>&lt;p&gt;&lt;em&gt;This post is cowritten with Altay Sansal and Alejandro Valenciano from TGS.&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://www.tgs.com/" target="_blank" rel="noopener"&gt;TGS&lt;/a&gt;, a geoscience data provider for the energy sector, supports companies’ exploration and production workflows with advanced seismic foundation models (SFMs). These models analyze complex 3D seismic data to identify geological structures vital for energy exploration. To help enhance their next-generation models as part of their AWS infrastructure modernization, TGS partnered with the AWS Generative AI Innovation Center (GenAIIC) to optimize their SFM training infrastructure.&lt;/p&gt; 
&lt;p&gt;This post describes how TGS achieved near-linear scaling for distributed training and expanded context windows for their Vision Transformer-based SFM using &lt;a href="https://aws.amazon.com/sagemaker/ai/hyperpod/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker HyperPod&lt;/a&gt;. This joint solution cut training time from 6 months to just 5 days while enabling analysis of seismic volumes larger than previously possible.&lt;/p&gt; 
&lt;h2&gt;Addressing seismic foundation model training challenges&lt;/h2&gt; 
&lt;p&gt;TGS’s SFM uses a Vision Transformer (ViT) architecture with Masked AutoEncoder (MAE) training designed by the TGS team to analyze 3D seismic data. Scaling such models presents several challenges:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Data scale and complexity&lt;/strong&gt; – TGS works with large volumes of proprietary 3D seismic data stored in domain-specific formats. The sheer volume and structure of this data required efficient streaming strategies to maintain high throughput and help prevent GPU idle time during training.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Training efficiency&lt;/strong&gt; – Training large FMs on 3D volumetric data is computationally intensive. Accelerating training cycles would enable TGS to incorporate new data more frequently and iterate on model improvements faster, delivering more value to their clients.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Expanded analytical capabilities&lt;/strong&gt; – The geological context a model can analyze depends on how much 3D volume it can process at once. Expanding this capability would allow the models to capture both local details and broader geological patterns simultaneously.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;These challenges called for a comprehensive approach to distributed training and infrastructure optimization, and the AWS GenAIIC partnered with TGS to develop a solution addressing them.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;The collaboration between TGS and the AWS GenAIIC focused on three key areas: establishing an efficient data pipeline, optimizing distributed training across multiple nodes, and expanding the model’s context window to analyze larger geological volumes. The following diagram illustrates the solution architecture.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-124987" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/02/25/ML-10370-image-1.png" alt="Architecture diagram showing AWS SageMaker HyperPod service integration with a customer account, featuring a login node, head node, 16 compute nodes, S3 storage connections, and user access paths for engineers, researchers, and operations teams." width="1388" height="741"&gt;&lt;/p&gt; 
&lt;p&gt;The solution uses SageMaker HyperPod to help provide a resilient, scalable training infrastructure with automatic health monitoring and checkpoint management. The SageMaker HyperPod cluster is configured with &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management&lt;/a&gt; (IAM) execution roles scoped to the minimum permissions required for training operations, deployed within a virtual private cloud (VPC) with network isolation and security groups restricting communication to authorized training nodes. Terabytes of training data streams directly from &lt;a href="https://aws.amazon.com/s3" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service&lt;/a&gt; (Amazon S3), alleviating the need for intermediate storage layers while maintaining high throughput. &lt;a href="http://aws.amazon.com/cloudtrail" target="_blank" rel="noopener noreferrer"&gt;AWS CloudTrail&lt;/a&gt; logs API calls to Amazon S3 and SageMaker services, and Amazon S3 access logging is enabled on training data buckets to provide a detailed audit trail of data access requests. The distributed training framework uses advanced parallelization techniques to efficiently scale across multiple nodes, and context parallelism methods enable the model to process significantly larger 3D volumes than previously possible.&lt;/p&gt; 
&lt;p&gt;The final cluster configuration consisted of 16 &lt;a href="https://aws.amazon.com/ec2/instance-types/p5/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2) P5 instances&lt;/a&gt; for the worker nodes integrated through the &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/reserve-capacity-with-training-plans.html" target="_blank" rel="noopener noreferrer"&gt;SageMaker AI flexible training plans&lt;/a&gt;, each containing:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;8 NVIDIA H200 GPUs with 141 GB HBM3e memory per GPU&lt;/li&gt; 
 &lt;li&gt;192 vCPUs&lt;/li&gt; 
 &lt;li&gt;2048 GB system RAM&lt;/li&gt; 
 &lt;li&gt;3200 Gbps EFAv3 networking for ultra-low latency communication&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Optimizing the training data pipeline&lt;/h2&gt; 
&lt;p&gt;TGS’s training dataset consists of 3D seismic volumes stored in the TGS-developed MDIO format—an open source format built on Zarr arrays designed for large-scale scientific data in the cloud. Such volumes can contain billions of data points representing underground geological structures.&lt;/p&gt; 
&lt;h3&gt;Choosing the right storage approach&lt;/h3&gt; 
&lt;p&gt;The team evaluated two approaches for delivering data to training GPUs:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon FSx for Lustre&lt;/strong&gt; – Copy data from Amazon S3 to a high-speed distributed file system that the nodes read from. This approach provides sub-millisecond latency but requires pre-loading and provisioned storage capacity.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Streaming directly from Amazon S3 &lt;/strong&gt;– Stream data directly from Amazon S3 using MDIO’s native capabilities with multi-threaded libraries, opening multiple concurrent connections per node.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Settling on streaming directly from Amazon S3&lt;/h3&gt; 
&lt;p&gt;The key architectural difference lies in how throughput scales with the cluster. With streaming directly from Amazon S3, each training node creates independent Amazon S3 connections, so aggregate throughput can scale linearly. With &lt;a href="https://aws.amazon.com/fsx/lustre/" target="_blank" rel="noopener noreferrer"&gt;Amazon FSx for Lustre&lt;/a&gt;, the nodes share a single file system whose throughput is tied to provisioned storage capacity. Using Amazon FSx together with Amazon S3 requires only a small Amazon FSx storage volume, which limits the entire cluster to that volume’s throughput, creating a bottleneck as the cluster grows.&lt;/p&gt; 
&lt;p&gt;Comprehensive testing and cost analysis revealed streaming from Amazon S3 directly as the optimal choice for this configuration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Performance&lt;/strong&gt; – Achieved 4–5 GBps sustained throughput per node using multiple data loader processes with pre-fetching over HTTPS endpoints (TLS 1.2)—sufficient to fully utilize the GPUs.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cost efficiency&lt;/strong&gt; – Streaming from Amazon S3 alleviated the need for Amazon FSx provisioning, reducing storage infrastructure costs by over 90% while helping deliver 64–80 GBps cluster-wide throughput. The Amazon S3 pay-per-use model was more economical than provisioning high-throughput Amazon FSx capacity.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Better scaling&lt;/strong&gt; – Streaming from Amazon S3 directly scales naturally—each node brings its own connection bandwidth, avoiding the need for complex capacity planning.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Operational simplicity&lt;/strong&gt; – No intermediate storage to provision, manage, or synchronize.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The team optimized Amazon S3 connection pooling and implemented parallel data loading to sustain high throughput across the 16 nodes.&lt;/p&gt; 
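&lt;p&gt;The parallel-loading pattern can be sketched with the standard library: each worker fetches chunks over its own logical connection, so aggregate throughput grows with the number of concurrent readers. The &lt;code&gt;fetch_chunk&lt;/code&gt; stub below is a stand-in for an MDIO/Amazon S3 chunk read, not the actual TGS pipeline.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(chunk_id):
    """Stand-in for reading one Zarr chunk over an Amazon S3 connection."""
    return bytes(8)  # pretend 8-byte payload

def load_volume(chunk_ids, max_workers=16):
    # Each worker thread issues requests over its own logical connection,
    # mirroring how every training node streams independently from Amazon S3.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_chunk, chunk_ids))

chunks = load_volume(range(64))
```

In the real pipeline, multiple data loader processes with pre-fetching play the role of the thread pool, keeping the GPUs fed without an intermediate file system.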
&lt;h2&gt;Selecting the distributed training framework&lt;/h2&gt; 
&lt;p&gt;When training large models across multiple GPUs, the model’s parameters, gradients, and optimizer states must be distributed across devices. The team evaluated different distributed training approaches to find the optimal balance between memory efficiency and training throughput:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;ZeRO-2 (Zero Redundancy Optimizer Stage 2)&lt;/strong&gt; – This approach partitions gradients and optimizer states across GPUs while keeping a full copy of model parameters on each GPU. This helps reduce memory usage while maintaining fast communication, because each GPU can directly access the parameters during the forward pass without waiting for data from other GPUs.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ZeRO-3&lt;/strong&gt; – This approach goes further by also partitioning model parameters across GPUs. Although this helps maximize memory efficiency (enabling larger models), it requires more frequent communication between GPUs to gather parameters during computation, which can reduce throughput.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;FSDP2 (Fully Sharded Data Parallel v2)&lt;/strong&gt; – PyTorch’s native approach similarly shards parameters, gradients, and optimizer states. It offers tight integration with PyTorch but involves similar communication trade-offs as ZeRO-3.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Comprehensive testing revealed DeepSpeed ZeRO-2 as the optimal framework for this configuration, delivering strong performance while efficiently managing memory:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;ZeRO-2 &lt;/strong&gt;– 1,974 samples per second (implemented)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;FSDP2 &lt;/strong&gt;– 1,833 samples per second&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ZeRO-3 &lt;/strong&gt;– 869 samples per second&lt;/li&gt; 
&lt;/ul&gt; 
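&lt;p&gt;In practice, ZeRO-2 is selected through a DeepSpeed configuration. The following fragment is a sketch using DeepSpeed’s documented schema; the batch and precision values are illustrative, not the exact TGS settings.&lt;/p&gt;

```python
# Minimal DeepSpeed ZeRO-2 configuration sketch (values are illustrative)
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                    # shard gradients and optimizer states only
        "overlap_comm": True,          # overlap gradient reduction with backward pass
        "contiguous_gradients": True,  # reduce memory fragmentation
    },
}
```

Because stage 2 keeps a full parameter copy on each GPU, the forward pass needs no parameter-gathering communication, which is what preserved throughput at this scale.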
&lt;p&gt;This framework choice provided the foundation for achieving near-linear scaling across multiple nodes. The combination of these three key optimizations helped deliver the dramatic training acceleration:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Efficient distributed training&lt;/strong&gt; – DeepSpeed ZeRO-2 enabled near-linear scaling across 128 GPUs (16 nodes × 8 GPUs)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;High-throughput data pipeline&lt;/strong&gt; – Streaming from Amazon S3 directly sustained 64–80 GBps aggregate throughput across the cluster&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Together, these improvements helped reduce training time from 6 months to 5 days—enabling TGS to iterate on model improvements weekly rather than semi-annually.&lt;/p&gt; 
&lt;h2&gt;Expanding analytical capabilities&lt;/h2&gt; 
&lt;p&gt;One of the most significant achievements was expanding the model’s field of view—how much 3D geological volume it can analyze simultaneously. A larger context window allows the model to capture both fine details (small fractures) and broad patterns (basin-wide fault systems) in a single pass, surfacing insights for TGS’s clients that smaller analysis windows could not reveal. The TGS and AWS teams adapted the following advanced techniques to enable ViTs to process substantially larger 3D seismic volumes:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Ring attention implementation&lt;/strong&gt; – Each GPU processes a portion of the input sequence while circulating key-value pairs to neighboring GPUs, gradually accumulating attention results across the distributed system. PyTorch provides an API that makes this straightforward:&lt;/li&gt; 
&lt;/ul&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;from torch.distributed.tensor.parallel import context_parallel

# Wrap attention computation with context parallelism
with context_parallel(
    buffers=[query, key, value],  # Tensors to shard
    buffer_seq_dims=[1, 1, 1]      # Dimension to shard along (sequence dimension)
):
    # Standard scaled dot-product attention - automatically becomes Ring Attention
    attention_output = torch.nn.functional.scaled_dot_product_attention(
        query, key, value, attn_mask=None
    )&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Dynamic mask ratio adjustment&lt;/strong&gt; – The MAE training approach required making sure the number of unmasked patches plus classification tokens divides evenly by the number of devices, necessitating adaptive masking strategies.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Decoder sequence management&lt;/strong&gt; – The decoder reconstructs the full image by processing both the unmasked patches from the encoder and the masked patches. This creates a different sequence length that also needs to be divisible by the number of GPUs.&lt;/li&gt; 
&lt;/ul&gt; 
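&lt;p&gt;The divisibility constraint behind dynamic mask ratio adjustment can be sketched in a few lines: starting from a target mask ratio, the number of kept (unmasked) patches is reduced until the encoder sequence—kept patches plus classification tokens—splits evenly across GPUs. This is an illustrative reconstruction, not the TGS implementation.&lt;/p&gt;

```python
def adjust_kept_patches(num_patches, mask_ratio, world_size, num_cls_tokens=1):
    """Choose how many patches to leave unmasked so that the encoder
    sequence (kept patches + classification tokens) shards evenly."""
    kept = round(num_patches * (1 - mask_ratio))
    remainder = (kept + num_cls_tokens) % world_size
    return kept - remainder  # mask a few extra patches to satisfy the constraint

# Example: 196 patches, 75% masking, 8 GPUs, 1 classification token
kept = adjust_kept_patches(196, mask_ratio=0.75, world_size=8)
```

The decoder side applies the same idea to its own sequence, since the full-length reconstruction (unmasked plus masked patches) must also divide evenly across GPUs.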
&lt;p&gt;The preceding implementation enabled processing of substantially larger 3D seismic volumes as illustrated in the following table.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Metric&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Previous (Baseline)&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;With Context Parallelism&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Maximum input size&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;640 × 640 × 1,024 voxels&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1,536 × 1,536 × 2,048 voxels&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Context length&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;102,400 tokens&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1,170,000 tokens&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Volume increase&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1×&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;4.5×&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;The following figure provides an example of 2D model context size.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-124988" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/02/25/ML-10370-image-2.png" alt="Seismic cross-section diagram titled &amp;quot;2D Model Context Size Example&amp;quot; showing three color-coded context window sizes — 256×256 (cyan), 512×512 (magenta), and 640×1024 (yellow) — overlaid at three locations across a grayscale subsurface geological profile, with crossline traces on the x-axis and depth samples on the y-axis." width="1010" height="705"&gt;&lt;/p&gt; 
&lt;p&gt;This expansion allows TGS’s models to capture geological features across broader spatial contexts, helping enhance the analytical capabilities they can offer to clients.&lt;/p&gt; 
&lt;h2&gt;Results and impact&lt;/h2&gt; 
&lt;p&gt;The collaboration between TGS and the AWS GenAIIC delivered substantial improvements across multiple dimensions:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Significant training acceleration&lt;/strong&gt; – The optimized distributed training architecture reduced training time from 6 months to 5 days—an approximate 36-fold speedup, enabling TGS to iterate faster and incorporate new geological data more frequently into their models.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Near-linear scaling&lt;/strong&gt; – The solution demonstrated strong scaling efficiency from single-node to 16-node configurations, achieving approximately 90–95% parallel efficiency with minimal performance degradation as the cluster size increased.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Expanded analytical capabilities&lt;/strong&gt; – The context parallelism implementation enables training on larger 3D volumes, allowing models to capture geological features across broader spatial contexts.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Production-ready, cost-efficient infrastructure&lt;/strong&gt; – The SageMaker HyperPod based solution with streaming from Amazon S3 helps provide a cost-effective foundation that scales efficiently as training requirements grow, while helping deliver the resilience, flexibility, and operational efficiency needed for production AI workflows.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;These improvements establish a strong foundation for TGS’s AI-powered analytics system, delivering faster model iteration cycles and broader geological context per analysis to clients while helping protect TGS’s valuable data assets.&lt;/p&gt; 
&lt;h2&gt;Lessons learned and best practices&lt;/h2&gt; 
&lt;p&gt;Several key lessons emerged from this collaboration that might benefit other organizations working with large-scale 3D data and distributed training:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Systematic scaling approach&lt;/strong&gt; – Starting with a single-node baseline establishment before progressively expanding to larger clusters enabled systematic optimization at each stage while managing costs effectively.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data pipeline optimization is critical&lt;/strong&gt; – For data-intensive workloads, thoughtful data pipeline design can provide strong performance. Direct streaming from object storage with appropriate parallelization and prefetching delivered the throughput needed without complex intermediate storage layers.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Batch size tuning is nuanced&lt;/strong&gt; – Increasing batch size doesn’t always improve throughput. The team found that excessively large batch sizes can create bottlenecks in preparing and transferring data to GPUs. Through systematic testing at different scales, the team identified the point where throughput plateaued, indicating the data loading pipeline had become the limiting factor rather than GPU computation. This optimal balance maximized training efficiency without over-provisioning resources.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Framework selection depends on your specific requirements&lt;/strong&gt; – Different distributed training frameworks involve trade-offs between memory efficiency and communication overhead. The optimal choice depends on model size, hardware characteristics, and scaling requirements.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Incremental validation&lt;/strong&gt; – Testing configurations at smaller scales before expanding to full production clusters helped identify optimal settings while controlling costs during the development phase.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;By partnering with the AWS GenAIIC, TGS has established an optimized, scalable infrastructure for training SFMs on AWS. The solution helps accelerate training cycles while expanding the models’ analytical capabilities, helping TGS deliver enhanced subsurface analytics to clients in the energy sector. The technical innovations developed during this collaboration—particularly the adaptation of context parallelism to ViT architectures for 3D volumetric data—demonstrate the potential for applying advanced AI techniques to specialized scientific domains. As TGS continues to expand its subsurface AI system and broader AI capabilities, this foundation can support future enhancements such as multi-modal integration and temporal analysis.&lt;/p&gt; 
&lt;p&gt;To learn more about scaling your own FM training workloads, explore &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod.html" target="_blank" rel="noopener noreferrer"&gt;SageMaker HyperPod&lt;/a&gt; for resilient distributed training infrastructure, or review the &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training.html" target="_blank" rel="noopener noreferrer"&gt;distributed training best practices&lt;/a&gt; in the SageMaker documentation. For organizations interested in similar collaborations, the &lt;a href="https://aws.amazon.com/generative-ai/innovation-center/" target="_blank" rel="noopener noreferrer"&gt;AWS Generative AI Innovation Center&lt;/a&gt; partners with customers to help accelerate their AI initiatives.&lt;/p&gt; 
&lt;h3&gt;Acknowledgement&lt;/h3&gt; 
&lt;p&gt;Special thanks to Andy Lapastora, Bingchen Liu, Prashanth Ramaswamy, Rohit Thekkanal, Jared Kramer, Arun Ramanathan, and Roy Allela for their contributions.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-125209" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/02/haotiaa-1.jpg" alt="Haotian An" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Haotian An&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Haotian An&lt;/strong&gt; is a Machine Learning Engineer at the AWS Generative AI Innovation Center, where he specializes in customizing foundation models and distributed training at scale. He works closely with customers to adapt generative AI to their specific use cases, helping them unlock new capabilities and drive measurable business outcomes.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-125210" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/02/malwani-1.jpg" alt="Manoj Alwani" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Manoj Alwani&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Manoj Alwani&lt;/strong&gt; is a Senior Applied Scientist at the Generative AI Innovation Center at AWS, where he helps organizations unlock the potential of cutting-edge AI technology. With deep expertise across the entire generative AI research stack, Manoj works closely with customers from diverse industries to accelerate their GenAI adoption and drive meaningful business outcomes. He brings over 13 years of hands-on experience in developing and deploying machine learning solutions at scale.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-125208" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/02/debby-1.jpg" alt="Debby Wehner" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Debby Wehner&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Debby Wehner&lt;/strong&gt; is a Machine Learning Engineer at the AWS Generative AI Innovation Center, specializing in large language model customization and optimization. Previously, as a full-stack software engineer at Amazon, she built AI-powered shopping applications reaching over 100 million monthly users. She holds a PhD in Computational Geophysics from the University of Cambridge, as well as a BSc and MSc from Freie Universität Berlin.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-125206" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/02/altay-1.jpg" alt="Altay Sansal" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Altay Sansal&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Altay Sansal&lt;/strong&gt; is a Senior Data Science Lead at TGS in Houston, Texas, specializing in AI/ML applications for geophysics and seismic data, including foundation models, large-scale training, and open-source tools like the MDIO format. He holds an M.S. in Geophysics from the University of Houston and has authored key publications such as “Scaling Seismic Foundation Models” and “MDIO: Open-source format for multidimensional energy data”, while actively contributing to geoscience ML through GitHub and industry events.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-125204" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/02/alejandro-1.jpg" alt="Alejandro Valenciano" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Alejandro Valenciano&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Alejandro Valenciano&lt;/strong&gt; is the Director of Data Science at TGS, where he leads advanced analytics and data science initiatives that unlock insights from subsurface and energy-related data, driving innovation across seismic, well, and machine learning workflows. He has developed and applied machine learning models for tasks such as basin-scale log prediction, advanced seismic processing, and Foundation Models. He frequently contributes to industry conferences and technical publications. His work spans data management, ML/AI applications in geoscience, and the integration of scalable data platforms to support exploration and energy solutions.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Control which domains your AI agents can access</title>
		<link>https://aws.amazon.com/blogs/machine-learning/control-which-domains-your-ai-agents-can-access/</link>
					
		
		<dc:creator><![CDATA[Kosti Vasilakakis]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 13:28:19 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Bedrock AgentCore]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[AI/ML]]></category>
		<category><![CDATA[Amazon Machine Learning]]></category>
		<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">63914b8f7a4b3b849864247648567acd69c82c9c</guid>

					<description>In this post, we show you how to configure AWS Network Firewall to restrict AgentCore resources to an allowlist of approved internet domains. This post focuses on domain-level filtering using SNI inspection — the first layer of a defense-in-depth approach.</description>
										<content:encoded>&lt;p&gt;AI agents that can browse the web open powerful possibilities—from research automation to real-time data gathering. However, giving an AI agent unrestricted internet access raises security and compliance concerns. What if the agent accesses unauthorized websites? What if sensitive data is exfiltrated to external domains?&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/agentcore/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt; provides managed tools that enable AI agents to interact with the web (Browser), execute code (Code Interpreter), and host agents (Runtime). When deployed in an Amazon Virtual Private Cloud (Amazon VPC), you can control tool network access using AWS Network Firewall to implement domain-based filtering. AWS Network Firewall also provides you with managed rules to help reduce access to botnets, known-malware domains, and other high-risk resources.&lt;/p&gt; 
&lt;p&gt;In this post, we show you how to configure &lt;a href="https://aws.amazon.com/network-firewall/" target="_blank" rel="noopener noreferrer"&gt;AWS Network Firewall&lt;/a&gt; to restrict AgentCore resources to an allowlist of approved internet domains. You can use this architecture to:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Permit access only to specified domains (for example, wikipedia.org, stackoverflow.com)&lt;/li&gt; 
 &lt;li&gt;Explicitly block certain categories (for example, social media sites) using rule templates&lt;/li&gt; 
 &lt;li&gt;Log the connection attempts for audit and compliance alignment&lt;/li&gt; 
 &lt;li&gt;Apply a default-deny policy for unspecified domains&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;This post focuses on domain-level filtering using SNI inspection — the first layer of a defense-in-depth approach. For DNS-level filtering and content inspection techniques, see &lt;strong&gt;Going further&lt;/strong&gt; at the end of this post. For inbound access control (restricting who can invoke your agents), see &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/resource-based-policies.html" target="_blank" rel="noopener noreferrer"&gt;Resource-based policies for Amazon Bedrock AgentCore&lt;/a&gt;, which support conditions like &lt;code&gt;aws:SourceIp&lt;/code&gt;, &lt;code&gt;aws:SourceVpc&lt;/code&gt;, and &lt;code&gt;aws:SourceVpce&lt;/code&gt;. These controls are complementary layers in a defense-in-depth strategy.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Why this matters: Enterprise security requirements&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Customers deploying AI agents in regulated industries have consistent security requirements around network ingress and egress control:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Enterprise organizations with high security requirements&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;When customers in regulated industries conduct security reviews for AI agent deployments, they consistently ask about network isolation and egress control and expect detailed explanations of how agent traffic is controlled and audited. These customers want assurance that agent runtime endpoints remain private, and that additional security controls like web application firewall protections are available.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Multi-tenant SaaS providers&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Enterprise software as a service (SaaS) providers require DNS-level &lt;code&gt;allowlisting&lt;/code&gt; and &lt;code&gt;denylisting&lt;/code&gt; because their multi-tenant architectures need per-customer network policies. For example, Customer A might need to allow domains that Customer B blocks. Common requirements include:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Execution-specific blocking (prevent access to certain domains during specific browser launches)&lt;/li&gt; 
 &lt;li&gt;Regional restrictions (block website categories in specific regions)&lt;/li&gt; 
 &lt;li&gt;Category-based rules (disable gambling or social media sites through pre-packaged rule sets)&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Security vulnerability mitigation and compliance audit requirements&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Security teams evaluating AI agents have identified that agents can be tricked into navigating to unintended sites through prompt injection attacks. Custom URL &lt;code&gt;allowlists&lt;/code&gt; reduce the attack surface by restricting the browser to approved domains, regardless of what the agent is instructed to do. Domain-based egress filtering provides the logging and access control visibility that security teams often need for their security monitoring processes.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Solution overview&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;The solution deploys AgentCore Browser in a private subnet with no direct internet access. Outbound traffic routes through AWS Network Firewall, which inspects TLS Server Name Indication (SNI) headers to determine the destination domain and apply filtering rules. You can also monitor the actions Network Firewall takes to restrict traffic through its native integration with Amazon CloudWatch metrics.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-126804 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/ML-20452-image-1.png" alt="" width="1904" height="936"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;em&gt;Figure 1: AgentCore deployment with AWS Network Firewall and domain-based egress filtering&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;The architecture includes:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Private subnet&lt;/strong&gt;: Hosts AgentCore Browser instances with no public IP addresses&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Public subnet&lt;/strong&gt;: Contains the NAT Gateway for outbound connectivity&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Firewall subnet&lt;/strong&gt;: Hosts the Network Firewall endpoint&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Four route tables&lt;/strong&gt;: Control traffic flow through the firewall for both outbound requests and return traffic&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Traffic flow&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;AgentCore Runtime executes the agent and invokes the AgentCore Browser tool&lt;/li&gt; 
 &lt;li&gt;AgentCore Browser initiates an HTTPS request from the private subnet&lt;/li&gt; 
 &lt;li&gt;The private subnet route table directs traffic to the NAT Gateway in the public subnet&lt;/li&gt; 
 &lt;li&gt;The NAT Gateway translates the private IP address and forwards the request to the Network Firewall endpoint&lt;/li&gt; 
 &lt;li&gt;Network Firewall inspects the TLS SNI header to identify the destination domain&lt;/li&gt; 
 &lt;li&gt;If the domain matches an &lt;code&gt;allowlist&lt;/code&gt; rule, the firewall forwards traffic to the Internet Gateway&lt;/li&gt; 
 &lt;li&gt;The Internet Gateway routes approved traffic to the external destination&lt;/li&gt; 
 &lt;li&gt;Return traffic follows the symmetric path back through the firewall to the agent&lt;/li&gt; 
&lt;/ol&gt; 
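&lt;p&gt;The hop sequence above can be modeled as a simple next-hop table. The following sketch is purely illustrative (the names are placeholders, not AWS resource identifiers); it shows how each route table hands internet-bound traffic to the next hop until it reaches the internet gateway:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# Illustrative model of the four route tables (hypothetical names).
# Each entry maps a hop to where its default route (0.0.0.0/0) points.
ROUTE_TABLES = {
    "private-subnet": "nat-gateway",         # private subnet routes out via NAT
    "nat-gateway": "firewall-endpoint",      # public subnet routes to the firewall
    "firewall-endpoint": "internet-gateway", # firewall subnet routes to the IGW
    "igw-ingress": "firewall-endpoint",      # return traffic re-enters the firewall
}

def outbound_path(source):
    """Follow default routes until traffic leaves through the internet gateway."""
    path = [source]
    hop = source
    while hop in ROUTE_TABLES:
        hop = ROUTE_TABLES[hop]
        path.append(hop)
    return path

print(outbound_path("private-subnet"))
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Return traffic takes the symmetric path because the IGW ingress route table points back at the firewall endpoint; without that entry, responses would bypass inspection.&lt;/p&gt;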
&lt;p&gt;This architecture helps make sure that the browser traffic is inspected and filtered, regardless of the destination.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; SNI-based filtering helps control which domains agents connect to at the TLS layer. For DNS-level control, including controls to help prevent DNS tunneling and exfiltration, pair this with Amazon Route 53 Resolver DNS Firewall. DNS Firewall helps address a limitation of SNI inspection: an agent could potentially resolve a blocked domain through DNS and connect by IP address directly.&lt;/p&gt; 
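&lt;p&gt;As an illustration of that pairing, the following sketch creates a DNS Firewall domain list and a BLOCK rule, then associates the rule group with the VPC. The names, IDs, and example domain are placeholders; substitute the identifiers returned by each call:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Create a domain list of destinations to block at the DNS layer
aws route53resolver create-firewall-domain-list \
  --name agentcore-blocked-domains \
  --region us-east-2

# Add domains to the list (use the ID returned above)
aws route53resolver update-firewall-domains \
  --firewall-domain-list-id rslvr-fdl-XXXXXXXXX \
  --operation ADD \
  --domains "example-blocked-domain.com" \
  --region us-east-2

# Create a rule group and a BLOCK rule that returns NXDOMAIN
aws route53resolver create-firewall-rule-group \
  --name agentcore-dns-firewall \
  --region us-east-2

aws route53resolver create-firewall-rule \
  --firewall-rule-group-id rslvr-frg-XXXXXXXXX \
  --firewall-domain-list-id rslvr-fdl-XXXXXXXXX \
  --priority 100 \
  --action BLOCK \
  --block-response NXDOMAIN \
  --name block-disallowed-domains \
  --region us-east-2

# Associate the rule group with the VPC hosting your AgentCore resources
aws route53resolver associate-firewall-rule-group \
  --firewall-rule-group-id rslvr-frg-XXXXXXXXX \
  --vpc-id vpc-XXXXXXXXX \
  --priority 101 \
  --name agentcore-dns-firewall-assoc \
  --region us-east-2
&lt;/code&gt;&lt;/pre&gt;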
&lt;h2&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Before you begin, make sure that you have:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An AWS account with permissions to create VPC resources, Network Firewall, and IAM roles&lt;/li&gt; 
 &lt;li&gt;AWS Command Line Interface (AWS CLI) version 2.x configured with appropriate credentials&lt;/li&gt; 
 &lt;li&gt;Access to Amazon Bedrock AgentCore&lt;/li&gt; 
 &lt;li&gt;Basic familiarity with VPC networking concepts&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;&lt;strong&gt;Walkthrough&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;For the complete step-by-step VPC and Network Firewall setup, see the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agentcore-vpc.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore VPC configuration documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;This section highlights the AgentCore Browser-specific configuration.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Step 1: Deploy resources using the CloudFormation template&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Launch the &lt;a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples/blob/main/01-tutorials/05-AgentCore-tools/02-Agent-Core-browser-tool/09-browser-with-domain-filtering/agentcore-browser-firewall.yaml" target="_blank" rel="noopener noreferrer"&gt;CloudFormation template&lt;/a&gt; from the repository. You can keep the stack default values. However, make sure to add a stack name (for example, “&lt;strong&gt;agentcore-egress&lt;/strong&gt;”) to the “Stack name” field, choose an Availability Zone from the “Availability Zone” menu, and include a valid existing bucket name in the “&lt;strong&gt;BucketConfigForOutput&lt;/strong&gt;” parameter. Wait for the stack creation to complete, which typically takes 10 minutes. Continue with the following steps after the stack status changes to &lt;strong&gt;CREATE_COMPLETE&lt;/strong&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Step 2: Review the IAM execution role&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;AgentCore Browser requires an IAM role whose trust policy allows the &lt;code&gt;bedrock-agentcore.amazonaws.com&lt;/code&gt; service principal to assume it:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-json"&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "bedrock-agentcore.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Step 3: Configure the Network Firewall allowlist &lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Create a stateful rule group with your approved domains. Note the leading dot (.) to match subdomains:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;cat &amp;gt; allowlist-rules.json &amp;lt;&amp;lt; 'EOF'
{
  "RulesSource": {
    "RulesSourceList": {
      "Targets": [
        ".wikipedia.org",
        ".stackoverflow.com",
        ".docs.aws.amazon.com",
        ".amazonaws.com",
        ".pypi.org",
        ".pythonhosted.org"
      ],
      "TargetTypes": ["HTTP_HOST", "TLS_SNI"],
      "GeneratedRulesType": "ALLOWLIST"
    }
  },
  "StatefulRuleOptions": {
    "RuleOrder": "STRICT_ORDER"
  }
}
EOF

aws network-firewall create-rule-group \
  --rule-group-name browser-allowed-domains \
  --type STATEFUL \
  --capacity 100 \
  --rule-group file://allowlist-rules.json \
  --region us-east-2
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Include &lt;code&gt;.amazonaws.com&lt;/code&gt; in your &lt;code&gt;allowlist&lt;/code&gt; if the browser requires AWS service access, or use VPC Endpoints as an alternative.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Security consideration: &lt;/strong&gt;The .amazonaws.com domain is a broad &lt;code&gt;allowlist&lt;/code&gt; that permits access to hosted endpoints on AWS, including public Amazon Simple Storage Service (Amazon S3) buckets, Amazon API Gateway endpoints, and AWS Lambda function URLs. For tighter control, use VPC Endpoints for AWS service access and &lt;code&gt;allowlist&lt;/code&gt; only the specific external domains your agents need.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;For Code Interpreter:&lt;/strong&gt; Consider adding “.pypi.org” and “.pythonhosted.org” if you need to install packages with pip. Most common packages are pre-installed, so these domains may be optional for your use case.&lt;/p&gt; 
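&lt;p&gt;The leading-dot convention works like a suffix match on the domain name. The following standalone sketch approximates the matching semantics (it is not Network Firewall’s actual engine) to show why &lt;code&gt;.wikipedia.org&lt;/code&gt; covers both the bare domain and its subdomains, but not lookalike domains:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# Approximate allowlist semantics: a target with a leading dot matches
# the bare domain and any subdomain, but not lookalike suffixes.
ALLOWLIST = [".wikipedia.org", ".stackoverflow.com", ".amazonaws.com"]

def sni_allowed(hostname):
    for target in ALLOWLIST:
        bare = target.lstrip(".")
        if hostname == bare or hostname.endswith(target):
            return True
    return False

print(sni_allowed("en.wikipedia.org"))   # subdomain: allowed
print(sni_allowed("wikipedia.org"))      # bare domain: allowed
print(sni_allowed("evilwikipedia.org"))  # no dot boundary: blocked
&lt;/code&gt;&lt;/pre&gt;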
&lt;p&gt;&lt;strong&gt;Step 4: Configure the firewall policy &lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The firewall policy must use &lt;code&gt;aws:drop_established&lt;/code&gt; as the default action. This allows TCP handshakes to complete (required for TLS SNI inspection) while dropping connections to non-allowed domains:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;cat &amp;gt; firewall-policy.json &amp;lt;&amp;lt; 'EOF'
{
  "StatelessDefaultActions": ["aws:forward_to_sfe"],
  "StatelessFragmentDefaultActions": ["aws:forward_to_sfe"],
  "StatefulRuleGroupReferences": [
    {
      "ResourceArn": "arn:aws:network-firewall:us-east-2:ACCOUNT_ID:stateful-rulegroup/browser-allowed-domains",
      "Priority": 1
    }
  ],
  "StatefulEngineOptions": {
    "RuleOrder": "STRICT_ORDER"
  },
  "StatefulDefaultActions": ["aws:drop_established"]
}
EOF&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Do not use &lt;code&gt;aws:drop_strict&lt;/code&gt;&lt;/strong&gt; because it blocks TCP SYN packets before the TLS handshake, preventing SNI inspection.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Step 5: Create the security group&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Create a security group that allows outbound traffic. The Network Firewall handles domain filtering, so the security group can permit all egress:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Create security group
aws ec2 create-security-group \
  --group-name agentcore-egress-sg \
  --description "AgentCore tools - egress only, filtered by Network Firewall" \
  --vpc-id vpc-XXXXXXXXX \
  --region us-east-2

# Allow all outbound traffic (Network Firewall handles filtering).
# Note: a newly created security group already includes an allow-all
# egress rule, so this call may report a duplicate rule.
aws ec2 authorize-security-group-egress \
  --group-id sg-XXXXXXXXX \
  --protocol -1 \
  --port -1 \
  --cidr 0.0.0.0/0 \
  --region us-east-2

# Remove inbound rules if present (AgentCore tools don't need inbound).
# A newly created security group has no inbound rules, so this call may
# report that the rule does not exist.
aws ec2 revoke-security-group-ingress \
  --group-id sg-XXXXXXXXX \
  --protocol -1 \
  --port -1 \
  --cidr 0.0.0.0/0 \
  --region us-east-2&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Step 6: Create the AgentCore Browser &lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Create the browser with VPC configuration pointing to your private subnet:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;aws bedrock-agentcore-control create-browser \
  --name my_secure_browser \
  --execution-role-arn arn:aws:iam::ACCOUNT_ID:role/AgentCoreBrowserExecutionRole \
  --network-configuration '{
    "networkMode": "VPC",
    "vpcConfig": {
      "securityGroups": ["sg-XXXXXXXXX"],
      "subnets": ["subnet-XXXXXXXXX"]
    }
  }' \
  --region us-east-2&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Step 6b: Create AgentCore Code Interpreter (Optional) &lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;You can also deploy AgentCore Code Interpreter in the same VPC with the same firewall protection:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;aws bedrock-agentcore-control create-code-interpreter \
  --name my_secure_code_interpreter \
  --network-configuration '{
    "networkMode": "VPC",
    "vpcConfig": {
      "securityGroups": ["sg-XXXXXXXXX"],
      "subnets": ["subnet-XXXXXXXXX"]
    }
  }' \
  --region us-east-2&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;AgentCore Code Interpreter uses the same network path as Browser. If you need pip to install packages, make sure &lt;code&gt;.pypi.org&lt;/code&gt; and &lt;code&gt;.pythonhosted.org&lt;/code&gt; are in your &lt;code&gt;allowlist&lt;/code&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Step 6c: Deploy agent on AgentCore Runtime (Optional) &lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;For container-based agent deployments, use the same VPC configuration:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;aws bedrock-agentcore-control create-agent-runtime \
  --agent-runtime-name my_vpc_agent \
  --role-arn arn:aws:iam::ACCOUNT_ID:role/AgentCoreRuntimeRole \
  --agent-runtime-artifact '{
    "containerConfiguration": {
      "containerUri": "ACCOUNT_ID.dkr.ecr.us-east-2.amazonaws.com/my-agent:latest"
    }
  }' \
  --network-configuration '{
    "networkMode": "VPC",
    "networkModeConfig": {
      "securityGroups": ["sg-XXXXXXXXX"],
      "subnets": ["subnet-XXXXXXXXX"]
    }
  }' \
  --protocol-configuration '{"serverProtocol": "HTTP"}' \
  --region us-east-2&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;AgentCore Runtime domain requirements depend on your model provider. Include &lt;code&gt;.amazonaws.com&lt;/code&gt; for Amazon Bedrock model API calls, or add the appropriate domains for other model providers your agent uses. Additionally, allow custom domains that your agent must access.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Step 7: Test the configuration&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Start a browser session and verify that the firewall rules work correctly:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Start browser session
aws bedrock-agentcore start-browser-session \
  --browser-identifier my_secure_browser-ABC123xyz \
  --region us-east-2&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Use the returned WebSocket URL with a browser automation tool like Playwright to test both allowed and blocked domains:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# test_firewall_rules.py

from playwright.sync_api import sync_playwright
import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

WEBSOCKET_URL = "wss://your-session-url"  # From start-browser-session response
REGION = "us-east-2"

# Sign the WebSocket URL with SigV4
session = boto3.Session(region_name=REGION)
credentials = session.get_credentials().get_frozen_credentials()
request = AWSRequest(method="GET", url=WEBSOCKET_URL.replace("wss://", "https://"))
SigV4Auth(credentials, "bedrock-agentcore", REGION).add_auth(request)
headers = dict(request.headers)

def test_domain(page, url, expected_success):
    try:
        response = page.goto(url, timeout=10000)
        success = response and response.status &amp;lt; 400
        status = "PASS" if success == expected_success else "FAIL"
        print(f"{status}: {url} - {'loaded' if success else 'blocked'}")
        return success == expected_success
    except Exception as e:
        success = False
        status = "PASS" if not expected_success else "FAIL"
        print(f"{status}: {url} - blocked ({type(e).__name__})")
        return not expected_success

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(WEBSOCKET_URL, headers=headers)
    page = browser.new_page()

    # Test allowed domains (should load)
    test_domain(page, "https://wikipedia.org", expected_success=True)
    test_domain(page, "https://docs.aws.amazon.com", expected_success=True)

    # Test blocked domains (should timeout/fail)
    test_domain(page, "https://example.com", expected_success=False)
    test_domain(page, "https://twitter.com", expected_success=False)

    browser.close()&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Expected results:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Allowed domains (.wikipedia.org, .amazonaws.com) should load successfully.&lt;/li&gt; 
 &lt;li&gt;Blocked domains should time out after the TCP handshake or return connection errors.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Some allowed domains like docs.aws.amazon.com depend on CDN resources from domains such as awsstatic.com and cloudfront.net. If pages on allowed domains fail to render fully, add the required CDN domains to your &lt;code&gt;allowlist&lt;/code&gt;.&lt;/p&gt; 
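&lt;p&gt;One way to add those CDN domains (for example, &lt;code&gt;.awsstatic.com&lt;/code&gt; and &lt;code&gt;.cloudfront.net&lt;/code&gt;) is to append them to the &lt;code&gt;Targets&lt;/code&gt; array in the &lt;code&gt;allowlist-rules.json&lt;/code&gt; file from Step 3 and re-apply the rule group. The &lt;code&gt;update-rule-group&lt;/code&gt; call requires the rule group’s current update token:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Fetch the current update token for the rule group
aws network-firewall describe-rule-group \
  --rule-group-name browser-allowed-domains \
  --type STATEFUL \
  --region us-east-2 \
  --query UpdateToken

# Re-apply the edited rule group (substitute the token from above)
aws network-firewall update-rule-group \
  --rule-group-name browser-allowed-domains \
  --type STATEFUL \
  --update-token TOKEN_FROM_ABOVE \
  --rule-group file://allowlist-rules.json \
  --region us-east-2
&lt;/code&gt;&lt;/pre&gt;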
&lt;p&gt;You can also check the firewall logs in CloudWatch for blocked connection attempts:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# View recent alert logs (blocked connections)
aws logs filter-log-events \
  --log-group-name "/aws/network-firewall/agentcore-egress/alerts" \
  --filter-pattern '{ $.event.alert.action = "blocked" }' \
  --region us-east-2 \
  --start-time $(($(date +%s) - 300))000

# Verify firewall sync status before testing
aws network-firewall describe-firewall \
  --firewall-name agentcore-egress-firewall \
  --region us-east-2 \
  --query 'FirewallStatus.ConfigurationSyncStateSummary'&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;Troubleshooting:&lt;/strong&gt; If allowed domains are blocked, verify:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Firewall sync status shows IN_SYNC (rule changes take a few minutes)&lt;/li&gt; 
 &lt;li&gt;Domain entries include the leading dot (&lt;code&gt;.wikipedia.org&lt;/code&gt;, not &lt;code&gt;wikipedia.org&lt;/code&gt;)&lt;/li&gt; 
 &lt;li&gt;Route tables are configured correctly for symmetric routing&lt;/li&gt; 
 &lt;li&gt;If you receive HTTP 403 errors on allowed domains, this is typically bot detection by the destination site, not a firewall block. Check CloudWatch ALERT logs to confirm—blocked connections will have explicit alert entries.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;&lt;strong&gt;Best practices&lt;/strong&gt;&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Use STRICT_ORDER evaluation&lt;/strong&gt;: This facilitates predictable rule processing when combining &lt;code&gt;allowlists&lt;/code&gt; and &lt;code&gt;denylists&lt;/code&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Include .amazonaws.com for AWS service access&lt;/strong&gt;: Or use VPC Endpoints to avoid routing AWS API calls through the internet.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Configure the IGW ingress route table&lt;/strong&gt;: This is critical for symmetric routing. Without it, return traffic bypasses the firewall.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Enable both ALERT and FLOW logs&lt;/strong&gt;: ALERT logs capture blocked connections; FLOW logs provide connection metadata for the traffic.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Wait for firewall sync&lt;/strong&gt;: Rule changes take a few minutes to propagate. Verify ConfigurationSyncStateSummary: IN_SYNC before testing.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Configure HOME_NET for multi-VPC architectures&lt;/strong&gt;: By default, Network Firewall domain inspection only filters traffic originating from the deployment VPC’s Classless Inter-Domain Routing (CIDR) range. If you use a centralized firewall with AWS Transit Gateway to inspect traffic from multiple VPCs, you must configure the HOME_NET variable in your rule group to include the source CIDR ranges. Without this, traffic from other VPCs can bypass domain filtering.&lt;/li&gt; 
&lt;/ul&gt; 
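&lt;p&gt;For the &lt;code&gt;HOME_NET&lt;/code&gt; configuration above, set the variable in the rule group’s &lt;code&gt;RuleVariables&lt;/code&gt; section. The following fragment extends the Step 3 rule group definition (with the &lt;code&gt;Targets&lt;/code&gt; list abbreviated); the CIDR ranges are placeholders for your actual source VPC ranges:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-json"&gt;{
  "RuleVariables": {
    "IPSets": {
      "HOME_NET": {
        "Definition": [
          "10.0.0.0/16",
          "10.1.0.0/16"
        ]
      }
    }
  },
  "RulesSource": {
    "RulesSourceList": {
      "Targets": [
        ".wikipedia.org"
      ],
      "TargetTypes": ["HTTP_HOST", "TLS_SNI"],
      "GeneratedRulesType": "ALLOWLIST"
    }
  },
  "StatefulRuleOptions": {
    "RuleOrder": "STRICT_ORDER"
  }
}
&lt;/code&gt;&lt;/pre&gt;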
&lt;h2&gt;&lt;strong&gt;Limitations and cost considerations&lt;/strong&gt;&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Content inspection requires TLS inspection&lt;/strong&gt;: By default, domain filtering operates on unencrypted TLS metadata (SNI headers) and can’t inspect encrypted request or response bodies. To inspect HTTPS content, enable TLS inspection on your Network Firewall and add Suricata rules that match on HTTP body content.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;SNI/Host header bypass risk&lt;/strong&gt;: Network Firewall uses TLS SNI headers and HTTP Host headers—not IP addresses—to determine destination domains. If these headers are manipulated, traffic could bypass domain filtering. For high-security deployments, combine domain rules with IP-based rules for critical blocked destinations, and consider pairing SNI-based rules with Route 53 Resolver DNS Firewall to help prevent agents from resolving blocked domains through DNS and connecting by IP address directly.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;HOME_NET scope in multi-VPC deployments&lt;/strong&gt;: By default, Network Firewall domain inspection only applies to traffic originating from the deployment VPC’s CIDR range. If you use a centralized firewall with AWS Transit Gateway (multiple VPCs routing through a shared firewall), you must configure the HOME_NET variable in your rule group to include the source CIDR ranges. Without this, traffic from spoke VPCs bypasses domain inspection. See &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/stateful-rule-groups-domain-names.html" target="_blank" rel="noopener noreferrer"&gt;Stateful domain list rule groups&lt;/a&gt; for details.&lt;/li&gt; 
 &lt;li&gt;Costs will vary based on your usage. See &lt;a href="https://aws.amazon.com/vpc/pricing/" target="_blank" rel="noopener noreferrer"&gt;NAT Gateway pricing&lt;/a&gt; and &lt;a href="https://aws.amazon.com/network-firewall/pricing/" target="_blank" rel="noopener noreferrer"&gt;Network Firewall pricing&lt;/a&gt; for current rates.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;&lt;strong&gt;Clean up&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Delete resources in this order to avoid ongoing charges:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Delete the AgentCore Browser&lt;/li&gt; 
 &lt;li&gt;Delete the Network Firewall (disable protection settings first)&lt;/li&gt; 
 &lt;li&gt;Delete the NAT Gateway&lt;/li&gt; 
 &lt;li&gt;Release the Elastic IP address&lt;/li&gt; 
 &lt;li&gt;Delete the subnets and route tables&lt;/li&gt; 
 &lt;li&gt;Detach and delete the Internet Gateway&lt;/li&gt; 
 &lt;li&gt;Delete the VPC&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; AgentCore Browser and Code Interpreter create elastic network interfaces in your VPC. After deleting these resources, wait a few minutes for the network interface to release before deleting the security group, subnet, or VPC. If deletion fails, check for lingering network interfaces in the subnet and wait for them to detach.&lt;/p&gt; 
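&lt;p&gt;The wait-for-release step in the note above can be sketched as a small poll loop. The &lt;code&gt;list_enis&lt;/code&gt; callable is a placeholder for an EC2 DescribeNetworkInterfaces query filtered by subnet ID; injecting it as a function keeps the retry logic testable without AWS credentials.&lt;/p&gt;

```python
import time

def wait_for_enis_to_release(list_enis, max_polls=40, poll_s=15):
    """Poll until no ENIs remain, then return True; False on timeout.

    `list_enis` is any callable returning the current ENI list, e.g. a
    wrapper around EC2 DescribeNetworkInterfaces filtered by subnet-id
    (placeholder; wire up boto3 in real use).
    """
    for _ in range(max_polls):
        if not list_enis():
            return True  # safe to delete the security group/subnet/VPC
        time.sleep(poll_s)
    return False

# Example with a stand-in lister that "releases" after two polls:
remaining = [["eni-a", "eni-b"], ["eni-a"], []]
assert wait_for_enis_to_release(lambda: remaining.pop(0), poll_s=0)
```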
&lt;h2&gt;&lt;strong&gt;Related resources&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;For more information, see the following resources.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agentcore-vpc.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore VPC configuration&lt;/a&gt; – VPC networking setup for AgentCore tools&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/deployment-models-for-aws-network-firewall/" target="_blank" rel="noopener noreferrer"&gt;Deployment models for AWS Network Firewall&lt;/a&gt; – Architecture patterns for centralized and distributed firewall deployments&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore documentation&lt;/a&gt; – Browser, Code Interpreter, and Agent Runtime configuration&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/stateful-rule-groups-domain-names.html" target="_blank" rel="noopener noreferrer"&gt;AWS Network Firewall rule groups&lt;/a&gt; – Domain list rule configuration reference&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;&lt;strong&gt;Going further&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Domain filtering through SNI inspection is one layer of egress security. Depending on your requirements, consider these additional mitigations:&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Technique&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;What it does&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Helps in scenarios where&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Reference&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Route 53 DNS Firewall&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Helps block or allow DNS queries by domain and prevent DNS tunneling and exfiltration.&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;You need DNS-level filtering or protection against DNS-based data exfiltration.&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;a href="https://aws.amazon.com/blogs/security/protect-against-advanced-dns-threats-with-amazon-route-53-resolver-dns-firewall/" target="_blank" rel="noopener noreferrer"&gt;Protect against advanced DNS threats&lt;/a&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;TLS inspection + Suricata DLP&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Decrypt HTTPS, inspect request/response bodies with Suricata rules, help block sensitive data patterns (PII, credentials).&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;You need data loss prevention (DLP) for agent-generated traffic.&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;a href="https://aws.amazon.com/blogs/security/tls-inspection-configuration-for-encrypted-egress-traffic-and-aws-network-firewall/" target="_blank" rel="noopener noreferrer"&gt;TLS inspection for encrypted egress traffic&lt;/a&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Centralized inspection architecture&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Route traffic from multiple VPCs through a shared inspection VPC with Network Firewall.&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;You run multiple AgentCore deployments and want centralized policy enforcement.&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/deploy-centralized-traffic-filtering-using-aws-network-firewall/" target="_blank" rel="noopener noreferrer"&gt;Deploy centralized traffic filtering&lt;/a&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;When using TLS inspection, configure custom certificates on your AgentCore resources to trust the Network Firewall’s re-signing CA.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;By combining Amazon Bedrock AgentCore tools with AWS Network Firewall, you can give AI agents controlled web access while maintaining security and compliance alignment. The domain-based filtering approach helps you define precisely which websites agents can access, block unwanted destinations, and log the connection attempts for audit purposes. This architecture addresses the security concerns raised by enterprise customers:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Financial services industry (FSI) compliance&lt;/strong&gt;: Provides the network isolation and audit logging required for CISO-level security reviews.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Multi-tenant control&lt;/strong&gt;: Enables per-customer or per-execution domain policies for SaaS providers.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Prompt injection defense&lt;/strong&gt;: Restricts agent navigation to approved domains, helping reduce the attack surface for prompt injection.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Audit evidence&lt;/strong&gt;: Generates CloudWatch logs that support compliance audit requirements.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;For enterprises deploying AI agents that need internet access for research, data gathering, or API integrations, this pattern provides a production-ready approach to maintaining strict control over where that access leads. Rather than maintaining custom Squid proxies or complex network infrastructure, you can use AWS managed services to implement enterprise-grade egress filtering in hours, not weeks.&lt;/p&gt; 
&lt;p&gt;For more information about AgentCore Browser, see the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/browser-tool.html" target="_blank" rel="noopener noreferrer"&gt;AgentCore Browser documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-126803" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/ML-20452-image-2.png" alt="" width="150" height="187"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Kosti Vasilakakis&lt;/h3&gt; 
  &lt;p&gt;Kosti Vasilakakis is a Principal PM at AWS on the Agentic AI team, where he has led the design and development of several Bedrock AgentCore services from the ground up, including Runtime, Browser, Code Interpreter, and Identity. He previously worked on Amazon SageMaker since its early days, launching AI/ML capabilities now used by thousands of companies worldwide. Earlier in his career, Kosti was a data scientist. Outside of work, he builds personal productivity automations, plays tennis, and enjoys life with his wife and kids.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-126801" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/ML-20452-image-4.png" alt="" width="150" height="199"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Evandro Franco&lt;/h3&gt; 
  &lt;p&gt;Evandro Franco is a Sr. Data Scientist working on Amazon Web Services. He is part of the Global GTM team that helps AWS customers overcome business challenges related to AI/ML on top of AWS, mainly on Amazon Bedrock AgentCore and Strands Agents. He has more than 18 years of experience working with technology, from software development, infrastructure, serverless, to machine learning. In his free time, Evandro enjoys playing with his son, mainly building some funny Lego bricks.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-126802" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/ML-20452-image-3.png" alt="" width="148" height="199"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Kevin Orellana&lt;/h3&gt; 
  &lt;p&gt;Kevin Orellana is a Software Development Engineer at Amazon Web Services on the Bedrock AgentCore team, based in Seattle. He builds and operates core infrastructure powering agentic AI capabilities, including Browser, Code Interpreter, and Runtime. Earlier in his career, Kevin worked on the Bedrock inference team hosting frontier models. In his free time, he enjoys hiking with his Goldendoodle, experimenting with multi-agent simulations, and working toward building a personal AI assistant that speaks English, Spanish, and Mandarin.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-box"&gt; 
   &lt;div class="blog-author-image"&gt;
    &lt;img loading="lazy" class="alignnone size-full wp-image-126800" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/23/ML-20452-image-5.png" alt="" width="150" height="199"&gt;
   &lt;/div&gt; 
   &lt;h3&gt;Yan Marim&lt;/h3&gt; 
   &lt;p&gt;Yan Marim is a Sr. GenAI Specialist Solutions Architect at Amazon Web Services, based in Brazil. As part of the LATAM Specialist team, he guides customers through their generative AI adoption journey, focusing on Amazon Bedrock and agentic AI solutions. In his free time, Yan enjoys spending quality time with his wife and dog, and watching soccer games.&lt;/p&gt; 
  &lt;/div&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Rocket Close transforms mortgage document processing with Amazon Bedrock and Amazon Textract</title>
		<link>https://aws.amazon.com/blogs/machine-learning/rocket-close-transforms-mortgage-document-processing-with-amazon-bedrock-and-amazon-textract/</link>
					
		
		<dc:creator><![CDATA[Jeremy Little, Chris Day]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 12:59:31 +0000</pubDate>
				<category><![CDATA[Amazon Bedrock]]></category>
		<category><![CDATA[Amazon Textract]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Customer Solutions]]></category>
		<category><![CDATA[Financial Services]]></category>
		<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">26dce02ed0faf5612b7162b5bbd73bfc1d59687c</guid>

					<description>Through a strategic partnership with the AWS Generative AI Innovation Center (GenAIIC), Rocket Close developed an intelligent document processing solution that has significantly reduced processing time, making the process 15 times faster. The solution, which uses Amazon Textract for OCR processing and Amazon Bedrock for foundation models (FMs), achieves a strong 90% overall accuracy in document segmentation, classification, and field extraction.</description>
										<content:encoded>&lt;p&gt;&lt;em&gt;This post is cowritten by Jeremy Little and Chris Day from Rocket Close.&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://www.rocketclose.com/" target="_blank" rel="noopener noreferrer"&gt;Rocket Close&lt;/a&gt;, a Detroit-based title and appraisal management company within the Rocket Companies environment, has enhanced mortgage document processing by transforming a time-consuming manual process into an efficient automated solution. Processing approximately 2,000 abstract package files daily, with each file averaging 75 pages, the company faced a major operational challenge: manual extraction took on average 10 hours per package, creating considerable resource allocation burdens and workflow bottlenecks.&lt;/p&gt; 
&lt;p&gt;Through a strategic partnership with the AWS Generative AI Innovation Center (GenAIIC), Rocket Close developed an intelligent document processing solution that has significantly reduced processing time, making the process 15 times faster. The solution, which uses &lt;a href="https://aws.amazon.com/textract/" target="_blank" rel="noopener noreferrer"&gt;Amazon Textract&lt;/a&gt; for OCR processing and &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; for foundation models (FMs), achieves a strong 90% overall accuracy in document segmentation, classification, and field extraction. Amazon Bedrock is a fully managed service that provides a serverless and more secure way to build and scale generative AI applications. It offers a single API to access a choice of leading FMs from various AI companies. Designed to scale to over 500,000 documents annually, this transformation positions Rocket Close at the forefront of technological innovation in the mortgage industry, supporting faster customer service and sustainable business growth.&lt;/p&gt; 
&lt;p&gt;This post explores how this solution was developed and implemented, demonstrating how generative AI can transform document-intensive processes in the mortgage industry.&lt;/p&gt; 
&lt;h2&gt;Challenges of manual processing at scale&lt;/h2&gt; 
&lt;p&gt;Rocket Close processes a high volume of complex documentation as part of its title and appraisal management services. Rocket Close is dedicated to helping clients realize their dream of homeownership and financial freedom by making complex processes simpler through technology-driven solutions. By analyzing a wide range of data points, Rocket Close can quickly and accurately assess the risk associated with a loan, enabling more informed lending decisions and getting clients the financing they need. Despite these strengths, Rocket Close faced a critical bottleneck that threatened their growth and profitability:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Volume overload&lt;/strong&gt; – 2,000 abstract packages daily, each averaging 75 pages&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Time-intensive workflow&lt;/strong&gt; – 10 hours per package due to recent volume spikes, with an estimated 30 minutes of actual manual processing effort per package&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Financial impact&lt;/strong&gt; – Considerable costs per file, with complex cases resulting in even higher expenses, totaling millions in annual processing costs&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Scalability limits&lt;/strong&gt; – Manual processes couldn’t keep pace with growing demand&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Quality concerns&lt;/strong&gt; – Human error and inconsistencies in data extraction&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;With approximately 1,000 hours of manual processing effort required daily, Rocket Close needed a solution that could maintain accuracy while dramatically reducing processing time.&lt;/p&gt; 
&lt;h2&gt;Understanding abstract document packages&lt;/h2&gt; 
&lt;p&gt;Abstract document packages are comprehensive collections of legal documents related to property ownership and transactions. These packages typically contain 50–100 pages of various document types bundled together, often with inconsistent formatting, varying quality, and complex structures. Each package requires thorough examination to extract critical information about property ownership, liens, mortgages, and legal status. The packages present unique challenges for automated processing due to their heterogeneous nature. Documents within a single package might include typed text, varied layouts, handwritten notes, tables, forms, signatures, and stamps. Additionally, the ordering and presence of specific documents can vary significantly between packages, requiring sophisticated document segmentation and classification capabilities.&lt;/p&gt; 
&lt;p&gt;The solution handles over 60 different document classes that fall into several major categories:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Mortgage documents&lt;/strong&gt; – These include primary mortgage instruments such as mortgage agreements, deeds of trust, and security instruments. These documents establish the terms of loans secured by real property and contain critical information about loan amounts, interest rates, and repayment terms.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Chain of title documents&lt;/strong&gt; – This category encompasses various deed types (warranty deed, quitclaim deed, special warranty deed) that document the historical transfers of property ownership. These documents establish the legal chain of title and are essential for verifying clean ownership.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Judgment documents&lt;/strong&gt; – These include civil judgments, abstracts of judgment, and various notices of lien that might affect property ownership. These documents record legal claims against property owners that might impact title status.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Tax documents&lt;/strong&gt; – This category includes tax-related filings such as notice of federal tax lien and notice of state tax lien that represent potential claims against the property for unpaid taxes.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Legal documents&lt;/strong&gt; – These encompass various legal filings, including pending lawsuits, complaints for foreclosure, affidavits of heirship, and other court documents that might affect property ownership status.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Solution architecture&lt;/h2&gt; 
&lt;p&gt;The AWS GenAIIC and Rocket Close teams collaboratively developed a solution that uses generative AI capabilities to automate the abstract package processing workflow. The following diagram shows the overall solution pipeline of the two-stage process using Amazon Textract for OCR processing and Amazon Bedrock for intelligent information extraction.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-125741 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/06/ML-19763-image-1.jpeg" alt="" width="1422" height="862"&gt;&lt;/p&gt; 
&lt;p&gt;The first stage of the pipeline uses Amazon Textract to convert document images into machine-readable text. The system processes PDF documents through advanced OCR features that detect layout, tables, forms, and signatures while preserving the document’s structural hierarchy. The extracted content is then converted to markdown format, maintaining both human readability and machine processability, and stored in &lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service&lt;/a&gt; (Amazon S3) and locally for further processing.&lt;/p&gt; 
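&lt;p&gt;The first stage can be pictured as a flattening step from Textract output blocks to markdown. This is a simplified sketch: real pipelines use the richer Layout and Tables features of AnalyzeDocument, and the block shapes and text below are illustrative stand-ins rather than actual pipeline code.&lt;/p&gt;

```python
# Simplified stage-1 sketch: turn Textract-style blocks into markdown that
# preserves a hint of document structure for the downstream LLM stage.

def blocks_to_markdown(blocks):
    """Map section headers to '##' headings and lines to plain text."""
    lines = []
    for block in blocks:
        if block["BlockType"] == "LAYOUT_SECTION_HEADER":
            lines.append("## " + block["Text"])
        elif block["BlockType"] == "LINE":
            lines.append(block["Text"])
    return "\n".join(lines)

# Illustrative blocks (not real Textract output):
blocks = [
    {"BlockType": "LAYOUT_SECTION_HEADER", "Text": "Warranty Deed"},
    {"BlockType": "LINE", "Text": "This deed, made on this day, conveys"},
]
markdown = blocks_to_markdown(blocks)
# markdown starts with the "## Warranty Deed" heading
```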
&lt;p&gt;The second stage uses Amazon Bedrock FMs to perform comprehensive document analysis and data extraction. The system first classifies and segments documents by analyzing their content and creating a table of contents, using domain-specific knowledge resources. Then, based on the document type, it extracts relevant data fields using specialized prompts combined with domain knowledge. The extracted information is converted into standardized JSON format for seamless integration with other systems.&lt;/p&gt; 
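&lt;p&gt;The second stage described above can be sketched as two pure functions: one that combines the document markdown with field definitions into an extraction prompt, and one that normalizes the model’s reply into standardized JSON. The field names and prompt wording here are invented for illustration; the actual prompts, data dictionaries, and Amazon Bedrock invocation are not shown in this post.&lt;/p&gt;

```python
import json

# Hypothetical field dictionary; the real system maps fields per document class.
FIELD_DEFINITIONS = {
    "borrower_name": "Full legal name of the borrower",
    "loan_amount": "Principal amount of the loan, in USD",
}

def build_extraction_prompt(doc_markdown, fields=FIELD_DEFINITIONS):
    """Combine field definitions with document content into one prompt."""
    field_lines = [f"- {name}: {desc}" for name, desc in fields.items()]
    return (
        "Extract the following fields from the document and reply with "
        "JSON only.\nFields:\n" + "\n".join(field_lines) +
        "\n\nDocument:\n" + doc_markdown
    )

def parse_model_reply(reply, fields=FIELD_DEFINITIONS):
    """Return {field: value or None}, tolerating extra keys in the reply."""
    data = json.loads(reply)
    return {name: data.get(name) for name in fields}

prompt = build_extraction_prompt("## Mortgage\nBorrower: Jane Doe")
reply = '{"borrower_name": "Jane Doe", "loan_amount": 250000, "extra": 1}'
record = parse_model_reply(reply)
# record == {"borrower_name": "Jane Doe", "loan_amount": 250000}
```

&lt;p&gt;Keeping the output keyed strictly to the field dictionary is what makes the result “standardized JSON” that downstream systems can consume regardless of how the model phrased its reply.&lt;/p&gt;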
&lt;p&gt;The solution’s effectiveness relies on several innovative technical approaches:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Advanced prompt engineering&lt;/strong&gt; – The team developed specialized prompts that strategically guide the behavior of the large language model (LLM) for different document processing tasks. Document analysis prompts combine content with classification guidelines to facilitate accurate document segmentation, and information extraction prompts incorporate field definitions and domain knowledge to target specific data elements within documents. These carefully crafted prompts include illustrative examples and precise formatting instructions that enable the model to produce consistent, structured outputs across various document types and formats.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Domain-specific knowledge integration&lt;/strong&gt; – The system incorporates industry-specific knowledge to help enhance extraction accuracy through several complementary approaches. A data field to document class mapping makes sure the system targets the appropriate information in each document type, and comprehensive data dictionaries provide clear field definitions and expected formats for extraction. Mortgage industry glossaries help the system accurately interpret specialized terminology and acronyms common in the financial domain. This domain knowledge is dynamically incorporated into prompts during processing, significantly improving the system’s ability to extract accurate information from complex documents.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Domain-aware evaluation framework&lt;/strong&gt; – The project’s success hinged on a sophisticated evaluation system that went beyond basic accuracy metrics. The solution includes a comprehensive framework with metrics tailored to different field types, facilitating accurate assessment of extraction quality across the mortgage domain.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The team implemented specialized approaches including exact and fuzzy string matching, numeric comparisons with configurable tolerance, and mortgage-specific metrics for state codes, deed types, transaction types, and document references. Domain-specific matching functions handle variations in specialized content, and field-type specific metrics apply appropriate comparison methods.&lt;/p&gt; 
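&lt;p&gt;To make the evaluation approach concrete, here is a minimal sketch of field-type-specific matching along the lines described: exact matching for codes, fuzzy matching for names, and numeric comparison with tolerance for amounts. The matchers, thresholds, and sample rows are illustrative assumptions, not the team’s actual metrics.&lt;/p&gt;

```python
import math
from difflib import SequenceMatcher

def match_state_code(pred, truth):
    """Exact match, case- and whitespace-insensitive (e.g. 'mi' vs 'MI')."""
    return pred.strip().upper() == truth.strip().upper()

def match_fuzzy(pred, truth, threshold=0.9):
    """Fuzzy string match for free-text fields like deed types or names."""
    return SequenceMatcher(None, pred.lower(), truth.lower()).ratio() >= threshold

def match_numeric(pred, truth, tolerance=0.01):
    """Numeric comparison with a configurable relative/absolute tolerance."""
    return math.isclose(float(pred), float(truth),
                        rel_tol=tolerance, abs_tol=tolerance)

def field_accuracy(rows):
    """rows: iterable of (matcher, predicted_value, ground_truth_value)."""
    results = [matcher(pred, truth) for matcher, pred, truth in rows]
    return sum(results) / len(results)

rows = [
    (match_state_code, "mi", "MI"),                        # match
    (match_fuzzy, "Warranty Deed", "warranty deed"),       # match
    (match_numeric, "250000.00", "250000"),                # match
    (match_fuzzy, "Quit Claim", "Special Warranty Deed"),  # mismatch
]
# field_accuracy(rows) == 0.75
```

&lt;p&gt;Applying the right matcher per field type is what keeps a single accuracy number meaningful across such different content as state codes, deed types, and dollar amounts.&lt;/p&gt;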
&lt;h2&gt;Results and impact&lt;/h2&gt; 
&lt;p&gt;The proof of concept demonstrated strong results that exceeded expectations and validated the approach’s effectiveness for Rocket Close’s document processing needs.&lt;/p&gt; 
&lt;p&gt;The solution underwent rigorous performance testing across multiple evaluation rounds. The initial validation phase tested 28 random samples containing 655 data fields, achieving an overall accuracy of 90.53%. This early success demonstrated the viability of the approach and provided confidence to proceed with more extensive testing.&lt;/p&gt; 
&lt;p&gt;The second round focused on targeted testing with 52 samples that had 1:1 mapping to ground truth data, encompassing 2,249 data fields. The system achieved 91.28% accuracy during this phase, confirming consistent performance across different document types and validating the extraction methodology against verified baseline data. This phase was particularly important for establishing confidence in the Amazon Textract and custom processing pipeline’s ability to handle diverse document formats.&lt;/p&gt; 
&lt;p&gt;The final evaluation involved large-scale verification that processed 1,792 samples containing over 44,000 data fields, achieving an overall accuracy of 89.71%. This extensive testing validated the solution’s scalability and reliability across a representative sample of Rocket Close’s document volume, demonstrating that the AWS infrastructure maintains high accuracy even when processing large batches of diverse documents in parallel.&lt;/p&gt; 
&lt;p&gt;This solution, powered by AWS, helps deliver considerable business value across multiple dimensions. The automated system reduces processing time from 30 minutes per package to under 2 minutes, making processing 15 times faster. This acceleration enables faster customer service and higher throughput. From a financial perspective, the solution considerably reduces processing costs, delivering notable savings per file. With approximately 2,000 files processed daily, this represents potential annual savings at an enterprise scale.&lt;/p&gt; 
&lt;p&gt;The automated system also delivers enhanced quality and consistency, maintaining 90% overall accuracy while reducing human error and standardizing output formats. This consistency improves downstream processes and decision-making, facilitating reliable data for business operations.&lt;/p&gt; 
&lt;p&gt;Furthermore, the cloud-based architecture provides improved scalability by handling increasing document volumes without proportional staffing increases, supporting business growth without linear cost increases. It’s designed to scale elastically to handle over 500,000 documents annually, with the ability to automatically scale during peak processing periods, positioning Rocket Close for future expansion without infrastructure constraints.&lt;/p&gt; 
&lt;h2&gt;Lessons learned&lt;/h2&gt; 
&lt;p&gt;The proof of concept engagement revealed several valuable insights that can guide similar document processing implementations on AWS.&lt;/p&gt; 
&lt;p&gt;Prompt engineering proved critical, because carefully crafted prompts that incorporate domain knowledge significantly improve extraction accuracy. The team developed specialized prompts that combine document content with classification guidelines and domain-specific knowledge.&lt;/p&gt; 
&lt;p&gt;The two-stage pipeline architecture demonstrated strong effectiveness for this use case. Separating OCR and LLM processing allows for better optimization of each stage. Amazon Textract handles the complex task of extracting text from various document formats while preserving structural information, and Amazon Bedrock (using Anthropic’s Claude) focuses on understanding the content and extracting relevant information.&lt;/p&gt; 
&lt;p&gt;Domain-specific knowledge integration emerged as another key success factor. Incorporating mortgage-specific terminology and document understanding significantly improves results. The solution uses data dictionaries, glossaries, and document class definitions to help enhance extraction accuracy.&lt;/p&gt; 
&lt;p&gt;The engagement also highlighted evaluation complexity as an important consideration. Developing sophisticated, domain-aware evaluation metrics is essential for accurately measuring performance. The evaluation framework employs specialized metrics tailored to different field types, including state code matching, deed type matching, and transaction type matching.&lt;/p&gt; 
&lt;p&gt;Finally, scalability considerations proved crucial from the initial design phase. The solution architecture must be designed from the start to handle high volumes of documents efficiently. The two-stage pipeline approach with Amazon Textract and Amazon Bedrock helps provide the necessary scalability.&lt;/p&gt; 
&lt;h2&gt;What’s next&lt;/h2&gt; 
&lt;p&gt;Following the successful proof of concept, Rocket Close is positioned to move forward with production implementation.&lt;/p&gt; 
&lt;p&gt;The next phase involves moving from proof of concept to production deployment with a containerized architecture that can handle enterprise-scale document processing. The team plans to establish continuous improvement processes by creating feedback loops to improve extraction accuracy over time. This iterative approach allows the system to learn from processing results and adapt to evolving document patterns.&lt;/p&gt; 
&lt;p&gt;An important consideration for long-term success is developing a model update strategy. Rocket Close will create a strategy for updating LLM models as new versions become available from Amazon Bedrock, making sure the solution benefits from the latest advancements in language model capabilities.&lt;/p&gt; 
&lt;p&gt;Finally, the proven approach will be expanded to additional workflows beyond the initial scope. Rocket Close plans to apply the solution to loan and mortgage payoff processing, purchase agreement processing, and title clearance documentation, extending the benefits of automated document processing across more of their operations.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;The Rocket Close and AWS Generative AI Innovation Center collaboration demonstrates the transformative potential of generative AI in document-intensive industries. By automating the complex task of abstract package processing, Rocket Close has positioned itself to achieve major operational efficiencies, cost savings, and improved scalability. The solution’s strong 90% overall accuracy, combined with the dramatic reduction in processing time from hours to minutes, showcases how generative AI can solve real-world business challenges in the mortgage and title industry.&lt;/p&gt; 
&lt;p&gt;As Rocket Close moves toward production implementation, the foundation established during this proof of concept will enable continued innovation and process optimization across their document processing workflows.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-125750" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/06/JeremyLittle.jpg" alt="" width="100" height="112"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Jeremy Little&lt;/h3&gt; 
  &lt;p&gt;Jeremy Little is a Lead Senior Solution Architect at Rocket Close. He designs and oversees the implementation of technical solutions that enhance operational efficiency and improve customer experience in the mortgage services industry.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-125751" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/06/ChrisDay.png" alt="" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Chris Day&lt;/h3&gt; 
  &lt;p&gt;Chris Day is Vice President of Engineering at Rocket Close. He leads the engineering teams responsible for developing and implementing technology solutions that streamline the title and appraisal management processes.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-125752" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/06/SirajusSalekin.jpg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Sirajus Salekin&lt;/h3&gt; 
  &lt;p&gt;Sirajus Salekin is an Applied Scientist at the AWS Generative AI Innovation Center. He specializes in developing machine learning and generative AI solutions for enterprise customers across various industries.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-125753" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/06/AhsanAli.jpg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ahsan Ali&lt;/h3&gt; 
  &lt;p&gt;Ahsan Ali is a Senior Applied Scientist at the AWS Generative AI Innovation Center. He focuses on implementing machine learning and generative AI solutions to solve complex business problems.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-125754" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/06/UjwalaBitla.jpg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ujwala Bitla&lt;/h3&gt; 
  &lt;p&gt;Ujwala Bitla is a Deep Learning Architect at the AWS Generative AI Innovation Center. She designs scalable AI architectures for enterprise customers.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-125761" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/06/SandyFarr.jpg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Sandy Farr&lt;/h3&gt; 
  &lt;p&gt;Sandy Farr is an Applied Science Manager at the AWS Generative AI Innovation Center. She leads teams developing innovative generative AI solutions for AWS customers.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-125756" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/03/06/JordanRatner.jpg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Jordan Ratner&lt;/h3&gt; 
  &lt;p&gt;Jordan Ratner is a Senior Generative AI Strategist at the AWS Generative AI Innovation Center. He helps customers identify and implement generative AI opportunities.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
	</channel>
</rss>