<?xml version="1.0" encoding="UTF-8" standalone="no"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" version="2.0">

<channel>
	<title>AWS Big Data Blog</title>
	<atom:link href="https://aws.amazon.com/blogs/big-data/feed/" rel="self" type="application/rss+xml"/>
	<link>https://aws.amazon.com/blogs/big-data/</link>
	<description>Official Big Data Blog of Amazon Web Services</description>
	<lastBuildDate>Tue, 28 Apr 2026 17:29:01 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>Unified observability in Amazon OpenSearch Service: metrics, traces, and AI agent debugging in a single interface</title>
		<link>https://aws.amazon.com/blogs/big-data/unified-observability-in-amazon-opensearch-service-metrics-traces-and-ai-agent-debugging-in-a-single-interface/</link>
					
		
		<dc:creator><![CDATA[Muthu Pitchaimani]]></dc:creator>
		<pubDate>Tue, 28 Apr 2026 17:29:01 +0000</pubDate>
				<category><![CDATA[Amazon OpenSearch Service]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Launch]]></category>
		<guid isPermaLink="false">1402dbf1fe29f4a55e798bc7d879de3064ae09bf</guid>

					<description>Amazon OpenSearch Service now brings application monitoring, native Amazon Managed Service for Prometheus integration, and AI agent tracing together in OpenSearch UI's observability workspace. In this post, we walk through two real-world scenarios using the OpenTelemetry sample app: a multi-agent travel planner facing slow processing, and a checkout flow quietly failing on one microservice.</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html" target="_blank" rel="noopener noreferrer"&gt;Amazon OpenSearch Service&lt;/a&gt; now brings application monitoring, native &lt;a href="https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Prometheus&lt;/a&gt; integration, and AI agent tracing together in &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/application.html" target="_blank" rel="noopener noreferrer"&gt;OpenSearch UI&lt;/a&gt;‘s observability workspace. You can query Prometheus metrics with &lt;a href="https://prometheus.io/docs/prometheus/latest/querying/basics/" target="_blank" rel="noopener noreferrer"&gt;PromQL&lt;/a&gt; alongside logs and traces stored in Amazon OpenSearch Service, trace an AI agent’s full reasoning chain down to the failing tool call, and drill from a service-level health view to the exact span that caused a checkout failure, all without leaving the interface.&lt;/p&gt; 
&lt;p&gt;In this post, we walk through two real-world scenarios using the OpenTelemetry sample app: a multi-agent travel planner facing slow processing, and a checkout flow quietly failing on one microservice. We chase each one to its root cause using these new capabilities.&lt;/p&gt; 
&lt;h2&gt;Scenario 1: An underperforming AI agent&lt;/h2&gt; 
&lt;p&gt;Your multi-agent travel planner is live and users start reporting slow responses. With the new AI agent tracing capability in Amazon OpenSearch Service, you can trace the agent’s full processing path to pinpoint exactly where things went wrong.&lt;/p&gt; 
&lt;p&gt;In any observability workspace in OpenSearch UI, navigate to &lt;strong&gt;Application Map&lt;/strong&gt; in the left navigation pane.&lt;/p&gt; 
&lt;p&gt;&lt;img class="alignnone size-full wp-image-90438" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image003.jpg" alt="OpenSearch Service application map" width="2258" height="1520"&gt;&lt;/p&gt; 
&lt;p&gt;You can see the full topology of your system including the travel agent and the sub-agents it calls. The travel agent node shows elevated latency and occasional errors. Select it, and the side panel confirms that latency is up but the latency chart shows intermittent spikes rather than consistent degradation.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90439" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image005-scaled.jpg" alt="System topology with service health metrics" width="2560" height="1302"&gt;&lt;/p&gt; 
&lt;p&gt;The application map tells you something is wrong, but understanding &lt;em&gt;why&lt;/em&gt; an AI agent is underperforming requires seeing its reasoning chain. Select &lt;strong&gt;Agent Traces&lt;/strong&gt; in the left navigation pane, then filter by service name and time range.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90440" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image007.png" alt="Agent processing steps with invocation data" width="1430" height="728"&gt;&lt;/p&gt; 
&lt;p&gt;Select one of the traces to see the trace tree. Unlike a traditional span waterfall, this view organizes around the agent’s reasoning chain: the root agent span, the LLM calls it made, the tools it invoked, and how they nested, with each step color-coded by type. The trace map provides a visual directed graph of the same execution. You can see which model was called, how many input and output tokens were consumed, and the actual messages sent to and received from the model.&lt;/p&gt; 
&lt;p&gt;A tool call inside the weather agent errored out. The agent then spent additional time reasoning about the failure before returning a partial response, which explains the intermittent latency spikes and occasional faults.&lt;/p&gt; 
&lt;h3&gt;Why this matters for AI agents&lt;/h3&gt; 
&lt;p&gt;Agents make autonomous decisions based on LLM responses, tool results, and chained reasoning. Unlike traditional microservices with deterministic code paths, agent behavior varies across executions. Without semantic tracing that captures these AI-specific signals, root-cause analysis is guesswork. The trace tree surfaced the model name, token counts, and failing tool call because the travel planner was instrumented with OpenTelemetry’s generative AI semantic conventions. The next section describes how.&lt;/p&gt; 
&lt;h3&gt;Instrumenting AI agents&lt;/h3&gt; 
&lt;p&gt;OpenTelemetry auto-instrumentation enriches spans with well-known attributes for HTTP, database, and gRPC calls. AI agents need a different set of attributes that standard instrumentation doesn’t cover: which LLM was called, how many tokens were consumed, and which tools were invoked.&lt;/p&gt; 
&lt;p&gt;The &lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/" target="_blank" rel="noopener"&gt;OpenTelemetry gen_ai semantic conventions&lt;/a&gt; define standard attributes for these signals, including &lt;code&gt;gen_ai.operation.name&lt;/code&gt;, &lt;code&gt;gen_ai.usage.input_tokens&lt;/code&gt;, &lt;code&gt;gen_ai.request.model&lt;/code&gt;, and &lt;code&gt;gen_ai.tool.name&lt;/code&gt;. When Amazon OpenSearch Service receives spans with these attributes, it categorizes them by operation type (agent, LLM, tool, embeddings, retrieval) and renders the agent trace tree and trace map views.&lt;/p&gt; 
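&lt;p&gt;For illustration, the following is a minimal sketch of what emitting these attributes looks like with the plain OpenTelemetry Python API. The span names, model name, and token counts are placeholders rather than output from the sample app; in practice, the SDK described next generates these spans for you.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-python"&gt;from opentelemetry import trace

tracer = trace.get_tracer("travel-planner")

# Placeholder values for illustration; an instrumented agent sets these from real calls.
with tracer.start_as_current_span("invoke_agent travel-planner") as span:
    span.set_attribute("gen_ai.operation.name", "invoke_agent")
    span.set_attribute("gen_ai.request.model", "anthropic.claude-3-5-sonnet")
    span.set_attribute("gen_ai.usage.input_tokens", 412)
    span.set_attribute("gen_ai.usage.output_tokens", 128)

    with tracer.start_as_current_span("execute_tool get_weather") as tool_span:
        tool_span.set_attribute("gen_ai.operation.name", "execute_tool")
        tool_span.set_attribute("gen_ai.tool.name", "get_weather")
&lt;/code&gt;&lt;/pre&gt; 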
&lt;p&gt;The Python SDK provides one way to generate these spans. To send traces to Amazon OpenSearch Ingestion, configure the SDK with AWS Signature Version 4 (SigV4) authentication. The &lt;code&gt;AWSSigV4OTLPExporter&lt;/code&gt; cryptographically signs each HTTP request to help prevent unauthorized data ingestion. The calling identity needs an IAM policy that grants &lt;code&gt;osis:Ingest&lt;/code&gt; on your pipeline’s ARN. Credentials are resolved through the standard AWS credential provider chain.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-python"&gt;from opensearch_genai_observability_sdk_py import register, AWSSigV4OTLPExporter

exporter = AWSSigV4OTLPExporter(
    endpoint="https://pipeline.us-east-1.osis.amazonaws.com/v1/traces",
    service="osis",
    region="us-east-1",
)

register(service_name="my-agent", exporter=exporter)
&lt;/code&gt;&lt;/pre&gt; 
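&lt;p&gt;For reference, a minimal IAM policy statement granting &lt;code&gt;osis:Ingest&lt;/code&gt; to the calling identity might look like the following; the account ID and pipeline name in the ARN are placeholders to replace with your own.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-json"&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "osis:Ingest",
      "Resource": "arn:aws:osis:us-east-1:123456789012:pipeline/my-trace-pipeline"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt; 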
&lt;p&gt;Use the &lt;code&gt;@observe&lt;/code&gt; decorator to trace agent functions and &lt;code&gt;enrich()&lt;/code&gt; to add model metadata:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-python"&gt;@observe(op=Op.EXECUTE_TOOL)
def get_weather(city: str) -&amp;gt; dict:
    return {"city": city, "temp": 22, "condition": "sunny"}

@observe(op=Op.INVOKE_AGENT)
def assistant(query: str) -&amp;gt; str:
    enrich(model="gpt-4o", provider="openai")
    data = get_weather("Paris")
    return f"{data['condition']}, {data['temp']}C"

result = assistant("What's the weather?")
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The SDK also supports auto-instrumentation for OpenAI, Anthropic, Amazon Bedrock, LangChain, LlamaIndex, and others. Because the instrumentation is built on OpenTelemetry standards, any agent framework that emits spans with &lt;code&gt;gen_ai.*&lt;/code&gt; attributes is compatible with OpenSearch UI.&lt;/p&gt; 
&lt;h2&gt;Scenario 2: Investigating a microservice issue&lt;/h2&gt; 
&lt;p&gt;AI agents are only one part of most production environments. The same interface surfaces telemetry from conventional microservices, where the troubleshooting workflow follows a more familiar path.&lt;/p&gt; 
&lt;p&gt;Your ecommerce checkout begins paging during a busy traffic window. From OpenSearch UI, navigate to &lt;strong&gt;APM Services&lt;/strong&gt; in the left navigation pane. Every instrumented service is listed alongside its health indicators. The checkout service shows an elevated error rate.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90441" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image009-scaled.jpg" alt="Service overview panel with request, error, duration metrics" width="2560" height="1306"&gt;&lt;/p&gt; 
&lt;p&gt;Select the affected service. The detail view shows Request, Error, and Duration (RED) metrics: request rate is climbing, fault rate has spiked in the last 15 minutes, and p99 duration has doubled. You can see exactly when the degradation started.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90442" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image011.png" alt="Service drilldown health dashboard" width="1431" height="723"&gt;&lt;/p&gt; 
&lt;p&gt;Drill into the correlated spans for the affected time window. The span list shows multiple failed requests, all hitting the same endpoint. Select one to see the full trace waterfall. The checkout service called &lt;code&gt;prepareOrder&lt;/code&gt;, which failed trying to retrieve a product from the catalog. The error message in the span details tells you exactly what went wrong: that’s your root cause.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-90443 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image013.png" alt="Waterfall transaction view of spans" width="1429" height="730"&gt;&lt;/p&gt; 
&lt;h3&gt;Checking the infrastructure with PromQL&lt;/h3&gt; 
&lt;p&gt;In both scenarios, the natural next question is whether the problem originates in the application or in the infrastructure beneath it. With the new Amazon Managed Service for Prometheus integration, you can answer that question without leaving OpenSearch UI.&lt;/p&gt; 
&lt;p&gt;Prometheus metrics are now queryable directly from the same workspace using native PromQL syntax, alongside the logs and traces you’ve already been navigating.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90444" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image015.png" alt="Metric query showing Prometheus Query Language" width="1431" height="820"&gt;&lt;/p&gt; 
&lt;p&gt;For the database timeout in Scenario 2, run a PromQL query to check the database instance’s read/write throughput for the same time window. For the agent latency issue in Scenario 1, check the LLM endpoint’s response time metrics to see if the slowness originates from the model provider.&lt;/p&gt; 
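&lt;p&gt;As an example, a query along these lines surfaces disk write throughput for the affected window. The metric and label names assume node_exporter-style metrics and are illustrative; substitute whatever your exporters actually emit.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-promql"&gt;rate(node_disk_written_bytes_total{instance="db-host-1:9100"}[5m])
&lt;/code&gt;&lt;/pre&gt; 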
&lt;p&gt;This is a key architectural decision: metrics continue to live in Amazon Managed Service for Prometheus, logs and traces continue to live in Amazon OpenSearch Service, and neither signal is copied or warehoused into a second store. Each backend remains the single store for the data type it’s purpose-built to handle, while OpenSearch UI federates queries across both at runtime. The cost, retention, and operational model of each store stay intact while the troubleshooting workflow collapses into a single interface.&lt;/p&gt; 
&lt;p&gt;To configure the OpenTelemetry Collector and OpenSearch Ingestion pipelines that route metrics into Amazon Managed Service for Prometheus, see &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/observability-ingestion.html" target="_blank" rel="noopener"&gt;Ingesting application telemetry&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;How it’s wired together&lt;/h2&gt; 
&lt;p&gt;The following diagram shows the end-to-end architecture. Applications instrumented with OpenTelemetry send traces, logs, and metrics over OTLP to Amazon OpenSearch Ingestion. OpenSearch Ingestion routes each signal to the appropriate store: traces and logs land in Amazon OpenSearch Service, while metrics flow into Amazon Managed Service for Prometheus. OpenSearch UI then queries both stores to render the Application Map, Services catalog, Agent Traces, and Metrics views.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90446" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image019.png" alt="OpenSearch Observability Stack Architecture" width="1202" height="472"&gt;&lt;/p&gt; 
&lt;p&gt;The entire experience rests on open-source foundations: Prometheus for metrics, OpenSearch for logs and traces, and OpenTelemetry for instrumentation. Teams already running an OpenTelemetry collector can adopt it by updating the collector’s export configuration to point at Amazon OpenSearch Ingestion, with no proprietary agents or rewritten instrumentation required.&lt;/p&gt; 
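&lt;p&gt;As a rough sketch, and assuming the sigv4auth extension from the OpenTelemetry Collector contrib distribution is available, the change is limited to the extension and exporter sections of an existing collector configuration; the pipeline endpoint below is a placeholder, and you still register both under the &lt;code&gt;service&lt;/code&gt; section of your config.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="language-yaml"&gt;extensions:
  sigv4auth:
    region: us-east-1
    service: osis

exporters:
  otlphttp:
    traces_endpoint: https://my-pipeline.us-east-1.osis.amazonaws.com/v1/traces
    auth:
      authenticator: sigv4auth
&lt;/code&gt;&lt;/pre&gt; 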
&lt;h2&gt;Getting started&lt;/h2&gt; 
&lt;p&gt;To enable these capabilities, log in to OpenSearch UI’s observability workspace, select the &lt;strong&gt;Gear&lt;/strong&gt; icon in the bottom left corner to open Settings and setup, and verify that the &lt;strong&gt;Observability:apmEnabled&lt;/strong&gt; toggle is on under the Observability section. OpenSearch UI is available at no additional charge for Amazon OpenSearch Service customers.&lt;/p&gt; 
&lt;div style="width: 640px;" class="wp-video"&gt;
 &lt;video class="wp-video-shortcode" id="video-90656-1" width="640" height="360" preload="metadata" controls="controls"&gt;
  &lt;source type="video/mp4" src="https://d2908q01vomqb2.cloudfront.net/artifacts/DBSBlogs/BDB-5856/BDB-5856.mp4?_=1"&gt;
 &lt;/video&gt;
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;Explore locally first.&lt;/strong&gt; The &lt;a href="https://opensearch.org/platform/observability-stack/" target="_blank" rel="noopener"&gt;OpenSearch Observability Stack&lt;/a&gt; gives you a fully configured environment including application monitoring, agent tracing, and Prometheus integration, running on your machine with a single install command. It ships with sample instrumented services, including a multi-agent travel planner, so you can explore the full workflow with real telemetry data out of the box.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;For AI agent development.&lt;/strong&gt; &lt;a href="https://observability.opensearch.org/docs/agent-health/" target="_blank" rel="noopener"&gt;Agent Health&lt;/a&gt; is an open-source, evaluation-driven observability tool designed for local development. It gives you execution flow graphs, token tracking, and tool invocation visibility right in your development loop, before you push to production.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;For production.&lt;/strong&gt; The &lt;a href="https://observability.opensearch.org/docs/send-data/ai-agents/python/" target="_blank" rel="noopener"&gt;Python SDK&lt;/a&gt; provides one-line setup and decorator-based tracing with gen_ai semantic conventions, with auto-instrumentation support for OpenAI, Anthropic, Amazon Bedrock, LangChain, LlamaIndex, and others. See the &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/observability.html" target="_blank" rel="noopener"&gt;Amazon OpenSearch Service documentation&lt;/a&gt; and the &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/direct-query-prometheus-overview.html" target="_blank" rel="noopener"&gt;Amazon Managed Service for Prometheus integration guide&lt;/a&gt; for the full managed experience.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90447" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image021.png" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Muthu Pitchaimani&lt;/h3&gt; 
  &lt;p&gt;Muthu is a Search Specialist with Amazon OpenSearch Service. He builds large-scale search applications and solutions. Muthu is interested in the topics of networking and security, and is based out of Austin, Texas.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90450" style="font-size: 16px" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image022.png" alt="" width="100" height="102"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Raaga N.G&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/raaga-shree/" target="_blank" rel="noopener noreferrer"&gt;Raaga&lt;/a&gt; is a Solutions Architect at AWS with over 5 years of experience helping enterprises modernize their technology landscape and build scalable, cloud-native solutions. She partners with customers to translate business requirements into efficient cloud architectures that drive measurable outcomes, supporting their journey from application modernization to AI adoption through thoughtful, customer-centric solutions.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90448" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image023.png" alt="" width="1920" height="2560"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Rekha Thottan&lt;/h3&gt; 
  &lt;p&gt;Rekha Thottan is a Senior Technical Product Manager at AWS OpenSearch, contributing to AI agent observability and evaluation for the OpenSearch Project.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90449" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/image025.png" alt="" width="576" height="768"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Kevin Lewin&lt;/h3&gt; 
  &lt;p&gt;Kevin is a Cloud Operations Specialist Solution Architect at Amazon Web Services. He focuses on helping customers achieve their operational goals through observability and automation.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		<enclosure length="30351156" type="video/mp4" url="https://d2908q01vomqb2.cloudfront.net/artifacts/DBSBlogs/BDB-5856/BDB-5856.mp4"/>

			</item>
		<item>
		<title>Migrate to Apache Flink 2.2 on Amazon Managed Service for Apache Flink</title>
		<link>https://aws.amazon.com/blogs/big-data/migrate-to-apache-flink-2-2-on-amazon-managed-service-for-apache-flink/</link>
					
		
		<dc:creator><![CDATA[Francisco Morillo]]></dc:creator>
		<pubDate>Mon, 27 Apr 2026 17:57:34 +0000</pubDate>
				<category><![CDATA[Amazon Managed Service for Apache Flink]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[AWS CloudFormation]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">85924922056cff871769bd6a673bf09a18f9e7a4</guid>

					<description>In this post, we explain what's new in Amazon Managed Service for Apache Flink 2.2, provide a guided migration using CLI commands, console instructions, and code examples, and show you how to monitor the upgrade and roll back if needed.</description>
										<content:encoded>&lt;p&gt;Migrating to &lt;a href="https://aws.amazon.com/managed-service-apache-flink/" target="_blank" rel="noopener noreferrer"&gt;Apache Flink 2.2&lt;/a&gt; on&amp;nbsp;&lt;a href="https://aws.amazon.com/managed-service-apache-flink/" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Apache Flink&amp;nbsp;&lt;/a&gt;gives you access to Java 17 runtime, faster checkpoints and recovery through RocksDB 8.10.0, and SQL-native artificial intelligence and machine learning (AI/ML) inference. If you run Flink 1.x today, you might be dealing with an aging Java 11 runtime that will no longer receive standard support by the end of this year, slower state backend performance, and a fragmented API surface split across DataSet, DataStream, and legacy connector interfaces. Flink 2.2 addresses these gaps in a single major version upgrade.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://flink.apache.org/" target="_blank" rel="noopener noreferrer"&gt;Apache Flink&amp;nbsp;&lt;/a&gt;is an open source distributed processing engine for stream and batch data, with first-class support for stateful processing and event-time semantics. Amazon Managed Service for Apache Flink removes the operational overhead of running Flink. You provide your application code, and the service provisions, scales, checkpoints, and patches the infrastructure for you.&lt;/p&gt; 
&lt;p&gt;In this post, we explain what’s new in &lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/flink-2-2.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Apache Flink 2.2&lt;/a&gt;, provide a guided migration using CLI commands, console instructions, and code examples, and show you how to monitor the upgrade and roll back if needed.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Before you upgrade:&lt;/strong&gt;&amp;nbsp;Flink 2.2 removes the DataSet API, drops Java 11 support, and replaces legacy connector interfaces. We recommend reviewing the &lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/flink-2-2-upgrade-guide.html" target="_blank" rel="noopener noreferrer"&gt;Upgrading to Flink 2.2: Complete Guide&lt;/a&gt; and the &lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/state-compatibility.html" target="_blank" rel="noopener noreferrer"&gt;State Compatibility Guide for Flink 2.2 Upgrades&lt;/a&gt; before upgrading production applications.&lt;/p&gt; 
&lt;h2&gt;What’s new in Amazon Managed Service for Apache Flink 2.2&lt;/h2&gt; 
&lt;p&gt;This release spans runtime upgrades, SQL, and Table API capabilities. The following sections break down each area.&lt;/p&gt; 
&lt;h3&gt;Runtime and performance&lt;/h3&gt; 
&lt;p&gt;These changes improve application performance and bring your runtime up to current standards.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;&lt;a href="https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/deployment/java_compatibility/" target="_blank" rel="noopener noreferrer"&gt;Java 17 runtime&lt;/a&gt; –&lt;/strong&gt;&amp;nbsp;Flink 2.2 requires Java 17. Build your application code with JDK 17 for better garbage collection, a more secure runtime, and modern language features like sealed classes and records. Java 11 is no longer supported.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;&lt;a href="https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/python/overview/" target="_blank" rel="noopener noreferrer"&gt;Python 3.12&lt;/a&gt; –&lt;/strong&gt;&amp;nbsp;Flink 2.2 requires Python 3.9+, with Python 3.12 as the default. Python 3.8 is no longer supported.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;RocksDB 8.10.0 –&lt;/strong&gt;&amp;nbsp;Your stateful applications benefit from improved I/O performance with the upgraded state backend, resulting in faster checkpoints and recovery.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Dedicated collection serializers –&lt;/strong&gt;&amp;nbsp;Improved serializers for Map, List, and Set types reduce serialization overhead, which lowers checkpoint sizes for applications that use these data structures frequently.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Kryo 5.6 –&lt;/strong&gt;&amp;nbsp;Kryo upgrades from version 2.24 to 5.6. This has state compatibility implications covered in the migration section.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;SQL and Table API highlights&lt;/h3&gt; 
&lt;p&gt;With Flink 2.2, you can:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Call Machine Learning (ML) models directly from SQL using &lt;a href="https://nightlies.apache.org/flink/flink-docs-release-2.2/" target="_blank" rel="noopener noreferrer"&gt;ML_PREDICT&lt;/a&gt; and CREATE MODEL (see the sketch after this list)&lt;/li&gt; 
 &lt;li&gt;Work with semistructured data through the native &lt;a href="https://nightlies.apache.org/flink/flink-docs-master/docs/sql/reference/data-types/" target="_blank" rel="noopener noreferrer"&gt;VARIANT type&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Build stateful event-driven logic in SQL with &lt;a href="https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/functions/ptfs/" target="_blank" rel="noopener noreferrer"&gt;ProcessTableFunction&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;Run more efficient streaming joins with &lt;a href="https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/table/runtime/operators/join/stream/StreamingMultiJoinOperator.html" target="_blank" rel="noopener noreferrer"&gt;StreamingMultiJoinOperator&lt;/a&gt; and &lt;a href="https://flink.apache.org/2025/12/04/apache-flink-2.2.0-advancing-real-time-data--ai-and-empowering-stream-processing-for-the-ai-era/#delta-join" target="_blank" rel="noopener noreferrer"&gt;Delta Join&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
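&lt;p&gt;As a sketch of the SQL syntax, &lt;code&gt;ML_PREDICT&lt;/code&gt; is invoked as a process table function against a previously registered model. The table, model, and column names below are illustrative, and the provider-specific options passed to &lt;code&gt;CREATE MODEL&lt;/code&gt; are omitted because they depend on your model endpoint.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;-- Assumes a model was registered first, for example:
-- CREATE MODEL review_classifier INPUT (review STRING) OUTPUT (label STRING) WITH (...);

SELECT review, label
FROM ML_PREDICT(
    TABLE product_reviews,
    MODEL review_classifier,
    DESCRIPTOR(review));&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 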
&lt;p&gt;For details on these features, see the&amp;nbsp;&lt;a href="https://flink.apache.org/2025/12/04/apache-flink-2.2.0-advancing-real-time-data--ai-and-empowering-stream-processing-for-the-ai-era/" target="_blank" rel="noopener noreferrer"&gt;Apache Flink 2.2 release documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Migrating from Flink 1.x to 2.2&lt;/h2&gt; 
&lt;h3&gt;In-place version upgrades&lt;/h3&gt; 
&lt;p&gt;You can upgrade a running Flink 1.x application to 2.2 using the &lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/how-in-place-version-upgrades.html" target="_blank" rel="noopener noreferrer"&gt;UpdateApplication API&lt;/a&gt;, the AWS Management Console, AWS CloudFormation, the AWS SDK, or Terraform. The upgrade preserves your application configuration, logs, metrics, and tags, and, if your state and binaries are compatible, your application state.&lt;/p&gt; 
&lt;h3&gt;Auto-rollback&lt;/h3&gt; 
&lt;p&gt;With &lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/troubleshooting-system-rollback.html" target="_blank" rel="noopener noreferrer"&gt;auto-rollback&lt;/a&gt; turned on, binary incompatibilities detected during job startup trigger an automatic revert to the previous Flink version within minutes, with no manual intervention required. For state incompatibilities that surface as restart loops after a successful upgrade, invoke the Rollback API to return to your previous version and state.&lt;/p&gt; 
&lt;h3&gt;Unsupported open source features&lt;/h3&gt; 
&lt;p&gt;The following Flink 2.2 features aren’t currently supported in Amazon Managed Service for Apache Flink because they’re still considered experimental: Materialized Tables, ForSt State Backend (disaggregated state storage), Java 21, and custom metric reporters/telemetry configurations. We continue to evaluate these features as they mature in the Apache Flink project and will share updates on availability. For the full list of supported features, see &lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/flink-2-2.html#flink-2-2-supported-features" target="_blank" rel="noopener noreferrer"&gt;Apache Flink 2.2 features supported&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Now that you know what’s changed, the next section walks through the migration process.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;Before starting the migration, confirm that you have the following in place:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An existing Apache Flink 1.x application running on Amazon Managed Service for Apache Flink.&lt;/li&gt; 
 &lt;li&gt;JDK 17 installed in your local build environment.&lt;/li&gt; 
 &lt;li&gt;The AWS Command Line Interface (AWS CLI) installed and configured with permissions to call the&amp;nbsp;kinesisanalyticsv2&amp;nbsp;APIs (UpdateApplication, CreateApplicationSnapshot, DescribeApplication, RollbackApplication).&lt;/li&gt; 
 &lt;li&gt;An Amazon Simple Storage Service (Amazon S3) bucket to upload your updated application JAR.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;We recommend testing each phase on a non-production replica of your application before applying the same steps to production.&lt;/p&gt; 
&lt;h3&gt;Step 1: Update your application code&lt;/h3&gt; 
&lt;p&gt;Start by updating your Flink dependencies to version 2.2.0 and replacing deprecated APIs. The following sections show the most common changes.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Update your pom.xml:&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-html"&gt;&amp;lt;properties&amp;gt;
    &amp;lt;flink.version&amp;gt;2.2.0&amp;lt;/flink.version&amp;gt;
    &amp;lt;java.version&amp;gt;17&amp;lt;/java.version&amp;gt;
&amp;lt;/properties&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;Replace legacy Kinesis connectors:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Flink 2.2 removes the&amp;nbsp;FlinkKinesisConsumer&amp;nbsp;and&amp;nbsp;FlinkKinesisProducer&amp;nbsp;classes. The following example shows how to migrate the consumer to the FLIP-27 based&amp;nbsp;KinesisStreamsSource; a sink sketch follows.&lt;/p&gt; 
&lt;p&gt;Before (Flink 1.x):&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-java"&gt;FlinkKinesisConsumer&amp;lt;String&amp;gt; consumer = new FlinkKinesisConsumer&amp;lt;&amp;gt;(
    "my-stream",
    new SimpleStringSchema(),
    consumerConfig);
env.addSource(consumer);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;After (Flink 2.2):&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-java"&gt;KinesisStreamsSource&amp;lt;String&amp;gt; source = KinesisStreamsSource.&amp;lt;String&amp;gt;builder()
    .setStreamArn("arn:aws:kinesis:us-east-1:123456789012:stream/my-stream")
    .setDeserializationSchema(new SimpleStringSchema())
    .build();
env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kinesis Source");&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
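&lt;p&gt;The producer side follows the same pattern. The following is a minimal sketch of replacing &lt;code&gt;FlinkKinesisProducer&lt;/code&gt; with &lt;code&gt;KinesisStreamsSink&lt;/code&gt;; the stream name, Region, and partition key logic are placeholders, and the exact builder options can vary by connector version.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-java"&gt;Properties sinkProperties = new Properties();
sinkProperties.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");

KinesisStreamsSink&amp;lt;String&amp;gt; sink = KinesisStreamsSink.&amp;lt;String&amp;gt;builder()
    .setKinesisClientProperties(sinkProperties)
    .setStreamName("my-stream")
    .setSerializationSchema(new SimpleStringSchema())
    .setPartitionKeyGenerator(element -&amp;gt; String.valueOf(element.hashCode()))
    .build();

stream.sinkTo(sink);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 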
&lt;p&gt;&lt;strong&gt;Update connector dependencies:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The following AWS connectors have Flink 2.x-compatible releases:&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;th style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Connector&lt;/strong&gt;&lt;/th&gt; 
   &lt;th style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Flink 2.x Artifact&lt;/strong&gt;&lt;/th&gt; 
   &lt;th style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Version&lt;/strong&gt;&lt;/th&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Apache Kafka&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;flink-connector-kafka&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;4.0.0-2.0&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Amazon Kinesis Data Streams&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;flink-connector-aws-kinesis-streams&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;6.0.0-2.0&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Amazon Data Firehose&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;flink-connector-aws-kinesis-firehose&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;6.0.0-2.0&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Amazon DynamoDB&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;flink-connector-dynamodb&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;6.0.0-2.0&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Amazon Simple Queue Service (Amazon SQS)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;flink-connector-sqs&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;6.0.0-2.0&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;At the time of writing, the JDBC, OpenSearch, and Prometheus connectors don’t yet have Flink 2.x-compatible releases. For the latest versions, see the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/how-flink-connectors.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Apache Flink connector documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Beyond connector updates, make the following code changes:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Replace DataSet API usage with the DataStream API or Table API/SQL.&lt;/li&gt; 
 &lt;li&gt;Replace Scala API usage with the Java API.&lt;/li&gt; 
 &lt;li&gt;Verify that your build targets JDK 17.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Build your updated application JAR and upload it to Amazon S3 with a different file name than your current JAR (for example,&amp;nbsp;my-app-flink-2.2.jar).&lt;/p&gt; 
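&lt;p&gt;For example, with Maven and the AWS CLI (the bucket name is a placeholder):&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;mvn clean package
aws s3 cp target/my-app-flink-2.2.jar s3://&amp;lt;your-code-bucket&amp;gt;/my-app-flink-2.2.jar&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 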
&lt;h3&gt;Step 2: Check state compatibility&lt;/h3&gt; 
&lt;p&gt;Before upgrading, assess whether your application state is compatible with Flink 2.2. The Kryo upgrade from version 2.24 to 5.6 changes the binary format of serialized state. Applications using POJOs with Java collections (HashMap, ArrayList, HashSet) are the most common source of incompatibility.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Quick compatibility check:&lt;/strong&gt;&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;th style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Serialization type&lt;/strong&gt;&lt;/th&gt; 
   &lt;th style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Compatible?&lt;/strong&gt;&lt;/th&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Avro (SpecificRecord, GenericRecord)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt; Yes&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Protobuf&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt; Yes&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;POJOs without collections&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt; Yes&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Custom TypeSerializers (no Kryo delegation)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt; Yes&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;POJOs with Java collections&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/274c.png" alt="❌" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt; No&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Scala case classes&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/274c.png" alt="❌" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt; No&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Types using Kryo fallback&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;img src="https://s.w.org/images/core/emoji/14.0.0/72x72/274c.png" alt="❌" class="wp-smiley" style="height: 1em; max-height: 1em;"&gt; No&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;&lt;strong&gt;Check your logs for Kryo fallback:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Search your application logs for this pattern, which indicates a type is falling back to Kryo serialization: &lt;code&gt;Class class &amp;lt;className&amp;gt; cannot be used as a POJO type&lt;/code&gt;&lt;/p&gt; 
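&lt;p&gt;To surface Kryo fallback proactively, you can also disable generic types in a local or non-production run so the job fails instead of silently falling back. This is a sketch using the standard &lt;code&gt;pipeline.generic-types&lt;/code&gt; option:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-java"&gt;Configuration conf = new Configuration();
// Jobs that would fall back to Kryo fail instead of silently using generic serialization.
conf.set(PipelineOptions.GENERIC_TYPES, false);

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment(conf);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 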
&lt;h3&gt;Step 3: Turn on auto-rollback and automatic snapshots&lt;/h3&gt; 
&lt;p&gt;Turn on auto-rollback so the service automatically reverts to the previous version if the upgrade fails. Also, verify that automatic snapshots are turned on. The service takes a snapshot before the upgrade that serves as your rollback point.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Check current settings:&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;aws kinesisanalyticsv2 describe-application \
    --application-name MyApplication \
    --query 'ApplicationDetail.ApplicationConfigurationDescription.{
        AutoRollback: ApplicationSystemRollbackConfigurationDescription.RollbackEnabled,
        AutoSnapshots: ApplicationSnapshotConfigurationDescription.SnapshotsEnabled
    }'&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;Turn on both if they’re not already active:&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;aws kinesisanalyticsv2 update-application \
    --application-name MyApplication \
    --current-application-version-id &amp;lt;version-id&amp;gt; \
    --application-configuration-update '{
        "ApplicationSystemRollbackConfigurationUpdate": {
            "RollbackEnabledUpdate": true
        },
        "ApplicationSnapshotConfigurationUpdate": {
            "SnapshotsEnabledUpdate": true
        }
    }'&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Step 4: Take a manual snapshot (recommended)&lt;/h3&gt; 
&lt;p&gt;Although the upgrade process takes an automatic snapshot, taking a manual snapshot gives you a named restore point that you can quickly identify.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws kinesisanalyticsv2 create-application-snapshot \
    --application-name MyApplication \
    --snapshot-name pre-flink-2.2-upgrade&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Verify that the snapshot is ready before proceeding:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws kinesisanalyticsv2 describe-application-snapshot \
    --application-name MyApplication \
    --snapshot-name pre-flink-2.2-upgrade&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Wait until&amp;nbsp;SnapshotStatus&amp;nbsp;is&amp;nbsp;READY.&lt;/p&gt; 
&lt;h3&gt;Step 5: Run the upgrade&lt;/h3&gt; 
&lt;p&gt;Run the upgrade while the application is in&amp;nbsp;RUNNING&amp;nbsp;or&amp;nbsp;READY&amp;nbsp;(stopped) state. The following example upgrades a running application and points to the new JAR.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;AWS CLI:&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;aws kinesisanalyticsv2 update-application \
    --application-name MyApplication \
    --current-application-version-id &amp;lt;version-id&amp;gt; \
    --runtime-environment-update FLINK-2_2 \
    --application-configuration-update '{
        "ApplicationCodeConfigurationUpdate": {
            "CodeContentUpdate": {
                "S3ContentLocationUpdate": {
                    "FileKeyUpdate": "my-app-flink-2.2.jar"
                }
            }
        }
    }'&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;AWS Management Console:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;To upgrade from the console, follow these steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to your application in the Amazon Managed Service for Apache Flink console.&lt;/li&gt; 
 &lt;li&gt;Choose&amp;nbsp;&lt;strong&gt;Configure&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select the&amp;nbsp;&lt;strong&gt;Flink 2.2&lt;/strong&gt;&amp;nbsp;runtime.&lt;/li&gt; 
 &lt;li&gt;Point to your new application JAR on Amazon S3.&lt;/li&gt; 
 &lt;li&gt;Select the snapshot to restore from (use&amp;nbsp;&lt;strong&gt;Latest&lt;/strong&gt;&amp;nbsp;to start from the most recent snapshot).&lt;/li&gt; 
 &lt;li&gt;Choose&amp;nbsp;&lt;strong&gt;Update&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;strong&gt;AWS CloudFormation:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Update the&amp;nbsp;&lt;code&gt;RuntimeEnvironment&lt;/code&gt;&amp;nbsp;field in your template. AWS CloudFormation now performs an in-place update instead of deleting and recreating the application.&lt;/p&gt; 
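&lt;p&gt;For example, the relevant fragment of a template might look like the following; the resource names and code location are placeholders:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-yaml"&gt;MyFlinkApplication:
  Type: AWS::KinesisAnalyticsV2::Application
  Properties:
    RuntimeEnvironment: FLINK-2_2
    ServiceExecutionRole: !GetAtt FlinkServiceRole.Arn
    ApplicationConfiguration:
      ApplicationCodeConfiguration:
        CodeContentType: ZIPFILE
        CodeContent:
          S3ContentLocation:
            BucketARN: !GetAtt AppCodeBucket.Arn
            FileKey: my-app-flink-2.2.jar&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 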
&lt;p&gt;&lt;strong&gt;Terraform:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;If you manage your Flink application with Terraform, you can perform the same in-place upgrade by updating the&amp;nbsp;&lt;code&gt;runtime_environment&lt;/code&gt; and code reference in your&amp;nbsp;aws_kinesisanalyticsv2_application&amp;nbsp;resource. Note: Terraform support for&amp;nbsp;FLINK-2_2&amp;nbsp;requires AWS provider version 6.40.0 or later (released April 8, 2026). Earlier provider versions don’t recognize this runtime value. First, update your provider version constraint:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "&amp;gt;= 6.40.0"
    }
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Then run&amp;nbsp;terraform init -upgrade&amp;nbsp;to pull the new provider. Next, update your application resource. Change&amp;nbsp;&lt;code&gt;runtime_environment&lt;/code&gt;&amp;nbsp;from&amp;nbsp;“FLINK-1_20”&amp;nbsp;to&amp;nbsp;“FLINK-2_2”&amp;nbsp;and point to your new JAR:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;resource "aws_kinesisanalyticsv2_application" "my_app" {
  name                   = "MyApplication"
  runtime_environment    = "FLINK-2_2"
  service_execution_role = aws_iam_role.flink.arn
  application_configuration {
    application_code_configuration {
      code_content_type = "ZIPFILE"
      code_content {
        s3_content_location {
          bucket_arn = aws_s3_bucket.app_code.arn
          file_key   = "my-app-flink-2.2.jar"
        }
      }
    }
    application_snapshot_configuration {
      snapshots_enabled = true
    }
    flink_application_configuration {
      checkpoint_configuration {
        configuration_type = "DEFAULT"
      }
      monitoring_configuration {
        configuration_type = "CUSTOM"
        log_level          = "INFO"
        metrics_level      = "APPLICATION"
      }
      parallelism_configuration {
        auto_scaling_enabled = true
        configuration_type   = "CUSTOM"
        parallelism          = 4
        parallelism_per_kpu  = 1
      }
    }
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Run the upgrade:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;terraform plan    # Review the in-place update
terraform apply   # Apply the runtime change&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Terraform will perform an in-place update of the application, changing the runtime version and code location. The application will restart with the new Flink 2.2 runtime. To roll back with Terraform, revert &lt;code&gt;runtime_environment&lt;/code&gt;&amp;nbsp;to&amp;nbsp;“FLINK-1_20”, point&amp;nbsp;&lt;code&gt;file_key&lt;/code&gt;&amp;nbsp;back to your original JAR, and run&amp;nbsp;terraform apply&amp;nbsp;again. Note that you cannot restore a Flink 2.2 snapshot on Flink 1.x, so the rollback will start from the last Flink 1.x snapshot.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Important Terraform considerations:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Auto-rollback and the &lt;code&gt;RollbackApplication&lt;/code&gt; API aren’t directly exposed as Terraform resource attributes. If you need auto-rollback during the upgrade, enable it using the AWS CLI (Step 3) before running&amp;nbsp;terraform apply, or use a provisioner/null_resource to call the CLI.&lt;/li&gt; 
 &lt;li&gt;Always take a manual snapshot (Step 4) before running&amp;nbsp;terraform apply&amp;nbsp;for the upgrade. Terraform doesn’t automatically snapshot before updating the runtime.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Step 6: Monitor the upgrade&lt;/h3&gt; 
&lt;p&gt;After initiating the upgrade, monitor the application to verify that it completes successfully.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Check application status:&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The application should transition through&amp;nbsp;RUNNING&amp;nbsp;→&amp;nbsp;UPDATING&amp;nbsp;→&amp;nbsp;RUNNING. Confirm the runtime version changed to 2.2:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws kinesisanalyticsv2 describe-application \
    --application-name MyApplication \
    --query 'ApplicationDetail.RuntimeEnvironment'&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;What to watch for:&lt;/strong&gt;&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;thead&gt; 
  &lt;tr&gt; 
   &lt;th style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Scenario&lt;/strong&gt;&lt;/th&gt; 
   &lt;th style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;What happens&lt;/strong&gt;&lt;/th&gt; 
   &lt;th style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Action&lt;/strong&gt;&lt;/th&gt; 
  &lt;/tr&gt; 
 &lt;/thead&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Binary incompatibility&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Upgrade operation fails. Auto-rollback reverts to the previous version automatically.&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Check operation logs for the exception, fix your code, and retry.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;State incompatibility&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Upgrade appears to succeed but the application enters restart loops.&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Monitor&amp;nbsp;&lt;code&gt;numRestarts&lt;/code&gt;&amp;nbsp;metric. If restarts are continuous, invoke the Rollback API manually. Review the [State Compatibility Guide].&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Successful upgrade&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;numRestarts&lt;/code&gt;&amp;nbsp;is zero,&amp;nbsp;uptime&amp;nbsp;is increasing, checkpoints are completing.&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Proceed to validation.&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;&lt;strong&gt;Key CloudWatch metrics to monitor&lt;/strong&gt; (a CLI spot-check follows the list):&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;code&gt;numRestarts&lt;/code&gt;: should be zero after upgrade&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;lastCheckpointDuration&lt;/code&gt;: should be similar to pre-upgrade values&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;numberOfFailedCheckpoints&lt;/code&gt;: should remain at zero&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;uptime&lt;/code&gt;: should be steadily increasing&lt;/li&gt; 
&lt;/ol&gt; 
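&lt;p&gt;As one way to spot-check these from the CLI, the following pulls the &lt;code&gt;numRestarts&lt;/code&gt; metric for the upgrade window; the time range is a placeholder, and the namespace and dimension shown are the ones Amazon Managed Service for Apache Flink publishes to.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws cloudwatch get-metric-statistics \
    --namespace "AWS/KinesisAnalytics" \
    --metric-name numRestarts \
    --dimensions Name=Application,Value=MyApplication \
    --statistics Maximum \
    --period 60 \
    --start-time &amp;lt;upgrade-start-time&amp;gt; \
    --end-time &amp;lt;upgrade-end-time&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 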
&lt;h3&gt;Step 7: Validate application behavior&lt;/h3&gt; 
&lt;p&gt;After the application is running on Flink 2.2:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Confirm that data is being read from sources and written to sinks.&lt;/li&gt; 
 &lt;li&gt;Compare the output with your pre-upgrade baseline.&lt;/li&gt; 
 &lt;li&gt;Monitor latency, throughput, checkpoint duration, and resource utilization.&lt;/li&gt; 
 &lt;li&gt;Run for at least 24 hours to confirm stable behavior: no memory leaks, no unexpected restarts, consistent checkpoint sizes.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Step 8: Rollback (if needed)&lt;/h3&gt; 
&lt;p&gt;If the application is running but is unhealthy after the upgrade, invoke the Rollback API:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;AWS CLI:&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws kinesisanalyticsv2 rollback-application \
    --application-name MyApplication \
    --current-application-version-id &amp;lt;version-id&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;AWS Management Console:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Navigate to your application.&lt;/li&gt; 
 &lt;li&gt;Choose&amp;nbsp;&lt;strong&gt;Actions&lt;/strong&gt;, &lt;strong&gt;Roll back&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Confirm the rollback.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;During rollback, the application stops, reverts to the previous Flink version and application code, and restarts from the snapshot taken before the upgrade.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt;&amp;nbsp;You can’t restore a Flink 2.2 snapshot on Flink 1.x. Rollback uses the snapshot taken before the upgrade. This is why Steps 3 and 4 are critical.&lt;/p&gt; 
&lt;h2&gt;Next steps&lt;/h2&gt; 
&lt;p&gt;Your path depends on where you are today:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;If you’re new to Apache Flink:&lt;/strong&gt;&amp;nbsp;Start with the guide to choosing the right API and language, the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/getting-started.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Apache Flink getting started guide&lt;/a&gt;, and the&amp;nbsp;&lt;a href="https://catalog.workshops.aws/managed-flink" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Apache Flink workshop&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;If you’re running Flink 1.x in production:&lt;/strong&gt;&amp;nbsp;Follow the migration steps in this post on a non-production replica first, then apply to production. For the complete reference, see the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/flink-2-2-upgrade-guide.html" target="_blank" rel="noopener noreferrer"&gt;Upgrading to Flink 2.2: Complete Guide&lt;/a&gt;&amp;nbsp;and the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/state-compatibility.html" target="_blank" rel="noopener noreferrer"&gt;State Compatibility Guide for Flink 2.2 Upgrades&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;If you’re evaluating Flink 2.2 features:&lt;/strong&gt;&amp;nbsp;Launch a new application on the Flink 2.2 runtime to explore SQL/ML capabilities, the VARIANT data type, and the new join operators. See the&amp;nbsp;&lt;a href="https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Apache Flink sample applications on GitHub&lt;/a&gt;&amp;nbsp;for reference architectures.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;If you need help with your migration:&lt;/strong&gt;&amp;nbsp;Use the &lt;a href="https://github.com/awslabs/managed-service-for-apache-flink-agent-steering-files" target="_blank" rel="noopener noreferrer"&gt;Kiro Power and Agent Skill for Amazon Managed Service for Apache Flink&lt;/a&gt; to identify compatibility issues in your existing codebase and receive guidance on refactoring steps. You can also open a case through&amp;nbsp;&lt;a href="https://aws.amazon.com/support/" target="_blank" rel="noopener noreferrer"&gt;AWS Support&lt;/a&gt;, post a question on&amp;nbsp;&lt;a href="https://repost.aws/tags/TAjj_AYVQYR-a2FMqOkMcEPg/amazon-managed-service-for-apache-flink" target="_blank" rel="noopener noreferrer"&gt;AWS re:Post for Amazon Managed Service for Apache Flink&lt;/a&gt;, or reach out through the&amp;nbsp;&lt;a href="https://flink.apache.org/community/" target="_blank" rel="noopener noreferrer"&gt;Apache Flink community&lt;/a&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;For the Apache Flink 2.2 documentation, see&amp;nbsp;&lt;a href="https://nightlies.apache.org/flink/flink-docs-release-2.2/" target="_blank" rel="noopener noreferrer"&gt;nightlies.apache.org/flink/flink-docs-release-2.2&lt;/a&gt;. For Amazon Managed Service for Apache Flink documentation, see the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/what-is.html" target="_blank" rel="noopener noreferrer"&gt;Developer Guide&lt;/a&gt;. For pricing, see the&amp;nbsp;&lt;a href="https://aws.amazon.com/managed-service-apache-flink/pricing/" target="_blank" rel="noopener noreferrer"&gt;pricing page&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;With Apache Flink 2.2 on Amazon Managed Service for Apache Flink, you get a modern Java 17 runtime, SQL-native AI/ML inference, improved state management performance, and a streamlined API surface. In-place upgrades with state preservation and auto-rollback make the migration straightforward. Test on a replica, follow the steps in this post, and start building on Flink 2.2.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-medium wp-image-90329" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/16/fmorillo-225x300.jpg" alt="" width="225" height="300"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Francisco Morillo&lt;/h3&gt; 
  &lt;p&gt;Francisco Morillo&amp;nbsp;is a Sr. Streaming Specialist Solutions Architect at AWS, helping customers design and operate real-time data processing applications using Amazon Managed Service for Apache Flink and Amazon Managed Streaming for Apache Kafka.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone wp-image-90607 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/23/profilepic.jpg" alt="" width="940" height="1072"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Mayank Juneja&lt;/h3&gt; 
  &lt;p&gt;Mayank Juneja is a Senior Product Manager at AWS, leading Amazon Managed Service for Apache Flink. He lives at the intersection of real-time data streaming and AI, previously driving Flink SQL and AI inference products at Confluent.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Using Apache Sedona with AWS Glue to process billions of daily points from a geospatial dataset</title>
		<link>https://aws.amazon.com/blogs/big-data/using-apache-sedona-with-aws-glue-to-process-billions-of-daily-points-from-a-geospatial-dataset/</link>
					
		
		<dc:creator><![CDATA[Ruan Roloff]]></dc:creator>
		<pubDate>Wed, 22 Apr 2026 15:42:28 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[AWS Glue]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">36afbfac5e6c3da1aa34ee703d844eb614c66772</guid>

					<description>In this post, we explore how to use Apache Sedona with AWS Glue to process and analyze massive geospatial datasets.</description>
										<content:encoded>&lt;p&gt;Data strategy can use geospatial data to provide organizations with insights for decision-making and operational optimization. By incorporating geospatial data (such as GPS coordinates, points, polygons and geographic boundaries), businesses can uncover patterns, trends, and relationships that might otherwise remain hidden across multiple industries, from aviation and transportation to environmental studies and urban planning. Processing and analyzing this geospatial data at scale can be challenging, especially when dealing with billions of daily observations.&lt;/p&gt; 
&lt;p&gt;In this post, we explore how to use &lt;a href="https://sedona.apache.org/latest/" target="_blank" rel="noopener noreferrer"&gt;Apache Sedona&lt;/a&gt; with &lt;a href="https://aws.amazon.com/glue/" target="_blank" rel="noopener noreferrer"&gt;AWS Glue&lt;/a&gt; to process and analyze massive geospatial datasets.&lt;/p&gt; 
&lt;h2&gt;Introduction to geospatial data&lt;/h2&gt; 
&lt;p&gt;Geospatial data is information that has a geographic component. It describes objects, events, or phenomena along with their location on the Earth’s surface. This data includes coordinates (latitude and longitude), shapes (points, lines, polygons), and associated attributes (such as the name of a city or the type of road).&lt;/p&gt; 
&lt;p&gt;Key types of geospatial geometries (and examples of each in parentheses) include:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Point –&lt;/strong&gt; Represents a single coordinate (a weather station).&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;MultiPoint –&lt;/strong&gt; A collection of points (bus stops in a city).&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;LineString –&lt;/strong&gt; A series of points connected in a line (a river or a flight path).&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;MultiLineString –&lt;/strong&gt; Multiple lines (multiple flight routes).&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Polygon –&lt;/strong&gt; A closed area (the boundary of a city).&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;MultiPolygon –&lt;/strong&gt; Multiple polygons (national parks in a country).&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Geospatial datasets come in different formats, each designed to store and represent different types of geographic information. Common formats for geospatial data are vector formats (Shapefile, GeoJSON), raster formats (GeoTIFF, ESRI Grid), GPS formats (GPX, NMEA), web formats (WMS, GeoRSS) among others.&lt;/p&gt; 
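&lt;p&gt;To make these geometry types concrete, here is a small, illustrative Python snippet (not part of the solution code) that constructs each one with the shapely library, one of the Python modules installed for the AWS Glue job later in this post; the coordinates are arbitrary samples:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Illustrative only: building the geometry types listed above with shapely
from shapely.geometry import Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon

station = Point(-0.1278, 51.5074)                           # a single lon/lat coordinate
bus_stops = MultiPoint([(-0.12, 51.50), (-0.13, 51.51)])    # a collection of points
flight_path = LineString([(-0.45, 51.47), (2.55, 49.01)])   # points connected in a line
routes = MultiLineString([[(-0.45, 51.47), (2.55, 49.01)],
                          [(-0.45, 51.47), (-3.37, 55.95)]])
city_boundary = Polygon([(-1, 51), (1, 51), (1, 52), (-1, 52)])  # a closed area
parks = MultiPolygon([city_boundary, Polygon([(2, 48), (3, 48), (3, 49), (2, 49)])])

print(station.wkt)        # POINT (-0.1278 51.5074)
print(city_boundary.wkt)  # POLYGON ((-1 51, 1 51, 1 52, -1 52, -1 51))&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 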
&lt;h2&gt;Core concepts of Apache Sedona&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://sedona.apache.org/" target="_blank" rel="noopener noreferrer"&gt;Apache Sedona&lt;/a&gt; is an open-source computing framework for processing large-scale geospatial data. Built on top of &lt;a href="https://spark.apache.org/" target="_blank" rel="noopener noreferrer"&gt;Apache Spark&lt;/a&gt;, Sedona extends Spark’s capabilities to handle spatial operations efficiently. At its core, Sedona introduces several key concepts that enable distributed spatial processing. These include Spatial Resilient Distributed Datasets (SRDDs), which allow for the distribution of spatial data across a cluster, and Spatial SQL, which provides a familiar SQL-like interface for spatial queries. Some of the core capabilities of Apache Sedona are:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Efficient spatial data types like points, lines and polygons.&lt;/li&gt; 
 &lt;li&gt;Spatial operations and functions such as &lt;code&gt;ST_Contains&lt;/code&gt; (check whether one geometry, such as a polygon, fully contains another, such as a point), &lt;code&gt;ST_Intersects&lt;/code&gt; (check whether two geometries share any point), and &lt;code&gt;ST_H3CellIDs&lt;/code&gt; (return the &lt;a href="https://h3geo.org/" target="_blank" rel="noopener noreferrer"&gt;H3&lt;/a&gt; cell IDs that contain the given geometry at the specified resolution, based on the hexagonal geospatial indexing system developed by Uber).&lt;/li&gt; 
 &lt;li&gt;Spatial joins to combine different spatial datasets.&lt;/li&gt; 
 &lt;li&gt;Integration with Spark SQL (geospatial functions to run spatial SQL queries).&lt;/li&gt; 
 &lt;li&gt;Spatial indexing techniques, such as quad-trees and R-trees, to optimize query performance.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;For more information about the functions available in Apache Sedona, visit the official Sedona &lt;a href="https://sedona.apache.org/1.8.0/api/sql/Function/" target="_blank" rel="noopener noreferrer"&gt;Functions&lt;/a&gt; documentation.&lt;/p&gt; 
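&lt;p&gt;As a minimal sketch of what spatial SQL looks like in practice (the view name and sample data here are purely illustrative, and an existing Spark session named &lt;code&gt;spark&lt;/code&gt; is assumed), you can register a small DataFrame of coordinates and query it with Sedona functions such as &lt;code&gt;ST_Contains&lt;/code&gt; and &lt;code&gt;ST_H3CellIDs&lt;/code&gt;:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Minimal sketch: Sedona spatial SQL on an existing Spark session named `spark`
from sedona.spark import SedonaContext

sedona = SedonaContext.create(spark)

# Register a tiny DataFrame of sample lon/lat points as a temporary view
points = sedona.createDataFrame(
    [("a", -0.1278, 51.5074), ("b", 2.3522, 48.8566)],
    ["id", "lon", "lat"],
)
points.createOrReplaceTempView("sample_points")

# ST_Point builds a geometry, ST_Contains tests containment in a polygon,
# and ST_H3CellIDs returns the H3 cells covering the geometry at resolution 5
sedona.sql("""
    SELECT id,
           ST_Contains(ST_GeomFromText('POLYGON((-1 51, 1 51, 1 52, -1 52, -1 51))'),
                       ST_Point(lon, lat)) AS in_polygon,
           ST_H3CellIDs(ST_Point(lon, lat), 5, false)[0] AS h3_cell
    FROM sample_points
""").show()&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 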
&lt;h2&gt;Use case&lt;/h2&gt; 
&lt;p&gt;This use case consists of a global air traffic visualization and analysis platform that processes and displays real-time or historical aircraft tracking data on an interactive world map. Using unique aircraft identifiers from the International Civil Aviation Organization (ICAO), the system ingests trajectory records containing information such as geographic position (latitude and longitude), altitude, speed, and flight direction, then transforms this raw data into two complementary visual layers. The Flight Tracks Layer plots the routes traveled by each aircraft individually, allowing for the analysis of specific trajectories and navigation patterns. The Flight Density Layer uses hexagonal spatial indexing (H3) to aggregate and identify regions of higher air traffic concentration worldwide, revealing busy air corridors, aviation hubs, and high-density flight zones.&lt;/p&gt; 
&lt;p&gt;The dataset used for this use case is &lt;a href="https://www.adsb.lol/docs/open-data/historical/" target="_blank" rel="noopener noreferrer"&gt;historical flight tracker data&lt;/a&gt; from &lt;a href="https://www.adsb.lol/" target="_blank" rel="noopener noreferrer"&gt;ADSB.lol&lt;/a&gt;. ADSB.lol provides unfiltered flight tracking data with a focus on open data, and the data is also freely available through its API. The dataset contains one file per aircraft: a gzip-compressed JSON file with that aircraft’s trace data for the day.&lt;/p&gt; 
&lt;p&gt;This is a JSON trace file format sample:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-typescript"&gt;{
    icao: "0123ac", // hex id of the aircraft
    timestamp: 1609275898.495, // unix timestamp in seconds since epoch (1970)
    trace: [
        [ seconds after timestamp,
            lat,
            lon,
            altitude in ft or "ground" or null,
            ground speed in knots or null,
            track in degrees or null, (if altitude == "ground", this will be true heading instead of track)
            flags as a bitfield: (use bitwise and to extract data)
                (flags &amp;amp; 1 &amp;gt; 0): position is stale (no position received for 20 seconds before this one)
                (flags &amp;amp; 2 &amp;gt; 0): start of a new leg (tries to detect a separation point between landing and takeoff that separates flights)
                (flags &amp;amp; 4 &amp;gt; 0): vertical rate is geometric and not barometric
                (flags &amp;amp; 8 &amp;gt; 0): altitude is geometric and not barometric
             ,
            vertical rate in fpm or null,
            aircraft object with extra details or null,
            type / source of this position or null,
            geometric altitude or null,
            geometric vertical rate or null,
            indicated airspeed or null,
            roll angle or null
        ],
    ]
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;For this use case, this is a simplified schema of the dataset after processing:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;icao -&lt;/code&gt; Unique aircraft identifier&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;timestamp -&lt;/code&gt; Epoch timestamp of the observation (converted to readable format)&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;trace.lat / trace.lon -&lt;/code&gt; Latitude and longitude of the aircraft&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;trace.altitude -&lt;/code&gt; Aircraft altitude&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;trace.ground_speed -&lt;/code&gt; Ground speed&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;geometry -&lt;/code&gt; Geospatial geometry of the observation point (&lt;code&gt;Point&lt;/code&gt;)&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;This solution enables aircraft tracking and analysis. The data can be visualized on maps and used for aviation management and safety applications. The process begins with data acquisition, extracting the compressed JSON files from TAR archives, then transforms this raw data into geospatial objects, aggregating them into H3 cells for efficient analysis. The processed data schema includes ICAO aircraft identifiers, timestamps, latitude/longitude coordinates, and derived fields such as H3 cell identifiers and point counts per cell. This structure allows detailed tracking of individual flights and aggregate analysis of traffic patterns. For visualization, you can generate density maps using the H3 grid system and create visual representations of individual flight tracks. The architecture data flow is as follows:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Data ingestion –&lt;/strong&gt; Aircraft observation data stored as JSON compressed files in &lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service&lt;/a&gt; (Amazon S3).&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data processing –&lt;/strong&gt; AWS Glue jobs using Apache Sedona for geospatial processing.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data visualization –&lt;/strong&gt; Spark SQL with Sedona’s spatial functions to extract insights and export data to visualize the information in a map on Kepler.gl.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The following figure illustrates this solution.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90098" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/13/BDB-5249-geospatial-1_v2.png" alt="AWS architecture diagram showing a geospatial data processing pipeline." width="761" height="728"&gt;&lt;/p&gt; 
&lt;h3&gt;Prerequisites&lt;/h3&gt; 
&lt;p&gt;You will need the following for this solution:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An &lt;a href="https://aws.amazon.com/resources/create-account/" target="_blank" rel="noopener noreferrer"&gt;AWS Account&lt;/a&gt; and a user with AWS Console access.&lt;/li&gt; 
 &lt;li&gt;Access to a Linux terminal and the &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface&lt;/a&gt; (AWS CLI).&lt;/li&gt; 
 &lt;li&gt;An &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html" target="_blank" rel="noopener noreferrer"&gt;IAM role for AWS Glue&lt;/a&gt; with list, read, and write permissions for Amazon S3 buckets.&lt;/li&gt; 
 &lt;li&gt;An Amazon S3 Bucket for flight files. For this example, name the bucket &lt;code&gt;blog-sedona-nessie-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;&lt;/code&gt;, using your account number and region.&lt;/li&gt; 
 &lt;li&gt;An Amazon S3 bucket for artifacts and Sedona libraries. For this example, name the bucket &lt;code&gt;blog-sedona-artifacts-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;&lt;/code&gt;, using your account number and region.&lt;/li&gt; 
 &lt;li&gt;Download a day of historical data from &lt;a href="https://www.adsb.lol/docs/open-data/historical/" target="_blank" rel="noopener noreferrer"&gt;ADSB.lol&lt;/a&gt;. In our examples, we used &lt;a href="https://github.com/adsblol/globe_history_2025/releases/download/v2025.05.29-planes-readsb-prod-0tmp/v2025.05.29-planes-readsb-prod-0tmp.tar.aa" target="_blank" rel="noopener noreferrer"&gt;v2025.05.29-planes-readsb-prod-0tmp.tar.aa&lt;/a&gt; and &lt;a href="https://github.com/adsblol/globe_history_2025/releases/download/v2025.05.29-planes-readsb-prod-0tmp/v2025.05.29-planes-readsb-prod-0tmp.tar.ab" target="_blank" rel="noopener noreferrer"&gt;v2025.05.29-planes-readsb-prod-0tmp.tar.ab&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Download the Apache Sedona libraries. The example was created using &lt;a href="https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.5_2.12/1.7.1/sedona-spark-shaded-3.5_2.12-1.7.1.jar" target="_blank" rel="noopener noreferrer"&gt;sedona-spark-shaded-3.5_2.12-1.7.1.jar&lt;/a&gt; and &lt;a href="https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.7.1-28.5/geotools-wrapper-1.7.1-28.5.jar" target="_blank" rel="noopener noreferrer"&gt;geotools-wrapper-1.7.1-28.5.jar&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Download the &lt;a href="https://github.com/aws-samples/sample-blog-geospacial-lake-on-aws-with-aws-dataservices/blob/main/src/glue_scripts/process_sedona_geo_track.py" target="_blank" rel="noopener noreferrer"&gt;AWS Glue script&lt;/a&gt; from AWS Sample to process the geospatial data.&lt;/li&gt; 
 &lt;li&gt;Review the &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/security.html" target="_blank" rel="noopener noreferrer"&gt;AWS Glue security best practices&lt;/a&gt;, especially IAM least-privilege, encryption for sensitive data at rest and in transit, and configuring VPC Endpoints to prevent data from routing through the public internet.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Solution walkthrough&lt;/h2&gt; 
&lt;p&gt;Executing the following steps will incur costs in your AWS account. This step-by-step walkthrough demonstrates an approach to processing and analyzing large-scale geospatial flight data, using AWS Glue for distributed processing, Apache Sedona for efficient geospatial computations, and Uber’s H3 spatial indexing system. It explains how to ingest raw flight data, transform it using Sedona’s geospatial functions, and index it with H3 for optimized spatial queries. Finally, it also demonstrates how to visualize the data using Kepler.gl. For data processing, it is possible to use both Glue scripts and &lt;a href="https://sedona.apache.org/latest/setup/glue/" target="_blank" rel="noopener noreferrer"&gt;Glue notebooks&lt;/a&gt;. In this post, we focus only on Glue scripts.&lt;/p&gt; 
&lt;h3&gt;Upload the Apache Sedona libraries to Amazon S3&lt;/h3&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open your OS terminal command line.&lt;/li&gt; 
 &lt;li&gt;Create a folder to download the Sedona libraries and name it &lt;strong&gt;jar&lt;/strong&gt;. &lt;pre&gt;&lt;code class="lang-bash"&gt;
	# Create a directory for the Sedona libraries (JARs files)
	mkdir jar
	# Go to the JARs folder
	cd jar
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Download the Apache Sedona libraries. &lt;pre&gt;&lt;code class="lang-bash"&gt;
	# Download required Sedona libraries (JARs files)
	wget https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.5_2.12/1.7.1/sedona-spark-shaded-3.5_2.12-1.7.1.jar
	wget https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.7.1-28.5/geotools-wrapper-1.7.1-28.5.jar
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Upload the Sedona libraries (JAR files) to Amazon S3. In this example, we use the S3 path &lt;code&gt;s3://blog-sedona-artifacts-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;/jar/&lt;/code&gt;. &lt;pre&gt;&lt;code class="lang-bash"&gt;
	# Upload the JARs files to Amazon S3 bucket
	aws s3 cp . s3://blog-sedona-artifacts-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;/jar/ --recursive
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Your Amazon S3 folder should now look similar to the following image:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90099" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-2.jpg" alt="Amazon S3 console screenshot displaying the jar folder contents in blog-sedona-artifacts bucket." width="2560" height="919"&gt;&lt;/p&gt; 
&lt;h3&gt;Download and upload the geospatial data to Amazon S3&lt;/h3&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open your OS terminal command line.&lt;/li&gt; 
 &lt;li&gt;Create a folder to download the flight files and name it &lt;strong&gt;adsb_dataset&lt;/strong&gt;. &lt;pre&gt;&lt;code class="lang-bash"&gt;		# Create a directory to download the geospatial flight files
		mkdir adsb_dataset
		# Go to the folder for geospatial flight files
		cd adsb_dataset
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Download the flight files data from &lt;a href="https://github.com/adsblol/globe_history_2025/releases" target="_blank" rel="noopener noreferrer"&gt;adsblol GitHub repository&lt;/a&gt;. &lt;pre&gt;&lt;code class="lang-bash"&gt;	# Download the geospatial flight files in the folder created
	wget https://github.com/adsblol/globe_history_2025/releases/download/v2025.05.29-planes-readsb-prod-0tmp/v2025.05.29-planes-readsb-prod-0tmp.tar.aa
	wget https://github.com/adsblol/globe_history_2025/releases/download/v2025.05.29-planes-readsb-prod-0tmp/v2025.05.29-planes-readsb-prod-0tmp.tar.ab
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Extract the flight files. &lt;pre&gt;&lt;code class="lang-bash"&gt;	# Combine the two tar files into one
	cat v2025.05.29* &amp;gt;&amp;gt; combined.tar
	# Extract the json flight files from the tar file
	tar xf combined.tar
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Copy the flight files to Amazon S3. In this case, we are using the S3 folder: &lt;code&gt;s3://blog-sedona-nessie-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;/raw/adsb-2025-05-28/traces/&lt;/code&gt;. &lt;pre&gt;&lt;code class="lang-bash"&gt;	# Copy the json flight files to Amazon S3
	aws s3 cp ./traces/ s3://blog-sedona-nessie-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;/raw/adsb-2025-05-28/traces/ --recursive
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Your Amazon S3 folder should now look similar to the following image.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90100" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-3-scaled.jpg" alt="Amazon S3 console showing JSON trace files in the path raw/adsb-2025-05-28/traces/00/." width="2560" height="1096"&gt;&lt;/p&gt; 
&lt;h3&gt;Create an AWS Glue job and set up the job&lt;/h3&gt; 
&lt;p&gt;Now, we are ready to define the AWS Glue job using Apache Sedona to read the geospatial data files. To create a Glue job:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the &lt;a href="https://console.aws.amazon.com/glue/" target="_blank" rel="noopener noreferrer"&gt;AWS Glue console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;On the &lt;strong&gt;ETL jobs&lt;/strong&gt; page, choose &lt;strong&gt;Script editor&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90101" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-4-scaled.jpg" alt="AWS Glue Studio jobs creation interface showing three job creation methods: Visual ETL with data flow interface, Notebook for interactive coding, and Script editor for code authoring" width="2560" height="800"&gt;&lt;/p&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;On the Script screen, for the engine, choose &lt;strong&gt;Spark&lt;/strong&gt;, then select the option &lt;strong&gt;Upload script&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Choose file&lt;/strong&gt;. Find the &lt;code&gt;process_sedona_geo_track.py&lt;/code&gt; file, then choose &lt;strong&gt;Create script&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90102" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-5.jpg" alt="Script creation dialog box with Spark engine selected. Upload script option is active, showing successfully uploaded file process_sedona_geo_track.py." width="1602" height="730"&gt;&lt;/p&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;Rename the job from &lt;strong&gt;Untitled&lt;/strong&gt; to &lt;strong&gt;process_sedona_geo_track&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Save&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Now, let’s set up the AWS Glue job. Choose &lt;strong&gt;Job Details.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Choose the &lt;strong&gt;IAM Role&lt;/strong&gt; created to be used with Glue. For this example, we use &lt;strong&gt;blog-glue&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Set the &lt;strong&gt;Glue version&lt;/strong&gt; to &lt;strong&gt;Glue 5.0&lt;/strong&gt; and the Worker type as needed. For this example, &lt;strong&gt;G.1X&lt;/strong&gt; is sufficient, but we use &lt;strong&gt;G.2X&lt;/strong&gt; to speed up processing.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90103" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-6.jpg" alt="AWS Glue job details configuration page for process_sedona_geo_track." width="2182" height="1050"&gt;&lt;/p&gt; 
&lt;ol start="10"&gt; 
 &lt;li&gt;Now, let’s import the libraries for Apache Sedona.&lt;/li&gt; 
 &lt;li&gt;In the &lt;strong&gt;Dependent JARs path&lt;/strong&gt;, type the path of the JAR files for Apache Sedona that you uploaded in the preceding steps. For this example, we used &lt;code&gt;s3://blog-sedona-artifacts-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;/jar/sedona-spark-shaded-3.5_2.12-1.7.1.jar,s3://blog-sedona-artifacts-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;/jar/geotools-wrapper-1.7.1-28.5.jar&lt;/code&gt;&lt;/li&gt; 
 &lt;li&gt;In &lt;strong&gt;Additional Python modules path&lt;/strong&gt;, enter the modules for Apache Sedona: &lt;strong&gt;apache-sedona==1.7.1,geopandas==0.13.2,shapely==2.0.1,pyproj==3.6.0,fiona==1.9.5,rtree==1.2.0&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90104" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-7.jpg" alt="Job libraries configuration section showing Dependent JARs path pointing to S3 bucket." width="2016" height="956"&gt;&lt;/p&gt; 
&lt;ol start="13"&gt; 
 &lt;li&gt;In the &lt;strong&gt;Job parameters&lt;/strong&gt; section, in the &lt;strong&gt;Key&lt;/strong&gt; field, type &lt;strong&gt;--BUCKET_NAME&lt;/strong&gt;. For its &lt;strong&gt;Value&lt;/strong&gt;, enter your bucket name. In this example, ours is &lt;code&gt;blog-sedona-nessie-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;&lt;/code&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90105" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-8.jpg" alt="Job parameters configuration interface showing key-value pair with --BUCKET_NAME parameter." width="704" height="229"&gt;&lt;/p&gt; 
&lt;ol start="14"&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Save&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Processing the geospatial flights data&lt;/h3&gt; 
&lt;p&gt;Before we run the job, let’s understand how the code works. First, import the required libraries, including Apache Sedona:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import json
import gzip
from pyspark.sql import Row
from pyspark.sql.functions import col
from sedona.spark import SedonaContext&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Next, initialize the Sedona context using an existing Spark session:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;sedona = SedonaContext.create(spark)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;After that, create a function for handling compressed JSON data:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def parse_gzip_json(byte_content):
        try:
            decompressed = gzip.decompress(byte_content)
            return json.loads(decompressed.decode('utf-8'))
        except Exception as e:
            print(f"Error during gzip parse: {str(e)}")
            return None&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Add a function that flattens the raw tracking data into structured rows, keeping only records with valid latitude and longitude values:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def flatten_records(json_obj):
    records = []
    if "trace" in json_obj and isinstance(json_obj["trace"], list):
        for point in json_obj["trace"]:
            if len(point) &amp;gt;= 3:
                lat, lon = float(point[1]), float(point[2])
                if -90 &amp;lt;= lat &amp;lt;= 90 and -180 &amp;lt;= lon &amp;lt;= 180:
                    records.append(Row(
                        icao=json_obj.get("icao", None),
                        timestamp=json_obj.get("timestamp", None),
                        lat=lat,
                        lon=lon
                    ))
    return records&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The &lt;code&gt;flat_rdd&lt;/code&gt; variable applies these functions to the raw gzipped JSON, producing structured records. Each element in this RDD is a Row object representing a single data point from an aircraft’s trace, with fields for ICAO, timestamp, latitude, and longitude.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;flat_rdd = raw_rdd.map(lambda x: parse_gzip_json(x[1])).filter(lambda x: x is not None).flatMap(flatten_records)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The ADSB trace files contain a deeply nested JSON structure in which the trace field holds an array of mixed-type arrays, compressed in Gzip format. For this specific case, a user-defined function (UDF) is one of the most practical and efficient solutions. Because Gzip is a non-splittable format, Spark cannot parallelize within a file, constraining processing to a single worker per file, and standard readers would handle the data multiple times across JVM decompression, full JSON parsing, and subsequent re-parsing operations. The UDF avoids this by reading raw bytes and doing everything in a single Python pass: decompress → parse → extract → validate, returning only the small set of needed fields directly to Spark.&lt;/p&gt; 
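&lt;p&gt;The snippets above focus on the parsing logic; the end-to-end wiring lives in the &lt;code&gt;process_sedona_geo_track.py&lt;/code&gt; script linked in the prerequisites. As a rough sketch of how the pieces could fit together (the paths and variable names here are illustrative), the raw bytes can be read with &lt;code&gt;binaryFiles&lt;/code&gt;, the flattened rows converted into a DataFrame with a Sedona &lt;code&gt;ST_Point&lt;/code&gt; geometry column, and the result registered as the &lt;code&gt;traces&lt;/code&gt; view used by the Spark SQL query that follows:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Rough sketch only; see the full Glue script linked in the prerequisites for the authoritative version
from pyspark.sql.functions import expr

input_path = f"s3://{bucket_name}/raw/adsb-2025-05-28/traces/*/*"  # illustrative path; one gzipped JSON file per aircraft

# binaryFiles yields (path, bytes) pairs, so each file reaches parse_gzip_json as raw bytes
raw_rdd = spark.sparkContext.binaryFiles(input_path)
flat_rdd = raw_rdd.map(lambda x: parse_gzip_json(x[1])).filter(lambda x: x is not None).flatMap(flatten_records)

# Build a DataFrame, derive the Point geometry with Sedona, and expose it as the `traces` view
df_points = sedona.createDataFrame(flat_rdd)
df_points = df_points.withColumn("geometry", expr("ST_Point(lon, lat)"))
df_points.createOrReplaceTempView("traces")&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 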
&lt;p&gt;The Spark SQL query processes geographic trace data using the H3 hexagonal grid system, converting point data into a regularized hexagonal grid that can help identify areas of high point density. A &lt;a href="https://h3geo.org/docs/core-library/restable/#average-area-in-km2" target="_blank" rel="noopener noreferrer"&gt;resolution&lt;/a&gt; of 5 was adopted, producing hexagons of approximately 253 km² (roughly the same size as the city of Edinburgh, Scotland, which is approximately 264 km²), for its ability to effectively capture route density patterns at the city and metropolitan level.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;h3_traces_df = spark.sql("""
WITH base_h3 AS (
    SELECT
        ST_H3CellIDs(geometry, 5, false)[0] AS h3_index,
        lat,
        lon
    FROM traces
)
SELECT
    COUNT(*) AS num, -- Count points in each H3 cell
    h3_index,
    AVG(lon) AS center_lon,
    AVG(lat) AS center_lat
FROM base_h3
GROUP BY h3_index
""")
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Finally, this code prepares the datasets for visualization purposes. The first dataset is based on the aircraft unique identifier. The complete dataset for a single day can contain more than 80 million data points. A random sampling rate of 0.1% was applied, which proves sufficient to illustrate route density patterns without overwhelming the Kepler.gl browser renderer. The second dataset aggregates trace points into hexagonal spatial cells (result from the query above).&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;points_viz_sampled = df_points.select(
    col("icao"), # Aircraft unique identifier (24-bit address)
    col("timestamp").cast("double").alias("timestamp"),
    col("lat").cast("double").alias("lat"),
    col("lon").cast("double").alias("lon")
).sample(False, 0.001)

h3_viz_csv = h3_traces_df.select(
    col("num").alias("point_count"),
    col("h3_index").cast("string").alias("h3_index"),
    col("center_lon"),
    col("center_lat")
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Now that we understand the code, let’s run it.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the &lt;a href="https://console.aws.amazon.com/glue/" target="_blank" rel="noopener noreferrer"&gt;AWS Glue console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;On the &lt;strong&gt;ETL jobs&lt;/strong&gt; page, choose the job name &lt;strong&gt;process_sedona_geo_track&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Run&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90106" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-9.jpg" alt="Python script editor showing import statements for process_sedona_geo_track job." width="1038" height="417"&gt;&lt;/p&gt; 
&lt;ol start="4"&gt; 
 &lt;li&gt;Now, it is possible to monitor the job by choosing the &lt;strong&gt;Runs&lt;/strong&gt; tab.&lt;/li&gt; 
 &lt;li&gt;It may take a few minutes to run the entire job. It took nearly 8 minutes to process approximately 2.50 GB (67,540 compressed files) with 20 DPUs. After the job is processed, you should see your job with the status &lt;strong&gt;Succeeded&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90107" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-10.jpg" alt="Job runs monitoring dashboard showing successful execution on June 5, 2025, running from 12:28:03 to 12:36:37 with 8 minutes 19 seconds duration." width="1253" height="785"&gt;&lt;/p&gt; 
&lt;p&gt;Now your data should be saved for a preview visualization demo in a folder named &lt;code&gt;s3://blog-sedona-nessie-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;/visualization/&lt;/code&gt;.&lt;/p&gt; 
&lt;h3&gt;Performance insights&lt;/h3&gt; 
&lt;p&gt;The workload characterization of this job reveals a CPU-intensive profile, primarily because of the processing of small binary files with GZIP compression and subsequent JSON parsing. Given the inherent nature of this pipeline, which includes Python UDF serialization and partial single-partition write stages, linear scaling does not yield proportional performance gains. The following table presents an analysis of AWS Glue configurations, evaluating the trade-off between computational capacity, execution duration, and associated costs:&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Duration&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Capacity (DPUs)&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Worker type&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Glue version&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Estimated Cost*&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10 m 7 s&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;32 DPUs&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;G.1X&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;5&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$2.34&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;11 m 50 s&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10 DPUs&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;G.1X&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;5&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$0.88&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;19 m 7 s&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;4 DPUs&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;G.1X&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;5&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$0.59&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;8 m 19 s&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;20 DPUs&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;G.2X&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;5&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;$1.32&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;*Estimated Cost = DPUs x Duration (hours) x $0.44 per DPU-hour (&lt;code&gt;us-east-1&lt;/code&gt;)&lt;/p&gt; 
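&lt;p&gt;As a quick check of the formula (illustrative only; prices vary by AWS Region and the table values are rounded), the following snippet reproduces one of the estimates above:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Estimated cost = DPUs x duration (hours) x price per DPU-hour
def glue_cost(dpus, duration_minutes, price_per_dpu_hour=0.44):
    return dpus * (duration_minutes / 60.0) * price_per_dpu_hour

# 10 DPUs running for 11 minutes 50 seconds is roughly $0.87, in line with the $0.88 row above
print(round(glue_cost(10, 11 + 50 / 60), 2))&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 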
&lt;h2&gt;Visualizing and analyzing geospatial data with Kepler.gl&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://kepler.gl/" target="_blank" rel="noopener noreferrer"&gt;Kepler.gl&lt;/a&gt; is an open-source geospatial analysis tool developed by &lt;a href="https://www.uber.com/en-HK/blog/keplergl/" target="_blank" rel="noopener noreferrer"&gt;Uber&lt;/a&gt; with code available on &lt;a href="https://github.com/keplergl/kepler.gl" target="_blank" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Kepler.gl is designed for large-scale data exploration and visualization, offering multiple map layers, including point, arc, heatmap, and 3D hexagon. It supports various file formats like CSV, GeoJSON, and KML. In this use case, we will use Kepler.gl to present interactive visualizations that illustrate flight patterns, routes, and densities across global airspace.&lt;/p&gt; 
&lt;h3&gt;Downloading the geospatial files&lt;/h3&gt; 
&lt;p&gt;Before we can view the graph, we will need to download the flight files to our local machine, unzip them, and rename them (to make it easier to identify the files).&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open your OS terminal command line.&lt;/li&gt; 
 &lt;li&gt;Create the folders to download the data processed in the steps before. In this case, we create &lt;strong&gt;kepler&lt;/strong&gt; and &lt;strong&gt;kepler_csv&lt;/strong&gt;. &lt;pre&gt;&lt;code class="lang-bash"&gt;	#create kepler folders: first folder is to download the files,
	#second folder is to organize the files to use in the next step
	mkdir kepler
	mkdir kepler_csv
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Replace the bracketed variables with your account and directory information, then download all the CSV files. &lt;pre&gt;&lt;code class="lang-bash"&gt;	#copy the files from Amazon S3 to local machine
	aws s3 cp s3://blog-sedona-nessie-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;/visualization/ /&amp;lt;user_directory&amp;gt;/kepler --recursive
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Extract the files, rename them, and move them to another folder. &lt;pre&gt;&lt;code class="lang-bash"&gt;	# Extract the files processed by Spark and Sedona
	gzip -d ./kepler/kepler_h3_density/*.gz
	gzip -d ./kepler/kepler_track_points_sample/*.gz
	
	# Rename the Spark output files to more readable names
	cd ./kepler/kepler_h3_density/
	ls
	mv part-00000-*.csv kepler_h3_density.csv
	cd ../..
	
	cd ./kepler/kepler_track_points_sample/
	ls
	mv part-00000-*.csv kepler_track_points_sample.csv
	cd ../..
	
	# Ensure the output folder exists
	mkdir -p ./kepler_csv
	
	# Copy the renamed CSV files to the folder that will be used as input in kepler.gl
	cp ./kepler/kepler_h3_density/*.csv ./kepler_csv
	cp ./kepler/kepler_track_points_sample/*.csv ./kepler_csv
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
 &lt;li&gt;Your &lt;strong&gt;kepler_csv&lt;/strong&gt; folder should now contain the files shown by the following command. &lt;pre&gt;&lt;code class="lang-bash"&gt;	# List the files in the kepler_csv directory
	ls -l ./kepler_csv
	total 11684
	-rw-rw-r-- 1 ec2-user ec2-user 8630110 Jun 12 14:47 kepler_h3_density.csv
	-rw-rw-r-- 1 ec2-user ec2-user 3331763 Jun 12 14:47 kepler_track_points_sample.csv
	&lt;/code&gt;&lt;/pre&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Visualizing the data in a graph&lt;/h3&gt; 
&lt;p&gt;Now that you have saved the data to your local machine, you can analyze the flight data through interactive map graphics. To import the data into the Kepler.gl web visualization tool:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the &lt;a href="https://kepler.gl/demo" target="_blank" rel="noopener noreferrer"&gt;Kepler.gl Demo&lt;/a&gt; web application.&lt;/li&gt; 
 &lt;li&gt;Load data into Kepler.gl: 
  &lt;ol type="a"&gt; 
   &lt;li&gt;Choose &lt;strong&gt;Add Data&lt;/strong&gt; in the left panel.&lt;/li&gt; 
   &lt;li&gt;Drag and drop both CSV files (&lt;code&gt;kepler_track_points_sample.csv&lt;/code&gt; and &lt;code&gt;kepler_h3_density.csv&lt;/code&gt;) into the upload area.&lt;/li&gt; 
   &lt;li&gt;Confirm that both datasets are loaded successfully.&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;Delete all layers.&lt;/li&gt; 
 &lt;li&gt;Create the &lt;strong&gt;Flight Density Layer:&lt;/strong&gt; 
  &lt;ol type="a"&gt; 
   &lt;li&gt;Choose &lt;strong&gt;Add Layer&lt;/strong&gt; in the left panel.&lt;/li&gt; 
   &lt;li&gt;In &lt;strong&gt;Basic&lt;/strong&gt;, choose &lt;strong&gt;H3&lt;/strong&gt; as the layer type, then add the following configuration: 
    &lt;ol type="i"&gt; 
     &lt;li&gt;Layer Name: &lt;strong&gt;Flight Density&lt;/strong&gt;&lt;/li&gt; 
     &lt;li&gt;Data Source: &lt;strong&gt;kepler_h3_density.csv&lt;/strong&gt;&lt;/li&gt; 
     &lt;li&gt;Hex ID: &lt;strong&gt;h3_index&lt;/strong&gt;&lt;/li&gt; 
    &lt;/ol&gt; &lt;/li&gt; 
   &lt;li&gt;In the &lt;strong&gt;Fill Color&lt;/strong&gt; section: 
    &lt;ol type="i"&gt; 
     &lt;li&gt;Color: &lt;strong&gt;point_count&lt;/strong&gt;&lt;/li&gt; 
     &lt;li&gt;Color Scale: &lt;strong&gt;Quantile&lt;/strong&gt;.&lt;/li&gt; 
     &lt;li&gt;Color Range: Choose a blue/green gradient.&lt;/li&gt; 
    &lt;/ol&gt; &lt;/li&gt; 
   &lt;li&gt;Set &lt;strong&gt;Opacity&lt;/strong&gt; to &lt;strong&gt;0.7&lt;/strong&gt;.&lt;/li&gt; 
   &lt;li&gt;In the &lt;strong&gt;Coverage&lt;/strong&gt; section, set it to &lt;strong&gt;0.9&lt;/strong&gt;.&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;Create the &lt;strong&gt;Flight Tracks Layer:&lt;/strong&gt; 
  &lt;ol type="a"&gt; 
   &lt;li&gt;Choose &lt;strong&gt;Add Layer&lt;/strong&gt; in the left panel.&lt;/li&gt; 
   &lt;li&gt;In &lt;strong&gt;Basic&lt;/strong&gt;, choose &lt;strong&gt;Point&lt;/strong&gt; as the layer type, then add the following configuration: 
    &lt;ol type="i"&gt; 
     &lt;li&gt;Layer Name: &lt;strong&gt;Flight Tracks&lt;/strong&gt;&lt;/li&gt; 
     &lt;li&gt;Data Source: &lt;strong&gt;kepler_track_points_sample.csv&lt;/strong&gt;&lt;/li&gt; 
     &lt;li&gt;Columns: 
      &lt;ol&gt; 
       &lt;li&gt;Latitude: &lt;strong&gt;lat&lt;/strong&gt;&lt;/li&gt; 
       &lt;li&gt;Longitude: &lt;strong&gt;lon&lt;/strong&gt;&lt;/li&gt; 
      &lt;/ol&gt; &lt;/li&gt; 
    &lt;/ol&gt; &lt;/li&gt; 
   &lt;li&gt;In the &lt;strong&gt;Fill Color &lt;/strong&gt;section: 
    &lt;ol type="i"&gt; 
     &lt;li&gt;Solid Color: &lt;strong&gt;Orange&lt;/strong&gt;&lt;/li&gt; 
     &lt;li&gt;Opacity: &lt;strong&gt;0.3&lt;/strong&gt;&lt;/li&gt; 
    &lt;/ol&gt; &lt;/li&gt; 
   &lt;li&gt;Set the Point’s &lt;strong&gt;Radius&lt;/strong&gt; to 1&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;The layers should look similar to the following figure.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90108" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-11.jpg" alt="Kepler.gl layer configuration panel for Flight Density H3 layer using kepler_h3_density.csv data source." width="998" height="1051"&gt;&lt;/p&gt; 
&lt;ol start="7"&gt; 
 &lt;li&gt;The graph visualization should now show flight density through color-coded hexagons, with individual flight tracks visible as orange points:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90109" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5249-geospatial-12.jpg" alt="Kepler.gl interactive map visualization displaying global flight density heatmap. High-density areas shown in yellow over North America, particularly the United States." width="1897" height="924"&gt;&lt;/p&gt; 
&lt;p&gt;There you go! Now that you have knowledge about geospatial data and have created your first use case, take the opportunity to do some analysis and learn some interesting facts about flight patterns.&lt;/p&gt; 
&lt;p&gt;It is possible to experiment with other interesting types of analysis in Kepler.gl, such as &lt;a href="https://docs.kepler.gl/docs/user-guides/h-playback" target="_blank" rel="noopener noreferrer"&gt;Time Playback&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Clean up&lt;/h2&gt; 
&lt;p&gt;To clean up your resources, complete the following tasks:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Delete the AWS Glue job &lt;code&gt;process_sedona_geo_track&lt;/code&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/empty-bucket.html" target="_blank" rel="noopener noreferrer"&gt;Delete content&lt;/a&gt; from the Amazon S3 buckets: &lt;code&gt;blog-sedona-artifacts-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;&lt;/code&gt; and &lt;code&gt;blog-sedona-nessie-&amp;lt;account_number&amp;gt;-&amp;lt;aws_region&amp;gt;&lt;/code&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we showed how processing geospatial data can present significant challenges because of its complex nature, from sheer data volume to data structure and format. The flight tracker use case involves vast amounts of information across multiple dimensions such as time, location, altitude, and flight paths; however, the combination of Spark’s distributed computing capabilities and Sedona’s optimized geospatial functions helps overcome those challenges. The spatial partitioning and indexing features of Sedona, coupled with Spark’s framework, enable us to perform complex spatial joins and proximity analyses efficiently, simplifying the overall data processing workflow.&lt;/p&gt; 
&lt;p&gt;The serverless nature of AWS Glue eliminates the need for managing infrastructure while automatically scaling resources based on workload demands, making it an ideal platform for processing growing volumes of flight data. As the volume of flight data grows or as processing requirements fluctuate, with AWS Glue, you can quickly adjust resources to meet demand, ensuring optimal performance without the need for cluster management.&lt;/p&gt; 
&lt;p&gt;By converting the processed results into CSV format and visualizing them in Kepler.gl, it is possible to create interactive visualizations that reveal patterns in flight paths, and you can efficiently analyze air traffic patterns, routes, and other insights. This end-to-end solution demonstrates how a modern data strategy in AWS with the support of open-source tools can transform raw geospatial data into actionable insights.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90123" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/13/ruanroloff.jpeg" alt="Ruan" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;strong&gt;Ruan Roloff&lt;/strong&gt; is a Lead GTM Specialist Architect for Analytics and AI at AWS. During his time at AWS, he was responsible for the data journey and AI product strategy of customers across a range of industries, including finance, oil and gas, manufacturing, digital natives, public sector, and startups. He has helped these organizations achieve multi-million dollar use cases. Outside of work, Ruan likes to assemble and disassemble things, fish on the beach with friends, play SFII, and go hiking in the woods with his family.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90122" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/13/lucasvitoreti.jpeg" alt="Lucas" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;strong&gt;Lucas Vitoreti&lt;/strong&gt; is a ProServe Data &amp;amp; Analytics Specialist at AWS with 12+ years in the data domain. He architects and delivers solutions for data warehouses, lakes, lakehouses, and meshes, helping organizations transform their data strategies and achieve business outcomes, with expertise in scalable data architectures and guiding data-driven transformations. He balances professional life with weightlifting, music, and family time.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90121" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/13/denysgonzaga.jpeg" alt="Denys" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;strong&gt;Denys Gonzaga&lt;/strong&gt; is a ProServe Consultant at AWS and an experienced professional with over 15 years of experience across multiple technical domains, with a strong focus on development and data analytics. Throughout his career, he has successfully applied his skills in various industries, including aerospace, finance, telecommunications, and retail. Outside of AWS, Denys enjoys spending time with his family and playing video games.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Analyzing your data catalog: Query SageMaker Catalog metadata with SQL</title>
		<link>https://aws.amazon.com/blogs/big-data/analyzing-your-data-catalog-query-sagemaker-catalog-metadata-with-sql/</link>
					
		
		<dc:creator><![CDATA[Ramesh H Singh]]></dc:creator>
		<pubDate>Wed, 22 Apr 2026 15:37:38 +0000</pubDate>
				<category><![CDATA[Amazon SageMaker Data & AI Governance]]></category>
		<category><![CDATA[Amazon SageMaker Unified Studio]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">6de67b4491648f7f2b2aa026de7adf83a8026c80</guid>

					<description>In this post, we demonstrate how to use the metadata export capability in Amazon SageMaker Catalog and perform analytics such as historical changes, monitor asset growth and track metadata improvements.</description>
										<content:encoded>&lt;p&gt;As your data and machine learning (ML) assets grow, tracking which assets lack documentation or monitoring asset registration trends becomes challenging without custom reporting infrastructure. You need visibility into your catalog’s health, without the overhead of managing ETL jobs. The metadata feature of &lt;a href="https://aws.amazon.com/sagemaker/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker&lt;/a&gt; provides this capability to users.&amp;nbsp;Converting catalog asset metadata into &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/apache-iceberg-on-aws/introduction.html" target="_blank" rel="noopener noreferrer"&gt;Apache Iceberg&lt;/a&gt; tables stored in &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables.html" target="_blank" rel="noopener noreferrer"&gt;Amazon S3 Tables&lt;/a&gt; removes the need to build and maintain custom ETL pipelines. Your team can then query asset metadata directly using standard SQL tools. You can now answer governance questions like asset registration trends, classification status, and metadata completeness using standard SQL queries through tools like &lt;a href="https://aws.amazon.com/athena/" target="_blank" rel="noopener noreferrer"&gt;Amazon Athena&lt;/a&gt;, &lt;a href="https://aws.amazon.com/sagemaker/unified-studio/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker Unified Studio&lt;/a&gt; notebooks, and BIsystems.&lt;/p&gt; 
&lt;p&gt;This automated approach reduces ETL development time and gives your team visibility into catalog health, compliance gaps, and asset lifecycle patterns. The exported tables include technical metadata, business metadata, project ownership details, and timestamps, partitioned by snapshot date to enable time travel queries and historical analysis. Teams can use this capability to proactively monitor catalog health, identify gaps in documentation, track asset lifecycle patterns, and make sure that governance policies are consistently applied.&lt;/p&gt; 
&lt;h2&gt;How metadata export works&lt;/h2&gt; 
&lt;p&gt;After you enable the metadata export feature, it runs automatically on a daily schedule:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;SageMaker Catalog creates the infrastructure&lt;/strong&gt; — An Amazon Simple Storage Service (Amazon S3) table bucket named &lt;code&gt;aws-sagemaker-catalog&lt;/code&gt; is created with an &lt;code&gt;asset_metadata&lt;/code&gt; namespace and an empty asset table.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Daily snapshots are captured&lt;/strong&gt; — A scheduled job runs once per day around midnight (local time per AWS Region) to export updated asset metadata.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Metadata is structured and partitioned&lt;/strong&gt; — The export captures technical metadata (resource_id, resource_type), business metadata (asset_name, business_description), project ownership details, and timestamps, partitioned by &lt;code&gt;snapshot_date&lt;/code&gt; for query performance.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data becomes queryable&lt;/strong&gt; — Within 24 hours, the asset table appears in Amazon SageMaker Unified Studio under the &lt;code&gt;aws-sagemaker-catalog&lt;/code&gt; bucket and becomes accessible through Amazon Athena, Studio notebooks, or external BI tools.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Teams query using standard SQL&lt;/strong&gt; — Data teams can now answer questions like “How many assets were registered last month?” or “Which assets lack business descriptions?” without building custom ETL pipelines.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;The export evaluates catalog assets and their metadata properties in the domain, converting them into Apache Iceberg table format. The data flows into downstream analytics operations immediately, with no separate ETL or batch processes to maintain. The exported metadata becomes part of a queryable data lake that supports time-travel queries and historical analysis.&lt;/p&gt; 
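&lt;p&gt;For example, once the asset table is available, a governance question like “Which assets lack business descriptions?” reduces to a single SQL statement. The following is an illustrative sketch using Spark SQL from a SageMaker Unified Studio notebook; the same query can be run in Amazon Athena, and the exact table identifier may differ depending on how the S3 Tables catalog is registered in your environment:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# Illustrative sketch: catalog and table naming may differ in your environment
missing_docs = spark.sql("""
    SELECT resource_name, asset_name, resource_type_enum, asset_created_time
    FROM asset_metadata.asset
    WHERE business_description IS NULL OR business_description = ''
    ORDER BY asset_created_time DESC
""")
missing_docs.show(truncate=False)&lt;/code&gt;&lt;/pre&gt; 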
&lt;p&gt;In this post, we demonstrate how to use the metadata export capability in Amazon SageMaker Catalog and perform analytics on these tables. We explore the following specific use cases:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Audit historical changes to investigate what an asset looked like at a specific point in time.&lt;/li&gt; 
 &lt;li&gt;Monitor asset growth to view how the data catalog has grown over the last 30 days.&lt;/li&gt; 
 &lt;li&gt;Track metadata improvements to see which assets gained descriptions or ownership over time.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;div id="attachment_90416" style="width: 1431px" class="wp-caption alignleft"&gt;
 &lt;img aria-describedby="caption-attachment-90416" loading="lazy" class="wp-image-90416 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-1.jpeg" alt="AWS Cloud architecture diagram showing data pipeline from Amazon SageMaker Catalog to Amazon S3 Tables with daily export, connecting to query engines including Amazon Athena, Amazon Redshift, and Apache Spark" width="1421" height="801"&gt;
 &lt;p id="caption-attachment-90416" class="wp-caption-text"&gt;Figure 1 – SageMaker catalog export to S3 Tables&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;The architecture consists of three key components:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Amazon SageMaker Catalog exports asset metadata daily to Amazon S3.&lt;/li&gt; 
 &lt;li&gt;S3 Tables stores metadata as Apache Iceberg tables in the &lt;code&gt;aws-sagemaker-catalog&lt;/code&gt; bucket with ACID compliance and time travel.&lt;/li&gt; 
 &lt;li&gt;Query engines (Amazon Athena, &lt;a href="https://aws.amazon.com/pm/redshift/" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift&lt;/a&gt;, and Apache Spark) access metadata using standard SQL from the &lt;code&gt;asset_metadata.asset&lt;/code&gt; table.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;What metadata is exposed?&lt;/h3&gt; 
&lt;p&gt;SageMaker Catalog exports metadata in the &lt;code&gt;asset_metadata.asset&lt;/code&gt; table:&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Metadata Type&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Fields&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Technical metadata&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;resource_id&lt;/code&gt;, &lt;code&gt;resource_type_enum&lt;/code&gt;, &lt;code&gt;account_id&lt;/code&gt;, &lt;code&gt;region&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Resource identifiers (ARN), types (&lt;code&gt;GlueTable&lt;/code&gt;, &lt;code&gt;RedshiftTable&lt;/code&gt;, &lt;code&gt;S3Collection&lt;/code&gt;), and location&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Namespace hierarchy&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;catalog&lt;/code&gt;, &lt;code&gt;namespace&lt;/code&gt;, &lt;code&gt;resource_name&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Organizational structure for assets&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Business metadata&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;asset_name&lt;/code&gt;, &lt;code&gt;business_description&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Human-readable names and descriptions&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Ownership&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;extended_metadata['owningEntityId']&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Asset ownership information&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Timestamps&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;asset_created_time&lt;/code&gt;, &lt;code&gt;asset_updated_time&lt;/code&gt;, &lt;code&gt;snapshot_time&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Creation&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Custom metadata&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;extended_metadata['form-name.field-name']&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;User-defined metadata forms as key-value pairs&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;The &lt;code&gt;snapshot_time&lt;/code&gt; column supports point-in-time analysis and query of historical catalog states.&lt;/p&gt; 
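&lt;p&gt;For example, the following query (a minimal sketch that uses only the columns described above) lists the snapshot dates available for point-in-time queries:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;-- List the catalog snapshots available for point-in-time queries
SELECT
    DATE(snapshot_time) as snapshot_date,
    COUNT(*) as assets_in_snapshot
FROM asset_metadata.asset
GROUP BY DATE(snapshot_time)
ORDER BY snapshot_date DESC;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 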
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;To follow along with this post, you must have the following:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An &lt;a href="https://aws.amazon.com/sagemaker/unified-studio/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker Unified Studio&lt;/a&gt; domain set up with domain owner or domain unit owner permissions. 
  &lt;ul&gt; 
   &lt;li&gt;A SageMaker Unified Studio domain identifier&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; permissions for configuring metadata export.&lt;/li&gt; 
 &lt;li&gt;Grant catalog, database, and table Select and Describe permissions with &lt;a href="https://aws.amazon.com/lake-formation/" target="_blank" rel="noopener noreferrer"&gt;AWS Lake Formation&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI)&lt;/a&gt; version 2.33.0 or later installed and configured&lt;/li&gt; 
 &lt;li&gt;An Amazon SageMaker project for publishing assets.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;For SageMaker Unified Studio domain setup instructions, refer to the SageMaker Unified Studio &lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/getting-started.html" target="_blank" rel="noopener noreferrer"&gt;Getting started&lt;/a&gt; guide.&lt;/p&gt; 
&lt;p&gt;After you complete the prerequisites, complete the following steps.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Add this policy to your IAM user or role to enable metadata export. If using SageMaker Unified Studio to query the catalog, add this policy to the &lt;code&gt;AmazonSageMakerAdminIAMExecutionRole&lt;/code&gt; managed role.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;pre&gt;&lt;code class="lang-json"&gt;{ "Version": "2012-10-17", 
"Statement": [ 
{
 "Effect": "Allow",
 "Action": [ "datazone:GetDataExportConfiguration",
 "datazone:PutDataExportConfiguration"
 ],
 "Resource": "*"
 },
 {
 "Effect": "Allow",
 "Action": [
 "s3tables:CreateTableBucket",
 "s3tables:PutTableBucketPolicy"
 ],
 "Resource": "arn:aws:s3tables:*:*:bucket/aws-sagemaker-catalog" 
} 
]
}&lt;/code&gt;&lt;/pre&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;&lt;strong&gt;Grant describe&lt;/strong&gt; and &lt;strong&gt;select&lt;/strong&gt; permissions for SageMaker Catalog with AWS Lake Formation. This step can be performed in the AWS Lake Formation console; an equivalent AWS CLI sketch follows this list. 
  &lt;ol type="a"&gt; 
    &lt;li&gt;Select &lt;strong&gt;Permissions&lt;/strong&gt;, &lt;strong&gt;Data permissions&lt;/strong&gt;, and then choose &lt;strong&gt;Grant&lt;/strong&gt;.&lt;br&gt;
    &lt;div id="attachment_90415" style="width: 1435px" class="wp-caption alignnone"&gt;
     &lt;img aria-describedby="caption-attachment-90415" loading="lazy" class="size-full wp-image-90415" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-2.jpeg" alt="AWS Lake Formation Grant Permissions interface showing principal type selection with IAM users and roles option selected and AmazonSageMakerAdminIAMExecutionRole assigned" width="1425" height="878"&gt;
     &lt;p id="caption-attachment-90415" class="wp-caption-text"&gt;Figure 2 – AWS Lake Formation grant permission&lt;/p&gt;
    &lt;/div&gt;&lt;/li&gt; 
    &lt;li&gt;Under &lt;strong&gt;Principals&lt;/strong&gt;, for &lt;strong&gt;Principal type&lt;/strong&gt;, select &lt;strong&gt;IAM users and roles&lt;/strong&gt;, and then choose the AWS managed &lt;strong&gt;AmazonSageMakerAdminIAMExecutionRole&lt;/strong&gt; execution role.&lt;/li&gt; 
   &lt;li&gt;Choose &lt;strong&gt;Named Data Catalog resources&lt;/strong&gt;.&lt;/li&gt; 
    &lt;li&gt;Under &lt;strong&gt;Catalogs&lt;/strong&gt;, search for and select &lt;strong&gt;&amp;lt;account-id&amp;gt;:s3tablescatalog/aws-sagemaker-catalog&lt;/strong&gt;.&lt;/li&gt; 
   &lt;li&gt;Under &lt;strong&gt;Databases&lt;/strong&gt;, select &lt;strong&gt;asset_metadata&lt;/strong&gt; database. 
    &lt;div id="attachment_90414" style="width: 1439px" class="wp-caption alignnone"&gt;
     &lt;img aria-describedby="caption-attachment-90414" loading="lazy" class="size-full wp-image-90414" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-3.jpeg" alt="AWS Lake Formation Grant Permissions page showing Named Data Catalog resources method with s3tablescatalog/aws-sagemaker-catalog selected, asset_metadata database, and asset table configured" width="1429" height="1073"&gt;
     &lt;p id="caption-attachment-90414" class="wp-caption-text"&gt;Figure 3 – AWS Lake Formation catalog, database, and table&lt;/p&gt;
    &lt;/div&gt; &lt;p&gt;&lt;/p&gt;
    &lt;div id="attachment_90413" style="width: 1438px" class="wp-caption alignnone"&gt;
     &lt;img aria-describedby="caption-attachment-90413" loading="lazy" class="size-full wp-image-90413" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-4.jpeg" alt="AWS Lake Formation Grant Permissions interface showing table permissions with Select and Describe checked, grantable permissions section, and All data access radio button selected" width="1428" height="1247"&gt;
     &lt;p id="caption-attachment-90413" class="wp-caption-text"&gt;Figure 4 – AWS Lake Formation grant permission&lt;/p&gt;
    &lt;/div&gt;&lt;/li&gt; 
   &lt;li&gt;For &lt;strong&gt;Table&lt;/strong&gt;, select &lt;strong&gt;asset&lt;/strong&gt;.&lt;/li&gt; 
   &lt;li&gt;Under &lt;strong&gt;Table permissions&lt;/strong&gt;, check &lt;strong&gt;Select&lt;/strong&gt; and &lt;strong&gt;Describe.&lt;/strong&gt;&lt;/li&gt; 
   &lt;li&gt;Choose &lt;strong&gt;Grant&lt;/strong&gt; to save the permissions.&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
&lt;/ol&gt; 
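&lt;p&gt;If you prefer to script the grant instead of using the console, the following AWS CLI sketch shows an equivalent &lt;code&gt;lakeformation grant-permissions&lt;/code&gt; call. The placeholder values and the catalog ID format are assumptions based on the resources shown above; verify them for your account before running the command.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Sketch: grant SELECT and DESCRIBE on the exported asset table to the
# SageMaker execution role (adjust account ID, Region, and role name as needed)
aws lakeformation grant-permissions \
  --region &amp;lt;region&amp;gt; \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::&amp;lt;account-id&amp;gt;:role/AmazonSageMakerAdminIAMExecutionRole \
  --permissions "SELECT" "DESCRIBE" \
  --resource '{"Table": {"CatalogId": "&amp;lt;account-id&amp;gt;:s3tablescatalog/aws-sagemaker-catalog", "DatabaseName": "asset_metadata", "Name": "asset"}}'&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 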
&lt;h3&gt;Enable data export using the AWS CLI&lt;/h3&gt; 
&lt;p&gt;Configure metadata export using the &lt;code&gt;PutDataExportConfiguration&lt;/code&gt; API. The &lt;a href="https://aws.amazon.com/datazone/" target="_blank" rel="noopener noreferrer"&gt;Amazon DataZone&lt;/a&gt; service automatically creates an S3 table bucket named &lt;code&gt;aws-sagemaker-catalog&lt;/code&gt; with an &lt;code&gt;asset_metadata&lt;/code&gt; namespace, and schedules a daily export job.&amp;nbsp;Asset metadata is exported once daily around midnight local time per AWS Region.&lt;/p&gt; 
&lt;p&gt;The SageMaker domain identifier is available on the domain detail page in the &lt;a href="https://aws.amazon.com/console/" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt;. Accessing the asset table through the S3 Tables console or the Data tab in SageMaker Unified Studio can take up to 24 hours.&lt;/p&gt; 
&lt;p&gt;Use the following AWS CLI command to enable SageMaker Catalog export:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws datazone put-data-export-configuration --domain-identifier &amp;lt;domain-id&amp;gt; --region &amp;lt;region&amp;gt; --enable-export&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Use this AWS CLI command to validate the configuration is enabled:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-css"&gt;aws datazone get-data-export-configuration --domain-identifier &amp;lt;domain-id&amp;gt;&amp;nbsp;--region &amp;lt;region&amp;gt;
{
&amp;nbsp;&amp;nbsp; &amp;nbsp;"isExportEnabled": true,
&amp;nbsp;&amp;nbsp; &amp;nbsp;"status": "COMPLETED",
&amp;nbsp;&amp;nbsp; &amp;nbsp;"s3TableBucketArn": "arn:aws:s3tables:&amp;lt;region&amp;gt;:&amp;lt;account-id&amp;gt;:bucket/aws-sagemaker-catalog",
&amp;nbsp;&amp;nbsp; &amp;nbsp;"createdAt": "2025-11-26T18:24:02.150000+00:00",
&amp;nbsp;&amp;nbsp; &amp;nbsp;"updatedAt": "2026-02-23T19:33:40.987000+00:00"
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Access the exported asset table&lt;/h3&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to Amazon SageMaker &lt;strong&gt;Domains&lt;/strong&gt; in the AWS Management Console.&lt;/li&gt; 
 &lt;li&gt;Select your domain and select &lt;strong&gt;Open&lt;/strong&gt;. &lt;p&gt;&lt;/p&gt;
  &lt;div id="attachment_90412" style="width: 1440px" class="wp-caption alignnone"&gt;
   &lt;img aria-describedby="caption-attachment-90412" loading="lazy" class="size-full wp-image-90412" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-5.jpeg" alt="Amazon SageMaker Domains management page showing an Identity Center based domain with Available status, created February 26, 2026, with Open unified studio button highlighted" width="1430" height="313"&gt;
   &lt;p id="caption-attachment-90412" class="wp-caption-text"&gt;Figure 5 – Open Amazon SageMaker Unified Studio&lt;/p&gt;
  &lt;/div&gt;&lt;/li&gt; 
 &lt;li&gt;In SageMaker Unified Studio, choose a project from the &lt;strong&gt;Select a project&lt;/strong&gt; dropdown list.&lt;/li&gt; 
 &lt;li&gt;To query SageMaker catalog data, select &lt;strong&gt;Build&lt;/strong&gt; in the menu bar and then choose &lt;strong&gt;Query Editor&lt;/strong&gt;. To create a new project, follow the instructions in the &lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/getting-started-create-a-project.html" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker Unified Studio User Guide&lt;/a&gt;. &lt;p&gt;&lt;/p&gt;
  &lt;div id="attachment_90411" style="width: 1439px" class="wp-caption alignnone"&gt;
   &lt;img aria-describedby="caption-attachment-90411" loading="lazy" class="size-full wp-image-90411" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-6.jpeg" alt="SageMaker Unified Studio project overview dashboard showing IDE and Applications, Data Analysis and Integration with Query Editor highlighted, Orchestration, and Machine Learning and Generative AI categories" width="1429" height="620"&gt;
   &lt;p id="caption-attachment-90411" class="wp-caption-text"&gt;Figure 6 – Open SageMaker Unified Studio Query Editor&lt;/p&gt;
  &lt;/div&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;The&amp;nbsp;&lt;code&gt;asset_metadata.asset&lt;/code&gt;&amp;nbsp;table is available in Data explorer. Use &lt;strong&gt;Data explorer&lt;/strong&gt; to view the schema and query the data for analytics.&lt;/p&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;Expand &lt;strong&gt;Catalogs&lt;/strong&gt; in Data explorer. Then, select and expand &lt;strong&gt;s3tablescatalog&lt;/strong&gt;, &lt;strong&gt;aws-sagemaker-catalog&lt;/strong&gt;, &lt;strong&gt;asset_metadata&lt;/strong&gt;, and &lt;strong&gt;asset&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Test querying the catalog with &lt;code&gt;SELECT * FROM asset_metadata.asset LIMIT 10;&lt;/code&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div id="attachment_90410" style="width: 1439px" class="wp-caption alignleft"&gt;
 &lt;img aria-describedby="caption-attachment-90410" loading="lazy" class="wp-image-90410 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-7.jpeg" alt="SageMaker Unified Studio Query Editor with Data Explorer showing Lakehouse hierarchy including s3tablescatalog, aws-sagemaker-catalog, asset_metadata database, and asset table schema with SQL SELECT query" width="1429" height="731"&gt;
 &lt;p id="caption-attachment-90410" class="wp-caption-text"&gt;Figure 7 – Query SageMaker catalog&lt;/p&gt;
&lt;/div&gt; 
&lt;h2&gt;Queries for observability and analytics&lt;/h2&gt; 
&lt;p&gt;With setup complete, you can run queries to gain insights into catalog usage and changes. To monitor asset growth and see how the data catalog has grown over the last five days, use the following query:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT 
 &amp;nbsp;&amp;nbsp; DATE (snapshot_time) as date,
 &amp;nbsp;&amp;nbsp; COUNT (*) as total_assets
FROM asset_metadata.asset
WHERE 
 &amp;nbsp;&amp;nbsp; &amp;nbsp;DATE (snapshot_time) &amp;gt;= CURRENT_DATE - INTERVAL '5' DAY
GROUP BY DATE (snapshot_time)
ORDER BY date DESC;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90409" style="width: 1439px" class="wp-caption alignleft"&gt;
 &lt;img aria-describedby="caption-attachment-90409" loading="lazy" class="wp-image-90409 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-8.jpeg" alt="SageMaker Unified Studio Query Editor showing SQL aggregation query on asset_metadata.asset table with results displaying date and total_assets columns, returning 42 assets for March 7-8, 2026&amp;quot;" width="1429" height="730"&gt;
 &lt;p id="caption-attachment-90409" class="wp-caption-text"&gt;Figure 8 – Query asset growth&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;You can also use the catalog to track metadata changes and determine which assets gained descriptions or ownership over time. The following query identifies assets that gained business descriptions over the past five days by comparing today’s snapshot with the snapshot from five days earlier.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT
 &amp;nbsp;&amp;nbsp; t.asset_id,
 &amp;nbsp;&amp;nbsp; t.resource_name,
 &amp;nbsp;&amp;nbsp; p.business_description as description_before,
 &amp;nbsp;&amp;nbsp; t.business_description as description_now
FROM asset_metadata.asset t
JOIN asset_metadata.asset p ON t.asset_id = p.asset_id
WHERE DATE(t.snapshot_time) = CURRENT_DATE
 &amp;nbsp;&amp;nbsp; AND DATE(p.snapshot_time) = CURRENT_DATE - INTERVAL '5' DAY
 &amp;nbsp;&amp;nbsp; AND p.business_description IS NULL
 &amp;nbsp;&amp;nbsp; AND t.business_description IS NOT NULL;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;To investigate asset values at a specific point in time, use the following query to retrieve metadata from any snapshot date.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT
 &amp;nbsp;&amp;nbsp; &amp;nbsp;asset_id,
 &amp;nbsp;&amp;nbsp; &amp;nbsp;resource_name,
 &amp;nbsp;&amp;nbsp; &amp;nbsp;business_description,
 &amp;nbsp;&amp;nbsp; &amp;nbsp;extended_metadata['owningEntityId'] as owner,
 &amp;nbsp;&amp;nbsp; &amp;nbsp;snapshot_time
FROM asset_metadata.asset
WHERE asset_id = 'your-asset-id'
 &amp;nbsp;&amp;nbsp; &amp;nbsp;AND DATE(snapshot_time) = DATE('2025-11-26');&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
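&lt;p&gt;The catalog also helps you audit metadata completeness. The following query is a minimal sketch, using only the columns documented earlier, that counts the assets in today’s snapshot that still lack a business description, grouped by resource type:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;-- Count assets in the latest snapshot that have no business description yet
SELECT
    resource_type_enum,
    COUNT(*) as assets_missing_description
FROM asset_metadata.asset
WHERE DATE(snapshot_time) = CURRENT_DATE
    AND business_description IS NULL
GROUP BY resource_type_enum
ORDER BY assets_missing_description DESC;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 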
&lt;h2&gt;Clean up resources&lt;/h2&gt; 
&lt;p&gt;To avoid ongoing charges, clean up the resources created in this walkthrough:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Disable metadata export:&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Disable the daily metadata export to stop new snapshots:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;aws datazone put-data-export-configuration \
  --domain-identifier &amp;lt;domain-id. \
  --no-enable-export \
  --region &amp;lt;region&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;&lt;strong&gt;Delete S3 Tables resources:&lt;/strong&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Optionally, delete the S3 Tables namespace containing the exported metadata to remove historical snapshots and stop storage charges. For instructions on how to delete S3 tables, see &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-delete.html" target="_blank" rel="noopener noreferrer"&gt;Deleting an Amazon S3 table&lt;/a&gt; in the Amazon Simple Storage Service User Guide.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, you enabled the metadata export feature of SageMaker Catalog and used SQL queries to gain visibility into your asset inventory. The feature converts asset metadata into Apache Iceberg tables partitioned by snapshot date, so you can perform time-travel queries, monitor catalog growth, track metadata completeness, and audit historical asset states. This provides a repeatable, low-overhead way to maintain catalog health and meet governance requirements over time.&lt;/p&gt; 
&lt;p&gt;To learn more about Amazon SageMaker Catalog, see the&amp;nbsp;&lt;a href="https://aws.amazon.com/sagemaker/catalog/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker Catalog documentation&lt;/a&gt;. To explore Apache Iceberg table formats and time-travel queries, see the&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables.html" target="_blank" rel="noopener noreferrer"&gt;Amazon S3 Tables documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;hr&gt; 
&lt;h3&gt;About the Authors&lt;/h3&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-full wp-image-90408" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-9.png" alt="Photo of Author Ramesh Singh" width="100" height="134"&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="http://www.linkedin.com/in/ramesh-harisaran-singh" target="_blank" rel="noopener noreferrer"&gt;Ramesh&lt;/a&gt;&amp;nbsp;is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon SageMaker team. He is passionate about building high-performance ML/AI and analytics products that help enterprise customers achieve their critical goals using cutting-edge technology.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-full wp-image-90407" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-10.png" alt="Photo of Author Pradeep Misra" width="100" height="130"&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/pradeep-m-326258a/" target="_blank" rel="noopener noreferrer"&gt;Pradeep&lt;/a&gt;&amp;nbsp;is a Principal Analytics and Applied AI Solutions Architect at AWS. He is passionate about solving customer challenges using data, analytics, and Applied AI. Outside of work, he likes exploring new places and playing badminton with his family. He also likes doing science experiments, building LEGOs, and watching anime with his daughters.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-full wp-image-90406" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-11.png" alt="Photo of Author - Rohith Kayathi" width="190" height="203"&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/rohith-kayathi/" target="_blank" rel="noopener noreferrer"&gt;Rohith&lt;/a&gt; is a Senior Software Engineer at Amazon Web Services (AWS) working with Amazon SageMaker team. He leads business data catalog, generative AI–powered metadata curation, and lineage solutions. He is passionate about building large-scale distributed systems, solving complex problems, and setting the bar for engineering excellence for his team.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-full wp-image-90405" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/20/BDB-5843-image-12.jpeg" alt="Photo of AUthor - Steve Phillips" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/stevephillipsca" target="_blank" rel="noopener noreferrer"&gt;Steve&lt;/a&gt; is a Principal Technical Account Manager and Analytics specialist at AWS in the North America region. Steve currently focuses on data warehouse architectural design, data lakes, data ingestion pipelines, and cloud distributed architectures.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Configure a custom domain name for your Amazon MSK cluster enabled with IAM authentication</title>
		<link>https://aws.amazon.com/blogs/big-data/configure-a-custom-domain-name-for-your-amazon-msk-cluster-enabled-with-iam-authentication/</link>
					
		
		<dc:creator><![CDATA[Mazrim Mehrtens]]></dc:creator>
		<pubDate>Tue, 21 Apr 2026 16:33:29 +0000</pubDate>
				<category><![CDATA[Amazon Managed Streaming for Apache Kafka (Amazon MSK)]]></category>
		<category><![CDATA[Expert (400)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">b127855f1f99dc9a8ba267ae5be51e65c2868798</guid>

					<description>In the first part of Configure a custom domain name for your Amazon MSK cluster, we discussed why custom domain names are important and provided details on how to configure a custom domain name in Amazon MSK when using SASL_SCRAM authentication. In this post, we discuss how to configure a custom domain name in Amazon MSK when using IAM authentication.</description>
										<content:encoded>&lt;p&gt;Most &lt;a href="https://aws.amazon.com/msk/" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Streaming for Apache Kafka&lt;/a&gt; (Amazon MSK) customers are simplifying and standardizing access control to Kafka resources using&amp;nbsp;&lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management&lt;/a&gt;&amp;nbsp;(IAM) authentication. This adoption is also accelerated as &lt;a href="https://aws.amazon.com/blogs/big-data/amazon-msk-iam-authentication-now-supports-all-programming-languages/" target="_blank" rel="noopener noreferrer"&gt;Amazon MSK now supports IAM authentication in popular languages&lt;/a&gt; including&amp;nbsp;&lt;a href="https://github.com/aws/aws-msk-iam-auth" target="_blank" rel="noopener noreferrer"&gt;Java&lt;/a&gt;, &lt;a href="https://github.com/aws/aws-msk-iam-sasl-signer-python" target="_blank" rel="noopener noreferrer"&gt;Python&lt;/a&gt;, &lt;a href="https://github.com/aws/aws-msk-iam-sasl-signer-go" target="_blank" rel="noopener noreferrer"&gt;Go&lt;/a&gt;, &lt;a href="https://github.com/aws/aws-msk-iam-sasl-signer-js" target="_blank" rel="noopener noreferrer"&gt;JavaScript&lt;/a&gt;, and &lt;a href="https://github.com/aws/aws-msk-iam-sasl-signer-net" target="_blank" rel="noopener noreferrer"&gt;.NET&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;In the first part of &lt;a href="https://aws.amazon.com/blogs/big-data/configure-a-custom-domain-name-for-your-amazon-msk-cluster/" target="_blank" rel="noopener noreferrer"&gt;Configure a custom domain name for your Amazon MSK cluster&lt;/a&gt;, we discussed why custom domain names are important and provided details on how to configure a custom domain name in Amazon MSK when using SASL_SCRAM authentication. In this post, we discuss how to configure a custom domain name in Amazon MSK when using IAM authentication. We recommend you read the first part of this blog because it captures the solution details and implementation steps.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;IAM authentication for Amazon MSK uses TLS to encrypt the Kafka protocol traffic between the client and Kafka broker. To use a custom domain name, the Kafka broker needs to present a server certificate that matches the custom domain name. To achieve this, this solution uses a&amp;nbsp;&lt;a href="https://aws.amazon.com/elasticloadbalancing/network-load-balancer/" target="_blank" rel="noopener noreferrer"&gt;Network Load Balancer (NLB)&lt;/a&gt; with AWS Certificate Manager to provide a custom certificate on behalf of the MSK brokers, and a Route 53 private hosted zone to provide DNS for the custom domain name.&lt;/p&gt; 
&lt;p&gt;The following diagram shows all components used by the solution.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="size-full wp-image-89068 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/image-1-2.jpg" alt="Architecture showing configuration of custom domain name with Amazon MSK" width="571" height="536"&gt;&lt;/p&gt; 
&lt;h3&gt;Certificate management&lt;/h3&gt; 
&lt;p&gt;For clients to perform TLS communication with the MSK cluster the cluster needs to provide a certificate with hostnames matching the custom domain name. This solution uses a certificate in&amp;nbsp;&lt;a href="https://aws.amazon.com/certificate-manager/" target="_blank" rel="noopener noreferrer"&gt;AWS Certificate Manager&lt;/a&gt; (ACM) signed with a Private Certificate Authority (PCA) for TLS with the custom domain name. This solution uses a&amp;nbsp;certificate with&amp;nbsp;&lt;code&gt;bootstrap.example.com&lt;/code&gt; as the Common Name (CN) so that the certificate is valid for the bootstrap address, and Subject Alternative Names (SANs) are set for all broker DNS names (such as&amp;nbsp;&lt;code&gt;b-1.example.com&lt;/code&gt;). Since this solution uses a private certificate authority, the CA chain must be imported into the client trust stores.&lt;/p&gt; 
&lt;p&gt;This solution works with any server certificate, whether certificates are signed by a public or private Certificate Authority (CA). You can import existing certificates into ACM to be used with this solution. Certificates must provide a common name and/or subject alternative names that match the bootstrap DNS address as well as the individual broker DNS addresses. If the certificate is issued by a private CA, clients need to import the root and intermediate CA certificates to the client trust store. If the certificate is issued by a public CA, the root and intermediate CA certificates will be in the default trust store.&lt;/p&gt; 
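&lt;p&gt;As an illustration, the following AWS CLI sketch requests a private certificate from ACM signed by a Private CA, with the bootstrap name as the Common Name and the broker names as SANs. The domain names and PCA ARN are placeholders, and the CloudFormation template used later in this post already creates a certificate covering these names, so a command like this is only needed if you build the solution manually.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Sketch: request a private certificate covering the bootstrap name (CN)
# and each broker name (SANs); replace the domain names and PCA ARN
aws acm request-certificate \
  --domain-name bootstrap.example.com \
  --subject-alternative-names b-1.example.com b-2.example.com b-3.example.com \
  --certificate-authority-arn &amp;lt;PCA_ARN&amp;gt; \
  --region &amp;lt;region&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 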
&lt;h3&gt;Network Load Balancer&lt;/h3&gt; 
&lt;p&gt;The NLB provides the ability to use a &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/network/create-tls-listener.html" target="_blank" rel="noopener noreferrer"&gt;TLS listener&lt;/a&gt;. The ACM certificate is associated with the listeners and enables TLS negotiation between the client and the NLB. The NLB performs a separate TLS negotiation between itself and the MSK brokers. In addition to the above architecture, this solution also allows using AWS PrivateLink to connect the cluster to external VPCs. This allows secure access to MSK between VPCs while using a custom domain name.&lt;/p&gt; 
&lt;p&gt;The following diagram illustrates the NLB port and target configuration. A TLS listener with port 9000 is used for bootstrap connections with all MSK brokers set as targets. IAM authentication is configured to run on port 9098 of the MSK brokers using a TLS target type. A dedicated TLS listener port represents each broker in the MSK cluster. In this post, there are three brokers in the MSK cluster, with port 9001 representing broker 1 through port 9003 representing broker 3.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="size-full wp-image-89067 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/image-2-2.jpg" alt="Target Group mapping in NLB" width="1001" height="748"&gt;&lt;/p&gt; 
&lt;h3&gt;Domain Name System (DNS)&lt;/h3&gt; 
&lt;p&gt;For the client to resolve DNS queries for the custom domain, we use an &lt;a href="https://aws.amazon.com/route53/" target="_blank" rel="noopener noreferrer"&gt;Amazon Route 53&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/hosted-zones-private.html" target="_blank" rel="noopener noreferrer"&gt;private hosted zone&lt;/a&gt; to host the DNS records, and associate it with the client’s VPC to enable DNS resolution from the &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver.html" target="_blank" rel="noopener noreferrer"&gt;Route 53 VPC resolver&lt;/a&gt;. This solution uses a private MSK cluster and private DNS. For publicly accessible MSK clusters a public NLB and DNS provider such as a &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/AboutHZWorkingWith.html"&gt;Route53 public hosted zone&lt;/a&gt; can be used.&lt;/p&gt; 
&lt;h3&gt;Amazon MSK&lt;/h3&gt; 
&lt;p&gt;Finally, each broker needs to have its advertised listeners configuration (&lt;code&gt;advertised.listeners&lt;/code&gt;) updated to match the custom domain name and NLB ports.&amp;nbsp;Advertised listeners is a broker configuration option that tells Kafka clients how to connect to the brokers. By default, an advertised listener is not set. Once set, Kafka clients use the advertised listener instead of&amp;nbsp;&lt;code&gt;listeners&lt;/code&gt; to obtain the connection information for brokers.&amp;nbsp;MSK brokers use the listener configuration to tell clients the DNS names and ports to use to connect to the individual brokers for each authentication type enabled. Advertised listeners are unique to each broker, and the cluster won’t start if multiple brokers have the same advertised listener address. For this reason, this solution uses a unique custom DNS name for each broker&amp;nbsp;(such as&amp;nbsp;&lt;code&gt;b-1.example.com&lt;/code&gt;).&lt;/p&gt; 
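&lt;p&gt;To check what a broker currently advertises, you can describe its configuration with the standard Kafka CLI. The following is a sketch; the bootstrap address, broker ID, and client properties file are placeholders for your environment:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Sketch: show broker 1's current configuration, including advertised.listeners
bin/kafka-configs.sh --describe \
  --bootstrap-server &amp;lt;bootstrap-address&amp;gt;:9098 \
  --entity-type brokers \
  --entity-name 1 \
  --all \
  --command-config client-iam.properties&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 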
&lt;h2&gt;Solution Deployment&lt;/h2&gt; 
&lt;p&gt;To deploy the solution, use the CloudFormation template from the &lt;a href="https://github.com/aws-samples/sample-msk-custom-domain-name-iam-auth" target="_blank" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; repository.&lt;/p&gt; 
&lt;p&gt;This template deploys a VPC, NLB, PCA, ACM certificate, MSK cluster, and an Amazon EC2 instance for cluster connectivity. The EC2 instance includes a script to handle updating the broker &lt;code&gt;advertised.listeners&lt;/code&gt; settings to match the custom domain name. For more information on deploying a CloudFormation template, refer to&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html"&gt;Create a stack from the CloudFormation console&lt;/a&gt;.&lt;/p&gt; 
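&lt;p&gt;If you prefer the AWS CLI over the console, deploying the stack might look like the following sketch. The stack name and template path are placeholders, and the capability flags are an assumption based on the IAM resources the template creates:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Sketch: deploy the solution stack from a local copy of the template in the GitHub repo
aws cloudformation create-stack \
  --stack-name msk-custom-domain-iam \
  --template-body file://&amp;lt;path-to-template&amp;gt; \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
  --region &amp;lt;region&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 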
&lt;p&gt;After deploying the CloudFormation template, run the script to update advertised listeners as follows:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Retrieve the &lt;strong&gt;MSKClusterARN&lt;/strong&gt; and &lt;strong&gt;CertificateAuthorityARN&lt;/strong&gt; from the CloudFormation outputs for your stack as they will be used in subsequent steps.&lt;br&gt; &lt;img loading="lazy" class="size-full wp-image-89066 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/image-3-4.png" alt="" width="2324" height="1052"&gt;&lt;/li&gt; 
 &lt;li&gt;Navigate to the EC2 console and identify the KafkaClientInstance. Choose &lt;strong&gt;Connect&lt;/strong&gt; to connect to the instance using &lt;a href="https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager.html" target="_blank" rel="noopener noreferrer"&gt;AWS Systems Manager Session Manager&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Session Manager starts a session in shell. Start a bash session with the command: 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-shell"&gt;bash -l&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89913" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/msk-iam-domain-image-4.jpg" alt="" width="656" height="149"&gt;&lt;/p&gt;&lt;/li&gt; 
 &lt;li&gt;The Kafka client SDKs have already been installed in the EC2 instance. You can update the &lt;code&gt;advertised.listeners&lt;/code&gt; configuration as follows, replacing &lt;strong&gt;CLUSTER_ARN&lt;/strong&gt; with the ARN of your MSK cluster retrieved from CloudFormation in step 1: 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-shell"&gt;./update_advertised_listeners.sh --region us-east-1 --cluster-arn CLUSTER_ARN&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;p&gt;Note that once this script completes, the brokers will have new advertised listeners configurations. Connections using the standard IAM address for the MSK service will not work until we complete the next steps, as the brokers will redirect connections over this address back to the custom domain name and TLS will fail.&lt;/p&gt;&lt;/li&gt; 
 &lt;li&gt;Next, we need to create a truststore with the certificate for our AWS Private Certificate Authority (PCA) to allow TLS with the NLB. In the following command, replace &lt;strong&gt;PCA_ARN&lt;/strong&gt; with the ARN of the PCA retrieved from CloudFormation in step 1:&lt;br&gt; &lt;img loading="lazy" class="size-full wp-image-89064 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/image-5-3.png" alt="" width="2324" height="836"&gt;We’re using the default Java truststore, which uses the password &lt;code&gt;changeit&lt;/code&gt;. When asked “Trust this certificate?”, enter “yes”.&lt;p&gt;&lt;/p&gt; 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-shell"&gt;export&amp;nbsp;PCA_ARN=&amp;lt;&amp;lt;PCA_ARN&amp;gt;&amp;gt;
export&amp;nbsp;REGION=&amp;lt;&amp;lt;REGION&amp;gt;&amp;gt;

cp /etc/pki/java/cacerts . &amp;amp;&amp;amp; chmod 600 cacerts
aws acm-pca get-certificate-authority-certificate --certificate-authority-arn $PCA_ARN --region $REGION&amp;nbsp;| jq -r '.Certificate' &amp;gt; pca.pem
keytool -import -file pca.pem -alias AWSPCA -keystore&amp;nbsp;cacerts&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;Create a new properties file to allow IAM authentication with our custom truststore: 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-shell"&gt;cat &amp;lt;&amp;lt;EOF &amp;gt;&amp;gt; /home/ssm-user/client-iam.properties
ssl.truststore.location=/home/ssm-user/cacerts
ssl.truststore.password=changeit
EOF&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;Verify that you can connect to the cluster with IAM authentication over the new custom domain name, replacing bootstrap.example.com with your own custom domain name if you used a different one in CloudFormation: 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-code"&gt;bin/kafka-topics.sh --list --command-config client-iam.properties --bootstrap-server bootstrap.example.com:9000&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-89544 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/26/bdb5167i6.jpg" alt="" width="2560" height="360"&gt;&lt;/p&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Cleanup&lt;/h2&gt; 
&lt;p&gt;To stop incurring costs, navigate to the AWS CloudFormation console and delete the stack to remove all resources provisioned for this solution.&lt;/p&gt; 
&lt;h2&gt;Frequently Asked Questions about Custom Domain Names&lt;/h2&gt; 
&lt;p&gt;Customers have asked a few questions about implementing custom domain names with MSK. You can find answers to some of the most popular questions here.&lt;/p&gt; 
&lt;h3&gt;Are there any limitations for this solution on MSK?&lt;/h3&gt; 
&lt;p&gt;The &lt;code&gt;advertised.listeners&lt;/code&gt; setting was removed as a dynamic broker configuration in KRaft-based Kafka clusters. Therefore, this solution is only supported on ZooKeeper-based MSK clusters. Additionally, this solution is only applicable to MSK clusters that use SASL/SCRAM or IAM authentication.&lt;/p&gt; 
&lt;h3&gt;How does the custom domain name solution scale when we add new brokers?&lt;/h3&gt; 
&lt;p&gt;When using the NLB for broker connectivity (&lt;a href="https://aws.amazon.com/blogs/big-data/configure-a-custom-domain-name-for-your-amazon-msk-cluster/#:~:text=Option%202%3A%20All%20connections%20through%20an%20NLB" target="_blank" rel="noopener noreferrer"&gt;option 2 in the configure a custom domain name for your Amazon MSK cluster blog post&lt;/a&gt;), you will need to add an additional listener for each additional broker created.&lt;/p&gt; 
&lt;p&gt;For TLS, if you use Subject Alternative Names (SANs) to list individual broker DNS hostnames, you will need to create a new certificate that includes the names of the additional brokers. One option is to create a certificate with SANs for more brokers than needed to allow for growth. If a wildcard certificate is used, you do not need to modify certificates when adding brokers.&lt;/p&gt; 
&lt;h3&gt;What changes are required when we remove brokers?&lt;/h3&gt; 
&lt;p&gt;Amazon MSK supports scale-in by removing brokers from the cluster. Brokers are removed from each Availability Zone (AZ), so a six-broker Amazon MSK cluster deployed across 3 AZs can be reduced to a three-broker cluster deployed across 3 AZs. When brokers are removed, you can remove the NLB listeners for the removed brokers along with the Route 53 DNS endpoints. However, you can also leave them as is, or just remove the target IPs from the target groups for those broker numbers. The NLB will mark the targets as unhealthy and stop directing traffic to them. If you ever plan to scale out the number of brokers again, you can reuse the existing NLB listeners and Route 53 DNS entries and would only need to update the target IPs in the corresponding target groups.&lt;/p&gt; 
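&lt;p&gt;For example, removing a decommissioned broker’s IP from its target group can be scripted. The following is a sketch with placeholder values rather than part of the original walkthrough:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Sketch: remove a decommissioned broker's IP from its NLB target group
aws elbv2 deregister-targets \
  --target-group-arn &amp;lt;broker-target-group-arn&amp;gt; \
  --targets Id=&amp;lt;broker-ip-address&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 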
&lt;h3&gt;Is there any change in configuration required if there is any broker failure?&lt;/h3&gt; 
&lt;p&gt;No. When a broker fails, Amazon MSK replaces the failed broker with a new broker instance keeping the configuration of the broker exactly the same. So, there would be no change in the advertised listener of the broker. Once the broker is healthy, the broker can accept new connections and read/write traffic.&lt;/p&gt; 
&lt;h3&gt;Can you use Amazon MSK Replicator between MSK clusters in multiple AWS Regions when using the custom domain name solution?&lt;/h3&gt; 
&lt;p&gt;The &lt;a href="https://aws.amazon.com/msk/features/msk-replicator/" target="_blank" rel="noopener noreferrer"&gt;Amazon MSK Replicator&lt;/a&gt; can be used when using the custom domain name solution, either in an active-passive or active-active setup. The same process can be followed to set the custom domain name.&lt;/p&gt; 
&lt;p&gt;You can then follow the &lt;a href="https://aws.amazon.com/blogs/big-data/build-multi-region-resilient-apache-kafka-applications-with-identical-topic-names-using-amazon-msk-and-amazon-msk-replicator/" target="_blank" rel="noopener noreferrer"&gt;Build multi-Region resilient Apache Kafka applications with identical topic names using Amazon MSK and Amazon MSK Replicator&lt;/a&gt; post to configure MSK Replicator.&lt;/p&gt; 
&lt;p&gt;The following diagram shows an active-active AWS multi-Region MSK setup using the custom domain name solution:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="size-full wp-image-89062 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/image-7-5.png" alt="" width="1430" height="823"&gt;&lt;/p&gt; 
&lt;h3&gt;Can I use a global bootstrap DNS name to connect to Amazon MSK clusters deployed across multiple AWS Regions when IAM authentication is enabled?&lt;/h3&gt; 
&lt;p&gt;No, it is not possible to use a global bootstrap reference to represent MSK clusters deployed in multiple AWS Regions, unless the client is aware of the cluster’s Region when connecting. To use IAM authentication, the correct AWS Region must be included in the IAM authentication request for a given cluster. This is because the AWS Region is part of the SigV4 authentication protocol used by IAM. This scoping prevents an IAM authorization from being used to talk to a resource in another AWS Region. You can provide the AWS Region in one of two ways: with Region-specific bootstrap URLs or by explicitly configuring the Region.&lt;/p&gt; 
&lt;p&gt;For example, if the bootstrap string is &lt;a href="http://bootstrap.us-east-1.example.com/" target="_blank" rel="noopener noreferrer"&gt;bootstrap.us-east-1.example.com&lt;/a&gt;, then the &lt;a href="https://github.com/aws/aws-msk-iam-auth" target="_blank" rel="noopener noreferrer"&gt;msk-iam-auth&lt;/a&gt; library will extract the AWS Region from the broker connection string and use us-east-1 in its IAM requests. If the bootstrap string is simply &lt;a href="http://bootstrap.example.com/" target="_blank" rel="noopener noreferrer"&gt;bootstrap.example.com&lt;/a&gt;, then the client must explicitly configure AWS_REGION=us-east-1 to connect to the cluster if it is in us-east-1, or us-west-2 if it is in us-west-2.&lt;/p&gt; 
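&lt;p&gt;For example, with a Region-agnostic bootstrap name you might export the Region explicitly before starting the client. This is a sketch; the properties file and port follow the walkthrough earlier in this post, and the Region value should match where your cluster runs:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Sketch: provide the cluster's Region explicitly when the bootstrap name does not contain it
export AWS_REGION=us-east-1
bin/kafka-topics.sh --list \
  --command-config client-iam.properties \
  --bootstrap-server bootstrap.example.com:9000&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 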
&lt;p&gt;Note that this is a limitation for IAM authentication, but not for SASL/SCRAM authentication. With SASL/SCRAM authentication, if the client’s credentials are applied to both clusters the global endpoint can point to either cluster and the client will be able to connect. The AWS Region is not used in SASL/SCRAM authentication, so it does not restrict the authentication scope.&lt;/p&gt; 
&lt;h3&gt;How to allow public access to a private MSK cluster using the custom domain name solution?&lt;/h3&gt; 
&lt;p&gt;To provide public access to an MSK cluster using the custom domain solution, you will need to do the following:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Create an Internet-facing NLB, and associate public subnets (subnets that have a route to the Internet Gateway attached to the VPC).&lt;/li&gt; 
 &lt;li&gt;Create ingress rules in both the NLB and MSK security groups permitting the required public addresses. Note: the port will be 9098 for the MSK security group, and the NLB listener ports for the NLB security group.&lt;/li&gt; 
 &lt;li&gt;Provide public DNS resolution for the Kafka clients, by using a Route 53 public zone, or an alternative public DNS resolver.&lt;/li&gt; 
 &lt;li&gt;The client needs to have IAM credentials, with permission to talk to the MSK brokers, using an &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html" target="_blank" rel="noopener noreferrer"&gt;IAM role&lt;/a&gt;,&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/security-creds-programmatic-access.html" target="_blank" rel="noopener noreferrer"&gt;IAM access keys&lt;/a&gt;, &lt;a href="https://aws.amazon.com/iam/roles-anywhere/" target="_blank" rel="noopener noreferrer"&gt;IAM Roles Anywhere&lt;/a&gt;, or another mechanism that uses the AWS Security Token Service (AWS STS) to create and provide trusted users with temporary security credentials.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;img loading="lazy" class="size-full wp-image-89061 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/image-8.jpg" alt="" width="1478" height="1262"&gt;&lt;/p&gt; 
&lt;h3&gt;In the first part of the blog, two patterns were highlighted. How do you decide which pattern to use and why?&lt;/h3&gt; 
&lt;h3&gt;Option 1: Only bootstrap connection through NLB&lt;/h3&gt; 
&lt;p&gt;If the Kafka clients have direct access to the brokers, you can use a custom domain name for the bootstrap connection while the clients still connect to the MSK brokers with the broker DNS names. This is the simplest option, as it does not require custom TLS certificates or TLS listeners. Note that this option is not necessary when using MSK Express brokers, as MSK Express brokers already manage bootstrapping via a broker-agnostic connection string. For MSK Express, this option does not add value other than configuring a custom domain name for appearances or simplicity of client configuration. For MSK Standard brokers, this can improve client connectivity by making connection strings broker agnostic.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="size-full wp-image-89060 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/image-9.jpg" alt="" width="625" height="107"&gt;&lt;/p&gt; 
&lt;h3&gt;Option 2:&amp;nbsp;All connections through NLB&lt;/h3&gt; 
&lt;p&gt;When Kafka clients don’t have direct access to Amazon MSK brokers, routing all connections through the NLB can be preferred. This can occur when a client is deployed in a different VPC than the Amazon MSK VPC or the client is external, and when Amazon MSK multi-VPC connectivity is not an option. In general, Amazon MSK multi-VPC connectivity is preferred as this is a simpler pattern for most organizations to manage MSK connectivity across accounts and VPCs. When multi-VPC connectivity is not an option, an NLB can be used to provide connectivity with Transit Gateway or PrivateLink, and the solution described in this blog should be used.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="size-full wp-image-89059 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/image-10.jpg" alt="" width="623" height="168"&gt;&lt;/p&gt; 
&lt;p&gt;Here is an example architecture in which the Kafka client and the Amazon MSK cluster are deployed in two separate VPCs but connected via AWS PrivateLink.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="size-full wp-image-89058 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/image-11-2.png" alt="" width="1287" height="611"&gt;&lt;/p&gt; 
&lt;h3&gt;Is Amazon Route 53 required to use a custom domain name with Amazon MSK?&lt;/h3&gt; 
&lt;p&gt;You can use an alternative DNS resolver service; Amazon Route 53 is not required to use a custom domain name with Amazon MSK. The only requirement is that your clients can resolve against your DNS resolver service. The only change required is to use a CNAME for the DNS records, referencing the &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#dns-name" target="_blank" rel="noopener noreferrer"&gt;NLB’s DNS record&lt;/a&gt;, in place of the Alias records, because the Alias record type is only available in Amazon Route 53.&lt;/p&gt; 
&lt;h3&gt;We don’t use AWS Certificate Manager (ACM). Can the NLB integrate with third-party certificate managers?&lt;/h3&gt; 
&lt;p&gt;The NLB only supports ACM to bind a certificate to a TLS listener. You can import a certificate created using your third-party certificate manager into ACM; you do not need to create the certificate using ACM.&lt;/p&gt; 
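&lt;p&gt;For instance, importing an externally issued certificate into ACM might look like the following sketch; the file names are placeholders:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Sketch: import a third-party certificate into ACM so the NLB TLS listener can use it
aws acm import-certificate \
  --certificate fileb://certificate.pem \
  --private-key fileb://private-key.pem \
  --certificate-chain fileb://certificate-chain.pem \
  --region &amp;lt;region&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 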
&lt;h3&gt;Getting “connection to node terminated during authentication” after setting &lt;code&gt;advertised.listeners&lt;/code&gt;, what could be the issue?&lt;/h3&gt; 
&lt;p&gt;As the issue started to occur after changing the&amp;nbsp;&lt;code&gt;advertised.listeners&lt;/code&gt;&amp;nbsp;configuration, the issue is unlikely to be related to permissions. The following can cause this issue:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;The NLB and/or client’s Security Group does not permit access to the listener ports on the NLB from the client.&lt;/li&gt; 
 &lt;li&gt;A firewall appliance between the NLB and client does not permit the client to talk to the NLB using the listener ports.&lt;/li&gt; 
 &lt;li&gt;The&amp;nbsp;&lt;code&gt;advertised.listeners&lt;/code&gt;&amp;nbsp;configuration has an error causing the client to receive invalid details, such as a typo in the name. If this is the case, use a client in the same VPC as the MSK broker that has IAM permissions to talk to the MSK broker and security group rules permitting connectivity, and then use the following command to delete the&amp;nbsp;&lt;code&gt;advertised.listeners&lt;/code&gt;&amp;nbsp;configuration.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;/home/ec2-user/kafka/bin/kafka-configs.sh --alter \
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --bootstrap-server  \
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --entity-type brokers \
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --entity-name  \
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --command-config ~/kafka/config/client_iam.properties \
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --delete-config advertised.listeners&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Replace BROKERS_AMAZON_DNS_NAME with the broker’s Amazon-provided DNS name, such as&amp;nbsp;&lt;code&gt;b-1.clustername.xxxxxx.yy.kafka.region.amazonaws.com:9098&lt;/code&gt;, and BROKER_ID with the numeric ID of the broker whose configuration you want to delete.&lt;/p&gt; 
&lt;h3&gt;Getting “unexpected broker id, expected 2 or empty string, but received 1”, what is causing this error?&lt;/h3&gt; 
&lt;p&gt;This error is typically presented when the&amp;nbsp;&lt;code&gt;advertised.listeners&lt;/code&gt;&amp;nbsp;configuration for one of the brokers has the port used by another broker set. For example, broker 2 has port 9001 set for IAM, but that port is used to connect to broker 1, so a client expecting broker 2 reaches broker 1 and reports that it expected broker ID 2 but received broker ID 1.&lt;/p&gt; 
&lt;p&gt;To correct this, you will need to update the broker with the incorrect&amp;nbsp;&lt;code&gt;advertised.listeners&lt;/code&gt;&amp;nbsp;configuration to use the correct port. To gain access to the broker to make the change, you will need to use the following command to delete the incorrect configuration:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;/home/ec2-user/kafka/bin/kafka-configs.sh --alter \
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --bootstrap-server \
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --entity-type brokers \
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --entity-name  \
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --command-config ~/kafka/config/client_iam.properties \
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --delete-config advertised.listeners&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Replace BROKERS_AMAZON_DNS_NAME with the broker’s Amazon-provided DNS name, such as&amp;nbsp;&lt;code&gt;b-2.clustername.xxxxxx.yy.kafka.region.amazonaws.com:9098&lt;/code&gt;, and BROKER_ID with the ID of the broker that has the incorrect configuration.&lt;/p&gt; 
&lt;p&gt;You then need to use the following command to set the&amp;nbsp;&lt;code&gt;advertised.listeners&lt;/code&gt;&amp;nbsp;configuration for that broker:&lt;/p&gt; 
&lt;p&gt;Note:&amp;nbsp;The&amp;nbsp;&lt;code&gt;advertised.listeners&lt;/code&gt;&amp;nbsp;configuration in the following command assumes only IAM is used for authentication. If you are using additional authentication options, you will need to include them. Replace the placeholder values at the top of the command with your MSK-provided broker domain, the broker ID, and your custom domain.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;MSKDOMAIN=
broker_id=
Domain=

/home/ec2-user/kafka/bin/kafka-configs.sh&amp;nbsp;--alter&amp;nbsp;\
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --bootstrap-server&amp;nbsp;&amp;nbsp;\
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --entity-type&amp;nbsp;brokers&amp;nbsp;\
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --entity-name&amp;nbsp;"$broker_id"&amp;nbsp;\
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --command-config&amp;nbsp;~/kafka/config/client_iam.properties&amp;nbsp;\
&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; --add-config&amp;nbsp;"advertised.listeners=[CLIENT_IAM://b-$broker_id.$Domain:900$broker_id,REPLICATION://b-$broker_id-internal.$MSKDOMAIN:9093,REPLICATION_SECURE://b-$broker_id-internal.$MSKDOMAIN:9095]"&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Summary&lt;/h2&gt; 
&lt;p&gt;In this post, we explained how you can use an NLB, Route 53, and the advertised listener configuration option in Amazon MSK to support custom domain names with MSK clusters when using IAM authentication. You can use this solution to keep your existing Kafka bootstrap DNS name and reduce or remove the need to change client applications because of a migration, recovery process, or to use a DNS name in line with your organization’s naming convention (for example,&amp;nbsp;msk.prod.example.com).&lt;/p&gt; 
&lt;p&gt;Try the solution out for yourself, and leave your questions and feedback in the comments section.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-29797" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2024/04/26/subham.jpg" alt="Subham Rakshit" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Subham Rakshit&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/subhamrakshit/" target="_blank" rel="noopener"&gt;Subham&lt;/a&gt; is a Senior Streaming Solutions Architect for Analytics at AWS based in the UK. He works with customers to design and build streaming architectures so they can get value from analyzing their streaming data. His two little daughters keep him occupied most of the time outside work, and he loves solving jigsaw puzzles with them.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-29797" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2024/06/17/mgtaylor_headshot.jpg" alt="Mark Taylor" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Mark Taylor&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/mark-taylor-5b77a525/" target="_blank" rel="noopener"&gt;Mark&lt;/a&gt; is a Senior Technical Account Manager at AWS, working with enterprise customers to implement best practices, optimize AWS usage, and address business challenges. Mark lives in Folkestone, England, with his wife and two dogs. Outside of work, he enjoys watching and playing football, watching movies, playing board games, and traveling.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-89475" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/24/bdb-5775-mmehrten-headshot.png" alt="" width="100" height="107"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Mazrim Mehrtens&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/mmehrtens/" target="_blank" rel="noopener"&gt;Mazrim&lt;/a&gt; is a Sr. Specialist Solutions Architect for messaging and streaming workloads. Mazrim works with customers to build and support systems that process and analyze terabytes of streaming data in real time, run enterprise Machine Learning pipelines, and create systems to share data across teams seamlessly with varying data toolsets and software stacks.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Migrate third-party and self-managed Apache Kafka clusters to Amazon MSK Express brokers with Amazon MSK Replicator</title>
		<link>https://aws.amazon.com/blogs/big-data/migrate-third-party-and-self-managed-apache-kafka-clusters-to-amazon-msk-express-brokers-with-amazon-msk-replicator/</link>
					
		
		<dc:creator><![CDATA[Ankita Mishra]]></dc:creator>
		<pubDate>Mon, 20 Apr 2026 20:00:27 +0000</pubDate>
				<category><![CDATA[Amazon Managed Streaming for Apache Kafka (Amazon MSK)]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Migration]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">8cb88c43081d8e03912134cd163fcbe713a04c24</guid>

					<description>In this post, we walk you through how to replicate Apache Kafka data from your external Apache Kafka deployments to Amazon MSK Express brokers using MSK Replicator. You will learn how to configure authentication on your external cluster, establish network connectivity, set up bidirectional replication, and monitor replication health to achieve a low-downtime migration.</description>
										<content:encoded>&lt;p&gt;Migrating Apache Kafka workloads to the cloud often involves managing complex replication infrastructure, coordinating application cutovers with extended downtime windows, and maintaining deep expertise in open-source tools like Apache Kafka’s MirrorMaker 2 (MM2). These challenges slow down migrations and increase operational risk. &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-replicator.html" target="_blank" rel="noopener noreferrer"&gt;Amazon MSK Replicator&lt;/a&gt; addresses these challenges, enabling you to migrate your Kafka deployments (referred to as “external” Kafka clusters) to &lt;a href="https://aws.amazon.com/msk/" target="_blank" rel="noopener noreferrer"&gt;Amazon MSK&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-broker-types-express.html" target="_blank" rel="noopener noreferrer"&gt;Express brokers&lt;/a&gt; with minimal operational overhead and reduced downtime. MSK Replicator supports data migration from Kafka deployments (version 2.8.1 or later) that have &lt;a href="https://kafka.apache.org/42/security/authentication-using-sasl/" target="_blank" rel="noopener noreferrer"&gt;SASL/SCRAM authentication&lt;/a&gt; enabled, including Kafka clusters running on-premises, on AWS, or other cloud providers, as well as Kafka-protocol-compatible services like Confluent Platform, Aiven, RedPanda, WarpStream, or AutoMQ when configured with SASL/SCRAM authentication.&lt;/p&gt; 
&lt;p&gt;In this post, we walk you through how to replicate Apache Kafka data from your external Apache Kafka deployments to Amazon MSK Express brokers using MSK Replicator. You will learn how to configure authentication on your external cluster, establish network connectivity, set up bidirectional replication, and monitor replication health to achieve a low-downtime migration.&lt;/p&gt; 
&lt;h2&gt;How it works&lt;/h2&gt; 
&lt;p&gt;MSK Replicator is a fully managed serverless service that replicates topics, configurations, and offsets from cluster to cluster. It alleviates the need to manage complex infrastructure or configure open-source tools.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90053" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5876-1.png" alt="" width="3084" height="3044"&gt;&lt;/p&gt; 
&lt;p&gt;Before MSK Replicator, customers used tools like MM2 for migrations. These tools don’t support bidirectional topic replication with identical topic names, forcing applications to consume from differently named topics on each cluster and complicating application architectures. Custom replication policies in MM2 can allow identical topic names, but MM2 still lacks bidirectional offset replication because its architecture requires producers and consumers to run on the same cluster to replicate offsets. The result is complex migrations that require either migrating consumers before producers or moving every application at once in a big-bang cutover. When issues arise during the migration, the rollback process is error-prone and introduces large amounts of duplicate message processing because consumer group offsets aren’t synchronized. These constraints add risk and complexity that make migrations difficult to manage.&lt;/p&gt; 
&lt;p&gt;MSK Replicator addresses these problems by supporting bidirectional replication of data and enhanced consumer group offset synchronization. MSK Replicator copies topics and offsets from an external Kafka cluster to MSK, allowing you to preserve the same topic and consumer group names on both clusters. MSK Replicator also supports creating a second Replicator instance for bidirectional replication of both data and enhanced offset synchronization, allowing producers and consumers to run independently on different Kafka clusters. Data published or consumed on the Amazon MSK cluster is replicated back to the external cluster by the second Replicator. As a result, producers and consumers can be migrated in any order, without worrying about dependencies between applications.&lt;/p&gt; 
&lt;p&gt;Because MSK Replicator provides bidirectional data replication and enhanced consumer group offset synchronization, you can move producers and consumers at your own pace without data loss. This reduces migration complexity, allowing you to migrate applications between your external Kafka cluster and Amazon MSK in any order. If you run into problems during the migration, enhanced offset synchronization allows you to roll back changes by moving applications back to the external Kafka cluster, where they resume from the latest checkpoint synchronized from the Amazon MSK cluster.&lt;/p&gt; 
&lt;p&gt;For example, consider three applications:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;The “Orders” application, which accepts incoming orders and writes them to the &lt;code&gt;orders&lt;/code&gt; Kafka topic&lt;/li&gt; 
 &lt;li&gt;The “Order status” application, which reads from the &lt;code&gt;orders&lt;/code&gt; Kafka topic and writes status updates to the &lt;code&gt;order_status&lt;/code&gt; topic&lt;/li&gt; 
 &lt;li&gt;The “Customer notification” application, which reads from the &lt;code&gt;order_status&lt;/code&gt; topic and notifies customers when status changes&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90054" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/BDB-5876-2.png" alt="" width="3764" height="1364"&gt;&lt;/p&gt; 
&lt;p&gt;MSK Replicator enables these applications to be migrated between an on-premises Apache Kafka cluster and an Amazon MSK Express cluster with low downtime and no data loss, regardless of order. The “Order status” application can migrate first, receive orders from the on-premises “Orders” application, and send status updates to the on-premises “Customer notification” application. If issues arise during the migration, the “Order status” application can roll back to the on-premises cluster, where its consumer group offsets for the &lt;code&gt;orders&lt;/code&gt; topic are already synchronized, so it picks up where it left off on the Amazon MSK cluster.&lt;/p&gt; 
&lt;p&gt;MSK Replicator supports data distribution across hybrid and multi-cloud environments for analytics, compliance, and business continuity. It can also be configured for disaster recovery scenarios in which Amazon MSK Express brokers serve as a resilient target for your external Kafka clusters.&lt;/p&gt; 
&lt;p&gt;If you are currently using MM2 for replication, see &lt;a href="https://aws.amazon.com/blogs/big-data/amazon-msk-replicator-and-mirrormaker2-choosing-the-right-replication-strategy-for-apache-kafka-disaster-recovery-and-migrations/" target="_blank" rel="noopener noreferrer"&gt;Amazon MSK Replicator and MirrorMaker2: Choosing the right replication strategy for Apache Kafka disaster recovery and migrations&lt;/a&gt; to understand which solution best fits your use case.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;MSK Replicator supports Kafka deployments running version 2.8.1 or later as a source, including third-party managed Kafka services, self-managed Kafka, and on-premises or third-party cloud-hosted Kafka. MSK Replicator automatically handles data transfer, uses SASL/SCRAM authentication with SSL encryption, and maintains consumer group positions across both clusters. If you do not use SASL/SCRAM today, you can configure it on a new listener dedicated to MSK Replicator, allowing your current clients to keep using their existing authentication mechanisms.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;To follow along with this walkthrough, you need the following resources in place:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;A source Kafka cluster using &lt;a href="https://kafka.apache.org/community/downloads/#281" target="_blank" rel="noopener noreferrer"&gt;Kafka version 2.8.1&lt;/a&gt; or above&lt;/li&gt; 
 &lt;li&gt;Network connectivity between your external Kafka cluster and AWS (for example, using &lt;a href="https://aws.amazon.com/directconnect/" target="_blank" rel="noopener noreferrer"&gt;AWS Direct Connect&lt;/a&gt;, &lt;a href="https://aws.amazon.com/vpn/" target="_blank" rel="noopener noreferrer"&gt;Site-to-Site VPN&lt;/a&gt;, or &lt;a href="https://aws.amazon.com/vpc/" target="_blank" rel="noopener noreferrer"&gt;Amazon Virtual Private Cloud&lt;/a&gt; (VPC) &lt;a href="https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html"&gt;peering&lt;/a&gt; or &lt;a href="https://aws.amazon.com/transit-gateway/"&gt;AWS Transit Gateway&lt;/a&gt; for connections between AWS VPCs) so that MSK Replicator can reach your source brokers&lt;/li&gt; 
 &lt;li&gt;SASL/SCRAM authentication configured on your external cluster (SHA-256 or SHA-512), which MSK Replicator uses to authenticate with external clusters&lt;/li&gt; 
 &lt;li&gt;An admin user configured on your external cluster with permissions to describe the external cluster and create and modify users/ACLs&lt;/li&gt; 
 &lt;li&gt;An Amazon MSK Express cluster with &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/iam-access-control.html" target="_blank" rel="noopener noreferrer"&gt;IAM authentication enabled&lt;/a&gt; to serve as your target&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/secrets-manager/" target="_blank" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt; configured to store your SASL/SCRAM credentials for the external cluster so that MSK Replicator can securely retrieve them at runtime&lt;/li&gt; 
 &lt;li&gt;An &lt;a href="https://aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; log group for MSK Replicator logs&lt;/li&gt; 
 &lt;li&gt;Appropriate &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-replicator-create-iam-perms.html" target="_blank" rel="noopener noreferrer"&gt;IAM permissions for creating and managing MSK Replicator&lt;/a&gt; resources&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Setting up replication&lt;/h2&gt; 
&lt;h3&gt;Step 1: Configure network connectivity&lt;/h3&gt; 
&lt;p&gt;You can set up network connectivity between your external Kafka cluster and your AWS VPC using methods such as AWS Direct Connect for dedicated network connections, AWS Site-to-Site VPN for encrypted connections over the internet, and AWS VPC peering or AWS Transit Gateway for connections between AWS VPCs. Verify that IP routing and DNS resolution are properly configured between your external cluster and AWS.&lt;/p&gt; 
&lt;p&gt;To verify IP routing and DNS resolution, connect to your external Kafka cluster from inside your VPC and use the Kafka CLI to list topics on the external cluster. If the listing succeeds, DNS resolution and IP routing are working. If it fails, work with your network admins to troubleshoot connectivity issues.&lt;/p&gt; 
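&lt;p&gt;As a quick check, the following is a minimal sketch of that verification using the Kafka CLI tools from an instance inside your VPC. The broker address, port, and &lt;code&gt;client.properties&lt;/code&gt; file (your SASL/SCRAM client settings) are placeholders for your environment:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# List topics on the external cluster from inside the VPC.
# Success means DNS resolution and IP routing are working end to end.
bin/kafka-topics.sh --list \
  --bootstrap-server your-broker-host:9096 \
  --command-config client.properties&lt;/code&gt;&lt;/pre&gt; 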
&lt;h3&gt;Step 2: Configure external cluster&lt;/h3&gt; 
&lt;p&gt;In this step, you will set up authentication on your external Kafka cluster and store the credentials in AWS Secrets Manager so that MSK Replicator can connect securely.&lt;/p&gt; 
&lt;h4&gt;Configure authentication&lt;/h4&gt; 
&lt;p&gt;Using the external cluster admin user, configure SASL/SCRAM authentication (SHA-256 or SHA-512) for MSK Replicator on your external Kafka cluster. Create a SASL/SCRAM user for MSK Replicator and grant the user the following ACL permissions (example Kafka CLI commands follow this list):&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Topic operations –&lt;/strong&gt; Alter, AlterConfigs, Create, Describe, DescribeConfigs, Read, Write&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Group operations –&lt;/strong&gt; Read, Describe&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cluster operations –&lt;/strong&gt; Create, ClusterAction, Describe, DescribeConfigs&lt;/li&gt; 
&lt;/ul&gt; 
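&lt;p&gt;As one example, the following sketch uses the Kafka CLI tools to create the SCRAM credential and grant the ACLs listed above. The user name &lt;code&gt;replicator-user&lt;/code&gt;, the broker address, and &lt;code&gt;admin.properties&lt;/code&gt; (the admin user’s client settings) are placeholders, and you can scope the ACLs to specific topics instead of the wildcard:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Create a SCRAM-SHA-512 credential for the replicator user
bin/kafka-configs.sh --bootstrap-server your-broker-host:9096 \
  --command-config admin.properties \
  --alter --entity-type users --entity-name replicator-user \
  --add-config 'SCRAM-SHA-512=[password=REPLACE_WITH_PASSWORD]'

# Topic permissions (all operations from the list above)
bin/kafka-acls.sh --bootstrap-server your-broker-host:9096 \
  --command-config admin.properties \
  --add --allow-principal User:replicator-user \
  --operation Alter --operation AlterConfigs --operation Create \
  --operation Describe --operation DescribeConfigs \
  --operation Read --operation Write \
  --topic '*'

# Group permissions
bin/kafka-acls.sh --bootstrap-server your-broker-host:9096 \
  --command-config admin.properties \
  --add --allow-principal User:replicator-user \
  --operation Read --operation Describe \
  --group '*'

# Cluster permissions
bin/kafka-acls.sh --bootstrap-server your-broker-host:9096 \
  --command-config admin.properties \
  --add --allow-principal User:replicator-user \
  --operation Create --operation ClusterAction \
  --operation Describe --operation DescribeConfigs \
  --cluster&lt;/code&gt;&lt;/pre&gt; 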
&lt;h4&gt;Configure Secrets Manager&lt;/h4&gt; 
&lt;p&gt;AWS Secrets Manager stores your SASL/SCRAM credentials securely so that MSK Replicator can retrieve them at runtime. The secret must use JSON format and have the following keys:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;&lt;strong&gt;username&lt;/strong&gt;&lt;/code&gt; – The SCRAM username that you configured in the authentication step above&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;&lt;strong&gt;password&lt;/strong&gt;&lt;/code&gt; – The SCRAM password that you configured in the authentication step above&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;&lt;strong&gt;certificate&lt;/strong&gt;&lt;/code&gt; – The public root CA certificate (the top-level certificate authority that issued your cluster’s TLS certificate) and the intermediate CA chain (intermediate certificates between the root and your cluster’s certificate), used for SSL handshakes with the external cluster&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Optionally, you may create separate secrets for SCRAM credentials and the SSL certificate. This approach is useful when secrets for SCRAM credentials and certificates are provisioned in different stages, such as in Infrastructure as Code (IaC) pipelines.&lt;/p&gt; 
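&lt;p&gt;As a hedged example, the following creates a single secret containing all three keys with the AWS CLI. The secret name and values are placeholders, and the certificate must be the PEM-encoded CA chain with newlines escaped as &lt;code&gt;\n&lt;/code&gt; inside the JSON string:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Store the SCRAM credentials and CA certificate for MSK Replicator.
# Add --kms-key-id to encrypt with a customer managed key that the
# replicator service execution role is allowed to decrypt.
aws secretsmanager create-secret \
  --name msk-replicator/external-cluster-credentials \
  --secret-string '{
    "username": "replicator-user",
    "password": "REPLACE_WITH_PASSWORD",
    "certificate": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----"
  }'&lt;/code&gt;&lt;/pre&gt; 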
&lt;h4&gt;Retrieve the cluster ID&lt;/h4&gt; 
&lt;p&gt;As the admin user, use the &lt;a href="https://downloads.apache.org/kafka/" target="_blank" rel="noopener noreferrer"&gt;Kafka CLI tools&lt;/a&gt; to retrieve the cluster ID of your external cluster. Run the following command, replacing &lt;code&gt;your-broker-host:9096&lt;/code&gt; with the address of one of your external cluster’s bootstrap servers:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-code"&gt;bin/kafka-cluster.sh cluster-id --bootstrap-server your-broker-host:9096 --config admin.properties&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The command returns a cluster ID string such as &lt;code&gt;lkc-abc123&lt;/code&gt;. Take note of this value because you will need it when creating the replicator in Step 4.&lt;/p&gt; 
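&lt;p&gt;The &lt;code&gt;admin.properties&lt;/code&gt; file referenced in the command above is a standard Kafka client configuration. A minimal sketch for SASL/SCRAM over TLS might look like the following; the user name, password, and truststore settings are placeholders for your environment:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Write a minimal admin client config for SASL/SCRAM over TLS
cat &gt; admin.properties &lt;&lt;'EOF'
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="admin-user" password="REPLACE_WITH_PASSWORD";
# Only needed if your cluster uses a private or self-signed CA:
# ssl.truststore.location=/path/to/truststore.jks
# ssl.truststore.password=REPLACE_WITH_TRUSTSTORE_PASSWORD
EOF&lt;/code&gt;&lt;/pre&gt; 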
&lt;h3&gt;Step 3: Create your MSK Express target cluster&lt;/h3&gt; 
&lt;p&gt;With your external cluster configured, you can now set up the target. Create an Amazon MSK Express cluster with IAM authentication enabled. Make sure that the cluster is in subnets that have access to &lt;a href="https://aws.amazon.com/secrets-manager/" target="_blank" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt; endpoints. See &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/getting-started.html" target="_blank" rel="noopener noreferrer"&gt;Get started using Amazon MSK&lt;/a&gt; for more information on creating an MSK cluster.&lt;/p&gt; 
&lt;h3&gt;Step 4: Create the replicator&lt;/h3&gt; 
&lt;p&gt;Now that both clusters are ready, you can connect them by setting up the MSK Replicator with the appropriate IAM role and replication configuration.&lt;/p&gt; 
&lt;h4&gt;Set up an IAM role for MSK Replicator&lt;/h4&gt; 
&lt;p&gt;MSK Replicator needs an IAM role to interact with your MSK Express cluster and retrieve secrets. Set up a service execution IAM role with a trust policy allowing &lt;code&gt;kafka.amazonaws.com&lt;/code&gt; and attach the &lt;code&gt;AWSMSKReplicatorExecutionRole&lt;/code&gt; permissions policy. Take note of the role ARN for creating the replicator.&lt;/p&gt; 
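&lt;p&gt;The following is a minimal sketch of creating that role with the AWS CLI. The role name matches the example ARN used later in this post; confirm the exact managed policy ARN for &lt;code&gt;AWSMSKReplicatorExecutionRole&lt;/code&gt; in the IAM console:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Create the service execution role with a trust policy for kafka.amazonaws.com
aws iam create-role \
  --role-name MSKReplicatorRole \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "kafka.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }'

# Attach the AWS managed permissions policy for MSK Replicator
aws iam attach-role-policy \
  --role-name MSKReplicatorRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSMSKReplicatorExecutionRole&lt;/code&gt;&lt;/pre&gt; 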
&lt;p&gt;Create and attach a policy for accessing your Secrets Manager secrets and reading/writing data in your MSK cluster. See &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_job-functions_create-policies.html" target="_blank" rel="noopener noreferrer"&gt;Creating roles and attaching policies (console)&lt;/a&gt; for more information on creating IAM roles and policies.&lt;/p&gt; 
&lt;p&gt;The following is an example policy for reading and writing data to your MSK cluster and reading KMS-encrypted Secrets Manager secrets:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-json"&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SecretsManagerAccess",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret"
            ],
            "Resource": [
                "&amp;lt;SCRAM_SECRET_ARN&amp;gt;",
                "&amp;lt;CERT_SECRET_ARN&amp;gt;"
            ]
        },
        {
            "Sid": "KMSDecrypt",
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": "&amp;lt;SECRETSMANAGER_KMS_KEY_ARN&amp;gt;"
        },
        {
            "Sid": "TargetClusterAccess",
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect",
                "kafka-cluster:DescribeCluster",
                "kafka-cluster:AlterCluster",
                "kafka-cluster:DescribeClusterDynamicConfiguration",
                "kafka-cluster:AlterClusterDynamicConfiguration",
                "kafka-cluster:DescribeTopic",
                "kafka-cluster:CreateTopic",
                "kafka-cluster:AlterTopic",
                "kafka-cluster:DescribeTopicDynamicConfiguration",
                "kafka-cluster:AlterTopicDynamicConfiguration",
                "kafka-cluster:WriteData",
                "kafka-cluster:WriteDataIdempotently",
                "kafka-cluster:ReadData",
                "kafka-cluster:DescribeGroup",
                "kafka-cluster:AlterGroup"
            ],
            "Resource": [
                "arn:aws:kafka:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:cluster/&amp;lt;MSK_CLUSTER_NAME&amp;gt;*/*",
                "arn:aws:kafka:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:topic/&amp;lt;MSK_CLUSTER_NAME&amp;gt;/*",
                "arn:aws:kafka:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:group/&amp;lt;MSK_CLUSTER_NAME&amp;gt;*/*"
            ]
        },
        {
            "Sid": "CloudWatchLogsAccess",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams"
            ],
            "Resource": "&amp;lt;MSK_REPLICATOR_LOG_GROUP_ARN&amp;gt;"
        }
    ]
}&lt;/code&gt;&lt;/pre&gt; 
&lt;h4&gt;Create the replicator for external to MSK replication&lt;/h4&gt; 
&lt;p&gt;Use the AWS CLI, API, or Console to create your replicator. Here’s an example using the AWS CLI:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;aws kafka create-replicator \
  --replicator-name external-to-msk \
  --service-execution-role-arn "arn:aws:iam::123456789012:role/MSKReplicatorRole" \
  --kafka-clusters file://./kafka-clusters.json \
  --replication-info-list file://./replication-info.json \
  --log-delivery file://./log-delivery.json \
  --region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;The &lt;code&gt;kafka-clusters.json&lt;/code&gt; file defines the source and target Kafka cluster connection information, &lt;code&gt;replication-info.json&lt;/code&gt; specifies which topics to replicate and how to handle consumer group offset synchronization, and &lt;code&gt;log-delivery.json&lt;/code&gt; specifies the CloudWatch logging configuration. The following tables describe the required parameters:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;CLI inputs:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CLI Parameter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Description&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Example&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;replicator-name&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The name of the replicator&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;external-to-msk&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;service-execution-role-arn&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The ARN for the service execution IAM role you created&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;arn:aws:iam::123456789012:role/MSKReplicatorRole&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;kafka-clusters&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The Kafka cluster connection info&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;See below&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;replication-info-list&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The replication configuration&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;See below&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;log-delivery&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The logging configuration&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;See below&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Key &lt;code&gt;kafka-clusters.json&lt;/code&gt; inputs:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CLI Parameter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Description&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Example&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;ApacheKafkaClusterId&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The cluster ID retrieved in Step 2&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;lkc-abc123&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;RootCaCertificate&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The Secrets Manager ARN containing the public CA certificate and intermediate CA chain&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;arn:aws:secretsmanager:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:secret:my-cert&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;MskClusterArn&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The ARN for the MSK Express cluster&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;arn:aws:kafka:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:cluster/my-cluster/abc-123&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;SecretArn&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The Secrets Manager ARN containing the SASL/SCRAM username and password&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;arn:aws:secretsmanager:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:secret:my-creds&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;SecurityGroupIds&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The security group IDs for MSK Replicator&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;sg-0123456789abcdef0&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Key &lt;code&gt;replication-info.json&lt;/code&gt; inputs:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CLI Parameter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Description&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Example&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;TargetCompressionType&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The compression type to use for replicating data&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;LZ4&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;TopicsToReplicate&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The list of topics to replicate (use [".*"] for all topics)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;["my-topic"]&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;ConsumerGroupsToReplicate&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The list of consumer groups to replicate&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;["my-group"]&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;StartingPosition&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The point in the Kafka topics to begin replication from (either EARLIEST or LATEST)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EARLIEST&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;ConsumerGroupOffsetSyncMode&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Whether or not to use enhanced bidirectional consumer group offset synchronization&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;ENHANCED&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;Note that &lt;code&gt;StartingPosition&lt;/code&gt; is set to &lt;code&gt;EARLIEST&lt;/code&gt; in the configuration below, which means the replicator begins reading from the oldest available offset on each topic. This is the recommended setting for migrations to avoid data loss.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Key &lt;code&gt;log-delivery.json&lt;/code&gt; inputs:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CLI Parameter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Description&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Example&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Enabled&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Allows you to enable CloudWatch logging&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;true&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;LogGroup&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;The CloudWatch logs log group name to log to&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;/msk/replicator/my-replicator&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;Additional log delivery methods for &lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon S3&lt;/a&gt; and &lt;a href="https://aws.amazon.com/firehose/" target="_blank" rel="noopener noreferrer"&gt;Amazon Data Firehose&lt;/a&gt; are supported. In this post, we use CloudWatch logging.&lt;/p&gt; 
&lt;p&gt;The configs should look like the following for external to MSK replication.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;code&gt;kafka-clusters.json:&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-json"&gt;[
  {
    "ApacheKafkaCluster": {
      "ApacheKafkaClusterId": "lkc-abc123",
      "BootstrapBrokerString": "broker1.example.com:9096"
    },
    "ClientAuthentication": {
      "SaslScram": {
        "Mechanism": "SHA512",
        "SecretArn": "arn:aws:secretsmanager:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:secret:my-creds"
      }
    },
    "EncryptionInTransit": {
      "EncryptionType": "TLS",
      "RootCaCertificate": "arn:aws:secretsmanager:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:secret:my-cert"
    }
  },
  {
    "AmazonMskCluster": {
      "MskClusterArn": "arn:aws:kafka:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:cluster/my-cluster/abc-123"
    },
    "VpcConfig": {
      "SecurityGroupIds": ["sg-0123456789abcdef0"],
      "SubnetIds": ["subnet-abc123", "subnet-abc124", "subnet-abc125"]
    }
  }
]&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;&lt;code&gt;replication-info.json:&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-json"&gt;[
  {
    "SourceKafkaClusterId": "lkc-abc123",
    "TargetKafkaClusterArn": "arn:aws:kafka:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:cluster/my-cluster/abc-123",
    "TargetCompressionType": "LZ4",
    "TopicReplication": {
      "TopicsToReplicate": ["my-topic"],
      "CopyTopicConfigurations": true,
      "CopyAccessControlListsForTopics": true,
      "DetectAndCopyNewTopics": true,
      "StartingPosition": {"Type": "EARLIEST"},
      "TopicNameConfiguration": {"Type": "IDENTICAL"}
    },
    "ConsumerGroupReplication": {
      "ConsumerGroupsToReplicate": ["my-group"],
      "SynchroniseConsumerGroupOffsets": true,
      "DetectAndCopyNewConsumerGroups": true,
      "ConsumerGroupOffsetSyncMode": "ENHANCED"
    }
  }
]&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;&lt;strong&gt;&lt;code&gt;log-delivery.json:&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-json"&gt;{
  "ReplicatorLogDelivery": {
    "CloudWatchLogs": {
      "Enabled": true,
      "LogGroup": "&amp;lt;LOG_GROUP_NAME&amp;gt;"
    }
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;h4&gt;Configure bidirectional replication from MSK to the external cluster&lt;/h4&gt; 
&lt;p&gt;To enable bidirectional replication, create a second replicator that replicates in the opposite direction. Use the same IAM role and network configuration from Step 4, but swap the source and target. Replace &lt;code&gt;SourceKafkaClusterId&lt;/code&gt; with &lt;code&gt;TargetKafkaClusterId&lt;/code&gt; and &lt;code&gt;TargetKafkaClusterArn&lt;/code&gt; with &lt;code&gt;SourceKafkaClusterArn&lt;/code&gt; in a new &lt;code&gt;msk-to-external-replication-info.json&lt;/code&gt; file, as sketched below.&lt;/p&gt; 
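&lt;p&gt;The following is a hedged sketch of that file, mirroring the earlier &lt;code&gt;replication-info.json&lt;/code&gt; with the source and target fields swapped. The ARNs, IDs, topic names, and consumer groups are the same placeholders used above; review values such as &lt;code&gt;StartingPosition&lt;/code&gt; against your migration plan before using them:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Write the reverse-direction replication configuration
cat &gt; msk-to-external-replication-info.json &lt;&lt;'EOF'
[
  {
    "SourceKafkaClusterArn": "arn:aws:kafka:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:cluster/my-cluster/abc-123",
    "TargetKafkaClusterId": "lkc-abc123",
    "TargetCompressionType": "LZ4",
    "TopicReplication": {
      "TopicsToReplicate": ["my-topic"],
      "CopyTopicConfigurations": true,
      "CopyAccessControlListsForTopics": true,
      "DetectAndCopyNewTopics": true,
      "StartingPosition": {"Type": "EARLIEST"},
      "TopicNameConfiguration": {"Type": "IDENTICAL"}
    },
    "ConsumerGroupReplication": {
      "ConsumerGroupsToReplicate": ["my-group"],
      "SynchroniseConsumerGroupOffsets": true,
      "DetectAndCopyNewConsumerGroups": true,
      "ConsumerGroupOffsetSyncMode": "ENHANCED"
    }
  }
]
EOF&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Then create the second replicator:&lt;/p&gt; 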
&lt;pre&gt;&lt;code class="lang-bash"&gt;aws kafka create-replicator \
  --replicator-name msk-to-external \
  --service-execution-role-arn "arn:aws:iam::123456789012:role/MSKReplicatorRole" \
  --kafka-clusters file://./kafka-clusters.json \
  --replication-info-list file://./msk-to-external-replication-info.json \
  --log-delivery file://./log-delivery.json \
  --region us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;h2&gt;Monitoring replication health&lt;/h2&gt; 
&lt;p&gt;Monitor your replication using Amazon CloudWatch metrics. Three key metrics to understand are &lt;code&gt;MessageLag&lt;/code&gt;, &lt;code&gt;SumOffsetLag&lt;/code&gt;, and &lt;code&gt;ReplicationLatency&lt;/code&gt;. &lt;code&gt;MessageLag&lt;/code&gt; measures how far the replicator is behind the external cluster, in terms of messages not yet replicated; &lt;code&gt;SumOffsetLag&lt;/code&gt; measures how far behind a consumer group is from the latest message in a topic; and &lt;code&gt;ReplicationLatency&lt;/code&gt; measures how long it takes records to replicate from the source cluster to the target cluster. When all three reach a sustained low level, your clusters are fully synchronized for both data and consumer group offsets.&lt;/p&gt; 
&lt;p&gt;To troubleshoot replication problems or errors, use the CloudWatch logs to get more detail about the health of the replicator. MSK Replicator logs status and troubleshooting information, which can help in diagnosing issues like connectivity, authentication, and SSL errors.&lt;/p&gt; 
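&lt;p&gt;For example, with the CloudWatch log group from the earlier configuration and the AWS CLI (version 2), you can tail the replicator logs directly:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Stream the last hour of MSK Replicator logs and follow new events
aws logs tail /msk/replicator/my-replicator --since 1h --follow&lt;/code&gt;&lt;/pre&gt; 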
&lt;p&gt;Note that replication is asynchronous, so there will always be some lag while data is flowing. During a migration, the lag drains to zero shortly after a client is shut down on the source cluster; under normal operations this takes about 30 seconds, enabling a low-downtime migration without data loss. If your lag is continually increasing or never reaches a sustained low level, this usually indicates insufficient partitions for high-throughput replication. Refer to &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-replicator-troubleshooting.html"&gt;Troubleshoot MSK Replicator&lt;/a&gt; for more information on troubleshooting replication throughput and lag.&lt;/p&gt; 
&lt;p&gt;Key metrics include:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;MessageLag –&lt;/strong&gt; Monitors the sync between the MSK Replicator and the source cluster. MessageLag indicates the lag between the messages produced to the source cluster and messages consumed by the replicator. It is not the lag between the source and target cluster.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ReplicationLatency –&lt;/strong&gt; Time taken for records to replicate from source to target cluster (ms)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ReplicatorThroughput –&lt;/strong&gt; Average number of bytes replicated per second&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ReplicatorFailure –&lt;/strong&gt; Number of failures the replicator is experiencing&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;KafkaClusterPingSuccessCount –&lt;/strong&gt; Connection health indicator (1 = healthy, 0 = unhealthy)&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ConsumerGroupCount –&lt;/strong&gt; Total consumer groups being synchronized&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ConsumerGroupOffsetSyncFailure –&lt;/strong&gt; Failures during offset synchronization&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;AuthError –&lt;/strong&gt; Number of connections with failed authentication per second, by cluster&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ThrottleTime –&lt;/strong&gt; Average time in ms a request was throttled by brokers, by cluster&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;SumOffsetLag –&lt;/strong&gt; Aggregated offset lag across partitions for a consumer group on a topic (MSK cluster-level metric)&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;For more details on these metrics, see the &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-replicator-monitor.html" target="_blank" rel="noopener noreferrer"&gt;MSK Replicator metrics documentation&lt;/a&gt;.&lt;/p&gt; 
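&lt;p&gt;As a quick command line check, you can confirm which replicator metrics and dimensions are being emitted and pull recent datapoints. The namespace and the &lt;code&gt;ReplicatorName&lt;/code&gt; dimension below are assumptions based on how Amazon MSK publishes metrics; confirm the exact names in the metrics documentation linked above:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Discover the replicator metrics and the dimensions they carry
aws cloudwatch list-metrics --namespace AWS/Kafka --metric-name MessageLag

# Pull the last hour of MessageLag datapoints (uses GNU date);
# substitute the dimensions returned by list-metrics above
aws cloudwatch get-metric-statistics \
  --namespace AWS/Kafka \
  --metric-name MessageLag \
  --dimensions Name=ReplicatorName,Value=external-to-msk \
  --statistics Average --period 300 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"&lt;/code&gt;&lt;/pre&gt; 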
&lt;p&gt;Your applications are ready to migrate when the following conditions are met. For most workloads, you should expect these metrics to stabilize within a few hours of starting replication. High-throughput clusters may take longer depending on topic volume and partition count.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;ReplicatorFailure&lt;/strong&gt; = 0&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ConsumerGroupOffsetSyncFailure&lt;/strong&gt; = 0&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;KafkaClusterPingSuccessCount&lt;/strong&gt; = 1 for both source and target clusters&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;MessageLag&lt;/strong&gt; &amp;lt; 1,000 
  &lt;ul&gt; 
   &lt;li&gt;Your sustained lag may be lower or higher depending on your throughput per partition, message size, and other factors&lt;/li&gt; 
   &lt;li&gt;Sustained high message lag usually indicates insufficient partitions for high-throughput replication&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ReplicationLatency&lt;/strong&gt; &amp;lt; 90 seconds 
  &lt;ul&gt; 
   &lt;li&gt;Your sustained latency may be lower or higher depending on your throughput per partition, message size, and other factors&lt;/li&gt; 
   &lt;li&gt;Sustained high latency usually indicates insufficient partitions for high-throughput replication&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;SumOffsetLag&lt;/strong&gt; is at a sustained low level on both clusters 
  &lt;ul&gt; 
   &lt;li&gt;Offset values on the two clusters may not be numerically identical.&lt;/li&gt; 
   &lt;li&gt;MSK Replicator translates offsets between clusters so that consumers resume from the correct position, but the raw offset numbers can differ due to how offset translation works. What matters is that SumOffsetLag is at a sustained low level.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ConsumerGroupCount&lt;/strong&gt; (MSK) = Expected count (external cluster) 
  &lt;ul&gt; 
   &lt;li&gt;If ConsumerGroupCount is zero or does not match the expected count, then there is an issue in the Replicator configuration or a permissions issue preventing consumer group synchronization&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Migrating your applications&lt;/h2&gt; 
&lt;p&gt;With bidirectional consumer offset synchronization, you can migrate your producers and consumers regardless of order. Start by monitoring replication metrics until they reach the target values described in the previous section. Then migrate your applications (producers or consumers) to use the MSK Express cluster endpoints and verify that they are producing and consuming as expected. If you encounter issues, you can roll back by switching applications back to the external cluster. The consumer offset synchronization makes sure that your applications resume from their last committed position regardless of which cluster they connect to.&lt;/p&gt; 
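&lt;p&gt;When you repoint an application at the MSK Express cluster, you can retrieve its bootstrap broker string with the AWS CLI; for an IAM-authenticated cluster, use the SASL/IAM endpoints from the response:&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-bash"&gt;# Returns the bootstrap broker strings for the target MSK Express cluster
aws kafka get-bootstrap-brokers \
  --cluster-arn "arn:aws:kafka:&amp;lt;REGION&amp;gt;:&amp;lt;ACCOUNT_ID&amp;gt;:cluster/my-cluster/abc-123"&lt;/code&gt;&lt;/pre&gt; 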
&lt;p&gt;For a comprehensive, hands-on walkthrough of the end-to-end migration process, explore the &lt;a href="https://catalog.workshops.aws/msk-migration-lab" target="_blank" rel="noopener noreferrer"&gt;MSK Migration Workshop&lt;/a&gt;, which provides step-by-step guidance for migrating your Kafka workloads to Amazon MSK.&lt;/p&gt; 
&lt;h2&gt;Security considerations&lt;/h2&gt; 
&lt;p&gt;MSK Replicator uses SASL/SCRAM authentication with SSL encryption for secure data transfer between your external cluster and AWS. The solution supports both publicly trusted certificates and private or self-signed certificates. Credentials are stored securely in &lt;a href="https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html" target="_blank" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt;, and the target MSK Express cluster uses &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/iam-access-control.html" target="_blank" rel="noopener noreferrer"&gt;IAM authentication&lt;/a&gt; for access control.&lt;/p&gt; 
&lt;p&gt;When configuring security, keep the following in mind:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Make sure that the IAM role you create in Step 4 follows the principle of least privilege. Only attach &lt;code&gt;AWSMSKReplicatorExecutionRole&lt;/code&gt; and an IAM policy granting least-privilege access to read your Secrets Manager secret values, and avoid adding broader permissions.&lt;/li&gt; 
 &lt;li&gt;Verify that your Secrets Manager secret is encrypted with an AWS KMS key that the MSK Replicator service execution role has permission to decrypt.&lt;/li&gt; 
 &lt;li&gt;Confirm that the security groups assigned to MSK Replicator allow outbound traffic to your external cluster’s broker ports (typically 9096 for SASL/SCRAM with TLS) and to the MSK Express cluster.&lt;/li&gt; 
 &lt;li&gt;Rotate your SASL/SCRAM credentials periodically and update the corresponding Secrets Manager secret. MSK Replicator picks up the new credentials automatically on the next connection attempt.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Under the &lt;a href="https://aws.amazon.com/compliance/shared-responsibility-model/" target="_blank" rel="noopener noreferrer"&gt;AWS shared responsibility model&lt;/a&gt;, AWS is responsible for securing the underlying infrastructure that runs MSK Replicator, including the compute, storage, and networking resources. You are responsible for configuring authentication mechanisms (SASL/SCRAM), managing credentials in AWS Secrets Manager, configuring network security (security groups and VPC settings), implementing IAM policies following least privilege, and rotating credentials. For more information, see &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/security.html" target="_blank" rel="noopener noreferrer"&gt;Security in Amazon MSK&lt;/a&gt; in the Amazon MSK Developer Guide.&lt;/p&gt; 
&lt;h2&gt;Cleanup&lt;/h2&gt; 
&lt;p&gt;To avoid ongoing charges, delete the resources you created during this walkthrough. Start by deleting the replicators first, because they depend on the other resources:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;aws kafka delete-replicator --replicator-arn &amp;lt;replicator-arn&amp;gt;&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;After both replicators are deleted, you can remove the following resources if they were created solely for this walkthrough:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;The MSK Express cluster (deleting a cluster also removes its stored data, so verify that your applications have fully migrated before proceeding)&lt;/li&gt; 
 &lt;li&gt;The Secrets Manager secrets containing your SASL/SCRAM credentials and certificates&lt;/li&gt; 
 &lt;li&gt;The IAM role and policies created for MSK Replicator&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;You can verify that a replicator has been fully deleted by running &lt;code&gt;aws kafka list-replicators&lt;/code&gt; and confirming it no longer appears in the output.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Amazon MSK Replicator simplifies migrating to Amazon MSK Express brokers and establishing hybrid Kafka architectures. The fully managed service alleviates the operational complexity of managing replication, while bidirectional consumer offset synchronization enables flexible, low-risk application migration.&lt;/p&gt; 
&lt;h3&gt;Next Steps&lt;/h3&gt; 
&lt;p&gt;To get started using MSK Replicator to migrate applications to MSK Express brokers, use the &lt;a href="https://catalog.workshops.aws/msk-migration-lab" target="_blank" rel="noopener noreferrer"&gt;MSK Migration Workshop&lt;/a&gt; for a hands-on, end-to-end migration walkthrough. The &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-replicator.html" target="_blank" rel="noopener noreferrer"&gt;Amazon MSK Replicator documentation&lt;/a&gt; includes detailed configuration guidance to help you configure MSK Replicator for your use case. From there, use MSK Replicator to migrate your Apache Kafka workloads to MSK Express brokers.&lt;/p&gt; 
&lt;p&gt;Once your migration is complete, consider exploring multi-region replication patterns for disaster recovery, or integrating your MSK Express cluster with AWS analytics services such as &lt;a href="https://aws.amazon.com/firehose/" target="_blank" rel="noopener noreferrer"&gt;Amazon Data Firehose&lt;/a&gt; and &lt;a href="https://aws.amazon.com/athena/" target="_blank" rel="noopener noreferrer"&gt;Amazon Athena&lt;/a&gt;. If you need help planning your migration, reach out to your AWS account team, &lt;a href="https://aws.amazon.com/support/" target="_blank" rel="noopener noreferrer"&gt;AWS Support&lt;/a&gt; or &lt;a href="https://aws.amazon.com/professional-services/" target="_blank" rel="noopener noreferrer"&gt;AWS Professional Services&lt;/a&gt;.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone wp-image-90062 size-thumbnail" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/10/ankitams-100x133.jpg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ankita Mishra&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/ankitamishra05" target="_blank" rel="noopener"&gt;Ankita&lt;/a&gt; is a Product Manager for Amazon Managed Streaming for Apache Kafka. She works closely with AWS customers to understand their needs for real-time analytics and high throughput, low latency streaming workloads. Working backwards from their needs, she helps drive the MSK roadmap and deliver new innovations that help AWS customers focus on building novel streaming applications.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-89475" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/24/bdb-5775-mmehrten-headshot.png" alt="" width="100" height="107"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Mazrim Mehrtens&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/mmehrtens/" target="_blank" rel="noopener"&gt;Mazrim&lt;/a&gt; is a Sr. Specialist Solutions Architect for messaging and streaming workloads. Mazrim works with customers to build and support systems that process and analyze terabytes of streaming data in real time, run enterprise Machine Learning pipelines, and create systems to share data across teams seamlessly with varying data toolsets and software stacks.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Building unified data pipelines with Apache Iceberg and Apache Flink</title>
		<link>https://aws.amazon.com/blogs/big-data/building-unified-data-pipelines-with-apache-iceberg-and-apache-flink/</link>
					
		
		<dc:creator><![CDATA[Nikhil Jha]]></dc:creator>
		<pubDate>Mon, 20 Apr 2026 16:59:46 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Managed Service for Apache Flink]]></category>
		<category><![CDATA[AWS Big Data]]></category>
		<category><![CDATA[AWS Glue]]></category>
		<category><![CDATA[Financial Services]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">95a314f2dba6484ebfd5ac609fa7a195b4550f34</guid>

					<description>In this post, you build a unified pipeline using Apache Iceberg and Amazon Managed Service for Apache Flink that replaces the dual-pipeline approach. This walkthrough is for intermediate AWS users who are comfortable with Amazon Simple Storage Service (Amazon S3) and AWS Glue Data Catalog but new to streaming from Apache Iceberg tables.</description>
										<content:encoded>&lt;p&gt;You can process real-time data from your data lake with &lt;a href="https://docs.aws.amazon.com/managed-flink/" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Apache Flink&lt;/a&gt; without maintaining two separate pipelines. Yet many teams do exactly that, and the cost adds up fast. In this post, you build a unified pipeline using Apache Iceberg and Amazon Managed Service for Apache Flink that replaces the dual-pipeline approach. This walkthrough is for intermediate AWS users who are comfortable with &lt;a href="https://docs.aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service (Amazon S3)&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html" target="_blank" rel="noopener noreferrer"&gt;AWS Glue Data Catalog&lt;/a&gt; but new to streaming from Apache Iceberg tables.&lt;/p&gt; 
&lt;h2&gt;The dual-pipeline problem&lt;/h2&gt; 
&lt;p&gt;&lt;img loading="lazy" class="wp-image-89997 size-full aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/BDB-5291-image-1-1.png" alt="Traditional dual-pipeline architecture with separate batch and streaming paths, each with its own ingestion, processing, storage, and serving layers, processing the same source data independently." width="962" height="495"&gt;&lt;/p&gt; 
&lt;p&gt;This dual-pipeline approach creates three problems:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Double the infrastructure costs.&lt;/strong&gt; You run and pay for two separate compute environments, two storage layers, and two sets of monitoring. For example, if you’re spending $10,000/month on separate streaming and batch infrastructure, a meaningful portion of that spend is pure duplication.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data synchronization issues.&lt;/strong&gt; Your batch and streaming consumers read from different copies of the data, processed at different times. When a transaction shows up in your real-time dashboard but not in your batch report (or vice versa), debugging the inconsistency takes hours.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Operational complexity.&lt;/strong&gt; Two pipelines mean two deployment processes, two failure modes to monitor, and two sets of schema evolution to manage. Your team spends time reconciling systems instead of building features.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Where this pattern fits&lt;/h2&gt; 
&lt;p&gt;Before diving into the implementation, consider whether streaming from your data lake is the right approach for your use case.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Streaming from Apache Iceberg tables works well when&lt;/strong&gt; you need data available within seconds to minutes and you query recent data frequently, multiple times per hour. Common scenarios include:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Operational data stores&lt;/strong&gt; — Stream customer profile updates to serve downstream applications like recommendation engines. When a customer updates their preferences, those changes reach your operational data store within seconds.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Fraud detection&lt;/strong&gt; — Stream transactions for immediate analysis. Start with a 3-second monitor interval and adjust based on your detection accuracy needs.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Live dashboards&lt;/strong&gt; — Power real-time analytics directly from your lake. This is the strongest starting point if you’re evaluating the approach for the first time, because the feedback loop is immediate and straightforward to validate.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Event-driven architectures&lt;/strong&gt; — Trigger downstream processes based on data changes in your Apache Iceberg tables.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Batch processing remains more cost-effective when&lt;/strong&gt; you process data once per day or less, or you primarily query historical data. Batch queries on Apache Iceberg tables cost less because they don’t require a continuous Apache Flink runtime.&lt;/p&gt; 
&lt;h2&gt;How Apache Iceberg solves this&lt;/h2&gt; 
&lt;p&gt;Apache Iceberg’s snapshot-based architecture removes the need for a separate streaming pipeline. Think of snapshots like Git commits for your data. Each time you write data to your Iceberg table, Iceberg creates a new snapshot that points to the new data files while preserving references to existing files. Apache Flink reads only the changes between snapshots (the new files that arrived after the last checkpoint), rather than scanning the entire table. Atomicity, Consistency, Isolation, Durability (ACID) transactions prevent your concurrent reads and writes from producing partial or inconsistent results. For example, if your batch extract, transform, and load (ETL) job is writing 10,000 records while your Flink application is reading, ACID transactions mean that your streaming query sees either the complete batch of 10,000 records or none of them, not a partial set that could skew your analytics.&lt;/p&gt; 
&lt;p&gt;The result is a single pipeline that handles both real-time and batch access from the same data, through the same storage layer, with the same schema.&lt;/p&gt; 
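&lt;p&gt;To see this snapshot history for yourself, you can query Apache Iceberg’s metadata tables. The following is a minimal sketch, assuming the catalog, database, and &lt;code&gt;customer_events&lt;/code&gt; table configured later in this post, and an Apache Iceberg Flink runtime recent enough to expose metadata tables (check the Apache Iceberg Flink documentation for your version):&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Illustrative sketch: list the snapshots that each write to the table produced.
# Assumes t_env is the table environment created later in this post and that
# glue_catalog.streaming_db.customer_events already exists.
snapshots = t_env.execute_sql(
    "SELECT snapshot_id, committed_at, operation FROM `customer_events$snapshots`"
)
snapshots.print()&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 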
&lt;h2&gt;Solution architecture&lt;/h2&gt; 
&lt;p&gt;Your architecture uses four AWS services and one open source table format working together. The following diagram shows how these components connect, replacing the dual-pipeline pattern shown earlier with a single unified flow.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="size-full wp-image-89963 aligncenter" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/BDB-5291-image-2.png" alt="Unified pipeline architecture with data flowing from Amazon S3 through Apache Iceberg tables, with AWS Glue Data Catalog managing metadata, and Amazon Managed Service for Apache Flink consuming incremental snapshots for near real-time processing." width="1101" height="581"&gt;&lt;/p&gt; 
&lt;p&gt;Your source data lands in Amazon S3 as Apache Iceberg table files. AWS Glue Data Catalog tracks the metadata and schema. When new data arrives, Apache Iceberg creates a new snapshot that your application detects. Your Flink application monitors these snapshots and processes new records incrementally, reading only the files that arrived after the last checkpoint, not the entire table.&lt;/p&gt; 
&lt;p&gt;You use four main components:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt; — Foundational storage layer for your data lake&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data Catalog&lt;/strong&gt; — Metadata and schema management for Apache Iceberg tables&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Apache Iceberg&lt;/strong&gt; — Table format with snapshot-based streaming capabilities&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon Managed Service for Apache Flink&lt;/strong&gt; — Stream processing and incremental consumption&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Important notices&lt;/h2&gt; 
&lt;p&gt;Before implementing this solution, evaluate these risks for your environment:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Data security:&lt;/strong&gt; Streaming from data lakes exposes data to additional processing systems. Classify your data before implementation: customer profile updates and transaction data typically contain personally identifiable information (PII), so treat them as confidential. Apply encryption at rest and in transit for confidential data. Key risks include unauthorized data access through misconfigured Amazon S3 bucket policies or overly permissive IAM roles. Mitigations: use the resource-scoped IAM policy and TLS-enforcing bucket policy provided in the Security section.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Data integrity:&lt;/strong&gt; Misconfigured checkpoints or schema changes during streaming can lead to data inconsistency. Mitigations: enable exactly-once processing semantics and test schema evolution in a non-production environment first.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Compliance:&lt;/strong&gt; Verify that real-time data processing meets your regulatory requirements. For workloads subject to HIPAA, confirm that you use HIPAA Eligible Services and have a Business Associate Agreement (BAA) with AWS. For PCI-DSS or GDPR workloads, review the relevant compliance documentation on the AWS Compliance page. Implement data retention policies that comply with your regulatory framework.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Nearly continuous streaming incurs ongoing compute costs. Monitor usage to avoid unexpected charges. Cost estimates in this post are based on pricing as of March 2026 and might change. Verify current pricing on the relevant AWS service pricing pages.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Operational:&lt;/strong&gt; Pipeline failures might impact downstream systems. Implement monitoring and alerting before running in production.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;Before you begin, make sure that you have the following in place. This walkthrough assumes intermediate Python skills (comfortable with functions, error handling, and environment variables), basic Apache Flink concepts (streaming compared to batch processing), and basic &lt;a href="https://docs.aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (AWS IAM)&lt;/a&gt; knowledge (creating roles and attaching policies). Plan for approximately 90–120 minutes, including setup, implementation, and testing. First-time setup might take longer as you download dependencies and configure AWS resources. Expected AWS costs: approximately $5–10 if you complete the walkthrough within 2 hours and clean up resources immediately afterward. The primary cost driver is Amazon Managed Service for Apache Flink runtime ($0.11/hour per Kinesis Processing Unit (KPU)). You can minimize costs by stopping your application when not in use.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An AWS account with AWS IAM permissions for: &lt;code&gt;s3:GetObject&lt;/code&gt;, &lt;code&gt;s3:PutObject&lt;/code&gt;, &lt;code&gt;s3:ListBucket&lt;/code&gt; on your data bucket; &lt;code&gt;glue:GetDatabase&lt;/code&gt;, &lt;code&gt;glue:GetTable&lt;/code&gt; for catalog access; and &lt;code&gt;flink:CreateApplication&lt;/code&gt;, &lt;code&gt;flink:StartApplication&lt;/code&gt; for Amazon Managed Service for Apache Flink&lt;/li&gt; 
 &lt;li&gt;An existing Amazon S3 bucket for your data lake&lt;/li&gt; 
 &lt;li&gt;An AWS Glue Data Catalog database configured&lt;/li&gt; 
 &lt;li&gt;Apache Flink 1.19.1 installed locally&lt;/li&gt; 
 &lt;li&gt;Python 3.8 or later&lt;/li&gt; 
 &lt;li&gt;Java 11 or a more recent version&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI)&lt;/a&gt; configured with credentials (aws configure)&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Required Java Archive (JAR) dependencies&lt;/h3&gt; 
&lt;p&gt;You need multiple JAR files because your Flink application coordinates between different systems—Amazon S3 for storage, AWS Glue for metadata, Hadoop for file operations, and Apache Iceberg for the table format. Each JAR handles a specific part of this integration. Missing even one causes ClassNotFoundException errors at runtime.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;iceberg-flink-runtime-1.19-1.6.1.jar — Core Apache Iceberg integration with Apache Flink&lt;/li&gt; 
 &lt;li&gt;iceberg-aws-bundle-1.6.1.jar — AWS-specific Apache Iceberg functionality for Amazon S3 and AWS Glue&lt;/li&gt; 
 &lt;li&gt;flink-s3-fs-hadoop-1.19.1.jar — Provides Apache Flink read and write access to Amazon S3&lt;/li&gt; 
 &lt;li&gt;flink-sql-connector-hive-3.1.3_2.12-1.19.1.jar — Hive metastore connector for catalog compatibility&lt;/li&gt; 
 &lt;li&gt;hadoop-common-3.4.0.jar — Core Hadoop libraries required by Apache Iceberg&lt;/li&gt; 
 &lt;li&gt;flink-shaded-hadoop-2-uber-2.8.3-10.0.jar — Repackaged Hadoop dependencies that avoid version conflicts with Apache Flink&lt;/li&gt; 
 &lt;li&gt;hadoop-hdfs-client-3.4.0.jar — Hadoop Distributed File System (HDFS) client libraries for file system operations&lt;/li&gt; 
 &lt;li&gt;flink-json-1.19.1.jar — JSON format support for Apache Flink&lt;/li&gt; 
 &lt;li&gt;hadoop-aws-3.4.0.jar — Hadoop integration with AWS services&lt;/li&gt; 
 &lt;li&gt;hadoop-client-3.4.0.jar — Hadoop client libraries&lt;/li&gt; 
 &lt;li&gt;aws-java-sdk-bundle-1.12.261.jar — AWS SDK for authentication and service access&lt;/li&gt; 
&lt;/ul&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;jars = [
    "flink-s3-fs-hadoop-1.19.1.jar",
    "flink-sql-connector-hive-3.1.3_2.12-1.19.1.jar",
    "hadoop-common-3.4.0.jar",
    "flink-shaded-hadoop-2-uber-2.8.3-10.0.jar",
    "iceberg-flink-runtime-1.19-1.6.1.jar",
    "iceberg-aws-bundle-1.6.1.jar",
    "hadoop-hdfs-client-3.4.0.jar",
    "flink-json-1.19.1.jar",
    "hadoop-aws-3.4.0.jar",
    "hadoop-client-3.4.0.jar",
    "aws-java-sdk-bundle-1.12.261.jar"
]&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Technical implementation&lt;/h2&gt; 
&lt;p&gt;The sample code in this post is available under the MIT-0 license. This section walks you through building the streaming pipeline step by step. You create a single Python file, &lt;code&gt;iceberg_streaming.py&lt;/code&gt;, with three functions that run in sequence. Your &lt;code&gt;main()&lt;/code&gt; function calls them in order: set up the Apache Flink environment, register the Data Catalog, then start the streaming query.&lt;/p&gt; 
&lt;h3&gt;Set up your Apache Flink environment&lt;/h3&gt; 
&lt;p&gt;To prepare your Apache Flink environment:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Download the required JAR files listed in the prerequisites section.&lt;/li&gt; 
 &lt;li&gt;Place the JAR files in a lib directory in your project folder.&lt;/li&gt; 
 &lt;li&gt;Configure your &lt;code&gt;HADOOP_CLASSPATH&lt;/code&gt; environment variable to point to the lib directory.&lt;/li&gt; 
 &lt;li&gt;Create your streaming execution environment by adding the following function to &lt;code&gt;iceberg_streaming.py&lt;/code&gt;:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import os

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import EnvironmentSettings, StreamTableEnvironment

def setup_environment():
    """Configure the Flink streaming runtime."""
    try:
        os.environ['HADOOP_CLASSPATH'] = os.path.join(os.getcwd(), 'lib', '*')
        env = StreamExecutionEnvironment.get_execution_environment()
        env.set_parallelism(1)
        settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
        t_env = StreamTableEnvironment.create(env, settings)
        return t_env
    except Exception as e:
        print(f"Failed to initialize Flink environment: {e}")
        raise&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;Verify your environment by running &lt;code&gt;flink --version&lt;/code&gt;. If the command isn’t found, confirm that Apache Flink 1.19.1 is installed and that your PATH includes the Flink bin directory.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Configure AWS Glue Data Catalog&lt;/h3&gt; 
&lt;p&gt;To connect your Flink application to Data Catalog:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open your &lt;code&gt;iceberg_streaming.py&lt;/code&gt; file.&lt;/li&gt; 
 &lt;li&gt;Add the &lt;code&gt;create_iceberg_source()&lt;/code&gt; function shown in the following section.&lt;/li&gt; 
 &lt;li&gt;Replace the placeholder values with your actual AWS resources before running. These values are static configuration strings, not user input — do not construct them from external or untrusted sources at runtime.&lt;/li&gt; 
 &lt;li&gt;Save the file.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def create_iceberg_source(t_env):
    """Register the AWS Glue Data Catalog as an Iceberg catalog."""
    try:
        catalog_sql = """
        CREATE CATALOG glue_catalog WITH (
            'type'='iceberg',
            'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
            'warehouse'='s3://&amp;lt;example-data-lake-bucket&amp;gt;',
            'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
            'aws.region'='us-east-1',
            'hadoop-conf.fs.s3a.aws.credentials.provider'=
                'com.amazonaws.auth.DefaultAWSCredentialsProviderChain',
            'hadoop-conf.fs.s3a.endpoint'='s3.amazonaws.com',
            'property-version'='1'
        )
        """
        t_env.execute_sql(catalog_sql)
        t_env.use_catalog("glue_catalog")
        t_env.use_database("streaming_db")
    except Exception as e:
        print(f"Failed to configure Iceberg catalog: {e}")
        raise&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
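&lt;p&gt;If the &lt;code&gt;customer_events&lt;/code&gt; table used in the next step doesn’t exist yet in your AWS Glue database, the following is a minimal sketch of creating one with Flink SQL. The schema and partition column are illustrative assumptions chosen to match the fields validated later in &lt;code&gt;process_record()&lt;/code&gt;; adjust them to your actual data:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def create_table_if_missing(t_env):
    """Optional helper: create an example customer_events table (illustrative schema)."""
    try:
        t_env.execute_sql("""
        CREATE TABLE IF NOT EXISTS customer_events (
            event_type STRING,
            `timestamp` STRING,
            payload STRING,
            event_date DATE
        ) PARTITIONED BY (event_date)
        """)
    except Exception as e:
        print(f"Failed to create customer_events table: {e}")
        raise&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 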
&lt;h3&gt;Set up streaming logic&lt;/h3&gt; 
&lt;p&gt;This function configures Apache Flink to monitor your Apache Iceberg table continuously and process new records as they arrive. Checkpointing runs every 10 seconds to track progress; if the job restarts, it resumes from the last checkpoint rather than reprocessing the entire table. Notice the &lt;code&gt;monitor-interval&lt;/code&gt; parameter: it controls how frequently Apache Flink checks for new Apache Iceberg snapshots. A 3-second interval provides near real-time processing but generates approximately 1,200 Amazon S3 LIST API calls per hour (at $0.005 per 1,000 requests, roughly $4/month per table based on pricing as of March 2026). For less time-sensitive workloads, increase this to 30s to reduce API costs by 90%. Replace &lt;code&gt;customer_events&lt;/code&gt; with the name of your Apache Iceberg table in Data Catalog:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def process_record(row):
    """Validate and process each record from the stream."""
    try:
        if row is None:
            raise ValueError("Received null row")
        required_fields = ["event_type", "timestamp"]
        for field in required_fields:
            if field not in row:
                raise ValueError(f"Missing required field: {field}")
        # Validate field types and content
        if not isinstance(row.get("event_type"), str) or len(row["event_type"]) &amp;gt; 256:
            raise ValueError("event_type must be a string under 256 characters")
        if not isinstance(row.get("timestamp"), (str, int)):
            raise ValueError("timestamp must be a string or integer")
        # Replace with your business logic
        print(f"Processing record: {row}")
    except ValueError as e:
        print(f"Validation error for record {row}: {e}")
    except Exception as e:
        print(f"Error processing record {row}: {e}")
def stream_data(t_env):
    """Start the streaming query and process results."""
    try:
        configuration = t_env.get_config().get_configuration()
        configuration.set_string("table.dynamic-table-options.enabled", "true")
        configuration.set_string("execution.checkpointing.interval", "10000")
        query = """
        SELECT * FROM customer_events /*+ OPTIONS(
            'streaming'='true',
            'monitor-interval'='3s',
            'table.exec.iceberg.cell-based-snapshot'='true'
        ) */
        """
        table_result = t_env.execute_sql(query)
        with table_result.collect() as results:
            for row in results:
                process_record(row)
    except Exception as e:
        print(f"Streaming query failed: {e}")
        raise&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Putting it together&lt;/h3&gt; 
&lt;p&gt;Your &lt;code&gt;main()&lt;/code&gt; function calls the three steps in order:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;def main():
    try:
        t_env = setup_environment()
        create_iceberg_source(t_env)
        stream_data(t_env)
    except Exception as e:
        print(f"Pipeline failed: {e}")
        raise
if __name__ == "__main__":
    main()&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Run the pipeline locally with &lt;code&gt;python iceberg_streaming.py&lt;/code&gt;. Then package the application and submit it to Amazon Managed Service for Apache Flink using the console or the AWS Command Line Interface (AWS CLI).&lt;/p&gt; 
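&lt;p&gt;If you prefer to script that last step, the following is a minimal boto3 sketch that starts an application you have already created in Amazon Managed Service for Apache Flink. The application name and Region are placeholders, not values defined elsewhere in this post:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

# Hypothetical application name and Region; replace with your own values.
flink = boto3.client("kinesisanalyticsv2", region_name="us-east-1")
flink.start_application(
    ApplicationName="iceberg-streaming-app",
    RunConfiguration={"FlinkRunConfiguration": {"AllowNonRestoredState": False}},
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 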
&lt;h2&gt;Running in production&lt;/h2&gt; 
&lt;p&gt;Moving from a local test to a production deployment requires tuning four areas: performance, monitoring, cost, and security. This section covers the key decisions for each.&lt;/p&gt; 
&lt;h3&gt;Performance tuning&lt;/h3&gt; 
&lt;p&gt;Determine your latency requirements before tuning. For fraud detection, you need subsecond processing. For daily reporting dashboards, you can tolerate minutes of delay.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Partition pruning&lt;/strong&gt; reduces the amount of data scanned per query. Proper partitioning can significantly reduce query times for time series data partitioned by date. To implement, create your Apache Iceberg table with partition columns (&lt;code&gt;PARTITIONED BY (date_column)&lt;/code&gt; in your &lt;code&gt;CREATE TABLE&lt;/code&gt; statement), then include partition filters in your &lt;code&gt;WHERE&lt;/code&gt; clause: &lt;code&gt;WHERE date_column &amp;gt;= CURRENT_DATE - INTERVAL '7' DAY&lt;/code&gt;.&lt;/p&gt; 
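&lt;p&gt;For example, a partition-pruned streaming read might look like the following sketch, assuming your table is partitioned by a hypothetical &lt;code&gt;event_date&lt;/code&gt; column (substitute your own partition column and interval):&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Sketch: streaming read that only scans the last seven days of partitions.
pruned_query = """
SELECT * FROM customer_events /*+ OPTIONS(
    'streaming'='true',
    'monitor-interval'='30s'
) */
WHERE event_date &amp;gt;= CURRENT_DATE - INTERVAL '7' DAY
"""
table_result = t_env.execute_sql(pruned_query)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 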
&lt;p&gt;&lt;strong&gt;Parallel processing&lt;/strong&gt; matches your data volume and throughput requirements. For most workloads under 10,000 records per second, a parallelism of 1–4 is sufficient. Scale up incrementally and monitor backpressure metrics (indicators that data arrives faster than your pipeline processes it, causing queuing) to find the right setting.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Checkpoint tuning&lt;/strong&gt; balances reliability and latency. Consider how much data you can afford to reprocess after a failure. If you process 1,000 records per second with 10-second checkpoints, a failure means reprocessing up to 10,000 records. When that’s acceptable, 10 seconds works well. For faster recovery or higher volumes, reduce to 5 seconds.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Resource allocation&lt;/strong&gt; — Right-size your Apache Flink cluster to avoid over-provisioning. Monitor CPU and memory utilization during your initial runs and adjust task manager resources accordingly.&lt;/p&gt; 
&lt;h3&gt;Monitoring&lt;/h3&gt; 
&lt;p&gt;Configure your production deployment with the following checkpoint settings. These work well for moderate data volumes (up to 10,000 records per second), providing exactly-once processing semantics. This means that the pipeline processes each record exactly once, even if your application restarts. Adjust the checkpoint interval based on your latency requirements. Add this to your setup_environment() function after creating the table environment.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;config_dict = {
    "execution.checkpointing.interval": "30000",
    "execution.checkpointing.mode": "EXACTLY_ONCE",
    "execution.checkpointing.timeout": "600000",
    "state.backend": "filesystem",
    "state.checkpoints.dir": "s3://&amp;lt;example-data-lake-bucket&amp;gt;/checkpoints"
}

# Apply each setting to the table environment (t_env) created in setup_environment()
configuration = t_env.get_config().get_configuration()
for key, value in config_dict.items():
    configuration.set_string(key, value)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Use &lt;a href="https://docs.aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; to track checkpoint duration, records processed per second, and backpressure metrics. A 10-second checkpoint interval means writing state to Amazon S3 360 times per hour. For a 1 MB state size, that’s approximately 8.6 GB per day in checkpoint storage—at Amazon S3 Standard pricing of $0.023/GB, roughly $0.20/day or $6/month per application based on current pricing. If the checkpoint duration exceeds 50% of your interval, increase the interval or add parallelism.&lt;/p&gt; 
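&lt;p&gt;If you publish your own pipeline metrics, the following is a minimal boto3 sketch; the namespace, metric name, and value are illustrative assumptions rather than names used elsewhere in this post:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

# Hypothetical custom metric; emit it from your processing loop or a sidecar job.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
cloudwatch.put_metric_data(
    Namespace="IcebergStreamingPipeline",
    MetricData=[{
        "MetricName": "RecordsProcessed",
        "Value": 1500,
        "Unit": "Count",
    }],
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 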
&lt;h3&gt;Cost management&lt;/h3&gt; 
&lt;p&gt;Use &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/intelligent-tiering.html" target="_blank" rel="noopener noreferrer"&gt;Amazon S3 Intelligent-Tiering&lt;/a&gt; for your Apache Iceberg data files, which typically have predictable access patterns after initial processing. Configure Apache Iceberg’s snapshot expiration to clean up older snapshots automatically. This can reduce storage costs by an estimated 20–30%, though your results will vary depending on write frequency and retention policies.&lt;/p&gt; 
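&lt;p&gt;As an example of that snapshot cleanup, the following sketch sets Apache Iceberg’s snapshot retention properties on the table. These properties only define the retention window; the expiration itself still runs as part of your table maintenance process (for example, a scheduled maintenance job or an AWS Glue table optimizer), so treat this as a starting point and the values as illustrative:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Sketch: keep roughly one day of snapshot history (values are illustrative).
t_env.execute_sql("""
ALTER TABLE customer_events SET (
    'history.expire.max-snapshot-age-ms'='86400000',
    'history.expire.min-snapshots-to-keep'='5'
)
""")&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 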
&lt;p&gt;Right-size your Apache Flink resources based on actual throughput needs. Start with a minimal configuration and scale up based on observed backpressure and checkpoint duration metrics. Use &lt;a href="https://docs.aws.amazon.com/ec2/" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt; Spot Instances where workload interruptions are acceptable, for example, in development and testing environments.&lt;/p&gt; 
&lt;p&gt;Set data retention policies on both your Apache Iceberg tables and checkpoint storage to avoid storing data longer than necessary.&lt;/p&gt; 
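&lt;p&gt;One way to enforce retention on checkpoint storage is an Amazon S3 lifecycle rule. The following boto3 sketch assumes the &lt;code&gt;checkpoints/&lt;/code&gt; prefix used earlier and a hypothetical 14-day retention period; the bucket name is a placeholder:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

# Expire checkpoint objects after 14 days (adjust the prefix and retention to your needs).
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="&amp;lt;example-data-lake-bucket&amp;gt;",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-old-checkpoints",
            "Filter": {"Prefix": "checkpoints/"},
            "Status": "Enabled",
            "Expiration": {"Days": 14},
        }]
    },
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 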
&lt;h3&gt;Security&lt;/h3&gt; 
&lt;p&gt;Security is a &lt;a href="https://aws.amazon.com/compliance/shared-responsibility-model/" target="_blank" rel="noopener noreferrer"&gt;shared responsibility&lt;/a&gt; between you and AWS. AWS is responsible for the security of the cloud, including the hardware, software, networking, and facilities that run AWS services. You are responsible for security in the cloud, configuring access controls, encrypting data, and managing your application security. Apply these controls in priority order.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;AWS IAM roles&lt;/strong&gt; — Use AWS IAM roles with least-privilege access, scoped to specific resources. The following example policy restricts permissions to your data lake bucket and AWS Glue catalog:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::&amp;lt;example-data-lake-bucket&amp;gt;/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::&amp;lt;example-data-lake-bucket&amp;gt;",
      "Condition": {
        "StringEquals": {
          "aws:SourceVpce": "&amp;lt;your-vpc-endpoint-id&amp;gt;"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": ["glue:GetDatabase", "glue:GetTable"],
      "Resource": [
        "arn:aws:glue:us-east-1:&amp;lt;account-id&amp;gt;:catalog",
        "arn:aws:glue:us-east-1:&amp;lt;account-id&amp;gt;:database/streaming_db",
        "arn:aws:glue:us-east-1:&amp;lt;account-id&amp;gt;:table/streaming_db/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
      "Resource": "arn:aws:kms:us-east-1:&amp;lt;account-id&amp;gt;:key/&amp;lt;your-kms-key-id&amp;gt;"
    }
  ]
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Scoping permissions to specific Amazon S3 buckets, AWS Glue databases, and AWS Key Management Service (AWS KMS) keys restricts access to only the resources your pipeline requires. Review IAM policies quarterly using &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/what-is-access-analyzer.html" target="_blank" rel="noopener noreferrer"&gt;IAM Access Analyzer&lt;/a&gt; to identify and remove unused permissions.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Encryption&lt;/strong&gt; — Configure server-side encryption with &lt;a href="https://docs.aws.amazon.com/kms/" target="_blank" rel="noopener noreferrer"&gt;AWS Key Management Service (AWS KMS)&lt;/a&gt; customer managed keys (SSE-KMS) for your Amazon S3 buckets. Using customer managed keys requires additional review from your security team. Confirm your key management policies, rotation procedures, and access controls before implementation. Enable automatic key rotation annually. For encryption in transit, enforce TLS by adding a bucket policy that denies non-HTTPS access:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;{
  "Effect": "Deny",
  "Principal": "*",
  "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
  "Resource": [
    "arn:aws:s3:::&amp;lt;example-data-lake-bucket&amp;gt;/*",
    "arn:aws:s3:::&amp;lt;example-data-lake-bucket&amp;gt;"
  ],
  "Condition": {
    "Bool": { "aws:SecureTransport": "false" }
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
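&lt;p&gt;To set the SSE-KMS default encryption described above, the following boto3 sketch configures default bucket encryption with a customer managed key; the bucket name and key ARN are placeholders:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

# Default-encrypt new objects with a customer managed KMS key (placeholders shown).
s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="&amp;lt;example-data-lake-bucket&amp;gt;",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "arn:aws:kms:us-east-1:&amp;lt;account-id&amp;gt;:key/&amp;lt;your-kms-key-id&amp;gt;",
            },
            "BucketKeyEnabled": True,
        }]
    },
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 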
&lt;p&gt;&lt;strong&gt;Amazon S3 bucket hardening&lt;/strong&gt; — Enable Block Public Access on your buckets to prevent accidental public exposure:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;aws s3api put-public-access-block \
  --bucket &amp;lt;example-data-lake-bucket&amp;gt; \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Enable versioning on buckets that store critical data and checkpoints to protect against accidental deletion. For production environments with sensitive data, consider enabling MFA Delete on versioned buckets. Enable S3 server access logging to track requests for security auditing.&lt;/p&gt; 
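&lt;p&gt;The following boto3 sketch shows one way to apply the versioning and access-logging recommendations above; the bucket names are placeholders, and the logging bucket is a hypothetical separate bucket with restricted access:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

s3 = boto3.client("s3")

# Protect critical data and checkpoints against accidental deletion.
s3.put_bucket_versioning(
    Bucket="&amp;lt;example-data-lake-bucket&amp;gt;",
    VersioningConfiguration={"Status": "Enabled"},
)

# Send server access logs to a separate, restricted logging bucket.
s3.put_bucket_logging(
    Bucket="&amp;lt;example-data-lake-bucket&amp;gt;",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "&amp;lt;example-logging-bucket&amp;gt;",
            "TargetPrefix": "s3-access-logs/",
        }
    },
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 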
&lt;p&gt;&lt;a href="https://aws.amazon.com/vpc/" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon Virtual Private Cloud (Amazon VPC)&lt;/strong&gt;&lt;/a&gt; –Use &lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html" target="_blank" rel="noopener noreferrer"&gt;Amazon VPC endpoints&lt;/a&gt; for private communication between your Apache Flink cluster and AWS services, removing public internet routing by keeping traffic within the AWS network.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Access logging&lt;/strong&gt; – Enable &lt;a href="https://docs.aws.amazon.com/cloudtrail/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudTrail&lt;/a&gt; data events to log Amazon S3 object-level API calls (GetObject, PutObject) and Data Catalog API calls. Store logs in a separate Amazon S3 bucket with restricted access and enable log file integrity validation. Run regular compliance checks using &lt;a href="https://docs.aws.amazon.com/config/" target="_blank" rel="noopener noreferrer"&gt;AWS Config&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;Operational practices&lt;/h3&gt; 
&lt;p&gt;Set up a continuous integration and continuous deployment (CI/CD) pipeline to automate deployment and testing. Use version control to track schema and code changes. With Apache Iceberg’s schema evolution support, you can add columns without rewriting existing data files. Establish rollback procedures using Apache Iceberg’s snapshot-based architecture, so you can roll back to a previous table state if a bad write corrupts your data.&lt;/p&gt; 
&lt;h2&gt;Troubleshooting&lt;/h2&gt; 
&lt;p&gt;If you run into issues during setup or execution, use the following table to diagnose common errors.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Error&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Cause&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;ClassNotFoundException&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Missing JAR files&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Check the dependencies in your lib directory and confirm &lt;code&gt;HADOOP_CLASSPATH&lt;/code&gt; points to the correct path&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Table not found&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Database name mismatch&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Check that the database name in &lt;code&gt;t_env.use_database()&lt;/code&gt; matches the AWS Glue database where you registered your table&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Checkpoint failures&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Amazon S3 permissions&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Check that your Amazon S3 bucket policy grants &lt;code&gt;s3:PutObject&lt;/code&gt; for the checkpoint location&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;AWS credential errors&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Missing AWS IAM configuration&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Check that the AWS IAM role attached to your Apache Flink application has &lt;code&gt;glue:GetTable&lt;/code&gt;, &lt;code&gt;glue:GetDatabase&lt;/code&gt;, and &lt;code&gt;s3:GetObject&lt;/code&gt; permissions on the relevant resources&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Snapshot not found&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Table modified during query&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Increase monitor-interval or implement retry logic in your &lt;code&gt;process_record()&lt;/code&gt; function&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Schema mismatch&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Table schema changed between snapshots&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Review Apache Iceberg schema evolution settings and confirm backward compatibility&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
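&lt;p&gt;For the “Snapshot not found” case, the following is a minimal sketch of retry logic around &lt;code&gt;process_record()&lt;/code&gt;. It assumes you change &lt;code&gt;process_record()&lt;/code&gt; to re-raise the errors you consider transient instead of only printing them:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import time

def process_record_with_retry(row, max_attempts=3, backoff_seconds=2):
    """Illustrative wrapper that retries transient failures with a simple backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            process_record(row)
            return
        except Exception as e:
            if attempt == max_attempts:
                print(f"Giving up on record after {attempt} attempts: {e}")
                raise
            print(f"Attempt {attempt} failed ({e}); retrying in {backoff_seconds * attempt}s")
            time.sleep(backoff_seconds * attempt)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 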
&lt;h2&gt;Clean up&lt;/h2&gt; 
&lt;p&gt;To avoid ongoing charges, delete the resources that you created during this walkthrough.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Stop your Amazon Managed Service for Apache Flink application. Open the &lt;a href="https://console.aws.amazon.com/flink/" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Apache Flink console&lt;/a&gt;, choose your application name, choose &lt;strong&gt;Stop&lt;/strong&gt;, and confirm the action. Or use the AWS CLI:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;code&gt;aws kinesisanalyticsv2 stop-application --application-name your-app-name&lt;/code&gt;&lt;/p&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Delete the Amazon S3 buckets that you created for data storage and checkpoints. For instructions, see &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html" target="_blank" rel="noopener noreferrer"&gt;Deleting a bucket&lt;/a&gt; in the Amazon S3 User Guide.&lt;/li&gt; 
 &lt;li&gt;Remove the Apache Iceberg tables from your &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/console-tables.html" target="_blank" rel="noopener noreferrer"&gt;Data Catalog&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Delete the &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_manage_delete.html" target="_blank" rel="noopener noreferrer"&gt;AWS IAM roles and policies&lt;/a&gt; created specifically for this walkthrough.&lt;/li&gt; 
 &lt;li&gt;If you created an Amazon VPC or Amazon VPC endpoints for testing, &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/delete-vpc.html" target="_blank" rel="noopener noreferrer"&gt;delete those resources&lt;/a&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Maintaining separate streaming and batch pipelines doubles your infrastructure costs, creates data synchronization issues, and adds operational complexity that slows your team down. In this post, you replaced that dual-pipeline architecture with a single system built on Apache Iceberg and Amazon Managed Service for Apache Flink. You configured a Flink environment with the required JAR dependencies, connected it to Data Catalog, and implemented streaming queries that read new records incrementally with exactly-once processing semantics. The same data, the same storage layer, the same schema—accessible to both your real-time and batch consumers.&lt;/p&gt; 
&lt;p&gt;To extend this solution, try these next steps based on your use case:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;If you’re processing high volumes (&amp;gt;10,000 records/sec):&lt;/strong&gt; Start with partition pruning. Add &lt;code&gt;PARTITIONED BY (date_column)&lt;/code&gt; to your table definition; this typically reduces query times by 60–80%.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;If you need production monitoring:&lt;/strong&gt; Implement custom &lt;a href="https://docs.aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; metrics. Track checkpoint duration, records processed per second, and backpressure to catch issues before they impact your pipeline.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;If you have variable workloads:&lt;/strong&gt; Configure auto scaling for your Apache Flink cluster. See the &lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/what-is.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Apache Flink Developer Guide&lt;/a&gt; for detailed guidance.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Share your implementation experience in the comments: your use case, data volumes, latency improvements, and cost reductions help other readers calibrate their expectations. To get started, see the &lt;a href="https://docs.aws.amazon.com/managed-flink/latest/java/what-is.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Service for Apache Flink Developer Guide&lt;/a&gt; and the &lt;a href="https://iceberg.apache.org/docs/latest/" target="_blank" rel="noopener noreferrer"&gt;Apache Iceberg documentation&lt;/a&gt; on the Apache Iceberg website.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/BDB-5291-image-3.jpeg" alt="Headshot of Nikhil" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Nikhil Jha&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Nikhil Jha&lt;/strong&gt;&amp;nbsp;is a Principal Delivery Consultant at AWS Professional Services, helping enterprises navigate complex modernization journeys. He builds data and AI solutions for AWS customers. Outside of work he likes swimming and hiking.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/BDB-5291-image-4-269x300.png" alt="Headshot of Vyas" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Vyas Garigipati&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Vyas Garigipati&lt;/strong&gt;&amp;nbsp;is a Delivery Consultant at AWS Professional Services, with experience building scalable, distributed systems. He specializes in designing and building AI-powered, high-availability, multi-region architectures and helps customers deploy resilient, production ready solutions on AWS.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/BDB-5291-image-5.jpeg" alt="Headshot of Vafa" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Vafa Ahmadiyeh&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Vafa Ahmadiyeh&lt;/strong&gt;&amp;nbsp;is a Principal Lead Technologist at AWS, specializing in cloud architecture for the global financial services sector. He partners with major financial institutions to modernize their infrastructure and accelerate their migration to AWS, with a focus on building secure, scalable distributed systems and platforms designed for highly regulated environments.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/BDB-5291-image-6.png" alt="Headshot of Kaushal" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Kaushal (KK) Agrawal&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;Kaushal (KK) Agrawal &lt;/strong&gt;is a Principal Technology Delivery Leader for the Digital Native Segment of AWS Professional Services, working with top-tier customers to deliver innovation at the intersection of AI and Cloud.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Securely connecting on-premises data systems to Amazon Redshift with IAM Roles Anywhere</title>
		<link>https://aws.amazon.com/blogs/big-data/securely-connecting-on-premises-data-systems-to-amazon-redshift-with-iam-roles-anywhere/</link>
					
		
		<dc:creator><![CDATA[Zainab Syeda]]></dc:creator>
		<pubDate>Mon, 20 Apr 2026 14:59:23 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Redshift]]></category>
		<category><![CDATA[AWS Identity and Access Management (IAM)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">4bd8063cb4b628fa724d5726fbf0f5655a8407a5</guid>

					<description>In this post, you will learn how to use AWS IAM Roles Anywhere with Amazon Redshift for secure, private connections. This removes the need to expose traffic to the public internet or manage long-lived access keys.</description>
										<content:encoded>&lt;p&gt;Securely connecting on-premises data systems to &lt;a href="https://aws.amazon.com/redshift/" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift&lt;/a&gt; requires removing static credentials while preserving seamless access for your data teams. This solution extends connectivity from your on-premises data centers to Amazon Redshift by using short-lived, auditable credentials. All traffic remains within trusted, private channels.&lt;/p&gt; 
&lt;p&gt;Developers and data engineers need a process to run ingestion pipelines, Extract, Transform, Load (ETL) jobs, and analytics queries without managing static credentials or complex authentication flows. You can use &lt;a href="https://aws.amazon.com/iam/roles-anywhere/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM) Roles Anywhere&lt;/a&gt; to obtain temporary security credentials in IAM. This service extends the short-term credential model of AWS beyond the cloud and allows on-premises workloads to authenticate with IAM using X.509 certificates from an existing certificate authority. This approach removes static IAM access keys and applies least-privilege access through IAM policies. Every request is recorded in &lt;a href="https://aws.amazon.com/cloudtrail/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudTrail&lt;/a&gt;. Paired with private Domain Name System (DNS) and Amazon Virtual Private Cloud (Amazon VPC) endpoints for Amazon Redshift, it keeps authentication and data flows inside private networks without traversing the public internet.&lt;/p&gt; 
&lt;p&gt;In this post, you will learn how to use AWS IAM Roles Anywhere with Amazon Redshift for secure, private connections. This removes the need to expose traffic to the public internet or manage long-lived access keys.&lt;/p&gt; 
&lt;h2&gt;The challenge&lt;/h2&gt; 
&lt;p&gt;Organizations connecting on-premises data systems to Amazon Redshift typically choose from several established security patterns, each with tradeoffs in risk, complexity, and operational overhead. Static IAM access keys are straightforward to adopt but require ongoing rotation, secure distribution, and storage across systems. Their long-lived nature increases the impact of accidental exposure in code, configuration files, or logs. Shared database or service credentials can streamline setup but often reduce auditability, weaken least-privilege controls, and create accountability challenges across teams. VPN or private network connections improve network isolation, yet they still require strong application-layer authentication and add infrastructure management burdens. Custom secret-management or credential-brokering solutions can reduce reliance on long-lived credentials, but they introduce additional components that must be built, integrated, and maintained. As organizations scale, these patterns often force tradeoffs between strong security controls and the developer productivity needed to build and operate data pipelines efficiently.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;The solution integrates on-premises workloads with Amazon Redshift using IAM Roles Anywhere and the built-in IAM authentication of Amazon Redshift. The core idea is that on-premises workloads use X.509 certificates to obtain short-term IAM credentials, then exchange them for temporary Amazon Redshift database credentials. Both provisioned clusters and serverless workgroups are supported. The architecture consists of these main components:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon Redshift Service Endpoint&lt;/strong&gt; – Handles secure API calls such as &lt;a href="https://docs.aws.amazon.com/redshift/latest/APIReference/API_GetClusterCredentials.html" target="_blank" rel="noopener noreferrer"&gt;GetClusterCredentials&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/redshift-serverless/latest/APIReference/API_GetCredentials.html" target="_blank" rel="noopener noreferrer"&gt;GetCredentials&lt;/a&gt;, and &lt;a href="https://docs.aws.amazon.com/redshift/latest/APIReference/API_GetClusterCredentialsWithIAM.html" target="_blank" rel="noopener noreferrer"&gt;GetClusterCredentialsWithIAM&lt;/a&gt;. The on-premises workload uses these API endpoints to request temporary database credentials.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon Redshift Cluster Endpoint&lt;/strong&gt; – Provides the connection point for database operations on provisioned Amazon Redshift clusters. After obtaining temporary credentials, applications and tools like JDBC/ODBC drivers or psql connect to the cluster endpoint. They use this connection to execute SQL queries, load data, and perform analytics tasks.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Amazon Redshift Serverless Workgroup Endpoint &lt;/strong&gt;– Serves the same function as the cluster endpoint but for serverless deployments. After temporary credentials are retrieved through the&amp;nbsp;GetCredentials&amp;nbsp;API, applications connect to this endpoint using standard database drivers (JDBC/ODBC) or command line tools like psql to run queries and load data.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Certificate authority&lt;/strong&gt;&amp;nbsp;– For this post, we use&amp;nbsp;&lt;a href="https://aws.amazon.com/private-ca/" target="_blank" rel="noopener noreferrer"&gt;AWS Private Certificate Authority (AWS Private CA)&lt;/a&gt;&amp;nbsp;as the certificate authority (CA) source. Alternatively, you can integrate with an external CA. For more details, see&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/security/iam-roles-anywhere-with-an-external-certificate-authority/" target="_blank" rel="noopener noreferrer"&gt;IAM Roles Anywhere with an external certificate authority&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;X.509 Certificate&amp;nbsp;&lt;/strong&gt;– We use a sample&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/acm/latest/userguide/private-certificates.title.html" target="_blank" rel="noopener noreferrer"&gt;private certificate&lt;/a&gt;&amp;nbsp;stored in&amp;nbsp;&lt;a href="https://aws.amazon.com/certificate-manager/" target="_blank" rel="noopener noreferrer"&gt;AWS Certificate Manager (ACM)&lt;/a&gt;&amp;nbsp;and issued by AWS Private CA.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;IAM Roles Anywhere&lt;/strong&gt; – Issues short-term AWS credentials to on-premises processes based on X.509 certificates from an organization’s certificate authority. These temporary credentials allow the workload to assume an IAM role that grants access to Amazon Redshift APIs.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;To retrieve temporary credentials using IAM Roles Anywhere, we use the&amp;nbsp;&lt;code&gt;credential_process&lt;/code&gt;&amp;nbsp;parameter in &lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface&lt;/a&gt; (AWS CLI) profile configurations to trigger an &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sourcing-external.html" target="_blank" rel="noopener noreferrer"&gt;external process&lt;/a&gt; that generates or retrieves credentials. This post uses &lt;a href="https://docs.aws.amazon.com/rolesanywhere/latest/userguide/authentication.html" target="_blank" rel="noopener noreferrer"&gt;X.509 certificates to authenticate&lt;/a&gt; and return temporary IAM credentials through IAM Roles Anywhere. The &lt;a href="https://docs.aws.amazon.com/rolesanywhere/latest/userguide/credential-helper.html" target="_blank" rel="noopener noreferrer"&gt;AWS IAM Roles Anywhere Credential Helper&lt;/a&gt; is executed to handle the &lt;a href="https://docs.aws.amazon.com/rolesanywhere/latest/userguide/authentication-sign-process.html" target="_blank" rel="noopener noreferrer"&gt;signing process&lt;/a&gt; for the &lt;a href="https://docs.aws.amazon.com/rolesanywhere/latest/userguide/authentication-create-session.html" target="_blank" rel="noopener noreferrer"&gt;CreateSession&lt;/a&gt; API, returning credentials in a JSON format that applications and tools can consume.&lt;/p&gt; 
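&lt;p&gt;For example, a profile in your AWS CLI configuration file that sources credentials through the credential helper might look like the following sketch. The profile name, file paths, and ARNs are placeholders; substitute the certificate, private key, trust anchor, profile, and role values from your own environment:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;[profile redshift-iamra]
credential_process = ./aws_signing_helper credential-process --certificate /path/to/client-cert.pem --private-key /path/to/client-key.pem --trust-anchor-arn &amp;lt;trust-anchor-arn&amp;gt; --profile-arn &amp;lt;profile-arn&amp;gt; --role-arn &amp;lt;redshift-access-role-arn&amp;gt;
region = us-east-1&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 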
&lt;p&gt;Amazon Redshift provides several APIs that work together to support temporary, IAM-based authentication for different deployment scenarios. When connecting to a&amp;nbsp;provisioned Amazon Redshift cluster, applications typically use the&amp;nbsp;&lt;code&gt;GetClusterCredentials&lt;/code&gt;&amp;nbsp;API, which returns short-term database credentials tied to an IAM role’s permissions. For organizations with fully IAM-managed identities,&amp;nbsp;&lt;code&gt;GetClusterCredentialsWithIAM&lt;/code&gt;&amp;nbsp;streamlines this process by automatically mapping the IAM identity to a database user, removing the need to specify usernames manually. In&amp;nbsp;serverless deployments, the&amp;nbsp;&lt;code&gt;GetCredentials&lt;/code&gt;&amp;nbsp;API performs the same function, issuing temporary credentials for Amazon Redshift Serverless workgroups based on IAM permissions. Collectively, these APIs keep static credentials from being stored or distributed while offering flexible integration paths for both provisioned and serverless Amazon Redshift architectures.&lt;/p&gt; 
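&lt;p&gt;As an illustration, the following Python (boto3) sketch requests temporary database credentials through a profile backed by IAM Roles Anywhere. The profile name and database user are placeholders; the cluster identifier, database name, and workgroup name match the CloudFormation defaults shown later in this post:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

# Use the AWS CLI profile that sources credentials through IAM Roles Anywhere.
session = boto3.Session(profile_name="redshift-iamra", region_name="us-east-1")

# Provisioned cluster: short-term database credentials tied to the IAM role's permissions.
redshift = session.client("redshift")
cluster_creds = redshift.get_cluster_credentials(
    DbUser="etl_user",                       # placeholder database user
    DbName="dev",
    ClusterIdentifier="my-redshift-cluster",
    DurationSeconds=900,
    AutoCreate=False,
)

# Serverless workgroup: the equivalent call for Amazon Redshift Serverless.
serverless = session.client("redshift-serverless")
workgroup_creds = serverless.get_credentials(
    workgroupName="my-serverless-workgroup",
    dbName="dev",
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 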
&lt;h3&gt;Flow overview&lt;/h3&gt; 
&lt;p&gt;An on-premises ETL job begins by initiating a request and authenticates with AWS using IAM Roles Anywhere to assume an IAM role securely. After obtaining temporary security credentials, the workload calls the Amazon Redshift service endpoint to execute the &lt;code&gt;GetClusterCredentials&lt;/code&gt; API, which returns short-term database credentials. These credentials allow the workload to connect to the Amazon Redshift cluster endpoint through a VPC endpoint. This enables running SQL queries or loading data into the cluster as part of the ETL process.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90157" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5469-1-1.png" alt="" width="1010" height="580"&gt;&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;&lt;strong&gt;You must have the following prerequisites to follow along with this post.&lt;/strong&gt;&lt;/p&gt; 
&lt;h3&gt;AWS account requirements&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;An AWS account with permissions to deploy&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt;&amp;nbsp;templates.&lt;/li&gt; 
 &lt;li&gt;Access to&amp;nbsp;&lt;a href="https://aws.amazon.com/cloudshell/" target="_blank" rel="noopener noreferrer"&gt;AWS CloudShell&lt;/a&gt;&amp;nbsp;for exporting a sample private certificate that we create using AWS CloudFormation in a later step.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Remote environment&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html" target="_blank" rel="noopener noreferrer"&gt;AWS CLI version 2&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/rolesanywhere/latest/userguide/credential-helper.html" target="_blank" rel="noopener noreferrer"&gt;IAM Roles Anywhere Credential Helper tool&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/hybrid-connectivity.html" target="_blank" rel="noopener noreferrer"&gt;Network Connectivity&lt;/a&gt; requirements&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Establish secure connectivity between your on-premises environment and AWS using &lt;a href="https://docs.aws.amazon.com/vpn/latest/s2svpn/VPC_VPN.html" target="_blank" rel="noopener noreferrer"&gt;AWS Site-to-Site VPN&lt;/a&gt;, &lt;a href="https://aws.amazon.com/directconnect/" target="_blank" rel="noopener noreferrer"&gt;AWS Direct Connect&lt;/a&gt;, or &lt;a href="https://aws.amazon.com/vpn/client-vpn/" target="_blank" rel="noopener noreferrer"&gt;AWS Client VPN&lt;/a&gt;.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Deploy AWS resources with AWS CloudFormation&lt;/h2&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the&amp;nbsp;&lt;a href="https://console.aws.amazon.com/cloudformation" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose&amp;nbsp;&lt;strong&gt;Create Stack.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Download the&amp;nbsp;&lt;a href="https://github.com/aws-samples/sample-redshift-iamra-template/blob/main/RedshiftIAMRolesAnywhere.yaml" target="_blank" rel="noopener noreferrer"&gt;redshift-iamra-template&lt;/a&gt; template.&lt;/li&gt; 
 &lt;li&gt;For&amp;nbsp;&lt;strong&gt;Specify template,&amp;nbsp;&lt;/strong&gt;choose&amp;nbsp;&lt;strong&gt;Upload a template file&amp;nbsp;&lt;/strong&gt;and upload &lt;a href="https://github.com/aws-samples/sample-redshift-iamra-template/blob/main/RedshiftIAMRolesAnywhere.yaml" target="_blank" rel="noopener noreferrer"&gt;redshift-iamra-template&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose&amp;nbsp;&lt;strong&gt;Next&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a unique name for&amp;nbsp;&lt;strong&gt;Stack name&lt;/strong&gt;. The default value is&amp;nbsp;&lt;code&gt;redshift-test&lt;/code&gt;.&lt;/li&gt; 
 &lt;li&gt;Configure the stack parameters. The following table provides default values.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;table class="styled-table" style="height: 1072px" border="1px" width="757" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Parameter name&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Default value&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;VPCCIDR&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10.0.0.0/16&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CIDR block for the VPC&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;PrivateSubnet1CIDR&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10.0.1.0/24&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CIDR block for the first private subnet&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;PrivateSubnet2CIDR&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10.0.2.0/24&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CIDR block for the second private subnet&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;CACommonName&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;redshift-ca.example.com&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Common Name for the Certificate&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;CAOrganization&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Example Corp&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Organization for the Certificate Authority&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;CACountry&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;US&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Country for the Certificate Authority&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;CAValidityInDays&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1826&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Validity period in days for the CA Certificate (5 years)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;RedshiftClusterIdentifier&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;my-redshift-cluster&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Identifier for the Amazon Redshift cluster&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;RedshiftDatabaseName&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;dev&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Name of the initial database in the Amazon Redshift cluster&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;RedshiftMasterUsername&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;admin&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Main username for the Amazon Redshift cluster&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;RedshiftNodeType&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;ra3.xlplus&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Node type for the Amazon Redshift cluster&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;ServerlessNamespace&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;my-serverless-namespace&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Namespace identifier for Amazon Redshift Serverless&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;ServerlessWorkgroup&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;my-serverless-workgroup&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Workgroup identifier for Amazon Redshift Serverless&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;ol start="8"&gt; 
 &lt;li&gt;Select the &lt;strong&gt;acknowledgement&lt;/strong&gt; checkbox and choose&amp;nbsp;&lt;strong&gt;Create Stack&lt;/strong&gt;. Stack deployment takes about 10 minutes to complete.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;ol start="9"&gt; 
 &lt;li&gt;When stack creation is complete, navigate to the&amp;nbsp;&lt;strong&gt;Outputs&lt;/strong&gt;&amp;nbsp;tab on the AWS CloudFormation console and note down the values for the resources that the stack created.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;The following table shows a summarized view of the output values.&lt;/p&gt; 
&lt;table class="styled-table" style="height: 988px" border="1px" width="984" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Example value&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;CertificateAuthorityArn&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Resource Name (ARN)&lt;/a&gt; of the Private Certificate Authority&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;arn:aws:acm-pca:aa-example-1:111122223333:certificate-authority/a1b2c3d4-5678-90ab-cdef-EXAMPLE22222&lt;/code&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;ClientCertificateArn&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;ARN of the sample client certificate&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;arn:aws:acm:aa-example-1:111122223333:certificate/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111&lt;/code&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;ProfileArn&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;ARN of the IAM Roles Anywhere profile&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;arn:aws:rolesanywhere:aa-example-1:111122223333:profile/a1b2c3d4-5678-90ab-cdef-EXAMPLE44444&lt;/code&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;RedshiftAccessRoleArn&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;ARN of the Amazon Redshift Access role&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;arn:aws:iam::111122223333:role/Redshift-test-RedshiftAccessRole&lt;/code&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;TrustAnchorArn&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;ARN of the IAM Roles Anywhere trust anchor. You will use this value for configuring&amp;nbsp;&lt;code&gt;credential_process&lt;/code&gt;&amp;nbsp;for IAM Roles Anywhere in a later step.&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;arn:aws:rolesanywhere:aa-example-1:111122223333:trust-anchor/a1b2c3d4-5678-90ab-cdef-EXAMPLE33333&lt;/code&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;RedshiftClusterEndpoint&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Private endpoint of the Amazon Redshift Cluster&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;my-redshift-cluster-123456789012.aa-example-1.redshift.amazonaws.com&lt;/code&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;RedshiftClusterPort&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Port of the Amazon Redshift Cluster&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;5439&lt;/code&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;ServerlessWorkgroupEndpoint&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Private endpoint of Amazon Redshift Serverless Workgroup&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;my-serverless-workgroup-123456789012.aa-example-1.redshift.serverless.amazonaws.com&lt;/code&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
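&lt;p&gt;If you prefer the AWS CLI, you can retrieve the same outputs programmatically. The following is a minimal sketch that assumes the default stack name &lt;code&gt;redshift-public-iam-roles-anywhere&lt;/code&gt;; substitute your own stack name if you changed it:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# List all stack outputs as a table
aws cloudformation describe-stacks \
    --stack-name redshift-public-iam-roles-anywhere \
    --query 'Stacks[0].Outputs' \
    --output table&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 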
&lt;h2&gt;Export a sample private certificate using CloudShell&lt;/h2&gt; 
&lt;p&gt;To export a sample private certificate using CloudShell, complete the following steps.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open CloudShell. For more details, see&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/cloudshell/latest/userguide/working-with-aws-cloudshell.html#navigating-the-interface" target="_blank" rel="noopener noreferrer"&gt;Navigating the AWS CloudShell interface&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Export the certificate ARN from the CloudFormation outputs. If you changed the stack name in the previous step, use that value for&amp;nbsp;&lt;code&gt;&amp;lt;stack-name&amp;gt;&lt;/code&gt;. Otherwise, use the default value&amp;nbsp;&lt;code&gt;redshift-public-iam-roles-anywhere&lt;/code&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-javascript"&gt;export CERT_ARN=$(aws cloudformation describe-stacks \
    --stack-name &amp;lt;stack-name&amp;gt; \
    --query 'Stacks[0].Outputs[?OutputKey==`ClientCertificateArn`].OutputValue' \
    --output text)
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Extract the certificate and private key files:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Generate and save the passphrase
export PASSPHRASE=$(openssl rand -base64 32)
# Export certificate using environment variables
aws acm export-certificate \
    --certificate-arn $CERT_ARN \
    --passphrase $(echo -n "$PASSPHRASE" | base64) \
    &amp;gt; cert_export.json
# Extract components to separate files
jq -r '.Certificate' cert_export.json &amp;gt; certificate.pem
jq -r '.PrivateKey' cert_export.json &amp;gt; encrypted_private_key.pem
# Decrypt the private key
openssl rsa -in encrypted_private_key.pem -out private_key.pem -passin pass:"$PASSPHRASE"
# Clear environment variables
unset PASSPHRASE CERT_ARN&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
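&lt;p&gt;Optionally, before downloading the files, you can confirm that the decrypted private key matches the certificate. This is a quick sanity check using standard OpenSSL commands; the two digests should be identical:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Compare the modulus digests of the certificate and the private key
openssl x509 -noout -modulus -in certificate.pem | openssl md5
openssl rsa -noout -modulus -in private_key.pem | openssl md5&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 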
&lt;ol start="4"&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cloudshell/latest/userguide/getting-started.html#download-file" target="_blank" rel="noopener noreferrer"&gt;Download&lt;/a&gt;&amp;nbsp;the extracted certificate and private key files from CloudShell:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;/home/cloudshell-user/certificate.pem
/home/cloudshell-user/private_key.pem&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;Secure the private key on your local workstation.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;After downloading the files, restrict file permissions to prevent unauthorized access:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;chmod 400 private_key.pem
chmod 400 certificate.pem&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;For production workloads, consider storing private keys in your operating system’s keychain (macOS Keychain, Windows Certificate Store), a hardware security module (HSM), or a secrets management tool rather than as files on disk.&lt;/p&gt; 
&lt;h2&gt;Configure an AWS CLI profile&lt;/h2&gt; 
&lt;p&gt;These are the steps to configure an AWS CLI profile on your system:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Store the downloaded certificate and private key to your environment. For an automated approach to generate and rotate certificates, see&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/security/set-up-aws-private-certificate-authority-to-issue-certificates-for-use-with-iam-roles-anywhere/" target="_blank" rel="noopener noreferrer"&gt;Set up AWS Private Certificate Authority to issue certificates for use with IAM Roles Anywhere&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Create a new profile named&amp;nbsp;&lt;code&gt;onprem-redshift&lt;/code&gt; that invokes the credential process helper. Replace the placeholders with your specific values; find the values for&amp;nbsp;&lt;code&gt;trusted-anchor-arn&lt;/code&gt;,&amp;nbsp;&lt;code&gt;profile-arn&lt;/code&gt;, and&amp;nbsp;&lt;code&gt;role-arn&lt;/code&gt;&amp;nbsp;in your CloudFormation stack outputs.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-html"&gt;aws configure set profile.onprem-redshift.credential_process "&amp;lt;/path/to/aws_signing_helper&amp;gt; credential-process \
      --certificate &amp;lt;/path/to/certificate.pem&amp;gt; \
      --private-key &amp;lt;/path/to/private_key.pem&amp;gt; \
      --trust-anchor-arn &amp;lt;trusted-anchor-arn&amp;gt; \
      --profile-arn &amp;lt;profile-arn&amp;gt; \
      --role-arn &amp;lt;role-arn&amp;gt;"&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Verify your configuration. Open the&amp;nbsp;&lt;code&gt;~/.aws/config&lt;/code&gt;&amp;nbsp;file and confirm that it contains a profile entry similar to the following.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-html"&gt;[profile onprem-redshift]
credential_process = &amp;lt;/path/to/aws_signing_helper&amp;gt; credential-process       
--certificate &amp;lt;/path/to/certificate.pem&amp;gt;       
--private-key &amp;lt;/path/to/private_key.pem&amp;gt;       
--trust-anchor-arn &amp;lt;trusted-anchor-arn&amp;gt;       
--profile-arn &amp;lt;profile-arn&amp;gt;       
--role-arn &amp;lt;role-arn&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
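&lt;p&gt;As a quick check that the credential process resolves correctly, you can request the caller identity with the new profile. If the setup is working, the command returns the ARN of the assumed IAM Roles Anywhere session for the Amazon Redshift access role:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Confirm that the profile can obtain temporary credentials
aws sts get-caller-identity --profile onprem-redshift&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 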
&lt;h2&gt;Test the solution&lt;/h2&gt; 
&lt;p&gt;Follow these steps to validate end-to-end connectivity for provisioned clusters:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Verify network connectivity&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Before testing authentication, confirm that your on-premises environment can reach the Amazon Redshift cluster endpoint:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;telnet my-redshift-cluster.abc123.us-east-1.redshift.amazonaws.com 5439&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;If the connection succeeds, you should see a response indicating the port is open. If it fails, verify your VPN/Direct Connect configuration and security group rules.&lt;/p&gt; 
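&lt;p&gt;If telnet isn’t available in your environment, netcat provides an equivalent port check (a minimal alternative, assuming the same example endpoint):&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# -z: scan without sending data, -v: verbose output
nc -zv my-redshift-cluster.abc123.us-east-1.redshift.amazonaws.com 5439&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 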
&lt;ol start="2"&gt; 
 &lt;li&gt;Create database user&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;If you haven’t already created a user, connect to your Amazon Redshift data warehouse as the main user and create a dedicated user for testing:&lt;/p&gt; 
&lt;p&gt;&lt;code&gt;CREATE USER analytics_user PASSWORD '[PASSWORD]';&lt;/code&gt;&lt;/p&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Retrieve Amazon Redshift database credentials&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;With the configuration in place, request temporary database credentials from Amazon Redshift:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-powershell"&gt;aws redshift get-cluster-credentials \
  --db-user analytics_user \
  --cluster-identifier my-redshift-cluster \
  --region us-east-1 \
  --profile onprem-redshift&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This call returns a short-lived username and password that are valid for connecting to the cluster. By default, the temporary credentials expire after 900 seconds. You can optionally specify a duration between 900 and 3,600 seconds (15–60 minutes).&lt;/p&gt; 
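&lt;p&gt;For example, the following variation (a sketch, not required for this walkthrough) requests the maximum 3,600-second duration and lets Amazon Redshift create the database user if it doesn’t already exist:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;aws redshift get-cluster-credentials \
  --db-user analytics_user \
  --cluster-identifier my-redshift-cluster \
  --duration-seconds 3600 \
  --auto-create \
  --region us-east-1 \
  --profile onprem-redshift&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 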
&lt;ol start="4"&gt; 
 &lt;li&gt;Connect using JDBC/ODBC or psql&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Use the issued credentials in your connection string. For JDBC:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;jdbc:redshift://my-redshift-cluster.abc123.redshift.amazonaws.com:5439/dev?ssl=true&amp;amp;UID=analytics_user&amp;amp;PWD=&amp;lt;temporary_password&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;For psql:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;PGPASSWORD=&amp;lt;temporary_password&amp;gt; psql \
  -h my-redshift-cluster.abc123.us-east-1.redshift.amazonaws.com \
  -p 5439 \
  -U analytics_user \
  -d dev \
  --set=sslmode=verify-full&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
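&lt;p&gt;In practice, you typically chain the two steps together so that a fresh set of temporary credentials is fetched immediately before connecting. The following shell sketch (it assumes jq and the example endpoint used above) parses the DbUser and DbPassword fields from the API response and passes them to psql:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Fetch temporary credentials and connect in one step
CREDS=$(aws redshift get-cluster-credentials \
  --db-user analytics_user \
  --cluster-identifier my-redshift-cluster \
  --region us-east-1 \
  --profile onprem-redshift)
# The returned DbUser is typically prefixed, for example IAM:analytics_user
DB_USER=$(echo "$CREDS" | jq -r '.DbUser')
DB_PASS=$(echo "$CREDS" | jq -r '.DbPassword')
PGPASSWORD="$DB_PASS" psql \
  -h my-redshift-cluster.abc123.us-east-1.redshift.amazonaws.com \
  -p 5439 \
  -U "$DB_USER" \
  -d dev \
  --set=sslmode=verify-full&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 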
&lt;h3&gt;Validate and monitor&lt;/h3&gt; 
&lt;ul&gt; 
 &lt;li&gt;Test authentication flows end-to-end using your ETL jobs.&lt;/li&gt; 
 &lt;li&gt;Review AWS CloudTrail logs, which record role assumptions and Amazon Redshift API calls (see the sketch after this list).&lt;/li&gt; 
 &lt;li&gt;Monitor session expiration to help workloads handle credential refresh seamlessly.&lt;/li&gt; 
&lt;/ul&gt; 
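&lt;p&gt;As a starting point for the CloudTrail review, the following sketch looks up recent GetClusterCredentials events with the AWS CLI; the exact event names you see will depend on which APIs your workloads call:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# List recent temporary-credential requests recorded by CloudTrail
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=GetClusterCredentials \
  --max-results 5&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 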
&lt;h3&gt;Testing end-to-end connectivity for Amazon Redshift Serverless&lt;/h3&gt; 
&lt;p&gt;The testing process for Amazon Redshift Serverless follows a similar pattern to provisioned clusters, with minor differences in the API calls and connection parameters. These steps validate connectivity to your serverless workgroup.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Verify network connectivity&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;code&gt;telnet my-serverless-workgroup.abc123.us-east-1.redshift-serverless.amazonaws.com 5439&lt;/code&gt;&lt;/p&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Retrieve Amazon Redshift Serverless database credentials&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-powershell"&gt;aws redshift-serverless get-credentials \
  --workgroup-name my-serverless-workgroup \
  --db-name dev \
  --region us-east-1 \
  --profile onprem-redshift&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Connect using JDBC/ODBC or psql&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;PGPASSWORD="&amp;lt;password_from_get_credentials&amp;gt;" psql \
  -h my-serverless-workgroup.abc123.us-east-1.redshift-serverless.amazonaws.com \
  -p 5439 \
  -U "IAMR:Redshift-IAMRA-RedshiftAccessRole" \
  -d dev \
  --set=sslmode=verify-full&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Clean up&lt;/h2&gt; 
&lt;p&gt;To avoid future charges, remove the deployed resources:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-delete-stack.html" target="_blank" rel="noopener noreferrer"&gt;Delete&lt;/a&gt; the CloudFormation stack.&lt;/li&gt; 
 &lt;li&gt;Remove the generated files from CloudShell:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;code&gt;rm cert_export.json encrypted_private_key.pem certificate.pem private_key.pem&lt;/code&gt;&lt;/p&gt; 
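&lt;p&gt;Also remove the copies you downloaded to your workstation and the test profile, for example:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# On your local workstation
rm -f certificate.pem private_key.pem
# Then delete the [profile onprem-redshift] section from ~/.aws/config&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 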
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we showed how to implement IAM Roles Anywhere with Amazon Redshift so that enterprises can securely connect on-premises data systems to their cloud data warehouse without relying on static credentials or public internet access. This architecture provides short-lived, auditable credentials, integrates with existing certificate authorities, and helps ensure authentication and data flows remain private and trusted.&lt;/p&gt; 
&lt;p&gt;With this approach, data engineers and developers can run ingestion pipelines, ETL jobs, and analytics queries, while security teams maintain full control through IAM governance and CloudTrail auditing. You can remove manual credential rotation tasks, allow your data engineers to connect to Amazon Redshift without managing static keys, and achieve complete audit trails through CloudTrail integration for your hybrid analytics environments.&lt;/p&gt; 
&lt;p&gt;To get started, deploy the solution using the&amp;nbsp;&lt;a href="https://github.com/aws-samples/sample-redshift-iamra-template/blob/main/RedshiftIAMRolesAnywhere.yaml" target="_blank" rel="noopener noreferrer"&gt;CloudFormation template&lt;/a&gt;&amp;nbsp;and follow the steps in this post. To learn more about the services used, see the following resources:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/rolesanywhere/latest/userguide/introduction.html" target="_blank" rel="noopener noreferrer"&gt;IAM Roles Anywhere documentation&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/generating-user-credentials.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift IAM authentication&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/privateca/latest/userguide/PcaWelcome.html" target="_blank" rel="noopener noreferrer"&gt;AWS Private Certificate Authority&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html" target="_blank" rel="noopener noreferrer"&gt;AWS CloudTrail for auditing&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-90158 size-thumbnail" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5469-2-1-100x101.png" alt="" width="100" height="101"&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/k-bajwa/" target="_blank" rel="noopener noreferrer"&gt;Kanwar&lt;/a&gt; Bajwa is a Principal Enterprise Account Engineer at AWS who works with customers to optimize their use of AWS services and achieve their business objectives.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-90159 size-thumbnail" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5469-3-1-100x133.jpeg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/xiaoxuexu/" target="_blank" rel="noopener noreferrer"&gt;Xiaoxue&lt;/a&gt; Xu is a Solutions Architect for AWS based in Toronto. She primarily works with Financial Services customers to help secure their workload and design scalable solutions on the AWS Cloud.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-90156 size-thumbnail" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5469-4-100x133.png" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/syeda-zainab/" target="_blank" rel="noopener noreferrer"&gt;Zainab&lt;/a&gt; Syeda&amp;nbsp;is a Technical Account Manager at Amazon Web Services in Toronto. She works with customers in the Financial Services segment, helping them leverage cloud-native solutions at scale.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Enhancing Identity Intelligence with Babel Street Match and Amazon OpenSearch</title>
		<link>https://aws.amazon.com/blogs/big-data/enhancing-identity-intelligence-with-babel-street-match-and-amazon-opensearch/</link>
					
		
		<dc:creator><![CDATA[Kunal Sharma]]></dc:creator>
		<pubDate>Thu, 16 Apr 2026 18:21:52 +0000</pubDate>
				<category><![CDATA[Amazon OpenSearch Service]]></category>
		<category><![CDATA[Customer Solutions]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Partner solutions]]></category>
		<guid isPermaLink="false">ba9290f300354ea5801e8820df3b5679843a7714</guid>

					<description>This post explores how combining Babel Street Match with OpenSearch Service provides a solution that helps your organization to handle large-scale, multilingual data.</description>
										<content:encoded>&lt;p&gt;&lt;em&gt;This post is co-authored with Gil Irizarry, Mae Wells-Kress and Craig Harmon from Babel Street.&amp;nbsp;&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Can your system tell “John Smith” apart from “John Smith”?&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Organizations requiring identity intelligence increasingly face challenges due to the complexity of matching names and entities across vast, multilingual, and constantly evolving datasets. Whether supporting border security, combating financial crime, or maintaining regulatory compliance, the accuracy of identity and entity resolution directly determines whether threats are detected, investigations succeed, and regulatory requirements are met. Yet, linguistic diversity, transliterations, inconsistent data formats, and legacy system limitations continue to create friction, leading to false positives, missed matches, and costly manual reviews. As customers ingest and analyze petabytes of unstructured and structured data in &lt;a href="https://aws.amazon.com/opensearch-service/" target="_blank" rel="noopener noreferrer"&gt;Amazon OpenSearch Service&lt;/a&gt;, the need for intelligent, scalable, and multilingual matching becomes increasingly important. This is where the integration of &lt;a href="https://www.babelstreet.com/" target="_blank" rel="noopener noreferrer"&gt;Babel Street&lt;/a&gt;&lt;strong&gt; (an AWS Partner)&lt;/strong&gt; with OpenSearch Service provides a solution that helps organizations enhance precision, reduce noise, and accelerate insights from their high-volume data environments.&lt;/p&gt; 
&lt;p&gt;This post explores how combining Babel Street Match with OpenSearch Service provides a solution that helps your organization to handle large-scale, multilingual data.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;The growing complexity of identity and entity resolution&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;As organizations ingest and analyze massive volumes of multilingual and inconsistently formatted data, accurately matching names and entities becomes increasingly difficult. Variations in spelling, transliterations, semantic differences, cultural naming conventions, and incomplete or noisy records can contribute to mismatches. These challenges are compounded by legacy systems, fragmented data pipelines, operational inefficiencies, and evolving regulatory requirements—especially in sectors where precision is a requirement.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Evaluating and enhancing identity in high-volume enterprise environments&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;&lt;strong&gt;Amazon OpenSearch Service&lt;/strong&gt; is a fully managed, scalable search and analytics service that enables organizations to ingest, search, visualize, and analyze massive volumes of data in near real time. Built to handle structured and unstructured information from diverse sources, it powers use cases ranging from security analytics and log monitoring to enterprise search and advanced analytical applications.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Babel Street&lt;/strong&gt; delivers risk intelligence trusted by organizations across government, defense, and the private sector. The offering combines access to vast volumes of multilingual data with advanced analytics to uncover hidden identities, secure vendor networks, and identify emerging risks with precision, speed, and scale. From national security to regulatory compliance and enterprise resilience, Babel Street provides the strategic advantage needed to stay ahead of risk, safeguard operations, and protect missions.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://www.babelstreet.com/modules/match" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Babel Street Match&lt;/strong&gt;&lt;/a&gt;, an offering from Babel Street incorporates advanced identity risk intelligence capabilities, which enhance the precision and reliability of screening processes. This advanced solution uses sophisticated matching techniques to verify identities and identify variations in personal data—including aliases, alternate spellings, and differences in biographical details, helping organizations separate legitimate individuals from potential threats. The ability to screen names, addresses, dates, and other identifiers across different scripts and languages helps reduce false positives and negatives, helps accurately detect critical risks with transparent scoring to meet compliance and audit requirements. Further, Babel Street Match streamlines screening workflows, reduces the burden of manual reviews, and elevates the accuracy of threat detection.&lt;/p&gt; 
&lt;p&gt;The following diagram shows the details of OpenSearch Service and Babel Street Match Plugin integration.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="size-full wp-image-89925" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/BDB-5571-1.png" alt="Architecture diagram showing Babel Street Match Plugin integration with AWS services, including AWS Marketplace, Amazon S3, and Amazon OpenSearch Service across two AWS accounts for secure entity matching." width="1648" height="968"&gt;&lt;/p&gt; 
&lt;p&gt;Babel Street Match integrates directly with the OpenSearch Service domain through a lightweight plugin that runs inside your own AWS account, where you have full control of your data. The Match plugin sends encrypted match requests to Babel Street’s fully managed Match engine, where the core matching engine performs the entity-resolution logic. The results return to you in real time, enhancing your existing OpenSearch Service workflows with advanced name- and entity-matching capabilities. Meanwhile, Babel Street’s control plane handles licensing, monitoring, and &lt;a href="https://aws.amazon.com/marketplace/pp/prodview-nzux3slo6um5m?sr=0-2&amp;amp;ref_=beagle&amp;amp;applicationId=AWSMPContessa" target="_blank" rel="noopener noreferrer"&gt;AWS Marketplace&lt;/a&gt; integration behind the scenes, providing continuous validation, automated updates, and a seamless operational experience.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Example use cases&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;The solution combines enterprise-scale search and analytics with AI-powered, multilingual identity intelligence. This section showcases example use cases where integration has enhanced organizations’ capabilities.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Border Screening: Help agencies identify high-risk travelers, cargo, and networks to strengthen point-of-entry security with faster, automated risk assessment.&lt;/li&gt; 
 &lt;li&gt;Financial Services Compliance: Help Financial institutions and the FinTechs that serve them by offering AI-driven solutions for name screening, adverse media monitoring, and know your customer (KYC)/know your vendor (KYV) due diligence.&lt;/li&gt; 
 &lt;li&gt;Identity and Organization Screening: Help businesses needing identity and organization screening by providing AI, analytics, and advanced matching technologies to assist in addressing complex screening challenges.&lt;/li&gt; 
 &lt;li&gt;Customer and Vendor Onboarding: Help governments and financial institutions by providing research, analytics, and advanced matching technologies needed to quickly and confidently onboard customers and vendors at scale.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;&lt;strong&gt;Customer Success Stories&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Here’s how leading organizations are leveraging Babel Street Match and Amazon OpenSearch Service to solve real-world identity challenges:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;A European online brokerage faced AML (anti-money laundering) compliance challenges with its outdated name-matching system, which produced excessive false positives and couldn’t process longer multilingual names. After implementing Babel Street Match on OpenSearch Service, the firm achieved up to 70% better accuracy across 25 languages—significantly reducing manual work and speeding customer payments.&lt;br&gt; &lt;a href="https://www.babelstreet.com/resources/case-studies/babel-street-match-improves-fis-name-matching-accuracy-by-up-to-70-on-opensearch" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Babel Street Match Improves FI’s Name-Matching Accuracy by Up to 70% on OpenSearch&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;A major border agency struggled with an outdated screening system that flagged 15% of travelers as potential watchlist matches—overwhelming agents and creating long queues. After implementing Babel Street Match, false positives dropped dramatically (from 80,000 to just 100 in one test), hardware needs fell by 70%, and travelers with common names can now pass through faster. As one stakeholder put it: “Name matching is not our biggest problem anymore.”&lt;br&gt; &lt;a href="https://www.babelstreet.com/resources/case-studies/enabling-stronger-safer-borders-with-ai-powered-screening-by-babel-street-match" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Enabling Stronger, Safer Borders with AI-powered Screening by Babel Street Match&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;&lt;strong&gt;Getting Started with Babel Street Match for Amazon OpenSearch Service&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Amazon OpenSearch Service supports third-party plugins like &lt;a href="https://babelstreet.fluidtopics.net/r/Match-for-OpenSearch-Plugin-Guide/Installing-Match-for-OpenSearch-with-a-managed-AWS-instance" target="_blank" rel="noopener noreferrer"&gt;Babel Street Match for OpenSearch&lt;/a&gt;. This plugin is supported on OpenSearch version 2.15 or higher and licenses can be obtained through &lt;a href="https://aws.amazon.com/marketplace/pp/prodview-nzux3slo6um5m?sr=0-2&amp;amp;ref_=beagle&amp;amp;applicationId=AWSMPContessa" target="_blank" rel="noopener noreferrer"&gt;AWS Marketplace&lt;/a&gt;.&lt;/p&gt; 
&lt;h3&gt;&lt;strong&gt;Installing Babel Street Match for Amazon OpenSearch Service&lt;/strong&gt;&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; Obtain the license file from Babel Street and upload it to an S3 bucket in the same AWS Region as your OpenSearch domain.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Installation Steps:&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Create packages&lt;/strong&gt; – In the OpenSearch Service console, create a package for your license file and select the Babel Street Match plugin from the available options&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Associate packages&lt;/strong&gt; – Link both the license and plugin packages to your OpenSearch domain&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Verify&lt;/strong&gt; – Monitor the domain update and confirm the plugin is active&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;For details, refer to the AWS documentation “&lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/plugins-third-party.html" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Installing third-party plugins in Amazon OpenSearch Service&lt;/strong&gt;&lt;/a&gt;” and the &lt;a href="https://babelstreet.fluidtopics.net/r/Match-for-OpenSearch-Plugin-Guide/Installing-Match-for-OpenSearch-with-a-managed-AWS-instance" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Babel Street installation guide&lt;/strong&gt;&lt;/a&gt;, which provide detailed guidance on prerequisites, installation, and using the plugin.&lt;/p&gt; 
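&lt;p&gt;If you script your domain configuration, the console steps above map to the OpenSearch Service package APIs. The following AWS CLI sketch is illustrative only: the bucket, key, package names, and package IDs are placeholders, and the package type values are assumptions based on the third-party plugin workflow, so confirm them against the current documentation before use.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-shell"&gt;# Register the license file you uploaded to Amazon S3 as a package
# (placeholder names; package type assumed to be PACKAGE-LICENSE)
aws opensearch create-package \
  --package-name babel-street-match-license \
  --package-type PACKAGE-LICENSE \
  --package-source S3BucketName=my-license-bucket,S3Key=match-license.lic

# Associate the license package (and likewise the Babel Street Match plugin
# package) with your domain using the package IDs listed by the service
aws opensearch associate-package \
  --package-id pkg-example1234 \
  --domain-name my-opensearch-domain&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 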
&lt;h2&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;Together, Babel Street Match and OpenSearch Service help organizations cut through false positives and catch true matches faster. The result? Greater precision, efficiency, and speed—whether protecting entities, maintaining compliance, or securing supply chains. That’s business-critical identity intelligence in action.&lt;/p&gt; 
&lt;p&gt;Explore how Babel Street Match on Amazon OpenSearch Service can elevate your organization’s identity intelligence capabilities and transform the screening operations through an interactive or customized demo on Babel Street’s &lt;a href="https://www.babelstreet.com/modules/match?utm_source=AWS&amp;amp;utm_medium=blog&amp;amp;utm_campaign=MatchForOpenSearch" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;website&lt;/strong&gt;&lt;/a&gt;.&lt;br&gt; &lt;em&gt;&lt;br&gt; Portions of this content describing Babel Street products and services are provided by Babel Street. AWS is not responsible for the accuracy of third-party product information.&lt;/em&gt;&lt;/p&gt; 
&lt;hr&gt; 
&lt;h2&gt;About the Authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-full wp-image-89932" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/Kunal-Sharma-Headshot.png" alt="" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;&lt;strong&gt;Kunal Sharma&lt;/strong&gt;&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/kunalksharma27/" target="_blank" rel="noopener"&gt;Kunal Sharma&lt;/a&gt; is a Sr. Solutions Architect at AWS. He works with AWS Worldwide Public Sector (WWPS) partners to build and scale cloud-native solutions. As an SA, he thrives on turning complex customer challenges into elegant, well-architected solutions — one whiteboard session at a time.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-full wp-image-89929" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/Gil-Irizarry-headshot.png" alt="" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;&lt;strong&gt;Gil Irizarry&lt;/strong&gt;&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/gilirizarry/" target="_blank" rel="noopener"&gt;Gil&lt;/a&gt; is the Chief Innovation Officer at Babel Street. He specializes in applying natural language processing and AI to identity resolution use cases. Gil’s work combines computational linguistics, machine learning and AI to produce state-of-the-art entity extraction and resolution applications. Gil’s focus on innovation led to his winning of Babel Street’s internal hackathon two years in a row.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-full wp-image-89930" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/Wells-Kress-Mae.png" alt="" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;&lt;strong&gt;Mae Wells-Kress&lt;/strong&gt;&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/maewellskress/" target="_blank" rel="noopener"&gt;Mae Wells-Kress&lt;/a&gt; is the Vice President of Strategic Marketing at Babel Street. She has extensive experience across strategic and creative marketing roles, she implements process-driven lead generation efforts and develops strategic campaigns, events, and messaging that connect with audiences and helps organizations advance their missions in high stakes environments.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-full wp-image-89931" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/07/Headshot-CH.png" alt="" width="100" height="100"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;&lt;strong&gt;Craig Harmon&lt;/strong&gt;&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/craigharmon/" target="_blank" rel="noopener"&gt;Craig&lt;/a&gt; is the Director of Partner Management at Babel Street. He leads the company’s strategic alliance with Amazon Web Services (AWS). A former Senior Partner Account Manager at AWS, Craig brings a hyperscaler‑native perspective to building and scaling partnerships that drive revenue growth and deepen technical collaboration. He is passionate about operational excellence and the design of high‑performance partner models that translate cloud innovation into measurable outcomes for customers and partners.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Getting started with Apache Iceberg write support in Amazon Redshift – Part 2</title>
		<link>https://aws.amazon.com/blogs/big-data/getting-started-with-apache-iceberg-write-support-in-amazon-redshift-part-2/</link>
					
		
		<dc:creator><![CDATA[Sanket Hase]]></dc:creator>
		<pubDate>Wed, 15 Apr 2026 21:29:36 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Redshift]]></category>
		<category><![CDATA[Amazon S3 Tables]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">995e7e565440f93a78b2688aaf775da80598be56</guid>

					<description>Amazon Redshift now supports DELETE, UPDATE, and MERGE operations for Apache Iceberg tables stored in Amazon S3 and Amazon S3 table buckets. With these operations, you can modify data at the row level, implement upsert patterns, and manage the data lifecycle while maintaining transactional consistency using familiar SQL syntax. You can run complex transformations in Amazon Redshift and write results to Apache Iceberg tables that other analytics engines like Amazon EMR or Amazon Athena can immediately query. In this post, you work with datasets to demonstrate these capabilities in a data synchronization scenario.</description>
										<content:encoded>&lt;p&gt;In&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/big-data/getting-started-with-apache-iceberg-write-support-in-amazon-redshift/" target="_blank" rel="noopener noreferrer"&gt;Getting started with Apache Iceberg write support in Amazon Redshift – part 1&lt;/a&gt;, you learned how to create &lt;a href="https://iceberg.apache.org/" target="_blank" rel="noopener noreferrer"&gt;Apache Iceberg&lt;/a&gt; tables and write data directly from &lt;a href="https://aws.amazon.com/redshift/" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift&lt;/a&gt; to your data lake. You set up external schemas, created tables in both &lt;a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service (Amazon S3)&lt;/a&gt; and &lt;a href="https://aws.amazon.com/s3/features/tables/" target="_blank" rel="noopener noreferrer"&gt;S3 Tables&lt;/a&gt;, and performed INSERT operations while maintaining ACID (Atomicity, Consistency, Isolation, Durability) compliance.&lt;/p&gt; 
&lt;p&gt;Amazon Redshift now supports DELETE, UPDATE, and MERGE operations for Apache Iceberg tables stored in Amazon S3 and Amazon S3 table buckets. With these operations, you can modify data at the row level, implement upsert patterns, and manage the data lifecycle while maintaining transactional consistency using familiar SQL syntax. You can run complex transformations in Amazon Redshift and write results to Apache Iceberg tables that other analytics engines like &lt;a href="https://aws.amazon.com/emr/" target="_blank" rel="noopener noreferrer"&gt;Amazon EMR&lt;/a&gt; or &lt;a href="https://aws.amazon.com/athena/" target="_blank" rel="noopener noreferrer"&gt;Amazon Athena&lt;/a&gt; can immediately query.&lt;/p&gt; 
&lt;p&gt;In this post, you work with &lt;code&gt;customer&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; datasets that were created and used in the previously mentioned &lt;a href="https://aws.amazon.com/blogs/big-data/getting-started-with-apache-iceberg-write-support-in-amazon-redshift/" target="_blank" rel="noopener noreferrer"&gt;post&lt;/a&gt; to demonstrate these capabilities in a data synchronization scenario.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;This solution demonstrates DELETE, UPDATE, and MERGE operations for Apache Iceberg tables in Amazon Redshift using a common data synchronization pattern: maintaining customer records and orders data across staging and production tables. The workflow includes three key operations:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;DELETE&lt;/strong&gt;&amp;nbsp;– Remove customer records based on opt-out requests&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;UPDATE&lt;/strong&gt;&amp;nbsp;– Modify existing customer information&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;MERGE&lt;/strong&gt;&amp;nbsp;– Synchronize order data between staging and production tables using upsert patterns&lt;/li&gt; 
&lt;/ul&gt; 
&lt;div id="attachment_90263" style="width: 1002px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90263" loading="lazy" class="size-full wp-image-90263" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-1.png" alt="Figure : solution overview" width="992" height="509"&gt;
 &lt;p id="caption-attachment-90263" class="wp-caption-text"&gt;Figure 1: solution overview&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;The solution uses a staging table (&lt;code&gt;orders_stg&lt;/code&gt;) stored in an S3 table bucket for incoming data and reference tables (&lt;code&gt;customer_opt_out&lt;/code&gt;) in Amazon Redshift for managing data lifecycle operations. With this architecture, you can process changes efficiently while maintaining ACID compliance across both storage types.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;For this walkthrough, you should have completed the setup steps from&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/big-data/getting-started-with-apache-iceberg-write-support-in-amazon-redshift" target="_blank" rel="noopener noreferrer"&gt;Getting started with Apache Iceberg write support in Amazon Redshift – part 1&lt;/a&gt;, including:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Create an Amazon Redshift data warehouse (&lt;a href="https://docs.aws.amazon.com/redshift/latest/gsg/new-user.html" target="_blank" rel="noopener noreferrer"&gt;provisioned&lt;/a&gt; or &lt;a href="https://docs.aws.amazon.com/redshift/latest/gsg/new-user-serverless.html" target="_blank" rel="noopener noreferrer"&gt;Serverless&lt;/a&gt;)&lt;/li&gt; 
 &lt;li&gt;Set up the required IAM role (&lt;code&gt;RedshifticebergRole&lt;/code&gt;) with appropriate permissions&lt;/li&gt; 
 &lt;li&gt;Create an Amazon S3 bucket and S3 Table bucket&lt;/li&gt; 
 &lt;li&gt;Configure the AWS Glue Data Catalog database and set up access&lt;/li&gt; 
 &lt;li&gt;Set up AWS Lake Formation permissions&lt;/li&gt; 
 &lt;li&gt;Create the&amp;nbsp;&lt;code&gt;customer&lt;/code&gt;&amp;nbsp;Apache Iceberg table in Amazon S3 standard buckets with sample customer data&lt;/li&gt; 
 &lt;li&gt;Create the&amp;nbsp;&lt;code&gt;orders&lt;/code&gt;&amp;nbsp;Apache Iceberg table in Amazon S3 Table buckets with sample order data&lt;/li&gt; 
 &lt;li&gt;Ensure your Amazon Redshift data warehouse is on the &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/cluster-versions.html#cluster-version-200" target="_blank" rel="noopener noreferrer"&gt;p200&lt;/a&gt; patch version or higher&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Data preparation&lt;/h2&gt; 
&lt;p&gt;In this section, you set up the sample data needed to demonstrate MERGE, UPDATE, and DELETE operations. To prepare your data, complete the following steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Log in to Amazon Redshift using &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2-connecting.html" target="_blank" rel="noopener noreferrer"&gt;Query Editor V2 with the Federated user&lt;/a&gt; option.&lt;/li&gt; 
 &lt;li&gt;Create the&amp;nbsp;&lt;code&gt;orders_stg&lt;/code&gt; and&amp;nbsp;&lt;code&gt;customer_opt_out&lt;/code&gt;&amp;nbsp;tables with sample data:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;CREATE TABLE "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg
(
customer_id BIGINT,
order_id BIGINT,
Total_order_amt DECIMAL(10,2),
Total_order_tax_amt REAL,
tax_pct DOUBLE PRECISION,
order_date DATE,
order_created_at_tz TIMESTAMPTZ,
is_active_ind BOOLEAN
)
USING ICEBERG;
INSERT INTO "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg
(order_date, order_id, customer_id, total_order_amt, total_order_tax_amt, tax_pct, order_created_at_tz, is_active_ind)
VALUES
('2024-11-11', 1016, 10, 167.45, 13.40, 0.08, '2024-11-11 06:55:00-06:00', true),
('2024-11-12', 1017, 15, 34.99, 2.80, 0.08, '2024-11-12 23:30:30-06:00', true),
('2024-11-09', 1014, 9, 500.60, 56.80, 0.09, '2024-11-09 16:20:55-06:00', true),
('2024-11-10', 1015, 5, 329.85, 33.51, 0.08, '2024-11-10 11:45:30-06:00', true);
select * from "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90264" style="width: 1137px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90264" loading="lazy" class="size-full wp-image-90264" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-2.png" alt="Figure 2: orders_stg result set" width="1127" height="127"&gt;
 &lt;p id="caption-attachment-90264" class="wp-caption-text"&gt;Figure 2: orders_stg result set&lt;/p&gt;
&lt;/div&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;CREATE TABLE dev.public.customer_opt_out
(
customer_id bigint,
customer_name varchar,
opt_out_ind char(1),
cust_rec_upd_ind char(1)
);
INSERT INTO dev.public.customer_opt_out VALUES
(9, 'Customer9 Martinez', 'Y', 'N'),
(12, 'Customer12 Thomas', 'Y', 'N'),
(13, 'Customer13 Albon', 'N', 'Y'),
(14, 'Customer14 Oscar', 'N', 'Y');
select * from dev.public.customer_opt_out;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90265" style="width: 579px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90265" loading="lazy" class="size-full wp-image-90265" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-3.png" alt="Figure 3: customer_opt_out result set" width="569" height="128"&gt;
 &lt;p id="caption-attachment-90265" class="wp-caption-text"&gt;Figure 3: customer_opt_out result set&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;You can now use the&amp;nbsp;&lt;code&gt;orders_stg&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;customer_opt_out&lt;/code&gt;&amp;nbsp;tables to demonstrate data manipulation operations on the&amp;nbsp;&lt;code&gt;orders&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;customer&lt;/code&gt;&amp;nbsp;tables created in the prerequisite section.&lt;/p&gt; 
&lt;h2&gt;MERGE&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/iceberg-writes-sql-syntax.html#iceberg-writes-merge" target="_blank" rel="noopener noreferrer"&gt;MERGE&lt;/a&gt; conditionally inserts, updates, or deletes rows in a target table based on the results of a join with a source table. You can use &lt;strong&gt;MERGE&lt;/strong&gt; to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To perform a MERGE operation:&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Verify the current data in the &lt;strong&gt;orders&lt;/strong&gt; table for order IDs 1014, 1015, 1016, and 1017. You loaded this sample data in &lt;a href="https://aws.amazon.com/blogs/big-data/getting-started-with-apache-iceberg-write-support-in-amazon-redshift/" target="_blank" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select * from "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders
where order_id in (1014,1015,1016,1017);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90266" style="width: 1114px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90266" loading="lazy" class="size-full wp-image-90266" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-4.png" alt="Figure 4: orders data for existing orders for orders in orders_stg" width="1104" height="75"&gt;
 &lt;p id="caption-attachment-90266" class="wp-caption-text"&gt;Figure 4: orders data for existing orders for orders in orders_stg&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;The &lt;strong&gt;orders&lt;/strong&gt; table contains existing rows for order IDs 1014 and 1015.&lt;/p&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Run the following &lt;strong&gt;MERGE&lt;/strong&gt; operation using &lt;strong&gt;order_id&lt;/strong&gt; as the key column to match rows between the &lt;strong&gt;orders&lt;/strong&gt; and &lt;strong&gt;orders_stg&lt;/strong&gt; tables:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;MERGE INTO "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders
USING "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg
ON orders.order_id = orders_stg.order_id
WHEN MATCHED THEN UPDATE 
SET
customer_id         = orders_stg.customer_id,
total_order_amt     = orders_stg.total_order_amt,
total_order_tax_amt = orders_stg.total_order_tax_amt,
tax_pct             = orders_stg.tax_pct,
order_date          = orders_stg.order_date,
order_created_at_tz = orders_stg.order_created_at_tz,
is_active_ind       = orders_stg.is_active_ind
WHEN NOT MATCHED THEN INSERT
VALUES 
(orders_stg.customer_id,orders_stg.order_id,orders_stg.total_order_amt,orders_stg.total_order_tax_amt,orders_stg.tax_pct,orders_stg.order_date,orders_stg.order_created_at_tz,orders_stg.is_active_ind);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The operation updates existing rows (1014 and 1015) and inserts new rows for order IDs that don’t exist in the &lt;strong&gt;orders&lt;/strong&gt; table (1016 and 1017).&lt;/p&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Verify the updated data in the &lt;strong&gt;orders&lt;/strong&gt; table:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select * from "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orderswhere order_id in (1014,1015,1016,1017);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90267" style="width: 1137px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90267" loading="lazy" class="size-full wp-image-90267" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-5.jpg" alt="Figure 5: merged data on orders from orders_stg" width="1127" height="127"&gt;
 &lt;p id="caption-attachment-90267" class="wp-caption-text"&gt;Figure 5: merged data on orders from orders_stg&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;The &lt;strong&gt;MERGE&lt;/strong&gt; operation performs the following changes:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Updates existing rows&lt;/strong&gt; – Order IDs 1014 and 1015 have updated &lt;strong&gt;total_order_amt&lt;/strong&gt; and &lt;strong&gt;total_order_tax_amt&lt;/strong&gt; values from the &lt;strong&gt;orders_stg&lt;/strong&gt; table&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Inserts new rows&lt;/strong&gt; – Order IDs 1016 and 1017 are inserted because they don’t exist in the &lt;strong&gt;orders&lt;/strong&gt; table&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;This demonstrates the upsert pattern, where &lt;strong&gt;MERGE&lt;/strong&gt; conditionally updates or inserts rows based on the matching key column.&lt;/p&gt; 
&lt;h2&gt;UPDATE&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/iceberg-writes-sql-syntax.html#iceberg-writes-update" target="_blank" rel="noopener noreferrer"&gt;UPDATE&lt;/a&gt; modifies existing rows in a table based on specified conditions or values from another table.&lt;/p&gt; 
&lt;p&gt;Update the &lt;code&gt;customer&lt;/code&gt; Apache Iceberg table using data from the &lt;code&gt;customer_opt_out&lt;/code&gt; Amazon Redshift native table. The &lt;strong&gt;UPDATE&lt;/strong&gt; operation uses the &lt;code&gt;cust_rec_upd_ind&lt;/code&gt; column as a filter, updating only rows where the value is ‘Y’.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To perform an UPDATE operation:&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Verify the current &lt;code&gt;customer_name&lt;/code&gt; values for customer IDs 13 and 14 in the &lt;code&gt;customer_opt_out&lt;/code&gt; and &lt;code&gt;customer&lt;/code&gt; tables (you loaded this sample data in &lt;a href="https://aws.amazon.com/blogs/big-data/getting-started-with-apache-iceberg-write-support-in-amazon-redshift/" target="_blank" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;):&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select * from dev.public.customer_opt_out
where cust_rec_upd_ind = 'Y';&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90268" style="width: 628px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90268" loading="lazy" class="size-full wp-image-90268" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-6.jpg" alt="Figure 6: verify existing customer data for customers from customer_opt_out" width="618" height="78"&gt;
 &lt;p id="caption-attachment-90268" class="wp-caption-text"&gt;Figure 6: verify existing customer data for customers from customer_opt_out&lt;/p&gt;
&lt;/div&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select customer_id,customer_name from dev.demo_iceberg.customer
where customer_id in(13,14);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90269" style="width: 465px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90269" loading="lazy" class="size-full wp-image-90269" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-7.jpg" alt="Figure 7: verify existing customer name for customers from customer_opt_out" width="455" height="100"&gt;
 &lt;p id="caption-attachment-90269" class="wp-caption-text"&gt;Figure 7: verify existing customer name for customers from customer_opt_out&lt;/p&gt;
&lt;/div&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Run the following &lt;strong&gt;UPDATE&lt;/strong&gt; operation to modify customer names based on the &lt;code&gt;cust_rec_upd_ind&lt;/code&gt; from &lt;code&gt;customer_opt_out&lt;/code&gt;:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;UPDATE dev.demo_iceberg.customerSET customer_name = customer_opt_out.customer_name
FROM dev.public.customer_opt_out
WHERE customer_opt_out.cust_rec_upd_ind = 'Y'and customer.customer_id = customer_opt_out.customer_id;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Verify the changes for customer IDs 13 and 14:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select customer_id,customer_name from dev.demo_iceberg.customer where customer_id in(13,14) order by 1;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90270" style="width: 545px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90270" loading="lazy" class="size-full wp-image-90270" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-8.jpg" alt="Figure 8: updated customer names in customer table" width="535" height="101"&gt;
 &lt;p id="caption-attachment-90270" class="wp-caption-text"&gt;Figure 8: updated customer names in customer table&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;The &lt;strong&gt;UPDATE&lt;/strong&gt; operation modifies the &lt;code&gt;customer_name&lt;/code&gt; values based on the join condition with the &lt;code&gt;customer_opt_out&lt;/code&gt; table. Customer IDs 13 and 14 now have updated names (&lt;code&gt;Customer13 Albon&lt;/code&gt; and &lt;code&gt;Customer14 Oscar&lt;/code&gt;).&lt;/p&gt; 
&lt;h2&gt;DELETE&lt;/h2&gt; 
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/iceberg-writes-sql-syntax.html#iceberg-writes-delete" target="_blank" rel="noopener noreferrer"&gt;DELETE&lt;/a&gt; removes rows from a table based on specified conditions. Without a &lt;strong&gt;WHERE&lt;/strong&gt; clause, &lt;strong&gt;DELETE&lt;/strong&gt; removes all the rows from table.&lt;/p&gt; 
&lt;p&gt;Delete rows from the &lt;code&gt;customer&lt;/code&gt; Apache Iceberg table using data from the &lt;code&gt;customer_opt_out&lt;/code&gt; Amazon Redshift native table. The &lt;strong&gt;DELETE&lt;/strong&gt; operation uses the &lt;code&gt;opt_out_ind&lt;/code&gt; column as a filter, removing only rows where the value is ‘Y’.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To perform a DELETE operation:&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Verify the opt-out indicator data in the &lt;code&gt;customer_opt_out&lt;/code&gt; table:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select * from dev.public.customer_opt_out
where opt_out_ind = 'Y';&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90271" style="width: 818px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90271" loading="lazy" class="size-full wp-image-90271" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-9.jpg" alt="Figure 9: verify customer records for opt out" width="808" height="102"&gt;
 &lt;p id="caption-attachment-90271" class="wp-caption-text"&gt;Figure 9: verify customer records for opt out&lt;/p&gt;
&lt;/div&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Verify the current customer data for customer IDs 9 and 12:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select * from dev.demo_iceberg.customerwhere customer_id in(9,12);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90272" style="width: 955px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90272" loading="lazy" class="size-full wp-image-90272" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-10.jpg" alt="Figure 0: verify existing customers data in customer table for opt out" width="945" height="103"&gt;
 &lt;p id="caption-attachment-90272" class="wp-caption-text"&gt;Figure 10: verify existing customers data in customer table for opt out&lt;/p&gt;
&lt;/div&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Review the query execution plan:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;EXPLAINDELETE FROM demo_iceberg.customerUSING public.customer_opt_out
WHERE customer.customer_id = customer_opt_out.customer_id
AND customer_opt_out.opt_out_ind = 'Y';&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90273" style="width: 1400px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90273" loading="lazy" class="size-full wp-image-90273" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-11.jpg" alt="Figure 1: query plan for the DELETE queryThe execution plan shows Amazon S3 scans for Apache Iceberg format tables, indicating that Amazon Redshift removes rows directly from the Amazon S3 bucket." width="1390" height="344"&gt;
 &lt;p id="caption-attachment-90273" class="wp-caption-text"&gt;Figure 11: query plan for the DELETE query. The execution plan shows Amazon S3 scans for Apache Iceberg format tables, indicating that Amazon Redshift removes rows directly from the Amazon S3 bucket.&lt;/p&gt;
&lt;/div&gt; 
&lt;ol start="4"&gt; 
 &lt;li&gt;Run the following &lt;strong&gt;DELETE&lt;/strong&gt; operation:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;DELETE FROM demo_iceberg.customer
USING public.customer_opt_out
WHERE customer.customer_id = customer_opt_out.customer_id
AND customer_opt_out.opt_out_ind = 'Y';&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;Verify that the rows were removed:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select * from dev.demo_iceberg.customer where customer_id in(9,12);&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;div id="attachment_90274" style="width: 1081px" class="wp-caption alignnone"&gt;
 &lt;img aria-describedby="caption-attachment-90274" loading="lazy" class="size-full wp-image-90274" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/bdb-5850-image-12.jpg" alt="Figure 2: result set from customer table for opt out customer after delete" width="1071" height="157"&gt;
 &lt;p id="caption-attachment-90274" class="wp-caption-text"&gt;Figure 12: result set from customer table for opt out customer after delete&lt;/p&gt;
&lt;/div&gt; 
&lt;p&gt;The query returns no rows, confirming that customer IDs 9 and 12 were successfully deleted from the &lt;code&gt;customer&lt;/code&gt; table.&lt;/p&gt; 
&lt;h2&gt;Best practices&lt;/h2&gt; 
&lt;p&gt;After performing multiple &lt;strong&gt;UPDATE&lt;/strong&gt; or &lt;strong&gt;DELETE&lt;/strong&gt; operations, consider running table maintenance to optimize read performance:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;For AWS Glue tables&lt;/strong&gt; – Use AWS Glue table optimizers. For more information, see &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/table-optimizers.html" target="_blank" rel="noopener noreferrer"&gt;Table optimizers&lt;/a&gt; in the AWS Glue Developer Guide.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;For S3 Tables&lt;/strong&gt; – Use S3 Tables maintenance operations. For more information, see &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-maintenance.html" target="_blank" rel="noopener noreferrer"&gt;S3 Tables maintenance&lt;/a&gt; in the Amazon S3 User Guide.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Table maintenance merges and compacts deletion files generated by Merge-on-Read operations, improving query performance for subsequent reads.&lt;/p&gt; 
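&lt;p&gt;As an illustration, compaction for an AWS Glue managed Apache Iceberg table can be enabled programmatically through the Glue &lt;code&gt;CreateTableOptimizer&lt;/code&gt; API. The following is a minimal, illustrative sketch: the account ID, database, table, and IAM role are placeholders, and you should verify the configuration field names against the current boto3 documentation.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

# Placeholders: replace with your account ID, Glue database and table names,
# and an IAM role that AWS Glue can assume to run compaction.
ACCOUNT_ID = "111122223333"
DATABASE = "demo_iceberg_db"
TABLE = "customer"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/GlueTableOptimizerRole"

glue = boto3.client("glue")

# Enable automatic compaction for the Iceberg table. Field names follow the
# CreateTableOptimizer request syntax; confirm them in the boto3 documentation.
glue.create_table_optimizer(
    CatalogId=ACCOUNT_ID,
    DatabaseName=DATABASE,
    TableName=TABLE,
    Type="compaction",
    TableOptimizerConfiguration={
        "roleArn": ROLE_ARN,
        "enabled": True,
    },
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 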
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;You can use Amazon Redshift support for DELETE, UPDATE, and MERGE operations on Apache Iceberg tables to build data architectures that combine warehouse performance with data lake scalability. You can modify data at the row level while maintaining ACID compliance, giving you the same flexibility with Apache Iceberg tables as you have with native Amazon Redshift tables.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Get started:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Review the &lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/querying-iceberg.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift Iceberg integration documentation&lt;/a&gt; for complete syntax reference&lt;/li&gt; 
 &lt;li&gt;Explore &lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/iceberg-writes.html" target="_blank" rel="noopener noreferrer"&gt;Writing to Apache Iceberg tables&lt;/a&gt; for detailed examples&lt;/li&gt; 
 &lt;li&gt;Learn about &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/table-optimizers.html" target="_blank" rel="noopener noreferrer"&gt;table maintenance best practices&lt;/a&gt; for AWS Glue tables&lt;/li&gt; 
 &lt;li&gt;Discover &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-maintenance.html" target="_blank" rel="noopener noreferrer"&gt;S3 Tables maintenance operations&lt;/a&gt; for improving query performance&lt;/li&gt; 
&lt;/ul&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-29797" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2025/11/25/5576a1.jpg" alt="Sanket Hase" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Sanket Hase&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/sankethase/" target="_blank" rel="noopener"&gt;Sanket&lt;/a&gt; is an Engineering Manager with the Amazon Redshift team, leading query execution teams in the areas of data lake analytics, hardware-software co-design, and vectorized query execution.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-29797" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2023/06/27/kuppa.jpg" alt="Raghu Kuppala" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Raghu Kuppala&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/raghu-kuppala-a056a78b/" target="_blank" rel="noopener"&gt;Raghu&lt;/a&gt; is an Analytics Specialist Solutions Architect experienced working in the databases, data warehousing, and analytics space. Outside of work, he enjoys trying different cuisines and spending time with his family and friends.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2025/07/09/ritesh-100-1.png"&gt;&lt;img loading="lazy" class="size-full wp-image-80580 alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2025/07/09/ritesh-100-1.png" alt="" width="100" height="133"&gt;&lt;/a&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ritesh Sinha&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/ritesh-kumar-sinha-8016aa22/" target="_blank" rel="noopener"&gt;Ritesh&lt;/a&gt; is an Analytics Specialist Solutions Architect based out of San Francisco. He has helped customers build scalable data warehousing and big data solutions for over 16 years. He loves to design and build efficient end-to-end solutions on AWS. In his spare time, he loves reading, walking, and doing yoga.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-29797" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/01/05/BDB-5675-image-15-1.png" alt="Sundeep Kumar" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Sundeep Kumar&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/sundeep-kumar-8b8160a4/" target="_blank" rel="noopener"&gt;Sundeep&lt;/a&gt; is a Sr. Specialist Solutions Architect at Amazon Web Services (AWS), helping customers build data lake and analytics platforms and solutions. When not building and designing data lakes, Sundeep enjoys listening to music and playing guitar.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="aligncenter size-full wp-image-29797" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2025/11/25/5576a2.jpg" alt="Xiening Dai" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Xiening Dai&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/xndai/" target="_blank" rel="noopener"&gt;Xiening&lt;/a&gt; is a Principal Software Engineer working on Redshift Query Processing and Data Lake.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90356" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/16/ebsh.jpeg" alt="" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Ebrahim Salim Hirani&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/ebrahim-hirani?utm_source=share_via&amp;amp;utm_content=profile&amp;amp;utm_medium=member_android" target="_blank" rel="noopener"&gt;Ebrahim&lt;/a&gt; is a Software Engineer working on Redshift Query Processing and Data Lake.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90357" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/16/lixuex.jpeg" alt="" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Sherry Xiao&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/lixue-xiao-b9b798125/" target="_blank" rel="noopener"&gt;Sherry Xiao&lt;/a&gt; is a Software Engineer working on Redshift Query Engine Execution and Data Lake team.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Get to insights faster using Notebooks in Amazon SageMaker Unified Studio</title>
		<link>https://aws.amazon.com/blogs/big-data/get-to-insights-faster-using-notebooks-in-amazon-sagemaker-unified-studio/</link>
					
		
		<dc:creator><![CDATA[Praveen Kumar]]></dc:creator>
		<pubDate>Wed, 15 Apr 2026 21:24:36 +0000</pubDate>
				<category><![CDATA[Amazon SageMaker Unified Studio]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<guid isPermaLink="false">2acb159cf9d49e7b83fd71e1797457acd06398e1</guid>

					<description>In this post, we demonstrate how Notebooks in Amazon SageMaker Unified Studio help you get to insights faster by simplifying infrastructure configuration. You'll see how to analyze housing price data, create scalable data tables, run distributed profiling, and train machine learning (ML) models within a single notebook environment.</description>
										<content:encoded>&lt;p&gt;In this post, we demonstrate how Notebooks in Amazon SageMaker Unified Studio help you get to insights faster by simplifying infrastructure configuration. You’ll see how to analyze housing price data, create scalable data tables, run distributed profiling, and train machine learning (ML) models within a single notebook environment.&lt;/p&gt; 
&lt;p&gt;Data scientists and analysts often spend days configuring infrastructure and managing authentication across multiple data sources before they can begin analysis. When working with data across Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Snowflake, and local files, teams face repeated authentication setup, manual compute scaling decisions, and tool-switching overhead that delays insights.&lt;/p&gt; 
&lt;p&gt;Notebooks in Amazon SageMaker Unified Studio provide instant access to 12+ data sources, compute scaling from local to distributed processing, and AI-powered code generation within a single browser-based environment. You’ll learn to use polyglot programming, multi-engine compute, and AI-assisted development to accelerate your path from question to insight.&lt;/p&gt; 
&lt;h2&gt;What are Notebooks in Amazon SageMaker Unified Studio?&lt;/h2&gt; 
&lt;p&gt;Notebooks in Amazon SageMaker Unified Studio provide an interactive environment for data analysis, exploration, engineering, and machine learning workflows. It delivers five integrated capabilities:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Polyglot programming&lt;/strong&gt;: Write code in Python and SQL interchangeably within the same notebook environment&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Unified data access&lt;/strong&gt;: Connect instantly to data stored in Amazon S3, AWS Glue Data Catalog, Apache Iceberg tables, and third-party sources like Snowflake and BigQuery&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Native visualization&lt;/strong&gt;: Create charts directly from Python and SQL results for immersive data analytics&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;AI-powered development&lt;/strong&gt;: Generate code through natural language prompts using SageMaker Data Agent, with an intelligent chat interface for data analytics, data science, and ML tasks&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Flexible compute&lt;/strong&gt;: Scale from basic instances to GPU-powered environments as your needs grow&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Architecture&lt;/h3&gt; 
&lt;p&gt;This section covers the architecture of Notebooks, which delivers enterprise-scale analytics with browser-based simplicity through a cloud-native architecture that integrates multiple compute engines, diverse data sources, and AI-powered assistance.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90240" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-1.png" alt="" width="940" height="960"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Presentation layer&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;You access the notebook interface through Amazon SageMaker Unified Studio, interacting with a familiar interface featuring code cells for execution, markdown cells for documentation, and visualization cells for charts and tables.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Compute layer&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;A dedicated notebook server manages your kernel lifecycle and session state. Key components include a Language Server for code completion, a Python 3.11 runtime with pre-loaded data science libraries, and a Polyglot Kernel that handles your Python, PySpark, and SQL execution within the same notebook. Persistent Amazon Elastic Block Store (Amazon EBS) storage backs each notebook you create.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Execution layer&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Notebooks support multiple execution engines, automatically routing your code to the optimal processing engine. In-memory execution handles your smaller datasets and rapid prototyping. Apache Spark via Amazon Athena provides distributed processing for your large-scale analytics via Spark Connect. Native connectivity to Amazon Athena (Trino), Amazon Redshift, Snowflake, and BigQuery processes your SQL queries.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Data Integration&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;You get unified access to 12+ data sources, including AWS-native sources (Amazon S3, AWS Glue, Amazon Athena, Amazon Redshift) and third-party sources (Snowflake, BigQuery, PostgreSQL, MySQL). For the latest supported data sources, see &lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/connect-data-sources.html" target="_blank" rel="noopener noreferrer"&gt;Connect to data sources&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;AI layer&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The SageMaker Data Agent operates in two modes to assist you: an &lt;em&gt;Agent Panel&lt;/em&gt; for multi-step analytical workflows and &lt;em&gt;Inline Assistance&lt;/em&gt; for focused, cell-level code generation. For a detailed overview, see &lt;a href="https://aws.amazon.com/blogs/big-data/accelerate-context-aware-data-analysis-and-ml-workflows-with-amazon-sagemaker-data-agent/" target="_blank" rel="noopener noreferrer"&gt;Accelerate context-aware data analysis and ML workflows with Amazon SageMaker Data Agent&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Security is embedded throughout the architecture to protect your work. Data access respects your AWS Identity and Access Management (AWS IAM) permissions. The notebook and the agent can only access data sources you’re authorized to use. Communication between components uses encrypted channels, and your notebook storage is encrypted at rest. The AI agent includes built-in guardrails to help prevent destructive operations and logs interactions for your compliance and auditing purposes.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;Before you begin, you need:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An AWS account with appropriate permissions to create Amazon SageMaker Unified Studio resources. See &lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/setup-iam-based-domains.html" target="_blank" rel="noopener noreferrer"&gt;Set up IAM-based domains&lt;/a&gt; for complete permission requirements.&lt;/li&gt; 
 &lt;li&gt;Basic familiarity with Python programming and SQL queries&lt;/li&gt; 
 &lt;li&gt;Understanding of data analysis concepts and ML workflows&lt;/li&gt; 
 &lt;li&gt;Access to the sample housing dataset (provided in the walkthrough)&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Getting started with Notebooks&lt;/h3&gt; 
&lt;p&gt;To get started, open the &lt;a href="https://console.aws.amazon.com/datazone/home?trk=769a1a2b-8c19-4976-9c45-b6b1226c7d20&amp;amp;sc_channel=el" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker console&lt;/a&gt; and choose &lt;strong&gt;Get started&lt;/strong&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90241" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-2.jpg" alt="" width="2560" height="733"&gt;&lt;/p&gt; 
&lt;p&gt;You will be prompted either to select an existing &lt;a href="https://aws.amazon.com/iam/?trk=769a1a2b-8c19-4976-9c45-b6b1226c7d20&amp;amp;sc_channel=el" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (AWS IAM)&lt;/a&gt; role that has access to your data and compute, or to create a new role. For this walkthrough, choose &lt;strong&gt;Create a new role&lt;/strong&gt; and leave the other options at their defaults.&lt;/p&gt; 
&lt;p&gt;Choose &lt;strong&gt;Set up&lt;/strong&gt;. It takes a few minutes to set up your environment.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90242" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-3.jpg" alt="" width="2560" height="838"&gt;&lt;/p&gt; 
&lt;h2&gt;Use case&lt;/h2&gt; 
&lt;p&gt;In this post, you’ll use a Notebook and the SageMaker Data Agent to perform the following:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Working with the dataset:&lt;/strong&gt; Upload the sample dataset housing.csv and explore it with the data explorer&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Polyglot programming:&lt;/strong&gt; Query dataframes with SQL via DuckDB&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Multi-engine access via AWS Glue:&lt;/strong&gt; Create an AWS Glue table to unlock Athena SQL/Spark engines for distributed processing&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Advanced analytics:&lt;/strong&gt; Use Athena Spark for data profiling&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;AI-assisted development:&lt;/strong&gt; Generate profiling and ML code with Data Agent&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;ML workflow:&lt;/strong&gt; Train a Random Forest model and evaluate the results&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;First, let’s walk through the interface and explore its core capabilities.&lt;/p&gt; 
&lt;h3&gt;Understanding the interface&lt;/h3&gt; 
&lt;p&gt;The Notebooks interface follows familiar notebook conventions with cells for code execution and markdown for documentation. Within the notebook, you’ll see your current programming environment (such as Python 3.11) and compute profile specifications. The interface allows you to:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Access your data by browsing files, exploring data catalogs, and managing third-party connections&lt;/li&gt; 
 &lt;li&gt;Monitor variables created within your notebook context&lt;/li&gt; 
 &lt;li&gt;Scale compute resources on demand by adjusting virtual CPUs and RAM based on your workload requirements, even scaling up to GPU instances&lt;/li&gt; 
 &lt;li&gt;Manage packages by installing and configuring Python packages as needed&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Working with the dataset&lt;/h3&gt; 
&lt;p&gt;For this walkthrough, you’ll use the housing.csv sample dataset, which you can download from &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-sample-datasets.html" target="_blank" rel="noopener noreferrer"&gt;this page&lt;/a&gt; (the file is named canvas-sample-housing.csv on the linked page). Choose the Files icon in the left panel, choose the &lt;strong&gt;Local&lt;/strong&gt; tab, and upload the CSV file to the notebook.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90243" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-4.jpg" alt="" width="2560" height="772"&gt;&lt;/p&gt; 
&lt;p&gt;Notebooks provide you with instant access to your data assets. Using the data explorer, you can browse your AWS Glue Data Catalog, Amazon S3 table catalogs, Amazon S3 buckets, and configured third-party connections.&lt;/p&gt; 
&lt;p&gt;Choose the three-dot options menu.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90244" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-5.jpg" alt="" width="1090" height="978"&gt;&lt;/p&gt; 
&lt;p&gt;Choose &lt;strong&gt;Read as dataframe&lt;/strong&gt;, then run the inserted cell in the notebook to view the results.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import pandas as pd
&amp;lt;&amp;lt;df_csv_xxxx&amp;gt;&amp;gt; = pd.read_csv('housing.csv')
&amp;lt;&amp;lt;df_csv_xxxx&amp;gt;&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90245" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-6.jpg" alt="" width="2560" height="1110"&gt;&lt;/p&gt; 
&lt;p&gt;When you return a dataframe, Notebooks render it in a rich table format with automatic data profiling.&lt;/p&gt; 
&lt;h3&gt;Polyglot programming: Python and SQL together&lt;/h3&gt; 
&lt;p&gt;One of the most powerful features in Notebooks is the interoperability between Python and SQL. After you load data into a Python dataframe, you can immediately query it using SQL. For example, to calculate total population and household by ocean proximity, you can run:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select sum(population) ,sum(households),ocean_proximity 
from&amp;lt;&amp;lt;df_csv_xxxx&amp;gt;&amp;gt; 
group byocean_proximity&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90246" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-7.jpg" alt="" width="1108" height="892"&gt;&lt;/p&gt; 
&lt;p&gt;The notebook’s autocomplete functionality recognizes dataframes in your context, making SQL queries intuitive.&lt;/p&gt; 
&lt;p&gt;This SQL query runs on &lt;a href="https://duckdb.org/" target="_blank" rel="noopener noreferrer"&gt;DuckDB&lt;/a&gt; (an in-memory SQL database engine), which requires no separate installation or server maintenance on your part. DuckDB’s lightweight design integrates into Python, Java, and other environments, making it ideal for your rapid interactive data analysis. For distributed processing needs, you can use engines such as Apache Spark or Trino after creating an AWS Glue table for this dataset.&lt;/p&gt; 
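&lt;p&gt;If you want to experiment with the same pattern outside the managed notebook, the open source duckdb Python package can query an in-scope pandas DataFrame directly by name. The following is a minimal sketch under that assumption (it expects housing.csv to be available locally and the duckdb and pandas packages to be installed):&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import duckdb
import pandas as pd

# Load the sample dataset into a pandas DataFrame.
df = pd.read_csv("housing.csv")

# DuckDB can reference the DataFrame by its Python variable name in SQL.
result = duckdb.sql("""
    SELECT ocean_proximity,
           SUM(population) AS total_population,
           SUM(households) AS total_households
    FROM df
    GROUP BY ocean_proximity
""").df()

print(result)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 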
&lt;h3&gt;Create an AWS Glue table for the dataset&lt;/h3&gt; 
&lt;p&gt;After you create an AWS Glue table, you can query the dataset using various AWS Glue catalog-compatible engines, including Amazon Athena SQL (Trino) and Amazon Athena Spark. These engines deliver optimal price-performance for your specific workload requirements.&lt;/p&gt; 
&lt;p&gt;Start by creating an AWS Glue database. To do that, create a new cell in the notebook by choosing &lt;strong&gt;SQL&lt;/strong&gt; and selecting &lt;strong&gt;Amazon Athena (SQL)&lt;/strong&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90247" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-8.jpg" alt="" width="2560" height="977"&gt;&lt;/p&gt; 
&lt;p&gt;Run this SQL to create a database: &lt;code&gt;create database demo;&lt;/code&gt;&lt;/p&gt; 
&lt;p&gt;Next, go to data explorer and choose &lt;strong&gt;+Add&lt;/strong&gt; on the top left, then choose &lt;strong&gt;Create table&lt;/strong&gt;. Choose the database you created earlier and enter a name for the table. Upload the housing.csv dataset file used earlier. Continue by choosing &lt;strong&gt;Next&lt;/strong&gt; in the side panel to create the table.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90248" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-9.jpg" alt="" width="2560" height="1224"&gt;&lt;/p&gt; 
&lt;p&gt;Next, let’s run a sample SQL query in a new cell using Amazon Athena SQL:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select sum(population) , sum(households), ocean_proximity 
fromdemo.housing
group by ocean_proximity&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90249" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-10.jpg" alt="" width="2560" height="1104"&gt;&lt;/p&gt; 
&lt;h3&gt;Advanced capabilities with Athena Spark&lt;/h3&gt; 
&lt;p&gt;Before you can build an ML model to predict house prices, let’s analyze the dataset further and run data profiling for additional insights. For advanced exploration, you can use Amazon Athena Spark within your notebook. To do that, you’ll create a new Python cell, which has a built-in Spark session. Run the following code to check the Spark version:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;# Verify Spark version
spark.version&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90250" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-11.jpg" alt="" width="2096" height="440"&gt;&lt;/p&gt; 
&lt;h3&gt;Using the SageMaker Data Agent for data profiling&lt;/h3&gt; 
&lt;p&gt;Instead of writing boilerplate code manually, you can use the built-in generative AI capability.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;: “Perform data profiling and create visualization for housing table”&lt;/p&gt; 
&lt;p&gt;The AI assistant generates comprehensive profiling code for you, including basic statistics calculation, column-level profiling, data type analysis, and missing value detection.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90251" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-12.jpg" alt="" width="2106" height="662"&gt;&lt;/p&gt; 
&lt;p&gt;The agent accessed your AWS Glue Data Catalog, understood your housing table structure, and generated profiling code tailored to your specific columns and data types. This context awareness reduces the trial-and-error cycle you’d normally face when adapting generic code snippets to your environment. Review the generated code and run it. The fast response times help you iterate on your analysis efficiently.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90252" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-13.jpg" alt="" width="2560" height="1179"&gt;&lt;/p&gt; 
&lt;p&gt;If you encounter an error, you can resolve it using &lt;strong&gt;Fix with AI&lt;/strong&gt; as shown in the following figure. When errors occur during execution, the “&lt;strong&gt;Fix with AI&lt;/strong&gt;” feature analyzes the traceback, diagnoses the root cause, and generates corrected code, so you can keep your analysis moving forward.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90253" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-14.jpg" alt="" width="2174" height="1186"&gt;&lt;/p&gt; 
&lt;h3&gt;Training ML models&lt;/h3&gt; 
&lt;p&gt;Next, you’ll use the data agent to generate code for training a model that predicts housing prices.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Prompt: &lt;/strong&gt;“Generate code to train a model that predicts housing prices. Use table housing.”&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90255" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-15.jpg" alt="" width="2128" height="978"&gt;&lt;/p&gt; 
&lt;p&gt;The AI assistant generates end-to-end code for you that:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Reads housing data from AWS Glue catalog using Amazon Athena Spark and converts to pandas&lt;/li&gt; 
 &lt;li&gt;Converts string columns to numeric, encodes using one-hot encoding and removes missing values&lt;/li&gt; 
 &lt;li&gt;Trains a Random Forest model to predict median house values&lt;/li&gt; 
 &lt;li&gt;Evaluates model performance (RMSE, MAE, R-square)&lt;/li&gt; 
 &lt;li&gt;Displays top 10 most important features for predictions&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;This multi-step orchestration saves you hours of development time by handling the entire workflow from data access to model evaluation.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-90256" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-16.jpg" alt="" width="1904" height="1108"&gt;&lt;/p&gt; 
&lt;p&gt;If you encounter an error, you can resolve it using &lt;strong&gt;Fix with AI&lt;/strong&gt; available in the results traceback section.&lt;/p&gt; 
&lt;p&gt;This workflow showcased Notebooks’ unified capabilities: you uploaded files locally, created AWS Glue tables for multi-engine access, used Amazon Athena Spark for distributed profiling, and used AI-assisted ML development to predict housing prices. All of this happened within a single notebook environment without switching tools.&lt;/p&gt; 
&lt;h2&gt;Key benefits and best practices&lt;/h2&gt; 
&lt;p&gt;Notebooks in Amazon SageMaker Unified Studio deliver several advantages:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Faster time to insights&lt;/strong&gt;: With traditional environments, you might spend hours on configuration before analysis begins. Notebooks bypass this overhead, so you can start work immediately.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Improved collaboration&lt;/strong&gt;: You can share notebooks with consistent environments, supporting reproducibility and reducing “works on my machine” issues.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Reduced complexity&lt;/strong&gt;: You can access multiple data sources and compute engines from one interface rather than navigating separate tools for each data source or processing engine.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;AI-accelerated development&lt;/strong&gt;: Generate task-specific code and receive intelligent suggestions, reducing time spent on repetitive coding tasks.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Scalable performance&lt;/strong&gt;: Handle datasets from megabytes to petabytes with appropriate compute resources. The system scales automatically as data volumes grow.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Best practices&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Start with appropriate &lt;/strong&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/manage-compute-environments.html" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;compute profiles&lt;/strong&gt;&lt;/a&gt; by beginning with smaller instances and scaling up as your needs grow.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Use AI assistance&lt;/strong&gt; with natural language prompts for your repetitive tasks and complex operations.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Combine engines strategically&lt;/strong&gt; by using Amazon Athena Spark for your large-scale processing, Amazon Redshift for data warehousing and other specialized engines for your specific workloads.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Document your work&lt;/strong&gt; using markdown cells to create living documentation alongside your code.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Organize using multiple cells&lt;/strong&gt; by breaking the complex workflows into logical steps for better readability and debugging.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Cleaning up&lt;/h2&gt; 
&lt;p&gt;To avoid incurring future charges, delete the resources you created in this walkthrough:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;In the Amazon SageMaker Unified Studio console, navigate to the Notebook page&lt;/li&gt; 
 &lt;li&gt;Delete the notebook&lt;/li&gt; 
 &lt;li&gt;Delete the demo database and housing table from the AWS Glue Data Catalog&lt;/li&gt; 
 &lt;li&gt;Delete the Amazon SageMaker Unified Studio domain created during this walkthrough&lt;/li&gt; 
 &lt;li&gt;If you created a new IAM role specifically for this walkthrough, delete it from the IAM console&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we demonstrated how Notebooks in Amazon SageMaker Unified Studio help you work more efficiently and deliver insights more quickly. By combining familiar notebook interfaces with enterprise-scale compute, multi-engine support, and generative AI assistance, teams can streamline data and AI workflows.&lt;/p&gt; 
&lt;p&gt;The integration of Python and SQL, instant access to diverse data sources, and intelligent code generation capabilities make Notebooks a valuable tool for modern data teams. Teams can perform exploratory data analysis, build complex data pipelines, or train ML models with the flexibility and power needed within a single, intuitive environment.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Ready to get started?&lt;/strong&gt; &lt;a href="https://us-east-2.console.aws.amazon.com/datazone/home?region=us-east-2#/" target="_blank" rel="noopener noreferrer"&gt;Create your first notebook in Amazon SageMaker Unified Studio&lt;/a&gt; and begin analyzing data within minutes.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Explore additional capabilities:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Time series analysis workflows with seasonal decomposition and forecasting&lt;/li&gt; 
 &lt;li&gt;Natural language processing pipelines for text classification and sentiment analysis&lt;/li&gt; 
 &lt;li&gt;Integration with Amazon SageMaker Model Registry for ML model versioning&lt;/li&gt; 
 &lt;li&gt;Advanced Spark optimization techniques for petabyte-scale processing&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Learn more:&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/setting-up.html" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker Unified Studio Administrator Guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/navigating-sagemaker-unified-studio.html" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker Unified Studio User Guide&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/notebooks.html" target="_blank" rel="noopener noreferrer"&gt;Notebooks in SageMaker Unified Studio&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/sagemaker-data-agent.html" target="_blank" rel="noopener noreferrer"&gt;Use the SageMaker Data Agent&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/sagemaker/pricing/" target="_blank" rel="noopener noreferrer"&gt;Pricing information&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90257" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-17.jpg" alt="" width="576" height="768"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Praveen Kumar&lt;/h3&gt; 
  &lt;p&gt;Praveen Kumar is a Principal Analytics Solutions Architect at AWS with expertise in designing, building, and implementing modern data and analytics applications using cloud-based services. His areas of interest are serverless technology, data governance, and data-driven AI applications.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90258" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-18.jpg" alt="" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Majisha Namath Parambath&lt;/h3&gt; 
  &lt;p&gt;Majisha Namath Parambath is a Principal Engineer at Amazon SageMaker, bringing over a decade of experience at AWS to her role. She spearheads critical initiatives for Amazon SageMaker Unified Studio, the next-generation service that provides comprehensive data analytics and interactive machine learning capabilities with an emphasis on agentic systems. Her expertise encompasses system design, architecture, and cross-functional execution, with particular attention to security, performance, and reliability at enterprise scale. When she’s not engineering solutions, Majisha enjoys reading, cooking, and hitting the slopes for skiing.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-90259" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/14/BDB-5645-image-19.jpg" alt="" width="1176" height="1567"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Siddharth Gupta&lt;/h3&gt; 
  &lt;p&gt;Siddharth Gupta is heading Generative AI within SageMaker’s Unified Experiences. His focus is on driving agentic experiences, where AI systems act autonomously on behalf of users to accomplish complex tasks. Previously, he led edge machine learning solutions at AWS. His work focuses on improving how developers and data scientists interact with AI, creating more intuitive data integrations and better tools for building and deploying machine learning models. An alumnus of the University of Illinois at Urbana-Champaign, he brings extensive experience from his roles at Yahoo, Glassdoor, and Twitch. You can reach out to him on &lt;a href="https://www.linkedin.com/in/sid88in/" target="_blank" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>How to use Parquet Column Indexes with Amazon Athena</title>
		<link>https://aws.amazon.com/blogs/big-data/how-to-use-parquet-column-indexes-with-amazon-athena/</link>
					
		
		<dc:creator><![CDATA[Matt Wong]]></dc:creator>
		<pubDate>Mon, 13 Apr 2026 15:57:18 +0000</pubDate>
				<category><![CDATA[Amazon Athena]]></category>
		<category><![CDATA[Amazon SageMaker Unified Studio]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">e576855b9de8c1bb7bbc082bdd7343c4101455a7</guid>

					<description>In this blog post, we use Athena and Amazon SageMaker Unified Studio to explore Parquet Column Indexes and demonstrate how they can improve Iceberg query performance. We explain what Parquet Column Indexes are, demonstrate their performance benefits, and show you how to use them in your applications.</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://aws.amazon.com/athena/" target="_blank" rel="noopener noreferrer"&gt;Amazon Athena&lt;/a&gt; recently added support for reading Parquet Column Indexes in Apache Iceberg tables on &lt;a href="https://docs.aws.amazon.com/athena/latest/ug/release-notes.html" target="_blank" rel="noopener noreferrer"&gt;November 21, 2025&lt;/a&gt;. With this optimization, Athena can perform page-level data pruning to skip unnecessary data within Parquet row groups, potentially reducing the amount of data scanned and improving query runtime for queries with selective filters. For data teams, this may help enable faster insights and help reduce costs when analyzing large-scale data lakes.&lt;/p&gt; 
&lt;p&gt;Data teams building data lakes often choose Apache Iceberg for its ACID transactions, schema evolution, and metadata management capabilities. Athena is a serverless query engine that allows you to query Amazon S3-based data lakes using SQL, and you don’t need to manage infrastructure. Based on the type of data and query logic, Athena can apply multiple query optimizations to improve performance and reduce costs.&lt;/p&gt; 
&lt;p&gt;In this blog post, we use Athena and &lt;a href="https://aws.amazon.com/sagemaker/unified-studio/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker Unified Studio&lt;/a&gt; to explore Parquet Column Indexes and demonstrate how they can improve Iceberg query performance. We explain what Parquet Column Indexes are, demonstrate their performance benefits, and show you how to use them in your applications.&lt;/p&gt; 
&lt;h2&gt;Overview of Parquet Column Indexes&lt;/h2&gt; 
&lt;p&gt;Parquet Column Indexes store metadata that query engines can use to skip irrelevant data with greater precision than row group statistics alone. To understand how they work, consider how data is structured within Parquet files and how engines like Athena process them.&lt;/p&gt; 
&lt;p&gt;Parquet files organize data hierarchically by dividing data into row groups (typically 128-512 MB each) and further subdividing them into pages (typically 1 MB each). Traditionally, Parquet maintains metadata on the contents of each row group in the form of min/max statistics, allowing engines like Athena to skip row groups that don’t satisfy query predicates. Although this approach reduces the bytes scanned and query runtime, it has limitations. If even a single page within a row group overlaps with the values you are searching for, Athena scans all pages within the row group.&lt;/p&gt; 
&lt;p&gt;Parquet Column Indexes help address this problem by storing page-level min/max statistics in the Parquet file footer. Row group statistics provide coarse-grained filtering, but Parquet Column Indexes enable finer-grained filtering by allowing query engines like Athena to skip individual pages within a row group. Consider a Parquet file with a single row group containing 5 pages for a column. The row group has min/max statistics of (1, 20), and each page for that column has the following min/max statistics.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;row-group-0: min=1, max=20
    page-0: min=1, max=10
    page-1: min=1, max=10
    page-2: min=5, max=15
    page-3: min=6, max=16
    page-4: min=10, max=20&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;When Athena runs a query filtering for values equal to 2, it first checks the row group statistics and confirms that 2 falls within the range (1, 20). Athena will then plan to scan the pages within that row group. Without Parquet Column Indexes, Athena scans each of the 5 pages in the row group. With Parquet Column Indexes, Athena examines the page-level statistics and determines that only page-0 and page-1 need to be read, skipping the remaining 3 pages.&lt;/p&gt; 
&lt;h2&gt;How to use Parquet Column Indexes with Athena&lt;/h2&gt; 
&lt;p&gt;Athena uses Parquet Column Indexes based on table type:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Amazon S3 Tables: Athena automatically uses Parquet Column Indexes by default when they are present.&lt;/li&gt; 
 &lt;li&gt;Iceberg tables in S3 general purpose buckets: Athena does not use Parquet Column Indexes by default. To allow Athena to use Parquet Column Indexes, add an AWS Glue table property named &lt;code&gt;use_iceberg_parquet_column_index&lt;/code&gt; and set it to &lt;code&gt;true&lt;/code&gt;. Use the AWS Glue console or &lt;a href="https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html" target="_blank" rel="noopener noreferrer"&gt;AWS Glue UpdateTable API&lt;/a&gt; to perform these actions.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Read more about how to use this feature in &lt;a href="https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-data-optimization.html#querying-iceberg-data-optimization-parquet-column-indexing" target="_blank" rel="noopener noreferrer"&gt;Use Parquet column indexing&lt;/a&gt;.&lt;/p&gt; 
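&lt;p&gt;For Iceberg tables in S3 general purpose buckets, one way to set the table property programmatically is through the AWS Glue GetTable and UpdateTable APIs. The following is a minimal sketch with placeholder database and table names; it copies the writable fields of the existing table definition and adds the property:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

# Placeholders: replace with your Glue database and Iceberg table names.
DATABASE = "my_iceberg_db"
TABLE = "my_iceberg_table"

glue = boto3.client("glue")
table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)["Table"]

# TableInput accepts only a subset of the fields returned by GetTable,
# so copy over the writable fields and add the new table property.
writable_fields = (
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
)
table_input = {key: table[key] for key in writable_fields if key in table}
table_input.setdefault("Parameters", {})["use_iceberg_parquet_column_index"] = "true"

glue.update_table(DatabaseName=DATABASE, TableInput=table_input)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 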
&lt;h2&gt;Measuring Athena performance gains when using Parquet Column Indexes&lt;/h2&gt; 
&lt;p&gt;Now that we understand what Parquet Column Indexes are, we’ll demonstrate the performance benefits of using Parquet Column Indexes by analyzing the &lt;code&gt;catalog_sales&lt;/code&gt; table from a 3TB TPC-DS dataset. This table contains ecommerce transaction data including order dates, sales amounts, customer IDs, and product information. This dataset is a good proxy for the types of business analysis that you might perform on your own data, such as identifying sales trends, analyzing customer purchasing patterns, and calculating revenue metrics. We compare query execution statistics with and without Parquet Column Indexes to quantify the performance improvement.&lt;/p&gt; 
&lt;h3&gt;Prerequisites&lt;/h3&gt; 
&lt;p&gt;Before you begin, you must have the following resources:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;A SageMaker Unified Studio IAM-based domain.&lt;/li&gt; 
 &lt;li&gt;An &lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/setup-iam-based-domains.html" target="_blank" rel="noopener noreferrer"&gt;Execution IAM Role&lt;/a&gt; configured within the SageMaker Unified Studio IAM-based domain with access to S3, AWS Glue Data Catalog, and Athena.&lt;/li&gt; 
 &lt;li&gt;An S3 bucket in your account to store Iceberg table data and Athena query results.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Create catalog_sales Iceberg table&lt;/h3&gt; 
&lt;p&gt;Complete the following steps using SageMaker Unified Studio notebooks. There, you can use SageMaker Unified Studio’s multi-dialect notebook functionality to work with your data using the Athena SQL and Spark engines. To create a &lt;code&gt;catalog_sales&lt;/code&gt; Iceberg table in your account, follow these steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to Amazon SageMaker in the AWS Management Console and choose &lt;strong&gt;Open&lt;/strong&gt; under &lt;strong&gt;Get started with Amazon SageMaker Unified Studio&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;From the side navigation, select &lt;strong&gt;Notebooks&lt;/strong&gt; and choose &lt;strong&gt;Create Notebook&lt;/strong&gt;. The subsequent steps in this post will execute scripts in this notebook.&lt;/li&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;SQL &lt;/strong&gt;cell in the notebook and set the connection type to &lt;strong&gt;Athena (Spark)&lt;/strong&gt;. Execute the following query to create a database for the tables in this post. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-sql"&gt;CREATE DATABASE parquet_column_index_blog;&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;SQL &lt;/strong&gt;cell in the notebook and verify the connection type is &lt;strong&gt;Athena (Spark)&lt;/strong&gt;. Execute the following query to create a Hive table pointing to the location of the TPC-DS &lt;code&gt;catalog_sales&lt;/code&gt; table data at the public S3 bucket. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-sql"&gt;CREATE TABLE IF NOT EXISTS parquet_column_index_blog.catalog_sales_hive (
	  cs_sold_time_sk int,
	  cs_ship_date_sk int,
	  cs_bill_customer_sk int,
	  cs_bill_cdemo_sk int,
	  cs_bill_hdemo_sk int,
	  cs_bill_addr_sk int,
	  cs_ship_customer_sk int,
	  cs_ship_cdemo_sk int,
	  cs_ship_hdemo_sk int,
	  cs_ship_addr_sk int,
	  cs_call_center_sk int,
	  cs_catalog_page_sk int,
	  cs_ship_mode_sk int,
	  cs_warehouse_sk int,
	  cs_item_sk int,
	  cs_promo_sk int,
	  cs_order_number bigint,
	  cs_quantity int,
	  cs_wholesale_cost decimal(7, 2),
	  cs_list_price decimal(7, 2),
	  cs_sales_price decimal(7, 2),
	  cs_ext_discount_amt decimal(7, 2),
	  cs_ext_sales_price decimal(7, 2),
	  cs_ext_wholesale_cost decimal(7, 2),
	  cs_ext_list_price decimal(7, 2),
	  cs_ext_tax decimal(7, 2),
	  cs_coupon_amt decimal(7, 2),
	  cs_ext_ship_cost decimal(7, 2),
	  cs_net_paid decimal(7, 2),
	  cs_net_paid_inc_tax decimal(7, 2),
	  cs_net_paid_inc_ship decimal(7, 2),
	  cs_net_paid_inc_ship_tax decimal(7, 2),
	  cs_net_profit decimal(7, 2))
	USING parquet
	PARTITIONED BY (cs_sold_date_sk int)
	LOCATION 's3://blogpost-sparkoneks-us-east-1/blog/BLOG_TPCDS-TEST-3T-partitioned/catalog_sales/'
	TBLPROPERTIES (
	  'parquet.compression'='SNAPPY'
	);&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;SQL &lt;/strong&gt;cell in the notebook and verify the connection type is &lt;strong&gt;Athena (Spark)&lt;/strong&gt;. Execute the following query to add the Hive partitions to the AWS Glue metadata. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-sql"&gt;MSCK REPAIR TABLE parquet_column_index_blog.catalog_sales_hive;&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;SQL&lt;/strong&gt; cell in the notebook and verify the connection type is &lt;strong&gt;Athena (Spark)&lt;/strong&gt;. Replace &lt;code&gt;s3://amzn-s3-demo-bucket/athena_parquet_column_index_blog/catalog_sales/&lt;/code&gt;&amp;nbsp;with the S3 URI where you want to store your Iceberg table data, then execute the following query to create the &lt;code&gt;catalog_sales&lt;/code&gt; Iceberg table from the Hive table. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-sql"&gt;CREATE TABLE parquet_column_index_blog.catalog_sales
	USING iceberg
	PARTITIONED BY (cs_sold_date_sk)
	LOCATION 's3://amzn-s3-demo-bucket/athena_parquet_column_index_blog/catalog_sales/'
	AS
	SELECT * FROM parquet_column_index_blog.catalog_sales_hive;&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;SQL&lt;/strong&gt; cell in the notebook and verify the connection type is &lt;strong&gt;Athena (Spark)&lt;/strong&gt;. Execute the following query to delete the &lt;code&gt;catalog_sales_hive&lt;/code&gt; table, which was only needed to create the &lt;code&gt;catalog_sales&lt;/code&gt; Iceberg table. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-sql"&gt;DROP TABLE parquet_column_index_blog.catalog_sales_hive;&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Run an Athena query without Parquet Column Indexes&lt;/h3&gt; 
&lt;p&gt;After creating the &lt;code&gt;catalog_sales&lt;/code&gt; Iceberg table in the preceding steps, we run a simple query that analyzes shipping delays of the top 10 most ordered items. This type of analysis could be critical for ecommerce and retail operations. By identifying which popular items experience the greatest delays, fulfillment teams can focus resources where they matter most. For example, you can adjust inventory placement, change warehouse assignments, or address carrier issues. Additionally, popular items with significant shipping delays are more likely to result in order cancellations or returns, so proactively identifying these issues helps protect revenue.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT cs_item_sk,
    SUM(cs_quantity) as total_orders,
    AVG(cs_ship_date_sk - cs_sold_date_sk) as avg_ship_delay_days,
    MIN(cs_ship_date_sk - cs_sold_date_sk) as min_ship_delay,
    MAX(cs_ship_date_sk - cs_sold_date_sk) as max_ship_delay,
    SUM(
        CASE
            WHEN cs_ship_date_sk - cs_sold_date_sk &amp;gt; 7 THEN 1 ELSE 0
        END
    ) as late_shipments,
    SUM(
        CASE
            WHEN cs_ship_date_sk - cs_sold_date_sk &amp;gt; 7 THEN 1 ELSE 0
        END
    ) * 100.0 / COUNT(*) as late_shipment_pct,
    AVG(cs_ext_ship_cost) as avg_shipping_cost
FROM parquet_column_index_blog.catalog_sales
WHERE cs_item_sk IN (
        SELECT cs_item_sk
        FROM parquet_column_index_blog.catalog_sales
        WHERE cs_item_sk IS NOT NULL
        GROUP BY cs_item_sk
        ORDER BY SUM(cs_quantity) DESC
        LIMIT 10
    )
    AND cs_ship_date_sk IS NOT NULL
    AND cs_sold_date_sk IS NOT NULL
GROUP BY cs_item_sk
ORDER BY avg_ship_delay_days DESC;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Additionally, this query is a good candidate for demonstrating the effectiveness of Parquet Column Indexes because it has a selective filter predicate on a single column, &lt;code&gt;cs_item_sk&lt;/code&gt;. When Athena executes this query, it first identifies row groups whose min/max ranges overlap with the top 10 most ordered items. Without Parquet Column Indexes, Athena has to scan every page of data within those matched row groups. With Parquet Column Indexes, Athena can prune data further by skipping individual pages within those row groups whose min/max ranges do not overlap with those item IDs. Complete the following steps to establish baseline query performance when Athena does not use Parquet Column Indexes during the query.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;Python&lt;/strong&gt; cell in the notebook. Replace &lt;code&gt;s3://amzn-s3-demo-bucket/athena_parquet_column_index_blog/query_results/&lt;/code&gt; with the S3 URI where you want to store your Athena query results, then execute the following script. The script runs the query five times with query result reuse disabled and prints the minimum runtime and the corresponding bytes scanned among those iterations. Note these values; you can compare them with our numbers in the &lt;strong&gt;Run Athena query with Parquet Column Indexes&lt;/strong&gt; section. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-sql"&gt;import boto3
import time

# Configuration
DATABASE = "parquet_column_index_blog"
OUTPUT_LOCATION = "s3://amzn-s3-demo-bucket/athena_parquet_column_index_blog/query_results/"

def run_athena_query(query: str, database: str, output_location: str):
    athena_client = boto3.client('athena')
    
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': database},
        ResultConfiguration={'OutputLocation': output_location}
    )
    
    query_execution_id = response['QueryExecutionId']
    
    while True:
        result = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
        state = result['QueryExecution']['Status']['State']
        
        if state in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
            break
        
        time.sleep(5)
    
    if state != 'SUCCEEDED':
        raise Exception(f"Query failed with state: {state}")
    
    stats = result['QueryExecution']['Statistics']
    
    return {
        'execution_time_sec': stats['EngineExecutionTimeInMillis'] / 1000,
        'data_scanned_gb': stats['DataScannedInBytes'] / (1024 ** 3)
    }


def benchmark_query(query: str, database: str, output_location: str, num_runs: int = 5):
    results = []
    
    for i in range(num_runs):
        stats = run_athena_query(query, database, output_location)
        results.append(stats)
    
    best_run = min(results, key=lambda r: r['execution_time_sec'])
    
    execution_time = round(best_run['execution_time_sec'], 1)
    data_scanned = round(best_run['data_scanned_gb'], 1)
    
    print(f"Execution time: {execution_time} sec")
    print(f"Data scanned: {data_scanned} GB")


QUERY = """
SELECT cs_item_sk,
    SUM(cs_quantity) as total_orders,
    AVG(cs_ship_date_sk - cs_sold_date_sk) as avg_ship_delay_days,
    MIN(cs_ship_date_sk - cs_sold_date_sk) as min_ship_delay,
    MAX(cs_ship_date_sk - cs_sold_date_sk) as max_ship_delay,
    SUM(
        CASE
            WHEN cs_ship_date_sk - cs_sold_date_sk &amp;gt; 7 THEN 1 ELSE 0
        END
    ) as late_shipments,
    SUM(
        CASE
            WHEN cs_ship_date_sk - cs_sold_date_sk &amp;gt; 7 THEN 1 ELSE 0
        END
    ) * 100.0 / COUNT(*) as late_shipment_pct,
    AVG(cs_ext_ship_cost) as avg_shipping_cost
FROM parquet_column_index_blog.catalog_sales
WHERE cs_item_sk IN (
        SELECT cs_item_sk
        FROM parquet_column_index_blog.catalog_sales
        WHERE cs_item_sk IS NOT NULL
        GROUP BY cs_item_sk
        ORDER BY SUM(cs_quantity) DESC
        LIMIT 10
    )
    AND cs_ship_date_sk IS NOT NULL
    AND cs_sold_date_sk IS NOT NULL
GROUP BY cs_item_sk
ORDER BY avg_ship_delay_days DESC;
"""

# Run benchmark
benchmark_query(QUERY, DATABASE, OUTPUT_LOCATION, num_runs=5)&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Sort the catalog_sales table&lt;/h3&gt; 
&lt;p&gt;Before rerunning the query with Athena using Parquet Column Indexes, you need to sort the &lt;code&gt;catalog_sales&lt;/code&gt; table by the &lt;code&gt;cs_item_sk&lt;/code&gt; column. In the preceding query, there is a dynamic filter as a subquery on the &lt;code&gt;cs_item_sk&lt;/code&gt; column:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;cs_item_sk IN (
        SELECT cs_item_sk
        FROM parquet_column_index_blog.catalog_sales
        WHERE cs_item_sk IS NOT NULL
        GROUP BY cs_item_sk
        ORDER BY SUM(cs_quantity) DESC
        LIMIT 10
    )&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;When executing this query, Athena pushes down the filter predicate to the data source level, fetching only rows that match the top 10 most ordered items. To maximize page pruning with Parquet Column Indexes, rows with the same &lt;code&gt;cs_item_sk&lt;/code&gt; values should be stored near each other in the Parquet file. Without sorting, matching values could be scattered across many pages, forcing Athena to read more data. Sorting the table by &lt;code&gt;cs_item_sk&lt;/code&gt; clusters similar values together, enabling Athena to read fewer pages.&lt;/p&gt; 
&lt;p&gt;Let’s examine the Parquet Column Indexes in one of the Parquet files to understand how the data in the &lt;code&gt;catalog_sales&lt;/code&gt; table is currently organized. First, download the Parquet file from the &lt;code&gt;cs_sold_date_sk = 2450815&lt;/code&gt; partition and install the &lt;a href="https://github.com/apache/parquet-java/tree/master/parquet-cli" target="_blank" rel="noopener noreferrer"&gt;open-source parquet-cli tool&lt;/a&gt; on your local machine. Replace &lt;code&gt;&amp;lt;local-path-to-parquet-file&amp;gt;&lt;/code&gt; with the path to the downloaded Parquet file, then run the following command on your local machine:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;parquet column-index &amp;lt;local-path-to-parquet-file&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This displays Parquet Column Indexes for all columns. For brevity, only the first 11 pages of the &lt;code&gt;cs_item_sk&lt;/code&gt; column from the first row group are shown in the following example:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;row-group 0:
column index for column cs_item_sk:
Boundary order: UNORDERED
         null_count  min  max
page-0            0    4  359989
page-1            0    2  359996
page-2            0   10  359995
page-3            0   13  359996
page-4            0   22  359989
page-5            0   25  359984
page-6            0   13  359989
page-7            0   56  359990
page-8            0   14  359984
page-9            0    7  359978
page-10           0    1  359998&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Notice that nearly every page contains a wide range of values. This overlap means Athena cannot eliminate pages when filtering with Parquet Column Indexes on &lt;code&gt;cs_item_sk&lt;/code&gt;. For example, searching for &lt;code&gt;cs_item_sk = 100&lt;/code&gt; requires scanning each of the 11 pages because the value 100 falls within every page’s min/max range. With this overlap, enabling Athena to use Parquet Column Indexes would provide no performance benefit. Sorting the data by &lt;code&gt;cs_item_sk&lt;/code&gt; eliminates this overlap, creating distinct, non-overlapping ranges for each page. To make Parquet Column Indexes more effective, sort the table by completing the following step:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;SQL&lt;/strong&gt; cell in the notebook and verify the connection type is &lt;strong&gt;Athena (Spark)&lt;/strong&gt;. Execute the following query to sort the &lt;code&gt;catalog_sales&lt;/code&gt; table by the &lt;code&gt;cs_item_sk&lt;/code&gt; column in ascending order, placing all null values in the last few Parquet pages. This query generates new Iceberg data files. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-sql"&gt;CALL spark_catalog.system.rewrite_data_files(
table =&amp;gt; 'parquet_column_index_blog.catalog_sales', 
strategy =&amp;gt; 'sort', 
sort_order =&amp;gt; 'cs_item_sk ASC NULLS LAST', 
options =&amp;gt; map('target-file-size-bytes', '1073741824', 
'rewrite-all', 'true', 'max-concurrent-file-group-rewrites', '200'));&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Running the &lt;code&gt;parquet column-index&lt;/code&gt; command on the sorted data file from the &lt;code&gt;cs_sold_date_sk = 2450815&lt;/code&gt; partition shows that the Parquet Column Indexes are now sorted and have non-overlapping ranges. The first 11 pages of the &lt;code&gt;cs_item_sk&lt;/code&gt; column from the first row group are shown in the following example:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;row-group 0:
column index for column cs_item_sk:
Boundary order: ASCENDING
         null_count  min    max
page-0           0      1   5282
page-1           0   5282  10556
page-2           0  10556  15842
page-3           0  15842  21154
page-4           0  21154  26434
page-5           0  26434  31669
page-6           0  31669  36916
page-7           0  36916  42205
page-8           0  42205  47528
page-9           0  47528  52808
page-10          0  52808  58189&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Now when searching for &lt;code&gt;cs_item_sk = 100&lt;/code&gt;, Athena only needs to read page-0, skipping the remaining 10 pages entirely.&lt;/p&gt; 
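&lt;p&gt;The following Python sketch illustrates the page-pruning decision that Parquet Column Indexes enable. The page ranges are copied from the sorted column index output above; the function itself is only an illustration of the min/max check, not Athena’s internal implementation.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;# Illustrative sketch only: reproduces the min/max page-pruning check that
# Parquet Column Indexes make possible. The (min, max) pairs below are the
# cs_item_sk page ranges from the sorted row group shown above.
SORTED_PAGES = [
    (1, 5282), (5282, 10556), (10556, 15842), (15842, 21154),
    (21154, 26434), (26434, 31669), (31669, 36916), (36916, 42205),
    (42205, 47528), (47528, 52808), (52808, 58189),
]

def pages_to_read(pages, value):
    """Return the indexes of pages whose [min, max] range can contain value."""
    return [i for i, (lo, hi) in enumerate(pages) if lo &amp;lt;= value &amp;lt;= hi]

# Searching for cs_item_sk = 100 touches only page-0; the other 10 pages
# are skipped because their min/max ranges cannot contain the value.
print(pages_to_read(SORTED_PAGES, 100))&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 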
&lt;h3&gt;Run Athena query with Parquet Column Indexes&lt;/h3&gt; 
&lt;p&gt;Now that the data is sorted to eliminate overlapping pages within the row groups for the &lt;code&gt;cs_item_sk&lt;/code&gt; column, we run two experiments on the sorted data. The first measures the impact of sorting alone, and the second measures the combined effect of sorting with Parquet Column Indexes.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;Python&lt;/strong&gt; cell in the notebook. Execute the same script from the &lt;strong&gt;Run Athena query without Parquet Column Indexes&lt;/strong&gt; section and take note of the query runtime and bytes scanned. This measures the performance of querying sorted data without using Parquet Column Indexes.&lt;/li&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;Python&lt;/strong&gt; cell in the notebook. Execute the following Python script to set the &lt;code&gt;use_iceberg_parquet_column_index&lt;/code&gt; table property to &lt;code&gt;true&lt;/code&gt; for the &lt;code&gt;catalog_sales&lt;/code&gt; table in the AWS Glue Data Catalog. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

def add_iceberg_parquet_column_index(database_name: str, table_name: str):
    glue_client = boto3.client('glue')
    
    # Get current table definition
    response = glue_client.get_table(DatabaseName=database_name, Name=table_name)
    table = response['Table']
    
    # Build TableInput with only allowed fields
    table_input = {'Name': table['Name']}
    
    allowed_fields = [
        'Description', 'Owner', 'LastAccessTime', 'LastAnalyzedTime',
        'Retention', 'StorageDescriptor', 'PartitionKeys', 'ViewOriginalText',
        'ViewExpandedText', 'TableType', 'Parameters', 'TargetTable'
    ]
    
    for field in allowed_fields:
        if field in table:
            table_input[field] = table[field]
    
    # Add the property
    if 'Parameters' not in table_input:
        table_input['Parameters'] = {}
    table_input['Parameters']['use_iceberg_parquet_column_index'] = 'true'
    
    # Update the table
    glue_client.update_table(DatabaseName=database_name, TableInput=table_input)

# Usage
add_iceberg_parquet_column_index("parquet_column_index_blog", "catalog_sales")&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;Python&lt;/strong&gt; cell in the notebook. Execute the same script from the &lt;strong&gt;Run Athena query without Parquet Column Indexes&lt;/strong&gt; section and take note of the query runtime and bytes scanned. This measures the performance of querying sorted data using Parquet Column Indexes.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h4&gt;Athena query time and bytes scanned improvement&lt;/h4&gt; 
&lt;p&gt;The following table summarizes the results from each experiment. The percentage improvements for the sorted experiments are measured against the unsorted baseline.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;&lt;strong&gt;Experiment&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;strong&gt;Runtime (sec)&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;&lt;strong&gt;Bytes Scanned (GB)&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Unsorted without Parquet Column Indexes&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;20.6&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;45.2&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Sorted without Parquet Column Indexes&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;15.4 (25.2% faster)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;27.8 (38.5% fewer bytes)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px"&gt;Sorted with Parquet Column Indexes&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;10.3 (50.0% faster)&lt;/td&gt; 
   &lt;td style="padding: 10px"&gt;13.0 (71.2% fewer bytes)&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h2&gt;Recommendations&lt;/h2&gt; 
&lt;p&gt;To maximize Athena’s ability to use Parquet Column Indexes and achieve optimal query performance, we recommend the following:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Sort data by frequently filtered columns.&lt;/strong&gt; This allows Athena to efficiently read Parquet Column Indexes and skip irrelevant pages, potentially reducing scan time. When data is sorted by a filter column, similar values are clustered together within pages. Because Parquet Column Indexes store min/max values for each page, Athena can quickly determine which pages contain matching values and skip the rest.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Sort data by high-cardinality columns.&lt;/strong&gt; This creates distinct value ranges between pages, maximizing the opportunity for Athena to skip pages during query execution. High-cardinality (many distinct values) columns produce non-overlapping min/max ranges across pages, allowing Athena to more effectively filter out irrelevant pages. In contrast, low-cardinality columns such as boolean or status fields result in overlapping ranges across many pages, reducing the number of skipped pages. A quick way to estimate column cardinality is shown in the sketch after this list.&lt;/li&gt; 
&lt;/ol&gt; 
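&lt;p&gt;Column cardinality is straightforward to estimate directly in Athena. The following Python sketch assumes the same database, table, and query result location used earlier in this post; it runs an &lt;code&gt;approx_distinct&lt;/code&gt; query with boto3 and prints the first result row so you can compare candidate sort columns.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3
import time

# Illustrative sketch only: estimate column cardinality to help choose a sort
# column. Replace the output location with your own S3 URI if it differs.
athena = boto3.client("athena")

QUERY = """
SELECT approx_distinct(cs_item_sk) AS cs_item_sk_ndv,
       approx_distinct(cs_ship_mode_sk) AS cs_ship_mode_sk_ndv
FROM parquet_column_index_blog.catalog_sales
"""

run = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "parquet_column_index_blog"},
    ResultConfiguration={
        "OutputLocation": "s3://amzn-s3-demo-bucket/athena_parquet_column_index_blog/query_results/"
    },
)
query_id = run["QueryExecutionId"]

# Poll until the query finishes
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(5)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(rows[1])  # the row after the header holds the two cardinality estimates&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;In this example, &lt;code&gt;cs_item_sk&lt;/code&gt; should return a much larger estimate than &lt;code&gt;cs_ship_mode_sk&lt;/code&gt;, which is why it is the better sort column for page pruning.&lt;/p&gt; 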
&lt;h2&gt;Clean up&lt;/h2&gt; 
&lt;p&gt;When you have finished the steps in this post, complete the following cleanup actions to avoid incurring ongoing charges:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create a new &lt;strong&gt;SQL&lt;/strong&gt; cell in the notebook and set the connection type to &lt;strong&gt;Athena (Spark)&lt;/strong&gt;. Execute the following command to drop the &lt;code&gt;parquet_column_index_blog&lt;/code&gt; database and the &lt;code&gt;catalog_sales&lt;/code&gt; table. 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-python"&gt;DROP DATABASE parquet_column_index_blog CASCADE;&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;Delete the Iceberg table data and the Athena query results from your S3 bucket.&lt;/li&gt; 
 &lt;li&gt;Delete the SageMaker Unified Studio IAM-based domain if it is no longer needed.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we showed you how Athena uses Parquet Column Indexes to speed up queries and reduce the number of bytes scanned. By using Parquet Column Indexes, Athena can skip irrelevant data pages to improve query performance, especially for queries with selective filters on sorted data. Refer to &lt;a href="https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-data-optimization.html" target="_blank" rel="noopener noreferrer"&gt;Optimize Iceberg tables&lt;/a&gt; to learn more about this feature and try it out on your own queries.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the Author&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft size-full wp-image-90024" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/08/BDB-5789-image-1.png" alt="Portrait photograph of a young Asian male in his twenties wearing a black t-shirt against a neutral gray background" width="100" height="150"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Matt Wong&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/matthew-wong-0b14b1132/" target="_blank" rel="noopener"&gt;Matt&lt;/a&gt; is a Software Development Engineer on Amazon Athena. He has worked on several projects within the Amazon Athena Datalake and Storage team and is continuing to build out more Athena features. Outside of work, Matthew likes to spend time juggling, biking, and running with family and friends.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Implementing Kerberos authentication for Apache Spark jobs on Amazon EMR on EKS to access a Kerberos-enabled Hive Metastore</title>
		<link>https://aws.amazon.com/blogs/big-data/implementing-kerberos-authentication-for-apache-spark-jobs-on-amazon-emr-on-eks-to-access-a-kerberos-enabled-hive-metastore/</link>
					
		
		<dc:creator><![CDATA[Krishna Kumar Venkateswaran]]></dc:creator>
		<pubDate>Mon, 13 Apr 2026 15:51:31 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon EMR]]></category>
		<category><![CDATA[Amazon EMR on EKS]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">512796fc4108a2b735daa7a30e704b52edced2ce</guid>

					<description>In this post, we show how to configure Kerberos authentication for Spark jobs on Amazon EMR on EKS, authenticating against a Kerberos-enabled HMS so you can run both Amazon EMR on EC2 and Amazon EMR on EKS workloads against a single, secure HMS deployment.</description>
										<content:encoded>&lt;p&gt;Many organizations run their &lt;a href="https://spark.apache.org/docs/latest/" target="_blank" rel="noopener noreferrer"&gt;Apache Spark&lt;/a&gt; analytics platforms on &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html" target="_blank" rel="noopener noreferrer"&gt;Amazon EMR on Amazon Elastic Compute Cloud (Amazon EC2)&lt;/a&gt;, using &lt;a href="https://web.mit.edu/kerberos/" target="_blank" rel="noopener noreferrer"&gt;Kerberos&lt;/a&gt; authentication to secure connectivity between Spark jobs and a centralized shared &lt;a href="https://hive.apache.org/" target="_blank" rel="noopener noreferrer"&gt;Apache Hive Metastore (HMS)&lt;/a&gt;. With &lt;a href="https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks.html" target="_blank" rel="noopener noreferrer"&gt;Amazon EMR on Amazon EKS&lt;/a&gt;, they gained a new option for running Spark jobs with the benefits of &lt;a href="https://kubernetes.io/docs/concepts/overview/" target="_blank" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt;-based container orchestration, improved resource utilization, and faster job startup times. However, an HMS deployment supports only one authentication mechanism at a time. This means that they must configure Kerberos authentication for their Spark jobs on Amazon EMR on EKS to connect to the existing Kerberos-enabled HMS.&lt;/p&gt; 
&lt;p&gt;In this post, we show how to configure Kerberos authentication for Spark jobs on Amazon EMR on EKS, authenticating against a Kerberos-enabled HMS so you can run both Amazon EMR on EC2 and Amazon EMR on EKS workloads against a single, secure HMS deployment.&lt;/p&gt; 
&lt;h2&gt;Overview of solution&lt;/h2&gt; 
&lt;p&gt;Consider an enterprise data platform team that’s been running Spark jobs on Amazon EMR on EC2 for several years. Their architecture includes a Kerberos-enabled standalone HMS that serves as the centralized data catalog, with Microsoft Active Directory functioning as the Key Distribution Center (KDC). As the team evaluates Amazon EMR on EKS for new workloads, their existing HMS must continue serving Amazon EMR on EC2 while also serving the new Amazon EMR on EKS workloads, with both authenticating through the same Kerberos infrastructure. To address this, the platform team must configure their Spark jobs running on Amazon EMR on EKS to authenticate with the same KDC so the jobs can obtain valid Kerberos tickets and establish authenticated connections to the HMS, maintaining a unified security posture across the data platform.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-1-architecture-1.png" target="_blank" rel="noopener"&gt;&lt;img loading="lazy" class="aligncenter size-full wp-image-89900" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-1-architecture-1.png" alt="Architecture diagram showing two VPCs connected via VPC peering: the Active Directory VPC contains Microsoft Active Directory serving as the Kerberos Key Distribution Center (KDC) with ports 88 (Kerberos) and 749 (Admin). The Amazon EKS VPC contains two namespaces — the emr namespace runs Apache Spark jobs (each with a driver pod and executor pods) configured with krb5.conf, jaas.conf, and keytab files using a spark/analytics-team@CORP.KERBEROS principal; the hive-metastore namespace runs Hive Metastore pods (with deployment, replica set, and HPA) configured with Kerberos artifacts and the hive/hive-metastore@CORP.KERBEROS principal. Spark driver pods connect to the Hive Metastore service, which is backed by Amazon Aurora PostgreSQL for metadata storage and Amazon S3 for data storage. AWS Secrets Manager stores Kerberos keytabs and database credentials retrieved during deployment. Users submit Spark jobs via AWS Systems Manager Session Manager." width="1024" height="889"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h2&gt;Scope of Kerberos in this solution&lt;/h2&gt; 
&lt;p&gt;Kerberos authentication in this solution secures the connection between Spark jobs and the HMS. Other components in the architecture use AWS and Kubernetes security mechanisms instead.&lt;/p&gt; 
&lt;h2&gt;Solution architecture&lt;/h2&gt; 
&lt;p&gt;Our solution implements Kerberos authentication to secure the connection between Spark jobs and the HMS. The architecture spans two &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html" target="_blank" rel="noopener"&gt;Amazon Virtual Private Clouds (Amazon VPCs)&lt;/a&gt; connected using &lt;a href="https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html" target="_blank" rel="noopener"&gt;VPC peering&lt;/a&gt;, with distinct components handling identity management, compute, and metadata services.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Identity and Authentication layer&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;A self-managed Microsoft Active Directory Domain Controller is deployed in a dedicated VPC and serves as the KDC for Kerberos authentication. The Active Directory server hosts service principals for both the HMS service and Spark jobs. This separate VPC deployment mirrors real-world enterprise architectures where Active Directory is typically managed by identity teams in their own network boundary, whether on-premises or in AWS.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Data Platform layer&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The data platform components reside in a separate VPC and include an EKS cluster that hosts both the HMS service and Amazon EMR on EKS-based Spark jobs, which persist data in an &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Simple Storage Service (Amazon S3)&lt;/a&gt; bucket.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Hive Metastore service&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The HMS is deployed in the EKS &lt;code&gt;hive-metastore&lt;/code&gt; &lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/" target="_blank" rel="noopener"&gt;namespace&lt;/a&gt; and simulates a pre-existing, standalone Kerberos-enabled HMS, a common enterprise pattern where HMS is managed independently of any data processing platform. You can learn more about other enterprise design patterns in the post &lt;a href="https://aws.amazon.com/blogs/big-data/design-patterns-for-implementing-hive-metastore-for-amazon-emr-on-eks/" target="_blank" rel="noopener noreferrer"&gt;Design patterns for implementing Hive Metastore for Amazon EMR on EKS&lt;/a&gt;. The HMS service authenticates with the KDC using its service principal and &lt;code&gt;keytab&lt;/code&gt; mounted from a &lt;a href="https://kubernetes.io/docs/concepts/configuration/secret/" target="_blank" rel="noopener noreferrer"&gt;Kubernetes secret&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Apache Spark Execution layer&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Apache Spark jobs are deployed using the &lt;a href="https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/spark-operator.html" target="_blank" rel="noopener noreferrer"&gt;Spark Operator&lt;/a&gt; on EKS. The Spark driver and executor pods are configured with Kerberos credentials through mounted &lt;a href="https://kubernetes.io/docs/concepts/configuration/configmap/" target="_blank" rel="noopener noreferrer"&gt;ConfigMaps&lt;/a&gt; containing &lt;code&gt;krb5.conf&lt;/code&gt; and &lt;code&gt;jaas.conf&lt;/code&gt;, along with &lt;code&gt;keytab&lt;/code&gt; files from Kubernetes secrets. When a Spark job must access Hive tables, the driver authenticates with the KDC and establishes a secure &lt;a href="https://datatracker.ietf.org/doc/html/rfc4422" target="_blank" rel="noopener noreferrer"&gt;Simple Authentication and Security Layer (SASL)&lt;/a&gt; connection to the HMS.&lt;/p&gt; 
&lt;h2&gt;Authentication flow&lt;/h2&gt; 
&lt;p&gt;The HMS runs as a long-running Kubernetes service that must be deployed and authenticated before Spark jobs can connect.&lt;/p&gt; 
&lt;p&gt;During HMS deployment:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;HMS pod validates its Kerberos configuration. &lt;code&gt;krb5.conf&lt;/code&gt; and &lt;code&gt;jaas.conf&lt;/code&gt; are mounted from &lt;code&gt;ConfigMaps&lt;/code&gt;&lt;/li&gt; 
 &lt;li&gt;Service authenticates with KDC using its principal &lt;code&gt;hive/hive-metastore-svc.hive-metastore.svc.cluster.local@CORP.KERBEROS&lt;/code&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;keytab&lt;/code&gt; is mounted from Kubernetes secret for credential access&lt;/li&gt; 
 &lt;li&gt;Secure Thrift endpoint is established on &lt;code&gt;port 9083&lt;/code&gt; with SASL authentication enabled&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;When a Spark job must interact with the HMS:&lt;/p&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;Spark job submission: 
  &lt;ol type="a"&gt; 
   &lt;li&gt;User submits Spark job through Spark Operator&lt;/li&gt; 
   &lt;li&gt;Driver and executor pods are created with Kerberos configuration mounted as volumes&lt;/li&gt; 
   &lt;li&gt;&lt;code&gt;krb5.conf&lt;/code&gt; ConfigMap provides KDC connection details including realm and server addresses&lt;/li&gt; 
   &lt;li&gt;&lt;code&gt;jaas.conf&lt;/code&gt; ConfigMap specifies a login module configuration with &lt;code&gt;keytab&lt;/code&gt; path and principal&lt;/li&gt; 
   &lt;li&gt;&lt;code&gt;Keytab&lt;/code&gt; secret contains encrypted credentials for Spark service principal &lt;code&gt;spark/analytics-team@CORP.KERBEROS&lt;/code&gt;&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;Authentication and connection: 
  &lt;ol type="a"&gt; 
   &lt;li&gt;Spark driver authenticates with KDC using its principal and &lt;code&gt;keytab&lt;/code&gt; to obtain a Ticket Granting Ticket (TGT)&lt;/li&gt; 
   &lt;li&gt;When connecting to HMS, Spark requests a service ticket from the KDC for the HMS principal &lt;code&gt;hive/hive-metastore-svc.hive-metastore.svc.cluster.local@CORP.KERBEROS&lt;/code&gt;&lt;/li&gt; 
   &lt;li&gt;KDC issues a service ticket encrypted with HMS’s secret key&lt;/li&gt; 
   &lt;li&gt;Spark presents this service ticket to HMS over the Thrift connection on &lt;code&gt;port 9083&lt;/code&gt;&lt;/li&gt; 
   &lt;li&gt;HMS decrypts the ticket using its &lt;code&gt;keytab&lt;/code&gt;, verifies Spark’s identity, and establishes the authenticated SASL session&lt;/li&gt; 
   &lt;li&gt;Executor pods use the same configuration for authenticated operations&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;Data access: 
  &lt;ol type="a"&gt; 
   &lt;li&gt;Authenticated Spark job queries HMS for table metadata&lt;/li&gt; 
   &lt;li&gt;HMS validates Kerberos tickets before serving metadata requests&lt;/li&gt; 
   &lt;li&gt;Spark accesses underlying data in Amazon S3 using IAM roles for service accounts (IRSA)&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-2-sequence-diagram.png" target="_blank" rel="noopener"&gt;&lt;img loading="lazy" class="aligncenter size-full wp-image-89822" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-2-sequence-diagram.png" alt="Sequence diagram illustrating the Kerberos authentication flow between a Spark job and the Hive Metastore. The flow proceeds in five phases: (1) Job Submission — a Data Engineer submits a SparkApplication via kubectl, and the Spark Operator creates a driver pod with krb5.conf, jaas.conf, and keytab mounted. (2) Kerberos Authentication — the Spark driver loads its keytab for the spark/analytics-team@CORP.KERBEROS principal and sends an AS-REQ to the Active Directory KDC, which validates the credentials and returns a TGT (Ticket Granting Ticket). (3) Service Ticket Request — the Spark driver sends a TGS-REQ to the KDC requesting a service ticket for the Hive Metastore principal, and the KDC returns a service ticket encrypted with the HMS key. (4) Authenticated Connection — the Spark driver connects to the Hive Metastore over Thrift (port 9083) using SASL with the service ticket; HMS decrypts the ticket using its own keytab, verifies the Spark identity, and establishes an authenticated session. (5) Data Operations — the Spark driver queries table metadata from HMS (backed by Aurora PostgreSQL) and reads/writes table data directly from Amazon S3 using IRSA credentials." width="1315" height="1305"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;h2&gt;Implementation workflow&lt;/h2&gt; 
&lt;p&gt;The implementation involves three key stakeholders working together to establish the Kerberos-enabled communication:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Microsoft Active Directory Administrator&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The Active Directory Administrator creates service accounts that are used for HMS and Spark jobs. This involves setting up the service principal names using the &lt;code&gt;setspn&lt;/code&gt; utility and generating &lt;code&gt;keytab&lt;/code&gt; files using &lt;code&gt;ktpass&lt;/code&gt; for secure credential storage. The administrator configures the appropriate Active Directory permissions and Kerberos AES256 encryption type. Finally, the &lt;code&gt;keytab&lt;/code&gt; files are uploaded to AWS Secrets Manager for secure distribution to Kubernetes workloads.&lt;/p&gt; 
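&lt;p&gt;A minimal version of the Secrets Manager upload step might look like the following Python sketch. The secret name and keytab file path are placeholders for illustration and are not the values used by the sample repository.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

# Illustrative sketch only: store a generated keytab as a binary secret so it
# can later be distributed to Kubernetes workloads. The secret name and file
# path are placeholders, not values from the sample repository.
secrets = boto3.client("secretsmanager")

with open("hive.keytab", "rb") as keytab_file:
    keytab_bytes = keytab_file.read()

secrets.create_secret(
    Name="kerberos/hive-metastore-keytab",
    Description="Keytab for the Hive Metastore service principal",
    SecretBinary=keytab_bytes,
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 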
&lt;p&gt;&lt;strong&gt;Data Platform Team&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;The platform team handles the Amazon EMR on EKS and Kubernetes configurations. They retrieve keytabs from Secrets Manager and create Kubernetes secrets for the workloads. They configure &lt;a href="https://helm.sh/" target="_blank" rel="noopener noreferrer"&gt;Helm charts&lt;/a&gt; for HMS deployment with Kerberos settings and set up &lt;code&gt;ConfigMaps&lt;/code&gt; for &lt;code&gt;krb5.conf&lt;/code&gt;, &lt;code&gt;jaas.conf&lt;/code&gt;, and &lt;code&gt;core-site.xml&lt;/code&gt;.&lt;/p&gt; 
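&lt;p&gt;The retrieval side of that exchange could be sketched as follows; again, the secret name and output path are placeholders. The resulting file can then be packaged as a Kubernetes secret, for example with &lt;code&gt;kubectl create secret generic&lt;/code&gt;.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-python"&gt;import boto3

# Illustrative sketch only: fetch the keytab stored by the AD administrator and
# write it to a local file, which can then be loaded into a Kubernetes secret.
# The secret name and local path are placeholders.
secrets = boto3.client("secretsmanager")

response = secrets.get_secret_value(SecretId="kerberos/hive-metastore-keytab")

with open("hive.keytab", "wb") as keytab_file:
    keytab_file.write(response["SecretBinary"])&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 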
&lt;p&gt;&lt;strong&gt;Data Engineering Operations&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Data engineers submit jobs using the configured service account with Kerberos authentication. They monitor job execution and verify authenticated access to HMS.&lt;/p&gt; 
&lt;h2&gt;Deploy the solution&lt;/h2&gt; 
&lt;p&gt;In the remainder of this post, you will explore the implementation details for this solution. You can find the sample code in the &lt;a href="https://github.com/aws-samples/sample-emr-eks-spark-kerberos-hms" target="_blank" rel="noopener noreferrer"&gt;AWS Samples&lt;/a&gt;&amp;nbsp;GitHub repository. For additional details, including verification steps for each deployment stage, refer to the &lt;a href="https://github.com/aws-samples/sample-emr-eks-spark-kerberos-hms/blob/main/README.md" target="_blank" rel="noopener noreferrer"&gt;README&lt;/a&gt; in the repository.&lt;/p&gt; 
&lt;h3&gt;Prerequisites&lt;/h3&gt; 
&lt;p&gt;Before you deploy this solution, make sure that the following prerequisites are in place:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Access to a valid AWS account and permission to create AWS resources.&lt;/li&gt; 
 &lt;li&gt;The&amp;nbsp;&lt;a href="http://aws.amazon.com/cli" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface (AWS CLI)&lt;/a&gt;&amp;nbsp;is&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html" target="_blank" rel="noopener noreferrer"&gt;installed&lt;/a&gt;&amp;nbsp;on your local machine.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://github.com/git-guides/install-git" target="_blank" rel="noopener noreferrer"&gt;Git&lt;/a&gt;,&amp;nbsp;&lt;a href="https://docs.docker.com/engine/install/" target="_blank" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;,&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html" target="_blank" rel="noopener noreferrer"&gt;eksctl&lt;/a&gt;,&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html" target="_blank" rel="noopener noreferrer"&gt;kubectl&lt;/a&gt;,&amp;nbsp;&lt;a href="https://helm.sh/docs/intro/install/" target="_blank" rel="noopener noreferrer"&gt;Helm&lt;/a&gt;, &lt;a href="https://man7.org/linux/man-pages/man1/envsubst.1.html" target="_blank" rel="noopener noreferrer"&gt;envsubst&lt;/a&gt;, &lt;a href="https://jqlang.github.io/jq/" target="_blank" rel="noopener noreferrer"&gt;jq&lt;/a&gt;,&amp;nbsp;and &lt;a href="https://mikefarah.gitbook.io/yq" target="_blank" rel="noopener noreferrer"&gt;yq&lt;/a&gt; utilities are installed on your local machine.&lt;/li&gt; 
 &lt;li&gt;Familiarity with &lt;a href="https://web.mit.edu/kerberos/" target="_blank" rel="noopener noreferrer"&gt;Kerberos&lt;/a&gt;, &lt;a href="https://hive.apache.org/" target="_blank" rel="noopener noreferrer"&gt;Apache Hive Metastore (HMS)&lt;/a&gt;, &lt;a href="https://spark.apache.org/docs/latest/" target="_blank" rel="noopener noreferrer"&gt;Apache Spark&lt;/a&gt;, &lt;a href="https://kubernetes.io/docs/concepts/overview/" target="_blank" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt;, &lt;a href="https://aws.amazon.com/eks" target="_blank" rel="noopener noreferrer"&gt;Amazon EKS&lt;/a&gt;, and &lt;a href="https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks.html" target="_blank" rel="noopener noreferrer"&gt;Amazon EMR on Amazon EKS&lt;/a&gt;.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Clone the repository and set up environment variables&lt;/h3&gt; 
&lt;p&gt;Clone the repository to your local machine and set the two environment variables. Replace &lt;span style="color: #ff0000"&gt;&amp;lt;AWS_REGION&amp;gt;&lt;/span&gt; with the AWS Region where you want to deploy these resources.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;# Clone the Git repository
git clone https://github.com/aws-samples/sample-emr-eks-spark-kerberos-hms.git
cd sample-emr-eks-spark-kerberos-hms

# Set environment variables
export REPO_DIR=$(pwd)
export AWS_REGION=&lt;span style="color: #ff0000"&gt;&amp;lt;AWS_REGION&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Set up Microsoft Active Directory infrastructure&lt;/h3&gt; 
&lt;p&gt;In this section, we deploy a self-managed Microsoft Active Directory with KDC on a Windows Server EC2 instance into a dedicated VPC. This is an intentionally minimal implementation highlighting only the key components required for this blog post.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;cd ${REPO_DIR}/microsoft-ad
./setup.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Set up EKS infrastructure&lt;/h3&gt; 
&lt;p&gt;This section provisions the Amazon EMR on EKS infrastructure stack, including the VPC, EKS cluster, &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html" target="_blank" rel="noopener"&gt;Amazon Aurora&lt;/a&gt; PostgreSQL database, &lt;a href="https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Elastic Container Registry (Amazon ECR)&lt;/a&gt;, Amazon S3, Amazon EMR on EKS virtual clusters, and the Spark Operator. Run the following script:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;cd ${REPO_DIR}/data-infra
./setup.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Set up VPC peering&lt;/h3&gt; 
&lt;p&gt;This section establishes network connectivity between the Active Directory VPC and EKS VPC for Kerberos authentication. Run the following script:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;cd ${REPO_DIR}/vpc-peering
./setup.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Deploy Hive Metastore with Kerberos authentication&lt;/h3&gt; 
&lt;p&gt;This section deploys a Kerberos-enabled HMS service on the EKS cluster. Complete the following steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create Kerberos Service Principal for HMS service&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;cd ${REPO_DIR}/microsoft-ad/
# Create HMS service principal
./manage-ad-service-principals.sh create hive "hive/hive-metastore-svc.hive-metastore.svc.cluster.local"
# Verify the service principal was created
./manage-ad-service-principals.sh list&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Deploy HMS service with Kerberos authentication&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;cd ${REPO_DIR}/hive-metastore
./deploy.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Set up Amazon EMR on Amazon EKS with Kerberos authentication&lt;/h3&gt; 
&lt;p&gt;This section configures Spark jobs to authenticate with the Kerberos-enabled HMS. This involves creating service principals for Spark jobs and generating the necessary configuration files. Complete the following steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Create Service Principal for Spark jobs&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;cd ${REPO_DIR}/microsoft-ad/
# Create Spark service principal
./manage-ad-service-principals.sh create spark "spark/analytics-team"
# Verify the service principal was created
./manage-ad-service-principals.sh list&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Generate Kerberos configurations for Spark jobs&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;cd ${REPO_DIR}/spark-jobs/
./generate-spark-configs.sh --principal "spark/analytics-team@CORP.KERBEROS" --namespace emr&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Submit Spark jobs&lt;/h3&gt; 
&lt;p&gt;This section verifies Kerberos authentication by running a Spark job that connects to the Kerberized HMS. Complete the following steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Submit the test Spark job&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;cd ${REPO_DIR}/spark-jobs
kubectl apply -f spark-job.yaml&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;Monitor job execution&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;# Watch the SparkApplication status
kubectl get sparkapplications -n emr -w
# Check pod status
kubectl get pods -n emr | grep "spark-kerberos"&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Verify Kerberos authentication and HMS connection&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;# Check Spark driver logs for successful authentication
kubectl logs spark-kerberos-job-driver -n emr&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The logs should confirm successful authentication, along with a listing of sample databases and tables.&lt;/p&gt; 
&lt;h3&gt;Understanding Kerberos configuration&lt;/h3&gt; 
&lt;p&gt;The HMS requires specific configuration parameters to enable Kerberos authentication, applied through the previously mentioned steps. The key configurations are outlined in the following section.&lt;/p&gt; 
&lt;h4&gt;&lt;strong&gt;HMS configuration (metastore-site.xml)&lt;/strong&gt;&lt;/h4&gt; 
&lt;p&gt;The following configurations are added to the &lt;code&gt;metastore-site.xml&lt;/code&gt; file.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Setting&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Value&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;hive.metastore.sasl.enabled&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;true&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Enable SASL authentication&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;hive.metastore.kerberos.principal&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;hive/hive-metastore-svc.hive-metastore.svc.cluster.local@CORP.KERBEROS&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;HMS service principal&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;hive.metastore.kerberos.keytab.file&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;/etc/security/keytab/hive.keytab&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Keytab path&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h4&gt;&lt;strong&gt;Hadoop security (core-site.xml)&lt;/strong&gt;&lt;/h4&gt; 
&lt;p&gt;The following configurations are added to the &lt;code&gt;core-site.xml&lt;/code&gt; file.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Setting&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Value&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;hadoop.security.authentication&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;kerberos&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;hadoop.security.authorization&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;true&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h4&gt;&lt;strong&gt;Spark configuration&lt;/strong&gt;&lt;/h4&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Setting&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Value&lt;/strong&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;spark.security.credentials.kerberos.enabled&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;true&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Enable Kerberos for Spark&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;spark.hadoop.hive.metastore.sasl.enabled&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;true&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;SASL for HMS connection&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;spark.kerberos.principal&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;spark/analytics-team@CORP.KERBEROS&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Spark service principal&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;spark.kerberos.keytab&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;local:///etc/security/keytab/analytics-team.keytab&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Keytab path&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h4&gt;&lt;strong&gt;Shared Kerberos files&lt;/strong&gt;&lt;/h4&gt; 
&lt;p&gt;Both HMS and Spark pods mount two common Kerberos configuration files: &lt;code&gt;krb5.conf&lt;/code&gt; and &lt;code&gt;jaas.conf&lt;/code&gt;, using &lt;code&gt;ConfigMaps&lt;/code&gt; and Kubernetes &lt;code&gt;secrets&lt;/code&gt;. The &lt;code&gt;krb5.conf&lt;/code&gt; file is identical across both services and defines how each component connects to the KDC. The &lt;code&gt;jaas.conf&lt;/code&gt; file follows the same structure but differs in the principal and &lt;code&gt;keytab&lt;/code&gt; path for each service.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;code&gt;krb5&lt;/code&gt; Configuration&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-ini"&gt;[libdefaults]
	default_realm = CORP.KERBEROS
	dns_lookup_realm = false
	dns_lookup_kdc = false
	ticket_lifetime = 24h
	forwardable = true
	udp_preference_limit = 1
	default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
	default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
	permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96

[realms]
	CORP.KERBEROS = {
		kdc = &amp;lt;ad-server-ip&amp;gt;
		admin_server = &amp;lt;ad-server-ip&amp;gt;
	}

[domain_realm]
	.corp.kerberos = CORP.KERBEROS
	corp.kerberos = CORP.KERBEROS&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;For more information, see the online documentation for &lt;a href="https://web.mit.edu/kerberos/krb5-devel/doc/admin/conf_files/krb5_conf.html" target="_blank" rel="noopener noreferrer"&gt;krb5.conf&lt;/a&gt;.&lt;/p&gt; 
&lt;ol start="2"&gt; 
 &lt;li&gt;JAAS configuration&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-java"&gt;Client {
 com.sun.security.auth.module.Krb5LoginModule required
 useKeyTab=true
 keyTab="/etc/security/keytab/hive.keytab"
 principal="hive/hive-metastore-svc.hive-metastore.svc.cluster.local@CORP.KERBEROS"
 useTicketCache=false
 storeKey=true
 debug=false;
};
&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Additional security considerations&lt;/h2&gt; 
&lt;p&gt;This post focuses on core Kerberos authentication mechanics between Spark and HMS. We recommend two additional security hardening steps based on your organization’s security posture and compliance requirements.&lt;/p&gt; 
&lt;h3&gt;Protecting keytabs at rest with AWS KMS envelope encryption&lt;/h3&gt; 
&lt;p&gt;Keytabs stored as Kubernetes Secrets are only base64-encoded by default, not encrypted at rest. We recommend enabling EKS &lt;a href="https://docs.aws.amazon.com/kms/latest/developerguide/kms-cryptography.html#enveloping" target="_blank" rel="noopener"&gt;envelope encryption&lt;/a&gt; using an &lt;a href="https://docs.aws.amazon.com/kms/latest/developerguide/overview.html" target="_blank" rel="noopener"&gt;AWS Key Management Service (AWS KMS)&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html" target="_blank" rel="noopener"&gt;customer managed key&lt;/a&gt;. With envelope encryption, secret data is encrypted with a &lt;a href="https://docs.aws.amazon.com/kms/latest/developerguide/data-keys.html" target="_blank" rel="noopener"&gt;Data Encryption Key (DEK)&lt;/a&gt;, which is encrypted by your customer managed key. This protects &lt;code&gt;keytab&lt;/code&gt; content even if the &lt;code&gt;etcd&lt;/code&gt; datastore is compromised. To enable this on an existing EKS cluster:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;aws eks associate-encryption-config \
  --cluster-name &amp;lt;your-cluster&amp;gt; \
  --encryption-config '[{"resources":["secrets"],"provider":{"keyArn":"arn:aws:kms:&amp;lt;region&amp;gt;:&amp;lt;account-id&amp;gt;:key/&amp;lt;key-id&amp;gt;"}}]'&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Refer to the &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/envelope-encryption.html" target="_blank" rel="noopener noreferrer"&gt;Amazon EKS documentation on envelope encryption&lt;/a&gt; for full setup guidance.&lt;/p&gt; 
&lt;h3&gt;Encrypting the Thrift data channel with TLS&lt;/h3&gt; 
&lt;p&gt;SASL with Kerberos provides mutual authentication but doesn’t automatically encrypt data over the Thrift connection. Many deployments default to auth QoP, leaving the data channel unencrypted. We recommend either:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Set SASL QoP to auth-conf &lt;/strong&gt;— enables SASL-layer encryption using Kerberos session keys&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Layer TLS over Thrift (preferred)&lt;/strong&gt; — enables transport-level encryption using modern cipher suites&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Enabling TLS on HiveServer2 / Hive Metastore Thrift:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-xml"&gt;&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;hive.server2.use.SSL&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;true&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;hive.server2.keystore.path&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;/etc/tls/keystore.jks&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Refer to the &lt;a href="https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-SSLEncryption" target="_blank" rel="noopener noreferrer"&gt;Hive SSL/TLS configuration documentation&lt;/a&gt; for full details.&lt;/p&gt; 
&lt;h2&gt;Cleaning up&lt;/h2&gt; 
&lt;p&gt;To avoid incurring future charges, clean up all of the resources provisioned during this setup by executing the following cleanup script:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;cd ${REPO_DIR}/
./cleanup.sh&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we demonstrated how to implement Kerberos authentication for Amazon EMR on EKS to securely connect to a Kerberos-enabled HMS. This solution addresses a common challenge faced by organizations with existing Kerberos-enabled HMS deployments who want to adopt Amazon EMR on EKS while maintaining their Kerberos-enabled security posture.&lt;/p&gt; 
&lt;p&gt;This pattern applies whether you’re migrating from on-premises Hadoop, running hybrid Amazon EMR on EC2 and Amazon EMR on EKS environments, or building a new cloud-native platform. It fits any scenario where Spark jobs running on Kubernetes must authenticate with a shared, Kerberos-enabled HMS.&lt;/p&gt; 
&lt;p&gt;You can use this post as a starting point to implement this pattern and extend it further to suit your organization’s data platform needs.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-3-krishhkk.jpeg" target="_blank" rel="noopener"&gt;&lt;img loading="lazy" class="aligncenter size-full wp-image-89823" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-3-krishhkk.jpeg" alt="Headshot of Krishna Kumar Venkateswaran" width="243" height="324"&gt;&lt;/a&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/krishnakumaroncloud9/" target="_blank" rel="noopener"&gt;Krishna Kumar Venkateswaran&lt;/a&gt; is a Cloud Infrastructure Architect at Amazon Web Services (AWS), passionate about building secure applications and data platforms. He has extensive experience in Kubernetes, DevOps, and enterprise architecture, helping customers containerize applications, streamline deployments, and optimize cloud-native environments.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-4-susunilc.jpeg" target="_blank" rel="noopener"&gt;&lt;img loading="lazy" class="aligncenter size-full wp-image-89824" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-4-susunilc.jpeg" alt="Headshot of Sunil Chakrapani Sundararaman" width="209" height="279"&gt;&lt;/a&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/sunilchakrapani/" target="_blank" rel="noopener"&gt;Sunil Chakrapani Sundararaman&lt;/a&gt; is a DevOps Architect at Amazon Web Services (AWS), where he helps enterprise customers architect and implement Data and Machine Learning platforms in the AWS Cloud. He brings extensive experience in Data Platform engineering, MLOps, DevOps, and Kubernetes implementations. Sunil specializes in guiding organizations through their cloud transformation journey, focusing on building scalable and efficient solutions that drive business value.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-5-avidesir.jpeg" target="_blank" rel="noopener"&gt;&lt;img loading="lazy" class="aligncenter size-full wp-image-89825" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-5-avidesir.jpeg" alt="Headshot of Avinash Desireddy" width="256" height="341"&gt;&lt;/a&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/avinashdesireddy/" target="_blank" rel="noopener"&gt;Avinash Desireddy&lt;/a&gt; is a Specialist Solutions Architect (Containers) at Amazon Web Services (AWS), passionate about building secure applications and data platforms. He has extensive experience in Kubernetes, DevOps, and enterprise architecture, helping customers and partners containerize applications, streamline deployments, and optimize cloud-native environments.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-6-suvojid-1.png" target="_blank" rel="noopener"&gt;&lt;img loading="lazy" class="aligncenter wp-image-89838" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/06/BDB-4905-6-suvojid-1.png" alt="Headshot of Suvojit Dasgupta" width="228" height="304"&gt;&lt;/a&gt;
  &lt;/div&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/suvojitdasgupta/" target="_blank" rel="noopener"&gt;Suvojit Dasgupta&lt;/a&gt; is an Engineering Leader at Amazon Web Services (AWS). He leads engineering teams, guiding them in designing and implementing scalable, high-performance data platforms for AWS customers. With expertise spanning distributed systems, real-time and batch data architectures, and cloud-native infrastructure, he drives technical strategy and engineering excellence across teams. He is passionate about raising the bar on engineering practices, and solving large-scale problems at the intersection of data and business impact.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Introducing Amazon MSK Express Broker power for Kiro</title>
		<link>https://aws.amazon.com/blogs/big-data/introducing-amazon-msk-express-broker-power-for-kiro/</link>
					
		
		<dc:creator><![CDATA[Stephan Schiller]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 14:40:07 +0000</pubDate>
				<category><![CDATA[Amazon Managed Streaming for Apache Kafka (Amazon MSK)]]></category>
		<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Kiro]]></category>
		<category><![CDATA[Amazon MSK]]></category>
		<guid isPermaLink="false">94c8686b67af0ba647e9af3304d8bb6fab83e3d1</guid>

					<description>In this post, we'll show you how to use Kiro powers, a new capability that equips Kiro with contextual knowledge and tooling. You can simplify your MSK cluster management, from initial setup to diagnosing common issues, all through natural language conversations.</description>
										<content:encoded>&lt;p&gt;Developers working with &lt;a href="https://aws.amazon.com/msk/" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Streaming for Apache Kafka&lt;/a&gt; (Amazon MSK) regularly need to make decisions that require deep operational context—choosing the right instance type, diagnosing consumer lag, or planning for a traffic spike. Answering these questions means piecing together documentation, metrics, and operational know-how.&lt;/p&gt; 
&lt;p&gt;What if your IDE could guide you through that workflow with built-in domain expertise and tooling? &lt;a href="https://kiro.dev/" target="_blank" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt; is an AI-powered agentic IDE that lets you describe what you need in natural language. Whether it’s infrastructure configuration or operational troubleshooting, Kiro guides you through the solution.&lt;/p&gt; 
&lt;p&gt;In this post, we’ll show you how to use &lt;a href="https://kiro.dev/powers/" target="_blank" rel="noopener noreferrer"&gt;Kiro powers&lt;/a&gt;, a new capability that equips Kiro with contextual knowledge and tooling. You can simplify your MSK cluster management, from initial setup to diagnosing common issues, all through natural language conversations.&lt;/p&gt; 
&lt;h2&gt;Challenges operating your MSK Express broker cluster&lt;/h2&gt; 
&lt;p&gt;Amazon MSK &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-broker-types-express.html" target="_blank" rel="noopener noreferrer"&gt;Express Brokers&lt;/a&gt; are a fully managed offering where AWS handles much of the underlying infrastructure. However, platform teams still need to size clusters correctly based on throughput requirements, know which Amazon &lt;a href="https://aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;CloudWatch&lt;/a&gt; metrics to check during performance issues, and investigate when CPU usage or replication lag is higher than expected. Because MSK best practices documentation spans multiple AWS guides, finding the relevant information during a production incident is time-consuming, and new team members face a learning curve with MSK operations, often repeating common sizing and configuration mistakes.&lt;/p&gt; 
&lt;p&gt;Although Express Brokers simplify infrastructure management, you still face operational challenges that require deep Kafka expertise across three areas:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Cluster creation and sizing&lt;/strong&gt;: You must still select the right instance type, configure networking, and choose authentication methods. These decisions impact cost and performance from day one.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Observability and troubleshooting&lt;/strong&gt;: Effective operations require correlating broker, partition, and client metrics. Troubleshooting lag or replication issues still requires a solid understanding of Express Brokers’ architecture.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Capacity management: &lt;/strong&gt;You must monitor CPU usage, understand per-broker throughput limits, and scale before hitting throttling thresholds.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;These challenges mean that setting up an MSK cluster, analyzing slow-running clients, or investigating high-CPU load requires pulling together documentation, configuration details, CLI tooling, and operational know-how, which is often spread across multiple sources. Kiro powers address these challenges by bringing best practices, guided workflows, and tooling directly into your IDE, reducing the expertise barrier and the time spent context-switching between documentation, consoles, and the CLI.&lt;/p&gt; 
&lt;h2&gt;Kiro powers&lt;/h2&gt; 
&lt;p&gt;Kiro powers is a feature that combines best practices, specialized context, and tool integrations into a single capability. You can install powers with one click in the Kiro IDE or add them from a public GitHub URL. Each Power combines the following components:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Model Context Protocol (MCP) servers give your Kiro agent direct access to your infrastructure. The &lt;a href="https://awslabs.github.io/mcp/servers/aws-msk-mcp-server" target="_blank" rel="noopener noreferrer"&gt;AWS MSK MCP server&lt;/a&gt;, for example, exposes tools to create clusters, monitor health, and optimize configurations.&lt;/li&gt; 
 &lt;li&gt;Steering files provide persistent knowledge and workflow guides that Kiro loads based on the user’s task, such as monitoring best practices or troubleshooting workflows.&lt;/li&gt; 
 &lt;li&gt;Optional hooks run automated actions when IDE events occur, such as validating configurations before deployment.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The key advantage of Kiro powers is that they load context dynamically based on the user’s task. Instead of configuring every MCP server upfront and re-providing context in each conversation, powers activate the right tools and knowledge on demand. This keeps your agent’s context focused and relevant. In the next section, we look at how these components work together specifically for MSK Express Broker operations.&lt;/p&gt; 
&lt;h2&gt;The MSK Express broker power&lt;/h2&gt; 
&lt;p&gt;The MSK Express broker power packages the AWS MSK MCP server with targeted streaming operations guidance, giving your Kiro agent expertise for MSK Express Broker operations and cluster management. You can use it to build Kafka-based streaming applications through Kiro while maintaining &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices-express.html" target="_blank" rel="noopener noreferrer"&gt;Express broker best practices&lt;/a&gt; throughout the development lifecycle.&lt;/p&gt; 
&lt;p&gt;For cluster operations, you can create Express broker clusters, monitor health metrics, and manage configurations through natural language. You can retrieve cluster metadata, check broker endpoints, and verify replication status. The Power also supports operational monitoring. You can track CPU utilization, throughput limits, partition distribution, and AWS &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;Identity and Access Management&lt;/a&gt; (IAM) connection metrics.&lt;/p&gt; 
&lt;p&gt;To see how this works in practice, here’s what happens when you interact with the Power: When you ask Kiro to create an MSK cluster, the Power recommends appropriate instance sizes based on your throughput requirements. When you’re troubleshooting performance, it knows to check LeaderCount before diving into network metrics. When you’re diagnosing authentication failures, it recommends client settings like reconnect.backoff.ms and group.instance.id to resolve connection churn and rebalancing issues against Express broker limits. Use cases include:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Cluster sizing and creation&lt;/strong&gt;: Describe your throughput requirements (for example, “50 MBps ingress with 3x fan-out”) and the Power calculates the right instance type and broker count, then walks through cluster creation (a simplified sketch of this arithmetic follows this list).&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Proactive health monitoring&lt;/strong&gt;: Ask Kiro to review your cluster. It checks CPU against the 60% threshold, compares throughput to instance limits, and flags partition imbalances and throughput bottlenecks before they become incidents.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Incident troubleshooting&lt;/strong&gt;: Consumer lag spiking? The Power checks the relevant metrics, identifies the root cause (like skewed partition leadership), and guides you through resolution.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Capacity planning&lt;/strong&gt;: Preparing for a traffic spike? The Power analyzes current utilization against instance limits and recommends whether to scale up or add brokers.&lt;/li&gt; 
&lt;/ul&gt; 
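&lt;p&gt;To make the cluster sizing use case concrete, here is a minimal TypeScript sketch of the kind of arithmetic involved. It is not the Power’s actual logic, and the per-broker throughput limits are placeholder assumptions; confirm the current Express broker limits in the best practices documentation before sizing a real cluster.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Illustrative sizing sketch; the per-broker limits below are placeholders,
// not official MSK Express figures.
const assumedPerBrokerMBps = { ingress: 15, egress: 30 }; // placeholder limits for one instance size

function estimateBrokerCount(ingressMBps: number, fanOut: number): number {
  const headroom = 0.8; // keep roughly 20% capacity spare
  const forIngress = Math.ceil(ingressMBps / (assumedPerBrokerMBps.ingress * headroom));
  const forEgress = Math.ceil((ingressMBps * fanOut) / (assumedPerBrokerMBps.egress * headroom));
  // Express brokers are deployed across 3 Availability Zones, so use at least 3.
  return Math.max(forIngress, forEgress, 3);
}

// "50 MBps ingress with 3x fan-out" from the use case above:
console.log(estimateBrokerCount(50, 3)); // 7 brokers with these placeholder limits&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 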
&lt;p&gt;The MSK Express broker power brings together documentation, metrics, and operational context so your Kiro agent can correlate findings and help identify root causes specific to your infrastructure.&lt;/p&gt; 
&lt;h2&gt;Getting started with the MSK Express broker power&lt;/h2&gt; 
&lt;p&gt;Starting with Kiro powers takes only a few clicks in the Kiro IDE. You can install from the built-in marketplace or import from a public GitHub URL. Kiro packages all components and makes them available to the Kiro agent.&lt;/p&gt; 
&lt;p&gt;To set up the MSK Express broker power, follow these steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Choose the &lt;strong&gt;Powers&lt;/strong&gt; icon in the Kiro sidebar.&lt;/li&gt; 
 &lt;li&gt;In the &lt;strong&gt;AVAILABLE&lt;/strong&gt; panel, scroll down to &lt;strong&gt;Build and Operate MSK Express Broker&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Install&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Verify that the power appears in the &lt;strong&gt;INSTALLED&lt;/strong&gt; panel.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-89756 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/image-BDB-57591.png" alt="Screenshot of Kiro IDE Powers panel showing installed and available extensions including the MSK Express Broker power." width="1296" height="1340"&gt;&lt;/p&gt; 
&lt;p&gt;You can also visit the &lt;a href="https://kiro.dev/powers/" target="_blank" rel="noopener noreferrer"&gt;Kiro powers marketplace&lt;/a&gt; to explore other powers.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;The MSK Express broker power streamlines Kafka operations by combining Model Context Protocol (MCP) servers with operational guidance. With natural language interactions, you can create clusters, monitor health, optimize configurations, and troubleshoot issues without reviewing extensive documentation.&lt;/p&gt; 
&lt;p&gt;Learn more about &lt;a href="https://kiro.dev/" target="_blank" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt; and available &lt;a href="https://kiro.dev/powers/" target="_blank" rel="noopener noreferrer"&gt;Kiro powers&lt;/a&gt;.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;h3 class="lb-h4"&gt;Stephan Schiller&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="http://www.linkedin.com/in/stephanschilleraws" target="_blank" rel="noopener"&gt;&lt;img loading="lazy" class="size-full wp-image-89794 alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/03/stepesch-high-res-current-photo-copy_v2-copy.jpg" alt="" width="100" height="150"&gt;Stephan&lt;/a&gt; is a Solutions Architect at AWS, where he has worked since 2023. He brings deep experience from technical roles across multiple hyperscalers and specializes in data analytics and agentic AI systems. He designs and operates scalable data platforms and builds agentic workloads for enterprise environments—helping organizations move from prototypes to production-ready AI systems that are reliable, secure, and deeply integrated with enterprise data landscapes.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Introducing workload simulation workbench for Amazon MSK Express broker</title>
		<link>https://aws.amazon.com/blogs/big-data/introducing-workload-simulation-workbench-for-amazon-msk-express-broker/</link>
					
		
		<dc:creator><![CDATA[Manu Mishra]]></dc:creator>
		<pubDate>Tue, 07 Apr 2026 16:49:49 +0000</pubDate>
				<category><![CDATA[Amazon Managed Streaming for Apache Kafka (Amazon MSK)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[Amazon Managed Streaming for Apache Kafka]]></category>
		<category><![CDATA[Amazon MSK]]></category>
		<category><![CDATA[ECS]]></category>
		<category><![CDATA[Kafka]]></category>
		<category><![CDATA[Performance]]></category>
		<guid isPermaLink="false">b3d24418f68f39f35e538a5023c7f6fd258fd3b4</guid>

					<description>In this post, we introduce the workload simulation workbench for Amazon Managed Streaming for Apache Kafka (Amazon MSK) Express Broker. The simulation workbench is a tool that you can use to safely validate your streaming configurations through realistic testing scenarios.</description>
										<content:encoded>&lt;p&gt;Validating Kafka configurations before production deployment can be challenging. In this post, we introduce the workload simulation workbench for &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-broker-types-express.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Streaming for Apache Kafka (Amazon MSK) Express Broker&lt;/a&gt;. The simulation workbench is a tool that you can use to safely validate your streaming configurations through realistic testing scenarios.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;Varying message sizes, partition strategies, throughput requirements, and scaling patterns make it challenging for you to predict how your Apache Kafka configurations will perform in production. The traditional approaches to test these variables create significant barriers: ad-hoc testing lacks consistency, manual setup of temporary clusters is time-consuming and error-prone, production-like environments require dedicated infrastructure teams, and team training often happens in isolation without realistic scenarios. You need a structured way to test and validate these configurations safely before deployment. The workload simulation workbench for MSK Express Broker addresses these challenges by providing a configurable, infrastructure as code (IaC) solution using &lt;a href="https://aws.amazon.com/cdk/" target="_blank" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (AWS CDK)&lt;/a&gt; deployments for realistic Apache Kafka testing. The workbench supports configurable workload scenarios and real-time performance insights.&lt;/p&gt; 
&lt;p&gt;Express brokers for MSK Provisioned make managing Apache Kafka more streamlined, more cost-effective to run at scale, and more elastic with the low latency that you expect. Each broker node can provide up to 3x more throughput per broker, scale up to 20x faster, and recover 90% quicker compared to standard Apache Kafka brokers.&amp;nbsp;The &lt;a href="https://github.com/aws-samples/sample-simulation-workbench-for-msk-express-brokers"&gt;workload simulation workbench for Amazon MSK Express&lt;/a&gt; broker facilitates systematic experimentation with consistent, repeatable results. You can use the workbench for multiple use cases like production capacity planning, progressive training to prepare developers for Apache Kafka operations with increasing complexity, and architecture validation to prove streaming designs and compare different approaches before making production commitments.&lt;/p&gt; 
&lt;h2&gt;Architecture overview&lt;/h2&gt; 
&lt;p&gt;The workbench creates an isolated Apache Kafka testing environment in your AWS account. It runs producer and consumer applications as containers in private subnets, connects them to a private MSK Express broker cluster, and captures performance metrics for visibility. This architecture mirrors a production deployment pattern so that your experiments are realistic. The following image describes this architecture using AWS services.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone wp-image-89469" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/24/ArchitectureDiagram.png" alt="MSK Workload SImulator WorkBench Architecture Diagram" width="1275" height="1040"&gt;&lt;/p&gt; 
&lt;p&gt;This architecture is deployed using the following AWS services:&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/ecs/" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon Elastic Container Service (Amazon ECS&lt;/strong&gt;&lt;/a&gt;)&lt;strong&gt;&amp;nbsp;&lt;/strong&gt;generate configurable workloads with Java-based producers and consumers, simulating various real-world scenarios through different message sizes and throughput patterns.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Amazon MSK Express Cluster&lt;/strong&gt;&amp;nbsp;runs&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/supported-kafka-versions.html" target="_blank" rel="noopener noreferrer"&gt;Apache Kafka 3.9.0&lt;/a&gt;&amp;nbsp;on&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/broker-instance-sizes.html" target="_blank" rel="noopener noreferrer"&gt;Graviton-based instances&lt;/a&gt;&amp;nbsp;with hands-free storage management and&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-broker-types-express.html" target="_blank" rel="noopener noreferrer"&gt;enhanced performance characteristics&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Dynamic&amp;nbsp;&lt;/strong&gt;&lt;a href="https://aws.amazon.com/cloudwatch/" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon CloudWatch&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;&amp;nbsp;Dashboards&lt;/strong&gt;&amp;nbsp;automatically adapt to your configuration, displaying real-time throughput, latency, and resource utilization across different test scenarios.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Secure&amp;nbsp;&lt;/strong&gt;&lt;a href="https://aws.amazon.com/vpc/" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon Virtual Private Cloud (Amazon VPC&lt;/strong&gt;&lt;/a&gt;)&lt;strong&gt;&amp;nbsp;Infrastructure&lt;/strong&gt;&amp;nbsp;provides private subnets across three Availability Zones with&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints.html" target="_blank" rel="noopener noreferrer"&gt;VPC endpoints&lt;/a&gt;&amp;nbsp;for secure service communication.&lt;/p&gt; 
&lt;h2&gt;Configuration-driven testing&lt;/h2&gt; 
&lt;p&gt;The workbench provides different configuration options for your Apache Kafka testing environment, so you can customize instance types, broker count, topic distribution, message characteristics, and ingress rate. You can adjust the number of topics, partitions per topic, sender and receiver service instances, and message sizes to match your testing needs. These flexible configurations support two distinct testing approaches to validate different aspects of your Kafka deployment:&lt;/p&gt; 
&lt;h3&gt;Approach 1: Workload validation (single deployment)&lt;/h3&gt; 
&lt;p&gt;Test different workload patterns against the same MSK Express cluster configuration. This is useful for comparing partition strategies, message sizes, and load patterns.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Fixed MSK Express cluster configuration
export const mskBrokerConfig: MskBrokerConfig = {
  numberOfBrokers: 1,                // 1 broker per AZ = 3 total brokers
  instanceType: 'express.m7g.large', // MSK Express instance type
};

// Multiple concurrent workload tests
export const deploymentConfig: DeploymentConfig = {
  services: [
    { topics: 2, partitionsPerTopic: 6, instances: 3, messageSizeBytes: 1024 }, // High-throughput scenario
    { topics: 1, partitionsPerTopic: 3, instances: 1, messageSizeBytes: 512 },  // Latency-optimized scenario
    { topics: 3, partitionsPerTopic: 4, instances: 2, messageSizeBytes: 4096 }, // Multi-topic scenario
  ],
};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Approach 2: Infrastructure rightsizing (redeploy and compare)&lt;/h3&gt; 
&lt;p&gt;Test different MSK Express cluster configurations by redeploying the workbench with different broker settings while keeping the same workload. This is recommended for rightsizing experiments and understanding the impact of vertical compared to horizontal scaling.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Baseline: deploy and test
export const mskBrokerConfig: MskBrokerConfig = {
  numberOfBrokers: 1,
  instanceType: 'express.m7g.large',
};

// Vertical scaling: redeploy with larger instances
export const mskBrokerConfig: MskBrokerConfig = {
  numberOfBrokers: 1,
  instanceType: 'express.m7g.xlarge', // Larger instances
};

// Horizontal scaling: redeploy with more brokers
export const mskBrokerConfig: MskBrokerConfig = {
  numberOfBrokers: 2,                // More brokers
  instanceType: 'express.m7g.large',
};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Each redeployment uses the same workload configuration, so you can isolate the impact of infrastructure changes on performance.&lt;/p&gt; 
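&lt;p&gt;Before redeploying, it can help to estimate the aggregate load that a workload configuration generates, so you know roughly what each cluster variant must absorb. The following TypeScript sketch is only an illustration under stated assumptions: the &lt;code&gt;messagesPerSecondPerInstance&lt;/code&gt; value is a hypothetical stand-in for whatever ingress rate you configure in the workbench, and the &lt;code&gt;ServiceConfig&lt;/code&gt; shape simply mirrors the entries shown above.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Hedged estimate of aggregate producer ingress for a deployment configuration.
// messagesPerSecondPerInstance is a hypothetical stand-in for the workbench's
// configurable ingress rate; substitute the value you actually configure.
interface ServiceConfig {
  topics: number;
  partitionsPerTopic: number;
  instances: number;
  messageSizeBytes: number;
}

function estimateIngressMBps(services: ServiceConfig[], messagesPerSecondPerInstance: number): number {
  let totalBytesPerSecond = 0;
  for (const svc of services) {
    totalBytesPerSecond += svc.instances * messagesPerSecondPerInstance * svc.messageSizeBytes;
  }
  return totalBytesPerSecond / (1024 * 1024);
}

const services: ServiceConfig[] = [
  { topics: 2, partitionsPerTopic: 6, instances: 3, messageSizeBytes: 1024 },
];

// With an assumed 1,000 messages per second per producer instance:
console.log(estimateIngressMBps(services, 1000).toFixed(2), 'MBps'); // ~2.93 MBps&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 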
&lt;h2&gt;Workload testing scenarios (single deployment)&lt;/h2&gt; 
&lt;p&gt;These scenarios test different workload patterns against the same MSK Express cluster:&lt;/p&gt; 
&lt;h3&gt;Partition strategy impact testing&lt;/h3&gt; 
&lt;p&gt;&lt;em&gt;Scenario: You are debating the usage of fewer topics with many partitions compared to many topics with fewer partitions for your microservices architecture. You want to understand how partition count affects throughput and consumer group coordination before making this architectural decision.&lt;/em&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;const deploymentConfig = { services: [
{ topics: 1, partitionsPerTopic: 1, instances: 2, messageSizeBytes: 1024 }, // Baseline: minimal partitions
{ topics: 1, partitionsPerTopic: 10, instances: 2, messageSizeBytes: 1024 }, // Medium partitions
{ topics: 1, partitionsPerTopic: 20, instances: 2, messageSizeBytes: 1024 }, // High partitions
]};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Message size performance analysis&lt;/h3&gt; 
&lt;p&gt;&lt;em&gt;Scenario: Your application handles different types of events – small IoT sensor readings (256 bytes), medium user activity events (1 KB), and large document processing events (8KB). You must understand how message size impacts your overall system performance and if you should separate these into different topics or handle them together.&lt;/em&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;const deploymentConfig = { services: [
{ topics: 2, partitionsPerTopic: 6, instances: 3, messageSizeBytes: 256 }, // IoT sensor data
{ topics: 2, partitionsPerTopic: 6, instances: 3, messageSizeBytes: 1024 }, // User events
{ topics: 2, partitionsPerTopic: 6, instances: 3, messageSizeBytes: 8192 }, // Document events
]};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h3&gt;Load testing and scaling validation&lt;/h3&gt; 
&lt;p&gt;&lt;em&gt;Scenario: You expect traffic to vary significantly throughout the day, with peak loads requiring 10× more processing capacity than off-peak hours. You want to validate how your Apache Kafka topics and partitions handle different load levels and understand the performance characteristics before production deployment.&lt;/em&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;const deploymentConfig = { services: [
{ topics: 2, partitionsPerTopic: 6, instances: 1, messageSizeBytes: 1024 }, // Off-peak load simulation
{ topics: 2, partitionsPerTopic: 6, instances: 5, messageSizeBytes: 1024 }, // Medium load simulation
{ topics: 2, partitionsPerTopic: 6, instances: 10, messageSizeBytes: 1024 }, // Peak load simulation
]};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Infrastructure rightsizing experiments (redeploy and compare)&lt;/h2&gt; 
&lt;p&gt;These scenarios help you understand the impact of different MSK Express cluster configurations by redeploying the workbench with different broker settings:&lt;/p&gt; 
&lt;h3&gt;MSK broker rightsizing analysis&lt;/h3&gt; 
&lt;p&gt;&lt;em&gt;Scenario: You deploy a cluster with basic configuration and put load on it to establish baseline performance. Then you want to experiment with different broker configurations to see the effect of vertical scaling (larger instances) and horizontal scaling (more brokers) to find the right cost-performance balance for your production deployment.&lt;/em&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Step 1: Deploy with baseline configuration&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Initial deployment: basic configuration
export const mskBrokerConfig: MskBrokerConfig = {
  numberOfBrokers: 1,                // 3 total brokers (1 per AZ)
  instanceType: 'express.m7g.large',
};

export const deploymentConfig: DeploymentConfig = {
  services: [
    { topics: 2, partitionsPerTopic: 6, instances: 3, messageSizeBytes: 1024 },
  ],
};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;Step 2: Redeploy with vertical scaling&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Redeploy: test vertical scaling impact
export const mskBrokerConfig: MskBrokerConfig = {
  numberOfBrokers: 1,                 // Same broker count
  instanceType: 'express.m7g.xlarge', // Larger instances
};

// Keep same workload configuration to compare results&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;strong&gt;Step 3: Redeploy with horizontal scaling&lt;/strong&gt;&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Redeploy: test horizontal scaling impact
export const mskBrokerConfig: MskBrokerConfig = {
  numberOfBrokers: 2,                // 6 total brokers (2 per AZ)
  instanceType: 'express.m7g.large', // Back to original size
};

// Keep same workload configuration to compare results&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This rightsizing approach helps you understand how broker configuration changes affect the same workload, so you can improve both performance and cost for your specific requirements.&lt;/p&gt; 
&lt;h2&gt;Performance insights&lt;/h2&gt; 
&lt;p&gt;The workbench provides detailed insights into your Apache Kafka configurations through monitoring and analytics, creating a CloudWatch dashboard that adapts to your configuration. The dashboard starts with a configuration summary showing your MSK Express cluster details and workbench service configurations, helping you to understand what you’re testing. The following image shows the dashboard configuration summary:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89502" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/24/BDB-5479-Dashboard-Summary-1.png" alt="" width="1990" height="208"&gt;&lt;/p&gt; 
&lt;p&gt;The second section of the dashboard shows real-time MSK Express cluster metrics, including:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Broker performance&lt;/strong&gt;: CPU utilization and memory usage across brokers in your cluster&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Network activity&lt;/strong&gt;: Monitor bytes in/out and packet counts per broker to understand network utilization patterns&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Connection monitoring&lt;/strong&gt;: Displays active connections and connection patterns to help identify potential bottlenecks&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Resource utilization&lt;/strong&gt;: Broker-level resource tracking provides insights into overall cluster health&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The following image shows the MSK cluster monitoring dashboard:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89501" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/24/BDB-5479-MSKClusterMetric-1.png" alt="" width="1990" height="1521"&gt;&lt;/p&gt; 
&lt;p&gt;The third section of the dashboard provides Intelligent Rebalancing and Cluster Capacity insights, including:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Intelligent rebalancing: in progress&lt;/strong&gt;: Shows whether a rebalancing operation is currently in progress or has occurred in the past. A value of 1 indicates that rebalancing is actively running, while 0 means that the cluster is in a steady state.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Cluster under-provisioned&lt;/strong&gt;: Indicates whether the cluster has insufficient broker capacity to perform partition rebalancing. A value of 1 means that the cluster is under-provisioned and Intelligent Rebalancing can’t redistribute partitions until more brokers are added or the instance type is upgraded.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Global partition count:&lt;/strong&gt; Displays the total number of unique partitions across all topics in the cluster, excluding replicas. Use this to track partition growth over time and validate your deployment configuration.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Leader count per broker:&lt;/strong&gt; Shows the number of leader partitions assigned to each broker. An uneven distribution indicates partition leadership skew, which can lead to hotspots where certain brokers handle disproportionate read/write traffic.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Partition count per broker:&lt;/strong&gt; Shows the total number of partition replicas hosted on each broker. This metric includes both leader and follower replicas and is key to identifying replica distribution imbalances across the cluster.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The following image shows the Intelligent Rebalancing and Cluster Capacity section of the dashboard:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89503" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/24/BDB-5479-IntelligentRebalancing-1.png" alt="" width="1990" height="1419"&gt;&lt;/p&gt; 
&lt;p&gt;The fourth section of the dashboard shows application-level insights, including:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;System throughput&lt;/strong&gt;: Displays the total number of messages per second across services, giving you a complete view of system performance&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Service comparisons&lt;/strong&gt;: Performs side-by-side performance analysis of different configurations to understand which approach best fits your requirements&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Individual service performance&lt;/strong&gt;: Each configured service has dedicated throughput tracking widgets for detailed analysis&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Latency analysis&lt;/strong&gt;: The end-to-end message delivery times and latency comparisons across different service configurations&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Message size impact&lt;/strong&gt;: Performance analysis across different payload sizes helps you understand how message size affects overall system behavior&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The following image shows the application performance metrics section of the dashboard:&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89504" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/24/BDB-5479-ApplicationPerformanceMetric-2.png" alt="" width="1990" height="1576"&gt;&lt;/p&gt; 
&lt;h2&gt;Getting started&lt;/h2&gt; 
&lt;p&gt;This section walks you through setting up and deploying the workbench in your AWS environment. You will configure the necessary prerequisites, deploy the infrastructure using AWS CDK, and customize your first test.&lt;/p&gt; 
&lt;h3&gt;Prerequisites&lt;/h3&gt; 
&lt;p&gt;You can deploy the solution from the &lt;a href="https://github.com/aws-samples/sample-simulation-workbench-for-msk-express-brokers" target="_blank" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; repository: clone it and run it in your AWS environment. To deploy the artifacts, you need the following:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;AWS account&lt;/strong&gt;&amp;nbsp;with administrative credentials configured for creating AWS resources.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS Command Line Interface (AWS CLI)&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;must be configured with appropriate permissions for AWS resource management.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://aws.amazon.com/cdk/" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS Cloud Development Kit (AWS CDK)&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;should be installed globally using &lt;code&gt;npm install -g aws-cdk&lt;/code&gt; for infrastructure deployment.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://nodejs.org/" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Node.js&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;version 20.9 or higher is required, with version 22+ recommended.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.docker.com/engine/install/" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Docker Engine&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;must be installed, and the Docker daemon must be running and accessible to the CDK, because the workbench application container images are built during deployment.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Deployment&lt;/h3&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-bash"&gt;# Clone the workbench repository
git clone https://github.com/aws-samples/sample-simulation-workbench-for-msk-express-brokers.git

# Install dependencies and build
npm install 
npm run build

# Bootstrap CDK (first time only per account/region)
cd cdk 
npx cdk bootstrap

# Synthesize CloudFormation template (optional verification step)
npx cdk synth

# Deploy to AWS (creates infrastructure and builds containers)
npx cdk deploy&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;After deployment is completed, you will receive a CloudWatch dashboard URL to monitor the workbench performance in real time. You can also deploy multiple isolated instances of the workbench in the same AWS account for different teams, environments, or testing scenarios. Each instance operates independently with its own MSK cluster, ECS services, and CloudWatch dashboards. To deploy additional instances, modify the environment configuration in&amp;nbsp;&lt;code&gt;cdk/lib/config.ts&lt;/code&gt;:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Instance 1: Development team
export const AppPrefix = 'mske';
export const EnvPrefix = 'dev';

// Instance 2: Staging environment (separate deployment)
export const AppPrefix = 'mske';
export const EnvPrefix = 'staging';

// Instance 3: Team-specific testing (separate deployment)
export const AppPrefix = 'team-alpha';
export const EnvPrefix = 'test';&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Each combination of &lt;code&gt;AppPrefix&lt;/code&gt; and &lt;code&gt;EnvPrefix&lt;/code&gt; creates completely isolated AWS resources so that multiple teams or environments can use the workbench simultaneously without conflicts.&lt;/p&gt; 
&lt;h3&gt;Customizing your first test&lt;/h3&gt; 
&lt;p&gt;You can edit the configuration file at &lt;code&gt;cdk/lib/config-types.ts&lt;/code&gt; to define your testing scenarios, and then run the deployment. It is preconfigured as follows:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;export const deploymentConfig: DeploymentConfig = {
  services: [
    // Start with a simple baseline test
    { topics: 1, partitionsPerTopic: 3, instances: 1, messageSizeBytes: 1024 },

    // Add a comparison scenario
    { topics: 1, partitionsPerTopic: 6, instances: 1, messageSizeBytes: 1024 },
  ],
};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;h2&gt;Best practices&lt;/h2&gt; 
&lt;p&gt;Following a structured approach to benchmarking ensures that your results are reliable and actionable. These best practices will help you isolate performance variables and build a clear understanding of how each configuration change affects your system’s behavior. Begin with single-service configurations to establish baseline performance:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;const deploymentConfig = { services: [ { topics: 1, partitionsPerTopic: 3, instances: 1, messageSizeBytes: 1024 } ]};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;After you understand the baseline, add comparison scenarios.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Change one variable at a time&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;For clear insights, modify only one parameter between services:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;const deploymentConfig = { services: [
{ topics: 1, partitionsPerTopic: 3, instances: 1, messageSizeBytes: 1024 }, // Baseline
{ topics: 1, partitionsPerTopic: 6, instances: 1, messageSizeBytes: 1024 }, // More partitions
{ topics: 1, partitionsPerTopic: 12, instances: 1, messageSizeBytes: 1024 }, // Even more partitions
]};&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;em&gt;This approach helps you understand the impact of specific configuration changes.&lt;/em&gt;&lt;/p&gt; 
&lt;h2&gt;Important considerations and limitations&lt;/h2&gt; 
&lt;p&gt;Before relying on workbench results for production decisions, it is important to understand the tool’s intended scope and boundaries. The following considerations will help you set appropriate expectations and make the most effective use of the workbench in your planning process.&lt;/p&gt; 
&lt;h3&gt;Performance testing disclaimer&lt;/h3&gt; 
&lt;p&gt;&lt;em&gt;The workbench is designed as an educational and sizing estimation tool to help teams prepare for MSK Express production deployments&lt;/em&gt;. While it provides valuable insights into performance characteristics:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Results can vary based on your specific use cases, network conditions, and configurations&lt;/li&gt; 
 &lt;li&gt;Use workbench results as guidance for initial sizing and planning&lt;/li&gt; 
 &lt;li&gt;Conduct comprehensive performance validation with your actual workloads in production-like environments before final deployment&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h3&gt;Recommended usage approach&lt;/h3&gt; 
&lt;p&gt;&lt;strong&gt;Production readiness training&lt;/strong&gt;&amp;nbsp;– Use the workbench to prepare teams for MSK Express capabilities and operations.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Architecture validation&lt;/strong&gt;&amp;nbsp;– Test streaming architectures and performance expectations using MSK Express enhanced performance characteristics.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Capacity planning&lt;/strong&gt;&amp;nbsp;– Use MSK Express streamlined sizing approach (throughput-based rather than storage-based) for initial estimates.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Team preparation&lt;/strong&gt;&amp;nbsp;– Build confidence and expertise with production Apache Kafka implementations using MSK Express.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we showed how the workload simulation workbench for Amazon MSK Express Broker supports learning and preparation for production deployments through configurable, hands-on testing and experiments. You can use the workbench to validate configurations, build expertise, and improve performance before production deployment. Whether you’re preparing for your first Apache Kafka deployment, training a team, or improving existing architectures, the workbench provides the practical experience and insights needed for success. For complete MSK Express documentation, best practices, and sizing guidance, refer to the &lt;a href="https://docs.aws.amazon.com/msk/" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon MSK documentation&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt; 
&lt;hr&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;p&gt;&lt;img loading="lazy" class="size-thumbnail wp-image-89617 alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/31/ProfilePicture-1-100x100.jpeg" alt="Manu Mishra" width="100" height="100"&gt;&lt;a href="https://manumishra.com/"&gt;&lt;strong&gt;Manu&amp;nbsp;Mishra&lt;/strong&gt;&lt;/a&gt; is a Senior Solutions Architect at AWS with over 18 years of experience in the software industry, specializing in artificial intelligence, data and analytics, and security. His expertise spans strategic oversight and hands-on technical leadership, where he reviews and guides the work of both internal and external customers. Manu collaborates with AWS customers to shape technical strategies that drive impactful business outcomes, providing alignment between technology and organizational goals.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;p&gt;&lt;img loading="lazy" class="size-thumbnail wp-image-89617 alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/24/Ramesh-100x94.png" alt="Ramesh Chidirala" width="100" height="100"&gt;&lt;a href="https://www.linkedin.com/in/rameshchidirala/"&gt;&lt;strong&gt;Ramesh Chidirala&lt;/strong&gt;&lt;/a&gt; is a Senior Solutions Architect at Amazon Web Services with over two decades of technology leadership experience in architecture and digital transformation, helping customers align business strategy and technical execution. He specializes in designing innovative, AI-powered, cost-efficient serverless event-driven architectures and has extensive experience architecting secure, scalable, and resilient cloud solutions for enterprise customers.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Proactive monitoring for Amazon Redshift Serverless using AWS Lambda and Slack alerts</title>
		<link>https://aws.amazon.com/blogs/big-data/proactive-monitoring-for-amazon-redshift-serverless-using-aws-lambda-and-slack-alerts/</link>
					
		
		<dc:creator><![CDATA[Cristian Restrepo Lopez]]></dc:creator>
		<pubDate>Tue, 07 Apr 2026 16:27:14 +0000</pubDate>
				<category><![CDATA[Amazon Q]]></category>
		<category><![CDATA[Amazon Redshift]]></category>
		<category><![CDATA[AWS Big Data]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Monitoring and observability]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">65b18cb3fdbb9d590c8c6d0a2c7dd9922e901c5a</guid>

					<description>In this post, we show you how to build a&amp;nbsp;serverless, low-cost monitoring solution&amp;nbsp;for Amazon Redshift Serverless that proactively detects performance anomalies and sends actionable alerts directly to your selected Slack channels.</description>
										<content:encoded>&lt;p&gt;Performance issues in analytics environments often remain invisible until they disrupt dashboards, delay ETL jobs, or impact business decisions. For teams running &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-serverless.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift Serverless&lt;/a&gt;, unmonitored query queues, long-running queries, or unexpected spikes in compute capacity can degrade performance and increase costs if left undetected.&lt;/p&gt; 
&lt;p&gt;Amazon Redshift Serverless streamlines running analytics at scale by removing the need to provision or manage infrastructure. However, even in a serverless environment, maintaining visibility into performance and usage is essential for efficient operation and predictable costs. While Amazon Redshift Serverless provides advanced built-in dashboards for monitoring performance metrics, delivering notifications directly to platforms like Slack brings another level of agility. Real-time alerts in the team’s workflow enable faster response times and more informed decision-making without requiring constant dashboard monitoring.&lt;/p&gt; 
&lt;p&gt;In this post, we show you how to build a&amp;nbsp;serverless, low-cost monitoring solution&amp;nbsp;for Amazon Redshift Serverless that proactively detects performance anomalies and sends actionable alerts directly to your selected Slack channels. This approach helps your analytics team identify and address issues early, often before your users notice a problem.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;The solution presented in this post uses AWS services to collect key performance metrics from Amazon Redshift Serverless, evaluate them against thresholds that you can flexibly configure, and notify you when anomalies are detected.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/Picture1-5.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89081" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/Picture1-5.png" alt="scope of solution" width="864" height="412"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;The workflow operates as follows:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Scheduled execution&lt;/strong&gt;&amp;nbsp;– An &lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html" target="_blank" rel="noopener noreferrer"&gt;Amazon EventBridge&lt;/a&gt; rule triggers an AWS Lambda function on a configurable schedule (by default, every 15 minutes during business hours).&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Metric collection&lt;/strong&gt;&amp;nbsp;– The &lt;a href="https://docs.aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; function gathers metrics including queued queries, running queries, compute capacity (RPUs), data storage usage, table count, database connections, and slow-running queries using &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html" target="_blank" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; and the Amazon Redshift Data API.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Threshold evaluation&lt;/strong&gt;&amp;nbsp;– Collected metrics are compared against your predefined thresholds that reflect acceptable performance and usage limits.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Alerting&lt;/strong&gt;&amp;nbsp;– When a threshold is exceeded, the Lambda function publishes a notification to an Amazon SNS topic.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Slack notification&lt;/strong&gt;&amp;nbsp;– &lt;a href="https://docs.aws.amazon.com/chatbot/latest/adminguide/what-is.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Q Developer in Chat applications&lt;/a&gt; (formerly AWS Chatbot) delivers the alert to your designated Slack channel.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Observability&lt;/strong&gt;&amp;nbsp;– Lambda execution logs are stored in Amazon CloudWatch Logs for troubleshooting and auditing.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;This architecture is fully serverless and requires no changes to your existing Amazon Redshift Serverless workloads. To simplify deployment, we provide an &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt; template that provisions all required resources.&lt;/p&gt; 
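&lt;p&gt;To illustrate steps 2–4 of the workflow, the following TypeScript sketch shows one way such a Lambda function could read a single metric and publish an alert. It is a simplified illustration rather than the function the CloudFormation template actually deploys, and the CloudWatch namespace, metric name, dimension name, and environment variable names are assumptions that you should verify against your own environment.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;// Simplified illustration of the metric-collection Lambda (AWS SDK for JavaScript v3).
// Namespace, metric, and dimension names are assumptions; verify them in CloudWatch.
import { CloudWatchClient, GetMetricStatisticsCommand } from '@aws-sdk/client-cloudwatch';
import { SNSClient, PublishCommand } from '@aws-sdk/client-sns';

const cloudwatch = new CloudWatchClient({});
const sns = new SNSClient({});

// Supplied as environment variables (assumed names).
const WORKGROUP = process.env.WORKGROUP_NAME ?? '';
const TOPIC_ARN = process.env.SNS_TOPIC_ARN ?? '';
const QUEUED_THRESHOLD = Number(process.env.QUERIES_QUEUED_THRESHOLD ?? '20');

export async function handler() {
  const now = new Date();
  const start = new Date(now.getTime() - 15 * 60 * 1000); // look back one schedule interval

  // Step 2: collect one metric for the workgroup from CloudWatch.
  const stats = await cloudwatch.send(new GetMetricStatisticsCommand({
    Namespace: 'AWS/Redshift-Serverless',                  // assumed namespace
    MetricName: 'QueriesQueued',                           // assumed metric name
    Dimensions: [{ Name: 'Workgroup', Value: WORKGROUP }], // assumed dimension name
    StartTime: start,
    EndTime: now,
    Period: 300,
    Statistics: ['Maximum'],
  }));

  // Take the highest observed value in the window.
  let queued = 0;
  for (const point of stats.Datapoints ?? []) {
    queued = Math.max(queued, point.Maximum ?? 0);
  }

  // Steps 3 and 4: evaluate the threshold and publish an alert to the SNS topic.
  if (queued &gt; QUEUED_THRESHOLD) {
    await sns.send(new PublishCommand({
      TopicArn: TOPIC_ARN,
      Subject: `Redshift Serverless alert: ${WORKGROUP}`,
      Message: `Queued queries (${queued}) exceeded the configured threshold of ${QUEUED_THRESHOLD}.`,
    }));
  }
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;The deployed function evaluates several metrics and also runs SQL-based diagnostics through the Amazon Redshift Data API, but the control flow follows the same collect, compare, and notify pattern.&lt;/p&gt; 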
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;Before deploying this solution, you must collect information about your existing Amazon Redshift Serverless workgroup and namespace that you want to monitor. To identify your Amazon Redshift Serverless resources:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the &lt;a href="https://us-east-1.console.aws.amazon.com/redshiftv2/home?region=us-east-1#/landing" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the navigation pane, choose&amp;nbsp;&lt;strong&gt;Serverless dashboard&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Note down your&amp;nbsp;&lt;strong&gt;workgroup&lt;/strong&gt;&amp;nbsp;and&amp;nbsp;&lt;strong&gt;namespace&lt;/strong&gt;&amp;nbsp;names. You will use these values when launching this blog’s AWS CloudFormation template.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Deploy the solution&lt;/h2&gt; 
&lt;p&gt;You can deploy the solution by launching the CloudFormation stack from the following GitHub repository:&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/aws-samples/sample-proactive-monitoring-for-amazon-redshift-serverless-using-aws-lambda-and-slack-alerts" target="_blank" rel="noopener noreferrer"&gt;GitHub Repo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;When launching the CloudFormation stack, complete the following steps in the AWS CloudFormation Console:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;For&amp;nbsp;&lt;strong&gt;Stack name&lt;/strong&gt;, enter a descriptive name such as&amp;nbsp;redshift-serverless-monitoring.&lt;/li&gt; 
 &lt;li&gt;Review and modify the parameters as needed for your environment.&lt;/li&gt; 
 &lt;li&gt;Acknowledge that AWS CloudFormation may create IAM resources with custom names.&lt;/li&gt; 
 &lt;li&gt;Choose&amp;nbsp;&lt;strong&gt;Submit&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;CloudFormation parameters&lt;/h2&gt; 
&lt;h3&gt;Amazon Redshift Serverless Workgroup configuration&lt;/h3&gt; 
&lt;p&gt;Provide details for your existing Amazon Redshift Serverless environment. These values connect the monitoring solution to your Redshift environment. Some parameters come with default values that you can replace with your actual configuration.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Parameter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Default value&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Description&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Amazon Redshift Workgroup Name&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Your Amazon Redshift Serverless workgroup name.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Amazon Redshift Namespace Name&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Your Amazon Redshift Serverless namespace name.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Amazon Redshift Workgroup ID&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Workgroup ID (UUID) of the Amazon Redshift Serverless workgroup to monitor. Must follow the UUID format: &lt;code&gt;xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&lt;/code&gt; (lowercase hexadecimal with hyphens).&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
    &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Amazon Redshift Namespace ID&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Namespace ID (UUID) of the Amazon Redshift Serverless namespace. Must follow the UUID format: &lt;code&gt;xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&lt;/code&gt; (lowercase hexadecimal with hyphens).&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Database Name&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;code&gt;dev&lt;/code&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Target Amazon Redshift database for SQL-based diagnostic and monitoring queries.&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h3&gt;Monitoring schedule&lt;/h3&gt; 
&lt;p&gt;The default schedule runs diagnostic SQL queries every 15 minutes during business hours, balancing responsiveness and cost efficiency. Running more frequently might increase costs, while less frequent monitoring could delay detection of performance issues. You can adjust this schedule to your needs.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Parameter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Default value&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Description&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Schedule Expression&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;cron(0/15 8-17 ? * MON-FRI *)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;EventBridge schedule expression for Lambda function execution. Default runs every 15 minutes, Monday through Friday, 8 AM to 5 PM UTC.&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h3&gt;Threshold configuration&lt;/h3&gt; 
&lt;p&gt;Thresholds should be tuned based on your workload characteristics.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Parameter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Default value&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Description&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Queries Queued Threshold&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;20&lt;/td&gt; 
    &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Alert threshold for queued queries.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Queries Running Threshold&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;20&lt;/td&gt; 
    &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Alert threshold for running queries.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Compute Capacity Threshold (RPUs)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;64&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Alert threshold for compute capacity (RPUs).&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Data Storage Threshold (MB)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;5242880&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Threshold for data storage in MB (default 5 TB).&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
    &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Table Count Threshold&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;1000&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Alerts threshold for total table count.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Database Connections Threshold&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;50&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Alert threshold for database connections.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Slow Query Threshold (seconds)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;10&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Thresholds in seconds for slow query detection.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Query Timeout (Seconds)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;30&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Timeout for SQL diagnostics queries.&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt;&amp;nbsp;Start with conservative thresholds and refine them after observing baseline behavior for one to two weeks.&lt;/p&gt; 
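&lt;p&gt;To establish that baseline, you can look at recent query volume directly from the Amazon Redshift query editor. The following is a minimal illustrative query; it assumes the &lt;code&gt;SYS_QUERY_HISTORY&lt;/code&gt; monitoring view is visible to your user (for example, through the &lt;code&gt;sys:monitor&lt;/code&gt; role granted later in this post), and status values may vary slightly by engine version.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;-- Illustrative baseline check: hourly query counts and failures over the last 7 days.
SELECT DATE_TRUNC('hour', start_time) AS hour_bucket,
       COUNT(*) AS queries_started,
       SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failed_queries
FROM sys_query_history
WHERE start_time &amp;gt; DATEADD(day, -7, GETDATE())
GROUP BY 1
ORDER BY 1 DESC;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 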
&lt;h3&gt;Lambda configuration&lt;/h3&gt; 
&lt;p&gt;Configure the AWS Lambda function settings. The default values are appropriate for most monitoring scenarios; you typically only need to change them when troubleshooting.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Parameter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Default value&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Description&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Lambda Memory Size (MB)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;256&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Lambda function memory size in MB.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Lambda Time Out (Seconds)&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;240&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Lambda function timeout in seconds.&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h3&gt;Security Configuration – Amazon Virtual Private Cloud (VPC)&lt;/h3&gt; 
&lt;p&gt;If your organization has network isolation requirements, you can optionally enable VPC deployment for the Lambda function. When enabled, the Lambda function runs within your specified VPC subnets, providing network isolation and allowing access to VPC-only resources.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Parameter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Default value&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Description&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;VPC ID&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;VPC ID for Lambda deployment (required if &lt;code&gt;EnableVPC&lt;/code&gt; is true). The Lambda function will be deployed in this VPC. Ensure that the VPC has appropriate routing (NAT Gateway or VPC Endpoints) to allow Lambda to access AWS services like CloudWatch, Amazon Redshift, and Amazon SNS.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;VPC Subnet IDs&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Comma-separated list of subnet IDs for Lambda deployment (required if &lt;code&gt;EnableVPC&lt;/code&gt; is true).&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Security Group IDs&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Comma-separated list of security group IDs for Lambda (optional). If not provided and &lt;code&gt;EnableVPC&lt;/code&gt; is true, a default security group will be created with outbound HTTPS access. Custom security groups must allow outbound HTTPS (port 443) to AWS service endpoints.&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;Note that VPC deployment might increase cold start times and requires a NAT Gateway or VPC endpoints for AWS service access. We recommend provisioning interface VPC endpoints (through AWS PrivateLink) for the five services the Lambda function calls, which keeps all traffic private without the recurring cost of a NAT Gateway.&lt;/p&gt; 
&lt;h3&gt;Security configuration – Encryption&lt;/h3&gt; 
&lt;p&gt;If your organization requires encryption of data at rest, you can optionally enable AWS Key Management Service (AWS KMS) encryption for the Lambda function’s environment variables, CloudWatch Logs, and SNS topic. When enabled, the template encrypts each resource using the AWS KMS keys that you provide, either a single shared key for all three services, or individual keys for granular key management and audit separation.&lt;/p&gt; 
&lt;table class="styled-table" border="1px" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Parameter&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Default value&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Description&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Shared KMS Key ARN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;AWS KMS key ARN to use for all encryption (Lambda, Logs, and SNS) unless service-specific keys are provided. This streamlines key management by using a single key for all services. The key policy must grant encrypt/decrypt permissions to Lambda, CloudWatch Logs, and SNS.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Lambda KMS Key ARN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;AWS KMS key ARN for Lambda environment variable encryption (optional, overrides &lt;code&gt;SharedKMSKeyArn&lt;/code&gt;). Use this for separate key management per service. The key policy must grant decrypt permissions to the Lambda execution role. If not provided, &lt;code&gt;SharedKMSKeyArn&lt;/code&gt; will be used when &lt;code&gt;EnableKMSEncryption&lt;/code&gt; is true.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;CloudWatch Logs KMS Key ARN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;AWS KMS key ARN for CloudWatch Logs encryption (optional, overrides &lt;code&gt;SharedKMSKeyArn&lt;/code&gt;). Use this for separate key management per service. The key policy must grant encrypt/decrypt permissions to the CloudWatch Logs service. If not provided, &lt;code&gt;SharedKMSKeyArn&lt;/code&gt; will be used when &lt;code&gt;EnableKMSEncryption&lt;/code&gt; is true.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;SNS Topic KMS Key ARN&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;AWS KMS key ARN for SNS topic encryption (optional, overrides &lt;code&gt;SharedKMSKeyArn&lt;/code&gt;). Use this for separate key management per service. The key policy must grant encrypt/decrypt permissions to SNS service and the Lambda execution role. If not provided, &lt;code&gt;SharedKMSKeyArn&lt;/code&gt; will be used when &lt;code&gt;EnableKMSEncryption&lt;/code&gt; is true.&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Enable Dead Letter Queue&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;False&lt;/td&gt; 
   &lt;td style="padding: 10px;border: 1px solid #dddddd"&gt;Optionally enable Dead Letter Queue (DLQ) for failed Lambda invocations to improve reliability and security monitoring. When enabled, events that fail after all retry attempts will be sent to an SQS queue for investigation and potential replay. This helps prevent data loss, provides visibility into failures, and enables security audit trails for monitoring anomalies. The DLQ retains messages for 14 days.&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;p&gt;Note that AWS KMS encryption requires the key policy to grant appropriate permissions to each consuming service (Lambda, CloudWatch Logs, and SNS).&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;On the review page, select &lt;strong&gt;I acknowledge that AWS CloudFormation might create IAM resources with custom names&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Submit&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Resources created&lt;/h2&gt; 
&lt;p&gt;The CloudFormation stack creates the following resources:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;EventBridge rule for scheduled execution&lt;/li&gt; 
 &lt;li&gt;AWS Lambda function (Python 3.12 runtime)&lt;/li&gt; 
 &lt;li&gt;Amazon SNS topic for alerts&lt;/li&gt; 
 &lt;li&gt;IAM role with permissions for CloudWatch, Amazon Redshift Data API, and SNS&lt;/li&gt; 
 &lt;li&gt;CloudWatch Log Group for Lambda logs&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; CloudFormation deployment typically takes 10–15 minutes to complete. You can monitor progress in real time under the &lt;strong&gt;Events&lt;/strong&gt; tab of your CloudFormation stack.&lt;/p&gt; 
&lt;h2&gt;Post-deployment configuration&lt;/h2&gt; 
&lt;p&gt;After the CloudFormation stack has been successfully created, complete the following steps.&lt;/p&gt; 
&lt;h3&gt;Step 1: Record CloudFormation outputs&lt;/h3&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1&amp;quot; \l &amp;quot;/getting-started" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Select your stack and choose the &lt;strong&gt;Outputs&lt;/strong&gt; tab.&lt;/li&gt; 
 &lt;li&gt;Note the values for &lt;strong&gt;LambdaRoleArn&lt;/strong&gt; and &lt;strong&gt;SNSTopicArn&lt;/strong&gt;. You will need these in subsequent steps.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Step 2: Grant Amazon Redshift permissions&lt;/h3&gt; 
&lt;p&gt;Grant permissions to the Lambda function to query Amazon Redshift system tables for monitoring data. Complete the following steps to grant the necessary access:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the &lt;a href="https://us-east-1.console.aws.amazon.com/redshiftv2/home?region=us-east-1&amp;quot; \l &amp;quot;/landing:" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Query Editor V2&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Connect to your Amazon Redshift Serverless workgroup.&lt;/li&gt; 
 &lt;li&gt;Execute the following SQL commands, replacing &amp;lt;IAM role name&amp;gt; with the role name from the &lt;strong&gt;LambdaRoleArn&lt;/strong&gt; value in your CloudFormation outputs (the portion after the final &lt;code&gt;/&lt;/code&gt;):&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;CREATE USER "IAMR:&amp;lt;IAM Lambda Role&amp;gt;" WITH PASSWORD DISABLE;

GRANT ROLE "sys:monitor" TO "IAMR:&amp;lt;IAM Role&amp;gt;";&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/Picture2-7.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89088" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/Picture2-7.png" alt="RedshiftSQL-DBD-5612" width="1276" height="468"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;These commands create an Amazon Redshift database user mapped to the Lambda IAM role and grant it the &lt;code&gt;sys:monitor&lt;/code&gt; Amazon Redshift role. This role provides read-only access to catalog and system tables without granting permissions to user data tables.&lt;/p&gt; 
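&lt;p&gt;Optionally, you can confirm the grant before moving on. The following is a hedged example that assumes the &lt;code&gt;SVV_USER_GRANTS&lt;/code&gt; system view is available in your workgroup; it lists the users that hold the &lt;code&gt;sys:monitor&lt;/code&gt; role.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;-- Verify that the IAM-mapped user received the monitoring role.
SELECT user_name, role_name
FROM svv_user_grants
WHERE role_name = 'sys:monitor';&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 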
&lt;h3&gt;Step 3: Configure Slack notifications&lt;/h3&gt; 
&lt;p&gt;Amazon Q Developer in chat applications provides native AWS integration and managed authentication, eliminating the need for custom webhook code and reducing setup complexity. To receive alerts in Slack, configure Amazon Q Developer in chat applications to connect your SNS topic to your preferred Slack channel:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to &lt;a href="https://us-east-2.console.aws.amazon.com/chatbot/home?region=us-east-1&amp;quot; \l &amp;quot;/home" target="_blank" rel="noopener noreferrer"&gt;Amazon Q Developer in chat applications&lt;/a&gt; (formerly AWS Chatbot) in the AWS console.&lt;/li&gt; 
 &lt;li&gt;Follow the instructions in the &lt;a href="https://docs.aws.amazon.com/chatbot/latest/adminguide/slack-setup.html" target="_blank" rel="noopener noreferrer"&gt;Slack integration documentation&lt;/a&gt; to authorize AWS access to your Slack workspace.&lt;/li&gt; 
 &lt;li&gt;When configuring the Slack channel, ensure that you select the correct AWS Region where you deployed the CloudFormation stack.&lt;/li&gt; 
 &lt;li&gt;In the &lt;strong&gt;Notifications&lt;/strong&gt; section, select the SNS topic created by your CloudFormation stack (refer to the &lt;strong&gt;SNSTopicArn&lt;/strong&gt; output value).&lt;/li&gt; 
 &lt;li&gt;Keep the default IAM read-only permissions for the channel configuration.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/Picture3-4.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89083" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/Picture3-4.png" alt="SNS topic " width="864" height="254"&gt;&lt;/a&gt;&lt;/p&gt; 
&lt;p&gt;Once configured, alerts automatically appear in Slack whenever thresholds are exceeded.&lt;/p&gt; 
&lt;h2&gt;&lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/Untitled-design.png"&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89086" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/Untitled-design.png" alt="result-upon-success" width="864" height="682"&gt;&lt;/a&gt;&lt;/h2&gt; 
&lt;h2&gt;&lt;strong&gt;Cost considerations&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;With the default configuration, this solution incurs minimal ongoing costs. The Lambda function executes approximately 693 times per month (every 15 minutes during an 8-hour business day, Monday through Friday), resulting in a monthly cost of approximately $0.33 USD. This includes Lambda compute costs ($0.26) and CloudWatch &lt;code&gt;GetMetricData&lt;/code&gt; API calls ($0.07). All other services (EventBridge, SNS, CloudWatch Logs, and the Amazon Redshift Data API) add little or no cost at this volume. The Amazon Redshift Data API itself has no additional charges beyond the minimal Amazon Redshift Serverless RPU consumption for the system table queries it runs. You can reduce costs by decreasing the monitoring frequency (for example, every 30 minutes) or increase responsiveness by running more frequently (for example, every 5 minutes) with a proportional cost increase.&lt;/p&gt; 
&lt;p&gt;All costs are estimates and may vary based on your environment. Variations often occur because queries scanning system tables may take longer or require additional resources depending on system complexity.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Security best practices&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;This solution implements the following security controls:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;IAM policies scoped to specific resource ARNs for the Amazon Redshift workgroup, namespace, SNS topic, and log group.&lt;/li&gt; 
 &lt;li&gt;Data API statement access restricted to the Lambda function’s own IAM user ID.&lt;/li&gt; 
 &lt;li&gt;Read-only &lt;code&gt;sys:monitor&lt;/code&gt; database role for operational metadata access. Limit to the role created by the CloudFormation template.&lt;/li&gt; 
 &lt;li&gt;Reserved concurrent executions capped at five.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;To further strengthen your security posture, consider the following enhancements:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Enable &lt;code&gt;EnableKMSEncryption&lt;/code&gt; to encrypt environment variables, logs, and SNS messages at rest.&lt;/li&gt; 
 &lt;li&gt;Enable &lt;code&gt;EnableVPC&lt;/code&gt; to deploy the function within a VPC for network isolation.&lt;/li&gt; 
 &lt;li&gt;Audit access through AWS CloudTrail.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Important: This is sample code for non-production usage. Work with your security and legal teams to meet your organizational security, regulatory, and compliance requirements before deployment. This solution demonstrates monitoring capabilities but requires additional security hardening for production environments, including encryption configuration, IAM policy scoping, VPC deployment, and comprehensive testing.&lt;/p&gt; 
&lt;h2&gt;Clean up&lt;/h2&gt; 
&lt;p&gt;If you no longer want to use the solution, complete the following steps to remove all resources and avoid ongoing charges:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-delete-stack.html" target="_blank" rel="noopener noreferrer"&gt;Delete the CloudFormation stack&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Remove the Slack integration from Amazon Q Developer in chat applications.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Troubleshooting&lt;/h2&gt; 
&lt;ul&gt; 
 &lt;li&gt;If no metrics are returned or SQL diagnostics are incomplete, verify that the Amazon Redshift Serverless workgroup is active with recent query activity, and ensure the database user has the &lt;code&gt;sys:monitor&lt;/code&gt; role (&lt;code&gt;GRANT ROLE sys:monitor TO &amp;lt;user&amp;gt;&lt;/code&gt;) in the query editor. Without this role, queries execute successfully but only return data visible to that user’s permissions rather than activity across the full workgroup.&lt;/li&gt; 
 &lt;li&gt;For VPC-deployed functions that fail to reach AWS services, confirm that VPC endpoints or a NAT Gateway are configured for CloudWatch, Amazon Redshift Data API, Amazon Redshift Serverless, SNS, and CloudWatch Logs.&lt;/li&gt; 
 &lt;li&gt;If the Lambda function times out, increase the &lt;code&gt;LambdaTimeout&lt;/code&gt; and &lt;code&gt;QueryTimeoutSeconds&lt;/code&gt; parameters. The default timeout of 240 seconds accommodates most workloads, but clusters with many active queries may require additional time for SQL diagnostics to complete. To investigate slow queries directly, see the example query after this list.&lt;/li&gt; 
&lt;/ul&gt; 
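&lt;p&gt;If you need to dig into slow query alerts manually, a query along the following lines can help. This is an illustrative sketch that assumes &lt;code&gt;SYS_QUERY_HISTORY&lt;/code&gt; reports &lt;code&gt;elapsed_time&lt;/code&gt; in microseconds; adjust the time window and threshold to match your configuration.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;-- Illustrative: queries from the last hour that exceeded the default 10-second slow query threshold.
SELECT query_id,
       status,
       ROUND(elapsed_time / 1000000.0, 1) AS elapsed_seconds,
       LEFT(query_text, 80) AS query_snippet
FROM sys_query_history
WHERE start_time &amp;gt; DATEADD(hour, -1, GETDATE())
  AND elapsed_time &amp;gt; 10 * 1000000
ORDER BY elapsed_time DESC
LIMIT 20;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 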
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post, we showed how you can build a proactive monitoring solution for Amazon Redshift Serverless using AWS Lambda, Amazon CloudWatch, and Amazon SNS with Slack integration. By automatically collecting metrics, evaluating thresholds, and delivering alerts in near real time to Slack or your preferred collaborative platform, this solution helps detect performance and cost issues early. Because the solution itself is serverless, it aligns with the operational simplicity goals of Amazon Redshift Serverless—scaling automatically, requiring minimal maintenance, and delivering high value at low cost. You can extend this foundation with additional metrics, diagnostic logic, or alternative notification channels to meet your organization’s needs.&lt;/p&gt; 
&lt;p&gt;To learn more, see the Amazon Redshift documentation on monitoring and performance optimization.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-89079 size-thumbnail" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/headshot1-100x133.png" alt="Headhost author 1" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Cristian Restrepo Lopez&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/cristianrestrepolopez/"&gt;Cristian&lt;/a&gt; is a Solutions Architect at AWS, helping customers build modern data applications with a focus on analytics. Outside of work, he enjoys exploring emerging technologies and connecting with the data community.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-89089" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/18/headshot2-1.jpg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Satesh Sonti&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/satish-kumar-sonti/"&gt;Satesh&lt;/a&gt; is a Principal Analytics Specialist Solutions Architect based out of Atlanta, specializing in building enterprise data platforms, data warehousing, and analytics solutions. He has over 19 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Modernize business intelligence workloads using Amazon Quick</title>
		<link>https://aws.amazon.com/blogs/big-data/modernize-business-intelligence-workloads-using-amazon-quick/</link>
					
		
		<dc:creator><![CDATA[Satesh Sonti]]></dc:creator>
		<pubDate>Mon, 06 Apr 2026 17:56:35 +0000</pubDate>
				<category><![CDATA[Amazon Athena]]></category>
		<category><![CDATA[Amazon Quick Suite]]></category>
		<category><![CDATA[Amazon Redshift]]></category>
		<category><![CDATA[Generative BI]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">70927b97fee08a1a07e5c01da8e18b507bcefe19</guid>

					<description>In this post, we provide implementation guidance for building integrated analytics solutions that combine the generative BI features of Amazon Quick with Amazon Redshift and Amazon Athena SQL analytics capabilities.</description>
										<content:encoded>&lt;p&gt;Traditional business intelligence (BI) integration with enterprise data warehouses has been the established pattern for years. With generative AI, you can now modernize BI workloads with capabilities like interactive chat agents, automated business processes, and using natural language to generate dashboards.&lt;/p&gt; 
&lt;p&gt;In this post, we provide implementation guidance for building integrated analytics solutions that combine the generative BI features of &lt;a href="https://docs.aws.amazon.com/quicksuite/latest/userguide/what-is.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Quick&lt;/a&gt; with &lt;a href="https://aws.amazon.com/redshift/" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift&lt;/a&gt; and &lt;a href="https://aws.amazon.com/athena/" target="_blank" rel="noopener noreferrer"&gt;Amazon Athena&lt;/a&gt; SQL analytics capabilities. Use this post as a reference for proof-of-concept implementations, production deployment planning, or as a learning resource for understanding Quick integration patterns with Amazon Redshift and Athena.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Common use cases&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;You can use this integrated approach across several scenarios. The following are some of the most common use cases.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Traditional BI reporting benefits from bundled data warehouse and BI tool pricing, making generative BI the primary use case with significant cost advantages. 
  &lt;ul&gt; 
   &lt;li&gt;&lt;em&gt;Insurance:&lt;/em&gt; Automates Solvency II and IFRS 17 regulatory reporting, replacing manual spreadsheet consolidation.&lt;/li&gt; 
   &lt;li&gt;&lt;em&gt;Banking:&lt;/em&gt; Accelerates FDIC call report generation and capital adequacy dashboards, cutting month-end close from days to hours.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;Interactive dashboards with contextual chat agents give BI teams conversational interfaces alongside their visual metrics. 
  &lt;ul&gt; 
   &lt;li&gt;&lt;em&gt;Gaming:&lt;/em&gt; Live ops teams query player retention and monetization KPIs in plain English—no SQL needed.&lt;/li&gt; 
   &lt;li&gt;&lt;em&gt;Financial Services:&lt;/em&gt; Trading analysts chat with real-time P&amp;amp;L dashboards to surface anomalies and drill into positions on demand.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;Domain-specific analytics workspaces democratize enterprise data exploration through Quick Spaces and natural language queries. 
  &lt;ul&gt; 
   &lt;li&gt;&lt;em&gt;Insurance:&lt;/em&gt; Actuarial and underwriting teams query claims and risk data without waiting on data engineering.&lt;/li&gt; 
   &lt;li&gt;&lt;em&gt;Banking:&lt;/em&gt; Risk and compliance teams explore credit, market, and operational data through a single natural language interface.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;Workflow automation removes repetitive tasks and accelerates self-service analytics. 
  &lt;ul&gt; 
   &lt;li&gt;&lt;em&gt;Financial Services:&lt;/em&gt; Automated AR reconciliation flows replace manual ledger matching, shrinking close cycle effort significantly.&lt;/li&gt; 
   &lt;li&gt;&lt;em&gt;Gaming:&lt;/em&gt; Telemetry ingestion pipelines trigger reporting refreshes automatically, freeing data engineers from routine work.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Let us examine an end-to-end solution combining these technologies.&lt;/p&gt; 
&lt;h2&gt;&lt;strong&gt;Solution flow&lt;/strong&gt;&lt;/h2&gt; 
&lt;p&gt;AWS offers two native SQL analytics engines for building analytics workloads. &lt;a href="https://aws.amazon.com/redshift/" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift&lt;/a&gt; provides a fully managed data warehouse with columnar storage and massively parallel processing. &lt;a href="https://aws.amazon.com/athena/" target="_blank" rel="noopener noreferrer"&gt;Amazon Athena&lt;/a&gt; delivers serverless interactive query capabilities directly against data in Amazon S3.&lt;/p&gt; 
&lt;p&gt;You can use either Amazon Redshift or Amazon Athena as a SQL engine while implementing the steps in this post. The following are the steps involved in building an end-to-end solution.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignleft wp-image-89725 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-1.png" alt="Solution steps to integrate SQL Analytics engines with Amazon Quick" width="1354" height="329"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 1: Solution steps to integrate SQL analytics engines with Amazon Quick&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Set up your SQL analytics engines: Amazon Redshift or Amazon Athena.&lt;/li&gt; 
 &lt;li&gt;Load data and create business views designed for analytics workloads.&lt;/li&gt; 
 &lt;li&gt;Configure integration between SQL analytics engines and Amazon Quick.&lt;/li&gt; 
 &lt;li&gt;Create data sources in Amazon Quick.&lt;/li&gt; 
 &lt;li&gt;Create datasets and dashboards for visual analytics.&lt;/li&gt; 
 &lt;li&gt;Use Topics and Spaces to provide natural language interfaces to your data.&lt;/li&gt; 
 &lt;li&gt;Deploy chat agents to deliver conversational AI experiences for business users.&lt;/li&gt; 
 &lt;li&gt;Implement business flows to automate repetitive workflows and processes.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Let’s start by walking through steps 1–4 for Amazon Redshift. We then describe the same four steps for Amazon Athena before explaining the Amazon Quick steps 5–8.&lt;/p&gt; 
&lt;h2&gt;Configure and create datasets in Amazon Redshift&lt;/h2&gt; 
&lt;p&gt;Amazon Redshift offers two deployment options to meet your data warehousing needs. Provisioned clusters provide traditional deployment where you manage compute resources by selecting node types and cluster size. Serverless automatically scales compute capacity based on workload demands with pay-per-use pricing. Both options are supported by Amazon Quick. For this walkthrough, we use Redshift Serverless.&lt;/p&gt; 
&lt;h3&gt;Set up SQL analytics engine&lt;/h3&gt; 
&lt;p&gt;To create a Redshift Serverless namespace and workgroup:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the &lt;a href="https://console.aws.amazon.com/redshift/" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;On the left navigation pane, select &lt;strong&gt;Redshift Serverless.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Follow the steps described in the &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-console-workgroups-create-workgroup-wizard.html" target="_blank" rel="noopener noreferrer"&gt;Creating a workgroup with a namespace&lt;/a&gt; documentation page to create a workgroup and a namespace. Note the username and password provided. You will use these details for configuring connections in Amazon Redshift and Quick.&lt;/li&gt; 
 &lt;li&gt;You should see the status as &lt;strong&gt;Available &lt;/strong&gt;for both the workgroup and namespace in the Serverless dashboard.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignleft wp-image-89726 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-2.png" alt="Amazon Redshift Serverless Workgroup and Namespaces" width="1719" height="446"&gt;&lt;/p&gt; 
&lt;p&gt;Figure 2: Amazon Redshift Serverless Workgroup and Namespaces&lt;/p&gt; 
&lt;p&gt;The deployment will be completed in approximately 3–5 minutes.&lt;/p&gt; 
&lt;h3&gt;Load data and create business views&lt;/h3&gt; 
&lt;p&gt;Now you can load data using the industry-standard TPC-H benchmark dataset, which provides realistic customer, order, and product data for analytics workloads. To load data into Amazon Redshift:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2-getting-started.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift Query Editor V2&lt;/a&gt; from the console.&lt;/li&gt; 
 &lt;li&gt;Run the &lt;a href="https://github.com/awslabs/amazon-redshift-utils/blob/master/src/CloudDataWarehouseBenchmark/Cloud-DWB-Derived-from-TPCH/100GB/ddl.sql" target="_blank" rel="noopener noreferrer"&gt;TPC H DDL statements&lt;/a&gt; to create TPC-H tables.&lt;/li&gt; 
 &lt;li&gt;Run the following COPY commands to load data from the public S3 bucket: &lt;code&gt;s3://redshift-downloads/TPC-H/&lt;/code&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Ensure that the IAM role attached to the namespace is set as the default IAM role. If you didn’t set up the default IAM role at the time of namespace creation, refer to the &lt;em&gt;Creating an IAM role as default for Amazon Redshift&lt;/em&gt; documentation page to set it now.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;copy customer from 's3://redshift-downloads/TPC-H/2.18/100GB/customer/' iam_role default delimiter '|' region 'us-east-1'; 

copy orders from 's3://redshift-downloads/TPC-H/2.18/100GB/orders/' iam_role default delimiter '|' region 'us-east-1'; 

copy lineitem from 's3://redshift-downloads/TPC-H/2.18/100GB/lineitem/' iam_role default delimiter '|' region 'us-east-1'; &lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Run the following query to validate load status. The &lt;strong&gt;status&lt;/strong&gt; column should show as completed. You can also review the information in other columns to see details about the loads such as record counts, duration, and data source.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select * from &amp;nbsp;SYS_LOAD_HISTORY &amp;nbsp;Where table_name in ('customer','orders','lineitem');&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89727" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-3.png" alt="Output of SYS_LOAD_HISTORY showing successful completion of COPY Jobs" width="1511" height="125"&gt;&lt;br&gt; Figure 3: Output of SYS_LOAD_HISTORY showing successful completion of COPY Jobs&lt;/p&gt; 
&lt;ol start="4"&gt; 
 &lt;li&gt;Create a materialized view to improve query performance:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Run the following SQL to create a materialized view that pre-computes the result set for customer revenue and order volumes by market segment.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;CREATE MATERIALIZED VIEW mv_customer_revenue AS 
SELECT 
c.c_custkey, 
c.c_name, 
c.c_mktsegment, 
SUM(l.l_extendedprice * (1 - l.l_discount)) as total_revenue, 
COUNT(DISTINCT o.o_orderkey) as order_count 
FROM customer c 
JOIN orders o ON c.c_custkey = o.o_custkey
JOIN lineitem l ON o.o_orderkey = l.l_orderkey
GROUP BY c.c_custkey, c.c_name, c.c_mktsegment;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;Run the following SQL to review the data in the materialized view.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;select * from mv_customer_revenue limit 10;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
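&lt;p&gt;Because the materialized view is a point-in-time result set, you may want to refresh it after new loads so downstream dashboards reflect current data. The following is the standard refresh statement; as an alternative, Amazon Redshift can automatically refresh eligible materialized views if you create them with the auto refresh option.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;-- Recompute the pre-aggregated results after loading new data.
REFRESH MATERIALIZED VIEW mv_customer_revenue;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 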
&lt;h3&gt;Configure integration with Amazon Quick&lt;/h3&gt; 
&lt;p&gt;Amazon Quick automatically discovers the Amazon Redshift provisioned clusters associated with your AWS account. These resources must be in the same AWS Region as your Amazon Quick account. For Amazon Redshift clusters in other accounts or for Amazon Redshift Serverless, we recommend adding a VPC connection by following the steps in the &lt;a href="https://docs.aws.amazon.com/quicksuite/latest/userguide/enabling-access-redshift.html#redshift-vpc-access" target="_blank" rel="noopener noreferrer"&gt;Enabling access to an Amazon Redshift cluster in a VPC&lt;/a&gt; documentation. These steps are usually performed by your organization’s cloud security administration team.&lt;/p&gt; 
&lt;p&gt;For Redshift Serverless, apply the same steps to the workgroup instead of the cluster. You can find the VPC and security group settings on the &lt;strong&gt;Data Access&lt;/strong&gt; tab of the workgroup.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="size-full wp-image-89728 alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-4.png" alt="Amazon Redshift Serverless workgroup VPC and Security groups " width="1603" height="752"&gt;&lt;br&gt; Figure 4: Amazon Redshift Serverless workgroup VPC and Security groups&lt;/p&gt; 
&lt;p&gt;You can also refer to &lt;a href="https://www.youtube.com/watch?v=_3ncNGVttTU" target="_blank" rel="noopener noreferrer"&gt;How do I privately connect Quick to an Amazon Redshift or RDS data source in a private subnet?&lt;/a&gt; for a demonstration.&lt;/p&gt; 
&lt;h3&gt;Create data source&lt;/h3&gt; 
&lt;p&gt;To create a dataset connecting to Amazon Redshift, complete the following steps.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;In the Quick left navigation pane, go to &lt;strong&gt;Datasets.&lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;Choose the &lt;strong&gt;Data sources &lt;/strong&gt;tab and select &lt;strong&gt;Create data source&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select Amazon Redshift and enter the following: 
  &lt;ul&gt; 
   &lt;li&gt;&lt;strong&gt;Data Source Name&lt;/strong&gt;: Provide &lt;code&gt;customer-rev-datasource&lt;/code&gt; as data source name.&lt;/li&gt; 
   &lt;li&gt;&lt;strong&gt;Connection type&lt;/strong&gt;: Select the VPC connection created in the previous step.&lt;/li&gt; 
   &lt;li&gt;&lt;strong&gt;Database server&lt;/strong&gt;: Enter the Amazon Redshift workgroup endpoint (for example, &lt;code&gt;quick-demo-wg.123456789.us-west-2.redshift-serverless.amazonaws.com&lt;/code&gt;).&lt;/li&gt; 
   &lt;li&gt;&lt;strong&gt;Port&lt;/strong&gt;: 5439 (default).&lt;/li&gt; 
   &lt;li&gt;&lt;strong&gt;Database&lt;/strong&gt;: &lt;code&gt;dev&lt;/code&gt;.&lt;/li&gt; 
   &lt;li&gt;&lt;strong&gt;Username/Password&lt;/strong&gt;: Amazon Redshift credentials with access to the database.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;Choose&lt;strong&gt; Validate&lt;/strong&gt; &lt;strong&gt;connection&lt;/strong&gt;. The validation should be successful.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89729" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-5.png" alt="Amazon Redshift data source configuration" width="574" height="607"&gt;&lt;br&gt; Figure 5: Amazon Redshift data source configuration&lt;/p&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create Data Source &lt;/strong&gt;to create a data source.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Now let’s walk through the same four steps to configure Athena with Amazon Quick.&lt;/p&gt; 
&lt;h2&gt;Configure and create datasets in Amazon Athena&lt;/h2&gt; 
&lt;p&gt;Amazon Athena provides immediate query capabilities against petabytes of data with automatic scaling to handle concurrent users. Let’s go through the steps to configure connections between Amazon Quick and Amazon Athena.&lt;/p&gt; 
&lt;h3&gt;Set up SQL analytics engine&lt;/h3&gt; 
&lt;p&gt;To create an Athena workgroup:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the &lt;a href="https://console.aws.amazon.com/athena/" target="_blank" rel="noopener noreferrer"&gt;Amazon Athena console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;In the navigation pane, choose &lt;strong&gt;Workgroups&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create workgroup&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;Workgroup name&lt;/strong&gt;, enter &lt;code&gt;quick-demo&lt;/code&gt;.&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;Query result configuration&lt;/strong&gt;, select &lt;strong&gt;Athena managed&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create workgroup&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Your workgroup is ready immediately for querying data.&lt;/p&gt; 
&lt;h3&gt;Load data and create business views&lt;/h3&gt; 
&lt;p&gt;For Athena, you create tables over the TPC-H benchmark dataset that AWS provides in a public S3 bucket. This approach gives you millions of customer records in pipe-delimited text format without requiring any data loading.&lt;/p&gt; 
&lt;p&gt;To create tables and views in Athena:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the Athena Query Editor from the console.&lt;/li&gt; 
 &lt;li&gt;Create a database for your analytics (create the S3 bucket first if it doesn’t already exist): 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-sql"&gt;CREATE DATABASE IF NOT EXISTS athena_demo_db 
COMMENT 'Analytics database for customer insights' 
LOCATION 's3://my-analytics-data-lake-[account-id]/';&lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;Create an external table pointing to the TPC-H public dataset: 
  &lt;div class="hide-language"&gt; 
   &lt;pre&gt;&lt;code class="lang-sql"&gt;CREATE EXTERNAL TABLE IF NOT EXISTS athena_demo_db.customer_csv ( 
  C_CUSTKEY INT, 
  C_NAME STRING, 
  C_ADDRESS STRING, 
  C_NATIONKEY INT, 
  C_PHONE STRING, 
  C_ACCTBAL DOUBLE, 
  C_MKTSEGMENT STRING, 
  C_COMMENT STRING 
) 

ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '|' 
STORED AS TEXTFILE 
LOCATION 's3://redshift-downloads/TPC-H/2.18/100GB/customer/' &lt;/code&gt;&lt;/pre&gt; 
  &lt;/div&gt; &lt;/li&gt; 
 &lt;li&gt;Create a business-friendly view for analytics:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Run the following SQL to create a view that aggregates customer account balances grouped by market segments.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;CREATE VIEW athena_demo_db.customer_deep_analysis AS 
SELECT 
    c_custkey AS customer_id, 
    c_name AS customer_name, 
    c_mktsegment AS market_segment, 
    c_nationkey, 
    ROUND(c_acctbal, 2) AS account_balance, 
    CASE 
        WHEN c_acctbal &amp;lt; 0    THEN 'At-Risk' 
        WHEN c_acctbal &amp;lt; 2500 THEN 'Low' 
        WHEN c_acctbal &amp;lt; 5000 THEN 'Mid' 
        WHEN c_acctbal &amp;lt; 8000 THEN 'High' 
        ELSE 'Premium' 
    END                                                              
AS balance_tier, 

    ROUND(AVG(c_acctbal) OVER (PARTITION BY c_mktsegment), 2)        AS segment_avg, 
    ROUND(c_acctbal - AVG(c_acctbal) OVER (PARTITION BY c_mktsegment), 2) AS vs_segment_avg, 
    ROUND((c_acctbal - AVG(c_acctbal) OVER (PARTITION BY c_mktsegment)) 
          / NULLIF(STDDEV(c_acctbal) OVER (PARTITION BY c_mktsegment), 0), 2) AS segment_z_score, 
    RANK() OVER (PARTITION BY c_mktsegment ORDER BY c_acctbal DESC)  AS rank_in_segment, 
    NTILE(5) OVER (ORDER BY c_acctbal DESC)                          AS global_quintile 

FROM athena_demo_db.customer_csv 
ORDER BY c_acctbal DESC; &lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;Verify your view from Athena with:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT * FROM athena_demo_db.customer_deep_analysis limit 5;&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89730" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-6.png" alt="Output from the SELECT query" width="1583" height="412"&gt;&lt;br&gt; Figure 6: Output from the SELECT query&lt;/p&gt; 
&lt;h3&gt;Configure integration with Amazon Quick&lt;/h3&gt; 
&lt;p&gt;To connect to Amazon Athena in Amazon Quick, follow these steps, consolidated from &lt;a href="https://docs.aws.amazon.com/quicksuite/latest/userguide/create-a-data-set-athena.html" target="_blank" rel="noopener noreferrer"&gt;official AWS documentation&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/quicksuite/latest/userguide/athena.html" target="_blank" rel="noopener noreferrer"&gt;authorizing connections to Amazon Athena&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;Authorize Quick to access Athena, the S3 bucket that holds your data, and the S3 bucket for Athena query results.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Open the Amazon Quick Security Settings&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Sign in to the &lt;a href="https://quicksight.aws.amazon.com/" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon Quick console&lt;/strong&gt;&lt;/a&gt; as an administrator.&lt;/li&gt; 
 &lt;li&gt;In the top-right corner, choose your profile icon, then select &lt;strong&gt;Manage account&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Under &lt;strong&gt;Permissions&lt;/strong&gt;, choose &lt;strong&gt;AWS resources&lt;/strong&gt;.&lt;img loading="lazy" class="alignnone size-full wp-image-89731" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-7.png" alt="AWS resource permissions" width="1778" height="685"&gt;&lt;br&gt; Figure 7: AWS resource permissions&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Enable Athena Access&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Under &lt;strong&gt;Quick access to AWS services&lt;/strong&gt;, choose &lt;strong&gt;Manage&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Locate &lt;strong&gt;Amazon Athena&lt;/strong&gt; in the list of AWS services.&lt;/li&gt; 
 &lt;li&gt;If Athena is already selected but access issues persist, clear the checkbox and re-select it to re-enable Athena.&lt;/li&gt; 
 &lt;li&gt;Under &lt;strong&gt;Amazon S3&lt;/strong&gt;, select &lt;strong&gt;S3 buckets&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Check the boxes next to each S3 bucket that Amazon Quick needs to access—including buckets used for Athena query results and any Redshift COPY source buckets.&lt;/li&gt; 
 &lt;li&gt;Enable &lt;strong&gt;Write permission for Athena Workgroup&lt;/strong&gt; to allow Amazon Quick to write Athena query results to S3 and choose &lt;strong&gt;Finish&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Save&lt;/strong&gt; to update the configuration.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The final step is to grant your Amazon Quick author permissions to query your database, Athena tables, and views. Configuration depends on whether AWS Lake Formation is enabled.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;If AWS Lake Formation is not enabled&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Permissions are managed at the Quick service role level through standard IAM-based S3 access control. Ensure that the Quick service role (for example, aws-quick-service-role-v0) has the appropriate IAM permissions for the relevant S3 buckets and Athena resources. No additional Lake Formation configuration is required.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;If AWS Lake Formation is enabled&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Lake Formation acts as the central authorization layer, overriding standard IAM-based S3 permissions. Grant permissions directly to the Amazon Quick author or IAM role.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To grant data permissions:&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Open the &lt;a href="https://console.aws.amazon.com/lakeformation/" target="_blank" rel="noopener noreferrer"&gt;AWS Lake Formation&lt;/a&gt; console.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Permissions&lt;/strong&gt;, then &lt;strong&gt;Data permissions&lt;/strong&gt;, then &lt;strong&gt;Grant&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select the IAM user or role.&lt;/li&gt; 
 &lt;li&gt;Choose the required databases, tables, and columns.&lt;/li&gt; 
 &lt;li&gt;Grant &lt;strong&gt;SELECT&lt;/strong&gt; at minimum; add &lt;strong&gt;DESCRIBE&lt;/strong&gt; for dataset creation.&lt;/li&gt; 
 &lt;li&gt;Repeat for each user or role that requires access.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Create data source&lt;/h3&gt; 
&lt;p&gt;Follow these steps to create an Athena data source on Amazon Quick.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;In the &lt;strong&gt;Amazon Quick&lt;/strong&gt; console, navigate to &lt;strong&gt;Datasets&lt;/strong&gt; and choose &lt;strong&gt;Data sources &lt;/strong&gt;tab.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create data source&lt;/strong&gt;, then select the &lt;strong&gt;Amazon Athena&lt;/strong&gt; card.&lt;/li&gt; 
 &lt;li&gt;Enter a &lt;strong&gt;Data source name&lt;/strong&gt; (any name of your choice), select your &lt;strong&gt;Athena workgroup&lt;/strong&gt; (for example, &lt;code&gt;quick-demo&lt;/code&gt;), and choose &lt;strong&gt;Validate connection&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89732" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-8.png" alt="Athena data source creation" width="583" height="291"&gt;&lt;br&gt; Figure 8: Athena data source creation&lt;/p&gt; 
&lt;ol start="4"&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create data source&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Your Athena data source is now available for building datasets, dashboards, and Topics.&lt;/p&gt; 
&lt;h2&gt;Use Amazon Quick generative AI features&lt;/h2&gt; 
&lt;p&gt;Steps 5–8 demonstrate Amazon Quick generative AI capabilities using Amazon Redshift as the data source. While we use Amazon Redshift in this example, you can substitute Amazon Athena based on your specific requirements.&lt;/p&gt; 
&lt;h3&gt;Create dashboards&lt;/h3&gt; 
&lt;p&gt;Let’s start by creating datasets from the Amazon Redshift data source.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;In the left navigation pane, choose &lt;strong&gt;Datasets&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;On the Datasets page, choose &lt;strong&gt;Create Dataset&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;For the data source, select Amazon Redshift data source &lt;code&gt;customer-rev-datasource&lt;/code&gt;&lt;strong&gt;. &lt;/strong&gt;&lt;/li&gt; 
 &lt;li&gt;From the menu, choose &lt;code&gt;mv_customer_revenue&lt;/code&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89733" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-9.png" alt="Select table to visualize" width="581" height="468"&gt;&lt;br&gt; Figure 9: Select table to visualize&lt;/p&gt; 
&lt;ol start="5"&gt; 
 &lt;li&gt;You can choose one of the following query modes. For this post, select the &lt;strong&gt;Directly query your data&lt;/strong&gt; option and choose &lt;strong&gt;Visualize&lt;/strong&gt;. 
  &lt;ul&gt; 
   &lt;li&gt;&lt;strong&gt;Import to SPICE for quicker analytics&lt;/strong&gt; – Quick loads a snapshot into its in-memory engine for faster dashboard performance.&lt;/li&gt; 
    &lt;li&gt;&lt;strong&gt;Directly query your data&lt;/strong&gt; – Quick runs queries on demand against your query engine.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
 &lt;li&gt;Select &lt;strong&gt;Build&lt;/strong&gt; icon to open a chat window. Enter “Show me orders by market segments” as the prompt. Note that you need &lt;a href="https://docs.aws.amazon.com/quick/latest/userguide/generative-bi-get-started.html" target="_blank" rel="noopener noreferrer"&gt;Author Pro&lt;/a&gt; access to use this feature.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89734" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-10.png" alt="Build visualization using generative BI feature" width="1907" height="615"&gt;&lt;br&gt; Figure 10: Build visualization using generative BI feature&lt;/p&gt; 
&lt;ol start="7"&gt; 
 &lt;li&gt;You can change the visual type to a pie chart and add it to the analysis.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89735" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/BDB-4727-11.gif" alt="Change visual type" width="420" height="598"&gt;&lt;br&gt; Figure 10: Change visual type&lt;/p&gt; 
&lt;p&gt;To publish your analysis as a dashboard:&lt;/p&gt; 
&lt;ol start="8"&gt; 
 &lt;li&gt;After you add the visuals, choose &lt;strong&gt;Publish&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a name for the dashboard. For this post, use the &lt;strong&gt;&lt;a href="https://us-west-2.quicksight.aws.amazon.com/sn/dashboards/b7bb989a-c5ef-4ded-bb61-6c6cfcb7c186" target="_blank" rel="noopener noreferrer"&gt;Market Segment Dashboard&lt;/a&gt;&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Publish dashboard&lt;/strong&gt;. Your dashboard is now available for viewing and sharing.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;Create topics and spaces&lt;/h3&gt; 
&lt;p&gt;To get the most from enterprise data with AI, you must provide the right structure and context. That’s where&amp;nbsp;&lt;strong&gt;Topics&lt;/strong&gt;&amp;nbsp;and&amp;nbsp;&lt;strong&gt;Spaces&lt;/strong&gt;&amp;nbsp;come in. Topics act as natural language interfaces to your structured datasets, automatically analyzing your data, mapping fields, and adding synonyms. Business users can ask&amp;nbsp;&lt;em&gt;“What are total revenues by market segment?”&lt;/em&gt;&amp;nbsp;and receive instant, visualized answers without writing a single line of SQL. Spaces bring together all of your related assets into a single collaborative workspace that democratizes data access, reduces context-switching, and accelerates team onboarding, so everyone works from the same trusted, AI-ready data sources.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To create a Quick topic&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;From the Amazon Quick homepage, choose &lt;strong&gt;Topics&lt;/strong&gt;, then choose &lt;strong&gt;Create topic&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a name for your topic. For this post, use &lt;strong&gt;Customer Revenue Analytics&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a description. For example:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;em&gt;The Customer Revenue Analytics topic is designed for business users (including analysts, sales operations teams, finance, and market segment owners) who need to explore customer and revenue data without SQL expertise. It serves as a natural language interface over the mv_customer_revenue Amazon Redshift dataset, allowing users to ask plain-English questions like “What are total revenues by market segment?” and receive instant, visualized answers. By automatically mapping business language to the underlying schema, it democratizes access to revenue insights across the organization.&lt;/em&gt;&lt;/p&gt; 
&lt;ol start="4"&gt; 
 &lt;li&gt;Under &lt;strong&gt;Dataset&lt;/strong&gt;, select &lt;code&gt;mv_customer_revenue&lt;/code&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create&lt;/strong&gt;. The topic can take 15–30 minutes to enable, depending on the data volume. During this time, Amazon Quick automatically analyzes your data, selects relevant fields, and adds synonyms.&lt;/li&gt; 
 &lt;li&gt;After the topic is enabled, take a few minutes to review and enrich it. The following are some example enrichments. 
  &lt;ol type="a"&gt; 
   &lt;li&gt;Add column descriptions to clarify field meaning for business users.&lt;/li&gt; 
   &lt;li&gt;Define preferred aggregations (for example, sum compared to average for revenue fields).&lt;/li&gt; 
   &lt;li&gt;Confirm which fields are &lt;strong&gt;Dimensions&lt;/strong&gt; and which are &lt;strong&gt;Measures&lt;/strong&gt;.&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;(Optional) To further refine how your topic interprets and responds to queries, add multiple datasets (for example, a customer CSV combined with a database view), custom instructions, filters, and calculated fields.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;After your topic is created, its columns are available to add to a Space or to an Agent by selecting it as a data source.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full" src="https://d2908q01vomqb2.cloudfront.net/artifacts/DBSBlogs/BDB-5727/BDB-4727-12.gif" width="1920" height="888"&gt;&lt;br&gt; Figure 11: Create a Quick Topic&lt;/p&gt; 
&lt;h3&gt;Create a Space for your team&lt;/h3&gt; 
&lt;p&gt;Spaces bring together dashboards, topics, datasets, documents, and other resources into organized, collaborative workspaces. By centralizing related assets in a single workspace, Spaces reduce context-switching and accelerate onboarding, so everyone works from the same trusted data sources.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;What to include in your Quick Space&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Dashboard&lt;/strong&gt; – Add the dashboard &lt;strong&gt;Market Segment Dashboard&lt;/strong&gt; published from your &lt;code&gt;mv_customer_revenue&lt;/code&gt; analysis. This gives team members instant access to visualizations such as revenue by market segment, top customers by order volume, and revenue distribution.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Topic&lt;/strong&gt; – Connect the &lt;a href="https://us-west-2.quicksight.aws.amazon.com/sn/topics/EDyPgicLVwCh1ZAUjrb6RJM8dNtA4yyQ/summary" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Customer Revenue Analytics&lt;/strong&gt;&lt;/a&gt; (built on the &lt;code&gt;mv_customer_revenue&lt;/code&gt; materialized view) to enable natural language queries directly against your Amazon Redshift data.&lt;/li&gt; 
 &lt;li&gt;Optionally, you can upload supporting context to ground your team’s analysis: 
  &lt;ul&gt; 
   &lt;li&gt;Data dictionary or field definitions for &lt;code&gt;mv_customer_revenue&lt;/code&gt;&lt;/li&gt; 
   &lt;li&gt;Market segment definitions (AUTOMOBILE, BUILDING, FURNITURE, MACHINERY, HOUSEHOLD)&lt;/li&gt; 
   &lt;li&gt;Business rules for revenue calculation (for example, how discounts are applied in the TPC-H model)&lt;/li&gt; 
   &lt;li&gt;This implementation guide, so new team members can onboard quickly&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;To create the Quick Space&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;From the left navigation menu, choose &lt;strong&gt;Spaces&lt;/strong&gt;, then choose &lt;strong&gt;Create space&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a name, for example, &lt;strong&gt;Customer Revenue &amp;amp; Segmentation&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a description. For example:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;em&gt;Centralized workspace for customer revenue analysis powered by Amazon Redshift. Includes interactive dashboards, natural language query access to customer and segment data, and supporting documentation for the TPC-H revenue model.&lt;/em&gt;&lt;/p&gt; 
&lt;ol start="4"&gt; 
 &lt;li&gt;Add knowledge by connecting the &lt;strong&gt;Market Segment Dashboard&lt;/strong&gt; and the &lt;strong&gt;Customer Revenue Analytics&lt;/strong&gt; topic.&lt;/li&gt; 
 &lt;li&gt;You can invite team members, such as finance, sales operations, and segment owners, and set appropriate permissions.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Your Space is now ready for collaborative data exploration.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full" src="https://d2908q01vomqb2.cloudfront.net/artifacts/DBSBlogs/BDB-5727/BDB-4727-13.gif" width="1920" height="888"&gt;&lt;br&gt; Figure 12: Create a Quick Space&lt;/p&gt; 
&lt;h3&gt;Build chat agents&lt;/h3&gt; 
&lt;p&gt;A custom chat agent delivers conversational AI experiences that understand business context and provide intelligent, grounded responses to user queries. These agents go beyond question-and-answer interactions. They synthesize knowledge from your dashboards, topics, datasets, and documents to explain trends, surface anomalies, guide users through complex analytics workflows, and recommend next steps.&lt;/p&gt; 
&lt;p&gt;Rather than requiring users to navigate multiple tools or write SQL queries, agents serve as a single conversational interface to your entire analytics environment. Agents can also connect to &lt;strong&gt;Actions&lt;/strong&gt;, pre-built integrations with enterprise tools such as Slack, Microsoft Teams, Outlook, and SharePoint, enabling them to answer questions, trigger real-world workflows, send notifications, create tasks, and interact with external systems directly from the conversation.&lt;/p&gt; 
&lt;p&gt;Custom agents can be tailored to specific business domains, teams, or use cases so that responses align with organizational terminology, data definitions, and business processes. Once created, agents can be shared across teams, enabling consistent, actionable, AI-powered data access at scale. For teams working with the &lt;code&gt;mv_customer_revenue&lt;/code&gt; dataset, we recommend creating a dedicated Customer Revenue Analysis Agent: a purpose-built conversational assistant grounded in your Amazon Redshift data, dashboards, and the Customer Revenue &amp;amp; Segmentation Space.&lt;/p&gt; 
&lt;h3&gt;Create a Quick chat agent&lt;/h3&gt; 
&lt;p&gt;There are two ways to create a Quick agent in Amazon Quick: from the navigation menu or directly from a Space. The following steps walk you through creating one from the navigation menu.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To create a Quick chat agent&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;From the left navigation menu, choose &lt;strong&gt;Agents&lt;/strong&gt;, then choose &lt;strong&gt;Create agent&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a name for your agent, for example, &lt;strong&gt;Customer Revenue Analyst&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a description. For example:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;em&gt;An AI assistant for analyzing customer revenue, market segment performance, and order trends using our Amazon Redshift data warehouse&lt;/em&gt;.&lt;/p&gt; 
&lt;ol start="4"&gt; 
 &lt;li&gt;Under &lt;strong&gt;Knowledge Sources&lt;/strong&gt;, add the &lt;strong&gt;Customer Revenue &amp;amp; Segmentation&lt;/strong&gt; Space as a data source. This gives your agent access to the dashboards, topics, and reference documents you’ve already built.&lt;/li&gt; 
 &lt;li&gt;(Optional) Define custom persona instructions to align the agent’s responses with your business context. For example, specify preferred terminology, response style, or the types of questions it should prioritize.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Launch chat agent&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Start having a conversation with your data. You are welcome to ask any questions. The following are some examples. 
  &lt;ul&gt; 
    &lt;li&gt;&lt;em&gt;Which market segment generated the most revenue?&lt;/em&gt;&lt;/li&gt; 
   &lt;li&gt;&lt;em&gt;Show me order trends&lt;/em&gt;&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full" src="https://d2908q01vomqb2.cloudfront.net/artifacts/DBSBlogs/BDB-5727/BDB-4727-14.gif" width="1920" height="888"&gt;&lt;br&gt; Figure 13: Create a Quick Chat agent&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To share your Quick chat agent&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;After your agent is published, choose &lt;strong&gt;Share&lt;/strong&gt; and invite team members or share it across your organization. Custom agents can be tailored to specific business contexts so that different teams can get AI assistance that speaks their language, without needing to configure anything themselves.&lt;/p&gt; 
&lt;h3&gt;Create Quick Flows&lt;/h3&gt; 
&lt;p&gt;Quick Flows automate repetitive tasks and orchestrate multi-step workflows across your entire analytics environment. This removes manual effort, reduces human error, and ensures consistent execution of critical business processes. Flows can be triggered on a schedule or launched on demand, giving you flexible control over when and how automation runs.&lt;/p&gt; 
&lt;p&gt;You can build flows that span the full analytics lifecycle: monitoring data quality and flagging anomalies, generating and distributing scheduled reports to stakeholders, and triggering downstream actions in integrated systems such as Slack and Outlook. Amazon Quick gives you three ways to create a flow, so whether you prefer a no-code conversation or a visual step-by-step builder, there’s an option that fits how you work.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;To create a flow from chat&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;While conversing with My Assistant or a custom agent, describe the workflow that you want to automate in plain English.&lt;/li&gt; 
 &lt;li&gt;Amazon Quick generates the flow and offers to create it directly from your conversation — no configuration screens required.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;strong&gt;To create a flow from a natural language description&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;From the left navigation menu, choose &lt;strong&gt;Flows&lt;/strong&gt;, then choose &lt;strong&gt;Create flow&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a plain-English description of your workflow. For example:&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;“Query revenue data by market segments. Filter by order count and all dates. Search the web for comparable, relevant market trends. Generate formatted summary reports providing a market summary and look ahead per segment.”&lt;/p&gt; 
&lt;ol start="3"&gt; 
 &lt;li&gt;Amazon Quick automatically generates the complete workflow with all the necessary steps.&lt;/li&gt; 
 &lt;li&gt;Optionally, you can add additional steps.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Run Mode&lt;/strong&gt; to test the Flow.&lt;/li&gt; 
 &lt;li&gt;After your flow is created, share it with team members or publish it to your organization’s flow library, so everyone benefits from the same automation without having to rebuild it independently.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full" src="https://d2908q01vomqb2.cloudfront.net/artifacts/DBSBlogs/BDB-5727/BDB-4727-15.gif" width="1920" height="888"&gt;&lt;br&gt; Figure 14: Create a Quick Flow to generate summaries and publish dashboards&lt;/p&gt; 
&lt;p&gt;For a more complex example, consider a weekly customer revenue summary flow that performs the following steps (a sketch of the underlying query follows the list).&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Queries the &lt;code&gt;mv_customer_revenue&lt;/code&gt; materialized view in Amazon Redshift for the latest weekly revenue figures by market segment.&lt;/li&gt; 
 &lt;li&gt;Compares results against the prior week to calculate segment-level variance.&lt;/li&gt; 
 &lt;li&gt;Generates a formatted summary report and publishes it to the Customer Revenue &amp;amp; Segmentation Space.&lt;/li&gt; 
 &lt;li&gt;Sends a notification through email or Slack to finance, sales operations, and segment owners with a direct link to the updated dashboard.&lt;/li&gt; 
 &lt;li&gt;Flags any segment where revenue has declined more than a defined threshold, routing an alert to the appropriate owner for follow-up.&lt;/li&gt; 
&lt;/ol&gt; 
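&lt;p&gt;To make the first two steps concrete, the following is a minimal sketch of the kind of weekly revenue-by-segment query the flow automates, run through the Amazon Redshift Data API. The workgroup name, database name, and the &lt;code&gt;market_segment&lt;/code&gt;, &lt;code&gt;order_date&lt;/code&gt;, and &lt;code&gt;revenue&lt;/code&gt; columns are assumptions about your environment and the &lt;code&gt;mv_customer_revenue&lt;/code&gt; schema; adjust them to match your own setup.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;import boto3

# Sketch only: weekly revenue by market segment, for comparing the two most recent weeks.
# Assumptions: a Redshift Serverless workgroup named "default-workgroup", a database named
# "dev", and mv_customer_revenue columns market_segment, order_date, and revenue.
redshift_data = boto3.client("redshift-data", region_name="us-west-2")

sql = """
SELECT date_trunc('week', order_date) AS week_start,
       market_segment,
       SUM(revenue) AS weekly_revenue
FROM mv_customer_revenue
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC;
"""

response = redshift_data.execute_statement(
    WorkgroupName="default-workgroup",  # placeholder workgroup
    Database="dev",                     # placeholder database
    Sql=sql,
)
# Poll describe_statement / get_statement_result with this ID to retrieve the rows.
print(response["Id"])&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 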
&lt;p&gt;This flow transforms what might otherwise be a manual, multi-step reporting process into a fully automated pipeline, so stakeholders receive consistent, timely revenue insights without analyst intervention, saving analysts an estimated &lt;strong&gt;3–5 hours per week&lt;/strong&gt;. For detailed guidance on creating and managing flows, see &lt;a href="https://docs.aws.amazon.com/quicksuite/latest/userguide/using-amazon-quick-flows.html" target="_blank" rel="noopener noreferrer"&gt;Using Amazon Quick Flows&lt;/a&gt;. Also review the &lt;a href="https://youtu.be/nUUS3iYUQi0" target="_blank" rel="noopener noreferrer"&gt;&lt;strong&gt;Create workflows for routine tasks&lt;/strong&gt;&lt;/a&gt; demo.&lt;/p&gt; 
&lt;h2&gt;Cleanup&lt;/h2&gt; 
&lt;p&gt;To avoid incurring costs, consider deleting the following resources that you created while following this post. We also encourage you to make full use of the no-cost trials to familiarize yourself with the features described.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Delete the Amazon Redshift Serverless &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless_delete-workgroup.html" target="_blank" rel="noopener noreferrer"&gt;workgroup&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-console-namespace-delete.html" target="_blank" rel="noopener noreferrer"&gt;namespace&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Delete &lt;a href="https://docs.aws.amazon.com/athena/latest/ug/deleting-workgroups.html" target="_blank" rel="noopener noreferrer"&gt;Athena workgroup&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html" target="_blank" rel="noopener noreferrer"&gt;S3 Buckets&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Delete the &lt;a href="https://docs.aws.amazon.com/quick/latest/userguide/troubleshoot-delete-quicksight-account.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Quick account&lt;/a&gt; used while following this post. If you used an existing account, delete the data sets, dashboards, topics, spaces, agents and flows created.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;This integrated approach to business intelligence combines the power of AWS SQL analytics engines with Amazon Quick generative AI capabilities to deliver comprehensive analytics solutions. By following these implementation steps, you establish a foundation for traditional BI reporting, interactive dashboards, natural language data exploration, and intelligent workflow automation. The architecture scales from proof-of-concept implementations to production deployments, transforming how organizations access and act on data insights. For more information about Amazon Quick features and capabilities, see the &lt;a href="https://docs.aws.amazon.com/quick/" target="_blank" rel="noopener noreferrer"&gt;Amazon Quick documentation&lt;/a&gt;. To learn more about Amazon Redshift, visit the &lt;a href="https://aws.amazon.com/redshift/" target="_blank" rel="noopener noreferrer"&gt;Amazon Redshift product page&lt;/a&gt;. For Amazon Athena details, see the &lt;a href="https://aws.amazon.com/athena/" target="_blank" rel="noopener noreferrer"&gt;Amazon Athena product page&lt;/a&gt;.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-85290" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2025/11/18/internal-cdn.amazon.jpg" alt="" width="120" height="160"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;“Satesh Sonti”&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/satish-kumar-sonti/" target="_blank" rel="noopener"&gt;Satesh&lt;/a&gt; is a Principal Analytics Specialist Solutions Architect based in Atlanta, specializing in building enterprise data platforms, data warehousing, and analytics solutions. He has over 20 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-89736" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/02/ramonlopez.jpeg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;“Ramon Lopez”&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/ramonjose/" target="_blank" rel="noopener"&gt;Ramon Lopez&lt;/a&gt; is a Principal Solutions Architect for Amazon Quick. With many years of experience building BI solutions and a background in accounting, he loves working with customers, creating solutions, and making world-class services. When not working, he prefers to be outdoors in the ocean or up on a mountain.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Agentic AI for observability and troubleshooting with Amazon OpenSearch Service</title>
		<link>https://aws.amazon.com/blogs/big-data/agentic-ai-for-observability-and-troubleshooting-with-amazon-opensearch-service/</link>
					
		
		<dc:creator><![CDATA[Muthu Pitchaimani]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 21:44:17 +0000</pubDate>
				<category><![CDATA[Amazon Bedrock AgentCore]]></category>
		<category><![CDATA[Amazon OpenSearch Service]]></category>
		<category><![CDATA[Analytics]]></category>
		<guid isPermaLink="false">ed1d3f381488d96c6acb8c9b6f81bbfda0729f33</guid>

					<description>Now, Amazon OpenSearch Service brings three new agentic AI features to OpenSearch UI. In this post, we show how these capabilities work together to help engineers go from alert to root cause in minutes. We also walk through a sample scenario where the Investigation Agent automatically correlates data across multiple indices to surface a root cause hypothesis.</description>
										<content:encoded>&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html" target="_blank" rel="noopener noreferrer"&gt;Amazon OpenSearch Service&lt;/a&gt; powers observability workflows for organizations, giving their Site Reliability Engineering (SRE) and DevOps teams a single pane of glass to aggregate and analyze telemetry data.&amp;nbsp;During incidents, correlating signals and identifying root causes demand deep expertise in log analytics and hours of manual work. Identifying the root cause remains largely manual. For many teams, this is the bottleneck that delays service recovery and burns engineering resources.&lt;/p&gt; 
&lt;p&gt;We recently showed how to &lt;a href="https://aws.amazon.com/blogs/big-data/reduce-mean-time-to-resolution-with-an-observability-agent/" target="_blank" rel="noopener noreferrer"&gt;build an Observability Agent&lt;/a&gt; using &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html" target="_blank" rel="noopener noreferrer"&gt;Amazon OpenSearch Service&lt;/a&gt; and &lt;a href="https://aws.amazon.com/bedrock/" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; to reduce Mean Time to Resolution (MTTR). Now, Amazon OpenSearch Service brings many of these functions to the &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/application.html" target="_blank" rel="noopener noreferrer"&gt;OpenSearch UI&lt;/a&gt;—no additional infrastructure required. Three new agentic AI features streamline troubleshooting and help reduce MTTR:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An &lt;strong&gt;Agentic Chatbot&lt;/strong&gt; that can access the context and the underlying data that you’re looking at, apply agentic reasoning, and use tools to query data and generate insights on your behalf.&lt;/li&gt; 
 &lt;li&gt;An&lt;strong&gt; Investigation Agent&lt;/strong&gt; that deep-dives across signal data with hypothesis-driven analysis, explaining its reasoning at every step.&lt;/li&gt; 
 &lt;li&gt;An &lt;strong&gt;Agentic Memory&lt;/strong&gt; that supports both agents, so their accuracy and speed improve the more you use them.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;In this post, we show how these capabilities work together to help engineers go from alert to root cause in minutes. We also walk through a sample scenario where the Investigation Agent automatically correlates data across multiple indices to surface a root cause hypothesis.&lt;/p&gt; 
&lt;h2&gt;How the agentic AI capabilities work together&lt;/h2&gt; 
&lt;p&gt;These AI capabilities are accessible from &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/application.html" target="_blank" rel="noopener noreferrer"&gt;OpenSearch UI&lt;/a&gt; through an &lt;strong&gt;Ask AI&lt;/strong&gt; button, as shown in the following diagram, which provides the entry point for the &lt;strong&gt;Agentic Chatbot&lt;/strong&gt;.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89637" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-1.jpg" alt="" width="2560" height="1029"&gt;&lt;/p&gt; 
&lt;h3&gt;Agentic Chatbot&lt;/h3&gt; 
&lt;p&gt;To open the chatbot interface, choose Ask AI.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89638" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-2.jpg" alt="" width="1274" height="1358"&gt;&lt;/p&gt; 
&lt;p&gt;The chatbot understands the context of the current page, so it knows what you’re looking at before you ask a question. You can ask questions about your data, initiate an investigation, or ask the chatbot to explain a concept. After it understands your request, the chatbot plans and uses tools to access data, including generating and running queries on the Discover page, and applies reasoning to produce a data-driven answer. You can also use the chatbot on the Dashboard page, initiating conversations from a particular visualization to get a summary, as shown in the following image.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89639" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-3.jpg" alt="" width="2560" height="1401"&gt;&lt;/p&gt; 
&lt;h3&gt;Investigation agent&lt;/h3&gt; 
&lt;p&gt;Many incidents are too complex to resolve with one or two queries. For these situations, you can enlist the investigation agent. The investigation agent uses the &lt;a href="https://docs.opensearch.org/latest/ml-commons-plugin/agents-tools/agents/plan-execute-reflect/" target="_blank" rel="noopener noreferrer"&gt;plan-execute-reflect agent&lt;/a&gt;, which is designed for solving complex tasks that require iterative reasoning and step-by-step execution. It uses a Large Language Model (LLM) as a planner and another LLM as an executor. When an engineer identifies a suspicious observation, like an error rate spike or a latency anomaly, they can ask the investigation agent to investigate it. One of the important steps the investigation agent performs is re-evaluation: after executing each step, it reevaluates the plan using the planner and the intermediate results. The planner can adjust the plan if necessary, skip a step, or dynamically add steps based on this new information. Using the planner, the agent generates a root cause analysis report led by the most likely hypothesis and recommendations, with full agent traces showing every reasoning step, all findings, and how they support the final hypotheses. You can provide feedback, add your own findings, iterate on the investigation goal, and review and validate each step of the agent’s reasoning. This approach mirrors how experienced incident responders work, but completes automatically in minutes. You can also use the “/investigate” slash command to initiate an investigation directly from the chatbot, building on an ongoing conversation or starting with a different investigation goal.&lt;/p&gt; 
&lt;h2&gt;Agent in action&lt;/h2&gt; 
&lt;h3&gt;Automatic query generation&lt;/h3&gt; 
&lt;p&gt;Consider a situation where you’re&amp;nbsp;an SRE or DevOps engineer&amp;nbsp;and received an alert that a key service is experiencing elevated latency. You log in to the OpenSearch UI, navigate to the Discover page, and select the Ask AI button. Without any expertise in the Piped Processing Language (PPL) query language, you enter the question “find all requests with latency greater than 10 seconds”. The chatbot understands the context and the data that you’re looking at, thinks through the request, generates the right PPL command, and updates it in the query bar to get you the results. And if the query runs into any errors, the chatbot can learn about the error, self-correct, and iterate on the query to get the results for you.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89640" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-4.jpg" alt="" width="2560" height="1172"&gt;&lt;/p&gt; 
&lt;h3&gt;Investigation and investigation management&lt;/h3&gt; 
&lt;p&gt;For complex incidents that normally require manually analyzing and correlating multiple logs to find the possible root cause, you can choose &lt;strong&gt;Start Investigation&lt;/strong&gt; to initiate the investigation agent. You can provide a goal for the investigation, along with any context or hypothesis you want to guide it. For example, “identify the root cause of widespread high latency across services. Use TraceIDs from slow spans to correlate with detailed log entries in the related log indices. Analyze affected services, operations, error patterns, and any infrastructure or application-level bottlenecks without sampling.”&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89641" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-5.jpg" alt="" width="636" height="358"&gt;&lt;/p&gt; 
&lt;p&gt;The agent, as part of the conversation, will offer to investigate any issue that you’re trying to debug.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89642" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-6.jpg" alt="" width="1426" height="1356"&gt;&lt;/p&gt; 
&lt;p&gt;The agent sets goals for itself along with other relevant information, such as indices and the associated time range, and asks for your confirmation before creating a &lt;em&gt;Notebook&lt;/em&gt; for this investigation. A Notebook is a way within the OpenSearch UI to develop a rich report that’s live and collaborative. This helps with managing the investigation and allows for reinvestigation at a later date if necessary.&lt;/p&gt; 
&lt;p&gt;After the investigation starts, the agent performs a quick analysis of log sequences and data distributions to surface outliers. It then breaks the investigation into a series of actions and performs each one, such as querying a specific log type and time range. It reflects on the results at every step and iterates on the plan until it reaches the most likely hypotheses. Intermediate results appear on the same page as the agent works, so you can follow the reasoning in real time. For example, you might find that the Investigation Agent accurately mapped out the service topology and used it as a key intermediate step in the investigation.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89643" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-7.jpg" alt="" width="1942" height="1748"&gt;&lt;/p&gt; 
&lt;p&gt;As the investigation completes, the investigation agent concludes that the most likely hypothesis is&amp;nbsp;a fraud detection timeout. The associated finding shows a log entry from the payment service: “currency amount is too big, waiting for fraud detection”. This matches a known system design where large transactions trigger a fraud detection call that blocks the request until the&amp;nbsp;transaction is scored and assessed. The agent arrived at this finding by correlating data across two separate indices, a metrics index where the original duration data lived, and a correlated log index where the payment service entries were stored. The agent linked these indices using trace IDs, connecting the latency measurement to the specific log entry that explained it.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89644" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-8.jpg" alt="" width="2560" height="1644"&gt;&lt;/p&gt; 
&lt;p&gt;After reviewing the hypothesis and the supporting evidence, you find that the result is reasonable and aligns with your domain knowledge and past experience with similar issues. You can now accept the hypothesis and review the request flow topology for the affected traces that were provided as part of the hypothesis investigation.&lt;/p&gt; 
&lt;p&gt;Alternatively, if the initial hypothesis wasn’t helpful, you can review the alternative hypotheses at the bottom of the report and select one that’s more accurate. You can also trigger a re-investigation with additional inputs, or with corrections to previous inputs, so that the Investigation Agent can rework it.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="alignnone size-full wp-image-89645" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-9.jpg" alt="" width="660" height="612"&gt;&lt;/p&gt; 
&lt;h2&gt;Getting started&lt;/h2&gt; 
&lt;p&gt;You can use any of the new agentic AI features (limits apply) in the OpenSearch UI at no cost. They are ready to use in your OpenSearch UI applications unless you have previously disabled AI features in any OpenSearch Service domain in your account. To enable or disable the AI features, navigate to the details page of the OpenSearch UI application in the AWS Management Console and update the AI settings there. Alternatively, you can use the &lt;code&gt;registerCapability&lt;/code&gt; API to enable the AI features or the &lt;code&gt;deregisterCapability&lt;/code&gt; API to disable them. Learn more at &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/application-ai-assistant.html" target="_blank" rel="noopener noreferrer"&gt;Agentic AI in Amazon OpenSearch Service&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;The agentic AI features use the identity and permissions of the logged-in user to authorize access to the connected data sources. Make sure that your users have the necessary permissions to access the data sources. For more information, see &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/application-getting-started.html" target="_blank" rel="noopener noreferrer"&gt;Getting Started with OpenSearch UI&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;The investigation results are saved in the metadata system of OpenSearch UI and encrypted with a service managed key. Optionally, you can configure a customer managed key to encrypt all of the metadata with your own key. For more information, see &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/application-encryption-cmk.html" target="_blank" rel="noopener noreferrer"&gt;Encryption and Customer Managed Key with OpenSearch UI&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;The AI features are powered by the Claude Sonnet 4.6 model in Amazon Bedrock. Learn more at &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/data-protection.html" target="_blank" rel="noopener noreferrer"&gt;Amazon Bedrock Data Protection&lt;/a&gt;.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;The new agentic AI capabilities announced for Amazon OpenSearch Service help reduce Mean Time to Resolution by providing a context-aware agentic chatbot for assistance, hypothesis-driven investigations with full explainability, and agentic memory for context consistency. With these capabilities, your engineering team can spend less time writing queries and correlating signals, and more time acting on confirmed root causes. We invite you to explore these capabilities and experiment with your applications today.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-89646" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-10.jpg" alt="" width="119" height="163"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Muthu Pitchaimani&lt;/h3&gt; 
  &lt;p&gt;Muthu is a Search Specialist with Amazon OpenSearch Service. He builds large-scale search applications and solutions. Muthu is interested in the topics of networking and security, and is based out of Austin, Texas.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-89647" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-11.jpg" alt="" width="125" height="164"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Hang (Arthur) Zuo&lt;/h3&gt; 
  &lt;p&gt;Arthur is a Senior Product Manager with Amazon OpenSearch Service. Arthur leads OpenSearch UI platform and agentic AI features for observability and search use cases. Arthur is interested in the topics of Agentic AI and data products.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-89648" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/04/01/BDB-5859-image-12.jpg" alt="" width="117" height="157"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Mikhail Vaynshteyn&lt;/h3&gt; 
  &lt;p&gt;Mikhail is a Solutions Architect with Amazon Web Services. Mikhail works with healthcare and life sciences customers and specializes in data analytics services. Mikhail has more than 20 years of industry experience covering a wide range of technologies and sectors.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>Streamline Apache Kafka topic management with Amazon MSK</title>
		<link>https://aws.amazon.com/blogs/big-data/streamline-apache-kafka-topic-management-with-amazon-msk/</link>
					
		
		<dc:creator><![CDATA[Swapna Bandla]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 15:32:26 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Managed Streaming for Apache Kafka (Amazon MSK)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">ad1a9cd24dfcf01687f2d298e6f11890cb9809d4</guid>

					<description>In this post, we show you how to use the new topic management capabilities of Amazon MSK to streamline your Apache Kafka operations. We demonstrate how to manage topics through the console, control access with AWS Identity and Access Management (IAM), and bring topic provisioning into your continuous integration and continuous delivery (CI/CD) pipelines.</description>
										<content:encoded>&lt;p&gt;If you manage Apache Kafka today, you know the effort required to manage topics. Whether you use infrastructure as code (IaC) solutions or perform operations with admin clients, setting up topic management takes valuable time that could be spent on building streaming applications.&lt;/p&gt; 
&lt;p&gt;&lt;a href="https://aws.amazon.com/msk/" target="_blank" rel="noopener noreferrer"&gt;Amazon Managed Streaming for Apache Kafka&lt;/a&gt; (Amazon MSK) now streamlines topic management by supporting new topic APIs and console integration. You can programmatically create, update, and delete Apache Kafka topics using familiar interfaces including &lt;a href="https://aws.amazon.com/cli/" target="_blank" rel="noopener noreferrer"&gt;AWS Command Line Interface&lt;/a&gt; (AWS CLI), &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/API/sdk-general-information-section.html" target="_blank" rel="noopener noreferrer"&gt;AWS SDKs&lt;/a&gt;, and &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html" target="_blank" rel="noopener noreferrer"&gt;AWS CloudFormation&lt;/a&gt;. With these APIs, you can define topic properties such as replication factor and partition count and configuration settings like retention and cleanup policies. The Amazon MSK console integrates these APIs, bringing all topic operations to one place. You can now create or update topics with a few selections using guided defaults while gaining comprehensive visibility into topic configurations, partition-level information, and metrics. You can browse for topics within a cluster, review replication settings and partition counts, and go into individual topics to examine detailed configuration, partition-level information, and metrics. A unified dashboard consolidates partition topics and metrics in one view.&lt;/p&gt; 
&lt;p&gt;In this post, we show you how to use the new topic management capabilities of Amazon MSK to streamline your Apache Kafka operations. We demonstrate how to manage topics through the console, control access with &lt;a href="https://aws.amazon.com/iam/" target="_blank" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt;, and bring topic provisioning into your continuous integration and continuous delivery (CI/CD) pipelines.&lt;/p&gt; 
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;To get started with topic management, you need:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;An active AWS account with appropriate IAM permissions for Amazon MSK.&lt;/li&gt; 
 &lt;li&gt;An existing Amazon MSK Express or Standard cluster using Apache Kafka version 3.6 and above.&lt;/li&gt; 
 &lt;li&gt;Basic familiarity with Apache Kafka concepts like topics, partitions, and replication.&lt;/li&gt; 
 &lt;li&gt;AWS CLI installed and configured (for command line examples).&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Creating topics&lt;/h2&gt; 
&lt;p&gt;The MSK console provides a guided experience with sensible defaults while still offering advanced configuration options when you need them.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the &lt;a href="http://console.aws.amazon.com/msk" target="_blank" rel="noopener noreferrer"&gt;Amazon MSK console&lt;/a&gt; and select your cluster.&lt;/li&gt; 
 &lt;li&gt;Choose the &lt;strong&gt;Topics&lt;/strong&gt; tab, then choose &lt;strong&gt;Create topic&lt;/strong&gt;.&lt;br&gt; &lt;img loading="lazy" class="aligncenter wp-image-89205 size-full" style="margin: 10px 0px 10px 0px;border: 1px solid #cccccc" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/20/BDB-5775-1.png" alt="" width="939" height="420"&gt;&lt;/li&gt; 
 &lt;li&gt;Enter a topic name (for example, &lt;code&gt;customer-orders&lt;/code&gt;).&lt;/li&gt; 
 &lt;li&gt;Specify the number of partitions (use the guided defaults or customize based on your needs).&lt;/li&gt; 
 &lt;li&gt;Set the replication factor. Note that Express brokers improve the availability and durability of your Amazon MSK clusters by setting values for critical configurations and protecting them from common misconfiguration. If you try to create a topic with a replication factor value other than 3, Amazon MSK Express will create the topic with a replication factor of 3 by default.&lt;/li&gt; 
 &lt;li&gt;(Optional) Configure advanced settings like retention period or message size limits.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create topic&lt;/strong&gt;.&lt;br&gt; &lt;img loading="lazy" class="aligncenter wp-image-89206 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/20/BDB-5775-2.png" alt="" width="939" height="699"&gt;&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;The console validates your configuration and creates the topic. You can create multiple topics simultaneously with the same configuration settings. These topic API responses reflect data that updates approximately every minute. For the most current topic state after making changes, wait approximately one minute before querying.&lt;/p&gt; 
&lt;h2&gt;Configuration considerations&lt;/h2&gt; 
&lt;p&gt;When choosing configuration options, consider your workload requirements:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;You can configure more partitions to achieve higher throughput but this requires more broker resources. Refer to &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html" target="_blank" rel="noopener noreferrer"&gt;Best practices for Standard brokers&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/limits.html#msk-express-quota" target="_blank" rel="noopener noreferrer"&gt;Amazon MSK Express broker quota&lt;/a&gt; for more information on partition limits.&lt;/li&gt; 
 &lt;li&gt;With Standard brokers you can improve durability by configuring higher replication factors, though this will increase your storage costs. Refer to “Build highly available clusters” in &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html" target="_blank" rel="noopener noreferrer"&gt;Best practices for Standard brokers&lt;/a&gt; for more information on replication factors.&lt;/li&gt; 
 &lt;li&gt;Standard brokers support the full range of Apache Kafka topic configurations.&lt;/li&gt; 
 &lt;li&gt;Express brokers offer a curated set of the most important settings. Refer to &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-configuration-express-read-write.html#msk-configuration-express-topic-configuration" target="_blank" rel="noopener noreferrer"&gt;Topic-level configurations on Express Brokers&lt;/a&gt; for more information.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;Viewing and monitoring topics&lt;/h2&gt; 
&lt;p&gt;After you create topics, the MSK console provides comprehensive visibility into their configuration. When you select a specific topic, you will see detailed information:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;strong&gt;Partitions tab&lt;/strong&gt;: Shows the distribution of partitions across brokers, including leader assignments and in-sync replica status, with broker IDs for the leader and replicas.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Configuration tab&lt;/strong&gt;: Displays all topic-level configuration settings.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Monitoring tab&lt;/strong&gt;: Integrates with Amazon CloudWatch to show metrics like bytes in/out, message rates, and consumer lag.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-89207 size-large" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/20/BDB-5775-3-1024x598.png" alt="" width="1024" height="598"&gt;&lt;/p&gt; 
&lt;h2&gt;Updating topic configurations&lt;/h2&gt; 
&lt;p&gt;As your workload requirements evolve, you might need to adjust topic configurations. You can modify various topic settings depending on your cluster type. For example:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Retention settings: Adjust &lt;code&gt;retention.ms&lt;/code&gt; (time-based) or &lt;code&gt;retention.bytes&lt;/code&gt; (size-based) to control how long messages are retained.&lt;/li&gt; 
 &lt;li&gt;Message size limits: Modify &lt;code&gt;max.message.bytes&lt;/code&gt; to accommodate larger or smaller messages.&lt;/li&gt; 
 &lt;li&gt;Compression: Change &lt;code&gt;compression.type&lt;/code&gt; to optimize storage and network usage.&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Configuration changes take effect immediately for new messages. Existing messages remain subject to the previous configuration until they age out or are consumed.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-89208 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/20/BDB-5775-4.png" alt="" width="955" height="608"&gt;&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-89209 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/20/BDB-5775-5.png" alt="" width="955" height="567"&gt;&lt;/p&gt; 
&lt;h2&gt;Deleting topics&lt;/h2&gt; 
&lt;p&gt;Amazon MSK also provides APIs for deleting topics that are no longer in use. Before deleting a topic, verify that:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;No active producers are writing to the topic&lt;/li&gt; 
 &lt;li&gt;All consumers have finished processing messages&lt;/li&gt; 
 &lt;li&gt;You have backups if you need to retain the data&lt;/li&gt; 
 &lt;li&gt;Downstream applications won’t be impacted&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: Topic deletion permanently removes all messages in the topic.&lt;/p&gt; 
&lt;p&gt;&lt;img loading="lazy" class="aligncenter wp-image-89210 size-large" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/20/BDB-5775-6-1024x637.png" alt="" width="1024" height="637"&gt;&lt;/p&gt; 
&lt;h2&gt;Control access with IAM&lt;/h2&gt; 
&lt;p&gt;Beyond streamlining topic operations, you also need appropriate access controls. Access control uses IAM, so you define permissions using the same model that you apply to other AWS resources. Amazon MSK uses a two-level permission model:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Resource-level permissions: An IAM policy that enforces which operations the cluster will allow&lt;/li&gt; 
 &lt;li&gt;Principal-level permissions: IAM policies attached to Roles or Users that enforce which operations a principal is allowed to perform on a cluster&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;With this separation, you can control access depending on your organizational needs and access patterns for your cluster. Refer to the &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-topic-operations-information.html#topic-operations-permissions" target="_blank" rel="noopener noreferrer"&gt;IAM permissions&lt;/a&gt; documentation for IAM permissions required for topic management for the Amazon MSK cluster.&lt;/p&gt; 
&lt;p&gt;You can grant your operations team broad access to manage all topics and restrict application teams to manage only their own topics. The permission granularity that you need is available through standard IAM policies. If you’ve already configured IAM permissions for Apache Kafka topics, they work immediately with the new functionality without any migration or reconfiguration.&lt;/p&gt; 
&lt;p&gt;Here is a sample IAM policy definition that allows the topic describe APIs:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:111111111111:cluster/iam-auth-acl-test/a6b5c6d5-f74f-4dbc-ad14-63fb5e87fe4f-2"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:DescribeTopic",
                "kafka-cluster:DescribeTopicDynamicConfiguration"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:111111111111:topic/iam-auth-acl-test/a6b5c6d5-f74f-4dbc-ad14-63fb5e87fe4f-2/*"
            ]
        }
    ]
}&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This IAM policy grants the necessary permissions to describe Kafka topics in your Amazon MSK cluster. The policy includes three key permissions:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;code&gt;kafka-cluster:Connect&lt;/code&gt; – Allows connection to the specified MSK cluster&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;kafka-cluster:DescribeTopic&lt;/code&gt; – Enables viewing topic details&lt;/li&gt; 
 &lt;li&gt;&lt;code&gt;kafka-cluster:DescribeTopicDynamicConfiguration&lt;/code&gt; – Enables viewing topic dynamic configuration&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;The policy is scoped to a specific cluster ARN and applies to all topics within that cluster using the wildcard pattern &lt;code&gt;/*&lt;/code&gt;. Replace the placeholder Amazon MSK cluster ARN with your MSK cluster ARN.&lt;/p&gt; 
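&lt;p&gt;Building on the sample policy above, the following is a minimal sketch of how an application team could be restricted to managing only its own topics by using a topic-name prefix in the resource ARN. The role name, policy name, cluster ARN, and the &lt;code&gt;orders-&lt;/code&gt; prefix are placeholders; replace them with values from your environment.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;import json

import boto3

# Sketch only: attach an inline policy that lets an application team's role create,
# describe, alter, and delete only topics whose names start with "orders-".
# The role name, cluster ARN, and topic-name prefix are placeholders.
cluster_arn = "arn:aws:kafka:us-east-1:111111111111:cluster/iam-auth-acl-test/a6b5c6d5-f74f-4dbc-ad14-63fb5e87fe4f-2"
topic_prefix_arn = "arn:aws:kafka:us-east-1:111111111111:topic/iam-auth-acl-test/a6b5c6d5-f74f-4dbc-ad14-63fb5e87fe4f-2/orders-*"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["kafka-cluster:Connect"], "Resource": [cluster_arn]},
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:CreateTopic",
                "kafka-cluster:DescribeTopic",
                "kafka-cluster:AlterTopic",
                "kafka-cluster:DeleteTopic",
            ],
            "Resource": [topic_prefix_arn],
        },
    ],
}

boto3.client("iam").put_role_policy(
    RoleName="orders-app-team",           # placeholder application team role
    PolicyName="msk-orders-topic-admin",  # placeholder policy name
    PolicyDocument=json.dumps(policy),
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 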
&lt;h2&gt;Infrastructure as Code&lt;/h2&gt; 
&lt;p&gt;If you manage infrastructure as code (IaC), you can now define topics alongside clusters in your CloudFormation templates:&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;Resources:
    OrdersTopic:
      Type: AWS::MSK::Topic
      Properties:
        ClusterArn: !GetAtt MyMSKCluster.Arn
        TopicName: orders
        NumPartitions: 6
        ReplicationFactor: 3
        Config:
          retention.ms: "604800000"&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 
&lt;p&gt;This approach brings topic provisioning into your CI/CD pipelines.&lt;/p&gt; 
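&lt;p&gt;As a sketch of what a pipeline step might look like, the template can be deployed with the CloudFormation API. This assumes &lt;code&gt;msk-topics.yaml&lt;/code&gt; is a complete template that also defines (or imports) the &lt;code&gt;MyMSKCluster&lt;/code&gt; resource referenced above, and the stack name is a placeholder.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;import boto3

# Sketch only: deploy the topic template from a CI/CD step.
# Assumes msk-topics.yaml is a complete template; the stack name is a placeholder.
cfn = boto3.client("cloudformation", region_name="us-east-1")

with open("msk-topics.yaml") as f:
    template_body = f.read()

cfn.create_stack(StackName="msk-topics", TemplateBody=template_body)
cfn.get_waiter("stack_create_complete").wait(StackName="msk-topics")&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 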
&lt;h2&gt;Availability and pricing&lt;/h2&gt; 
&lt;p&gt;The new Amazon MSK topic management experience is available today for Standard and Express Amazon MSK clusters using Apache Kafka version 3.6 and above in all AWS Regions where Amazon MSK is offered, at no additional cost.&lt;/p&gt; 
&lt;h2&gt;Cleanup&lt;/h2&gt; 
&lt;p&gt;To avoid incurring additional charges to your AWS account, ensure you delete all resources created during this tutorial, including:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Amazon MSK cluster&lt;/li&gt; 
 &lt;li&gt;Any Kafka topics created&lt;/li&gt; 
 &lt;li&gt;Associated AWS resources (security groups, VPCs, etc., if created specifically for this blog)&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;Remember to verify that all resources have been successfully removed to prevent ongoing costs.&lt;/p&gt; 
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;Topic management has been a persistent pain point for Apache Kafka operations. The new integrated experience in Amazon MSK now reduces operational friction by bringing topic operations into the AWS tools that you use every day. You now have a consistent, streamlined way to handle these operations for all Apache Kafka topics across multiple MSK clusters. This capability reflects our commitment to reducing operational complexity in Apache Kafka. You get the reliability and performance of Apache Kafka without the operational overhead that traditionally comes with it. Your team spends less time on infrastructure maintenance and more time building streaming applications that drive your business forward.&lt;/p&gt; 
&lt;p&gt;Ready to start streamlining your topic management? Start managing your topics today through the Amazon MSK console or by visiting the &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/msk-topic-operations-information.html" target="_blank" rel="noopener noreferrer"&gt;Amazon MSK documentation&lt;/a&gt;.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-thumbnail wp-image-85003" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2025/11/08/swapnaba-100x133.jpg" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Swapna Bandla&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/swapnabandla/" target="_blank" rel="noopener noreferrer"&gt;Swapna&lt;/a&gt; is a Senior Streaming Solutions Architect at AWS. With a deep understanding of real-time data processing and analytics, she partners with customers to architect scalable, cloud-native solutions that align with AWS Well-Architected best practices. Swapna is passionate about helping organizations unlock the full potential of their data to drive business value. Beyond her professional pursuits, she cherishes quality time with her family.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-full wp-image-89475" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/24/bdb-5775-mmehrten-headshot.png" alt="" width="100" height="107"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Mazrim Mehrtens&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/mmehrtens/" target="_blank" rel="noopener noreferrer"&gt;Mazrim&lt;/a&gt; is a Sr. Specialist Solutions Architect for messaging and streaming workloads. They work with customers to build and support systems that process and analyze terabytes of streaming data in real time, run enterprise Machine Learning pipelines, and create systems to share data across teams seamlessly with varying data toolsets and software stacks.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignnone size-thumbnail wp-image-89211" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/20/Screenshot-2026-03-20-at-4.25.49 PM-100x133.png" alt="" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Judy Huang&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/judy-jun-h-9920b99a/" target="_blank" rel="noopener noreferrer"&gt;Judy&lt;/a&gt; is a Senior Product Manager for Amazon Managed Streaming for Apache Kafka (MSK) at AWS. She is passionate about real-time data systems and helping organizations unlock the value of streaming data at scale. Her work focuses on improving how customers manage Kafka infrastructure and building capabilities that make streaming platforms more accessible, resilient, and integrated with the broader data ecosystem.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
		<item>
		<title>How to set up a network-isolated VPC for Amazon SageMaker Unified Studio</title>
		<link>https://aws.amazon.com/blogs/big-data/how-to-set-up-a-network-isolated-vpc-for-amazon-sagemaker-unified-studio/</link>
					
		
		<dc:creator><![CDATA[Rohit Vashishtha]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 15:28:23 +0000</pubDate>
				<category><![CDATA[Amazon DataZone]]></category>
		<category><![CDATA[Amazon SageMaker Unified Studio]]></category>
		<category><![CDATA[Amazon VPC]]></category>
		<category><![CDATA[Expert (400)]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">67d2a499e9cd9c11ff7c275ba382f552beffdded</guid>

					<description>In this post, we explore scenarios where customers need more control over their network infrastructure when building their unified data and analytics strategic layer. We’ll show how you can bring your own Amazon Virtual Private Cloud (Amazon VPC) and set up Amazon SageMaker Unified Studio for strict network control.</description>
										<content:encoded>&lt;p&gt;Organizations are finding significant value using&amp;nbsp;&lt;a href="https://aws.amazon.com/blogs/big-data/an-integrated-experience-for-all-your-data-and-ai-with-amazon-sagemaker-unified-studio/" target="_blank" rel="noopener noreferrer"&gt;an integrated experience for all your data and AI with Amazon SageMaker Unified&amp;nbsp;Studio&lt;/a&gt;. However, many organizations require strict network control to meet security and regulatory compliance requirements like HIPAA or FedRAMP for their data and AI initiatives, while maintaining operational efficiency.&lt;/p&gt; 
&lt;p&gt;In this post, we explore scenarios where customers&amp;nbsp;need more control over their network infrastructure when building their unified data and analytics strategic layer. We’ll show how you can bring your own &lt;a href="https://aws.amazon.com/vpc/" target="_blank" rel="noopener noreferrer"&gt;Amazon Virtual Private Cloud&lt;/a&gt; (Amazon VPC) and set up &lt;a href="https://aws.amazon.com/sagemaker/unified-studio/" target="_blank" rel="noopener noreferrer"&gt;Amazon SageMaker Unified&amp;nbsp;Studio&lt;/a&gt; for strict network control.&lt;/p&gt; 
&lt;h2&gt;Solution overview&lt;/h2&gt; 
&lt;p&gt;The solution provides the complete technical approach for a fully private network architecture using Amazon VPC with no public internet exposure. The approach uses &lt;a href="https://aws.amazon.com/privatelink/" target="_blank" rel="noopener noreferrer"&gt;AWS PrivateLink&lt;/a&gt; through VPC endpoints to provide secure communication between SageMaker Unified Studio and essential AWS services entirely over the &lt;a href="https://aws.amazon.com/about-aws/global-infrastructure/" target="_blank" rel="noopener noreferrer"&gt;AWS backbone network&lt;/a&gt;.&lt;/p&gt; 
&lt;p&gt;The architecture consists of three core components: a custom VPC named &lt;em&gt;airgapped&lt;/em&gt; with multiple private subnets distributed across at least three Availability Zones for high availability, a comprehensive set of VPC interface and gateway endpoints for service connectivity, and the SageMaker Unified Studio domain configured to operate exclusively within this isolated environment. This design helps ensure that sensitive data never traverses the public internet while maintaining full functionality for data cataloging, query execution, and machine learning workflows.&lt;/p&gt; 
&lt;p&gt;By implementing this network-isolated VPC configuration, organizations gain granular control over network traffic, simplified compliance auditing, and the ability to integrate SageMaker Unified Studio with existing private data sources through controlled network pathways. The solution supports both immediate operational needs and long-term scalability through careful IP address planning and modular endpoint architecture.&lt;/p&gt; 
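&lt;p&gt;As an illustration of the endpoint work involved, the following is a minimal sketch that adds one interface endpoint and the Amazon S3 gateway endpoint to the airgapped VPC using boto3. The VPC, subnet, security group, and route table IDs are placeholders, and the exact set of endpoints you need depends on which AWS services your SageMaker Unified Studio projects use.&lt;/p&gt; 
&lt;div class="hide-language"&gt; 
 &lt;pre&gt;&lt;code class="lang-code"&gt;import boto3

# Sketch only: keep traffic to AWS services on the AWS backbone by adding VPC endpoints.
# The VPC, subnet, security group, and route table IDs below are placeholders, and the
# full set of endpoints depends on the services your projects use.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Interface endpoint (AWS PrivateLink), shown here for AWS STS
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.sts",
    SubnetIds=["subnet-0aaa1111bbb22220", "subnet-0ccc3333ddd44440", "subnet-0eee5555fff66660"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)

# Gateway endpoint for Amazon S3, associated with the private subnets' route tables
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)&lt;/code&gt;&lt;/pre&gt; 
&lt;/div&gt; 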
&lt;h2&gt;Prerequisites&lt;/h2&gt; 
&lt;p&gt;The setup requires an existing VPC (for this post, we refer to it as &lt;em&gt;airgapped&lt;/em&gt;, but in practice it is whichever VPC you want to use to securely set up SageMaker Unified Studio). If you don’t have an existing VPC, you can follow the &lt;a href="https://docs.aws.amazon.com/pdfs/sagemaker-unified-studio/latest/adminguide/sagemaker-unified-studio-admin.pdf" target="_blank" rel="noopener noreferrer"&gt;SageMaker Unified Studio domain quick create administrator guide&lt;/a&gt; to get started.&lt;/p&gt; 
&lt;p&gt;The high-level steps to create a VPC meeting the minimum requirements for SageMaker Unified&amp;nbsp;Studio are as follows:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;In the &lt;a href="https://console.aws.amazon.com/console/home" target="_blank" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt;, navigate to the &lt;a href="https://console.aws.amazon.com/vpcconsole/home" target="_blank" rel="noopener noreferrer"&gt;VPC console&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create VPC&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select the &lt;strong&gt;VPC and more&lt;/strong&gt; radio button.&lt;/li&gt; 
 &lt;li&gt;For&amp;nbsp;&lt;strong&gt;Name tag auto-generation&lt;/strong&gt;, enter &lt;em&gt;airgapped&lt;/em&gt; or a name of your choice.&lt;/li&gt; 
 &lt;li&gt;Keep the default values for&amp;nbsp;&lt;strong&gt;IPv4 CIDR block&lt;/strong&gt;,&lt;strong&gt;&amp;nbsp;IPv6 CIDR block&lt;/strong&gt;,&lt;strong&gt;&amp;nbsp;Tenancy&lt;/strong&gt;,&lt;strong&gt; NAT gateways&lt;/strong&gt;,&lt;strong&gt;&amp;nbsp;VPC endpoints&lt;/strong&gt;, and&lt;strong&gt; DNS options&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select &lt;strong&gt;3&lt;/strong&gt; for&amp;nbsp;&lt;strong&gt;Number of Availability Zones (AZs)&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select&lt;strong&gt; 0&lt;/strong&gt; for&lt;strong&gt; Number of public subnets&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create VPC&lt;/strong&gt;.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;This produces the following VPC resource map:&lt;/p&gt; 
&lt;div id="attachment_89342" style="width: 1378px" class="wp-caption aligncenter"&gt;
 &lt;img aria-describedby="caption-attachment-89342" loading="lazy" class="wp-image-89342 " src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/23/BDB-55181.jpg" alt="Figure 1 - VPC configuration" width="1368" height="539"&gt;
 &lt;p id="caption-attachment-89342" class="wp-caption-text"&gt;&lt;em&gt;Figure 1 – VPC configuration&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt; 
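&lt;p&gt;If you prefer to script this step rather than use the console wizard, the following is a minimal boto3 sketch that produces a comparable result: a VPC with both DNS attributes enabled and three private subnets spread across three Availability Zones. The Region, CIDR ranges, and resource names are illustrative assumptions; adjust them to your own IP plan.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# Minimal sketch: create an "airgapped" VPC with three private subnets
# across three AZs and DNS attributes enabled. Region, CIDR ranges, and
# names are illustrative assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

vpc_id = ec2.create_vpc(
    CidrBlock="10.0.0.0/16",
    TagSpecifications=[{
        "ResourceType": "vpc",
        "Tags": [{"Key": "Name", "Value": "airgapped-vpc"}],
    }],
)["Vpc"]["VpcId"]
ec2.get_waiter("vpc_available").wait(VpcIds=[vpc_id])

# Both DNS attributes are needed later for private DNS resolution of endpoints.
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={"Value": True})
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={"Value": True})

# One subnet per AZ; no internet gateway or public route is ever created,
# so the subnets stay private.
zones = ec2.describe_availability_zones()["AvailabilityZones"][:3]
for i, zone in enumerate(zones):
    ec2.create_subnet(
        VpcId=vpc_id,
        CidrBlock=f"10.0.{i}.0/24",
        AvailabilityZone=zone["ZoneName"],
        TagSpecifications=[{
            "ResourceType": "subnet",
            "Tags": [{"Key": "Name", "Value": f"airgapped-private-{i + 1}"}],
        }],
    )
print("VPC created:", vpc_id)
&lt;/code&gt;&lt;/pre&gt; 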
&lt;h2&gt;Set&amp;nbsp;up SageMaker Unified&amp;nbsp;Studio&lt;/h2&gt; 
&lt;p&gt;Now, we will set&amp;nbsp;up SageMaker Unified&amp;nbsp;Studio in an existing VPC, named &lt;em&gt;airgapped-vpc&lt;/em&gt;.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Navigate to the&amp;nbsp;&lt;a href="http://us-east-1.console.aws.amazon.com/datazone/home?region=us-east-1#/domains" target="_blank" rel="noopener noreferrer"&gt;SageMaker console&lt;/a&gt; and choose &lt;strong&gt;Domains&lt;/strong&gt; in the navigation pane.&lt;/li&gt; 
 &lt;li&gt;Choose&amp;nbsp;&lt;strong&gt;Create Domain&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;How do you want to set up your domain?&lt;/strong&gt;, select &lt;strong&gt;Quick set&amp;nbsp;up&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Expand the Quick set&amp;nbsp;up settings.&lt;/li&gt; 
 &lt;li&gt;Provide a &lt;strong&gt;name&lt;/strong&gt; for your domain, such as&amp;nbsp;&lt;em&gt;airgapped-domain.&lt;/em&gt;&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;Virtual private cloud (VPC)&lt;/strong&gt;, select &lt;em&gt;airgapped-vpc&lt;/em&gt;.&lt;/li&gt; 
 &lt;li&gt;For &lt;strong&gt;subnets&lt;/strong&gt;, select a minimum of two private subnets.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Continue&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter an email address to create a user in &lt;a href="https://aws.amazon.com/iam/identity-center/" target="_blank" rel="noopener noreferrer"&gt;AWS&amp;nbsp;IAM Identity Center&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create domain&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Once the domain is created, choose &lt;strong&gt;Open unified&amp;nbsp;studio&lt;/strong&gt; or use&lt;strong&gt; SageMaker Unified&amp;nbsp;Studio URL&lt;/strong&gt; under &lt;strong&gt;Domain details&lt;/strong&gt; to&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/getting-started-access-the-portal.html" target="_blank" rel="noopener noreferrer"&gt;access SageMaker Unified Studio&lt;/a&gt;. &lt;p&gt;&lt;/p&gt;
  &lt;div id="attachment_89344" style="width: 1347px" class="wp-caption alignright"&gt;
   &lt;img aria-describedby="caption-attachment-89344" loading="lazy" class="wp-image-89344" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/23/BDB-55183.png" alt="Figure 2 - Amazon SageMaker Unified Studio URL Welcome Page" width="1337" height="708"&gt;
   &lt;p id="caption-attachment-89344" class="wp-caption-text"&gt;&lt;em&gt;Figure 2 – Amazon SageMaker Unified Studio URL Welcome Page&lt;/em&gt;&lt;/p&gt;
  &lt;/div&gt;&lt;/li&gt; 
 &lt;li&gt;After logging in to SageMaker Unified&amp;nbsp;Studio, &lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/getting-started-create-a-project.html" target="_blank" rel="noopener noreferrer"&gt;create a project&lt;/a&gt; using the guided wizard.&lt;/li&gt; 
 &lt;li&gt;Once the project is created, we need to add the necessary VPC endpoints to allow traffic from the project to reach AWS services.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html" target="_blank" rel="noopener noreferrer"&gt;S3 Gateway VPC endpoint&lt;/a&gt; was already selected as part of VPC creation step 5 in prerequisites and thus created by default. Now we must add two more VPC endpoints for &lt;a href="https://aws.amazon.com/datazone/" target="_blank" rel="noopener noreferrer"&gt;Amazon DataZone&lt;/a&gt; and&amp;nbsp;AWS Security Token Service as illustrated in following step.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;This is the minimum set of VPC endpoints needed to use the tooling within SageMaker Unified&amp;nbsp;Studio. For a list of other mandatory and optional VPC endpoints, refer to the tables later in this post.&lt;/p&gt; 
&lt;h2&gt;Create an interface endpoint&lt;/h2&gt; 
&lt;p&gt;To create an interface endpoint, complete the following steps:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Go to the SageMaker Unified&amp;nbsp;Studio &lt;strong&gt;Project details&lt;/strong&gt; page and copy the &lt;strong&gt;Project ID&lt;/strong&gt;.&lt;br&gt; &lt;img loading="lazy" class="wp-image-89346 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/23/BDB-55185.png" alt="Figure 3 - SageMaker Unified Studio Project Details Page" width="874" height="479"&gt;&lt;em&gt;Figure 3 – SageMaker Unified Studio Project Details Page&lt;/em&gt;&lt;/li&gt; 
 &lt;li&gt;Go to the &lt;strong&gt;VPC console&lt;/strong&gt; and choose &lt;strong&gt;Endpoints&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create Endpoint&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Enter a name for the endpoint, for example, &lt;em&gt;DataZone endpoint for SageMaker Unified&amp;nbsp;Studio&lt;/em&gt;.&lt;/li&gt; 
 &lt;li&gt;For AWS Services, enter&amp;nbsp;&lt;em&gt;DataZone&lt;/em&gt;. &lt;p&gt;&lt;/p&gt;
  &lt;div id="attachment_89348" style="width: 1356px" class="wp-caption alignright"&gt;
   &lt;img aria-describedby="caption-attachment-89348" loading="lazy" class="wp-image-89348" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/23/BDB-55187.png" alt="Figure 4 - Interface Endpoint creation wizard for AWS Service datazone" width="1346" height="426"&gt;
   &lt;p id="caption-attachment-89348" class="wp-caption-text"&gt;&lt;em&gt;Figure 4 – Interface Endpoint creation wizard for AWS Service datazone&lt;/em&gt;&lt;/p&gt;
  &lt;/div&gt;&lt;/li&gt; 
 &lt;li&gt;Select &lt;strong&gt;Service Name = com.amazonaws.us-east-1.datazone&lt;/strong&gt; from the available options. &lt;p&gt;&lt;/p&gt;
  &lt;div id="attachment_89350" style="width: 1346px" class="wp-caption alignright"&gt;
   &lt;img aria-describedby="caption-attachment-89350" loading="lazy" class="wp-image-89350" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/23/BDB-55189.png" alt="Figure 5 - Interface Endpoint creation wizard network settings" width="1336" height="528"&gt;
   &lt;p id="caption-attachment-89350" class="wp-caption-text"&gt;&lt;em&gt;Figure 5 – Interface Endpoint creation wizard network settings&lt;/em&gt;&lt;/p&gt;
  &lt;/div&gt;&lt;/li&gt; 
 &lt;li&gt;Select the subnets in the &lt;em&gt;airgapped-vpc&lt;/em&gt; that you created earlier.&lt;/li&gt; 
 &lt;li&gt;Filter the &lt;strong&gt;Security Groups&lt;/strong&gt; by pasting the copied &lt;strong&gt;Project ID&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Select the security group with &lt;strong&gt;Group Name&lt;/strong&gt; &lt;em&gt;datazone-&amp;lt;project-id&amp;gt;-dev&lt;/em&gt;.&lt;/li&gt; 
 &lt;li&gt;Choose &lt;strong&gt;Create Endpoint&lt;/strong&gt;.&lt;/li&gt; 
 &lt;li&gt;Repeat the same steps to create a VPC endpoint for AWS STS.&lt;/li&gt; 
 &lt;li&gt;Once the VPC endpoints are created, validate connectivity in the SageMaker project by running a SQL query or using a JupyterLab notebook.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;To create a domain and project successfully, before any service-level usage, the mandatory VPC endpoints are the S3 gateway endpoint and the DataZone and STS interface endpoints. For service-dependent operations such as authentication, data preview, and working with compute, you would require the additional mandatory service-specific endpoints explained later in this post.&lt;/p&gt; 
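&lt;p&gt;The console steps above can also be scripted. The following boto3 sketch creates the DataZone and STS interface endpoints; the VPC, subnet, and security group IDs are placeholders, and in practice you would supply the private subnets of the airgapped VPC and the DataZone project security group selected earlier.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# Sketch: create the DataZone and STS interface endpoints for the project.
# The VPC, subnet, and security group IDs below are placeholders.
import boto3

region = "us-east-1"
ec2 = boto3.client("ec2", region_name=region)

vpc_id = "vpc-0123456789abcdef0"                         # placeholder
subnet_ids = ["subnet-aaa", "subnet-bbb", "subnet-ccc"]  # private subnets
sg_ids = ["sg-0123456789abcdef0"]                        # DataZone project security group

for service in ("datazone", "sts"):
    ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.{service}",
        SubnetIds=subnet_ids,
        SecurityGroupIds=sg_ids,
        # Private DNS lets SDKs resolve the default service hostname to the endpoint.
        PrivateDnsEnabled=True,
    )
&lt;/code&gt;&lt;/pre&gt; 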
&lt;h2&gt;Best practices for VPC set&amp;nbsp;up for various use cases&lt;/h2&gt; 
&lt;p&gt;When setting up a SageMaker Unified Studio domain and project profiles, you need to specify the VPC, subnets, and security groups. Here are some best practices around IP allocation, usage volume, and expected growth to consider for different enterprise use cases.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Production and enterprise use cases&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;If your organization requires strict network control to meet security and compliance requirements for data and AI initiatives, consider the following best practices in your production environment.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Use the bring-your-own (BYO) VPC approach to comply with company-specific networking and security requirements.&lt;/li&gt; 
 &lt;li&gt;Implement private networking using VPC endpoints to keep traffic within the AWS backbone.&lt;/li&gt; 
 &lt;li&gt;Use at least two private subnets across different Availability Zones.&lt;/li&gt; 
 &lt;li&gt;Enable DNS hostnames and DNS Support.&lt;/li&gt; 
 &lt;li&gt;Disable auto-assign public IP on subnets.&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/vpc/latest/ipam/planning-ipam.html" target="_blank" rel="noopener noreferrer"&gt;Plan IP capacity&lt;/a&gt; for at least 5 years. A prescriptive guidance for SageMaker Unified&amp;nbsp;Studio is shared in &lt;strong&gt;&lt;em&gt;VPC and Networking details&lt;/em&gt;&lt;/strong&gt; section later in this post. Consider the following: 
  &lt;ul&gt; 
   &lt;li&gt;Number of users&lt;/li&gt; 
   &lt;li&gt;Number of apps per user&lt;/li&gt; 
   &lt;li&gt;Number of unique instance types per user&lt;/li&gt; 
   &lt;li&gt;Average number of training instances&lt;/li&gt; 
   &lt;li&gt;Expected growth percentage&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Testing and non-production use cases&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;For development, testing, and other non-production environments where use cases don’t have stringent security and compliance requirements, use the automated setup for quick experiments. Use the sample &lt;a href="https://github.com/aws/Unified-Studio-for-Amazon-Sagemaker/tree/main/cloudformation" target="_blank" rel="noopener noreferrer"&gt;CloudFormation templates on GitHub&lt;/a&gt; as part of the SageMaker Unified&amp;nbsp;Studio express set&amp;nbsp;up to automate domain and project creation. However, this setup includes an Internet Gateway, which may not be suitable for security-sensitive environments.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Private networking use cases&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;VPCs with private subnets require essential service endpoints to allow client resources like Amazon EC2 instances to securely access AWS services. The traffic between your VPC and AWS services remains within the AWS network, avoiding public internet exposure.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Implement all mandatory VPC endpoints for core services (SageMaker, DataZone, Glue, and more).&lt;/li&gt; 
 &lt;li&gt;Add optional &lt;a href="https://docs.aws.amazon.com/general/latest/gr/rande.html#view-service-endpoints" target="_blank" rel="noopener noreferrer"&gt;endpoints based on specific service needs&lt;/a&gt;, like IPv4 endpoints, dual-stack endpoints, and FIPS endpoints to programmatically connect to an AWS service.&lt;/li&gt; 
 &lt;li&gt;Work with network administrators for: 
  &lt;ul&gt; 
   &lt;li&gt;Preinstalling needed resources through secure channels like private subnets and self-referencing inbound rules in security groups to enable limited access.&lt;/li&gt; 
   &lt;li&gt;Allowlisting only necessary external connections like NAT gateway IP and bastion host access in firewall rules.&lt;/li&gt; 
   &lt;li&gt;Setting up appropriate proxy configurations if required.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;External data source access use cases&lt;/strong&gt;&lt;/p&gt; 
&lt;p&gt;Consider the following when working with external systems like third-party SaaS platforms, on-premises databases, partner APIs, legacy systems, or external vendors.&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;Consult with network administrators for appropriate connection methods.&lt;/li&gt; 
 &lt;li&gt;Consider AWS PrivateLink integration where available.&lt;/li&gt; 
 &lt;li&gt;Implement appropriate security measures for the non-AWS data sources you access.&lt;/li&gt; 
 &lt;li&gt;For High Availability: 
  &lt;ul&gt; 
   &lt;li&gt;Deploy across at least three different Availability Zones (at least two for AWS Regions with only two AZs).&lt;/li&gt; 
   &lt;li&gt;Verify there’s a minimum of three free IPs per subnet.&lt;/li&gt; 
   &lt;li&gt;Consider larger CIDR blocks (/16 recommended) for future scalability.&lt;/li&gt; 
  &lt;/ul&gt; &lt;/li&gt; 
&lt;/ul&gt; 
&lt;h2&gt;VPC and networking details&lt;/h2&gt; 
&lt;p&gt;In this section, we provide details of each networking aspect, starting with the choice of VPC, the network connectivity needed for integrated services to work, the basis of the VPC and subnet requirements, and finally the VPC endpoints required for private service access.&lt;/p&gt; 
&lt;h3&gt;VPC&lt;/h3&gt; 
&lt;p&gt;At a high level, you have two options to supply VPCs and subnets:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Bring-your-own (BYO) VPC&lt;/strong&gt;. This is typically the case for most customers, as most have company-specific networking and security requirements and either reuse an existing VPC or create a VPC that is compliant with those requirements.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Create VPC with the SageMaker quick set&amp;nbsp;up template&lt;/strong&gt;. When creating a SageMaker Unified&amp;nbsp;Studio domain (a DataZone V2 domain in CloudFormation) through the automated quick set&amp;nbsp;up, you are shown a &lt;em&gt;Quick create stack&lt;/em&gt; wizard in CloudFormation that creates the VPC and subnets used to configure your domain. Note: The quick create stack using the template URL is not intended for production use. The template creates an Internet Gateway, which is not allowed in many enterprise settings. This option is only appropriate if you are trying out SageMaker Unified&amp;nbsp;Studio or running it for use cases that don’t have stringent security requirements. If you choose this option, start in the &lt;a href="http://us-east-1.console.aws.amazon.com/datazone/home?region=us-east-1#/domains" target="_blank" rel="noopener noreferrer"&gt;SageMaker console&lt;/a&gt;, navigate to &lt;strong&gt;Domains&lt;/strong&gt;, choose &lt;strong&gt;Create domain&lt;/strong&gt;, and then choose &lt;strong&gt;Create VPC&lt;/strong&gt;. You are taken to CloudFormation, where choosing &lt;strong&gt;Create stack&lt;/strong&gt; creates a sample VPC named &lt;em&gt;SageMakerUnifiedStudio-VPC&lt;/em&gt; with a single click for trying out SageMaker Unified Studio.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;div id="attachment_89352" style="width: 1351px" class="wp-caption alignright"&gt;
 &lt;img aria-describedby="caption-attachment-89352" loading="lazy" class="wp-image-89352" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/23/BDB-551811.png" alt="Figure 6 - Create VPC button in SageMaker Unified Studio Create Domain Wizard" width="1341" height="167"&gt;
 &lt;p id="caption-attachment-89352" class="wp-caption-text"&gt;&lt;em&gt;Figure 6 – Create VPC button in SageMaker Unified Studio Create Domain Wizard&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt; 
&lt;h4&gt;Cost estimation for recommended VPC set&amp;nbsp;up&lt;/h4&gt; 
&lt;p&gt;The exact cost depends on the configuration of your VPC. For more complex networking set&amp;nbsp;ups (multi-VPC), you may need to use additional networking components such as a Transit Gateway, Network Firewall, and VPC Lattice. These components may incur charges, and cost depends on usage and AWS Region. Interface VPC endpoints are charged per availability zone. They also have a fixed and a variable component in the pricing structure. Use the&amp;nbsp;&lt;a href="https://calculator.aws.amazon.com/" target="_blank" rel="noopener noreferrer"&gt;AWS Pricing Calculator&lt;/a&gt; for a detailed estimate.&lt;/p&gt; 
&lt;h3&gt;Network Connectivity&lt;/h3&gt; 
&lt;p&gt;With regard to connectivity to the underlying AWS services integrated within SageMaker Unified&amp;nbsp;Studio, there are two ways to enable connectivity (these are not Studio specific; they are standard ways to enable network connectivity within a VPC). This is an&amp;nbsp;&lt;strong&gt;important security consideration that depends on your organization’s security policies&lt;/strong&gt;.&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Through the public Internet&lt;/strong&gt;. Your traffic traverses the public Internet through an Internet Gateway in your VPC. 
  &lt;ol type="a"&gt; 
   &lt;li&gt;Your VPC must have an&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html" target="_blank" rel="noopener noreferrer"&gt;Internet Gateway&lt;/a&gt; attached to it.&lt;/li&gt; 
    &lt;li&gt;Your public subnet must have a &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html" target="_blank" rel="noopener noreferrer"&gt;NAT Gateway&lt;/a&gt;. In addition, your public subnet’s route table must have a default route (&lt;code&gt;0.0.0.0/0&lt;/code&gt;&amp;nbsp;for IPv4) to the Internet Gateway. This route is what makes the subnet public.&lt;/li&gt; 
   &lt;li&gt;Your private subnets must have a default route to the public subnet’s&amp;nbsp;NAT Gateway.&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Through the AWS backbone&lt;/strong&gt;. Your traffic will remain within the private AWS backbone through PrivateLink (by provisioning Interface and Gateway endpoints for the necessary AWS services in each Availability Zone). 
  &lt;ol type="a"&gt; 
    &lt;li&gt;A list of all the AWS services integrated into Studio and the VPC endpoints required can be found in the&amp;nbsp;&lt;strong&gt;&lt;em&gt;VPC endpoints&lt;/em&gt;&lt;/strong&gt; section later in this post.&lt;/li&gt; 
   &lt;li&gt;For non-AWS resources, certain external providers of these services may offer PrivateLink integration. Check with each provider’s documentation and your network administrator to understand the most suitable way to connect to these external providers.&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
&lt;/ol&gt; 
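&lt;p&gt;If you are unsure which of these two connectivity models a given VPC currently follows, one quick check is whether any of its route tables has a default route to an internet gateway. The following boto3 sketch (the VPC ID is a placeholder) flags such route tables; subnets associated with them are public.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# Sketch: flag route tables in a VPC that have a default route to an
# internet gateway. Subnets associated with such tables are public.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
vpc_id = "vpc-0123456789abcdef0"  # placeholder

tables = ec2.describe_route_tables(
    Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
)["RouteTables"]

for table in tables:
    for route in table.get("Routes", []):
        gateway = route.get("GatewayId", "")
        if route.get("DestinationCidrBlock") == "0.0.0.0/0" and gateway.startswith("igw-"):
            print(f"{table['RouteTableId']} has an internet route via {gateway}")
&lt;/code&gt;&lt;/pre&gt; 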
&lt;p&gt;In a private networking scenario, you will need to consider whether you need connectivity to non-AWS resources in a way that’s compliant with your organization’s security policies. A few examples include the following:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;If you need to download software in your remote IDE host (for example, command line programs, such as Ping and Traceroute).&lt;/li&gt; 
 &lt;li&gt;If you have code that connects to external APIs.&lt;/li&gt; 
 &lt;li&gt;If you use software (such as JupyterLab or Code Editor extensions) that relies on external APIs.&lt;/li&gt; 
 &lt;li&gt;If you depend on software dependencies hosted in the public domain (such as Maven, PyPI, npm).&lt;/li&gt; 
 &lt;li&gt;If you need cross-Region access to certain resources (such as access to S3 buckets in a different Region).&lt;/li&gt; 
 &lt;li&gt;If you need functionality whose underlying AWS services do not have VPC endpoints in all Regions or any Region. 
  &lt;ol type="a"&gt; 
   &lt;li&gt;&lt;a href="https://aws.amazon.com/q/" target="_blank" rel="noopener noreferrer"&gt;Amazon Q&lt;/a&gt; (powers Q and code suggestions)&lt;/li&gt; 
   &lt;li&gt;SQL Workbench (powers Query Editor)&lt;/li&gt; 
   &lt;li&gt;IAM (powers Glue connections)&lt;/li&gt; 
  &lt;/ol&gt; &lt;/li&gt; 
 &lt;li&gt;If you need to connect to data sources outside of AWS (such as Snowflake, Microsoft SQL Server, or Google BigQuery).&lt;/li&gt; 
&lt;/ol&gt; 
&lt;p&gt;Enterprise network administrators must also complete either of the following prerequisites to handle private networking scenarios:&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;Preinstall needed resources through secure channels if possible. An example would be to customize your &lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/byoi.html" target="_blank" rel="noopener noreferrer"&gt;SageMaker AI image&lt;/a&gt; by installing dependencies, after they are code scanned, vetted technically and legally by your organization.&lt;/li&gt; 
 &lt;li&gt;If AWS PrivateLink integration is not available for external providers, allowlist network connections to these external sources. Allow firewall egress rules, directly or indirectly, through a proxy in your organization’s network. Check with your network administrator to understand the most appropriate option for your organization.&lt;/li&gt; 
&lt;/ol&gt; 
&lt;h3&gt;VPC Requirements&lt;/h3&gt; 
&lt;p&gt;When setting up a new SageMaker Unified&amp;nbsp;Studio domain, you must supply a VPC. These VPC requirements&amp;nbsp;are a union of the requirements of the respective compute services integrated into Studio, some of which are reinforced by validation checks during the corresponding blueprint’s deployment. If a requirement that has a validation check is not fulfilled, the resources contained in that blueprint may fail to create at project creation (on-create) or when creating the compute resource (on-demand). This section presents a summary of these requirements, along with the relevant documentation links from which they originate.&lt;/p&gt; 
&lt;h4&gt;Subnet requirements for specific compute in a VPC&lt;/h4&gt; 
&lt;p&gt;This section lists the compute services integrated in SageMaker Unified&amp;nbsp;Studio that require VPC/subnets when provisioning the respective compute resources.&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;Compute Connections&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-usage-considerations.html" target="_blank" rel="noopener noreferrer"&gt;Redshift Serverless&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-vpc-host-job-flows.html?utm_source=chatgpt.com" target="_blank" rel="noopener noreferrer"&gt;EMR on EC2&lt;/a&gt;&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/vpc-access.html" target="_blank" rel="noopener noreferrer"&gt;EMR Serverless&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Other Services&lt;/strong&gt;&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/mwaa/latest/userguide/networking-about.html" target="_blank" rel="noopener noreferrer"&gt;Managed Workflows for Apache Airflow&lt;/a&gt; (MWAA)&lt;/li&gt; 
 &lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-and-internet-access.html" target="_blank" rel="noopener noreferrer"&gt;SageMaker AI Domain&lt;/a&gt;&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p&gt;&lt;strong&gt;Requirements&lt;/strong&gt;&lt;/p&gt; 
&lt;ol&gt; 
 &lt;li&gt;&lt;strong&gt;Number of subnets&lt;/strong&gt;: At least two private subnets. This requirement comes from&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-usage-considerations.html" target="_blank" rel="noopener noreferrer"&gt;Redshift Serverless&lt;/a&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Availability zones (AZs)&lt;/strong&gt;: At least two different AZs&amp;nbsp;(for Regions with two AZs, two subnets are sufficient).&amp;nbsp;This requirement comes from&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-usage-considerations.html" target="_blank" rel="noopener noreferrer"&gt;Redshift Serverless&lt;/a&gt;. For workgroups with Enhanced VPC Routing (EVR), you need three AZs.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Free IPs per subnet&lt;/strong&gt;: At least three IPs per subnet. This requirement comes from&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-usage-considerations.html" target="_blank" rel="noopener noreferrer"&gt;Redshift Serverless&lt;/a&gt; without EVR. For the detailed IP address requirements for EVR-enabled workgroups, refer to &lt;a href="https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-usage-considerations.html" target="_blank" rel="noopener noreferrer"&gt;Serverless usage considerations&lt;/a&gt;. Three is a minimum and may not be enough for your needs. For example, &lt;a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-vpc-launching-job-flows.html" target="_blank" rel="noopener noreferrer"&gt;EMR cluster creation&lt;/a&gt; will fail if no subnets with enough IPs are found in the VPC. We recommend doing a forward-looking &lt;a href="https://docs.aws.amazon.com/vpc/latest/ipam/planning-ipam.html" target="_blank" rel="noopener noreferrer"&gt;capacity planning exercise&lt;/a&gt; based on your use cases (for example, growth rate, users, compute needs) to project at least 5 years into the future. This helps determine how many IPs are needed by the team using Studio and other services that use this VPC, and to come up with a ceiling for the CIDR block size.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Private or public subnets&lt;/strong&gt;: We enforce that at least three private subnets be supplied, and recommend that &lt;em&gt;only&lt;/em&gt;&amp;nbsp;private subnets are chosen, with a few nuances. This requirement comes from the SageMaker AI domain. A new &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-and-internet-access.html" target="_blank" rel="noopener noreferrer"&gt;SageMaker AI domain&lt;/a&gt;, when set&amp;nbsp;up with &lt;code&gt;VpcOnly&lt;/code&gt; mode, requires that all subnets in the VPC be private. This is the default networking mode in the Tooling blueprint. If you choose to use &lt;code&gt;PublicInternetOnly&lt;/code&gt;&amp;nbsp;mode, this restriction does not apply; you may choose public subnets from your VPC. To change the mode, &lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/blueprints.html#manage-tooling-blueprint" target="_blank" rel="noopener noreferrer"&gt;modify the Tooling Blueprint&lt;/a&gt; parameter &lt;code&gt;sagemakerDomainNetworkType&lt;/code&gt;.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Enable DNS hostnames and DNS support&lt;/strong&gt;: Both must be enabled. This requirement comes from EMR.&amp;nbsp;Without the &lt;code&gt;enableDnsHostnames&lt;/code&gt;&amp;nbsp;and &lt;code&gt;enableDnsSupport&lt;/code&gt; VPC settings, connecting to the EMR cluster using the private DNS name through the Livy endpoint will fail. SSL verification can only be done when connecting using the DNS name, not the IP address.&lt;/li&gt; 
 &lt;li&gt;&lt;strong&gt;Auto assign public IP:&lt;/strong&gt; Disable. We recommend that this EC2 subnet setting (&lt;code&gt;mapPublicIpOnLaunch&lt;/code&gt;) be disabled when using private subnets, because public IPs come at a cost and are a scarce resource in the total addressable IPv4 space.&lt;/li&gt; 
&lt;/ol&gt; 
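&lt;p&gt;You can check most of these requirements programmatically before creating a domain. The following boto3 sketch (the VPC ID is a placeholder) prints the two DNS attributes, and for each subnet its Availability Zone, free IP count, and auto-assign public IP setting.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# Sketch: inspect the VPC attributes and subnet settings listed above.
# The VPC ID is a placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
vpc_id = "vpc-0123456789abcdef0"  # placeholder

dns_support = ec2.describe_vpc_attribute(
    VpcId=vpc_id, Attribute="enableDnsSupport")["EnableDnsSupport"]["Value"]
dns_hostnames = ec2.describe_vpc_attribute(
    VpcId=vpc_id, Attribute="enableDnsHostnames")["EnableDnsHostnames"]["Value"]
print("enableDnsSupport:", dns_support, "| enableDnsHostnames:", dns_hostnames)

subnets = ec2.describe_subnets(
    Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])["Subnets"]
for subnet in subnets:
    print(
        subnet["SubnetId"],
        "AZ:", subnet["AvailabilityZone"],
        "free IPs:", subnet["AvailableIpAddressCount"],           # at least 3
        "auto-assign public IP:", subnet["MapPublicIpOnLaunch"],  # should be False
    )
&lt;/code&gt;&lt;/pre&gt; 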
&lt;h3&gt;VPC endpoints&lt;/h3&gt; 
&lt;p&gt;If you choose to &lt;a href="https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/network-isolation.html" target="_blank" rel="noopener noreferrer"&gt;run SageMaker Unified Studio without public internet access&lt;/a&gt;, VPC endpoints are required for all services SageMaker Unified&amp;nbsp;Studio needs to access.&amp;nbsp;These endpoints provide secure, private connectivity between your VPC and AWS services without traversing the public internet. The following table lists the required endpoints, their types, and what each is used for.&lt;/p&gt; 
&lt;p&gt;Some endpoints may not show up directly in your browser’s network tab. The reason is that some of these services (such as CloudWatch) are transitively invoked by other services.&lt;/p&gt; 
&lt;h4&gt;Mandatory endpoints&lt;/h4&gt; 
&lt;p&gt;The following are required endpoints for SageMaker Unified&amp;nbsp;Studio and supporting services to function properly.&amp;nbsp;Gateway endpoints can be used where available; use interface endpoints for all other AWS services.&lt;/p&gt; 
&lt;table class="styled-table" style="height: 500px" border="1px" width="900" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;AWS service&lt;/td&gt; 
   &lt;td width="28%"&gt;Endpoint&lt;/td&gt; 
   &lt;td&gt;Type&lt;/td&gt; 
   &lt;td width="38%"&gt;Purpose&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;Glue&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.glue&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;For Data Catalog and metadata management&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;STS&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.sts&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Required for assuming IAM roles&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;S3&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.s3&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Gateway&lt;/td&gt; 
   &lt;td&gt;Required for datasets, Git backups, notebooks, and Git sync&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td rowspan="2"&gt;&lt;strong&gt;SageMaker&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.sagemaker.api&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Required for calling SageMaker APIs&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.sagemaker.runtime&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;For invoking deployed inference endpoints&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;DataZone&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.datazone&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;For data catalog and governance&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;Secrets Manager&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.secretsmanager&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;To securely access secrets&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td rowspan="2"&gt;&lt;strong&gt;SSM&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.ssm&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;For secure command execution&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.ssmmessages&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Enables live SSM sessions&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;KMS&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.kms&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;For decrypting data (volumes, S3, secrets)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td rowspan="2"&gt;&lt;strong&gt;EC2&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.ec2&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;For subnet and ENI management&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.ec2messages&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Required for SSM messaging&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;Athena&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.athena&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Required to run SQL queries&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;Amazon Q&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.q&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Used by SageMaker Notebooks for enhanced productivity&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
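&lt;p&gt;To confirm which of the mandatory endpoints in the preceding table already exist in your VPC, you can compare the table against the deployed endpoints. The following boto3 sketch (the VPC ID and Region are placeholders) lists any that are missing; note that the S3 entry is a gateway endpoint, while the rest are interface endpoints.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# Sketch: report which mandatory endpoint services from the table above are
# still missing from the VPC. The VPC ID and Region are placeholders.
import boto3

region = "us-east-1"
ec2 = boto3.client("ec2", region_name=region)
vpc_id = "vpc-0123456789abcdef0"  # placeholder

mandatory = [
    "glue", "sts", "s3", "sagemaker.api", "sagemaker.runtime", "datazone",
    "secretsmanager", "ssm", "ssmmessages", "kms", "ec2", "ec2messages",
    "athena", "q",
]

existing = {
    endpoint["ServiceName"]
    for endpoint in ec2.describe_vpc_endpoints(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )["VpcEndpoints"]
}

for suffix in mandatory:
    name = f"com.amazonaws.{region}.{suffix}"
    print(name, "-", "present" if name in existing else "MISSING")
&lt;/code&gt;&lt;/pre&gt; 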
&lt;h4&gt;Optional Endpoints&lt;/h4&gt; 
&lt;p&gt;Only create these if the corresponding service is used in your environment.&lt;/p&gt; 
&lt;table class="styled-table" style="height: 700px" border="1px" width="900" cellpadding="10px"&gt; 
 &lt;tbody&gt; 
  &lt;tr&gt; 
   &lt;td&gt;AWS service&lt;/td&gt; 
   &lt;td width="28%"&gt;Endpoint&lt;/td&gt; 
   &lt;td&gt;Type&lt;/td&gt; 
   &lt;td width="38%"&gt;Purpose&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td rowspan="4"&gt;&lt;strong&gt;EMR&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.emr-serverless&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Serverless Spark/Hive jobs&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.emr-serverless-services.livy&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Required for Livy job submission (EMR Serverless)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.elasticmapreduce&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Classic EMR (EC2-based)&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.emr-containers&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;EMR on EKS workloads&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td rowspan="3"&gt;&lt;strong&gt;Redshift&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.redshift&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;For provisioned Redshift clusters&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.redshift-serverless&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;For Redshift Serverless&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.redshift-data&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Required for running SQL against Redshift&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td rowspan="3"&gt;&lt;strong&gt;Amazon Bedrock&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.bedrock-runtime&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Invoke Bedrock models at runtime&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.bedrock-agent&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;For Bedrock knowledge agents&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.bedrock-agent-runtime&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;For running knowledge agent workloads&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;CloudWatch&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.logs&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Application and notebook logs&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt;&lt;strong&gt;RDS&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.rds&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Connect to Amazon RDS and Aurora&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td rowspan="2"&gt;&lt;strong&gt;CodeCommit&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.codecommit&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Git integration with CodeCommit&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.git-codecommit&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
    &lt;td&gt;For Git operations (clone, push, pull) against CodeCommit repositories&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td rowspan="2"&gt;&lt;strong&gt;CodeConnections and CodeStar&lt;/strong&gt;&lt;/td&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.codeconnections.api&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;GitHub and GitLab repo integration&lt;/td&gt; 
  &lt;/tr&gt; 
  &lt;tr&gt; 
   &lt;td&gt; 
    &lt;div class="hide-language"&gt; 
     &lt;pre&gt;&lt;code class="lang-css"&gt;com.amazonaws.${region}.codestar-connections.api&lt;/code&gt;&lt;/pre&gt; 
    &lt;/div&gt; &lt;/td&gt; 
   &lt;td&gt;Interface&lt;/td&gt; 
   &lt;td&gt;Alias of CodeConnections&lt;/td&gt; 
  &lt;/tr&gt; 
 &lt;/tbody&gt; 
&lt;/table&gt; 
&lt;h2&gt;Clean up&lt;/h2&gt; 
&lt;p&gt;AWS resources provisioned in your AWS accounts may incur costs based on the resources consumed. Make sure you do not leave any unintended resources provisioned. If you created a VPC and subsequent resources as part of this post, make sure you delete them.&lt;/p&gt; 
&lt;p&gt;The following service resources provisioned during this blog post need to be deleted:&lt;/p&gt; 
&lt;ul&gt; 
 &lt;li&gt;IAM Identity Center users and groups.&lt;/li&gt; 
 &lt;li&gt;Resources provisioned within your project using tooling configuration and blueprints within your domain.&lt;/li&gt; 
 &lt;li&gt;The &lt;em&gt;airgapped&lt;/em&gt; VPC.&lt;/li&gt; 
&lt;/ul&gt; 
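&lt;p&gt;If you created the VPC endpoints and the &lt;em&gt;airgapped&lt;/em&gt; VPC solely for this walkthrough, the following boto3 sketch (the VPC ID is a placeholder) removes them. Delete the domain, project resources, and IAM Identity Center users first so that no network interfaces remain in the subnets.&lt;/p&gt; 
&lt;pre&gt;&lt;code class="lang-python"&gt;# Sketch: delete the VPC endpoints created for this walkthrough, then the VPC.
# The VPC ID is a placeholder. Endpoint deletion is asynchronous; subnets and
# the VPC can only be deleted once the endpoint network interfaces are gone.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
vpc_id = "vpc-0123456789abcdef0"  # placeholder

endpoint_ids = [
    endpoint["VpcEndpointId"]
    for endpoint in ec2.describe_vpc_endpoints(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )["VpcEndpoints"]
]
if endpoint_ids:
    ec2.delete_vpc_endpoints(VpcEndpointIds=endpoint_ids)

# Subnets must be empty before the VPC itself can be deleted.
for subnet in ec2.describe_subnets(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])["Subnets"]:
    ec2.delete_subnet(SubnetId=subnet["SubnetId"])
ec2.delete_vpc(VpcId=vpc_id)
&lt;/code&gt;&lt;/pre&gt; 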
&lt;h2&gt;Conclusion&lt;/h2&gt; 
&lt;p&gt;In this post,&amp;nbsp;we walked through the process of using your own existing VPC when creating domains and projects in SageMaker Unified&amp;nbsp;Studio. This approach benefits customers by giving them greater control over their network infrastructure while using the comprehensive data, analytics, and AI/ML capabilities of Amazon SageMaker.&amp;nbsp;We also explored the critical role of VPC endpoints in this set&amp;nbsp;up. You now understand when these become necessary components of your architecture, particularly in scenarios requiring enhanced security, compliance with data residency requirements, or improved network performance.&lt;/p&gt; 
&lt;p&gt;While using a custom VPC requires more initial set&amp;nbsp;up than the Quick Create option, it provides the flexibility and control many organizations need for their data science and analytics workflows. This approach provides a mechanism for your SageMaker environment to integrate with your existing infrastructure and adheres to your organization’s networking policies. Custom VPC configurations are a powerful tool in your arsenal for building secure, compliant, and efficient data science environments.&lt;/p&gt; 
&lt;p&gt;To learn more, visit Amazon SageMaker Unified Studio – &lt;a href="https://docs.aws.amazon.com/pdfs/sagemaker-unified-studio/latest/adminguide/sagemaker-unified-studio-admin.pdf" target="_blank" rel="noopener noreferrer"&gt;Administrator Guide&lt;/a&gt; and&amp;nbsp;&lt;a href="https://docs.aws.amazon.com/pdfs/sagemaker-unified-studio/latest/userguide/sagemaker-unified-studio-user-guide.pdf" target="_blank" rel="noopener noreferrer"&gt;User Guide&lt;/a&gt;.&lt;/p&gt; 
&lt;hr style="width: 80%"&gt; 
&lt;h2&gt;About the authors&lt;/h2&gt; 
&lt;footer&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="alignleft wp-image-13110 size-full" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2020/11/03/Saurabh-Bhutyani.jpg" alt="Saurabh Bhutyani" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Saurabh Bhutyani&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/s4saurabh"&gt;&lt;strong&gt;Saurabh&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;is a Principal Analytics Specialist Solutions Architect at AWS. He is passionate about new technologies. He joined AWS in 2019 and works with customers to provide architectural guidance for running generative AI use cases, scalable analytics solutions and data mesh architectures using AWS services like Amazon Bedrock, Amazon SageMaker, Amazon EMR, Amazon Athena, AWS Glue, AWS Lake Formation, and Amazon DataZone.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="wp-image-38880 size-full alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2022/11/29/rohitvas.png" alt="Rohit Vashishtha" width="100" height="133"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Rohit Vashishtha&lt;/h3&gt; 
  &lt;p&gt;&lt;a href="https://www.linkedin.com/in/rohit-vashishtha-analytics/"&gt;&lt;strong&gt;Rohit&lt;/strong&gt;&lt;/a&gt; is a Senior Analytics Specialist Solutions Architect at AWS based in Dallas, Texas. He has two decades of experience architecting, building, leading, and maintaining big data platforms. Rohit helps customers modernize their analytic workloads using the breadth of AWS services and ensures that customers get the best price/performance with utmost security and data governance.&lt;/p&gt; 
 &lt;/div&gt; 
 &lt;div class="blog-author-box"&gt; 
  &lt;div class="blog-author-image"&gt;
   &lt;img loading="lazy" class="size-thumbnail wp-image-89455 alignleft" src="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2026/03/23/image-22-1-100x122.png" alt="" width="100" height="122"&gt;
  &lt;/div&gt; 
  &lt;h3 class="lb-h4"&gt;Baggio Wong&lt;/h3&gt; 
  &lt;p&gt;&lt;strong&gt;&lt;a href="https://www.linkedin.com/in/baggiowong/" target="_blank" rel="noopener"&gt;Baggio&lt;/a&gt;&lt;/strong&gt; is a Software Engineer on the SageMaker Unified Studio team, where he designs and delivers experiences that empower data practitioners to build and deploy AI/ML workloads.&lt;/p&gt; 
 &lt;/div&gt; 
&lt;/footer&gt;</content:encoded>
					
					
			
		
		
			</item>
	</channel>
</rss>