<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DataForGeeks</title>
	<atom:link href="https://dataforgeeks.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://dataforgeeks.com/</link>
	<description>for Data Enthusiasts</description>
	<lastBuildDate>Tue, 10 Jun 2025 15:46:13 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://dataforgeeks.com/wp-content/uploads/2025/05/cropped-cropped-dataforgeeks_logo_under_150kb-32x32.png</url>
	<title>DataForGeeks</title>
	<link>https://dataforgeeks.com/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>What It Really Takes to Run Snowflake&#8217;s Snowpipe in Production at Scale &#8211; A Comprehensive Guide</title>
		<link>https://dataforgeeks.com/what-it-really-takes-to-run-snowpipe-in-production-at-scale-a-comprehensive-guide/2610/</link>
					<comments>https://dataforgeeks.com/what-it-really-takes-to-run-snowpipe-in-production-at-scale-a-comprehensive-guide/2610/#respond</comments>
		
		<dc:creator><![CDATA[Nikhil Aggarwal]]></dc:creator>
		<pubDate>Wed, 28 May 2025 22:49:11 +0000</pubDate>
				<category><![CDATA[Snowflake]]></category>
		<category><![CDATA[cloud data engineering]]></category>
		<category><![CDATA[data ingestion]]></category>
		<category><![CDATA[data lakehouse]]></category>
		<category><![CDATA[data pipeline design]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[medallion architecture]]></category>
		<category><![CDATA[near real-time ingestion]]></category>
		<category><![CDATA[production pipelines]]></category>
		<category><![CDATA[real-time data ingestion Snowflake]]></category>
		<category><![CDATA[scalable data architecture]]></category>
		<category><![CDATA[schema evolution]]></category>
		<category><![CDATA[schema evolution Snowflake]]></category>
		<category><![CDATA[snowflake]]></category>
		<category><![CDATA[Snowflake best practices]]></category>
		<category><![CDATA[Snowflake directory table]]></category>
		<category><![CDATA[Snowflake monitoring]]></category>
		<category><![CDATA[Snowflake Snowpipe tutorial]]></category>
		<category><![CDATA[Snowflake stream task automation]]></category>
		<category><![CDATA[Snowflake streaming ingestion]]></category>
		<category><![CDATA[snowpipe]]></category>
		<category><![CDATA[Snowpipe deduplication]]></category>
		<category><![CDATA[Snowpipe production]]></category>
		<category><![CDATA[stream task]]></category>
		<guid isPermaLink="false">https://dataforgeeks.com/?p=2610</guid>

					<description><![CDATA[<p>We adopted a practical Medallion-style approach to structure our data flows &#8211; segmenting them into Bronze, Silver, and Gold layers. As part of this redesign, we needed to optimize how curated data was exported to Snowflake. That’s when we hit performance issues with external tables. I know the common suggestion is to use the ... <a title="What It Really Takes to Run Snowflake&#8217;s Snowpipe in Production at Scale &#8211; A Comprehensive Guide" class="read-more" href="https://dataforgeeks.com/what-it-really-takes-to-run-snowpipe-in-production-at-scale-a-comprehensive-guide/2610/" aria-label="Read more about What It Really Takes to Run Snowflake&#8217;s Snowpipe in Production at Scale &#8211; A Comprehensive Guide">Read more</a></p>
<p>The post <a href="https://dataforgeeks.com/what-it-really-takes-to-run-snowpipe-in-production-at-scale-a-comprehensive-guide/2610/">What It Really Takes to Run Snowflake&#8217;s Snowpipe in Production at Scale &#8211; A Comprehensive Guide</a> appeared first on <a href="https://dataforgeeks.com">DataForGeeks</a>.</p>
]]></description>
		
					<wfw:commentRss>https://dataforgeeks.com/what-it-really-takes-to-run-snowpipe-in-production-at-scale-a-comprehensive-guide/2610/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Apache Iceberg: The Data Lake Breakthrough That’s Reshaping the Big Data Landscape</title>
		<link>https://dataforgeeks.com/apache-iceberg-the-data-lake-breakthrough-thats-reshaping-the-big-data-landscape/2525/</link>
					<comments>https://dataforgeeks.com/apache-iceberg-the-data-lake-breakthrough-thats-reshaping-the-big-data-landscape/2525/#respond</comments>
		
		<dc:creator><![CDATA[Nikhil Aggarwal]]></dc:creator>
		<pubDate>Wed, 21 May 2025 09:47:02 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[apache iceberg]]></category>
		<category><![CDATA[cloud data engineering]]></category>
		<category><![CDATA[data lake best practices]]></category>
		<category><![CDATA[data lake optimization]]></category>
		<category><![CDATA[data lakehouse]]></category>
		<category><![CDATA[iceberg compute cost]]></category>
		<category><![CDATA[iceberg in snowflake]]></category>
		<category><![CDATA[iceberg maintenance cost]]></category>
		<category><![CDATA[iceberg metadata]]></category>
		<category><![CDATA[iceberg performance]]></category>
		<category><![CDATA[iceberg scalability]]></category>
		<category><![CDATA[iceberg snapshot]]></category>
		<category><![CDATA[iceberg table format]]></category>
		<category><![CDATA[iceberg vs delta lake]]></category>
		<category><![CDATA[iceberg vs hudi]]></category>
		<category><![CDATA[open source data architecture]]></category>
		<category><![CDATA[open table format]]></category>
		<category><![CDATA[partition evolution]]></category>
		<category><![CDATA[schema evolution]]></category>
		<category><![CDATA[snowflake iceberg support]]></category>
		<guid isPermaLink="false">https://dataforgeeks.com/?p=2525</guid>

					<description><![CDATA[<p>By the end of this read, you’ll understand why Apache Iceberg is not just another open table format; it’s the seismic shift that’s redefining the role of Delta Lake in the modern data ecosystem and transforming how the world thinks about data lakes. From Data Lakes to Data Icebergs: A New Era Begins Over ... <a title="Apache Iceberg: The Data Lake Breakthrough That’s Reshaping the Big Data Landscape" class="read-more" href="https://dataforgeeks.com/apache-iceberg-the-data-lake-breakthrough-thats-reshaping-the-big-data-landscape/2525/" aria-label="Read more about Apache Iceberg: The Data Lake Breakthrough That’s Reshaping the Big Data Landscape">Read more</a></p>
<p>The post <a href="https://dataforgeeks.com/apache-iceberg-the-data-lake-breakthrough-thats-reshaping-the-big-data-landscape/2525/">Apache Iceberg: The Data Lake Breakthrough That’s Reshaping the Big Data Landscape</a> appeared first on <a href="https://dataforgeeks.com">DataForGeeks</a>.</p>
]]></description>
		
					<wfw:commentRss>https://dataforgeeks.com/apache-iceberg-the-data-lake-breakthrough-thats-reshaping-the-big-data-landscape/2525/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The Medallion Masterstroke: How Databricks Rewired the Data World One Bronze Layer at a Time</title>
		<link>https://dataforgeeks.com/the-medallion-masterstroke-how-databricks-rewired-the-data-world-one-bronze-layer-at-a-time/2497/</link>
					<comments>https://dataforgeeks.com/the-medallion-masterstroke-how-databricks-rewired-the-data-world-one-bronze-layer-at-a-time/2497/#respond</comments>
		
		<dc:creator><![CDATA[Nikhil Aggarwal]]></dc:creator>
		<pubDate>Mon, 20 Jan 2025 20:32:41 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[#PySpark]]></category>
		<category><![CDATA[ApacheIceberg]]></category>
		<category><![CDATA[BuildWithPurpose]]></category>
		<category><![CDATA[Databricks]]></category>
		<category><![CDATA[DataChaos]]></category>
		<category><![CDATA[DataEngineering]]></category>
		<category><![CDATA[DataGovernance]]></category>
		<category><![CDATA[DataInfra]]></category>
		<category><![CDATA[DataMindset]]></category>
		<category><![CDATA[DataOps]]></category>
		<category><![CDATA[DataPipelines]]></category>
		<category><![CDATA[DataStrategy]]></category>
		<category><![CDATA[DataTrust]]></category>
		<category><![CDATA[DeltaLake]]></category>
		<category><![CDATA[EngineeringCulture]]></category>
		<category><![CDATA[LakehouseArchitecture]]></category>
		<category><![CDATA[MedallionArchitecture]]></category>
		<category><![CDATA[ModernDataStack]]></category>
		<category><![CDATA[OpenDataFormats]]></category>
		<category><![CDATA[PipelineTherapy]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[snowflake]]></category>
		<category><![CDATA[SnowflakeData]]></category>
		<category><![CDATA[spark]]></category>
		<category><![CDATA[StreamingData]]></category>
		<guid isPermaLink="false">https://dataforgeeks.com/?p=2497</guid>

					<description><![CDATA[<p>The Era of Chaos &#8211; and Snowflake&#8217;s Rise Back in 2017, most of us were drowning in messy data. Files were everywhere in S3 buckets, Hadoop jobs kept failing at the worst times, and analysts? They were always chasing clean data that never seemed to arrive when needed. It was frustrating, and honestly, it felt ... <a title="The Medallion Masterstroke: How Databricks Rewired the Data World One Bronze Layer at a Time" class="read-more" href="https://dataforgeeks.com/the-medallion-masterstroke-how-databricks-rewired-the-data-world-one-bronze-layer-at-a-time/2497/" aria-label="Read more about The Medallion Masterstroke: How Databricks Rewired the Data World One Bronze Layer at a Time">Read more</a></p>
<p>The post <a href="https://dataforgeeks.com/the-medallion-masterstroke-how-databricks-rewired-the-data-world-one-bronze-layer-at-a-time/2497/">The Medallion Masterstroke: How Databricks Rewired the Data World One Bronze Layer at a Time</a> appeared first on <a href="https://dataforgeeks.com">DataForGeeks</a>.</p>
]]></description>
		
					<wfw:commentRss>https://dataforgeeks.com/the-medallion-masterstroke-how-databricks-rewired-the-data-world-one-bronze-layer-at-a-time/2497/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Mastering Python Setup on macOS: Bye Conda, Hello pyenv + Fancy iTerm2 Terminal</title>
		<link>https://dataforgeeks.com/mastering-python-setup-on-macos-bye-conda-hello-pyenv-fancy-iterm2-terminal/2461/</link>
					<comments>https://dataforgeeks.com/mastering-python-setup-on-macos-bye-conda-hello-pyenv-fancy-iterm2-terminal/2461/#respond</comments>
		
		<dc:creator><![CDATA[Nikhil Aggarwal]]></dc:creator>
		<pubDate>Wed, 09 Oct 2024 12:54:00 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[beginner python setup Mac]]></category>
		<category><![CDATA[brew install pyenv]]></category>
		<category><![CDATA[install pyenv Mac]]></category>
		<category><![CDATA[iTerm2 customization]]></category>
		<category><![CDATA[mac terminal customization]]></category>
		<category><![CDATA[macOS developer terminal]]></category>
		<category><![CDATA[manage Python versions Mac]]></category>
		<category><![CDATA[Oh My Zsh setup]]></category>
		<category><![CDATA[powerline fonts iTerm2]]></category>
		<category><![CDATA[pyenv setup Mac]]></category>
		<category><![CDATA[pyenv tutorial]]></category>
		<category><![CDATA[pyenv virtualenv]]></category>
		<category><![CDATA[pyenv vs conda]]></category>
		<category><![CDATA[pyenv with Pycharm]]></category>
		<category><![CDATA[Python dev tools Mac]]></category>
		<category><![CDATA[Python development Mac]]></category>
		<category><![CDATA[Python version manager Mac]]></category>
		<category><![CDATA[Python virtual environments Mac]]></category>
		<category><![CDATA[uninstall anaconda Mac]]></category>
		<category><![CDATA[zshrc pyenv configuration]]></category>
		<guid isPermaLink="false">https://dataforgeeks.com/?p=2461</guid>

					<description><![CDATA[<p>Tired of messy Python setups? Ever screamed at your terminal? Been there, done that, deleted Anaconda. Let me show you how I set up a clean, beautiful, and powerful Python development environment on my Mac. It’s light, customizable, and perfect for devs who love a good-looking terminal and tight control over Python versions. 🤓 Why ... <a title="Mastering Python Setup on macOS: Bye Conda, Hello pyenv + Fancy iTerm2 Terminal" class="read-more" href="https://dataforgeeks.com/mastering-python-setup-on-macos-bye-conda-hello-pyenv-fancy-iterm2-terminal/2461/" aria-label="Read more about Mastering Python Setup on macOS: Bye Conda, Hello pyenv + Fancy iTerm2 Terminal">Read more</a></p>
<p>The post <a href="https://dataforgeeks.com/mastering-python-setup-on-macos-bye-conda-hello-pyenv-fancy-iterm2-terminal/2461/">Mastering Python Setup on macOS: Bye Conda, Hello pyenv + Fancy iTerm2 Terminal</a> appeared first on <a href="https://dataforgeeks.com">DataForGeeks</a>.</p>
]]></description>
		
					<wfw:commentRss>https://dataforgeeks.com/mastering-python-setup-on-macos-bye-conda-hello-pyenv-fancy-iterm2-terminal/2461/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Python Data Structures Simplified: List, Tuple, Dict, Set, Frozenset &#038; More</title>
		<link>https://dataforgeeks.com/python-data-structures-simplified-list-tuple-dict-set-frozenset-more/2483/</link>
		
		<dc:creator><![CDATA[Nikhil Aggarwal]]></dc:creator>
		<pubDate>Thu, 02 May 2024 11:23:31 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[mutable vs immutable python]]></category>
		<category><![CDATA[python beginner tutorial]]></category>
		<category><![CDATA[python collection types]]></category>
		<category><![CDATA[python data structure tutorial]]></category>
		<category><![CDATA[python data structures]]></category>
		<category><![CDATA[python dataclasses]]></category>
		<category><![CDATA[python deque example]]></category>
		<category><![CDATA[python dict vs set]]></category>
		<category><![CDATA[python for data science]]></category>
		<category><![CDATA[python frozenset]]></category>
		<category><![CDATA[python list tuple dict set]]></category>
		<category><![CDATA[python list vs tuple]]></category>
		<category><![CDATA[python namedtuple vs dataclass]]></category>
		<category><![CDATA[python set operations]]></category>
		<guid isPermaLink="false">https://dataforgeeks.com/?p=2483</guid>

					<description><![CDATA[<p>Python offers a rich set of built-in and extended data structures to efficiently manage and process data. In this blog, we&#8217;ll deep dive into essential ones:&#160;List,&#160;Tuple,&#160;Dictionary (Dict),&#160;Set,&#160;Frozenset, and also explore some powerful structures from the&#160;collections&#160;and&#160;dataclasses&#160;modules. We&#8217;ll cover their properties, use-cases, constructors, and how to convert between them using intuitive examples. Note: Since Python 3.7, dictionaries ... <a title="Python Data Structures Simplified: List, Tuple, Dict, Set, Frozenset &#38; More" class="read-more" href="https://dataforgeeks.com/python-data-structures-simplified-list-tuple-dict-set-frozenset-more/2483/" aria-label="Read more about Python Data Structures Simplified: List, Tuple, Dict, Set, Frozenset &#38; More">Read more</a></p>
<p>The post <a href="https://dataforgeeks.com/python-data-structures-simplified-list-tuple-dict-set-frozenset-more/2483/">Python Data Structures Simplified: List, Tuple, Dict, Set, Frozenset &amp; More</a> appeared first on <a href="https://dataforgeeks.com">DataForGeeks</a>.</p>
]]></description>
		
		
		
			</item>
		<item>
		<title>Understanding SQL Execution Order and Corresponding PySpark Syntax</title>
		<link>https://dataforgeeks.com/understanding-sql-execution-order-and-corresponding-pyspark-syntax/2490/</link>
		
		<dc:creator><![CDATA[Nikhil Aggarwal]]></dc:creator>
		<pubDate>Sat, 02 Sep 2023 16:33:01 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[#AggregationFunctions]]></category>
		<category><![CDATA[#ApacheSpark]]></category>
		<category><![CDATA[#BigData]]></category>
		<category><![CDATA[#CodeWithSQL]]></category>
		<category><![CDATA[#DataAnalytics]]></category>
		<category><![CDATA[#DataEngineering]]></category>
		<category><![CDATA[#DataProcessing]]></category>
		<category><![CDATA[#DataScience]]></category>
		<category><![CDATA[#DataTransformation]]></category>
		<category><![CDATA[#ETL]]></category>
		<category><![CDATA[#GroupByClause]]></category>
		<category><![CDATA[#JoinOperations]]></category>
		<category><![CDATA[#LearnPySpark]]></category>
		<category><![CDATA[#Programming]]></category>
		<category><![CDATA[#PySpark]]></category>
		<category><![CDATA[#PySparkSyntax]]></category>
		<category><![CDATA[#PySparkTutorial]]></category>
		<category><![CDATA[#SparkDataFrames]]></category>
		<category><![CDATA[#SparkFilter]]></category>
		<category><![CDATA[#SparkJoins]]></category>
		<category><![CDATA[#SparkPerformance]]></category>
		<category><![CDATA[#SparkSQL]]></category>
		<category><![CDATA[#SparkWithPython]]></category>
		<category><![CDATA[#SQL]]></category>
		<category><![CDATA[#SQLExecutionOrder]]></category>
		<category><![CDATA[#SQLQueryOptimization]]></category>
		<category><![CDATA[#SQLToPySpark]]></category>
		<category><![CDATA[#SQLvsPySpark]]></category>
		<category><![CDATA[#WindowFunctions]]></category>
		<guid isPermaLink="false">https://dataforgeeks.com/?p=2490</guid>

					<description><![CDATA[<p>When writing SQL queries, it is essential to understand the order in which SQL clauses are executed. This helps in writing optimized queries, especially when transitioning from SQL to PySpark. In this blog, we’ll walk you through the SQL execution order, the SQL clauses, and provide their corresponding PySpark syntax. SQL Execution Order and Corresponding ... <a title="Understanding SQL Execution Order and Corresponding PySpark Syntax" class="read-more" href="https://dataforgeeks.com/understanding-sql-execution-order-and-corresponding-pyspark-syntax/2490/" aria-label="Read more about Understanding SQL Execution Order and Corresponding PySpark Syntax">Read more</a></p>
<p>The post <a href="https://dataforgeeks.com/understanding-sql-execution-order-and-corresponding-pyspark-syntax/2490/">Understanding SQL Execution Order and Corresponding PySpark Syntax</a> appeared first on <a href="https://dataforgeeks.com">DataForGeeks</a>.</p>
]]></description>
		
		
		
			</item>
		<item>
		<title>Snowflake – Performance Tuning and Best Practices</title>
		<link>https://dataforgeeks.com/snowflake-performance-tuning-and-best-practices/2338/</link>
					<comments>https://dataforgeeks.com/snowflake-performance-tuning-and-best-practices/2338/#comments</comments>
		
		<dc:creator><![CDATA[Nikhil Aggarwal]]></dc:creator>
		<pubDate>Sat, 14 May 2022 20:39:55 +0000</pubDate>
				<category><![CDATA[Snowflake]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[azure]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[partition]]></category>
		<category><![CDATA[partitioning]]></category>
		<category><![CDATA[s3]]></category>
		<category><![CDATA[snowflake]]></category>
		<category><![CDATA[tuning]]></category>
		<guid isPermaLink="false">https://dataforgeeks.com/?p=2338</guid>

					<description><![CDATA[<p>Snowflake’s cloud-native architecture makes it incredibly easy to get started — but running it efficiently at scale is a whole different game. If you’ve ever faced slow queries, ballooning credit consumption, or unpredictable performance, you’re not alone. Tuning Snowflake workloads requires more than just adjusting warehouse sizes — it involves understanding how Snowflake stores data, ... <a title="Snowflake – Performance Tuning and Best Practices" class="read-more" href="https://dataforgeeks.com/snowflake-performance-tuning-and-best-practices/2338/" aria-label="Read more about Snowflake – Performance Tuning and Best Practices">Read more</a></p>
<p>The post <a href="https://dataforgeeks.com/snowflake-performance-tuning-and-best-practices/2338/">Snowflake – Performance Tuning and Best Practices</a> appeared first on <a href="https://dataforgeeks.com">DataForGeeks</a>.</p>
]]></description>
		
					<wfw:commentRss>https://dataforgeeks.com/snowflake-performance-tuning-and-best-practices/2338/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Apache Spark &#8211; Performance Tuning and Best Practices</title>
		<link>https://dataforgeeks.com/apache-spark-performance-tuning-and-best-practices/2305/</link>
					<comments>https://dataforgeeks.com/apache-spark-performance-tuning-and-best-practices/2305/#comments</comments>
		
		<dc:creator><![CDATA[Nikhil Aggarwal]]></dc:creator>
		<pubDate>Wed, 04 May 2022 12:33:18 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[apache spark]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[bigdata]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[partition]]></category>
		<category><![CDATA[spark]]></category>
		<category><![CDATA[tuning]]></category>
		<guid isPermaLink="false">https://dataforgeeks.com/?p=2305</guid>

					<description><![CDATA[<p>Apache Spark has revolutionized the way we process large-scale data — delivering unparalleled speed, scalability, and flexibility. But as many engineers discover, achieving optimal performance in Spark is far from automatic. Your job runs — but takes longer than expected. The cluster scales — but the costs rise disproportionately. Memory errors appear out of nowhere. ... <a title="Apache Spark &#8211; Performance Tuning and Best Practices" class="read-more" href="https://dataforgeeks.com/apache-spark-performance-tuning-and-best-practices/2305/" aria-label="Read more about Apache Spark &#8211; Performance Tuning and Best Practices">Read more</a></p>
<p>The post <a href="https://dataforgeeks.com/apache-spark-performance-tuning-and-best-practices/2305/">Apache Spark &#8211; Performance Tuning and Best Practices</a> appeared first on <a href="https://dataforgeeks.com">DataForGeeks</a>.</p>
]]></description>
		
					<wfw:commentRss>https://dataforgeeks.com/apache-spark-performance-tuning-and-best-practices/2305/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Data Serialisation &#8211; Avro vs Protocol Buffers</title>
		<link>https://dataforgeeks.com/data-serialisation-avro-vs-protocol-buffers/2015/</link>
					<comments>https://dataforgeeks.com/data-serialisation-avro-vs-protocol-buffers/2015/#comments</comments>
		
		<dc:creator><![CDATA[Nikhil Aggarwal]]></dc:creator>
		<pubDate>Wed, 23 Mar 2022 20:43:39 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[avro]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[format]]></category>
		<category><![CDATA[protobuf]]></category>
		<category><![CDATA[protobuffer]]></category>
		<guid isPermaLink="false">https://dataforgeeks.com/?p=2015</guid>

					<description><![CDATA[<p>Background File Formats Evolution Why not use CSV/XML/JSON?&#160; Repeated or no meta information. Files are not splittable, so cannot be used in a map-reduce environment. Missing/limited schema definition and evolution support. Can leverage &#8220;JsonSchema&#8221; to maintain schema separately for JSON. It may still require transformation based on a schema, so why not consider Avro/Proto? ... <a title="Data Serialisation &#8211; Avro vs Protocol Buffers" class="read-more" href="https://dataforgeeks.com/data-serialisation-avro-vs-protocol-buffers/2015/" aria-label="Read more about Data Serialisation &#8211; Avro vs Protocol Buffers">Read more</a></p>
<p>The post <a href="https://dataforgeeks.com/data-serialisation-avro-vs-protocol-buffers/2015/">Data Serialisation &#8211; Avro vs Protocol Buffers</a> appeared first on <a href="https://dataforgeeks.com">DataForGeeks</a>.</p>
]]></description>
		
					<wfw:commentRss>https://dataforgeeks.com/data-serialisation-avro-vs-protocol-buffers/2015/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>Count(*) &#8211; Explaining different behaviour in Joins</title>
		<link>https://dataforgeeks.com/count-explaining-different-behaviour-in-joins/2206/</link>
					<comments>https://dataforgeeks.com/count-explaining-different-behaviour-in-joins/2206/#respond</comments>
		
		<dc:creator><![CDATA[Nikhil Aggarwal]]></dc:creator>
		<pubDate>Fri, 04 Feb 2022 13:27:00 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[impala]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[sqlquery]]></category>
		<guid isPermaLink="false">https://dataforgeeks.com/?p=2206</guid>

					<description><![CDATA[<p>Observations:&#160; Count(1) or Count(*)&#160;&#8211; This is never expanded on each column individually so will work perfectly fine on complete data.&#160; Count(1) is more optimized than Count(*) Count(source.*)&#160;&#8211; source&#160;represents “Left table” of “Left Outer Join”: This will be evaluated as Count(source.col1, source.col2, …. source.colN ) So, if any column has NULL, then the complete row ... <a title="Count(*) &#8211; Explaining different behaviour in Joins" class="read-more" href="https://dataforgeeks.com/count-explaining-different-behaviour-in-joins/2206/" aria-label="Read more about Count(*) &#8211; Explaining different behaviour in Joins">Read more</a></p>
<p>The post <a href="https://dataforgeeks.com/count-explaining-different-behaviour-in-joins/2206/">Count(*) &#8211; Explaining different behaviour in Joins</a> appeared first on <a href="https://dataforgeeks.com">DataForGeeks</a>.</p>
]]></description>
		
					<wfw:commentRss>https://dataforgeeks.com/count-explaining-different-behaviour-in-joins/2206/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
