<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Darius Kharazi's Blog]]></title><description><![CDATA[Data Scientist living and working in Columbus.]]></description><link>http://github.com/dylang/node-rss</link><generator>GatsbyJS</generator><lastBuildDate>Mon, 24 Oct 2022 14:28:52 GMT</lastBuildDate><item><title><![CDATA[An Open-Source ML Pipeline]]></title><description><![CDATA[Whether our model is predicting churn, detecting fraud, or forecasting sales, there are a few components that are common across any ML pipeline. In particular, every ML pipeline includes the following…]]></description><link>https://dkharazi.github.io/blog/mlpipeline</link><guid isPermaLink="false">https://dkharazi.github.io/blog/mlpipeline</guid><pubDate>Sat, 03 Sep 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Whether our model is predicting churn, detecting fraud, or forecasting sales, there are a few components that are common across any ML pipeline. In particular, every ML pipeline includes the following features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;feature store&lt;/strong&gt; for model training and inference &lt;/li&gt;
&lt;li&gt;Many &lt;strong&gt;CI/CD validations&lt;/strong&gt; to detect changes in those features&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;experimentation environment&lt;/strong&gt; for logging previous model runs and metrics&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;model registry&lt;/strong&gt; for serving and version controlling production models&lt;/li&gt;
&lt;li&gt;A service for &lt;strong&gt;tuning hyper-parameters&lt;/strong&gt; efficiently&lt;/li&gt;
&lt;li&gt;A service for &lt;strong&gt;model monitoring&lt;/strong&gt; for automatic model retraining, data drift detection, and reporting&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Using Feast as a Feature Store&lt;/h2&gt;
&lt;p&gt;A feature store is an API best used for low-latency feature retrieval for real-time models being served in production. A feature store plugs into your existing storage infrastructure and orchestrates jobs using your existing processing infrastructure, so it is not a database in most cases. Usually, a feature store takes already transformed features (from Azure Data Lake, AWS Redshift, Snowflake, etc.) and generates feature definitions and metadata for them, in order to improve the performance of the feature retrievals. &lt;/p&gt;
&lt;p&gt;A feature store is used for maintaining features across different data sources in a single, centralized location. Doing this promotes a central catalog for all features, their definitions, and their metadata, which allows data scientists to search, discover, and collaborate on new features. Two common feature stores are Feast, which is open source, and Tecton, which is a commercial managed platform. For more detailed use-cases surrounding feature stores, refer to &lt;a href=&quot;https://docs.feast.dev/#feast-does-not-fully-solve&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Feast&apos;s documentation&lt;/a&gt; and &lt;a href=&quot;https://www.tecton.ai/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Tecton&apos;s documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Feast comes with two components: a &lt;em&gt;registry&lt;/em&gt; and a &lt;em&gt;feature store&lt;/em&gt;. The default Feast registry is &lt;a href=&quot;https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws/scaling-feast#scaling-feast-registry&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;a file-based registry&lt;/a&gt;, where feature definitions, metadata, and versions are tracked in your local file system under a file named &lt;em&gt;registry.db&lt;/em&gt;. In production, Feast recommends using a more scalable SQL-based registry that is backed by a database, such as PostgreSQL or MySQL. More specifically, the registry holds &lt;em&gt;feature views&lt;/em&gt;. A feature view is an object that represents a logical, unmaterialized group of features consisting of the following information:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The data source&lt;/li&gt;
&lt;li&gt;The specified features&lt;/li&gt;
&lt;li&gt;A name to identify this feature view in Feast&lt;/li&gt;
&lt;li&gt;Any additional metadata, like schema, description, or tags&lt;/li&gt;
&lt;/ul&gt;
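&lt;p&gt;To make the list above concrete, here is a minimal sketch of the information a registry might track per feature view. It is plain Python and does not use Feast&apos;s API; the class name, the table names, and the tag values are all assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class FeatureViewRecord:
    """Hypothetical stand-in for the metadata a registry keeps per feature view."""
    name: str                      # identifies the feature view
    source: str                    # where the already-transformed data lives
    features: dict                 # feature name mapped to data type (the schema)
    description: str = ""
    tags: dict = field(default_factory=dict)


# Conceptually, a registry is a catalog keyed by feature view name
registry = {}


def register(view):
    """Add or overwrite a feature view definition in the catalog."""
    registry[view.name] = view


def search_by_tag(key, value):
    """Return the sorted names of feature views carrying a given tag."""
    return sorted(name for name, view in registry.items() if view.tags.get(key) == value)


register(FeatureViewRecord(
    name="customer_stats",
    source="warehouse.customer_daily",   # hypothetical table name
    features={"order_count_30d": "int", "avg_basket": "float"},
    description="Rolling customer purchase statistics",
    tags={"team": "growth"},
))
register(FeatureViewRecord(
    name="store_stats",
    source="warehouse.store_daily",      # hypothetical table name
    features={"footfall_7d": "int"},
    tags={"team": "ops"},
))

print(search_by_tag("team", "growth"))
```

&lt;p&gt;Searching the catalog by tag is the kind of discovery a central registry is meant to enable. A real registry would also persist these records and track their versions.&lt;/p&gt;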
&lt;p&gt;The second component within Feast is the feature store, which can be an &lt;em&gt;offline&lt;/em&gt; store or an &lt;em&gt;online&lt;/em&gt; store. The offline store persists batch data from feature views. By default, the offline store will not log features and will instead run queries against the source data. However, &lt;a href=&quot;https://docs.feast.dev/getting-started/architecture-and-components/overview#components&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;offline stores can be configured to support feature writes&lt;/a&gt; to an offline destination. An online store is a database that stores only the last values for real-time inference. The online store is populated through materialization jobs from an offline store.&lt;/p&gt;
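&lt;p&gt;The difference between the two stores can be sketched in a few lines of plain Python. This is not Feast&apos;s API: the row layout and the function names &lt;code class=&quot;language-text&quot;&gt;materialize&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;point_in_time&lt;/code&gt; are assumptions chosen for illustration. The offline side keeps the full history and answers lookups as of a timestamp, while materialization keeps only the latest row per entity.&lt;/p&gt;

```python
from datetime import datetime

# Batch rows as an offline source might hold them: full history per entity.
# All names and values here are made up for illustration.
offline_rows = [
    {"customer_id": 1, "event_ts": datetime(2022, 9, 1), "order_count_30d": 4},
    {"customer_id": 1, "event_ts": datetime(2022, 9, 2), "order_count_30d": 5},
    {"customer_id": 2, "event_ts": datetime(2022, 9, 1), "order_count_30d": 1},
]


def materialize(rows, entity_key, ts_key):
    """Keep only the most recent row per entity, as an online store would."""
    online = {}
    for row in rows:
        current = online.get(row[entity_key])
        if current is None or row[ts_key] > current[ts_key]:
            online[row[entity_key]] = row
    return online


def point_in_time(rows, entity_key, ts_key, entity, as_of):
    """Offline-style lookup: latest row for an entity at or before a timestamp."""
    candidates = [r for r in rows if r[entity_key] == entity and as_of >= r[ts_key]]
    return max(candidates, key=lambda r: r[ts_key]) if candidates else None


online_store = materialize(offline_rows, "customer_id", "event_ts")

# Real-time inference reads the latest value
print(online_store[1]["order_count_30d"])

# Training reads the value that was known at the time of the training example
first_of_sept = datetime(2022, 9, 1)
print(point_in_time(offline_rows, "customer_id", "event_ts", 1, first_of_sept)["order_count_30d"])
```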
&lt;p&gt;&lt;img src=&quot;/511318f1a0e65fe06a34fffa818790a9/feast_architecture.png&quot; alt=&quot;FeastArchitecture&quot;&gt;&lt;/p&gt;
&lt;p&gt;The offline store is preferred for fetching features when training a model or making daily or weekly predictions. On the other hand, the online store is preferred for fetching features when making real-time predictions (e.g. fraud detection). For more details about the feature registry, refer to &lt;a href=&quot;https://docs.feast.dev/getting-started/architecture-and-components/registry&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;their registry documentation&lt;/a&gt; and &lt;a href=&quot;https://docs.feast.dev/getting-started/concepts/feature-view&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;documentation about feature views&lt;/a&gt;. For more details about the various feature retrieval patterns, refer to &lt;a href=&quot;https://docs.feast.dev/getting-started/concepts/overview#feature-registration-and-retrieval&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;their documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Using Great Expectations for CI/CD Data Validations&lt;/h2&gt;
&lt;p&gt;Great Expectations is an API used for testing, profiling, and documenting expected feature properties, including expected value ranges and data types. The data profiler will automatically generate its own expectations in a report, illustrating example values, percentages of missing values for columns, histograms of numeric columns, etc. Manual tests can be included as well, such as allowed column values or thresholds on null percentages that trigger warnings. Great Expectations can be scheduled and orchestrated in a CI/CD pipeline using Airflow or Kubeflow.&lt;/p&gt;
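&lt;p&gt;The idea behind such manual tests can be sketched in plain Python. The two functions below are hypothetical and are not part of the Great Expectations API; they only mimic the pattern of declaring an expectation and getting back a result that a CI/CD step could fail on.&lt;/p&gt;

```python
def expect_values_between(rows, column, low, high):
    """Pass when every non-null value of the column lies within [low, high]."""
    values = [r[column] for r in rows if r[column] is not None]
    unexpected = [v for v in values if not (v >= low and high >= v)]
    return {"expectation": "values_between", "column": column,
            "success": not unexpected, "unexpected": unexpected}


def expect_null_fraction_at_most(rows, column, max_fraction):
    """Pass when the share of null values does not exceed max_fraction."""
    nulls = sum(1 for r in rows if r[column] is None)
    fraction = nulls / len(rows)
    return {"expectation": "null_fraction_at_most", "column": column,
            "success": max_fraction >= fraction, "observed": fraction}


# Made-up batch of feature rows
rows = [
    {"age": 34, "basket": 20.5},
    {"age": 51, "basket": None},
    {"age": 230, "basket": 12.0},   # deliberately out of range
    {"age": 19, "basket": 8.25},
]

results = [
    expect_values_between(rows, "age", 0, 120),
    expect_null_fraction_at_most(rows, "basket", 0.5),
]

# A pipeline step could raise or warn when this list is not empty
failed = [r for r in results if not r["success"]]
for r in failed:
    print("FAILED:", r)
```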
&lt;p&gt;Feast can use Great Expectations as a validation engine and data profiler. As a result, we can specify the expected data types and value ranges for input columns. For a detailed example of using Great Expectations with Feast, refer to &lt;a href=&quot;https://docs.feast.dev/tutorials/validating-historical-features&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;their documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Using MLflow for Model Registry and Experiment Tracking&lt;/h2&gt;
&lt;p&gt;A model registry is a centralized object store that stores binary model objects with any metadata. By default, the model registry typically is stored in the local file system or a SQL database, but a remote object store also can be specified (e.g. AWS S3, ADLS, GCS, etc.). The metadata that is stored with the models could include model versions, stages, registry dates, and tags.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/759c58e33f913ddbb7afd2bff99ae7e1/mlflow_model.png&quot; alt=&quot;MLflowArchitecture&quot;&gt;&lt;/p&gt;
&lt;p&gt;MLflow is one example of a model registry that documents a model&apos;s lifecycle. For example, MLflow will document the model versions and allow for the model to have different stages. Registered models can be in the production stage, staging stage, or archived stage. The production stage is meant for production-ready models serving inference, whereas the staging stage is meant for a pre-production model that is meant for testing and intended to be put into production in the future. Archived models are meant for previous models in production. For more details about the model registry component in MLflow, refer to &lt;a href=&quot;https://www.mlflow.org/docs/latest/model-registry.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the documentation&lt;/a&gt;.&lt;/p&gt;
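&lt;p&gt;The lifecycle described above can be sketched as a toy in-memory registry. This is not MLflow&apos;s API: the class, its method names, and the rule that promoting a version archives the previous production version are assumptions made for illustration.&lt;/p&gt;

```python
class ModelRegistry:
    """Toy in-memory registry; not MLflow's API, only the lifecycle idea."""

    STAGES = ("None", "Staging", "Production", "Archived")

    def __init__(self):
        # model name mapped to a list of {"version", "stage", "model"} entries
        self.versions = {}

    def register(self, name, model):
        """Store a new version of a model and return its version number."""
        entries = self.versions.setdefault(name, [])
        entry = {"version": len(entries) + 1, "stage": "None", "model": model}
        entries.append(entry)
        return entry["version"]

    def transition(self, name, version, stage):
        """Move a version to a stage, archiving any current production version."""
        if stage not in self.STAGES:
            raise ValueError("unknown stage: " + stage)
        for entry in self.versions[name]:
            # Only one version serves production at a time in this sketch
            if stage == "Production" and entry["stage"] == "Production":
                entry["stage"] = "Archived"
        self.versions[name][version - 1]["stage"] = stage

    def stage_of(self, name, version):
        return self.versions[name][version - 1]["stage"]


registry = ModelRegistry()
v1 = registry.register("ltv_model", model="binary-v1")
v2 = registry.register("ltv_model", model="binary-v2")
registry.transition("ltv_model", v1, "Production")
registry.transition("ltv_model", v2, "Staging")
registry.transition("ltv_model", v2, "Production")   # v1 becomes Archived
print(registry.stage_of("ltv_model", v1), registry.stage_of("ltv_model", v2))
```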
&lt;p&gt;MLflow registers models that have gone through experimentation. A model experiment tracks and logs previous model runs with any specified metrics, hyper-parameter choices, training date, and other metadata. By doing this, we can compare previous model runs with each other by observing their feature importances, accuracies, hyper-parameters, etc. A tracking UI comes with MLflow, which allows you to visualize these model runs and download model artifacts or metadata. Refer to the &lt;a href=&quot;https://www.mlflow.org/docs/latest/tracking.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;MLflow documentation&lt;/a&gt; for more details about logging experiments in MLflow.&lt;/p&gt;
&lt;h2&gt;Using Hyperopt for Distributed Hyper-Parameter Tuning&lt;/h2&gt;
&lt;p&gt;When logging model runs in an experiment, hyper-parameter searches can be logged using GridSearch or other packages like Hyperopt. Hyperopt is an API that facilitates distributed hyperparameter tuning and model selection. Hyperopt allows models to scan a set of hyperparameters across a specified or learned space.&lt;/p&gt;
&lt;p&gt;The basic steps when using Hyperopt are the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Define an objective function to minimize&lt;/li&gt;
&lt;li&gt;Define the hyperparameter search space&lt;/li&gt;
&lt;li&gt;Specify the search algorithm&lt;/li&gt;
&lt;li&gt;Run the Hyperopt function fmin()&lt;/li&gt;
&lt;/ol&gt;
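&lt;p&gt;The four steps above can be sketched without Hyperopt itself. The example below is plain Python and uses random search as the search algorithm; the &lt;code class=&quot;language-text&quot;&gt;minimize&lt;/code&gt; function only plays the role that &lt;code class=&quot;language-text&quot;&gt;fmin&lt;/code&gt; plays in Hyperopt, and the toy objective is an assumption standing in for a real validation loss.&lt;/p&gt;

```python
import random


# 1. Objective function to minimize (a stand-in for a validation loss)
def objective(params):
    loss = (params["learning_rate"] - 0.06) ** 2 + 0.01 * abs(params["max_depth"] - 5)
    return {"loss": loss, "status": "ok"}


# 2. Search space: each entry draws one candidate value
space = {
    "learning_rate": lambda rng: rng.choice([0.01, 0.02, 0.06, 0.08, 0.1]),
    "max_depth": lambda rng: rng.choice([3, 4, 5, 6, 7, 8]),
}


# 3. Search algorithm: plain random search
def random_suggest(space, rng):
    return {name: draw(rng) for name, draw in space.items()}


# 4. Minimization loop that evaluates candidates and keeps the best one
def minimize(objective, space, suggest, max_evals, seed=0):
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(max_evals):
        params = suggest(space, rng)
        result = objective(params)
        if result["status"] == "ok" and best_loss > result["loss"]:
            best_params, best_loss = params, result["loss"]
    return best_params, best_loss


best_params, best_loss = minimize(objective, space, random_suggest, max_evals=200)
print(best_params, best_loss)
```

&lt;p&gt;Random search is used here only because it is short to write. Smarter algorithms use the losses seen so far to decide which candidates to try next.&lt;/p&gt;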
&lt;p&gt;In most cases, the objective function is the training or validation loss function.
Hyperopt uses stochastic tuning algorithms that perform a more efficient search of hyperparameter space than a deterministic grid search. The &lt;code class=&quot;language-text&quot;&gt;fmin&lt;/code&gt; function executes a run by identifying the set of hyperparameters that minimizes the objective function. The &lt;code class=&quot;language-text&quot;&gt;fmin&lt;/code&gt; function accepts the objective function, the hyper-parameter space, the search algorithm, and an optional SparkTrials object. The SparkTrials object allows you to distribute Hyperopt trials from a single machine across Spark workers. For more high-level use-cases about Hyperopt, refer to &lt;a href=&quot;https://docs.databricks.com/machine-learning/automl-hyperparam-tuning/index.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the documentation in Databricks&lt;/a&gt;. For more details about the components of Hyperopt, refer to &lt;a href=&quot;https://docs.databricks.com/machine-learning/automl-hyperparam-tuning/hyperopt-concepts.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the Hyperopt concepts page in the Databricks documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The objective function is where we implement the training portion of the code. It returns a dictionary containing the loss value and a status. For more details about SparkTrials, refer to &lt;a href=&quot;https://docs.databricks.com/machine-learning/automl-hyperparam-tuning/hyperopt-concepts.html#the-sparktrials-class&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the Databricks documentation&lt;/a&gt;. For more examples about defining a function for optimization, refer to &lt;a href=&quot;https://github.com/hyperopt/hyperopt/wiki/FMin#1-defining-a-function-to-minimize&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the Hyperopt documentation&lt;/a&gt;. Note that if you&apos;re experiencing memory issues, you may benefit from loading the training/validation data as binary objects instead, since Hyperopt can be memory intensive. For a sample objective function, refer to the following code snippet:&lt;/p&gt;
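&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; mlflow
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;xgboost
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; xgboost &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; xgb
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; hyperopt &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; hp&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; STATUS_OK
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; sklearn&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;metrics &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; mean_absolute_error&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; mean_squared_error

&lt;span class=&quot;token comment&quot;&gt;# Initialize possible hyper-parameter space for tuning&lt;/span&gt;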
space &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&apos;learning_rate&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; hp&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;choice&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;learning_rate&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.01&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.02&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.06&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.08&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&apos;max_depth&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; hp&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;choice&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;max_depth&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&apos;objective&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;reg:squarederror&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&apos;n_jobs&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;token string&quot;&gt;&apos;random_state&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Initialize objective function iterating over hyper-parameter space&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;objective&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;space&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
     
    &lt;span class=&quot;token comment&quot;&gt;# Autolog model details and signatures for each child run&lt;/span&gt;
    mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;xgboost&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;autolog&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;with&lt;/span&gt; mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;start_run&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;run_name&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;run_name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; tags&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;tags&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; nested&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
         
        &lt;span class=&quot;token comment&quot;&gt;# Load in training and validation data from DBFS saved above to preserve memory&lt;/span&gt;
        train &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; xgb&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DMatrix&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;/dbfs/FileStore/shared_uploads/darius_kharazi@anfcorp.com/train.buffer&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        validation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; xgb&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DMatrix&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;/dbfs/FileStore/shared_uploads/darius_kharazi@anfcorp.com/validation.buffer&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
         
        &lt;span class=&quot;token comment&quot;&gt;# Initialize and fit model with different combinations of tuned hyper-parameters&lt;/span&gt;
        model &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; xgb&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;train&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;params&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;space&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtrain&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;train&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; evals&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;validation&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;validation&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
 
        &lt;span class=&quot;token comment&quot;&gt;# Predict on validation data&lt;/span&gt;
        test_y_ltv_pred &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; model&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;predict&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;validation&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
 
        &lt;span class=&quot;token comment&quot;&gt;# Calculate accuracy metrics&lt;/span&gt;
        r &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;mean_squared_error&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;validation&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;get_label&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; test_y_ltv_pred&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; squared&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token boolean&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        m &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;mean_absolute_error&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;validation&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;get_label&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; test_y_ltv_pred&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
 
        &lt;span class=&quot;token comment&quot;&gt;# Log RMSE and MAE metrics to mlflow experiment&lt;/span&gt;
        mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;log_metric&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;rmse&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        mlflow&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;log_metric&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;mae&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; m&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
 
    &lt;span class=&quot;token comment&quot;&gt;# Return RMSE for hyperopt optimization&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;loss&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;status&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; STATUS_OK&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When defining hyper-parameter ranges within a search space, use the &lt;a href=&quot;http://hyperopt.github.io/hyperopt/getting-started/search_spaces/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;helper functions&lt;/a&gt; implemented in Hyperopt. Memory issues may appear if the ranges for particular hyper-parameters are too large. The SparkTrials object is available for distributed hyper-parameter tuning. For additional tips about defining a search space, refer to the &lt;a href=&quot;https://github.com/hyperopt/hyperopt/wiki/FMin#2-defining-a-search-space&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Hyperopt documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Using Evidently for Model Monitoring&lt;/h2&gt;
&lt;p&gt;Model monitoring involves maintaining registered or production-ready models and their metrics, data values, and other properties. Model maintenance can often be costly, since it impacts every data scientist and requires manual analysis about when and why accuracies worsen over time. The quality of a model in production can decline over time for many different reasons. For example, inventories could change over time, customer preferences could change over time, and regional changes could happen. Automating the model monitoring process offers the following benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Automates the model retraining process&lt;/li&gt;
&lt;li&gt;Automates metric and accuracy analysis&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Evidently is an API that detects and alerts when there are changes in a model&apos;s quality. Evidently can help ensure models are retrained once a model drops below an accuracy threshold, which is assigned manually by the user. Evidently also saves and tracks any specified metrics and statistics for each model, which are illustrated through visual reports.&lt;/p&gt;
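&lt;p&gt;One kind of check that a monitoring tool automates is comparing the distribution of an input feature at training time against its distribution in production. The sketch below is plain Python and does not use Evidently&apos;s API. It computes a two-sample Kolmogorov-Smirnov statistic and flags the feature when the statistic exceeds a threshold; the function names, the sample values, and the 0.3 threshold are all assumptions made for illustration.&lt;/p&gt;

```python
def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    points = sorted(set(reference) | set(current))

    def ecdf(sample, x):
        # Share of the sample that is at most x
        return sum(1 for v in sample if x >= v) / len(sample)

    return max(abs(ecdf(reference, x) - ecdf(current, x)) for x in points)


def drift_report(reference, current, threshold=0.3):
    """Flag a feature when its KS statistic exceeds the chosen threshold."""
    stat = ks_statistic(reference, current)
    return {"ks_statistic": stat, "drift_detected": stat > threshold}


reference = [10, 12, 11, 13, 12, 11, 10, 12]   # feature values at training time
stable = [11, 12, 10, 13, 12, 11, 12, 10]      # same values in a different order
shifted = [15, 17, 16, 18, 17, 16, 15, 17]     # every value shifted upward by 5

print(drift_report(reference, stable))
print(drift_report(reference, shifted))
```

&lt;p&gt;A report like this could be generated on a schedule, with an alert or a retraining job triggered whenever a feature is flagged.&lt;/p&gt;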
&lt;p&gt;&lt;img src=&quot;/f30e20d32d70d0a2991f22ce3d44e642/evidently_example.png&quot; alt=&quot;EvidentlyExample&quot;&gt;&lt;/p&gt;
&lt;p&gt;Statistical tests to compare the input feature distributions can be triggered automatically, while alerts can be emailed alongside visual reports illustrating any drifts in the input features. By doing this, we can understand why model accuracies have changed over time based on drifts in the inputs. This highlights a common problem within model monitoring known as &lt;em&gt;data drift&lt;/em&gt;, which specifically refers to distributional changes of any input features that a model is trained on. On a similar note, &lt;em&gt;target drift&lt;/em&gt; may also occur, which refers to distributional changes of any model outputs (or predictions). Target drifts are caused by changes in the model inputs in most cases. For high-level details about model monitoring, refer to &lt;a href=&quot;https://databricks.com/wp-content/uploads/2019/09/8-1-2019-Productionizing-ML_-From-Deployment-to-Drift-Detection-Webinar.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this webinar&lt;/a&gt; illustrating the occurrence of model staleness in production. For a detailed walkthrough of setting up and using Evidently for model monitoring, refer to &lt;a href=&quot;https://www.evidentlyai.com/blog/tutorial-evidently-ml-monitoring-cs329s&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this example&lt;/a&gt; in their documentation.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Goodhart's Law]]></title><description><![CDATA[I've recently become less reliant on measures like Yelp and Google reviews when selecting a restaurant. I've noticed many restaurants with many high Yelp reviews are sometimes not as good as…]]></description><link>https://dkharazi.github.io/blog/goodhart</link><guid isPermaLink="false">https://dkharazi.github.io/blog/goodhart</guid><pubDate>Tue, 23 Aug 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve recently become less reliant on measures like Yelp and Google reviews when selecting a restaurant. 
I&apos;ve noticed many restaurants with many high Yelp reviews are sometimes not as good as restaurants with fewer Yelp reviews. On the flip side, many restaurants with lower ratings are sometimes better than those with higher ratings. To be clear, I still think these ratings are useful in general, but I&apos;ve just become less reliant on them.&lt;/p&gt;
&lt;p&gt;Goodhart&apos;s Law says &lt;em&gt;when a measure becomes a target, then it ceases to be a good measure&lt;/em&gt;. In other words, when we rely on a metric so consistently over a long period of time, then it inexorably ceases to function as that metric because people will start to game it. I feel Goodhart&apos;s Law holds up fairly well across many different situations, though it is truer in some cases than in others, especially as the measure becomes more widely used. It is closely related to the &lt;a href=&quot;https://en.wikipedia.org/wiki/Perverse_incentive&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Cobra Effect&lt;/a&gt;, which dates back to an old British policy enforced to control the population of cobras in India. At the time, there were too many cobras in India, so the British government placed a bounty on dead cobras. However, this caused the locals to breed cobras so they could make even more money off of the bounty.&lt;/p&gt;
&lt;p&gt;A more recent example of Goodhart&apos;s Law is the increase in fake social media accounts, either rating restaurants well or following other users to boost their follower count. These negative externalities can even arise in A/B testing. For example, if teams are told to improve one metric (or even a few metrics) for growth, then teams can be incentivized to game that metric. A simple example of this is if a company wants to increase their views per visit, so the team paginates content into smaller pages. For more examples of how A/B testing can be gamed, refer to this &lt;a href=&quot;https://towardsdatascience.com/goodharts-law-and-the-dangers-of-metric-selection-with-a-b-testing-91b48d1c1bef&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;blog post&lt;/a&gt;. As a result, metrics need to be carefully considered and monitored to make sure they&apos;re not being gamed, especially in A/B tests.&lt;/p&gt;
&lt;p&gt;Another example of Goodhart&apos;s Law could be p-hacking. Many researchers need to publish scientific papers to stay in academia. Since very few esteemed journals publish negative results, scientists can often feel pressured into producing positive results. Even without any malicious intent, they begin looking for positive results worth publishing due to this misalignment of incentives. Instead of following the scientific method by forming a hypothesis and then testing it, they could begin by testing many different hypotheses to see which ones might produce positive results. With enough tests, there is a good chance of finding something positive (i.e. a test that satisfies the p &amp;#x3C; &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;0.05&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;0.05&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;) that is semi-similar to the original hypothesis, but isn&apos;t quite the same.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[The Dead Sea Effect]]></title><description><![CDATA[Recently, I read a post from Bruce Webster's blog that highlights a pattern happening at large organizations, where the quality of retained employees exponentially worsens over time when talented…]]></description><link>https://dkharazi.github.io/blog/deadsea</link><guid 
isPermaLink="false">https://dkharazi.github.io/blog/deadsea</guid><pubDate>Mon, 22 Aug 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Recently, I read &lt;a href=&quot;http://brucefwebster.com/2008/04/11/the-wetware-crisis-the-dead-sea-effect/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;a post from Bruce Webster&apos;s blog&lt;/a&gt; that highlights a pattern happening at large organizations, where the quality of retained employees exponentially worsens over time when talented employees leave the company. In many cases, the most talented employees will be the most likely ones to leave the company if there is a workplace problem, since they are the ones most likely to secure opportunities at other companies quickly. As a result, the least talented employees are the ones that remain if there is a workplace problem. As talented employees continue to leave the organization, backfilling their roles with other talented employees becomes even more difficult, since talented candidates either notice the lack of quality right away and look elsewhere, or they join and leave shortly afterwards. As a result, the only employees who are retained over the long-term are those that are less talented, and the degradation of talent at the large organization becomes self-reinforcing. Webster names this pattern &lt;em&gt;The Dead Sea effect&lt;/em&gt;, since he noticed talented engineers &lt;em&gt;evaporate&lt;/em&gt; similar to water in the Dead Sea, where less talented employees are the remaining &lt;em&gt;salt residue&lt;/em&gt;. For additional context, water collected in the Dead Sea evaporates more quickly than water in the open ocean, making it one of the saltiest bodies of water in the world.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Comparing Common Implicit Recommenders]]></title><description><![CDATA[Since ALS was used by researchers at Yahoo in 2008, more recommenders have been developed to handle implicit data interactions. 
Many of these recommenders are based on matrix factorization and still…]]></description><link>https://dkharazi.github.io/blog/als</link><guid isPermaLink="false">https://dkharazi.github.io/blog/als</guid><pubDate>Tue, 15 Mar 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Since &lt;a href=&quot;http://www.yifanhu.net/PUB/cf.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;ALS was used&lt;/a&gt; by researchers at Yahoo in &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;2008&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;2008&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, more recommenders have been developed to handle implicit data interactions. Many of these recommenders are based on matrix factorization and still suffer from the cold-start problem, which refers to the inability to make predictions about new customers or new items, without being forced to retrain the entire model. I&apos;ve seen many different variants of the cold-start problem in different blogs and papers, but this seems to be the most prevalent definition.&lt;/p&gt;
&lt;p&gt;Content-based models, like decision trees or clustering models, often don&apos;t experience the cold-start problem, since only the features of a new user are necessary for these models to make predictions (rather than the user ID itself). On the other hand, collaborative filtering models often are incapable of modeling a new or recently added user, since the whole model must be retrained for the collaborative filtering model to look up that user by his or her ID. For this reason, these collaborative filtering models aren&apos;t really considered to be &lt;em&gt;model-based&lt;/em&gt; and instead are considered to be &lt;em&gt;memory-based&lt;/em&gt;. For a more detailed evaluation of recommenders&apos; behaviors, refer to &lt;a href=&quot;https://dl.acm.org/doi/10.1145/2645710.2645742&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this paper&lt;/a&gt;, which was published and presented at the &lt;a href=&quot;https://recsys.acm.org/recsys14/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;RecSys 2014 Conference&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For the remainder of this post, I plan on illustrating the major similarities and differences between some of the most common implicit recommendation algorithms found in practice.&lt;/p&gt;
&lt;h2&gt;Comparing the ALS and BPR Losses&lt;/h2&gt;
&lt;p&gt;In practice, there are two major differences I&apos;ve found between using ALS and BPR. Obviously, these differences exist because of their distinct loss functions. The first major difference relates to how items are ranked for a user based on their ratings (or relevance scores), and the second major difference relates to how much transfer learning across similar users and items is built into the predicted ratings. Note, most of the following points come from my practical experience with these recommenders, so these points may or may not perfectly align with today&apos;s literature or any future literature.&lt;/p&gt;
&lt;p&gt;Again, the first difference between the two models relates to the degree to which a user&apos;s ranking between rated items is maintained for predicted ratings. The second difference relates to the amount of accurate &lt;em&gt;transfer learning&lt;/em&gt; that is happening between similar items and users when ratings are predicted. When referring to collaborative filtering models, &lt;a href=&quot;https://arxiv.org/pdf/1507.08439.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;transfer learning&lt;/a&gt; typically refers to the degree to which a user&apos;s predicted rating for an item is based on information from other similar users and similar items. In general, models using the BPR loss function have a greater degree of transfer learning built into the predicted ratings, compared to models using the ALS loss function.&lt;/p&gt;
&lt;p&gt;Now, let me expand on the first difference relating to ranking. ALS creates latent factors that will map back to the original rating matrix as accurately as possible, whereas BPR creates latent factors that will make sure the ranking of items for each user is maintained when mapped back to the original rating matrix. For example, let&apos;s say there is a user who gives item A a &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;5.0&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;5.0&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; star rating and item B a &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;4.8&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;4.8&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; star rating. Notice, this user&apos;s first and second top-rated items are close to each other in this example. 
ALS is more likely to create embeddings that could jumble up the ranking of these ratings, meaning the predicted rating could be &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;4.7&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;4.7&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;7&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; stars for item A and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;4.9&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;4.9&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; stars for item B after the ALS embeddings are multiplied together. However, BPR is more likely to create embeddings that would maintain the correct ranking of these ratings. 
So, the predicted rating could be &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;4.3&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;4.3&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; stars for item A and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;3.9&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;3.9&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; stars for item B after the BPR embeddings are multiplied together. Notice, the ranking is maintained using the BPR embeddings, but the user&apos;s ratings are more accurate using the ALS embeddings.&lt;/p&gt;
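&lt;p&gt;To make this trade-off concrete, here is a minimal Python sketch that scores the hypothetical predictions from the example above. The helper names squared_error and ranking are my own for illustration and don&apos;t come from any recommender library.&lt;/p&gt;

```python
# Ratings from the example above: actual vs. hypothetical predictions.
actual = {'A': 5.0, 'B': 4.8}
als_pred = {'A': 4.7, 'B': 4.9}
bpr_pred = {'A': 4.3, 'B': 3.9}


def squared_error(pred, truth):
    """Sum of squared differences between predicted and actual ratings."""
    return sum((pred[item] - truth[item]) ** 2 for item in truth)


def ranking(scores):
    """Items sorted from the highest score to the lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


als_error = squared_error(als_pred, actual)
bpr_error = squared_error(bpr_pred, actual)
als_keeps_order = ranking(als_pred) == ranking(actual)
bpr_keeps_order = ranking(bpr_pred) == ranking(actual)

print('ALS:', als_error, als_keeps_order)
print('BPR:', bpr_error, bpr_keeps_order)
```

&lt;p&gt;In this toy case, the ALS-style predictions have the smaller squared error but swap the order of items A and B, while the BPR-style predictions keep the order but sit further from the actual ratings.&lt;/p&gt;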
&lt;p&gt;As with other recommenders, the BPR model can sometimes fall into a feedback loop of just recommending the most popular item. For example, BPR is more prone to recommend the most popular item if most users highly rate the most popular item and if the total number of available items is small. This point is illustrated in the example below recommending food at a completely fictional Chinese restaurant. In other words, I&apos;ve noticed there is a lesser degree of newness or diversity with predicted ratings from a BPR model compared to an ALS model, since the actual item ranking is maintained much more often with a BPR model. This is a direct effect of the BPR loss function optimizing for an accurate ranking of ratings. &lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/167c6ba064b3e8b0ddfbdbb907dbef41/bpr_bars.svg&quot; alt=&quot;BPR Example Bar Chart&quot;&gt;&lt;/p&gt;
&lt;p&gt;Now, let&apos;s expand on the second difference relating to transfer learning, which is actually a direct effect of the first difference (i.e. optimizing for more accurate ranking). According to &lt;a href=&quot;https://arxiv.org/pdf/1205.2618.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the BPR paper&lt;/a&gt;, the model assumes that a user prefers their observed items (i.e. rated, purchased, viewed, etc.) over any unobserved items (i.e. items without a rating, purchase, view, etc.). For a user&apos;s unobserved items, the BPR model predicts ratings that more accurately reflect the items of similar users.&lt;/p&gt;
&lt;p&gt;As an example, suppose most users who watch horror movies also watch documentaries, and suppose we notice a particular user has only ever watched horror movies. In this situation, there is a better chance of our BPR model predicting ratings, such that horror movies are ranked the highest and documentaries are ranked the next highest on average. On the other hand, our ALS model doesn&apos;t guarantee that documentaries are ranked the next highest. This point is especially true when the rankings of items are somewhat close together. For example, suppose users who watch horror movies watch action movies nearly as often as documentaries, but still not quite as often as documentaries. Then, ALS has a better chance of jumbling up these rankings by predicting ratings that rank action movies slightly higher than documentaries for users who watch horror movies. This happens because BPR optimizes for accurate rankings of ratings, whereas ALS optimizes for accurate ratings. &lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/88bf718e7f05559d2d097732e5c3482e/bpr_median.svg&quot; alt=&quot;BPR Example Bar Chart&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Comparing the BPR and WARP Losses&lt;/h2&gt;
&lt;p&gt;The BPR and WARP loss functions originate from information retrieval theory and have been used in Learning to Rank (LTR) models. BPR focuses on running pairwise comparisons between samples of positive and negative items. More specifically, a BPR model involves selecting a user &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;u&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;u&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;u&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and an item &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;i&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.65952em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; that user &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;u&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;u&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span 
class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;u&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; deems to be positive, which could mean the user viewed the item, purchased the item, rated highly, etc. Then, BPR models will randomly select an item &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;j&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.85396em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; that the same user &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;u&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;u&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;u&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; deems to be negative. 
Once the BPR model has randomly selected a positive item &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;i&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.65952em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and negative item &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;j&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.85396em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, it computes a predicted rating for each item by taking the dot product of that item&apos;s factorized vector with the user&apos;s factorized vector. 
Meaning, the BPR model will calculate the dot product between the factorized vectors user &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;u&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;u&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;u&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and item &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;i&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.65952em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, and it will calculate the dot product between user &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;u&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;u&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord 
mathnormal&quot;&gt;u&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and item &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;j&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.85396em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.&lt;/p&gt;
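&lt;p&gt;The sampling step described above can be sketched in a few lines of Python. This is only a minimal illustration: the observed dictionary, the catalog list, and the sample_triple helper are hypothetical names, and unobserved items are treated as negatives, following the assumption from the BPR paper mentioned earlier.&lt;/p&gt;

```python
import random

# Hypothetical implicit-feedback data: items each user has interacted with.
observed = {
    'user_1': {'horror_1', 'horror_2'},
    'user_2': {'horror_1', 'documentary_1'},
}
catalog = ['horror_1', 'horror_2', 'documentary_1', 'action_1']


def sample_triple(user, rng):
    """Pick one observed (positive) and one unobserved (negative) item."""
    positives = sorted(observed[user])
    negatives = [item for item in catalog if item not in observed[user]]
    return user, rng.choice(positives), rng.choice(negatives)


rng = random.Random(0)
triples = [sample_triple('user_1', rng) for _ in range(5)]
for triple in triples:
    print(triple)
```

&lt;p&gt;Each triple pairs the user with one item they interacted with and one they did not, which is the only input the pairwise comparison needs.&lt;/p&gt;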
&lt;p&gt;Lastly, the BPR model calculates the difference between those two predicted ratings and passes this difference through a sigmoid function. The output is used as a weighting coefficient when updating the model parameters with stochastic gradient descent (or SGD). Essentially, the output of the sigmoid function reflects how confidently the model ranks the positive item above the negative item for that user, so it tells us how close or far apart the two items&apos; predicted ratings are. In summary, we only focus on the relative ranking between items for a user, and we completely disregard how well we predict the rating for each user-item pair.&lt;/p&gt;
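&lt;p&gt;As a rough sketch of this update, the Python below runs SGD on a single (user, positive item, negative item) triple, assuming the standard BPR objective of maximizing the log-sigmoid of the score difference. The vectors, learning rate, and regularization strength are made-up values, and the bpr_step helper is my own naming rather than any library&apos;s API.&lt;/p&gt;

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bpr_step(user, pos, neg, lr=0.05, reg=0.01):
    """One SGD step on a (user, positive item, negative item) triple.

    Returns updated copies of the three latent vectors.
    """
    diff = dot(user, pos) - dot(user, neg)  # predicted rating gap
    weight = 1.0 - sigmoid(diff)            # large when the pair is ranked badly
    new_user = [u + lr * (weight * (p - n) - reg * u)
                for u, p, n in zip(user, pos, neg)]
    new_pos = [p + lr * (weight * u - reg * p) for u, p in zip(user, pos)]
    new_neg = [n + lr * (-weight * u - reg * n) for u, n in zip(user, neg)]
    return new_user, new_pos, new_neg


# Made-up latent vectors for illustration only.
user_u = [0.1, -0.2, 0.3]
item_i = [0.0, 0.1, -0.1]  # positive item
item_j = [0.2, 0.0, 0.4]   # sampled negative item

before = dot(user_u, item_i) - dot(user_u, item_j)
for _ in range(200):
    user_u, item_i, item_j = bpr_step(user_u, item_i, item_j)
after = dot(user_u, item_i) - dot(user_u, item_j)

print('gap before:', before)
print('gap after:', after)
```

&lt;p&gt;The weighting coefficient shrinks toward zero as the positive item pulls ahead of the negative item, so correctly ranked pairs barely move the parameters. Nothing in the update looks at the actual rating values.&lt;/p&gt;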
&lt;p&gt;In a similar fashion, WARP focuses on the ranking of items for each user by using a triplet loss &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;u&lt;/mi&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;v&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;m&lt;/mi&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mi&gt;a&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;v&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;m&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;(user, positive item, negative item)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord 
mathnormal&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, like BPR. 
However, the WARP loss function only updates parameters when the model predicts a negative item has a higher score than the positive item, whereas the BPR loss function updates parameters on every iteration. When iterating, the WARP loss function continues to draw negative samples until it finds a rank violation or hits some assigned threshold given as a hyperparameter. In the initial iterations, WARP makes a larger gradient update for any rank violations, indicating predictions aren&apos;t an accurate reflection of the actual ranking of items for a user. In later iterations, WARP makes smaller updates for any rank violations, since this indicates the model is producing predictions that are a more accurate reflection of the actual ranking of items. In other words, once the model&apos;s predictions are already close to the correct ranking, updates should be small.&lt;/p&gt;
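&lt;p&gt;A minimal Python sketch of this sampling loop is shown below. It assumes the rank-based weighting from the original WARP formulation, where the rank of the positive item is estimated from the number of draws needed to find a violation, and a harmonic sum turns that rank into an update weight. The function names, the margin of 1.0, and the scores are illustrative assumptions, not the API of any particular library.&lt;/p&gt;

```python
import random


def rank_weight(n_items, trials):
    """Update weight from the rank estimated after `trials` draws."""
    est_rank = max((n_items - 1) // trials, 1)  # few draws imply a poor rank
    return sum(1.0 / k for k in range(1, est_rank + 1))


def warp_sample(score_pos, neg_scores, max_trials, rng, margin=1.0):
    """Draw negatives until one violates the margin or the budget runs out.

    Returns (index of the violating negative, update weight), or None
    when no violation is found within max_trials draws.
    """
    n_items = len(neg_scores) + 1
    for trial in range(1, max_trials + 1):
        j = rng.randrange(len(neg_scores))
        if neg_scores[j] > score_pos - margin:  # rank violation
            return j, rank_weight(n_items, trial)
    return None  # cutoff reached, so no update is made


rng = random.Random(0)
# Poorly ranked positive: every negative scores higher, so the first draw violates.
early = warp_sample(0.0, [10.0, 10.0, 10.0, 10.0], max_trials=10, rng=rng)
# Well ranked positive: no negative comes close, so the search hits the cutoff.
converged = warp_sample(5.0, [-10.0, -10.0, -10.0, -10.0], max_trials=10, rng=rng)

print(early, converged)
print(rank_weight(101, 1), rank_weight(101, 50))
```

&lt;p&gt;A violation found on the first draw yields a large weight, while one found only after many draws yields a small weight, and a search that hits the cutoff produces no update at all. The max_trials argument plays the role of the threshold hyperparameter.&lt;/p&gt;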
&lt;p&gt;Compared to the BPR loss function, the WARP loss function generally produces more accurate predictions (in terms of ranking), but takes more time to train since it continues to sample items until a rank violation appears. Consequently, as more epochs (or iterations) are trained, the WARP loss function becomes much slower compared to the BPR loss function, since a violation becomes more difficult to find. For this reason, assigning a cutoff value for the search is important when training with the WARP loss function.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/89551a1d36fb6ff5e3e4c7db9fd48e91/warp_time.svg&quot; alt=&quot;WarpTime&quot;&gt;&lt;/p&gt;
&lt;p&gt;Most of the information comparing the BPR and WARP loss functions came from &lt;a href=&quot;https://sites.northwestern.edu/msia/2019/04/24/personalized-restaurant-recommender-system-using-hybrid-approach/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this research paper&lt;/a&gt;. It occasionally references the &lt;a href=&quot;https://arxiv.org/pdf/1507.08439.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;LightFM paper&lt;/a&gt;, which describes a hybrid recommender that can use both the BPR and WARP loss functions. I recommend reading the LightFM paper for a better understanding of contextual models, the BPR loss function, and the ALS loss function. I recommend reading the research paper for a better understanding of the BPR and WARP loss functions.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Improving Health Care with 1% Steps]]></title><description><![CDATA[According to the 1% Steps Project, the three most effective ways for lowering health care costs include reducing surprise billing, capping provider prices, and providing real-time adjudication for…]]></description><link>https://dkharazi.github.io/blog/healthcare</link><guid isPermaLink="false">https://dkharazi.github.io/blog/healthcare</guid><pubDate>Tue, 01 Mar 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;According to the &lt;em&gt;1% Steps Project&lt;/em&gt;, the three most effective ways for lowering health care costs include reducing surprise billing, capping provider prices, and providing real-time adjudication for health insurance claims. Most of the information from this post is derived from the &lt;em&gt;1% Steps Project&lt;/em&gt;. The motivation of this project is best summarized by the authors on the project&apos;s &lt;a href=&quot;https://onepercentsteps.com/about/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;about page&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The goal of the 1% Steps for Health Care Reform Project is to shift the way we think about health care spending in the US and offer a roadmap to policy makers of tangible steps we as a country can take to lower the cost of health care in the US. We want to leverage leading scholars’ work to identify discrete problems in the US health system and offer evidence-based steps for reform. We will continually update the project with new proposals that are based on the latest academic research.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;The Problem with Out-of-Network Billing&lt;/h2&gt;
&lt;p&gt;A &lt;em&gt;surprise medical bill&lt;/em&gt; is an unexpected bill from a health care provider or facility. This can happen when a person with health insurance unknowingly receives medical care from a provider or facility outside their health plan’s network. Surprise billing happens in both emergency and non-emergency care. In an emergency, an individual usually goes (or is taken) to the nearest emergency department. Even if they go to an in-network hospital for emergency care, they might receive care from out-of-network providers at that facility. For non-emergency care, an individual might choose an in-network facility or an in-network provider, but they might not know that a provider involved in their care is an out-of-network provider. For example, an in-network surgeon removing wisdom teeth might use an out-of-network anesthesiologist to sedate a patient beforehand (without informing the patient). In both emergency and non-emergency circumstances, the person might not be able to choose the provider or ensure that all of their care is from a participating provider.&lt;/p&gt;
&lt;p&gt;There are &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;4&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;4&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; medical specialties where patients have little or no choice over the physician who treats them: pathology, emergency medicine, anesthesiology, and radiology (or PEAR physicians). As a result, PEAR physicians can refuse to join insurers’ networks, but cannot be avoided by patients. When PEAR physicians can bill out of network from inside in-network hospitals, patients can be exposed to large, unexpected, and unavoidable medical bills. In addition, the ability to engage in this profitable strategy gives these PEAR physicians the bargaining power to negotiate higher in-network payments compared to other physicians. In the end, these higher in-network payments are passed on to consumers in the form of higher insurance premiums.&lt;/p&gt;
&lt;p&gt;In their &lt;a href=&quot;https://onepercentsteps.com/policy-briefs/out-of-network-billing-by-hospital-based-physicians/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;policy brief&lt;/a&gt;, Cooper and Scott Morton suggest that policy makers address these problems by banning physicians from &lt;em&gt;balance billing&lt;/em&gt; patients. To be clear, balance billing occurs when a provider bills a patient for the difference between the amount the provider charges and the amount the patient&apos;s insurance pays. They also suggest that policy makers determine either the amount that out-of-network providers get paid or the process through which that amount is set.&lt;/p&gt;
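&lt;p&gt;As a small worked example of the balance billing definition, the sketch below computes the amount a patient would be billed. The function name and the dollar figures are hypothetical and chosen only for illustration.&lt;/p&gt;

```python
def balance_bill(provider_charge: float, insurer_payment: float):
    """Amount billed to the patient: the provider's charge minus what insurance pays."""
    # If the insurer covers the full charge (or more), nothing is left to bill.
    return max(provider_charge - insurer_payment, 0.0)


# Hypothetical figures, for illustration only.
charge = 2000.0  # what the out-of-network provider charges
paid = 650.0     # what the patient's insurer pays toward that charge
print(balance_bill(charge, paid))
```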
&lt;p&gt;In other words, there are two possible approaches. First, Cooper suggests a baseball-style arbitration approach: if an agreement between a doctor and an insurer isn&apos;t made, each would submit a bid to an arbitrator who would select between the two options for payment. Second, policy makers could require that hospitals sell a package of care with hospital and physician services at a set price, which would eliminate the possibility of a patient going to an in-network facility but being treated by an out-of-network provider.&lt;/p&gt;
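&lt;p&gt;The arbitration approach can be sketched in a few lines. The brief only says that the arbitrator selects between the two bids, so the selection rule here is an assumption: the arbitrator is modeled as picking whichever bid is closer to their own estimate of a fair payment, with ties going to the insurer&apos;s bid. All names and figures are hypothetical.&lt;/p&gt;

```python
def final_offer_arbitration(provider_bid: float, insurer_bid: float, arbitrator_estimate: float):
    """Return the bid the arbitrator selects; no compromise value is allowed.

    Assumed rule: choose the bid closest to the arbitrator's own estimate.
    min() returns the first of equally close bids, so ties go to the insurer.
    """
    return min(
        (insurer_bid, provider_bid),
        key=lambda bid: abs(bid - arbitrator_estimate),
    )


# Hypothetical bids for a single out-of-network service.
print(final_offer_arbitration(provider_bid=1200.0, insurer_bid=700.0, arbitrator_estimate=850.0))
```

&lt;p&gt;Because the arbitrator cannot split the difference, each side has an incentive to submit a reasonable bid rather than an extreme one.&lt;/p&gt;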
&lt;p&gt;Cooper argues that regulating the nature of the contract between providers and insurers is an important potential approach to addressing out-of-network billing. He believes that price regulation is necessary since these markets are natural monopolies. Most of the information above was taken from the policy brief by Cooper and Scott Morton. For more details about out-of-network billing, refer to &lt;a href=&quot;https://onepercentsteps.com/policy-briefs/out-of-network-billing-by-hospital-based-physicians/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;their policy brief&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;The Problem with Current Provider Prices&lt;/h2&gt;
&lt;p&gt;Markets for healthcare providers have become increasingly consolidated (or monopolistic). This has been happening over the last &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;30&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;30&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; years. Because patients and insurers have few cheaper alternatives in a consolidated market, providers can raise prices. As a result, insurance premiums rise as well.&lt;/p&gt;
&lt;p&gt;Although competition drives prices to efficient levels in a well-functioning market, the healthcare provider market deviates from a competitive market. In particular, it deviates from a competitive market in the following &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;3&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;3&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Reliance on health insurance shields patients from the costs of consumed services&lt;/li&gt;
&lt;li&gt;Patients are less able to differentiate between high and low quality providers&lt;/li&gt;
&lt;li&gt;Most hospital and specialist physician markets are highly concentrated&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To expand on the first difficulty, health insurance coverage limits an individual&apos;s exposure to the cost of the services required for a treatment. Additionally, exact costs aren&apos;t always available to the patient during their visit, since certain claims take time to be processed by the patient&apos;s insurance provider. In these scenarios, the patient must rely on pricing estimates provided by the hospital, which are based on information given by the insurance companies. These estimated charges can exclude any physician fees billed by an out-of-network surgeon, anesthesiologist, emergency specialist, pathologist, radiologist, or other physician who may be involved in the visit. The actual charges could be more or less than these estimates depending on the specific tests ordered by the patient&apos;s doctor, previous test results, medical history, and other factors. In summary, the current state of health insurance can lead to incorrect estimates or confusion about the actual cost of healthcare services until after a patient&apos;s visit.&lt;/p&gt;
&lt;p&gt;Second, the quality of providers is difficult to measure. As a result, consumers may have difficulty differentiating between high quality and low quality providers, which could lead to patients inadvertently paying for lower quality care at a higher price. Lastly, the majority of hospital and specialist physician markets are highly concentrated, which gives providers in those markets substantial market power. As a result, providers can demand higher prices due to market power alone, rather than because of a comparatively higher quality of service.&lt;/p&gt;
&lt;p&gt;Cooper argues that pro-competition policies and regulatory intervention will improve the current state of provider prices by reducing overall consolidation. Here, pro-competition policies include vigorous antitrust enforcement and the introduction of insurance plans that likely will incentivize patients to seek out efficient providers. Also, regulatory intervention could include enforcement of price caps, which could mean higher quality providers can charge higher prices, whereas lower quality providers can only charge lower prices. For more details about Cooper&apos;s recommendations and the estimated savings from implementing price caps, refer to &lt;a href=&quot;https://onepercentsteps.com/policy-briefs/capping-provider-prices-and-price-growth-in-the-us-commercial-health-sector/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Cooper&apos;s policy brief&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;The Problem with Current Adjudication for Health Insurance Claims&lt;/h2&gt;
&lt;p&gt;In healthcare, &lt;em&gt;adjudication&lt;/em&gt; is the process, carried out by a patient&apos;s insurance company, of reviewing and paying a claim that a healthcare provider submits after a patient&apos;s visit. When a patient visits his or her medical provider, the patient first will need to present his or her insurance card to a staff member at the front desk. Next, the staff member will record information from the insurance card. After the patient receives services from his or her medical provider, the medical provider will submit a claim to the patient&apos;s insurance company using the recorded insurance information. Then, the insurance company receives the claim and adjudication begins.&lt;/p&gt;
&lt;p&gt;In general, the adjudication process consists of five steps: an initial processing review, an automatic review, a manual review, a payment determination period, and a payment period. The initial processing review includes simple checks for errors or omissions, such as any incorrect spelling in the claim. The automatic review includes more detailed checks against the payment policies, such as checking for eligibility on the date of service or whether the service should be considered medically necessary. The manual review is performed by a medical claims examiner, who carries out additional checks to deem the service necessary or unnecessary. The payment determination period involves the examiner determining if the claim should be paid, denied, or reduced, and it also involves any succeeding steps depending on the status assigned by the examiner. Lastly, any payment is submitted to the provider. Most adjudication (particularly manual review) can take several days or weeks to process.&lt;/p&gt;
&lt;p&gt;Administration and adjudication for claims make up about &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;3&lt;/mn&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;mn&gt;6&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;3-6\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.72777em;vertical-align:-0.08333em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of revenues for providers and payers. These costs are driven mostly by complexity created by building on to the current adjudication process over the years. These costs are also created due to the continued reliance on manual input and review.&lt;/p&gt;
&lt;p&gt;Each insurance payer maintains slight differences in their adjudication process. By standardizing and automating the adjudication process, claims can be processed in real time, the relevant administrative costs can be almost entirely eliminated, and time can be saved for the insurance payer, medical provider, and patient. Additionally, price transparency can be achieved between the medical provider and insurance payer as a side effect of standardizing the adjudication process. Despite these advantages, adoption of real-time adjudication systems remains low, mostly due to coordination failures between providers and the vendors offering to implement these services.&lt;/p&gt;
&lt;p&gt;In their policy brief, Orszag and Rekhi (both of Lazard) propose solving this coordination failure through a series of interventions for insurance payers, medical providers, and relevant vendors. These interventions include the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Standardization of claims forms and adjudication processes across all providers and payers&lt;/li&gt;
&lt;li&gt;New standards for reducing coding complexity&lt;/li&gt;
&lt;li&gt;Incentives for providers and payers to adopt real-time adjudication systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since the adjudication process isn&apos;t standardized across each insurance payer yet, vendors attempting to implement real-time claims processing systems experience coordination failure. For incentivizing adoption of real-time adjudication, relevant mandates could draw on authority from a range of federal programs and statutes.&lt;/p&gt;
&lt;p&gt;For more details about the current state of adjudication and any potential improvements, refer to Orszag and Rekhi&apos;s &lt;a href=&quot;https://onepercentsteps.com/policy-briefs/real-time-adjudication-for-health-insurance-claims/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;policy brief about real-time adjudication for health insurance claims&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://onepercentsteps.com/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;1% Steps Project&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.healthsystemtracker.org/chart-collection/u-s-health-care-resources-compare-countries/#item-nurses-licensed-to-practice-density-per-1000-population-2000-2018&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Data comparing US Healthcare Resources with Other Countries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.klgates.com/Surprise-Billing-Regulations-Out-of-Network-Providers-at-In-Network-Facilities-8-10-2021&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Paper about Regulating Out-of-Network Providers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.verywellhealth.com/out-of-insurance-network-claims-and-bills-2615282&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Article Outlining Common Reasons for Going Out-of-Network&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Outlining the A/B Testing Procedure]]></title><description><![CDATA[Experimentation using A/B testing is a crucial component in measuring customers' changes in behavior when making any changes to a business, including site changes, product changes, etc. Most of the…]]></description><link>https://dkharazi.github.io/blog/abtest</link><guid isPermaLink="false">https://dkharazi.github.io/blog/abtest</guid><pubDate>Mon, 28 Feb 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Experimentation using A/B testing is a crucial component in measuring customers&apos; changes in behavior when making any changes to a business, including site changes, product changes, etc. Most of the information in this post was outlined in &lt;a href=&quot;https://www.youtube.com/watch?v=DUNk4GPZ9bw&amp;#x26;ab_channel=DataInterview&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Dan Lee&apos;s video&lt;/a&gt;, which does a terrific job of defining the key steps in A/B tests. Visit his channel for more detailed A/B testing walkthroughs.&lt;/p&gt;
&lt;p&gt;In general, Dan outlines &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;7&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;7&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;7&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; general steps when designing and running tests, which include the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Defining a Problem Statement&lt;/li&gt;
&lt;li&gt;Designing a Hypothesis Test&lt;/li&gt;
&lt;li&gt;Designing an Experiment&lt;/li&gt;
&lt;li&gt;Running the Experiment&lt;/li&gt;
&lt;li&gt;Validating the Experiment and its Results&lt;/li&gt;
&lt;li&gt;Interpreting the Results&lt;/li&gt;
&lt;li&gt;Launching a Decision&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Some software, such as Optimizely, will do some of these steps for you behind the scenes. For additional details about A/B testing and its benefits, refer to &lt;a href=&quot;https://www.optimizely.com/optimization-glossary/ab-testing/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt; about common A/B tests, written by &lt;em&gt;Optimizely&lt;/em&gt;, a popular A/B testing platform.&lt;/p&gt;
&lt;h2&gt;Defining a Problem Statement&lt;/h2&gt;
&lt;p&gt;Before designing any experiments or hypotheses, everyone who is part of the test must align on a few key success metrics and an overarching goal for the experiment. Creating a user journey can be helpful in some cases as well. Metrics are created in an attempt to measure the customer&apos;s or user&apos;s behavior, and great success metrics will either perfectly capture the behavior or at least be an accurate proxy for it. Typically, effective success metrics are crafted around the following principles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Measurable:&lt;/strong&gt; Will the metric track the customer&apos;s behavior using data collected throughout the experiment?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Attributable:&lt;/strong&gt; Will the metric accurately capture any potential change in the customer&apos;s behavior when the customer is introduced to the treatment?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sensitive:&lt;/strong&gt; Will the metric be sensitive enough so it always will detect a change in behavior when there is an actual change?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timely:&lt;/strong&gt; Will the metric measure the customer&apos;s behavior in a short time window?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these principles should be considered in any test when building effective success metrics. Examples of useful success metrics could include average daily revenue, average click-to-open ratios, or average click-through ratios (or CTR).&lt;/p&gt;
&lt;p&gt;Sometimes, a metric accurately measures what a team wants to assess in an A/B test, but the metric is determined to be insensitive. In these situations, we can use alternative metrics, try proxy metrics, or apply transformations to a metric to increase its sensitivity for a test. Some transformations include the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Applying a log transformation &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo&gt;→&lt;/mo&gt;&lt;mi&gt;log&lt;/mi&gt;&lt;mo&gt;⁡&lt;/mo&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x \to \log(1 + x)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;→&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop&quot;&gt;lo&lt;span style=&quot;margin-right:0.01389em;&quot;&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span 
class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Capping the value at a fixed maximum &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo&gt;→&lt;/mo&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;u&lt;/mi&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;100&lt;/mn&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x \to clip(upper=100)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;→&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;mord 
mathnormal&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Changing the metrics aggregation level &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;/mfrac&gt;&lt;mo&gt;→&lt;/mo&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\frac{x}{n} \to x&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.040392em;vertical-align:-0.345em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.695392em;&quot;&gt;&lt;span style=&quot;top:-2.6550000000000002em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.394em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span 
class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.345em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;→&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
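&lt;p&gt;The three transformations above can be sketched with the Python standard library. The session records, the cap of 100, and the helper names are hypothetical, and reading the aggregation change as rolling session-level values up to per-user totals is an assumption about what the notation is meant to convey.&lt;/p&gt;

```python
import math
from collections import defaultdict

# Hypothetical per-session revenue records: (user_id, revenue).
sessions = [("u1", 0.0), ("u1", 12.0), ("u2", 950.0), ("u3", 30.0), ("u3", 8.0)]


def log_transform(x):
    """Map x to log(1 + x), compressing the long right tail of a skewed metric."""
    return math.log1p(x)


def cap(x, upper=100.0):
    """Clip x at a fixed maximum so extreme values have less influence."""
    return min(x, upper)


def per_user_totals(records):
    """Change the aggregation level: one total per user instead of one value per session."""
    totals = defaultdict(float)
    for user_id, value in records:
        totals[user_id] += value
    return dict(totals)


revenues = [value for _, value in sessions]
logged = [log_transform(v) for v in revenues]
capped = [cap(v) for v in revenues]
by_user = per_user_totals(sessions)
print(capped)
print(by_user)
```

&lt;p&gt;Each transformation trades some interpretability for lower variance, so the transformed metric should still be checked against the original goal of the test.&lt;/p&gt;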
&lt;p&gt;As stated already, using proxy metrics can increase sensitivity when the test targets a short-term goal rather than a long-term goal. Also, converting metrics into other formats, such as proportions, conditional averages, or percentiles, can increase sensitivity. For a more detailed walkthrough of ensuring metrics are sensitive, refer to &lt;a href=&quot;https://www.microsoft.com/en-us/research/group/experimentation-platform-exp/articles/beyond-power-analysis-metric-sensitivity-in-a-b-tests/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this write-up by Microsoft&lt;/a&gt; about the importance of performing a sensitivity analysis, or refer to &lt;a href=&quot;https://www.microsoft.com/en-us/research/group/experimentation-platform-exp/articles/why-tenant-randomized-a-b-test-is-challenging-and-tenant-pairing-may-not-work/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article by Microsoft&lt;/a&gt; for variance reduction methods.&lt;/p&gt;
&lt;h2&gt;Designing a Hypothesis Test&lt;/h2&gt;
&lt;p&gt;After there is alignment on the problem statement and a set of business KPIs, the testing team can design a hypothesis test using these components. Arguably, the most important piece of the hypothesis test is stating the null hypothesis and the alternative hypothesis. As an example, our null hypothesis might state that there is no difference in average daily revenue between our current product ranking algorithm and our new product ranking algorithm. Additionally, the test should include the following parameters as well:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Significance level&lt;/li&gt;
&lt;li&gt;Statistical power&lt;/li&gt;
&lt;li&gt;Minimum detectable effect (or MDE)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Using our example, the significance level refers to the probability of observing a statistically significant difference in average daily revenue between our two ranking algorithms when &lt;em&gt;there isn&apos;t&lt;/em&gt; actually a difference. By default, the significance level is assigned to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;α&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;0.05&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\alpha = 0.05&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.0037em;&quot;&gt;α&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. On the other hand, the statistical power refers to the probability of observing a statistically significant difference in average daily revenue between our two ranking algorithms when &lt;em&gt;there is&lt;/em&gt; actually a difference. The statistical power can be evaluated by the minimum detectable effect (or MDE) in a power analysis. 
By default, the &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mi&gt;D&lt;/mi&gt;&lt;mi&gt;E&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;MDE&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05764em;&quot;&gt;E&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is set at &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;1\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; lift, meaning a &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation 
encoding=&quot;application/x-tex&quot;&gt;1\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; lift in our metric between the control and treatment group is practically significant. For detailed definitions of each of these components typically found in a power analysis, refer to the &lt;a href=&quot;https://www.statmethods.net/stats/power.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;R documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Designing an Experiment&lt;/h2&gt;
&lt;p&gt;After outlining the hypothesis test, our experiment should define any necessary and relevant parameters. In particular, the experiment should define basic units of measure, including who the test will randomize (e.g. buyers, email users, mobile app users, etc.). The test should also include the target population, the sample size, and the duration of the experiment. The following are some examples of relevant parameters in a test:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Test Split:&lt;/strong&gt; 50/50&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Randomization Unit:&lt;/strong&gt; User&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Target Population:&lt;/strong&gt; US men&apos;s site visitors&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Duration of the Experiment:&lt;/strong&gt; 1 to 2 weeks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sample Size:&lt;/strong&gt; Number of customers based on power analysis&lt;/li&gt;
&lt;/ul&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mo&gt;≈&lt;/mo&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;mn&gt;16&lt;/mn&gt;&lt;msup&gt;&lt;mi&gt;σ&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;msup&gt;&lt;mi&gt;δ&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;n \approx \frac{16 \sigma^{2}}{\delta^{2}}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.48312em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;≈&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:2.177108em;vertical-align:-0.686em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.491108em;&quot;&gt;&lt;span style=&quot;top:-2.314em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03785em;&quot;&gt;δ&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span 
class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.740108em;&quot;&gt;&lt;span style=&quot;top:-2.9890000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.677em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;σ&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8141079999999999em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:0.686em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Next Steps for each Hypothesis:&lt;/strong&gt; Replace the ranking system if there is any improvement; otherwise, do not roll out the change&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Success Metric stratified by Cohorts of Customers:&lt;/strong&gt; Spend by order history, spend by gender, spend by age, spend by newness of customer, etc.&lt;/li&gt;
&lt;/ul&gt;
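&lt;p&gt;As a rough illustration of the sample size formula above, the following Python sketch computes the number of users needed per group. It assumes the common reading of this rule of thumb (a two-sided test with α = 0.05 and roughly 80% power), where σ is the standard deviation of the metric and δ is the absolute difference we want to detect. The function name and the input values are hypothetical.&lt;/p&gt;

```python
import math


def sample_size_per_group(sigma, delta):
    """Rule-of-thumb sample size per group: n = 16 * sigma^2 / delta^2."""
    return math.ceil(16 * sigma ** 2 / delta ** 2)


# Hypothetical inputs: daily revenue per user has a standard deviation of 30 dollars,
# and a 1% lift on a 50-dollar baseline corresponds to an absolute difference of 0.5 dollars.
n_per_group = sample_size_per_group(sigma=30.0, delta=0.5)
print(n_per_group)  # 57600
```

&lt;p&gt;Because δ is squared in the denominator, halving the detectable difference roughly quadruples the required sample size.&lt;/p&gt;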
&lt;h2&gt;Running the Experiment&lt;/h2&gt;
&lt;p&gt;Next, the experiment must be run. Before jumping right into this step, it&apos;s important to note that the appropriate instruments and data pipelines must be set up. Additionally, the correct data, specifically any data used for success metrics, must be collected in preparation for the experiment.&lt;/p&gt;
&lt;p&gt;Once the pipelines are implemented, the test should run for its full planned duration. Specifically, the test shouldn&apos;t be stopped early, even if a small p-value is observed early on. The sample size is small at the start, which causes the p-value to fluctuate, so an early significant result can easily be a false positive.&lt;/p&gt;
&lt;h2&gt;Validating the Experiment&lt;/h2&gt;
&lt;p&gt;Once the experiment is finished running, there should be initial checks made to ensure the experiment ran successfully without any bugs. Specifically, there should be guardrail metrics set up and analyzed afterwards, such as system latency times, to make sure the data was collected correctly.&lt;/p&gt;
&lt;p&gt;Other errors and biases must be validated afterwards, including external factors. These external factors could include less obvious biases, such as running the test during a recession, during an abnormally high spending period, during the holiday season, etc. Stratifying for external variables can help alleviate these biases, along with running ad-hoc analyses on other years or time periods for comparison.&lt;/p&gt;
&lt;p&gt;Another important check is a validation for selection bias, which could include validating similar distributions for cohorts of users or other variables between the control and test groups. For example, we should account for a novelty effect amongst customers by segmenting customers into new and old cohorts, then stratifying them to avoid any potential selection bias. Lastly, users should be validated to make sure there is a &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;50&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;/&lt;/mi&gt;&lt;mn&gt;50&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;50/50&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; split. 
Sometimes, a randomized experiment could lead to a &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;49&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;/&lt;/mi&gt;&lt;mn&gt;50&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;49/50&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; split, which should be checked. For more examples of validations, refer to &lt;a href=&quot;https://www.youtube.com/watch?v=DUNk4GPZ9bw&amp;#x26;ab_channel=DataInterview&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Dan Lee&apos;s video&lt;/a&gt;. The above validations are listed in the table below:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bias&lt;/th&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instrumentation Effect&lt;/td&gt;
&lt;td&gt;Guardrail metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External Factors&lt;/td&gt;
&lt;td&gt;Holidays, disruptions, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Selection Bias&lt;/td&gt;
&lt;td&gt;A/A Test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sample Ratio Mismatch&lt;/td&gt;
&lt;td&gt;Chi-Square Goodness of Fit Test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Novelty Effect&lt;/td&gt;
&lt;td&gt;Segment by new and old customers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
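&lt;p&gt;To make the sample ratio mismatch check from the table more concrete, here is a minimal Python sketch of a chi-square goodness-of-fit test for a two-group split. It only uses the standard library and relies on the fact that, with one degree of freedom, the chi-square tail probability can be written with the complementary error function. The function name and the user counts are hypothetical.&lt;/p&gt;

```python
import math


def srm_p_value(n_control, n_treatment, expected_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-group split."""
    total = n_control + n_treatment
    expected = [total * expected_share, total * (1 - expected_share)]
    observed = [n_control, n_treatment]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: 4,900 users in control and 5,100 users in treatment.
p_value = srm_p_value(4900, 5100)
print(round(p_value, 4))
```

&lt;p&gt;A small p-value here suggests the observed split is unlikely under the intended 50/50 assignment, which is a reason to investigate the randomization before trusting the results.&lt;/p&gt;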
&lt;h2&gt;Interpreting the Results&lt;/h2&gt;
&lt;p&gt;Once the experiment and its results have been validated, the results must be interpreted for the team that is interested in them. Results should be analyzed to ensure they&apos;re both practically and statistically significant. This can be done by evaluating the results from the hypothesis tests and power analysis after running the experiment, while also measuring the lifts in our designated success metrics between the control and treatment groups. For example, p-values, relative and absolute lift differences, and confidence intervals should be reported to the team.&lt;/p&gt;
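&lt;p&gt;As a sketch of what such a report could contain, the Python snippet below computes the absolute lift, relative lift, a two-sided p-value, and a confidence interval for the difference in means. It assumes a normal approximation (a z-test on the difference in means), which is only one of several reasonable choices. The function name and the data are hypothetical.&lt;/p&gt;

```python
import math
from statistics import NormalDist, mean, stdev


def summarize_test(control, treatment, alpha=0.05):
    """Report lifts, a two-sided p-value, and a confidence interval for the mean difference."""
    mean_c, mean_t = mean(control), mean(treatment)
    absolute_lift = mean_t - mean_c
    relative_lift = absolute_lift / mean_c
    # Standard error of the difference in means, allowing unequal variances.
    se = math.sqrt(stdev(control) ** 2 / len(control) + stdev(treatment) ** 2 / len(treatment))
    z = absolute_lift / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * se
    return {
        "absolute_lift": absolute_lift,
        "relative_lift": relative_lift,
        "p_value": p_value,
        "ci_low": absolute_lift - half_width,
        "ci_high": absolute_lift + half_width,
    }


# Hypothetical daily revenue per user for each group.
control = [9.0, 10.0, 11.0] * 100
treatment = [10.0, 11.0, 12.0] * 100
summary = summarize_test(control, treatment)
print(summary)
```

&lt;p&gt;Reporting the confidence interval alongside the p-value helps the team judge whether the lift is large enough to matter in practice, not just whether it is statistically detectable.&lt;/p&gt;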
&lt;h2&gt;Launching a Decision&lt;/h2&gt;
&lt;p&gt;After communicating the results to the team, next steps should be in place to make a final decision about the tested feature. Any trade-offs should be discussed in detail before making a final decision. For example, evaluating the trade-offs of observing lifts in secondary and tertiary success metrics should be taken into consideration if lifts in primary success metrics aren&apos;t observed. Additionally, evaluating the financial cost of launching the service should be considered if the lift is marginal. There are plenty of other important trade-offs to consider as well, such as the severity of any false positives arising from the test.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Intuition behind BTYD Models]]></title><description><![CDATA[In many of Peter Fader's presentations, he has demonstrated how LTV forecasts can be accurately estimated by modeling and combining two other estimates together. In particular, these two models refer…]]></description><link>https://dkharazi.github.io/blog/bgnbd</link><guid isPermaLink="false">https://dkharazi.github.io/blog/bgnbd</guid><pubDate>Sun, 26 Sep 2021 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In &lt;a href=&quot;https://www.youtube.com/watch?v=guj2gVEEx4s&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;many of Peter Fader&apos;s presentations&lt;/a&gt;, he has demonstrated how LTV forecasts can be accurately estimated by modeling and combining two other estimates together. In particular, these two models refer to a purchase frequency model, which estimates the frequency of orders a customer will make, and a spend model, which estimates the dollar amount a customer will spend. This post specifically deals with defining the former model, which is outlined in &lt;a href=&quot;http://brucehardie.com/papers/018/fader_et_al_mksc_05.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Peter Fader&apos;s paper&lt;/a&gt; from 2005.&lt;/p&gt;
&lt;p&gt;This model is referred to as the &lt;em&gt;Buy till you die&lt;/em&gt; (or &lt;a href=&quot;https://en.wikipedia.org/wiki/Buy_Till_you_Die&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;BTYD&lt;/a&gt;) class of statistical models, which are designed to capture the purchasing behavior of non-contractual customers. Interestingly, the purchasing behavior of non-contractual customers can be summarized using two sub-models only.&lt;/p&gt;
&lt;p&gt;The Beta-Geometric/Negative-Binomial-Distribution (or BG/NBD) model is the most popular example of a BTYD model, which estimates the purchase frequency of customers. Specifically, this BTYD model jointly models two different things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;How frequently customers make purchases while they&apos;re still &lt;em&gt;alive&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;How likely customers are to churn in a given time period&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In particular, the former process models the &lt;em&gt;repeat purchasing&lt;/em&gt; of customers, while the latter process models the &lt;em&gt;dropout rate&lt;/em&gt; of customers. The BG/NBD model will model the repeat purchasing process as an NBD (or Poisson-gamma mixture) model, and it will model the dropout rate process as a BG (or beta-geometric mixture) model.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;Buy&lt;/em&gt; part (in the BTYD) refers to the frequency of orders made by each customer. Mathematically, the frequency of orders made by each customer is Poisson-distributed across the entire population of customers. This distribution is parameterized by &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;λ&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\lambda&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;λ&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, which represents the expected &lt;em&gt;transaction rate&lt;/em&gt; made across the entire population of customers for a given time period.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/b7f2a25f5757eb52ee93a0c1ea7c1bc7/btyd_buy.svg&quot; alt=&quot;BTYDBuy&quot;&gt;&lt;/p&gt;
&lt;p&gt;The expected transaction rate comes from an underlying data-generating process representing the entire population of customers. The expected transaction rate for customers is gamma-distributed across the entire population of customers. Said another way, heterogeneity in &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;λ&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\lambda&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;λ&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; follows a gamma distribution. This implies that most customers have small transaction rates, which typically represent the lower-valued customers. Some customers have larger transaction rates, which usually represent the higher-valued customers. 
The shape of the gamma distribution (representing customer transaction rates) is defined using &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;r&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;α&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\alpha&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.0037em;&quot;&gt;α&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; parameters. 
Here, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;r&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is known as the shape parameter, whereas &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;α&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\alpha&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.0037em;&quot;&gt;α&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is known as the scale parameter.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/6372d44903009329304b16340f839c83/btyd_transaction.svg&quot; alt=&quot;BTYDTransactionRate&quot;&gt;&lt;/p&gt;
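&lt;p&gt;The Poisson-gamma mixture described above can be simulated with a few lines of Python. This is only a sketch under stated assumptions: each customer draws a transaction rate from a gamma distribution, then makes purchases as a Poisson process over one time period. It treats α as a rate-style parameter (as in the density used in the Fader et al. paper), so the average transaction rate is r / α. The function names and parameter values are hypothetical.&lt;/p&gt;

```python
import random


def purchases_in_period(rng, rate, horizon=1.0):
    """Count arrivals of a Poisson process with the given rate up to `horizon`."""
    count = 0
    elapsed = rng.expovariate(rate)
    while horizon >= elapsed:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def simulate_purchase_counts(n_customers, r, alpha, seed=0):
    """Draw a gamma-distributed rate per customer, then a Poisson purchase count."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_customers):
        # gammavariate takes (shape, scale); scale = 1 / alpha assumes alpha acts as a rate.
        rate = rng.gammavariate(r, 1.0 / alpha)
        counts.append(purchases_in_period(rng, rate))
    return counts


# Hypothetical parameters: r = 2 and alpha = 4, so the average rate is r / alpha = 0.5.
counts = simulate_purchase_counts(n_customers=20000, r=2.0, alpha=4.0)
average_purchases = sum(counts) / len(counts)
print(average_purchases)  # should land near r / alpha = 0.5
```

&lt;p&gt;In a simulation like this, most customers end up with zero or one purchase in the period, while a small number of customers with large rates account for many of the orders.&lt;/p&gt;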
&lt;p&gt;The &lt;em&gt;Till you die&lt;/em&gt; part (in the BTYD) refers to the dropout rate for each customer. Mathematically, the dropout rate for each customer is geometric-distributed across the entire population of customers. This distribution is parameterized by &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;p&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.625em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;p&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, which represents the expected dropout rate made across the entire population of customers in a given time period.&lt;/p&gt;
&lt;p&gt;The expected dropout rate comes from an underlying data-generating process representing the entire population of customers. In the BG/NBD model, the expected dropout rate for customers is beta-distributed across the entire population of customers. Said another way, heterogeneity in &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;p&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.625em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;p&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; follows a beta distribution. This implies that customers having only made one order have a higher chance of churning, which typically represents the lower-valued customers. On the other hand, customers having made multiple orders have a lower chance of churning, which usually represents the higher-valued customers. 
The shape of the distribution representing customer dropout rates is represented using &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;s&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;β&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\beta&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8888799999999999em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05278em;&quot;&gt;β&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; parameters. 
Here, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;s&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is known as the shape parameter, whereas &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;β&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\beta&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8888799999999999em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05278em;&quot;&gt;β&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is known as the scale parameter.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/41f9baccda499bd3d11017d2dcf510c7/btyd_dropout.svg&quot; alt=&quot;BTYDDropout&quot;&gt;&lt;/p&gt;
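&lt;p&gt;The dropout process can be sketched in a similar way. The Python snippet below follows the setup in the Fader et al. paper, where each customer has a dropout probability drawn from a beta distribution and may become inactive after any purchase with that probability. The parameter names a and b come from that paper rather than from the notation used in this post, and the function name and parameter values are hypothetical.&lt;/p&gt;

```python
import random


def simulate_purchases_until_dropout(n_customers, a, b, seed=0):
    """Draw a beta-distributed dropout probability per customer, then count purchases."""
    rng = random.Random(seed)
    purchase_counts = []
    for _ in range(n_customers):
        p = rng.betavariate(a, b)
        purchases = 1
        # After each purchase, the customer drops out with probability p.
        while rng.random() >= p:
            purchases += 1
        purchase_counts.append(purchases)
    return purchase_counts


# Hypothetical parameters: a = 3 and b = 2.
purchase_counts = simulate_purchases_until_dropout(n_customers=20000, a=3.0, b=2.0)
average_lifetime = sum(purchase_counts) / len(purchase_counts)
print(average_lifetime)  # should land near (a + b - 1) / (a - 1) = 2.0
```

&lt;p&gt;Customers with a high dropout probability tend to stop after one or two purchases, while customers with a low dropout probability stay active for many more.&lt;/p&gt;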
&lt;p&gt;For more information about the BG/NBD model or other BTYD models, refer to &lt;a href=&quot;http://brucehardie.com/papers/018/fader_et_al_mksc_05.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Peter Fader&apos;s original paper&lt;/a&gt;. For higher-level illustrations about the BTYD models, refer to &lt;a href=&quot;https://medium.com/geekculture/predicting-customer-life-time-value-cltv-via-beta-geometric-negative-binominal-distribution-59be07ac30bd&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this helpful article&lt;/a&gt;. For more practical examples about BTYD models and simulations illustrating BG/NBD models in Python, refer to &lt;a href=&quot;https://towardsdatascience.com/predicting-customer-lifetime-value-with-buy-til-you-die-probabilistic-models-in-python-f5cac78758d9&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Potential Reasons for Recent Shortages]]></title><description><![CDATA[Recently, there have been shortages for many different products, especially in the last few years since COVID-19. The latest shortage of semiconductors has impacted multiple industries, including the…]]></description><link>https://dkharazi.github.io/blog/shortage</link><guid isPermaLink="false">https://dkharazi.github.io/blog/shortage</guid><pubDate>Thu, 09 Sep 2021 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Recently, there have been shortages for many different products, especially in the last few years since COVID-19. The latest shortage of semiconductors has impacted multiple industries, including the automotive industry, technology industry, etc. These product shortages have been caused by recent labor shortages, port shut downs, and excess demand. Dr. 
Robert Handfield wrote a &lt;a href=&quot;https://scm.ncsu.edu/scm-articles/article/q-why-so-many-product-shortages-a-the-perfect-storm&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;great article summarizing these relevant issues and their causes&lt;/a&gt;. Roughly, the main reasons for these particular product shortages include the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Preparing for a drop in demand from COVID, but instead realizing an unexpected spike in demand throughout the pandemic&lt;/li&gt;
&lt;li&gt;A difficulty in quickly reallocating resources and capacities&lt;/li&gt;
&lt;li&gt;Temporary closures of plants and ports due to COVID&lt;/li&gt;
&lt;li&gt;Labor shortages in the supply chain&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&apos;s introduce our first problem about recent shortages of semiconductors by starting off with a bit of history. Historically, semiconductor manufacturers seemed to &lt;a href=&quot;https://dspace.mit.edu/bitstream/handle/1721.1/78166/829683741-MIT.pdf?sequence=2&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;overbuild fabricator plants and overestimate their capacities&lt;/a&gt; for capacity planning during successful years, which resulted in greater losses during less successful years. This happened more frequently in the 90s and hasn&apos;t happened as much recently. Since then, semiconductor companies have become more careful about capacity planning. As a result, most of them have avoided those greater losses in the last decade that they were used to seeing in the 90s.&lt;/p&gt;
&lt;p&gt;In the last few years, &lt;a href=&quot;https://www2.deloitte.com/content/dam/Deloitte/cn/Documents/technology-media-telecommunications/deloitte-cn-tmt-semiconductors-the-next-wave-en-190422.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;semiconductor companies have been increasing capacity&lt;/a&gt; due to more demand related to cloud computing, machine learning, and gaming. During the pandemic, there was an increased usage of cloud computing from companies and universities. These pandemic-related spikes in demand for semiconductors weren&apos;t included in companies&apos; planned capacity forecasts. Most likely, these companies expected drops in demand during the pandemic, rather than the realized increase in demand for semiconductors. Forecasting capacities during a time that is as turbulent as a pandemic is already difficult enough, but it becomes even more difficult since these same companies need to be extra careful about overbuilding plants, due to costly mistakes made in the past.&lt;/p&gt;
&lt;p&gt;In other words, semiconductor shortages seem to be happening because their customers (like automotive companies) expected a reduction in demand due to COVID-19. As a result, the semiconductor fabrication plants reduced capacity and shut down their plants. With an unexpected increase in demand recently, semiconductor manufacturers have started adding another plant. However, this addition requires significant time and billions of dollars in investment, so any addition likely won&apos;t happen for another year or two.&lt;/p&gt;
&lt;p&gt;Additionally, there have been many temporary port closures due to health and safety protocols for the pandemic. Specifically, &lt;a href=&quot;https://www.bloomberg.com/news/articles/2021-08-12/massive-china-port-shutdown-raises-fears-of-closures-worldwide&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;major outbound ports, like Yantian, have been shutting down temporarily&lt;/a&gt;. These backups can cause many downstream complications and delays, usually requiring weeks to catch up. These shutdowns lead to a major shortage in shipping containers. These shutdowns can also cause delays in cargo ship arrivals, which also contribute to &lt;a href=&quot;https://fbx.freightos.com/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;recent increases in container prices&lt;/a&gt; and costs of imports from China, as a result.&lt;/p&gt;
&lt;p&gt;While all of this is going on, there are also shortages in labor relevant to these industries. For example, &lt;a href=&quot;https://www.forbes.com/sites/billconerly/2021/07/07/the-labor-shortage-is-why-supply-chains-are-disrupted/?sh=473795cd301d&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;demand exceeds job applicants&lt;/a&gt; for local delivery drivers, warehouse workers, factory workers, and many other supply chain positions. Labor shortages, especially in the industries related to supply chain, are &lt;a href=&quot;https://www.businessinsider.com/shipping-delays-china-supply-chain-record-ships-stuck-california-ports-2021-8&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;leading the slowdown of loading cargo at US ports&lt;/a&gt;. Again, these slowdowns lead to downstream delays and spikes in costs.&lt;/p&gt;
&lt;p&gt;In addition, &lt;a href=&quot;https://scm.ncsu.edu/scm-articles/article/the-big-texas-chemical-freeze-raises-issues-on-resiliency-of-the-petrochemical-supply-chain&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;recent storms like the ones in Texas and Hurricane Ida&lt;/a&gt; have caused chemical plants to shut down temporarily. For example, chemical manufacturing lead times were delayed for months after these storms. In particular, these chemicals are used to create compounds in many different industries, like auto parts, computers, plastics, etc. These shutdowns were followed by delays, shortages, and price spikes, due to the downstream effects of necessary evacuations.&lt;/p&gt;
&lt;p&gt;For additional information or more details about recent product shortages related to the pandemic, I recommend reading &lt;a href=&quot;https://scm.ncsu.edu/scm-articles/article/q-why-so-many-product-shortages-a-the-perfect-storm&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article about reasons for recent product shortages&lt;/a&gt;. For more details or a brief summary about other supply chain issues related to COVID-19, I recommend reading &lt;a href=&quot;https://scm.ncsu.edu/scm-articles/article/will-supply-chain-price-increases-result-in-inflation-the-fed-isnt-worried&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article summarizing recent supply chain issues&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Six Dimensions of National Culture]]></title><description><![CDATA[In the field of cross-cultural psychology, Dimensionalizing Cultures by Hofstede is still one of the most widely accepted and cited papers, even though his first paper about dimensionalizing cultures…]]></description><link>https://dkharazi.github.io/blog/culture</link><guid isPermaLink="false">https://dkharazi.github.io/blog/culture</guid><pubDate>Mon, 23 Aug 2021 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In the field of cross-cultural psychology, &lt;a href=&quot;https://scholarworks.gvsu.edu/cgi/viewcontent.cgi?article=1014&amp;#x26;context=orpc&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Dimensionalizing Cultures by Hofstede&lt;/a&gt; is still one of the most widely accepted and cited papers, even though his first paper about dimensionalizing cultures was published back in 1988. His research began when he accepted a position at IBM as a manager of personnel research, where he later founded and managed the Personnel Research Department.&lt;/p&gt;
&lt;p&gt;In this role, he collected employee opinion surveys in over 70 national subsidiaries of IBM around the world, which eventually included over 100,000 questionnaires. This repository of questionnaires represented the largest cross-national database in existence, where he discovered significant differences between cultures in other countries. Hofstede was able to use this data to both support and contribute to pre-existing research in this field, which eventually led to the creation of six dimensions separating national culture:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dimensions:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Collectivism (versus Individualism)&lt;/li&gt;
&lt;li&gt;High Power Distance (versus Low Power Distance)&lt;/li&gt;
&lt;li&gt;Masculinity (versus Femininity)&lt;/li&gt;
&lt;li&gt;High Uncertainty Avoidance (versus Low Uncertainty Avoidance)&lt;/li&gt;
&lt;li&gt;Long-Term Orientation (versus Short-Term Orientation)&lt;/li&gt;
&lt;li&gt;High Indulgence (versus Low Indulgence)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At a high level, collectivism is related to the integration of individuals into primary groups within a nation. Collectivism is arguably one of the most important axes within the six dimensions separating national culture. Additionally, power distance is related to how much the less powerful members of a culture accept that power is distributed unequally. Also, masculinity is related to the division of emotional roles between women and men. Next, uncertainty avoidance is related to the level of stress that is created in a society when faced with an unknown future. Long-term orientation is related to whether a culture&apos;s efforts are more focused on the future or the present/past. Lastly, indulgence relates to the gratification of basic human desires related to enjoying life.&lt;/p&gt;
&lt;p&gt;In general, wealthier nations tend to have a smaller power distance and to be more individualistic. Additionally, recent economic growth has been correlated with long-term orientation. In his research, Triandis found that uncertainty acceptance is slightly correlated with individualism. More feminine countries tend to have a smaller power distance. As expected, more individualistic countries have a smaller power distance. Interestingly, there is almost no correlation between countries being masculine and countries being individualistic. Indulgent countries tend to be short-term oriented as well.&lt;/p&gt;
&lt;p&gt;Most of the research for this post was motivated by a &lt;a href=&quot;https://freakonomics.com/podcast/american-culture-2/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Freakonomics&apos; Podcast about Hofstede&apos;s model&lt;/a&gt;, which is definitely worth a listen. For a more detailed analysis about these &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;6&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;6&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;6&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; features, refer to &lt;a href=&quot;https://scholarworks.gvsu.edu/cgi/viewcontent.cgi?article=1014&amp;#x26;context=orpc&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Hofstede&apos;s paper about Dimensionalizing Cultures&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Illustrating the Axis for Collectivism&lt;/h2&gt;
&lt;p&gt;In 1995, Harry Triandis wrote a book titled &lt;a href=&quot;https://psycnet.apa.org/record/1995-97791-000&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Individualism and Collectivism&lt;/a&gt;, examining the influences of culture and social behaviors. At the time, Triandis&apos; research focused on the aspects of different cultural values, and now he is considered a pioneer of cross-cultural psychology.&lt;/p&gt;
&lt;p&gt;In this book, he frames individualism and collectivism as a single scale, which is illustrated &lt;a href=&quot;https://fetzer.org/sites/default/files/images/stories/pdf/selfmeasures/CollectiveOrientation.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;in this paper&lt;/a&gt;. Essentially, collectivism is a spectrum representing the extent to which someone&apos;s personal identity is defined in terms of the characteristics of their group, rather than in terms of their personal characteristics. An individualist just refers to someone who is relatively low on this spectrum, whereas a collectivist just refers to someone who is relatively high on this spectrum. In most cases, a person lies somewhere in the middle of this spectrum. In other words, most people aren&apos;t strictly considered an &lt;em&gt;individualist&lt;/em&gt; or a &lt;em&gt;collectivist&lt;/em&gt;. Instead, they lean more towards being one or the other.&lt;/p&gt;
&lt;p&gt;Keep in mind, we&apos;re all part of many different groups, including family, company, sports team, political party, country, etc. So, we&apos;ll most likely be situated at different points on this spectrum depending on the particular group we&apos;re referencing. For example, we may prefer to have more collectivistic values within our family, but we may prefer to have more individualistic values within our company.&lt;/p&gt;
&lt;p&gt;Regardless of where someone is situated along this spectrum, both collectivists and individualists are motivated by the preferences, needs, and rights of the entity that they personally identify with. Since collectivists identify more with their collective group, then they are more motivated by the preferences, needs, and rights of their collective group. Since individualists don&apos;t really identify with their collective group, then they are more motivated by their own preferences, needs, and rights.&lt;/p&gt;
&lt;p&gt;On a similar note, both collectivists and individualists prioritize the goals of the entity that they personally identify with. Since collectivists identify more with their collective group, then they prioritize the goals of their collective group. Since individualists don&apos;t really identify with their collective group, then they prioritize their own goals.&lt;/p&gt;
&lt;p&gt;At a high level, collectivists are closely linked individuals who view themselves primarily as part of a group (or collective entity). This implies that a collectivist&apos;s decision is usually based on what is best for their group. By contrast, individualists are more self-interested individuals who view themselves as an independent entity, rather than part of a whole. This means an individualist&apos;s decision is usually based on what is best for themselves.&lt;/p&gt;
&lt;p&gt;So far, we&apos;ve looked at a few effects of having an individualistic versus a collectivistic culture, but let&apos;s look at a few more. In an individualistic culture, people are expected to look after themselves and only their immediate family. On the other hand, in a collectivistic culture, people are expected to look after themselves, their immediate family, and their extended family with unquestioning loyalty.&lt;/p&gt;
&lt;p&gt;Additionally, an individualistic culture tends to promote a right of privacy. An individualistic culture also expects individuals to have their own personal opinion and believes that speaking one&apos;s mind is healthy. Alternatively, a collectivistic culture tends to emphasize belonging to a group or culture. A collectivistic culture also believes harmony should always be maintained, rather than speaking one&apos;s mind.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/e0efb7a3c4e4123de755c18b11ba00c7/individualism.svg&quot; alt=&quot;CollectivismScale&quot;&gt;&lt;/p&gt;
&lt;p&gt;Typically, collectivism prevails in less developed and Eastern countries, whereas individualism tends to prevail in more developed and Western countries. Specifically, the United States of America ranks highest on the individualism index, making it one of the least collectivistic countries. For more detailed examples about collectivistic and individualistic cultures, refer to &lt;a href=&quot;https://scholarworks.gvsu.edu/cgi/viewcontent.cgi?article=1014&amp;#x26;context=orpc&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Hofstede&apos;s Paper about Dimensionalizing Culture&lt;/a&gt;. The graphic above is a basic map illustrating Hofstede&apos;s scores on this dimension for each country, which can also be found &lt;a href=&quot;https://geerthofstede.com/culture-geert-hofstede-gert-jan-hofstede/6d-model-of-national-culture/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;in this article&lt;/a&gt;. An interactive animation can also be found &lt;a href=&quot;https://exhibition.geerthofstede.com/hofstedes-globe/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;, which illustrates each score using an animation of a globe.&lt;/p&gt;
&lt;h2&gt;Illustrating the Axis for Power Distance&lt;/h2&gt;
&lt;p&gt;Power distance is defined as how much the less powerful members of a culture accept and expect that power is distributed unequally. For example, a family typically has a higher level of power distance, since children and parents expect the parents to have a higher level of power and the children to have a lower level of power. It suggests that a culture&apos;s level of inequality is endorsed by the followers as much as by the leaders. Power and inequality ultimately depend on the expectation of inequality by the followers. In almost every culture or society, there is at least some level of inequality of power, but some are more unequal than others.&lt;/p&gt;
&lt;p&gt;In a culture with a smaller power distance, power should only be used in a few legitimate scenarios, whereas in a culture with a larger power distance, power is thought to be a basic fact of life and its legitimacy is irrelevant. To illustrate societies and cultures with a smaller power distance in greater detail, the following are a few examples of norms practiced in societies with a smaller power distance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parents commonly treat children as equals&lt;/li&gt;
&lt;li&gt;Older people aren&apos;t outright respected or disrespected for their age&lt;/li&gt;
&lt;li&gt;Education is centered around a collaboration amongst students&lt;/li&gt;
&lt;li&gt;Income distribution is meant to be even&lt;/li&gt;
&lt;li&gt;Religions emphasize an equality amongst believers&lt;/li&gt;
&lt;li&gt;Corruption is rare and scandals typically end political careers&lt;/li&gt;
&lt;li&gt;Governments are pluralist, based on majority rule, and changed peacefully&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A society or culture with a larger power distance has contrasting standards. The following are a few examples of norms practiced in societies with a larger power distance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parents teach children to be obedient, especially to their elders&lt;/li&gt;
&lt;li&gt;Older people are respected by younger people&lt;/li&gt;
&lt;li&gt;Education is centered around a teacher-enforced curriculum&lt;/li&gt;
&lt;li&gt;Income distribution is more uneven&lt;/li&gt;
&lt;li&gt;Religions emphasize a hierarchy of priests&lt;/li&gt;
&lt;li&gt;Corruption is more frequent and scandals tend to be covered up&lt;/li&gt;
&lt;li&gt;Governments are autocratic and changed by revolution&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;/e0e038d2ef38c9629076c6e3b8eea9b5/powerdistance.svg&quot; alt=&quot;PowerDistanceScale&quot;&gt;&lt;/p&gt;
&lt;p&gt;In general, East European, Latin, Asian, and African countries have a higher power distance index, whereas Germanic and English-speaking Western countries have a lower power distance index. Specifically, the United States of America is ranked on the lower end of power distance indices. Refer to &lt;a href=&quot;https://e-edu.nbu.bg/pluginfile.php/900222/mod_resource/content/1/G.Hofstede_G.J.Hofstede_M.Minkov%20-%20Cultures%20and%20Organizations%20-%20Software%20of%20the%20Mind%203rd_edition%202010.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Hofstede&apos;s Book about Cultures and Organizations&lt;/a&gt; for more detailed examples about high and low power distance cultures.&lt;/p&gt;
&lt;h2&gt;Illustrating the Axis for Masculinity&lt;/h2&gt;
&lt;p&gt;Masculine cultures usually uphold values like assertiveness and competition. Alternatively, feminine cultures uphold values such as modesty and kindness. Interestingly, women&apos;s values differ less, regardless of whether the society is feminine or masculine. On the other hand, men&apos;s values can sway much more depending on their cultural identity.&lt;/p&gt;
&lt;p&gt;For example, the women in feminine societies are quite modest and caring. In masculine societies, they are somewhat assertive and competitive, but not as much as the male population. Whereas, the men tend to be quite modest and caring in feminine societies, and they tend to be quite competitive and ambitious in masculine societies. Cultures that are more masculine tend to have the following traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Large differences of social and emotional roles between genders&lt;/li&gt;
&lt;li&gt;Men should be assertive and ambitious&lt;/li&gt;
&lt;li&gt;Women can be assertive and ambitious&lt;/li&gt;
&lt;li&gt;Work takes a higher priority than family&lt;/li&gt;
&lt;li&gt;Strength is an admirable quality&lt;/li&gt;
&lt;li&gt;Fathers tend to deal with logistics and facts&lt;/li&gt;
&lt;li&gt;Mothers deal with emotions and relationships&lt;/li&gt;
&lt;li&gt;Women shouldn&apos;t fight physically, but men may fight physically&lt;/li&gt;
&lt;li&gt;Women cry, whereas men don&apos;t cry&lt;/li&gt;
&lt;li&gt;Fathers decide on the family size&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Notice, the qualities found in a masculine culture tend to be synonymous with assertive and competitive values. These values are often taboo in a masculine culture, implying they are deeply rooted, mostly unconscious, and rarely talked about. Feminine cultures don&apos;t tend to have such taboos, and they tend to have the following traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Small differences of social and emotional roles between genders&lt;/li&gt;
&lt;li&gt;Men and women should be modest and caring&lt;/li&gt;
&lt;li&gt;There is a balance between work and family&lt;/li&gt;
&lt;li&gt;There is sympathy for the weaker part of society&lt;/li&gt;
&lt;li&gt;Both fathers and mothers deal with facts and feelings&lt;/li&gt;
&lt;li&gt;Both men and women may cry, but neither should fight&lt;/li&gt;
&lt;li&gt;Mothers decide on number of children&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;/dd58d7777ff1305adcd0dcfd9f456704/masculine.svg&quot; alt=&quot;MasculinityScale&quot;&gt;&lt;/p&gt;
&lt;p&gt;In general, masculine cultures include Japan, German-speaking countries, and some Latin countries, such as Italy and Mexico. English-speaking Western countries tend to have a moderately high masculinity index as well. Conversely, Nordic countries, the Netherlands, and some Latin and Asian countries, such as France, Spain, and Korea, have a low masculinity index. The United States of America is fairly masculine, since its masculinity score is in the top 20 of all ranked countries. Refer to &lt;a href=&quot;https://e-edu.nbu.bg/pluginfile.php/900222/mod_resource/content/1/G.Hofstede_G.J.Hofstede_M.Minkov%20-%20Cultures%20and%20Organizations%20-%20Software%20of%20the%20Mind%203rd_edition%202010.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Hofstede&apos;s Book about Cultures and Organizations&lt;/a&gt; for more detailed examples about masculine and feminine cultures.&lt;/p&gt;
&lt;h2&gt;Illustrating the Axis for Uncertainty Avoidance&lt;/h2&gt;
&lt;p&gt;Societies that avoid uncertainty feel uncomfortable in unstructured situations, where they have less control over the situation. Specifically, uncertainty avoidance indicates the extent to which a culture programs its members to feel either uncomfortable or comfortable in unstructured situations. Usually, &lt;em&gt;unstructured situations&lt;/em&gt; just refer to any unknown situation that is different than the usual situation some person is comfortable with. Someone in a culture with high uncertainty avoidance is not as willing to step outside of his or her comfort zone, whereas someone in a culture with low uncertainty avoidance is more willing to step outside of his or her comfort zone. Specifically, a culture with high uncertainty avoidance has the following traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Uncertainty in life is thought of as a continuous threat&lt;/li&gt;
&lt;li&gt;Uncertainty in life should be avoided&lt;/li&gt;
&lt;li&gt;There is higher stress, anxiety, and emotion&lt;/li&gt;
&lt;li&gt;They report lower subjective health&lt;/li&gt;
&lt;li&gt;Different ideas are less tolerated and seen as dangerous&lt;/li&gt;
&lt;li&gt;There is a need for structure&lt;/li&gt;
&lt;li&gt;There is a search for guidance&lt;/li&gt;
&lt;li&gt;They stay in jobs, even if a job is disliked&lt;/li&gt;
&lt;li&gt;There is an emotional need for rules&lt;/li&gt;
&lt;li&gt;Religions are used to find ultimate truths and grand theories&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Interestingly, uncertainty avoiding cultures are more emotional and motivated by inner nervous energy. On the other hand, uncertainty accepting cultures are more tolerant of different opinions and ideas. Uncertainty accepting cultures have the following traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;They believe uncertainty is an inherent part of life&lt;/li&gt;
&lt;li&gt;Uncertainty should be accepted and welcomed&lt;/li&gt;
&lt;li&gt;They maintain higher levels of self-control and ease&lt;/li&gt;
&lt;li&gt;They maintain lower levels of anxiety and stress&lt;/li&gt;
&lt;li&gt;They are tolerant of new ideas and opinions&lt;/li&gt;
&lt;li&gt;They are comfortable with ambiguity&lt;/li&gt;
&lt;li&gt;They don&apos;t require known answers from guidance&lt;/li&gt;
&lt;li&gt;Changing jobs isn&apos;t a problem&lt;/li&gt;
&lt;li&gt;There is a dislike of written and unwritten rules&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;/54032e074d368132b208a185751f5ccf/uncertaintyavoidance.svg&quot; alt=&quot;UncertaintyAvoidanceScale&quot;&gt;&lt;/p&gt;
&lt;p&gt;Generally, East and Central European countries tend to have a higher uncertainty avoidance index. On the other hand, Chinese and English-speaking cultures tend to have a lower uncertainty avoidance index. Specifically, the United States is on the tail end of the uncertainty avoiding countries, meaning the US tends to accept uncertainty compared to other countries. Refer to &lt;a href=&quot;https://e-edu.nbu.bg/pluginfile.php/900222/mod_resource/content/1/G.Hofstede_G.J.Hofstede_M.Minkov%20-%20Cultures%20and%20Organizations%20-%20Software%20of%20the%20Mind%203rd_edition%202010.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Hofstede&apos;s Book about Cultures and Organizations&lt;/a&gt; for more detailed examples about uncertainty avoiding and accepting cultures.&lt;/p&gt;
&lt;h2&gt;Illustrating the Axis for Long-Term Orientation&lt;/h2&gt;
&lt;p&gt;Long-term oriented societies emphasize long-term thinking and investments, whereas short-term oriented societies emphasize short-term thinking and enjoyment. Long-term oriented cultures maintain values such as perseverance, thrift, ordering relationships by status, and having a sense of shame. On the other hand, short-term oriented cultures uphold values such as respect for tradition, reciprocating social obligations, and personal steadiness and stability. The following are traits in a long-term oriented society:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;They believe the most important events in life will occur in the future&lt;/li&gt;
&lt;li&gt;A good person adapts to mixed circumstances&lt;/li&gt;
&lt;li&gt;Knowing what&apos;s right and wrong depends on the scenario&lt;/li&gt;
&lt;li&gt;Traditions are adaptable to changing circumstances&lt;/li&gt;
&lt;li&gt;Perseverance is an important goal&lt;/li&gt;
&lt;li&gt;Large savings are available for investment&lt;/li&gt;
&lt;li&gt;Students believe effort leads to success&lt;/li&gt;
&lt;li&gt;Economic growth is fast&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On the other hand, short-term oriented cultures have the following traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The most important events in life happen now&lt;/li&gt;
&lt;li&gt;A good person will always be good&lt;/li&gt;
&lt;li&gt;There are universal guidelines outlining what is good and evil&lt;/li&gt;
&lt;li&gt;Traditions are sacrosanct&lt;/li&gt;
&lt;li&gt;Family life is guided by imperatives&lt;/li&gt;
&lt;li&gt;People are supposed to be proud of their country&lt;/li&gt;
&lt;li&gt;Service to others is an important goal&lt;/li&gt;
&lt;li&gt;Students attribute success and failure to luck&lt;/li&gt;
&lt;li&gt;Economic growth is slow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;/953525b25ccc61184660b6cfa8a85554/longterm.svg&quot; alt=&quot;LongTermScale&quot;&gt;&lt;/p&gt;
&lt;p&gt;Long-term oriented countries include some Asian countries and central European countries, whereas short-term oriented countries include Middle Eastern and African countries. Specifically, the United States of America leans more towards being a short-term oriented country, with a long-term oriented index in the bottom 20 countries. Refer to &lt;a href=&quot;https://e-edu.nbu.bg/pluginfile.php/900222/mod_resource/content/1/G.Hofstede_G.J.Hofstede_M.Minkov%20-%20Cultures%20and%20Organizations%20-%20Software%20of%20the%20Mind%203rd_edition%202010.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Hofstede&apos;s Book about Cultures and Organizations&lt;/a&gt; for more detailed examples about long-term and short-term oriented cultures.&lt;/p&gt;
&lt;h2&gt;Illustrating the Axis for Indulgence&lt;/h2&gt;
&lt;p&gt;Roughly, indulgent societies are societies that allow relatively free gratification of natural human desires related to enjoying life and having fun. Alternatively, restrained societies regulate the level of gratification by means of deep-rooted social norms. The following are traits in an indulgent society:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Many people declare themselves as happy&lt;/li&gt;
&lt;li&gt;There is a higher importance of leisure&lt;/li&gt;
&lt;li&gt;They&apos;re more likely to remember positive emotions&lt;/li&gt;
&lt;li&gt;Indulgent societies usually have more educated populations&lt;/li&gt;
&lt;li&gt;There are more people actively involved in sports&lt;/li&gt;
&lt;li&gt;There are more obese people when there is enough food&lt;/li&gt;
&lt;li&gt;There are lenient sexual norms in wealthier areas&lt;/li&gt;
&lt;li&gt;Maintaining societal order is not a priority&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On the other hand, restrained cultures have the following traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There are fewer people who are very happy&lt;/li&gt;
&lt;li&gt;There is a perception of helplessness&lt;/li&gt;
&lt;li&gt;Freedom of speech is not a primary concern&lt;/li&gt;
&lt;li&gt;There is a lower importance of leisure&lt;/li&gt;
&lt;li&gt;They are less likely to remember positive emotions&lt;/li&gt;
&lt;li&gt;Fewer people are actively involved in sports&lt;/li&gt;
&lt;li&gt;There are fewer obese people when there is enough food&lt;/li&gt;
&lt;li&gt;There are stricter sexual norms in wealthier areas&lt;/li&gt;
&lt;li&gt;There is a higher number of police officers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;/75d00a915b64e834d1cc8ddfa36d5fe2/indulgance.svg&quot; alt=&quot;IndulganceScale&quot;&gt;&lt;/p&gt;
&lt;p&gt;In general, European and South American countries are more indulgent compared to other countries. On the other hand, central European countries are less indulgent compared to other countries. Specifically, the United States is ranked in the top 20 most indulgent countries. Refer to &lt;a href=&quot;https://e-edu.nbu.bg/pluginfile.php/900222/mod_resource/content/1/G.Hofstede_G.J.Hofstede_M.Minkov%20-%20Cultures%20and%20Organizations%20-%20Software%20of%20the%20Mind%203rd_edition%202010.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Hofstede&apos;s Book about Cultures and Organizations&lt;/a&gt; for more detailed examples about restrained and indulgent cultures.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Collider Bias and Police Use of Force]]></title><description><![CDATA[Over the last decade, I've grown up hearing and reading about police-related killings of unarmed black men, including Eric Garner, Michael Brown, Ronell Foster, George Floyd, and countless others…]]></description><link>https://dkharazi.github.io/blog/police</link><guid isPermaLink="false">https://dkharazi.github.io/blog/police</guid><pubDate>Thu, 20 May 2021 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Over the last decade, I&apos;ve grown up hearing and reading about police-related killings of unarmed black men, including Eric Garner, Michael Brown, Ronell Foster, George Floyd, and countless others. Naturally, there have been growing concerns about officers systematically discriminating against minorities, which has become one of the most important issues in modern-day policing. Unfortunately, this problem &lt;a href=&quot;https://en.wikipedia.org/wiki/Police_use_of_deadly_force_in_the_United_States&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;doesn&apos;t have such a straightforward causal answer&lt;/a&gt; for various reasons, if you&apos;re the ordinary layperson searching for clarity on the matter.&lt;/p&gt;
&lt;p&gt;First, causal inference about race and policing is difficult to study due to a lack of unbiased data. &lt;a href=&quot;https://scholar.harvard.edu/files/fryer/files/empirical_analysis_tables_figures.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Some of the most cited research papers&lt;/a&gt; about this topic use data provided by police forces, which suffers from confounding. For example, bias may be introduced if the police forces providing the data are already engaging in racially biased practices.&lt;/p&gt;
&lt;p&gt;Throughout the last few decades, Congress has recognized this problem involving a lack of data about race and policing. In 1994, Congress instructed the Attorney General to publish annual statistics on police use of excessive force. However, this was not carried out effectively. Since then, two national systems have been built that collect data including homicides committed by law enforcement officers: the &lt;a href=&quot;https://www.cdc.gov/nchs/nvss/index.htm&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;CDC&apos;s NVSS&lt;/a&gt; and the &lt;a href=&quot;https://www.fbi.gov/services/cjis/ucr/use-of-force&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;FBI&apos;s UCR&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The NVSS aggregates data from locally filed death certificates. However, the certificates don&apos;t document whether a killing is legally justified or not. Also, the certificates don&apos;t document whether a law enforcement officer is involved in a killing or not. On the other hand, the UCR system maintains data about police use of force. However, this relies on law enforcement agencies to submit a report voluntarily.&lt;/p&gt;
&lt;p&gt;Clearly, governmental entities haven&apos;t made much progress in collecting unbiased and comprehensive data for police use of force, which has led to more recent efforts of data collection made by crowd-sourced projects and research institutions.&lt;/p&gt;
&lt;h2&gt;The Search for Answers&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://scholar.harvard.edu/files/fryer/files/empirical_analysis_tables_figures.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Fryer&lt;/a&gt; collected data in several databases, which he hoped would shed some light on police patterns. Two were public-use data sets: the &lt;a href=&quot;https://www.nyclu.org/en/stop-and-frisk-data&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;New York City Stop-and-Frisk database&lt;/a&gt; and the &lt;a href=&quot;https://bjs.ojp.gov/data-collection/police-public-contact-survey-ppcs&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Police-Public Contact Survey&lt;/a&gt;. The first data set includes data about the NYPD&apos;s police stops and questioning of pedestrians, if officers decide to stop and frisk pedestrians for weapons or contraband. The surveys include data about civilians describing interactions with the police, including the use of force. For more information about Fryer&apos;s data collection methodologies, please refer to &lt;a href=&quot;https://mixtape.scunning.com/dag.html#collider-bias-and-police-use-of-force&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this text&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Fryer&apos;s findings have been quite controversial and politicized after his paper was published in 2016. In 2020, &lt;a href=&quot;https://scholar.princeton.edu/sites/default/files/jmummolo/files/klm.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;a Princeton study&lt;/a&gt; disputed the findings, claiming data about white pedestrians might not be the same as non-white pedestrians, if police had a higher threshold for stopping white pedestrians. In the same year, economists from the University of Chicago published &lt;a href=&quot;https://www.journals.uchicago.edu/doi/abs/10.1086/710976&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;a response to Fryer&apos;s study&lt;/a&gt;, stating Fryer&apos;s paper &lt;em&gt;&quot;doesn&apos;t establish credible evidence on the presence or absence of discrimination against Americans in police shootings&quot;&lt;/em&gt; due to selection bias. Eventually, Fryer published &lt;a href=&quot;https://www.wsj.com/articles/what-the-data-say-about-police-11592845959&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;a response in the WSJ&lt;/a&gt;, reaffirming his findings as unbiased and responding to those criticisms he had received. He pointed to other recent studies from the &lt;a href=&quot;https://injuryprevention.bmj.com/content/injuryprev/23/1/27.full.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Injury Prevention Journal&lt;/a&gt; and &lt;a href=&quot;https://policingequity.org/images/pdfs-doc/CPE_SoJ_Race-Arrests-UoF_2016-07-08-1130.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Center for Policing Equity&lt;/a&gt;, which have found similar results in comparison to Fryer&apos;s findings. We should note that they use the already criticized FBI data, however. 
Regardless, he acknowledges the existence of unchecked racial disparities today and in previous decades, and he believes his findings are enough to &lt;em&gt;&quot;justify sweeping reform.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;A Glimpse into Fryer&apos;s Findings&lt;/h2&gt;
&lt;p&gt;A few facts are especially important to highlight about Fryer&apos;s study. First, Fryer finds that blacks and Hispanics are more than &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;50&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;50 \%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; more likely to have an interaction with police involving any use of force. In his full model using the stop-and-frisk data, he finds that blacks are &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;21&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;21 \%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; more likely than whites to be involved in an interaction with police where a weapon is drawn. 
Note, this difference is statistically significant in the study.&lt;/p&gt;
&lt;p&gt;Once Fryer moves to the administrative data, he surprisingly finds no racial differences in officer-involved shootings, conditional on a police interaction. In fact, he finds blacks are &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;27&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;27 \%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; less likely to be shot at by police than non-black and non-Hispanic suspects, after controlling for the following variables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Suspect demographics&lt;/li&gt;
&lt;li&gt;Officer demographics&lt;/li&gt;
&lt;li&gt;Encounter characteristics&lt;/li&gt;
&lt;li&gt;Suspect weapon&lt;/li&gt;
&lt;li&gt;Year fixed effects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note, this coefficient is measured with considerable error and is not statistically significant. In other words, Fryer is unable to use this data to find evidence for racial discrimination in officer-involved shootings.&lt;/p&gt;
&lt;h2&gt;Potential Issues with Collider Conditioning&lt;/h2&gt;
&lt;p&gt;As I mentioned already, Fryer&apos;s study generated skepticism among some researchers, including &lt;a href=&quot;https://polmeth.mit.edu/sites/default/files/documents/Jonathan_Mummolo.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Dean Knox&lt;/a&gt; and &lt;a href=&quot;https://mixtape.scunning.com/dag.html#collider-bias-and-police-use-of-force&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Scott Cunningham&lt;/a&gt;. They suggest the administrative data sources are potentially endogenous because of conditioning on a collider. If this is true, then the administrative data itself may embed a racial bias.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/cbca5778f36e6052ef9e651302e5941f/policecolliderbias.svg&quot; alt=&quot;policecolliderbias&quot;&gt;&lt;/p&gt;
&lt;p&gt;Fryer&apos;s study finds that minorities are more likely to be stopped in both the stop-and-frisk data and the Police-Public Contact Survey data. He introduces many controls for &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;X&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07847em;&quot;&gt;X&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; in the above DAG. Meaning, he&apos;s captured and controlled for hundreds of variables relating to the use of force &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;Y&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.22222em;&quot;&gt;Y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, which include the nature of a police interaction, time of day, etc. 
By controlling for &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;X&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07847em;&quot;&gt;X&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, Fryer is able to close any backdoor path stemming from &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;X&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07847em;&quot;&gt;X&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;Most importantly, &lt;a href=&quot;https://polmeth.mit.edu/sites/default/files/documents/Jonathan_Mummolo.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Knox&lt;/a&gt; and &lt;a href=&quot;https://mixtape.scunning.com/dag.html#collider-bias-and-police-use-of-force&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Cunningham&lt;/a&gt; point out the presence of collider bias by focusing on an officer&apos;s stop &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. Since the administrative data only includes data about police-initiated stops, each observation is &lt;em&gt;conditional&lt;/em&gt; on a stop. Fryer acknowledges this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Unless otherwise noted, all results are conditional on an interaction. Understanding potential selection into police data sets due to bias in who police interacts with is a difficult endeavor.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Conditioning on the stop variable &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; opens up a &lt;em&gt;mediated path&lt;/em&gt; &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;D&lt;/mi&gt;&lt;mo&gt;→&lt;/mo&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mo&gt;←&lt;/mo&gt;&lt;mi&gt;U&lt;/mi&gt;&lt;mo&gt;→&lt;/mo&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;D \to M \gets U \to Y&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;→&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord 
mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;←&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;→&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.22222em;&quot;&gt;Y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. 
If there is discrimination happening in &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;D&lt;/mi&gt;&lt;mo&gt;→&lt;/mo&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;D \to M&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;→&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, conditioning on the stop can create spurious correlations, and any estimated causal relationship between race and police shootings may be inaccurate. Conversely, if there isn&apos;t racial discrimination in who the officer stops, then the correlations observed in the administrative data aren&apos;t spurious and instead reflect a causal effect. In that case, it would be fine to use this administrative data. However, it would be a mistake to assume there isn&apos;t discrimination happening here, especially in a study about discrimination. As a result, we probably shouldn&apos;t make this assumption, which Fryer&apos;s study does in fact make.&lt;/p&gt;
&lt;p&gt;To help illustrate this point, suppose officers have a higher threshold for judging the behavior of a white civilian to be suspicious than for judging the behavior of a minority civilian to be suspicious. Consequently, the pool of stopped white civilians would be skewed toward truly suspicious civilians, whereas the pool of stopped minority civilians would include many civilians who are not truly suspicious. Therefore, Fryer&apos;s data would first need to include this missing information in order to support any accurate conclusions about the causal relationship between race and police shootings.&lt;/p&gt;
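&lt;p&gt;The threshold argument above can be sketched as a small simulation. This is only an illustration under made-up assumptions: the stop thresholds (0.8 and 0.4), the uniform suspicion score, and the rule that force depends on suspicion alone are all hypothetical and are not taken from Fryer&apos;s or Knox&apos;s data.&lt;/p&gt;

```python
import random


def force_rate_among_stopped(threshold, n=100_000, seed=0):
    """Share of stopped civilians who experience force.

    Hypothetical model: suspicion U is uniform on [0, 1), a stop M occurs
    only when U exceeds the officer's threshold, and force Y occurs with
    probability 0.5 * U. Force never depends on race directly here.
    """
    rng = random.Random(seed)
    stopped = 0
    force = 0
    for _ in range(n):
        u = rng.random()                  # unobserved suspicion U
        if u > threshold:                 # stop M depends on the threshold
            stopped += 1
            if 0.5 * u > rng.random():    # force Y depends on U only
                force += 1
    return force / stopped


# Hypothetical thresholds: officers require more suspicion to stop whites.
white_rate = force_rate_among_stopped(threshold=0.8, seed=1)
minority_rate = force_rate_among_stopped(threshold=0.4, seed=2)

print(f"force rate among stopped white civilians:    {white_rate:.3f}")
print(f"force rate among stopped minority civilians: {minority_rate:.3f}")
```

&lt;p&gt;In this sketch, force is assigned identically to both groups given suspicion, yet the stopped white civilians show a higher force rate simply because the stop threshold filters them more heavily. Comparing rates among stops alone would therefore misread the difference in who gets stopped as a difference in how force is used.&lt;/p&gt;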
&lt;p&gt;In &lt;a href=&quot;https://www.youtube.com/watch?v=Kdj81skxlWM&amp;#x26;t=286s&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;a presentation&lt;/a&gt; at the National Institute of Statistical Sciences, Knox illustrates his point with a convincing visual provided below. He breaks down an example where a higher percentage of minority civilians are shot by officers compared to white civilians, relative to the total number of civilians in each racial community. However, by shifting our focus to stops only, a higher percentage of white civilians appear to be shot compared to minority civilians. Intuitively, this could look like more officers stopping minorities for something as mild as jaywalking, whereas white civilians are stopped only when there is more truly suspicious behavior, such as pickpocketing or robbery.&lt;/p&gt;
&lt;p&gt;Specifically, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mn&gt;8&lt;/mn&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\frac{1}{8}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.190108em;vertical-align:-0.345em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.845108em;&quot;&gt;&lt;span style=&quot;top:-2.6550000000000002em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.394em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:0.345em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of all white civilians are shot by officers, compared to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mn&gt;4&lt;/mn&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\frac{1}{4}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.190108em;vertical-align:-0.345em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.845108em;&quot;&gt;&lt;span style=&quot;top:-2.6550000000000002em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.394em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span 
class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.345em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of all minority civilians are shot by officers. But, by shifting our focus to stops, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\frac{1}{2}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.190108em;vertical-align:-0.345em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.845108em;&quot;&gt;&lt;span style=&quot;top:-2.6550000000000002em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; 
style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.394em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.345em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of stopped white civilians are shot by officers, compared to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mn&gt;3&lt;/mn&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\frac{1}{3}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.190108em;vertical-align:-0.345em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.845108em;&quot;&gt;&lt;span style=&quot;top:-2.6550000000000002em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span 
class=&quot;mord mtight&quot;&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.394em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.345em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of stopped minority civilians. According to Knox, our conclusions could be reversed entirely depending on our analysis. In particular, Fryer&apos;s study naively concludes that there&apos;s anti-white bias because:&lt;/p&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mtext&gt;w.s.f.&lt;/mtext&gt;&lt;mrow&gt;&lt;mtext&gt;w.s.o. &lt;/mtext&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mtext&gt; w.s.f.&lt;/mtext&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;mo&gt;&amp;gt;&lt;/mo&gt;&lt;mfrac&gt;&lt;mtext&gt;m.s.f.&lt;/mtext&gt;&lt;mrow&gt;&lt;mtext&gt;m.s.o. &lt;/mtext&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mtext&gt; m.s.f.&lt;/mtext&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\frac{\text{w.s.f.}}{\text{w.s.o. } + \text{ w.s.f.}} &amp;gt; \frac{\text{m.s.f.}}{\text{m.s.o. } + \text{ m.s.f.}}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:2.14077em;vertical-align:-0.7693300000000001em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.37144em;&quot;&gt;&lt;span style=&quot;top:-2.314em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;w.s.o. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt; w.s.f.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.677em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;w.s.f.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.7693300000000001em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:2.14077em;vertical-align:-0.7693300000000001em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:1.37144em;&quot;&gt;&lt;span style=&quot;top:-2.314em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;m.s.o. &lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt; m.s.f.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.677em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;m.s.f.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.7693300000000001em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;In order to draw a conclusion about racial discrimination in policing for an entire racial group, Knox believes a fairer comparison would require expanding this analysis to include non-stops for each race as well. Otherwise, Fryer&apos;s study (and others) can lead to dramatic underestimates of bias in force.&lt;/p&gt;
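&lt;p&gt;The fractions quoted in this example can be reproduced with a short worked calculation. The counts below are hypothetical, chosen only so that the ratios match the ones quoted above; they are not figures from the presentation or from any data set.&lt;/p&gt;

```python
from fractions import Fraction

# Hypothetical counts chosen only to reproduce the quoted fractions.
groups = {
    "white": {"civilians": 8, "stopped": 2, "shot": 1},
    "minority": {"civilians": 8, "stopped": 6, "shot": 2},
}


def shooting_rates(counts):
    """Return (rate among all civilians, rate among stopped civilians)."""
    overall = Fraction(counts["shot"], counts["civilians"])
    given_stop = Fraction(counts["shot"], counts["stopped"])
    return overall, given_stop


rates = {name: shooting_rates(counts) for name, counts in groups.items()}
for name, (overall, given_stop) in rates.items():
    print(f"{name}: {overall} of all civilians, {given_stop} of stopped civilians")
```

&lt;p&gt;With these counts, the minority rate is higher among all civilians (1/4 versus 1/8) but lower among stopped civilians (1/3 versus 1/2), so the sign of the comparison flips depending on the denominator.&lt;/p&gt;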
&lt;p&gt;After correcting for this bias, Knox finds evidence of anti-minority bias by a large margin in &lt;a href=&quot;https://youtu.be/Kdj81skxlWM?t=1243&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;his own analysis&lt;/a&gt;. Specifically, Knox develops a bounding approach for correcting this bias, which places bounds on the severity of the selection problem. After applying this correction, Knox and his co-authors find that even lower-bound estimates of the incidence of police violence against civilians are as much as &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;5&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;5&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; times higher than those from a traditional approach that ignores the sample selection problem altogether.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/39b26f3cbccd705afe159ed2af023981/policecompositionbias.svg&quot; alt=&quot;policecolliderbias&quot;&gt;&lt;/p&gt;
&lt;p&gt;For more information about Knox&apos;s study, refer to &lt;a href=&quot;https://polmeth.mit.edu/sites/default/files/documents/Jonathan_Mummolo.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;his paper&lt;/a&gt; for a more detailed analysis of his findings, or watch &lt;a href=&quot;https://www.youtube.com/watch?v=Kdj81skxlWM&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;his presentation&lt;/a&gt; of those findings at a NISS conference. For additional illustrations of collider bias in Fryer&apos;s study, refer to &lt;a href=&quot;https://arxiv.org/pdf/2007.08406.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this paper&lt;/a&gt; as well.&lt;/p&gt;
&lt;p&gt;For additional high-level details and illustrations about the presence of collider bias in studies about policing and racial discrimination, refer to &lt;a href=&quot;https://fivethirtyeight.com/features/why-statistics-dont-capture-the-full-extent-of-the-systemic-bias-in-policing/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this FiveThirtyEight article&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Gram Matrices in Neural Style Transfer]]></title><description><![CDATA[In this paper, it has been shown that matching the Gram matrices of feature maps is equivalent to minimizing the Maximum Mean Discrepancy (MMD) with the second order polynomial kernel. Thus, the paper…]]></description><link>https://dkharazi.github.io/blog/gram</link><guid isPermaLink="false">https://dkharazi.github.io/blog/gram</guid><pubDate>Mon, 10 May 2021 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In &lt;a href=&quot;https://arxiv.org/abs/1701.01036&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this paper&lt;/a&gt;, it has been shown that matching the Gram matrices of feature maps is equivalent to minimizing the &lt;a href=&quot;https://papers.nips.cc/paper/2016/file/5055cbf43fac3f7e2336b27310f0b9ef-Paper.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Maximum Mean Discrepancy (MMD)&lt;/a&gt; with the second order polynomial kernel. Thus, the paper argues that the essence of neural style transfer is to generate a new image from white noise by matching the neural activations with the content image and the Gram matrices with the style image.&lt;/p&gt;
&lt;p&gt;The original algorithm for neural style transfer minimized a cost function equal to the sum of the content loss and the style loss. Here, the content loss represents the difference in content between the content image and our generated image. Likewise, the style loss represents the difference in style between the style image and our generated image.&lt;/p&gt;
&lt;p&gt;The style loss function uses the Gram matrix. Specifically, the style loss represents the normalized, squared difference between the Gram matrix of the style image and the Gram matrix of the generated image. The Gram matrix captures how strongly the feature maps of an image correlate with one another, but it doesn&apos;t care about the specific presence or location of features within an image.&lt;/p&gt;
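&lt;p&gt;As a minimal sketch of this idea (not code from the original paper), the snippet below computes a Gram matrix with NumPy and checks that shuffling the spatial positions of the feature maps leaves it unchanged. The function names, the toy array shapes, and the normalization constant (which follows the convention in Gatys et al.) are assumptions made for illustration.&lt;/p&gt;

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of a layer's feature maps.

    `features` is assumed to have shape (N, M): N feature maps (channels),
    each flattened into M spatial positions.
    """
    # Entry (i, j) is the inner product of feature map i with feature map j,
    # summed over every spatial position.
    return features @ features.T


def style_layer_loss(gen_features, style_features):
    """Normalized, squared difference between two Gram matrices.

    The 1 / (4 * N^2 * M^2) factor is an assumed normalization that follows
    the convention used by Gatys et al.
    """
    n, m = gen_features.shape
    diff = gram_matrix(gen_features) - gram_matrix(style_features)
    return np.sum(diff ** 2) / (4.0 * n ** 2 * m ** 2)


# Toy data: 3 feature maps, each with 16 spatial positions.
rng = np.random.default_rng(0)
style = rng.normal(size=(3, 16))

# Move every feature to a different location by permuting the positions.
shuffled = style[:, rng.permutation(16)]

same_gram = np.allclose(gram_matrix(style), gram_matrix(shuffled))
loss_shuffled = style_layer_loss(shuffled, style)
print(same_gram, loss_shuffled)
```

&lt;p&gt;Because the Gram matrix sums over all spatial positions, the shuffled feature maps produce the same matrix, so the style loss between the two is zero up to floating-point error even though no feature sits where it did before.&lt;/p&gt;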
&lt;p&gt;In &lt;a href=&quot;https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the original paper for neural style transfer&lt;/a&gt;, a new, generated image &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;∗&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x^{*}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.688696em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.688696em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;∗&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is iteratively created by optimizing a content loss and style loss, given by the following formula. 
Here, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;L&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; are the individual losses, and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;α&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\alpha&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.0037em;&quot;&gt;α&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;β&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\beta&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8888799999999999em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; 
style=&quot;margin-right:0.05278em;&quot;&gt;β&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; are the weights for content and style losses:&lt;/p&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mi&gt;α&lt;/mi&gt;&lt;msub&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;β&lt;/mi&gt;&lt;msub&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;L_{gen} = \alpha L_{content} + \beta L_{style}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.969438em;vertical-align:-0.286108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15139200000000003em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mord 
mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.83333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.0037em;&quot;&gt;α&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.2805559999999999em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal 
mtight&quot;&gt;t&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.980548em;vertical-align:-0.286108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05278em;&quot;&gt;β&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3361079999999999em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03588em;&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span 
class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Mathematically, we can see that the losses of the generated image are just a weighted combination of the style and content losses. Here, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;L_{content}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.83333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.2805559999999999em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal 
mtight&quot;&gt;t&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is defined by the squared error between the feature maps of a specific layer &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;l&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; for the generated image &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;∗&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x^{*}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.688696em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span 
class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.688696em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;∗&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and the content image &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x^{c}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.664392em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.664392em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;c&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;:&lt;/p&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/mfrac&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;msub&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msub&gt;&lt;/munderover&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msub&gt;&lt;/munderover&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;msubsup&gt;&lt;mi&gt;F&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;msubsup&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;msup&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;L_{content} = \frac{1}{2} \sum_{i=1}^{N_{l}} \sum_{j=1}^{M_{l}} (F_{ij}^{l} - P_{ij}^{l})^{2}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.83333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span 
class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.2805559999999999em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:3.258973em;vertical-align:-1.4137769999999998em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.32144em;&quot;&gt;&lt;span 
style=&quot;top:-2.314em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.677em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.686em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop op-limits&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.8451960000000003em;&quot;&gt;&lt;span style=&quot;top:-1.872331em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mrel mtight&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.050005em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span&gt;&lt;span class=&quot;mop op-symbol 
large-op&quot;&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-4.316865em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3448em;&quot;&gt;&lt;span style=&quot;top:-2.3487714285714287em;margin-left:-0.10903em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15122857142857138em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.277669em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop op-limits&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:1.8451960000000005em;&quot;&gt;&lt;span style=&quot;top:-1.872331em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;mrel mtight&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.050005em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span&gt;&lt;span class=&quot;mop op-symbol large-op&quot;&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-4.316865em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3448em;&quot;&gt;&lt;span style=&quot;top:-2.3487714285714287em;margin-left:-0.10903em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:0.15122857142857138em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.4137769999999998em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.13889em;&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.899108em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:-0.13889em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1130000000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:0.383108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.282216em;vertical-align:-0.383108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.13889em;&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.899108em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:-0.13889em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1130000000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:0.383108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8641079999999999em;&quot;&gt;&lt;span style=&quot;top:-3.113em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Here, the feature maps of &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;∗&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x^{*}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.688696em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.688696em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;∗&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x^{c}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; 
style=&quot;height:0.664392em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.664392em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;c&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x^{s}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.664392em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.664392em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal 
mtight&quot;&gt;s&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; in the &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;h&lt;/mi&gt;&lt;/mrow&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;l^{th}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.849108em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.849108em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;h&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; layer of a CNN are denoted by &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;F&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation 
encoding=&quot;application/x-tex&quot;&gt;F^{l}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.849108em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.13889em;&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.849108em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;P^{l}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.849108em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.13889em;&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span 
class=&quot;vlist&quot; style=&quot;height:0.849108em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;S&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;S^{l}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.849108em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05764em;&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.849108em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, 
respectively. Thus, the loss of the content image &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;L_{content}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.83333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.2805559999999999em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span 
class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; measures the squared error between the feature maps of the generated image and the content image. The loss of the style image &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;L_{style}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.969438em;vertical-align:-0.286108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3361079999999999em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03588em;&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; 
style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is defined as a weighted sum of the style losses &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msubsup&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;L_{style}^{l}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.2683239999999998em;vertical-align:-0.4192159999999999em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.849108em;&quot;&gt;&lt;span style=&quot;top:-2.4168920000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord mathnormal 
mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03588em;&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.4192159999999999em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; from different layers:&lt;/p&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;munder&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/munder&gt;&lt;msub&gt;&lt;mi&gt;w&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msub&gt;&lt;msubsup&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;L_{style} = \sum_{l} w_{l} L_{style}^{l}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.969438em;vertical-align:-0.286108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3361079999999999em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; 
style=&quot;margin-right:0.03588em;&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:2.3521180000000004em;vertical-align:-1.3021129999999999em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop op-limits&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.0500050000000005em;&quot;&gt;&lt;span style=&quot;top:-1.847887em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.050005em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span&gt;&lt;span class=&quot;mop op-symbol large-op&quot;&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:1.3021129999999999em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02691em;&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.33610799999999996em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.899108em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span 
class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03588em;&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1130000000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.383108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
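&lt;p&gt;The weighted sum in the equation above can be sketched in a few lines of plain Python. This is only an illustration: the function name total_style_loss, the argument names layer_losses and layer_weights, and the numeric values are hypothetical, and the per-layer losses are assumed to have been computed already.&lt;/p&gt;

```python
def total_style_loss(layer_losses, layer_weights):
    """Weighted sum of per-layer style losses: sum over l of w_l * L_style^l."""
    if len(layer_losses) != len(layer_weights):
        raise ValueError("need exactly one weight per layer loss")
    return sum(w * loss for w, loss in zip(layer_weights, layer_losses))


# Hypothetical per-layer style losses and weights (illustrative values only).
layer_losses = [4.0, 2.0, 1.0]
layer_weights = [0.5, 0.25, 0.25]
style_loss = total_style_loss(layer_losses, layer_weights)
```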
&lt;p&gt;Here, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;w&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;w_{l}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.58056em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02691em;&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.33610799999999996em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:-0.02691em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is the weight of the loss in the layer &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation 
encoding=&quot;application/x-tex&quot;&gt;l&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, and the loss of the style image &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msubsup&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;L_{style}^{l}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.2683239999999998em;vertical-align:-0.4192159999999999em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.849108em;&quot;&gt;&lt;span style=&quot;top:-2.4168920000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal 
mtight&quot; style=&quot;margin-right:0.03588em;&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.4192159999999999em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is defined by the squared error between the feature correlations expressed by Gram matrices of the generated image &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;∗&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x^{*}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.688696em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span 
class=&quot;vlist&quot; style=&quot;height:0.688696em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;∗&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and the style image &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x^{s}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.664392em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.664392em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;s&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, where the Gram matrix &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math 
xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;G&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;G^{l}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.849108em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;G&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.849108em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is just the inner product between the vectorized feature maps of the generated image &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mo lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;∗&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x^{*}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; 
style=&quot;height:0.688696em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.688696em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;∗&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; in the &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;h&lt;/mi&gt;&lt;/mrow&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;l^{th}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.849108em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.849108em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span 
class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;h&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; layer.&lt;/p&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msubsup&gt;&lt;mi&gt;L&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;mi&gt;y&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mrow&gt;&lt;mn&gt;4&lt;/mn&gt;&lt;msubsup&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msubsup&gt;&lt;msubsup&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msubsup&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;msub&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msub&gt;&lt;/munderover&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;msub&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msub&gt;&lt;/munderover&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;msubsup&gt;&lt;mi&gt;G&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;msubsup&gt;&lt;mi&gt;A&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;msup&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;L_{style}^{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i=1}^{N_{l}} \sum_{j=1}^{N_{l}} (G_{ij}^{l} - A_{ij}^{l})^{2}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span 
class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.282216em;vertical-align:-0.383108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;L&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.899108em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03588em;&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1130000000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.383108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span 
class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:3.258973em;vertical-align:-1.4137769999999998em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.32144em;&quot;&gt;&lt;span style=&quot;top:-2.314em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.7959079999999998em;&quot;&gt;&lt;span style=&quot;top:-2.398692em;margin-left:-0.10903em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.0448em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.30130799999999996em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.7959079999999998em;&quot;&gt;&lt;span style=&quot;top:-2.398692em;margin-left:-0.10903em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.0448em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.30130799999999996em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.677em;&quot;&gt;&lt;span 
class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.9873080000000001em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop op-limits&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.8451960000000003em;&quot;&gt;&lt;span style=&quot;top:-1.872331em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mrel mtight&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.050005em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span&gt;&lt;span class=&quot;mop op-symbol large-op&quot;&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-4.316865em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span 
class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3448em;&quot;&gt;&lt;span style=&quot;top:-2.3487714285714287em;margin-left:-0.10903em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15122857142857138em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.277669em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop op-limits&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.8451960000000005em;&quot;&gt;&lt;span style=&quot;top:-1.872331em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;mrel mtight&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mord 
mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.050005em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span&gt;&lt;span class=&quot;mop op-symbol large-op&quot;&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-4.316865em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3448em;&quot;&gt;&lt;span style=&quot;top:-2.3487714285714287em;margin-left:-0.10903em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15122857142857138em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.4137769999999998em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;G&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.899108em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1130000000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.383108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.282216em;vertical-align:-0.383108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord 
mathnormal&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.899108em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1130000000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.383108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8641079999999999em;&quot;&gt;&lt;span style=&quot;top:-3.113em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord 
mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msubsup&gt;&lt;mi&gt;G&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;k&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msub&gt;&lt;/munderover&gt;&lt;msubsup&gt;&lt;mi&gt;F&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;k&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;msubsup&gt;&lt;mi&gt;F&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;mi&gt;k&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;G_{ij}^{l} = \sum_{k=1}^{M_{l}} F_{ik}^{l} F_{jk}^{l}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.282216em;vertical-align:-0.383108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;G&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.899108em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal 
mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1130000000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.383108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:3.147309em;vertical-align:-1.302113em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop op-limits&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.845196em;&quot;&gt;&lt;span style=&quot;top:-1.8478869999999998em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03148em;&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;mrel mtight&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&quot;top:-3.0500049999999996em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span&gt;&lt;span class=&quot;mop op-symbol large-op&quot;&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-4.316865em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3448em;&quot;&gt;&lt;span style=&quot;top:-2.3487714285714287em;margin-left:-0.10903em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15122857142857138em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.302113em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span 
class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.13889em;&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8991079999999998em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:-0.13889em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03148em;&quot;&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.113em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.247em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.13889em;&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.899108em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:-0.13889em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; 
style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03148em;&quot;&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1130000000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.383108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
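&lt;p&gt;As a rough illustration of the two formulas above, the following plain-Python sketch computes a Gram matrix and the per-layer style loss. The function names, the list-of-rows layout (one row per vectorized feature map), and the toy values are assumptions made for this example; they are not taken from any particular library. The Gram matrix of the style image is built the same way as the one for the generated image, just from the style image's feature maps.&lt;/p&gt;

```python
# Sketch of the per-layer style loss in plain Python (no external libraries).
# Assumption: each feature matrix is a list of N rows (one per feature map),
# and each row holds the M vectorized activations of that map.


def gram_matrix(features):
    """Return G with G[i][j] = sum over k of features[i][k] * features[j][k]."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in features]
        for row_i in features
    ]


def layer_style_loss(generated, style):
    """Squared error between the two Gram matrices, scaled by 1 / (4 N^2 M^2)."""
    n = len(generated)          # N_l: number of feature maps in the layer
    m = len(generated[0])       # M_l: size of each vectorized feature map
    g = gram_matrix(generated)  # G^l, from the generated image
    a = gram_matrix(style)      # A^l, from the style image
    squared_error = sum(
        (g[i][j] - a[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return squared_error / (4 * n ** 2 * m ** 2)


# Hypothetical toy features: N = 2 feature maps, M = 3 positions each.
F = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
S = [[1.0, 1.0, 0.0], [0.0, 2.0, 1.0]]
loss = layer_style_loss(F, S)
print(loss)
```

&lt;p&gt;With these toy inputs the two Gram matrices are [[5, 2], [2, 2]] and [[2, 2], [2, 5]], so the summed squared error is 18 and the loss is 18 / (4 * 4 * 9) = 0.125. In practice the same computation would usually be written as a matrix product in a tensor library.&lt;/p&gt;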
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msubsup&gt;&lt;mi&gt;A&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;k&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msub&gt;&lt;/munderover&gt;&lt;msubsup&gt;&lt;mi&gt;S&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;mi&gt;k&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;msubsup&gt;&lt;mi&gt;S&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;mi&gt;k&lt;/mi&gt;&lt;/mrow&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;/msubsup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;A_{ij}^{l} = \sum_{k=1}^{M_{l}} S_{ik}^{l} S_{jk}^{l}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.282216em;vertical-align:-0.383108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.899108em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal 
mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1130000000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.383108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:3.147309em;vertical-align:-1.302113em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop op-limits&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.845196em;&quot;&gt;&lt;span style=&quot;top:-1.8478869999999998em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03148em;&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;mrel mtight&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&quot;top:-3.0500049999999996em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span&gt;&lt;span class=&quot;mop op-symbol large-op&quot;&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-4.316865em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3448em;&quot;&gt;&lt;span style=&quot;top:-2.3487714285714287em;margin-left:-0.10903em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15122857142857138em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.302113em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span 
class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05764em;&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8991079999999998em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:-0.05764em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03148em;&quot;&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.113em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.247em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05764em;&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.899108em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-left:-0.05764em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; 
style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03148em;&quot;&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1130000000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.383108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;</content:encoded></item><item><title><![CDATA[An Argument against Meat]]></title><description><![CDATA[Over this last year, I've grown to enjoy plant-based foods, while becoming convinced agricultural biotechnology will be increasingly important over the next decade as well. I think most people have…]]></description><link>https://dkharazi.github.io/blog/meat</link><guid isPermaLink="false">https://dkharazi.github.io/blog/meat</guid><pubDate>Mon, 12 Apr 2021 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Over this last year, I&apos;ve grown to enjoy plant-based foods, while becoming convinced agricultural biotechnology will be increasingly important over the next decade as well. I think most people have known this need exists for livestock farming specifically, based on ethics alone. 
Recently, I&apos;ve learned about the importance of advances in this area for other reasons as well. Below, I&apos;ve listed some of these major reasons, mainly for my own knowledge.&lt;/p&gt;
&lt;h2&gt;Economics of Livestock Production&lt;/h2&gt;
&lt;p&gt;Maybe you&apos;re someone who just appreciates efficient, well-functioning processes. In particular, you might be more interested in looking at this problem through an economic lens and learning about the sustainability of livestock production. So, let&apos;s look at the current state of animal agriculture purely from the perspective of efficiency.&lt;/p&gt;
&lt;p&gt;Animals are inefficient converters of food, but how inefficient are they? A common metric for measuring how much food is lost when animal feed is converted into meat is the &lt;a href=&quot;https://awellfedworld.org/feed-ratios/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;feed conversion ratio (FCR)&lt;/a&gt;. Typically, the FCR measures the number of calories needed to produce one calorie of meat. Beef is one of the least efficient meats to produce. It takes &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;25&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;25&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; calories of feed to create just &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;1&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; calorie of beef. 
The ratio of pork is closer to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;15&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;15&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;-to-&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;1&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. 
Chicken is the most efficient of these meats to produce, and it still requires &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;9&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;9&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; calories of input to produce just &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;1&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; calorie of food.&lt;/p&gt;
&lt;p&gt;Essentially, for chicken, 8 of every 9 calories we grow as feed, or about &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;89&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;89\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, are thrown away. So, eating a plate of chicken can be thought of as simultaneously throwing out eight plates of pasta. For more information about the inefficiencies behind today&apos;s livestock farming practices, refer to &lt;a href=&quot;https://cbey.yale.edu/our-stories/disrupting-meat&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this blog post by Yale&apos;s Center for Business and the Environment&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Health Effects from Livestock Farming&lt;/h2&gt;
&lt;p&gt;In 2015, the WHO called antimicrobial resistance &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4638249/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;an increasingly serious threat to global public health&lt;/a&gt;. More than 80% of all antibiotics sold in the United States are being fed to farm animals. Eventually, our increasingly frequent and routine consumption of meat will drive antibiotic resistance.&lt;/p&gt;
&lt;p&gt;Antibiotics are administered to animals in feed to marginally improve growth rates and to prevent infections. In humans, there is growing evidence that antibiotic resistance is promoted by the widespread use of non-therapeutic antibiotics in animals. Resistant bacteria are transmitted to humans through direct contact with animals, which could include exposure to animal manure, consumption of undercooked meat, or contact with surfaces touched by uncooked meat.&lt;/p&gt;
&lt;p&gt;Relevant to the recent pandemic, many previous pandemics have been caused by animal agriculture. For example, the H5N1 bird flu outbreak, which has a fatality rate of &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;60&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;60\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, &lt;a href=&quot;https://www.cdc.gov/flu/avianflu/h5n1-virus.htm&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;originated in Chinese chicken farms&lt;/a&gt; in &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;1997&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;1997&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;7&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. 
In &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;2009&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;2009&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, the H1N1 swine flu outbreak likely &lt;a href=&quot;https://www.cdc.gov/h1n1flu/information_h1n1_virus_qa.htm&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;originated in a pig confinement operation&lt;/a&gt; in North Carolina. 
In &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;2015&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;2015&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, the H5N2 bird flu led American poultry farmers to kill &lt;a href=&quot;https://www.cdc.gov/mmwr/preview/mmwrhtml/mm6404a9.htm&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;tens of millions of birds&lt;/a&gt; to contain the outbreak. Scientists expect the next pandemic to originate from a similar farming source.&lt;/p&gt;
&lt;p&gt;For more basic nutritional value, there are healthier plant-based sources of protein, which include tofu, tempeh, lentils, and beans. &lt;a href=&quot;https://jamanetwork.com/journals/jamainternalmedicine/article-abstract/2748453&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Some studies&lt;/a&gt; have linked plant-based diets with a lower risk of cardiovascular disease. These diets also tend to provide more fiber and prebiotics, which improve the health of your gut.&lt;/p&gt;
&lt;h2&gt;Climate and Deforestation Impact from Animal Agriculture&lt;/h2&gt;
&lt;p&gt;Regarding climate change, raising livestock generates &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;14.5&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;14.5\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of global greenhouse gas emissions. Farming also accounts for &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;92&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;92\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of fresh water use. 
For a more detailed overview of the effects of animal agriculture on climate, refer to &lt;a href=&quot;https://www.europarl.europa.eu/climatechange/doc/FAO%20report%20executive%20summary.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this paper&lt;/a&gt; published by the Livestock, Environment, and Development (LEAD) Initiative.&lt;/p&gt;
&lt;p&gt;Research suggests the expansion of pasture land for beef production is by far the biggest driver of deforestation in the Brazilian Amazon, accounting for nearly half of forest loss there in &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;2013&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;2013&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. 
Commercial crops, such as soy, contributed roughly &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;7&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;7\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of forest loss in the Brazilian Amazon in &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;2013&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;2013&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;Additionally, research suggests the large majority of global soy production is used to feed animals that will feed us later on. In a study focused on the allocation of global soy production from &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;2017&lt;/mn&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;mn&gt;2019&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;2017-2019&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.72777em;vertical-align:-0.08333em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, around &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;77&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation 
encoding=&quot;application/x-tex&quot;&gt;77\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of all soy was used for animal feed, where nearly all of this animal feed is used in livestock farming. Only &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;7&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;7\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of global soy production is used in direct human foods like tofu, soy milk, and others. For more details about common drivers of soy consumption, refer to &lt;a href=&quot;https://ourworldindata.org/soy&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this paper&lt;/a&gt; published by researchers at Oxford University.&lt;/p&gt;
&lt;h2&gt;Ethics of Livestock Farming&lt;/h2&gt;
&lt;p&gt;Maybe you&apos;re someone who just wants to learn how the large majority of chickens are treated. You might be more interested in looking at this problem through an ethical lens and learning about the health and wellness of livestock. So, let&apos;s look at the current state of animal agriculture purely from a moral perspective.&lt;/p&gt;
&lt;p&gt;Since &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;1952&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;1952&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, broiler production has been the prominent source of chicken meat, where &lt;a href=&quot;https://www.sentienceinstitute.org/us-factory-farming-estimates#ftnt2&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;99.9% of chickens are raised as broiler chickens&lt;/a&gt; in recent years. 
In &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;2018&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;2018&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, more than &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;9&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;9&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; billion broiler chickens were produced in the United States. 
Today, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;90&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;90\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.80556em;vertical-align:-0.05556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of all broiler chickens are produced by independent farmers working under contract with integrated chicken production and processing companies, according to the &lt;a href=&quot;https://www.nationalchickencouncil.org/industry-issues/vertical-integration/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;National Chicken Council&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Over the past few decades, poultry breeding companies have repeatedly bred Cornish Cross strains to grow more white meat in the breast with less feed. So, a typical broiler chicken grows &lt;a href=&quot;https://www.nationalchickencouncil.org/about-the-industry/statistics/u-s-broiler-performance/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;twice as fast and twice as large on half the feed&lt;/a&gt; compared to a broiler chicken from 70 years ago.&lt;/p&gt;
&lt;p&gt;As a consequence, this selective breeding of broiler chickens has caused their &lt;a href=&quot;https://www.humanesociety.org/sites/default/files/docs/hsus-report-welfare-chicken-industry.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;growth in weight to outpace their skeletal system and organ development&lt;/a&gt;. Because they put on weight so fast now, the birds’ legs and frame cannot support their bloated breasts, so they are prone to deformities. Many broiler chickens are barely able to move even at a few weeks old, &lt;a href=&quot;https://www.aspca.org/sites/default/files/chix_white_paper_nov2015_lores.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;due to their weight and deformities&lt;/a&gt;, causing them to sit and eat for most of their life. According to researchers at the University of Arkansas, these chickens grow at a rate equivalent to a two-month-old human baby weighing &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;660&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;660&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; pounds.&lt;/p&gt;
&lt;p&gt;Today, farms typically feed broiler chickens preventative antibiotics, creating a vicious cycle that allows them to perpetuate substandard conditions. For a more comprehensive article covering the consequences of breeding broiler chickens, refer to &lt;a href=&quot;https://civileats.com/2019/05/28/the-race-to-produce-a-slower-growing-chicken/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this story about the modern broiler chicken industry&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Understanding the Glicko Rating System]]></title><description><![CDATA[In 2007, Microsoft released their TrueSkill 1 paper, which was essentially a modified implementation of the Elo system. At the time, most of their games used TrueSkill 1 as a player ranking system…]]></description><link>https://dkharazi.github.io/blog/elo</link><guid isPermaLink="false">https://dkharazi.github.io/blog/elo</guid><pubDate>Sat, 07 Nov 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In 2007, Microsoft released their &lt;a href=&quot;https://www.microsoft.com/en-us/research/wp-content/uploads/2007/01/NIPS2006_0688.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;TrueSkill 1 paper&lt;/a&gt;, which was essentially a modified implementation of the Elo system. At the time, most of their games used TrueSkill 1 as a player ranking system. Later on, the &lt;a href=&quot;https://www.microsoft.com/en-us/research/publication/trueskill-2-improved-bayesian-skill-rating-system/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;TrueSkill 2 paper&lt;/a&gt; was released, which replaced their TrueSkill 1 ranking system in most of their Xbox games, such as Halo.&lt;/p&gt;
&lt;p&gt;Both TrueSkill 1 and TrueSkill 2 treat a player&apos;s ranking as a distribution, rather than a single summary statistic. To achieve this, both ranking systems use a principled Bayesian framework. As a result, the TrueSkill algorithm accounts for the fact that all players play differently depending on their individual circumstances.&lt;/p&gt;
&lt;h2&gt;Comparing Elo with Glicko&lt;/h2&gt;
&lt;p&gt;In 1960, the Elo system was invented and used as an improved chess-rating system. The Elo system is a method for calculating the relative skill levels of players in zero-sum games, such as chess. Over time, the Elo system became very popular in a broad range of other sports, such as basketball and football. In 1995, the &lt;a href=&quot;https://en.wikipedia.org/wiki/Glicko_rating_system&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Glicko&lt;/a&gt; system was invented to improve the Elo system by introducing a &lt;em&gt;ratings deviation&lt;/em&gt; measure. Afterward, it became a widely adopted rating system. For a more detailed explanation of the Glicko system, refer to its &lt;a href=&quot;http://www.glicko.net/glicko/glicko.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;original paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The TrueSkill rating systems borrow many ideas from Glicko, but also include a measure of match quality between a set of players. There are a few other mathematical differences within the papers, and a few differences between the TrueSkill 1 and TrueSkill 2 rating systems. However, they share more similarities than differences. For the remainder of this post, I&apos;ll focus on illustrating the intuition behind the Glicko algorithm.&lt;/p&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mi&gt;min&lt;/mi&gt;&lt;mo&gt;⁡&lt;/mo&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;msqrt&gt;&lt;mrow&gt;&lt;msubsup&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mtext&gt;old&lt;/mtext&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msubsup&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;msup&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;/msqrt&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;mn&gt;350&lt;/mn&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD} = \min(\sqrt{\text{RD}^{2}_{\text{old}} + c^{2}}, 350)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.8399999999999999em;vertical-align:-0.48595599999999994em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord sqrt&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:1.354044em;&quot;&gt;&lt;span class=&quot;svg-align&quot; style=&quot;top:-3.8em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.8em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot; style=&quot;padding-left:1em;&quot;&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.887338em;&quot;&gt;&lt;span style=&quot;top:-2.4530000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord text mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;old&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.1362300000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.247em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord 
mathnormal&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.740108em;&quot;&gt;&lt;span style=&quot;top:-2.9890000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.314044em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.8em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;hide-tail&quot; style=&quot;min-width:1.02em;height:1.8800000000000001em;&quot;&gt;&lt;svg width=&apos;400em&apos; height=&apos;1.8800000000000001em&apos; viewBox=&apos;0 0 400000 1944&apos; preserveAspectRatio=&apos;xMinYMin slice&apos;&gt;&lt;path d=&apos;M983 90
l0 -0
c4,-6.7,10,-10,18,-10 H400000v40
H1013.1s-83.4,268,-264.1,840c-180.7,572,-277,876.3,-289,913c-4.7,4.7,-12.7,7,-24,7
s-12,0,-12,0c-1.3,-3.3,-3.7,-11.7,-7,-25c-35.3,-125.3,-106.7,-373.3,-214,-744
c-10,12,-21,25,-33,39s-32,39,-32,39c-6,-5.3,-15,-14,-27,-26s25,-30,25,-30
c26.7,-32.7,52,-63,76,-91s52,-60,52,-60s208,722,208,722
c56,-175.3,126.3,-397.3,211,-666c84.7,-268.7,153.8,-488.2,207.5,-658.5
c53.7,-170.3,84.5,-266.8,92.5,-289.5z
M1001 80h400000v40h-400000z&apos;/&gt;&lt;/svg&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.48595599999999994em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;As stated previously, the Glicko system extends the Elo system by computing not only a rating, which can be thought of as a &lt;em&gt;best guess&lt;/em&gt; of one’s playing strength, but also a &lt;em&gt;ratings deviation&lt;/em&gt; &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. In statistical terminology, this &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; term represents a standard deviation, which measures the uncertainty of a rating. 
Thus, a high &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; corresponds to an unreliable rating, which usually means the player has competed in only a small number of tournament games. In contrast, a low &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; indicates that the player competes frequently.&lt;/p&gt;
&lt;h2&gt;Intuition behind Uncertainty of Ranking&lt;/h2&gt;
&lt;p&gt;In the Glicko system, a player&apos;s rating only changes based on their game outcomes, but their &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; changes based on both their game outcomes and the time they spend not playing. Thus, there are two features in the Glicko system that don&apos;t exist in the Elo system. First, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; increases as time passes without a player competing. 
Second, if one player’s rating increases by &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, the opponent’s rating does not usually decrease by &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;x&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. 
In the Glicko system, the amount by which the opponent’s rating decreases is governed by both players’ &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; terms.&lt;/p&gt;
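&lt;p&gt;To make the second point concrete, here is a minimal Python sketch of a single-game rating change. It assumes the standard definitions of q, g and E from the original Glicko paper. The function names and the two example players are hypothetical and only meant for illustration.&lt;/p&gt;

```python
import math

# Glicko constant q = ln(10) / 400, as defined in the original Glicko paper.
Q = math.log(10) / 400


def g(rd):
    # Attenuation factor: shrinks the impact of a game against an uncertain opponent.
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_j, rd_j):
    # Expected score of a player rated r against an opponent (r_j, rd_j).
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))


def rating_change(r, rd, r_j, rd_j, s_j):
    # Rating change after one game with outcome s_j (1 = win, 0.5 = draw, 0 = loss).
    e = expected_score(r, r_j, rd_j)
    d_squared = 1 / (Q**2 * g(rd_j) ** 2 * e * (1 - e))
    return Q / (1 / rd**2 + 1 / d_squared) * g(rd_j) * (s_j - e)


# Hypothetical players: both rated 1500, but the newcomer has a much larger RD.
newcomer_gain = rating_change(1500, 350, 1500, 50, 1)  # newcomer wins
veteran_loss = rating_change(1500, 50, 1500, 350, 0)   # veteran loses
```

&lt;p&gt;Under these assumed inputs, the newcomer gains well over a hundred points while the veteran loses only a handful, so the two changes are far from symmetric. In Elo with a shared K-factor, the two changes would cancel out exactly.&lt;/p&gt;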
&lt;p&gt;To apply the rating algorithm, we treat a collection of games within a &lt;em&gt;rating period&lt;/em&gt; as
having occurred simultaneously. A rating period could be as long as several months, or could
be as short as one minute. As a result, an &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is calculated for each rating period, which is based on &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mtext&gt;old&lt;/mtext&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}_{\text{old}}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.83333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.33610799999999996em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span 
class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord text mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;old&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; from the previous rating period.&lt;/p&gt;
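&lt;p&gt;As a rough sketch of this step, the Python snippet below applies the RD formula once per elapsed rating period. The helper name and the value c = 35 are assumptions chosen for illustration, not recommended settings.&lt;/p&gt;

```python
import math

MAX_RD = 350  # cap from the RD formula; also the RD given to an unrated player


def inflate_rd(rd_old, c, periods=1):
    # Apply RD = min(sqrt(RD_old^2 + c^2), 350) once per elapsed rating period.
    rd = rd_old
    for _ in range(periods):
        rd = min(math.sqrt(rd**2 + c**2), MAX_RD)
    return rd


# Illustrative values only: c = 35 is an assumption.
rd_after_one = inflate_rd(50, 35)                # a little above 61
rd_after_many = inflate_rd(50, 35, periods=200)  # capped at 350
```

&lt;p&gt;A single idle period nudges the RD up slightly, whereas a long absence pushes it all the way to the cap, at which point the player is treated like an unrated one.&lt;/p&gt;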
&lt;p&gt;Here, there is a constant &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;c&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;c&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; governing the increase in uncertainty between rating periods, which can be precisely determined by optimizing predictive accuracy of future games. Once &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is computed properly, the new rating &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo mathvariant=&quot;normal&quot; lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;′&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation 
encoding=&quot;application/x-tex&quot;&gt;r&amp;#x27;&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.751892em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.751892em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; must be updated for each player.&lt;/p&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo mathvariant=&quot;normal&quot; lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;′&lt;/mo&gt;&lt;/msup&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mfrac&gt;&lt;mi&gt;q&lt;/mi&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;msup&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mfrac&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;msup&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;mi&gt;m&lt;/mi&gt;&lt;/munderover&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;msub&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;∣&lt;/mi&gt;&lt;mtext&gt;E&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;msub&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;r&amp;#x27; = r + \frac{q}{\frac{1}{\text{RD}^{2}} + \frac{1}{d^{2}}} \sum^{m}_{j=1} g(\text{RD}_{j})(s_{j} | 
\text{E}(r,r_{j},\text{RD}_{j}))&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.801892em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.801892em;&quot;&gt;&lt;span style=&quot;top:-3.113em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.66666em;vertical-align:-0.08333em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:3.0651740000000007em;vertical-align:-1.4137769999999998em;&quot;&gt;&lt;/span&gt;&lt;span 
class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.1075599999999999em;&quot;&gt;&lt;span style=&quot;top:-2.264892em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.845108em;&quot;&gt;&lt;span style=&quot;top:-2.636449em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord text mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.7907871428571429em;&quot;&gt;&lt;span style=&quot;top:-2.830472857142857em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; 
style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.394em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.363551em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.845108em;&quot;&gt;&lt;span style=&quot;top:-2.6550000000000002em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.7463142857142857em;&quot;&gt;&lt;span style=&quot;top:-2.786em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; 
style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.394em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.345em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.677em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;q&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:1.098659em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop op-limits&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.6513970000000007em;&quot;&gt;&lt;span style=&quot;top:-1.872331em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;mrel mtight&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.050005em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span&gt;&lt;span class=&quot;mop op-symbol large-op&quot;&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-4.3000050000000005em;margin-left:0em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.05em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;m&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.4137769999999998em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; 
style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; 
style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;∣&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;E&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:-0.02778em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; 
style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo mathvariant=&quot;normal&quot; lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;′&lt;/mo&gt;&lt;/msup&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;msqrt&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mrow&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;msup&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mfrac&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;msup&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;/msqrt&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&amp;#x27; = \sqrt{\frac{1}{\frac{1}{\text{RD}^{2}} + \frac{1}{d^{2}}}}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.825122em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.825122em;&quot;&gt;&lt;span style=&quot;top:-3.1362300000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; 
style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:3.04em;vertical-align:-1.2947345000000001em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord sqrt&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.7452655em;&quot;&gt;&lt;span class=&quot;svg-align&quot; style=&quot;top:-5em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot; style=&quot;padding-left:1em;&quot;&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.32144em;&quot;&gt;&lt;span style=&quot;top:-2.264892em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.845108em;&quot;&gt;&lt;span style=&quot;top:-2.636449em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord text mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span 
class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.7907871428571429em;&quot;&gt;&lt;span style=&quot;top:-2.830472857142857em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.394em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.363551em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span 
class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.845108em;&quot;&gt;&lt;span style=&quot;top:-2.6550000000000002em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.7463142857142857em;&quot;&gt;&lt;span style=&quot;top:-2.786em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.394em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.345em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose 
nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.677em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.098659em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.7052655em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;hide-tail&quot; style=&quot;min-width:1.02em;height:3.08em;&quot;&gt;&lt;svg width=&apos;400em&apos; height=&apos;3.08em&apos; viewBox=&apos;0 0 400000 3240&apos; preserveAspectRatio=&apos;xMinYMin slice&apos;&gt;&lt;path d=&apos;M473,2793
c339.3,-1799.3,509.3,-2700,510,-2702 l0 -0
c3.3,-7.3,9.3,-11,18,-11 H400000v40H1017.7
s-90.5,478,-276.2,1466c-185.7,988,-279.5,1483,-281.5,1485c-2,6,-10,9,-24,9
c-8,0,-12,-0.7,-12,-2c0,-1.3,-5.3,-32,-16,-92c-50.7,-293.3,-119.7,-693.3,-207,-1200
c0,-1.3,-5.3,8.7,-16,30c-10.7,21.3,-21.3,42.7,-32,64s-16,33,-16,33s-26,-26,-26,-26
s76,-153,76,-153s77,-151,77,-151c0.7,0.7,35.7,202,105,604c67.3,400.7,102,602.7,104,
606zM1001 80h400000v40H1017.7z&apos;/&gt;&lt;/svg&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.2947345000000001em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;The &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo mathvariant=&quot;normal&quot; lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;′&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&amp;#x27;&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.825122em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.825122em;&quot;&gt;&lt;span style=&quot;top:-3.1362300000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; term is the updated uncertainty in our own ranking after playing a set of players. 
Notice that it is built from our previous &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, with an added &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;d&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;d&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; variable. 
The &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;d&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;d&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; variable can be thought of as the change in the uncertainty &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, or how much more or less certain we are about our ranking. 
A very large &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;d&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;d&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; (i.e. &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;mo&gt;→&lt;/mo&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;∞&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;d \to \infty&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;→&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;∞&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;) will leave &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math 
xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo mathvariant=&quot;normal&quot; lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;′&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&amp;#x27;&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.825122em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.825122em;&quot;&gt;&lt;span style=&quot;top:-3.1362300000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; essentially equal to our previous &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;, whereas a very small &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;d&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord 
mathnormal&quot;&gt;d&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; (i.e. &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;mo&gt;→&lt;/mo&gt;&lt;mn&gt;0&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;d \to 0&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;→&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;) will give us an &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo mathvariant=&quot;normal&quot; lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;′&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&amp;#x27;&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.825122em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord 
text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.825122em;&quot;&gt;&lt;span style=&quot;top:-3.1362300000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;0&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;0&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;Essentially, the &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;d&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;d&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; variable shrinks as we play more players, which matches the intuition that the more players we play, the lower our &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; becomes (and vice versa). That is how the uncertainty is updated.&lt;/p&gt;
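&lt;p&gt;To make the two limiting cases concrete, here is a minimal Python sketch of the uncertainty update. The function name updated_rd and the sample values are hypothetical and chosen only for illustration. The sketch assumes both inputs are strictly positive and simply evaluates the square-root formula above.&lt;/p&gt;

```python
import math


def updated_rd(rd, d):
    """Return RD' = sqrt(1 / (1 / RD^2 + 1 / d^2)) for positive rd and d."""
    return math.sqrt(1.0 / (1.0 / rd**2 + 1.0 / d**2))


if __name__ == "__main__":
    rd = 350.0  # hypothetical prior uncertainty, chosen only for illustration
    # Sweep d from very large to very small to see both limits
    for d in (1e6, 350.0, 50.0, 1e-3):
        print(f"d={d:g}: RD'={updated_rd(rd, d):.4f}")
```

&lt;p&gt;With a very large d the result stays close to the original RD, and with a very small d it approaches 0. The result is also never larger than either input, so under this formula the update can only reduce the uncertainty.&lt;/p&gt;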
&lt;h2&gt;Intuition behind Expected Ranking&lt;/h2&gt;
&lt;p&gt;The new actual ranking &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo mathvariant=&quot;normal&quot; lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;′&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;r&amp;#x27;&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.751892em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.751892em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; depends on the difference between the actual outcome of a game against an opponent &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;s&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span 
class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and the expected outcome &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;E&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;∣&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;.&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;.&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;.&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{E}(s|...)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;E&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;∣&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; of playing that opponent. 
This expected outcome will be close to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;0&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;0&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; if a player is expected to lose, and close to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;1&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; if a player is expected to win. The difference between the actual and expected outcomes is what really determines the update to a player&apos;s ranking.&lt;/p&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;E&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;msub&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;msup&gt;&lt;mn&gt;0&lt;/mn&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;R&lt;/mi&gt;&lt;msub&gt;&lt;mi&gt;D&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;mn&gt;400&lt;/mn&gt;&lt;/mfrac&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{E}(r,r_{j},\text{RD}_{j}) = \frac{1}{1+10^{\frac{-g(RD_{j})(r-r_{j})}{400}}}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.036108em;vertical-align:-0.286108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;E&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; 
style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:-0.02778em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord 
mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:2.4993499999999997em;vertical-align:-1.1779099999999996em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.32144em;&quot;&gt;&lt;span style=&quot;top:-2.1100000000000003em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.20458em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:1.2045799999999998em;&quot;&gt;&lt;span style=&quot;top:-3.3485500000000004em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mopen nulldelimiter sizing reset-size3 size6&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.2228999999999999em;&quot;&gt;&lt;span style=&quot;top:-2.656em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.2255000000000003em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line mtight&quot; style=&quot;border-bottom-width:0.049em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.6871857142857145em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mopen mtight&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.00773em;&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal 
mtight&quot; style=&quot;margin-right:0.02778em;&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3448em;&quot;&gt;&lt;span style=&quot;top:-2.3448em;margin-left:-0.02778em;margin-right:0.1em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.65952em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.5091600000000001em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose mtight&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;mopen mtight&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;mbin mtight&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3448em;&quot;&gt;&lt;span style=&quot;top:-2.3448em;margin-left:-0.02778em;margin-right:0.1em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.65952em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span 
class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.5091600000000001em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose mtight&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.344em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter sizing reset-size3 size6&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.43458em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.20458em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.88158em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.20458em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.1779099999999996em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;q&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;munderover&gt;&lt;mo&gt;∑&lt;/mo&gt;&lt;mrow&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;mi&gt;m&lt;/mi&gt;&lt;/munderover&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;msub&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;msup&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;mo&gt;×&lt;/mo&gt;&lt;mtext&gt;E&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;msub&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;mo&gt;×&lt;/mo&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;mtext&gt;E&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;msub&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;msub&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;d^{2} = \frac{1}{q^{2} \sum_{j=1}^{m} g(\text{RD}_{j})^{2} \times 
\text{E}(r,r_{j},\text{RD}_{j}) \times (1 - \text{E}(r,r_{j},\text{RD}_{j}))}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8641079999999999em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8641079999999999em;&quot;&gt;&lt;span style=&quot;top:-3.113em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:2.45155em;vertical-align:-1.1301100000000002em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.32144em;&quot;&gt;&lt;span style=&quot;top:-2.305708em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span 
class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.740108em;&quot;&gt;&lt;span style=&quot;top:-2.9890000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mop&quot;&gt;&lt;span class=&quot;mop op-symbol small-op&quot; style=&quot;position:relative;top:-0.0000050000000000050004em;&quot;&gt;∑&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.804292em;&quot;&gt;&lt;span style=&quot;top:-2.40029em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;mrel mtight&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.2029em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal 
mtight&quot;&gt;m&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.43581800000000004em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.740108em;&quot;&gt;&lt;span style=&quot;top:-2.9890000000000003em;margin-right:0.05em;&quot;&gt;&lt;span 
class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;×&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;E&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:-0.02778em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;×&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord 
text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;E&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:-0.02778em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mpunct&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span 
style=&quot;top:-2.5500000000000003em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.677em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.1301100000000002em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;If a player beats someone he/she is expected to lose to, this difference will be positive and large, so his/her new ranking will be much higher. If a player beats someone he/she is expected to beat, this difference will be positive and small, so his/her new ranking won&apos;t change that much. However, if a player loses to someone he/she is expected to beat, this difference will be negative and large in magnitude, so his/her new ranking will be much lower. Lastly, if a player loses to someone he/she is expected to lose to, this difference will be negative and small in magnitude, so his/her ranking won&apos;t change that much.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;s_{j}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.716668em;vertical-align:-0.286108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/th&gt;
&lt;th&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;E&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;∣&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;.&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;.&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;.&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{E} (s \vert ...)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;E&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;∣&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/th&gt;
&lt;th&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi&gt;j&lt;/mi&gt;&lt;/msub&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;mtext&gt;E&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;∣&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;.&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;.&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;.&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;s_{j} - \text{E}(s \vert ...)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8694379999999999em;vertical-align:-0.286108em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.311664em;&quot;&gt;&lt;span style=&quot;top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.05724em;&quot;&gt;j&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:0.286108em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;E&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;∣&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;win&lt;/td&gt;
&lt;td&gt;lose&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;much higher&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{much higher}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8888799999999999em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;much higher&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;win&lt;/td&gt;
&lt;td&gt;win&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;not much higher&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{not much higher}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8888799999999999em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;not much higher&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lose&lt;/td&gt;
&lt;td&gt;win&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;much lower&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{much lower}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;much lower&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lose&lt;/td&gt;
&lt;td&gt;lose&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;not much lower&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{not much lower}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;not much lower&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
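&lt;p&gt;The four cases in the table can be checked numerically. Below is a minimal Python sketch, assuming the expected score takes the form given in Glickman&apos;s Glicko description, E = 1 / (1 + 10^(-g(RD_j)(r - r_j) / 400)), where g is the RD-based factor covered next. The function names and the ratings of 1200, 1500, and 1800 are made up for illustration.&lt;/p&gt;

```python
import math

Q = math.log(10) / 400  # q = ln(10) / 400


def g(rd):
    # Assumed standard Glicko form: 1 / sqrt(1 + 3 q^2 RD^2 / pi^2).
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_j, rd_j):
    # Expected score of a player rated r against an opponent (r_j, rd_j).
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))


def difference(s_j, r, r_j, rd_j):
    # Actual outcome minus expected outcome: s_j - E(s | r, r_j, RD_j).
    return s_j - expected_score(r, r_j, rd_j)


# Hypothetical 1500-rated player facing a stronger (1800) and a weaker
# (1200) opponent, each with an RD of 50. A win is 1 and a loss is 0.
upset_win = difference(1, 1500, 1800, 50)     # expected to lose, won
routine_win = difference(1, 1500, 1200, 50)   # expected to win, won
upset_loss = difference(0, 1500, 1200, 50)    # expected to win, lost
routine_loss = difference(0, 1500, 1800, 50)  # expected to lose, lost
```

&lt;p&gt;With these made-up ratings, the upset win yields a difference near 0.85 while the routine win yields roughly 0.15, and the two losses mirror those values with a negative sign.&lt;/p&gt;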
&lt;p&gt;After playing a bunch of opponents, the updated ranking &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;r&lt;/mi&gt;&lt;mo mathvariant=&quot;normal&quot; lspace=&quot;0em&quot; rspace=&quot;0em&quot;&gt;′&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;r&amp;#x27;&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.751892em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.751892em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is also dependent on the &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation 
encoding=&quot;application/x-tex&quot;&gt;g(\text{RD})&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; term, which essentially is the inverse of each opponent&apos;s &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; term. The following formula defines this function:&lt;/p&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;msqrt&gt;&lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;mn&gt;3&lt;/mn&gt;&lt;msup&gt;&lt;mi&gt;q&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;msup&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;msup&gt;&lt;mi&gt;π&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;/msqrt&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;g(\text{RD}) = \frac{1}{\sqrt{1 + \frac{3q^{2}\text{RD}^{2}}{\pi^{2}}}}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:3.0514400000000004em;vertical-align:-1.7300000000000002em;&quot;&gt;&lt;/span&gt;&lt;span 
class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.32144em;&quot;&gt;&lt;span style=&quot;top:-2.11em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.3123295em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord sqrt&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.3123295000000001em;&quot;&gt;&lt;span class=&quot;svg-align&quot; style=&quot;top:-3.8em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.8em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot; style=&quot;padding-left:1em;&quot;&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.9996590000000001em;&quot;&gt;&lt;span style=&quot;top:-2.6550000000000002em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03588em;&quot;&gt;π&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.7463142857142857em;&quot;&gt;&lt;span style=&quot;top:-2.786em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 
mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.446108em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot; style=&quot;margin-right:0.03588em;&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.7463142857142857em;&quot;&gt;&lt;span style=&quot;top:-2.786em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord text mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:0.7907871428571429em;&quot;&gt;&lt;span style=&quot;top:-2.830472857142857em;margin-right:0.07142857142857144em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.5em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size3 size1 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.345em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.2723294999999997em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.8em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;hide-tail&quot; style=&quot;min-width:1.02em;height:1.8800000000000001em;&quot;&gt;&lt;svg width=&apos;400em&apos; height=&apos;1.8800000000000001em&apos; viewBox=&apos;0 0 400000 1944&apos; preserveAspectRatio=&apos;xMinYMin slice&apos;&gt;&lt;path d=&apos;M983 90
l0 -0
c4,-6.7,10,-10,18,-10 H400000v40
H1013.1s-83.4,268,-264.1,840c-180.7,572,-277,876.3,-289,913c-4.7,4.7,-12.7,7,-24,7
s-12,0,-12,0c-1.3,-3.3,-3.7,-11.7,-7,-25c-35.3,-125.3,-106.7,-373.3,-214,-744
c-10,12,-21,25,-33,39s-32,39,-32,39c-6,-5.3,-15,-14,-27,-26s25,-30,25,-30
c26.7,-32.7,52,-63,76,-91s52,-60,52,-60s208,722,208,722
c56,-175.3,126.3,-397.3,211,-666c84.7,-268.7,153.8,-488.2,207.5,-658.5
c53.7,-170.3,84.5,-266.8,92.5,-289.5z
M1001 80h400000v40h-400000z&apos;/&gt;&lt;/svg&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.5276704999999999em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.5423295em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.3123295em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.9893295em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3.3123295em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.7300000000000002em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;katex-display&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot; display=&quot;block&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;q&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;mi&gt;ln&lt;/mi&gt;&lt;mo&gt;⁡&lt;/mo&gt;&lt;mn&gt;10&lt;/mn&gt;&lt;/mrow&gt;&lt;mn&gt;400&lt;/mn&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;q = \frac{\ln 10}{400}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.625em;vertical-align:-0.19444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2777777777777778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:2.05744em;vertical-align:-0.686em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:1.37144em;&quot;&gt;&lt;span style=&quot;top:-2.314em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span 
class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.677em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mop&quot;&gt;ln&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.16666666666666666em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.686em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Again, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;g(\text{RD})&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is roughly the inverse of &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. 
Normally, a high &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; indicates a player&apos;s ranking is fairly uncertain. In this case, &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;g(\text{RD})&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; outputs a low value, and it outputs a high value only if a player&apos;s ranking is fairly certain. Meaning, if a player has played many games, then he/she generally will have a low &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and high &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;g(\text{RD})&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span 
class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. On the other hand, if a player doesn&apos;t play many games, then he/she generally will have a high &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and low &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;g(\text{RD})&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; 
style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;The inverse of &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; represents the certainty of a player&apos;s ranking. Mathematically, we take the inverse because we want to multiply that difference mentioned earlier based on our opponent&apos;s uncertainty. So, if an opponent&apos;s ranking is highly certain, then his/her ranking will go up even more (i.e. a multiple of the difference). If an opponent&apos;s ranking is highly uncertain, then his/her ranking won&apos;t go up by very much.&lt;/p&gt;
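&lt;p&gt;To make this relationship concrete, here is a minimal sketch that evaluates &lt;code class=&quot;language-text&quot;&gt;g(RD)&lt;/code&gt; for a low and a high rating deviation. It assumes the standard Glicko-1 form of the function, with the constant &lt;code class=&quot;language-text&quot;&gt;q = ln(10) / 400&lt;/code&gt;. The function name and the sample deviations of 30 and 350 are only illustrative:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import math

# Assumed Glicko-1 scaling constant: q = ln(10) / 400
Q = math.log(10) / 400

def g(rd):
  # Attenuation factor: close to 1 for a small RD, smaller as RD grows
  return 1 / math.sqrt(1 + 3 * (Q ** 2) * (rd ** 2) / math.pi ** 2)

# An established player (low RD) versus a new player (high RD)
print(round(g(30), 3))
print(round(g(350), 3))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Under this assumed form, the low deviation gives a value close to 1, while the high deviation gives a noticeably smaller value.&lt;/p&gt;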
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Number of Games Played&lt;/th&gt;
&lt;th&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{RD}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/th&gt;
&lt;th&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mtext&gt;RD&lt;/mtext&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;g(\text{RD})&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;RD&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;many&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;few&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content:encoded></item><item><title><![CDATA[Building a Prototyping Pipeline in Python]]></title><description><![CDATA[Most data science projects undergo various stages, which require communication with the business, determining use cases and opportunities with high ROIs, collecting and exploring raw data, feature…]]></description><link>https://dkharazi.github.io/blog/prototype-pipeline</link><guid isPermaLink="false">https://dkharazi.github.io/blog/prototype-pipeline</guid><pubDate>Sat, 25 Jul 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Most data science projects undergo various stages, which require communication with the business, determining use cases and opportunities with high ROIs, collecting and exploring raw data, feature engineering, and iterating through potential models.&lt;/p&gt;
&lt;p&gt;Furthermore, these stages don&apos;t usually occur in perfectly successive steps. Rather, they generally follow an iterative cycle, where we may return to a stage after making changes to another stage. Evidently, this system consists of many moving parts, which calls for a rapid prototyping environment.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#motivating-a-data-science-pipeline&quot;&gt;Motivating a Data Science Pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#outlining-a-prototyping-pipeline&quot;&gt;Outlining a Prototyping Pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#illustrating-our-project-layout&quot;&gt;Illustrating our Project Layout&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#defining-a-flask-application&quot;&gt;Defining a Flask Application&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#defining-nginx-configurations&quot;&gt;Defining NGINX Configurations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#defining-gunicorn-configurations&quot;&gt;Defining Gunicorn Configurations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#defining-the-dockerfile&quot;&gt;Defining the Dockerfile&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Motivating a Data Science Pipeline&lt;/h2&gt;
&lt;p&gt;The DevOps lifecycle delineates the journey of project development. At a high level, it determines a philosophy that enforces agility and collaboration between software development and IT operations. There are 8 phases included in the DevOps lifecycle, but this post won&apos;t be going into these specifics. For details about individual phases within the DevOps lifecycle, refer to &lt;a href=&quot;https://realpython.com/tutorials/devops/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;these articles&lt;/a&gt;, which introduce DevOps solutions in Python.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/b75eb9062321a0f7411cb99f4ff6c60c/pydevops.svg&quot; alt=&quot;pydevops&quot;&gt;&lt;/p&gt;
&lt;p&gt;To motivate the use of a prototyping pipeline, we&apos;ll focus on the &lt;em&gt;plan&lt;/em&gt; and &lt;em&gt;code&lt;/em&gt; phases for most of this post. As stated previously, there are many moving parts in a data science pipeline. Generally, any prototyping pipeline will include the following steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Understanding a business problem&lt;/li&gt;
&lt;li&gt;Collecting or locating any raw data&lt;/li&gt;
&lt;li&gt;Performing exploratory data analysis&lt;/li&gt;
&lt;li&gt;Performing feature engineering&lt;/li&gt;
&lt;li&gt;Building and evaluating models&lt;/li&gt;
&lt;li&gt;Deploying the final model&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Notice, the majority of these steps happen during the code phase, and these are only a small percentage of the complete set of operations that occur throughout the entire lifecycle. Automating these steps in a standardized environment provides benefits, such as continuous deployment, continuous testing, and process efficiency. In other words, building a rapid prototyping pipeline can help automate these steps, which facilitates project development.&lt;/p&gt;
&lt;h2&gt;Outlining a Prototyping Pipeline&lt;/h2&gt;
&lt;p&gt;When assembling my own prototyping pipelines, I personally prefer to build web applications using &lt;a href=&quot;https://flask.palletsprojects.com/en/1.1.x/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Flask&lt;/a&gt;, rather than Django. In particular, I use &lt;a href=&quot;https://dash.plotly.com/integrating-dash&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Dash&lt;/a&gt; for quickly building data visualizations and interfaces, where Dash uses the Flask micro-framework under the hood. As a result, it is fairly straightforward to embed a Dash app at a specific route of an existing Flask app.&lt;/p&gt;
&lt;p&gt;Furthermore, a WSGI server can be used as an application server. It handles requests meant for our actual applications, which are passed on from the web server. Personally, I&apos;ll use Gunicorn for this setup, but there are many other &lt;a href=&quot;https://flask.palletsprojects.com/en/1.1.x/deploying/wsgi-standalone/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;WSGI servers&lt;/a&gt; that contain WSGI applications and serve HTTP.&lt;/p&gt;
&lt;p&gt;Similarly, there are many HTTP web servers available, but Nginx is &lt;a href=&quot;https://docs.gunicorn.org/en/stable/deploy.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;highly suggested&lt;/a&gt; when working with Gunicorn. Again, Nginx is our web server, which accepts the client requests and handles any HTTP connections. These HTTP requests are passed on to the Gunicorn WSGI servers. When combining these components together, our pipeline begins to take the following shape:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/866f26ccf8e8c2130a8db2d309542064/prototype.svg&quot; alt=&quot;prototypepipeline&quot;&gt;&lt;/p&gt;
&lt;p&gt;Although Nginx sits on the same server as the Flask web application in this example, it can run on its own server. Also, the web server can run either in the same Docker container as our web application or in a separate one. In a production environment, we may also use the Nginx web server to help balance the load.&lt;/p&gt;
&lt;p&gt;In the Gunicorn documentation, the recommended number of workers is 2-4 workers per core. For horizontal scaling, Kubernetes is used to scale the number of deployments, each running a Gunicorn WSGI server with multiple workers. For more information about the implementation of Kubernetes, Gunicorn, and other components of our pipeline, refer to &lt;a href=&quot;https://stackoverflow.com/a/51873337/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Illustrating our Project Layout&lt;/h2&gt;
&lt;p&gt;All files related to Docker and Gunicorn are located in the &lt;code class=&quot;language-text&quot;&gt;deploy&lt;/code&gt; directory in our project layout. The files related to our Flask web application are located in the &lt;code class=&quot;language-text&quot;&gt;src&lt;/code&gt; directory in our project layout. Ultimately, our project layout looks like the following:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;myapp/
├── deploy/
│   ├── docker-entrypoint.sh
│   ├── Dockerfile
│   ├── nginx.conf
│   ├── supervisord.conf
│   └── conf.ini
└── src/
    ├── __init__.py
    └── app.py&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;app.py&lt;/code&gt; file contains the code for our web application, whereas the &lt;code class=&quot;language-text&quot;&gt;conf.ini&lt;/code&gt; file contains configuration details for a Gunicorn WSGI server. The remaining files in the &lt;code class=&quot;language-text&quot;&gt;deploy&lt;/code&gt; directory relate to Docker, Nginx, and supervisord configurations.&lt;/p&gt;
&lt;h2&gt;Defining a Flask Application&lt;/h2&gt;
&lt;p&gt;In this post, we&apos;ll build a simple Flask web application running on Docker Compose. Specifically, the application outputs a request counter maintained in Redis. The following is sample code from our application:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;# app.py&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; time
&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; redis
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; flask &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; Flask

app &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Flask&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;__name__&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
cache &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; redis&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Redis&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;host&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;redis&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; port&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;6379&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;get_hit_count&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  retries &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;try&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; cache&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;incr&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;hits&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;except&lt;/span&gt; redis&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;exceptions&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ConnectionError &lt;span class=&quot;token keyword&quot;&gt;as&lt;/span&gt; exc&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; retries &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;raise&lt;/span&gt; exc
      retries &lt;span class=&quot;token operator&quot;&gt;-=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;
      time&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sleep&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token decorator annotation punctuation&quot;&gt;@app&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;route&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;/&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;hello&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  count &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; get_hit_count&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;I have been seen {} times.\n&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;count&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this example, &lt;code class=&quot;language-text&quot;&gt;redis&lt;/code&gt; is the hostname of the Redis container located on the same network as this application. We use the default port for Redis, which is &lt;code class=&quot;language-text&quot;&gt;6379&lt;/code&gt;. This &lt;a href=&quot;https://realpython.com/flask-by-example-implementing-a-redis-task-queue/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;article&lt;/a&gt; illustrates a more detailed example using a Redis task queue. For a deeper explanation of our simple Python web application, refer to the &lt;a href=&quot;https://docs.docker.com/compose/gettingstarted/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;getting-started docs&lt;/a&gt;, which walk through this example with Docker Compose in greater detail.&lt;/p&gt;
&lt;h2&gt;Defining NGINX Configurations&lt;/h2&gt;
&lt;p&gt;In this post, our Nginx web server and Gunicorn WSGI server handle client requests and eventually run our Flask application as a result. Therefore, we need to configure our Nginx web server. These configurations are described in greater detail in the &lt;a href=&quot;https://docs.gunicorn.org/en/stable/deploy.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Gunicorn docs&lt;/a&gt;, but we&apos;ll simplify the file to only include barebones specifications in the &lt;code class=&quot;language-text&quot;&gt;nginx.conf&lt;/code&gt; file:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;apacheconf&quot;&gt;&lt;pre class=&quot;language-apacheconf&quot;&gt;&lt;code class=&quot;language-apacheconf&quot;&gt;worker_processes 1;

pid /var/run/nginx.pid;
error_log /var/log/nginx/error.log warn;

events {
  worker_connections 1024;
}

http {
  &lt;span class=&quot;token directive-inline property&quot;&gt;include&lt;/span&gt; mime.types;
  default_type application/octet-stream;
  sendfile on;

  upstream app_server {
    server unix:/tmp/guni.sock fail_timeout=0;
  }

  server {
    &lt;span class=&quot;token directive-inline property&quot;&gt;listen&lt;/span&gt; 8080;
    server_name localhost;
    client_max_body_size 4G;
    keepalive_timeout 5;

    root /home/dkharazi/dev/myapp/public;

    location / {
      try_files &lt;span class=&quot;token variable&quot;&gt;$uri&lt;/span&gt; @app;
    }
    location @app {
      proxy_set_header X-Forwarded-For
        &lt;span class=&quot;token variable&quot;&gt;$proxy_add_x_forwarded_for&lt;/span&gt;;
      proxy_set_header X-Forwarded-Proto
        &lt;span class=&quot;token variable&quot;&gt;$scheme&lt;/span&gt;;
      proxy_pass
        http://localhost:8050;
    }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Nginx configuration files are located in the &lt;code class=&quot;language-text&quot;&gt;/etc/nginx&lt;/code&gt; directory, where the primary configuration file is &lt;code class=&quot;language-text&quot;&gt;/etc/nginx/nginx.conf&lt;/code&gt;. In Nginx, configuration options are called &lt;em&gt;directives&lt;/em&gt;, which are organized into groups known as &lt;em&gt;contexts&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;By default, the process ID of the Nginx master process is written to the &lt;code class=&quot;language-text&quot;&gt;nginx.pid&lt;/code&gt; file in the &lt;code class=&quot;language-text&quot;&gt;/var/run&lt;/code&gt; directory. Here, we explicitly set that same path, but we could point it to a different location. In a similar fashion, the error logs are written to their default location, and only messages at the &lt;code class=&quot;language-text&quot;&gt;warn&lt;/code&gt; level or above are recorded.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;http://nginx.org/en/docs/ngx_core_module.html#events&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;events directive&lt;/a&gt; opens a context in the main configuration file for directives that affect the processing of connections. To keep things simple, we&apos;ll only specify the maximum number of simultaneous connections that can be opened by a worker process.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;http://nginx.org/en/docs/http/ngx_http_core_module.html#http&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;http directive&lt;/a&gt; opens the context for HTTP server directives. Here, we&apos;ll define some basic configurations, such as the default MIME type of a response, a flag that enables the &lt;code class=&quot;language-text&quot;&gt;sendfile()&lt;/code&gt; system call for serving files, and WSGI server specifications. Additionally, the server directive is specified, which sets configurations for our virtual server and points Nginx to the location of our web application. These configurations redefine or append fields in the request header, which are passed to the proxied server. Specifically, they use embedded variables, such as &lt;code class=&quot;language-text&quot;&gt;$proxy_add_x_forwarded_for&lt;/code&gt;, which holds the incoming &lt;code class=&quot;language-text&quot;&gt;X-Forwarded-For&lt;/code&gt; header with the remote address of the client appended to it.&lt;/p&gt;
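&lt;p&gt;On the application side, the forwarded address arrives as a regular request header. The following is a small sketch of how a WSGI application might read it, assuming the standard WSGI environ key &lt;code class=&quot;language-text&quot;&gt;HTTP_X_FORWARDED_FOR&lt;/code&gt;. The helper name &lt;code class=&quot;language-text&quot;&gt;client_address&lt;/code&gt; and the sample addresses are hypothetical:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;def client_address(environ):
  # Each proxy appends the address it received the request from,
  # so the header is a comma-separated chain: client, proxy1, proxy2
  forwarded = environ.get('HTTP_X_FORWARDED_FOR', '')
  chain = [part.strip() for part in forwarded.split(',') if part.strip()]
  if chain:
    return chain[0]
  # Without the header, fall back to the address of the direct peer
  return environ.get('REMOTE_ADDR')

# Hypothetical environ for a request that passed through Nginx
environ = {
  'HTTP_X_FORWARDED_FOR': '203.0.113.7, 10.0.0.2',
  'REMOTE_ADDR': '127.0.0.1',
}
print(client_address(environ))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since a client can send its own &lt;code class=&quot;language-text&quot;&gt;X-Forwarded-For&lt;/code&gt; header, the left-most entry should only be trusted when every hop in front of the application is under our control.&lt;/p&gt;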
&lt;p&gt;In our situation, Nginx listens on port 8080. Ports below 1024 are privileged ports, so binding to one of them (e.g. port 80) would require elevated permissions. Also, we&apos;re informing Nginx to pass any request that doesn&apos;t match a static file on to the Gunicorn server and our Python application when specifying the location of &lt;code class=&quot;language-text&quot;&gt;@app&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For additional details about the more basic configurations, refer to &lt;a href=&quot;http://nginx.org/en/docs/beginners_guide.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this guide&lt;/a&gt;. Also, refer to &lt;a href=&quot;http://nginx.org/en/docs/ngx_core_module.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the docs&lt;/a&gt; for more details about the behavior of any particular directive.&lt;/p&gt;
&lt;h2&gt;Defining Gunicorn Configurations&lt;/h2&gt;
&lt;p&gt;Again, our Nginx web server and Gunicorn WSGI server handle client requests and eventually run our Flask application as a result. Therefore, we need to configure our Gunicorn WSGI server. These configurations are described in greater detail in the &lt;a href=&quot;https://docs.gunicorn.org/en/stable/deploy.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Gunicorn docs&lt;/a&gt;, but we&apos;ll simplify the file to only include barebones specifications in the &lt;code class=&quot;language-text&quot;&gt;conf.ini&lt;/code&gt; file:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;ini&quot;&gt;&lt;pre class=&quot;language-ini&quot;&gt;&lt;code class=&quot;language-ini&quot;&gt;&lt;span class=&quot;token selector&quot;&gt;[app:server]&lt;/span&gt;
&lt;span class=&quot;token constant&quot;&gt;bind&lt;/span&gt; &lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; localhost:8050&lt;/span&gt;
&lt;span class=&quot;token constant&quot;&gt;workers&lt;/span&gt; &lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; 4&lt;/span&gt;
&lt;span class=&quot;token constant&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; &apos;myapp&apos;&lt;/span&gt;
&lt;span class=&quot;token constant&quot;&gt;daemon&lt;/span&gt; &lt;span class=&quot;token attr-value&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;=&lt;/span&gt; True&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;bind&lt;/code&gt; setting refers to the socket to which the Gunicorn WSGI server binds itself. The &lt;code class=&quot;language-text&quot;&gt;workers&lt;/code&gt; setting refers to the number of worker processes for handling requests. A recommended starting value for this setting is &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;mo&gt;×&lt;/mo&gt;&lt;mtext&gt;num cores&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;2 \times \text{num cores}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.72777em;vertical-align:-0.08333em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;×&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.43056em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;num cores&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. This setting should be adjusted in order to find the best value for our workload, since it depends on the hardware of our server. The &lt;code class=&quot;language-text&quot;&gt;daemon&lt;/code&gt; setting specifies that the app will run in the background on our server. 
For details about more specific settings in the configuration file, refer to the &lt;a href=&quot;https://docs.gunicorn.org/en/latest/settings.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Gunicorn docs&lt;/a&gt;.&lt;/p&gt;
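&lt;p&gt;Gunicorn can also load its settings from a Python file passed with the &lt;code class=&quot;language-text&quot;&gt;-c&lt;/code&gt; flag. As a sketch, assuming a hypothetical file named &lt;code class=&quot;language-text&quot;&gt;gunicorn.conf.py&lt;/code&gt;, similar settings could be written as follows, with the worker count derived from the (2 x cores) + 1 starting point suggested in the Gunicorn docs:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# gunicorn.conf.py (hypothetical file name for this project)
import multiprocessing

# Address and port that the WSGI server listens on
bind = 'localhost:8050'

# Starting point from the Gunicorn docs: (2 x cores) + 1, to be tuned under load
workers = multiprocessing.cpu_count() * 2 + 1

# Process name and background mode
proc_name = 'myapp'
daemon = True&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Assuming the command is run from the &lt;code class=&quot;language-text&quot;&gt;src&lt;/code&gt; directory and the configuration file sits alongside it, the server could then be started with &lt;code class=&quot;language-text&quot;&gt;gunicorn -c gunicorn.conf.py app:app&lt;/code&gt;.&lt;/p&gt;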
&lt;h2&gt;Defining the Dockerfile&lt;/h2&gt;
&lt;p&gt;Docker recommends running one process per container for &lt;a href=&quot;https://devops.stackexchange.com/a/451&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;these reasons&lt;/a&gt;. If there is a need to run our WSGI server and Nginx web server in the same container, we can use &lt;a href=&quot;https://stackoverflow.com/a/43510962/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;supervisord&lt;/a&gt; to create and manage processes based on data in its configuration file, which creates subprocesses. Docker outlines their &lt;a href=&quot;https://docs.docker.com/config/containers/multi-service_container/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;best practices&lt;/a&gt; for running multiple services within a single container.&lt;/p&gt;
&lt;p&gt;In this example, our Dockerfile builds a Python 3.7 image. We&apos;ll also define a compose file, which will reference two individual services: &lt;code class=&quot;language-text&quot;&gt;web&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;redis&lt;/code&gt;. Specifically, these will be outlined in the &lt;code class=&quot;language-text&quot;&gt;docker-compose.yml&lt;/code&gt; file. &lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;dockerfile&quot;&gt;&lt;pre class=&quot;language-dockerfile&quot;&gt;&lt;code class=&quot;language-dockerfile&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; python&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;3.7&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;alpine
&lt;span class=&quot;token keyword&quot;&gt;WORKDIR&lt;/span&gt; /myapp
&lt;span class=&quot;token keyword&quot;&gt;ENV&lt;/span&gt; FLASK_APP app.py
&lt;span class=&quot;token keyword&quot;&gt;ENV&lt;/span&gt; FLASK_RUN_HOST 0.0.0.0
&lt;span class=&quot;token keyword&quot;&gt;RUN&lt;/span&gt; apk add &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;no&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;cache gcc musl&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;dev linux&lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;headers
&lt;span class=&quot;token keyword&quot;&gt;COPY&lt;/span&gt; requirements.txt requirements.txt
&lt;span class=&quot;token keyword&quot;&gt;RUN&lt;/span&gt; pip install &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt;r requirements.txt
&lt;span class=&quot;token keyword&quot;&gt;COPY&lt;/span&gt; . .
&lt;span class=&quot;token keyword&quot;&gt;CMD&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;flask&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;run&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To translate some of these commands, the Dockerfile starts by building on top of the Python 3.7 Alpine image. Then, it assigns the working directory to &lt;code class=&quot;language-text&quot;&gt;/myapp&lt;/code&gt;. Next, it sets the environment variables used by the &lt;a href=&quot;https://flask.palletsprojects.com/en/1.1.x/cli/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;flask command&lt;/a&gt;. Then, gcc is installed so Python packages, such as SQLAlchemy, can compile speedups. The &lt;code class=&quot;language-text&quot;&gt;requirements.txt&lt;/code&gt; file is copied into the image, and the Python dependencies listed in it are installed with pip. Then, all of the files from the current directory &lt;code class=&quot;language-text&quot;&gt;.&lt;/code&gt; are copied to the workdir &lt;code class=&quot;language-text&quot;&gt;.&lt;/code&gt; in the image. Lastly, the &lt;code class=&quot;language-text&quot;&gt;flask run&lt;/code&gt; command is set as the default command that runs when the container starts.&lt;/p&gt;
&lt;p&gt;As stated previously, we want to define two different services, so they can run in their own individual containers. Creating a &lt;code class=&quot;language-text&quot;&gt;docker-compose.yml&lt;/code&gt; file will make this possible. In particular, we can create a file called &lt;code class=&quot;language-text&quot;&gt;docker-compose.yml&lt;/code&gt;, which will contain the following contents:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;yml&quot;&gt;&lt;pre class=&quot;language-yml&quot;&gt;&lt;code class=&quot;language-yml&quot;&gt;&lt;span class=&quot;token key atrule&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;3&apos;&lt;/span&gt;
&lt;span class=&quot;token key atrule&quot;&gt;services&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;web&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; .
    &lt;span class=&quot;token key atrule&quot;&gt;ports&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;5000:5000&quot;&lt;/span&gt;
  &lt;span class=&quot;token key atrule&quot;&gt;redis&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token key atrule&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;redis:alpine&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;web&lt;/code&gt; service uses the Python 3.7 image built from the Dockerfile. On the other hand, the &lt;code class=&quot;language-text&quot;&gt;redis&lt;/code&gt; service uses the Redis image pulled from the Docker Hub registry. Port 5000 of the &lt;code class=&quot;language-text&quot;&gt;web&lt;/code&gt; container, which is the default port for the Flask development server, is bound to port 5000 of our machine. We can run these containers by running the following command:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;docker-compose up&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once the services have started, our containers should be running. Now, we can enter &lt;code class=&quot;language-text&quot;&gt;http://localhost:5000&lt;/code&gt; in our browser to see the application running. For additional information about specific configurations available to the Compose file, refer to the &lt;a href=&quot;https://docs.docker.com/compose/compose-file/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Docker docs&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Selection Bias and COVID-19]]></title><description><![CDATA[Survivorship bias refers to our tendency of only focusing on the observations that make it past some selection process, while overlooking those that do not. Typically, these observations aren't…]]></description><link>https://dkharazi.github.io/blog/survivorship-bias</link><guid isPermaLink="false">https://dkharazi.github.io/blog/survivorship-bias</guid><pubDate>Thu, 11 Jun 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Survivorship bias refers to our tendency to focus only on the observations that make it past some selection process, while overlooking those that do not. Typically, the overlooked observations go undetected in a study because they lack visibility. In other words, survivorship bias is a type of selection bias.&lt;/p&gt;
&lt;p&gt;This form of bias can lead to overly optimistic beliefs and conflation of correlation with causation. As an example, if the few founders of successful tech startups all dropped out of college, someone may believe that college is unnecessary or may believe tech startups have a better chance of succeeding when they&apos;re founded by college drop-outs. Obviously, we know this isn&apos;t true, since there are many students who drop out and fail, but their stories are rarely ever told.&lt;/p&gt;
&lt;p&gt;Wikipedia illustrates many more &lt;a href=&quot;https://en.wikipedia.org/wiki/Survivorship_bias&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;examples&lt;/a&gt; of survivorship bias. Refer to it for more detailed cases involving survivorship bias, which include military and other historical examples.&lt;/p&gt;
&lt;h2&gt;State of COVID-19 Testing&lt;/h2&gt;
&lt;p&gt;As hinted at earlier, survivorship bias is a special case of selection bias, which refers to the selection of &lt;em&gt;survivors&lt;/em&gt;. In the case of COVID-19, the current state of testing informs us about the selection of &lt;em&gt;non-survivors&lt;/em&gt;. The CDC states on their &lt;a href=&quot;https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/testing.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;website&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Most people have mild illness and can recover at home without medical care. Contact your healthcare provider if your symptoms are getting worse or if you have questions about your health.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In other words, those with mild symptoms are advised not to visit the hospital, which is important for protecting health workers. Consequently, this significantly decreases the chance of ever including this group of people in the overall case count, since they&apos;re discouraged from visiting the hospital early on. Furthermore, the CDC goes on to say:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;An antibody test might tell you if you had a past infection. An antibody test might not show if you have a current infection because it can take 1–3 weeks after infection for your body to make antibodies. Having antibodies to the virus that causes COVID-19 might provide protection from getting infected with the virus again. If it does, we do not know how much protection the antibodies might provide or how long this protection might last.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Personally, I have tried getting a test from nearby testing centers in my city. I searched for local testing centers and found that most of them needed an appointment scheduled days in advance. Even after scheduling an appointment, nearly every test would take 5-7 days to return results. The limited testing capacity and lack of real-time reporting felt discouraging, since I essentially would need to quarantine for a week before learning whether I have the virus. Otherwise, I would need to drive almost an hour to a testing center much further away from my residence, and I would need to take time off from work. For someone who attempts to follow all of the CDC guidelines, this process was deflating and unrealistic.&lt;/p&gt;
&lt;p&gt;Regarding antibody tests, I have never received one. Based on the information from the CDC, I personally would walk away from an antibody test feeling less than confident in my results, given the uncertainty associated with it. Although most of this uncertainty is arguably unavoidable, these factors disincentivize symptomless people from getting tested. For more information about the potential shortcomings of antibody tests, refer to &lt;a href=&quot;https://www.scientificamerican.com/article/coronavirus-antibody-tests-have-a-mathematical-pitfall/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Converse of Survivorship Bias&lt;/h2&gt;
&lt;p&gt;Most likely, the factors listed above contribute to fewer asymptomatic people getting tested, which can be seen in studies like &lt;a href=&quot;https://www.cidrap.umn.edu/news-perspective/2020/04/study-many-asymptomatic-covid-19-cases-undetected&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this one&lt;/a&gt;. The testing in hospitals tells us little about the spread of the virus because the results are prone to distortion.&lt;/p&gt;
&lt;p&gt;Again, survivorship bias refers to the selection of survivors. However, in the case of COVID-19, we&apos;re almost witnessing the converse of survivorship bias. The observations in our study mostly include non-survivors, and many of the asymptomatic survivors are excluded from the case count. As a result, the virus has likely spread &lt;a href=&quot;https://news.usc.edu/170565/covid-19-antibody-study-coronavirus-infections-los-angeles-county/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;more widely&lt;/a&gt; than the case count suggests, which means the true death rate is likely lower than the reported one. To be clear, I am not suggesting a smaller death rate means the virus shouldn&apos;t be taken as seriously. Rather, contact tracing and large-scale testing should be taken more seriously to capture a comprehensive understanding of the virus. Then, transmission models would become accurate and robust enough to better control COVID-19.&lt;/p&gt;
&lt;p&gt;Successful countries, like South Korea, have followed this response strategy. Rather than enforcing an official lockdown, they focused on aggressive testing, which specifically includes the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Early and frequent testing&lt;/li&gt;
&lt;li&gt;Tracing using high-tech surveillance&lt;/li&gt;
&lt;li&gt;Zero-tolerance isolation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a result, South Korea is considered a model among the countries that have contained COVID-19. For more information about South Korea&apos;s response, refer to &lt;a href=&quot;https://www.theatlantic.com/ideas/archive/2020/05/whats-south-koreas-secret/611215/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Creating Custom Awaitable Objects]]></title><description><![CDATA[The goal of the asyncio module is to implement asynchronous programming in Python. It achieves concurrency by using evented I/O and cooperative multitasking, whereas a module like  achieves…]]></description><link>https://dkharazi.github.io/blog/awaitable</link><guid isPermaLink="false">https://dkharazi.github.io/blog/awaitable</guid><pubDate>Thu, 04 Jun 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The goal of the &lt;a href=&quot;https://docs.python.org/3/library/asyncio.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;asyncio&lt;/a&gt; module is to implement asynchronous programming in Python. It achieves concurrency by using evented I/O and cooperative multitasking, whereas a module like &lt;code class=&quot;language-text&quot;&gt;threading&lt;/code&gt; achieves concurrency by focusing on threading and pre-emptive multitasking. The asyncio module focuses on coroutines, which makes this form of concurrent programming arguably more complicated than other modules, such as &lt;code class=&quot;language-text&quot;&gt;multiprocessing&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;threading&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;When looking through the asyncio documentation, I never found any great examples that involved building custom awaitables and running them as tasks. Since so much of asyncio depends on building its own non-blocking functions specific to asyncio, this seemed strange to me. By running custom awaitables as tasks, we can achieve both increased flexibility and concurrency. This seems to be a very powerful component of asyncio, at least for some of my use cases.&lt;/p&gt;
&lt;h2&gt;Motivating the Await Expression&lt;/h2&gt;
&lt;p&gt;Python 3.3 introduced the &lt;code class=&quot;language-text&quot;&gt;yield from&lt;/code&gt; expression, which asyncio (added in Python 3.4) used to wait for coroutines. In Python 3.5, the &lt;a href=&quot;https://www.python.org/dev/peps/pep-0492/#await-expression&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;await expression&lt;/a&gt; was introduced to replace the old &lt;code class=&quot;language-text&quot;&gt;yield from&lt;/code&gt; syntax in asyncio. It was introduced for multiple reasons and included various behavioral changes.&lt;/p&gt;
&lt;p&gt;Compared to the &lt;code class=&quot;language-text&quot;&gt;yield from&lt;/code&gt; expression, the &lt;code class=&quot;language-text&quot;&gt;await&lt;/code&gt; syntax enforces a clearer role for coroutines. Specifically, &lt;code class=&quot;language-text&quot;&gt;yield from&lt;/code&gt; could accept any generator or coroutine, whereas &lt;code class=&quot;language-text&quot;&gt;await&lt;/code&gt; only accepts awaitable objects, such as coroutines.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# Python 3.4 and older&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;               &lt;span class=&quot;token comment&quot;&gt;# subroutine?&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;     &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;None&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;bar&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;     &lt;span class=&quot;token keyword&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; foobar&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;# generator? coroutine?&lt;/span&gt;

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# Python 3.5&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;         &lt;span class=&quot;token comment&quot;&gt;# coroutine!&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;     &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; foobar&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;       &lt;span class=&quot;token comment&quot;&gt;# coroutine!&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;     &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;None&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
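&lt;p&gt;A minimal sketch of that difference, assuming Python 3.7 or newer so that &lt;code class=&quot;language-text&quot;&gt;asyncio.run()&lt;/code&gt; is available. The function names are hypothetical and only serve the illustration: &lt;code class=&quot;language-text&quot;&gt;yield from&lt;/code&gt; delegates to a plain generator without complaint, while &lt;code class=&quot;language-text&quot;&gt;await&lt;/code&gt; rejects the same generator with a &lt;code class=&quot;language-text&quot;&gt;TypeError&lt;/code&gt;.&lt;/p&gt;

```python
import asyncio


def plain_generator():
    # An ordinary generator: fine for `yield from`, but not awaitable.
    yield 1


def delegating_generator():
    # `yield from` delegates to any generator.
    yield from plain_generator()


async def try_await():
    # `await` refuses a plain generator and raises TypeError.
    try:
        await plain_generator()
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


yield_from_result = list(delegating_generator())
await_result = asyncio.run(try_await())
print(yield_from_result, await_result)
```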
&lt;h2&gt;Introducing Awaitable Objects&lt;/h2&gt;
&lt;p&gt;In asyncio, a coroutine is one kind of &lt;em&gt;awaitable&lt;/em&gt; object. The asyncio documentation lists three main types of awaitable objects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A coroutine&lt;/li&gt;
&lt;li&gt;An asyncio &lt;code class=&quot;language-text&quot;&gt;Task&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;An asyncio &lt;code class=&quot;language-text&quot;&gt;Future&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A &lt;code class=&quot;language-text&quot;&gt;Future&lt;/code&gt; object acts as a placeholder for data that hasn&apos;t yet been calculated or fetched. A &lt;code class=&quot;language-text&quot;&gt;Task&lt;/code&gt; is a wrapper for a coroutine and a subclass of &lt;code class=&quot;language-text&quot;&gt;Future&lt;/code&gt;. Specifically, it wraps coroutines to schedule them for execution. A &lt;code class=&quot;language-text&quot;&gt;Task&lt;/code&gt; is a high-level awaitable object, whereas a &lt;code class=&quot;language-text&quot;&gt;Future&lt;/code&gt; is a low-level awaitable object. Normally, there &lt;a href=&quot;https://docs.python.org/3/library/asyncio-task.html#awaitables&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;isn&apos;t a need&lt;/a&gt; to create a &lt;code class=&quot;language-text&quot;&gt;Future&lt;/code&gt; object at the application level code. For these reasons, let&apos;s only focus on coroutines.&lt;/p&gt;
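&lt;p&gt;The relationship between the two can be sketched as follows. This is only an illustration under the assumption of the default asyncio event loop; the &lt;code class=&quot;language-text&quot;&gt;compute&lt;/code&gt; coroutine and the string placed in the future are made up for the example. The future is filled in later by a scheduled callback, while the task wraps a coroutine and is itself a kind of future.&lt;/p&gt;

```python
import asyncio


async def compute():
    # A coroutine that would normally do some non-blocking work.
    await asyncio.sleep(0.01)
    return 42


async def main():
    loop = asyncio.get_running_loop()

    # A Future is a placeholder: it has no result until someone sets one.
    placeholder = loop.create_future()
    loop.call_later(0.01, placeholder.set_result, "filled in later")

    # A Task wraps a coroutine and schedules it on the event loop.
    task = asyncio.create_task(compute())

    return await placeholder, await task, isinstance(task, asyncio.Future)


results = asyncio.run(main())
print(results)
```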
&lt;p&gt;Generally, coroutines implement the &lt;code class=&quot;language-text&quot;&gt;__await__&lt;/code&gt; special method, which returns an iterator. There are a few other ways to define an awaitable object. However, each method involves defining or invoking an object with an &lt;code class=&quot;language-text&quot;&gt;__await__&lt;/code&gt; method. Therefore, if we want to define our own custom awaitable object, we need to define a class with an &lt;code class=&quot;language-text&quot;&gt;__await__&lt;/code&gt; special method. For a more in-depth analysis of awaitables and futures, refer to &lt;a href=&quot;https://lucumr.pocoo.org/2016/10/30/i-dont-understand-asyncio/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt;.&lt;/p&gt;
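&lt;p&gt;As a rough sketch of the smallest such class, &lt;code class=&quot;language-text&quot;&gt;__await__&lt;/code&gt; can be written as a generator, since calling a generator function returns an iterator. The &lt;code class=&quot;language-text&quot;&gt;Ready&lt;/code&gt; class below is hypothetical; it delegates to &lt;code class=&quot;language-text&quot;&gt;asyncio.sleep(0)&lt;/code&gt; to yield control to the event loop once and then returns its own value.&lt;/p&gt;

```python
import asyncio


class Ready:
    """Hypothetical awaitable that hands back a value it was given."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        # Delegate to another awaitable's iterator so the event loop
        # gets a chance to run, then return our own value.
        yield from asyncio.sleep(0).__await__()
        return self.value


async def main():
    return await Ready("done")


result = asyncio.run(main())
print(result)
```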
&lt;h2&gt;Defining an Awaitable Object&lt;/h2&gt;
&lt;p&gt;An asyncio application begins to get interesting once we start creating tasks. In asyncio, the &lt;code class=&quot;language-text&quot;&gt;create_task()&lt;/code&gt; function runs coroutines concurrently as asyncio &lt;code class=&quot;language-text&quot;&gt;Tasks&lt;/code&gt;. In this section, we&apos;ll create a task that schedules a custom awaitable coroutine.&lt;/p&gt;
&lt;p&gt;The code below creates a custom awaitable object &lt;code class=&quot;language-text&quot;&gt;RandomSleeper&lt;/code&gt;. It sleeps for 5 to 10 seconds and returns a message after waking up. It also notifies us before falling asleep. This behavior is captured in the &lt;code class=&quot;language-text&quot;&gt;async def&lt;/code&gt; function, which creates a coroutine object. Notice, the &lt;code class=&quot;language-text&quot;&gt;RandomSleeper&lt;/code&gt; class must include the &lt;code class=&quot;language-text&quot;&gt;__await__&lt;/code&gt; special method in order to be &lt;code class=&quot;language-text&quot;&gt;awaited&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;As a reminder, calling an &lt;code class=&quot;language-text&quot;&gt;async def&lt;/code&gt; function returns a coroutine object, whose body only runs once it has been awaited or scheduled. Since &lt;code class=&quot;language-text&quot;&gt;create_task()&lt;/code&gt; expects a coroutine rather than an arbitrary awaitable, we need to create a &lt;code class=&quot;language-text&quot;&gt;nap&lt;/code&gt; function, which strictly awaits the custom awaitable &lt;code class=&quot;language-text&quot;&gt;RandomSleeper&lt;/code&gt; object. By doing this, the &lt;code class=&quot;language-text&quot;&gt;nap&lt;/code&gt; function returns a coroutine, which can be scheduled as a task inside our &lt;code class=&quot;language-text&quot;&gt;main()&lt;/code&gt; function.&lt;/p&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;main()&lt;/code&gt; function is the entry point that &lt;code class=&quot;language-text&quot;&gt;asyncio.run()&lt;/code&gt; executes on the &lt;a href=&quot;https://docs.python.org/3/library/asyncio-eventloop.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;event loop&lt;/a&gt;. It takes the coroutines returned by the &lt;code class=&quot;language-text&quot;&gt;nap()&lt;/code&gt; function, runs them as tasks by passing them into the &lt;code class=&quot;language-text&quot;&gt;create_task()&lt;/code&gt; function, and awaits their results.&lt;/p&gt;
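&lt;p&gt;A small sketch of why the wrapper coroutine is needed, assuming CPython&apos;s behavior of raising a &lt;code class=&quot;language-text&quot;&gt;TypeError&lt;/code&gt; when &lt;code class=&quot;language-text&quot;&gt;create_task()&lt;/code&gt; receives something other than a coroutine. The &lt;code class=&quot;language-text&quot;&gt;Sleeper&lt;/code&gt; class and &lt;code class=&quot;language-text&quot;&gt;wrap&lt;/code&gt; function are hypothetical stand-ins for any custom awaitable and its wrapper.&lt;/p&gt;

```python
import asyncio


class Sleeper:
    """Hypothetical awaitable, standing in for any class with __await__."""

    def __await__(self):
        return asyncio.sleep(0, "awake").__await__()


async def wrap(awaitable):
    # An `async def` wrapper turns any awaitable into a coroutine.
    return await awaitable


async def main():
    try:
        # Passing the bare awaitable is expected to be rejected.
        asyncio.create_task(Sleeper())
        rejected = False
    except TypeError:
        rejected = True

    # Wrapping it in a coroutine makes it schedulable as a task.
    task = asyncio.create_task(wrap(Sleeper()))
    return rejected, await task


rejected, value = asyncio.run(main())
print(rejected, value)
```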
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; asyncio
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; random

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;RandomSleeper&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;     &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;__await__&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; self&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;snooze&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;__await__&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;     &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;snooze&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;self&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         sleep &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; random&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;randint&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         msg &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;Sleeping for {} seconds!&apos;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         msg &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; msg&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;sleep&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;msg&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         msg &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;What a short {} second nap!&apos;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         msg &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; msg&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;sleep&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; asyncio&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sleep&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;sleep&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; msg&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;nap&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;     &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; RandomSleeper&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;     &lt;span class=&quot;token keyword&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         t1 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; asyncio&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;create_task&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;nap&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         t2 &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; asyncio&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;create_task&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;nap&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; t1&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; t2&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; asyncio&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;run&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;main&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
Sleeping &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;6&lt;/span&gt; seconds!
Sleeping &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt; seconds!
What a short &lt;span class=&quot;token number&quot;&gt;6&lt;/span&gt; second nap!
What a short &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt; second nap!
Sleeping &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;9&lt;/span&gt; seconds!
Sleeping &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; seconds!
What a short &lt;span class=&quot;token number&quot;&gt;9&lt;/span&gt; second nap!
What a short &lt;span class=&quot;token number&quot;&gt;5&lt;/span&gt; second nap!&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice that the two tasks &lt;code class=&quot;language-text&quot;&gt;t1&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;t2&lt;/code&gt; run concurrently in the event loop. In other words, we&apos;re able to run our custom awaitable &lt;code class=&quot;language-text&quot;&gt;RandomSleeper&lt;/code&gt; concurrently by running its associated coroutines as tasks. Specifically, &lt;code class=&quot;language-text&quot;&gt;t1&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;t2&lt;/code&gt; start at roughly the same time, so each pass through the loop takes only as long as the longer nap. Because &lt;code class=&quot;language-text&quot;&gt;main()&lt;/code&gt; awaits &lt;code class=&quot;language-text&quot;&gt;t1&lt;/code&gt; before &lt;code class=&quot;language-text&quot;&gt;t2&lt;/code&gt;, the result of &lt;code class=&quot;language-text&quot;&gt;t2&lt;/code&gt; is printed only after &lt;code class=&quot;language-text&quot;&gt;t1&lt;/code&gt; finishes, even when &lt;code class=&quot;language-text&quot;&gt;t2&lt;/code&gt; wakes up first.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Internal Structure of Pandas DataFrames]]></title><description><![CDATA[A  object relies on underlying data structures to improve performance of row-oriented and column-oriented operations. One of these data structures includes the BlockManager. The BlockManager is a core…]]></description><link>https://dkharazi.github.io/blog/blockmanager</link><guid isPermaLink="false">https://dkharazi.github.io/blog/blockmanager</guid><pubDate>Fri, 15 May 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; object relies on underlying data structures to improve performance of row-oriented and column-oriented operations. One of these data structures is the BlockManager. The BlockManager is a core architectural component that serves as an internal storage object in Pandas. Because it is internal, it is not included in the Pandas documentation.&lt;/p&gt;
&lt;p&gt;As the internals of Pandas continue to expand, microperformance suffers. In this case, microperformance refers to the performance of many small operations that each take around 1 microsecond. In particular, fairly simple operations, such as indexing, may pass through multiple internal layers before reaching the actual computation. As a result, the performance of certain operations isn&apos;t always consistent and reliable. For these reasons alone, the BlockManager is quite important to understand when dealing with the performance of many operations in Pandas.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#what-is-a-blockmanager&quot;&gt;What is a BlockManager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#illustrating-the-blockmanager&quot;&gt;Illustrating the BlockManager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#illustrating-the-role-of-the-blockmanager&quot;&gt;Illustrating the Role of the BlockManager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#benefit-of-the-blockmanager&quot;&gt;Benefit of the BlockManager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#disadvantages-of-the-blockmanager&quot;&gt;Disadvantages of the BlockManager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#roadmap-for-the-blockmanager&quot;&gt;Roadmap for the BlockManager&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What is a BlockManager&lt;/h2&gt;
&lt;p&gt;In Pandas versions 0.1 and 0.2, the data in a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; was stored in a &lt;code class=&quot;language-text&quot;&gt;dict&lt;/code&gt;. Since then, it has evolved into something much more complicated, but is still implemented in pure Python. Now, a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; in memory roughly represents:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Some metadata&lt;/li&gt;
&lt;li&gt;A collection of NumPy arrays for each column&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This structure arrived together with the BlockManager, which manages these NumPy arrays. The structure of a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; was changed in order to support column-oriented operations, which were very slow without a BlockManager.&lt;/p&gt;
&lt;p&gt;A BlockManager is fairly self-explanatory. It manages blocks, where an individual block refers to data stored as a NumPy ndarray object. The BlockManager is a memory management object that manages the internal columns of data inside a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;. Each axis is capable of reshaping the blocks to a new set of labels. The BlockManager consolidates blocks that share a data type. It can also accept new blocks without copying data.&lt;/p&gt;
&lt;h2&gt;Illustrating the BlockManager&lt;/h2&gt;
&lt;p&gt;We may want to view the internals of a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; to gain a better understanding of how the data is actually being stored. Accessing the &lt;code class=&quot;language-text&quot;&gt;_data&lt;/code&gt; attribute yields the BlockManager of a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;. In newer Pandas releases, the same object is exposed through the &lt;code class=&quot;language-text&quot;&gt;_mgr&lt;/code&gt; attribute. Printing it also lists the specific blocks handled by the BlockManager.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;df&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
   c1 c2  c3
&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;   &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;  a  &lt;span class=&quot;token number&quot;&gt;10&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;   &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;  b  &lt;span class=&quot;token number&quot;&gt;20&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;   &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;  c  &lt;span class=&quot;token number&quot;&gt;30&lt;/span&gt;

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;df&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_data&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
BlockManager
Items&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; Index&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;c1&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;c2&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;c3&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;object&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
Axis &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; RangeIndex&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;start&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; stop&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; step&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
IntBlock&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; x &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; int64
ObjectBlock&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; x &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;object&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Illustrating the Role of the BlockManager&lt;/h2&gt;
&lt;p&gt;As briefly described earlier, the BlockManager is responsible for consolidating blocks that share a data type. It does this by calling the &lt;code class=&quot;language-text&quot;&gt;consolidate()&lt;/code&gt; method.&lt;/p&gt;
&lt;p&gt;The BlockManager doesn&apos;t consolidate blocks of similar data types when new blocks are added. Instead, the BlockManager does this automatically in the initial stages of many &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; operations. This notion may seem abstract at first, but can be easily observed by adding a new block to the BlockManager.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; df&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;c4&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;200&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;300&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;df&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_data&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
BlockManager
Items&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; Index&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;c1&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;c2&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;c3&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;c4&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;object&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
Axis &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; RangeIndex&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;start&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; stop&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; step&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
IntBlock&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt; x &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; int64
ObjectBlock&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; x &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;object&lt;/span&gt;
IntBlock&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; x &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; int64&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice, there are two separate IntBlocks after adding a new column of ints to the &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;. By calling the &lt;code class=&quot;language-text&quot;&gt;consolidate()&lt;/code&gt; method, we&apos;ll see consolidation of blocks of similar data types. Meaning, we&apos;ll see the two IntBlocks consolidated into one IntBlock.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; df&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;_data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;consolidate&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
BlockManager
Items&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; Index&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;c1&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;c2&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;c3&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;c4&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;object&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
Axis &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; RangeIndex&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;start&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; stop&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; step&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
IntBlock&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt; x &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; int64
ObjectBlock&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; x &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dtype&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;object&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, every block is consolidated based on its data type. At a high level, we can think of each &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; method calling the &lt;code class=&quot;language-text&quot;&gt;consolidate()&lt;/code&gt; method before running its operation. In truth, it is more complicated than this. Specifically, the &lt;code class=&quot;language-text&quot;&gt;consolidate()&lt;/code&gt; method is only called in operations that directly benefit from consolidation.&lt;/p&gt;
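&lt;p&gt;To make the consolidation step concrete, here is a minimal sketch in plain Python. It is a toy model rather than the actual Pandas implementation: the &lt;code class=&quot;language-text&quot;&gt;consolidate&lt;/code&gt; function and the tuple layout of each block are hypothetical, chosen only to mirror the block listings printed above.&lt;/p&gt;

```python
from collections import defaultdict


def consolidate(blocks):
    """Merge blocks that share a dtype into one block per dtype.

    Each block is a (dtype_name, positions, rows) tuple, where `positions`
    lists the column positions the block covers and `rows` holds one list
    of values per covered column. This layout is a toy stand-in, not the
    real Pandas Block class.
    """
    merged = defaultdict(lambda: ([], []))
    for dtype_name, positions, rows in blocks:
        merged_positions, merged_rows = merged[dtype_name]
        merged_positions.extend(positions)
        merged_rows.extend(rows)
    # One block per dtype, in order of first appearance.
    return [(name, pos, rows) for name, (pos, rows) in merged.items()]


# Blocks as they might look right after adding column c4.
blocks = [
    ("int64", [0, 2], [[1, 2, 3], [10, 20, 30]]),  # c1 and c3
    ("object", [1], [["a", "b", "c"]]),            # c2
    ("int64", [3], [[100, 200, 300]]),             # c4, not yet merged
]

consolidated = consolidate(blocks)
for name, positions, rows in consolidated:
    print(name, positions, f"{len(rows)} x {len(rows[0])}")
```

&lt;p&gt;The merged int64 block covers positions 0, 2, and 3, which matches the consolidated IntBlock shown earlier.&lt;/p&gt;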
&lt;p&gt;For a more detailed analysis about consolidation and when it happens, refer to &lt;a href=&quot;https://uwekorn.com/2020/05/24/the-one-pandas-internal.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt;. For a more detailed explanation of the BlockManager, refer to &lt;a href=&quot;https://wesmckinney.com/blog/a-roadmap-for-rich-scientific-data-structures-in-python/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt; written by Wes McKinney, who introduced the BlockManager.&lt;/p&gt;
&lt;h2&gt;Benefit of the BlockManager&lt;/h2&gt;
&lt;p&gt;The BlockManager introduced a columnar structure to the &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;. Like any other columnar store, it provides significant performance boosts for column-oriented operations. These gains also extend to operations that span many different columns. For example, the BlockManager improves the speed of vector-like operations, such as summing two columns together.&lt;/p&gt;
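&lt;p&gt;As a rough illustration of why a columnar layout helps, the sketch below stores the same small table row by row and column by column using only the standard library. The variable names are made up for this example, and it shows the access pattern rather than the real Pandas internals or any measured speedup.&lt;/p&gt;

```python
from array import array

# Row-oriented layout: each record keeps its fields together.
rows = [(1, "a", 10), (2, "b", 20), (3, "c", 30)]

# Column-oriented layout: each column is its own contiguous buffer.
columns = {
    "c1": array("q", [1, 2, 3]),
    "c2": ["a", "b", "c"],
    "c3": array("q", [10, 20, 30]),
}

# Summing c1 and c3 row by row walks every record, including the
# unrelated c2 values stored alongside them.
row_sum = [record[0] + record[2] for record in rows]

# With columns, only the two relevant buffers are read.
col_sum = array("q", (x + y for x, y in zip(columns["c1"], columns["c3"])))

print(row_sum)
print(list(col_sum))
```

&lt;p&gt;Both approaches produce the same sums, but the columnar version never touches the string column at all.&lt;/p&gt;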
&lt;h2&gt;Disadvantages of the BlockManager&lt;/h2&gt;
&lt;p&gt;Although the BlockManager was a necessary addition to the Pandas project, it creates a negative impact on performance in certain circumstances. There are four general areas that are negatively impacted by the BlockManager:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code complexity&lt;/li&gt;
&lt;li&gt;Loss of user visibility to memory use&lt;/li&gt;
&lt;li&gt;Unavoidable consolidation&lt;/li&gt;
&lt;li&gt;Microperformance issues&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because the BlockManager introduced blocks to the Pandas architecture, writing new code becomes more complex, as the block structure needs to be constructed carefully. Although this boosts the performance of complicated algorithms, such as joins, writing code for those algorithms becomes more complicated.&lt;/p&gt;
&lt;p&gt;Large datasets are usually read into a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; object &lt;em&gt;naively&lt;/em&gt;. Consequently, there is a memory-doubling effect that can lead to memory errors. When Pandas was written in 2011, the creators of Pandas weren&apos;t thinking about analyzing many gigabytes or terabytes of data. Now, &lt;a href=&quot;https://wesmckinney.com/blog/apache-arrow-pandas-internals/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the rule of thumb&lt;/a&gt; for reading in a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; object is to have 5-10 times as much available RAM as the size of the data.&lt;/p&gt;
&lt;p&gt;As stated previously, consolidation happens in methods that directly benefit from it. As a result, consolidation can lead to performance and memory overhead for fairly common operations. For example, calling &lt;code class=&quot;language-text&quot;&gt;read_csv()&lt;/code&gt; may require consolidation after completion.&lt;/p&gt;
&lt;p&gt;Again, the BlockManager was a necessary addition. It fixed a lot of performance issues. However, there is a proposal to replace the BlockManager, which would require a significant inversion of the internal architecture to involve more native code and less interpreted Python.&lt;/p&gt;
&lt;h2&gt;Roadmap for the BlockManager&lt;/h2&gt;
&lt;p&gt;Currently, the architecture of Pandas is structured around the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPython implementation of internal data structures&lt;/li&gt;
&lt;li&gt;Cython implementation of algorithms&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the future, there may be an effort to &lt;a href=&quot;https://github.com/pydata/pandas-design/blob/a0f1d32094f5030cc06ec09c8582b5a7b7798065/source/internal-architecture.rst#building-libpandas-in-c1114-for-lowest-level-implementation-tier&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;create a native library&lt;/a&gt;, where the data structures, logical types, and memory management are assembled using a native API. By replacing the BlockManager with native code, Pandas would receive the following benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Simpler code&lt;/li&gt;
&lt;li&gt;Easier extensibility with new logical types&lt;/li&gt;
&lt;li&gt;Possibly better performance than the current implementation&lt;/li&gt;
&lt;li&gt;Improved user-control over the memory layout&lt;/li&gt;
&lt;li&gt;Improved microperformance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information about the use cases and drawbacks of the BlockManager, refer to the &lt;a href=&quot;https://github.com/pydata/pandas-design/blob/a0f1d32094f5030cc06ec09c8582b5a7b7798065/source/internal-architecture.rst#what-is-blockmanager-and-why-does-it-exist&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;design docs&lt;/a&gt; and &lt;a href=&quot;https://pandas.pydata.org/docs/development/roadmap.html#block-manager-rewrite&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;roadmap docs&lt;/a&gt;, which was written by Wes McKinney after developing the BlockManager.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Performance Benchmarks: PyArrow]]></title><description><![CDATA[As of 2020, there has been development towards parquet-cpp, which is a native C++ implementation of Parquet. This development process was moved to the Apache Arrow repository. At a very high level…]]></description><link>https://dkharazi.github.io/blog/pyarrow</link><guid isPermaLink="false">https://dkharazi.github.io/blog/pyarrow</guid><pubDate>Sat, 11 Apr 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;As of 2020, there has been development towards &lt;a href=&quot;https://github.com/apache/parquet-cpp&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;parquet-cpp&lt;/a&gt;, which is a native C++ implementation of Parquet. This development process was moved to the Apache Arrow repository.&lt;/p&gt;
&lt;p&gt;At a very high level, the Arrow project was created primarily in an effort to provide zero-copy data access, which involves mapping complex tables to memory. Meaning, reading 1 terabyte of data from disk should be as &lt;strong&gt;fast&lt;/strong&gt; and &lt;strong&gt;easy&lt;/strong&gt; as reading 1 megabyte of data.&lt;/p&gt;
&lt;p&gt;The Arrow project includes Python bindings with integration of NumPy, pandas, and built-in Python objects. These Python bindings are based on a C++ implementation of Arrow, and they are accessible via the PyArrow library. To learn more about the use cases and motivations of PyArrow, watch &lt;a href=&quot;https://www.youtube.com/watch?v=Hqi_Bw_0y8Q&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Wes McKinney&apos;s presentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-origins-of-pyarrow&quot;&gt;The Origins of PyArrow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#benefits-of-the-arrow-protocol&quot;&gt;Benefits of the Arrow Protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#use-cases-of-pyarrow&quot;&gt;Use Cases of PyArrow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#comparing-pyarrow-to-parquet&quot;&gt;Comparing PyArrow to Parquet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-serialization-can-be-a-problem&quot;&gt;Why Serialization can be a Problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#improvements-to-topandas&quot;&gt;Improvements to DataFrame.toPandas()&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#setting-up-performance-tests&quot;&gt;Setting Up Performance Tests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#performance-of-pyarrow&quot;&gt;Performance of PyArrow&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Origins of PyArrow&lt;/h2&gt;
&lt;p&gt;In 2016, Wes McKinney joined the Apache Arrow project to improve Python&apos;s interoperability with big data systems like Impala and Spark. Wes took the lead in development of the C++ and Python implementations of Apache Arrow. Rather than attempting to summarize the background of PyArrow any further, I&apos;ll conclude this section with a &lt;a href=&quot;https://wesmckinney.com/blog/apache-arrow-pandas-internals/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;quote from Wes&lt;/a&gt;, which is taken from an article that was written during his time at Cloudera:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At Cloudera, I started looking at Impala, Kudu, Spark, Parquet, and other such big data storage and analysis systems. Since Python and pandas had never been involved with any of these projects, building integrations with them was difficult. The single biggest problem was data interchange, particularly moving large tabular datasets from one process&apos;s memory space to another&apos;s. It was extremely expensive, and there was no standard solution for doing it. RPC-oriented serialization protocols like Thrift and Protocol Buffers were too slow and too general purpose.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Benefits of the Arrow Protocol&lt;/h2&gt;
&lt;p&gt;Apache Arrow provides an in-memory, columnar data structure that has several key benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Random access is &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(1)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Includes native vectorized optimization for analytical processing&lt;/li&gt;
&lt;li&gt;Data interchange is fast and efficient between systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Random access is &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(1)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; because it&apos;s formatted as a column-oriented data structure. Native vectorized optimizations are possible because its execution engine takes advantage of SIMD operations, which are included in modern processors. Meaning, any algorithms that process the Arrow data structure will be very fast. Data interchange between systems is very fast and efficient, since Arrow avoids costly data serialization. Serialization is used in many other systems, including Spark, Avro, etc.&lt;/p&gt;
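&lt;p&gt;The constant-time lookup can be sketched with a fixed-width buffer from the standard library. This is only an analogy for a column of fixed-width values, not the Arrow format itself: the &lt;code class=&quot;language-text&quot;&gt;value_at&lt;/code&gt; helper and the sample values are hypothetical.&lt;/p&gt;

```python
import struct

# A hypothetical fixed-width column: five int64 values (native byte order)
# packed back to back in one contiguous buffer.
values = [7, 11, 13, 17, 19]
ITEM_SIZE = struct.calcsize("=q")  # 8 bytes per value
buffer = struct.pack(f"={len(values)}q", *values)


def value_at(buf, index):
    """Read one value by computing its byte offset directly."""
    offset = index * ITEM_SIZE
    return struct.unpack_from("=q", buf, offset)[0]


print(value_at(buffer, 3))
```

&lt;p&gt;Because every value has the same width, the position of any element is a single multiplication away, regardless of how long the column is.&lt;/p&gt;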
&lt;h2&gt;Use Cases of PyArrow&lt;/h2&gt;
&lt;p&gt;By avoiding costly serialization of I/O operations, Arrow is able to improve the performance of interprocess communication with zero-overhead memory sharing. Furthermore, the Arrow project involves a great deal of effort to standardize its in-memory data structure. As a result, Arrow provides systems with the ability to reuse algorithms more efficiently.&lt;/p&gt;
&lt;p&gt;At a higher level, Arrow tackles three general use cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data movement&lt;/li&gt;
&lt;li&gt;Data access&lt;/li&gt;
&lt;li&gt;Computation libraries&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As stated previously, Arrow attempts to efficiently improve the process of moving data from one system to another system. As a result, the Arrow memory format support zero-copy interprocess communication. In other words, reading Arrow&apos;s data structure into a separate system avoids creating any redundant data copies between intermediate buffers. Improving interprocess communication ensures &lt;a href=&quot;https://en.wikipedia.org/wiki/Remote_procedure_call&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;RPC&lt;/a&gt; (client-server) based data-movement. For greater details about how Arrow achieved these benefits, read the &lt;a href=&quot;https://arrow.apache.org/docs/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Arrow docs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Due to improvements to interprocess communication, Apache Arrow is able to read from and write to Parquet files from various libraries, such as Pandas. Additionally, Arrow can convert a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; object from one library to a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; object from a different library. For example, the Arrow project is dedicating development effort to efficiently moving an R data frame into Pandas, and vice versa. Arrow has already been able to improve the performance of reading a PySpark &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; as a Pandas &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Arrow improves the performance of data access. These improvements include efficiently reading from and writing to common storage formats, such as Parquet files. Specifically, zero-copy data access enables memory mapping of complex tables, so opening 1 TB of data can be as fast as opening 1 MB of data, because values are only loaded when they are actually read. Arrow also speeds up interactions with database protocols and other data sources. Lastly, Arrow provides methods for performing efficient, in-memory, dataframe-like analytics on its data structure.&lt;/p&gt;
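&lt;p&gt;The idea behind memory-mapped, zero-copy access can be sketched with Python&apos;s standard &lt;code class=&quot;language-text&quot;&gt;mmap&lt;/code&gt; module and a &lt;code class=&quot;language-text&quot;&gt;memoryview&lt;/code&gt;. This is only an analogy under simplifying assumptions (a single column of 64-bit integers in a temporary file named &lt;code class=&quot;language-text&quot;&gt;column.bin&lt;/code&gt;, both invented here), and it does not use Arrow&apos;s own API.&lt;/p&gt;

```python
import mmap
import os
import tempfile
from array import array

with tempfile.TemporaryDirectory() as tmp:
    # Write a hypothetical column of 64-bit integers to disk
    path = os.path.join(tmp, "column.bin")
    with open(path, "wb") as f:
        f.write(array("q", range(1000)).tobytes())

    # Map the file into memory; pages are loaded only when they are touched
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        raw = memoryview(mapped)
        # Reinterpret the mapped bytes as 64-bit integers without copying them
        column = raw.cast("q")
        middle, last = column[500], column[999]
        # Release both views before the mapping is closed
        column.release()
        raw.release()

print(middle, last)  # 500 999
```

&lt;p&gt;Only the pages holding the two requested values need to be read, which is why the size of the underlying file matters much less than it would with a full load.&lt;/p&gt;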
&lt;h2&gt;Comparing PyArrow to Parquet&lt;/h2&gt;
&lt;p&gt;My &lt;a href=&quot;/blog/parquet/&quot;&gt;previous post&lt;/a&gt; described Apache Parquet as a column-oriented format for storage. It is used for data serialization and is persisted as an actual file on disk. On the other hand, Apache Arrow is a library that provides access to its in-memory data structure, which follows a column-oriented format. Refer to this &lt;a href=&quot;https://stackoverflow.com/a/56481636/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;StackOverflow post&lt;/a&gt; for a more in-depth explanation about the differences between these two Apache projects.&lt;/p&gt;
&lt;p&gt;Apache Arrow defines a binary serialization protocol, which is used for arranging a collection of Arrow columnar arrays. These columnar arrays allow Arrow to provide efficient messaging and interprocess communication. The Arrow protocol is used for mapping a &lt;em&gt;blob&lt;/em&gt; of Arrow data without doing any deserialization. This allows Arrow and other libraries to perform analytics on Arrow&apos;s data structure.&lt;/p&gt;
&lt;h2&gt;Why Serialization can be a Problem&lt;/h2&gt;
&lt;p&gt;Most Python users expect to deal with similar data structures when converting between a Spark &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; and a Pandas &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;. Usually, they want to make this conversion to use the flexible Pandas API on a locally stored &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; object, after running some distributed computations on a Spark &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Users don&apos;t really care about how Spark and Pandas represent data frames internally. Rather, they have a similar data structure in mind, and they want to switch back and forth between Spark&apos;s API (for distributed computation on a data frame object) and the Pandas API (for flexible functions on locally stored data).
Since there are so many libraries that implement their own form of &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; object, moving from one context to another can become difficult. In particular, loading an R data frame into Pandas can be very challenging and slow (and vice versa).&lt;/p&gt;
&lt;p&gt;Converting from one data frame format to a different data frame format can also involve serialization. In Python, serialization refers to the conversion of an in-memory Python object, such as a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;, to a stream of bytes that can be written to disk or sent to another process. The cost of serialization varies across contexts and libraries. In particular, the &lt;code class=&quot;language-text&quot;&gt;pickle&lt;/code&gt; library is a standard Python library used for serializing Python objects. Read &lt;a href=&quot;http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this&lt;/a&gt; for a more in-depth analysis of serialization in Python.&lt;/p&gt;
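&lt;p&gt;As a small illustration of what serialization does, the sketch below pickles a tiny stand-in for a data frame and restores it. The &lt;code class=&quot;language-text&quot;&gt;frame&lt;/code&gt; dictionary is a made-up example, not a real &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;.&lt;/p&gt;

```python
import pickle

# A hypothetical, tiny "data frame" held as a dict of columns
frame = {"trip_id": [1, 2, 3], "fare": [7.0, 14.0, 4.5]}

# Serialize: the in-memory object becomes a stream of bytes
payload = pickle.dumps(frame)

# Deserialize: the receiver rebuilds a brand new copy of the object
restored = pickle.loads(payload)

print(type(payload).__name__)                   # bytes
print(restored == frame, restored is frame)     # True False
```

&lt;p&gt;The restored object is equal to the original but is a separate copy, so every value has been encoded once and decoded once along the way.&lt;/p&gt;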
&lt;p&gt;Serialization is a relatively slow process in Python. Returning to an earlier example, a Spark &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; is converted to a Pandas &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; using serialization. Specifically, the &lt;code class=&quot;language-text&quot;&gt;DataFrame.toPandas()&lt;/code&gt; function in PySpark is inefficient, since it serializes each row into a list of tuples. This is inefficient for two primary reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;cPickle serialization is slow and potentially unnecessary&lt;/li&gt;
&lt;li&gt;Iterating over each tuple using &lt;code class=&quot;language-text&quot;&gt;DataFrame.from_records()&lt;/code&gt; is a slow method for creating a &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
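&lt;p&gt;The two costs listed above can be mimicked with the standard library. The first half of this sketch pickles each row as a tuple and rebuilds the columns one value at a time, whereas the second half moves each column as one contiguous buffer. It is an analogy only, not the code used by PySpark or PyArrow, and the &lt;code class=&quot;language-text&quot;&gt;rows&lt;/code&gt; data is invented.&lt;/p&gt;

```python
import pickle
from array import array

rows = [(1, 7.0), (2, 14.0), (3, 4.5)]

# Row-oriented path: one pickle per row, then a Python loop to rebuild columns
pickled_rows = [pickle.dumps(row) for row in rows]
ids, fares = [], []
for blob in pickled_rows:
    trip_id, fare = pickle.loads(blob)
    ids.append(trip_id)
    fares.append(fare)

# Column-oriented path: each column travels as a single contiguous buffer
id_buffer = array("q", [r[0] for r in rows]).tobytes()
fare_buffer = array("d", [r[1] for r in rows]).tobytes()
ids_col = memoryview(id_buffer).cast("q")     # viewed in place, no per-value decoding
fares_col = memoryview(fare_buffer).cast("d")

print(ids == list(ids_col), fares == list(fares_col))  # True True
```

&lt;p&gt;Both paths end with the same values, but the row-oriented path performs work per row and per value, which is what becomes expensive for millions of rows.&lt;/p&gt;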
&lt;h2&gt;Improvements to &lt;code class=&quot;language-text&quot;&gt;toPandas&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;As a reminder, the PySpark API is a very thin layer of code wrapped around the Java API for Spark, which itself is only a wrapper around the core Scala API. Therefore, running a Python driver program with a SparkContext will invoke a JavaSparkContext by launching a JVM behind the scenes.&lt;/p&gt;
&lt;p&gt;Arrow uses an efficient in-memory columnar data structure, which can be accessed using the PyArrow library in Python. To solve the issue with &lt;code class=&quot;language-text&quot;&gt;DataFrame.toPandas()&lt;/code&gt; in PySpark, the proposed fix uses Arrow to ensure the data is already in Arrow&apos;s memory format. By doing this, there isn&apos;t a need to serialize data using cPickle, because Arrow data can be sent directly from the JVM to a Python process.&lt;/p&gt;
&lt;p&gt;Additionally, PyArrow creates a pandas &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; from entire chunks of data, rather than individual values. This is achieved by using &lt;a href=&quot;https://en.wikipedia.org/wiki/Zero-copy&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;zero-copy methods&lt;/a&gt; in PyArrow. To learn more about the solution implemented by PyArrow, refer to &lt;a href=&quot;https://arrow.apache.org/blog/2017/07/26/spark-arrow/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt; and &lt;a href=&quot;https://bryancutler.github.io/toPandas/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Setting Up Performance Tests&lt;/h2&gt;
&lt;p&gt;The New York City Taxi &amp;#x26; Limousine Commission records each taxi and limousine trip in NYC. They report these trips to the public each month, and include information about pick-up and drop-off destinations and times. This data is used in many Data Engineering projects for reasons mentioned &lt;a href=&quot;https://uwekorn.com/2019/08/22/why-the-nyc-trd-is-a-nice-training-dataset.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This exercise will only use the dataset containing trips from January of 2019 completed in yellow taxis. More details about the commission and their datasets can be found on &lt;a href=&quot;https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the site&lt;/a&gt;. This particular dataset is 687.1 MB and contains 7.6 million rows. Before performing any benchmarks, let&apos;s include any setup code:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; timeit &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; timeit
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; itr &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;timeit&lt;/code&gt; module is commonly used for testing the performance of code segments in Python. Specifically, it returns the total seconds taken to run a given code segment, excluding the execution of any specified setup code. The &lt;code class=&quot;language-text&quot;&gt;itr&lt;/code&gt; variable is included to test each segment 100 times. After running the &lt;code class=&quot;language-text&quot;&gt;timeit()&lt;/code&gt; function, the total seconds is divided by the number of runs to determine the average time per run.&lt;/p&gt;
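&lt;p&gt;Before timing anything heavy, the same pattern can be tried on a toy statement. In this sketch, the &lt;code class=&quot;language-text&quot;&gt;sum(data)&lt;/code&gt; statement and its setup string are placeholders chosen only to show how the total is turned into an average.&lt;/p&gt;

```python
from timeit import timeit

itr = 100
setup = "data = list(range(10000))"  # runs once and is not included in the timing
stmnt = "sum(data)"                  # runs itr times and is timed

total_seconds = timeit(stmnt, setup, number=itr)
average_seconds = total_seconds / itr
print(round(average_seconds, 6))
```

&lt;p&gt;The printed value depends on the machine, so it should only be compared against other timings taken in the same environment.&lt;/p&gt;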
&lt;h2&gt;Performance of PyArrow&lt;/h2&gt;
&lt;p&gt;As of June 2020, PySpark defaults to converting a PySpark &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; to a Pandas &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; by serializing with cPickle. However, the &lt;code class=&quot;language-text&quot;&gt;toPandas()&lt;/code&gt; method uses Arrow when the setting &lt;code class=&quot;language-text&quot;&gt;spark.sql.execution.arrow.enabled&lt;/code&gt; is set to &lt;em&gt;true&lt;/em&gt; in the SparkSession. Note that Spark 3.0 renamed this setting to &lt;code class=&quot;language-text&quot;&gt;spark.sql.execution.arrow.pyspark.enabled&lt;/code&gt; and deprecated the older name.&lt;/p&gt;
&lt;p&gt;Now, let&apos;s test the performance of the default implementation:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; stmnt &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;df.toPandas()&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; setup &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token triple-quoted-string string&quot;&gt;&quot;&quot;&quot;
... from pyspark.sql import SparkSession
... spark = SparkSession.builder \
...     .master(&apos;local&apos;) \
...     .appName(&apos;Taxi&apos;) \
...     .getOrCreate()
... df = spark.read.csv(&apos;taxi.csv&apos;)
... &quot;&quot;&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; timeit&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stmnt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; setup&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; number&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;s&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;302.95&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Without Arrow, the &lt;code class=&quot;language-text&quot;&gt;toPandas()&lt;/code&gt; method performed worse than expected. Using the default PySpark engine, serializing and converting the entire taxi trips &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; took an average of 302.95 seconds. Now, let&apos;s test the performance of the PyArrow implementation:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; stmnt &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;df.toPandas()&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; setup &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token triple-quoted-string string&quot;&gt;&quot;&quot;&quot;
... from pyspark.sql import SparkSession
... spark = SparkSession.builder \
...     .master(&apos;local&apos;) \
...     .appName(&apos;Taxi&apos;) \
...     .config(&quot;spark.sql.execution.arrow.enabled&quot;, &quot;true&quot;) \
...     .config(&quot;spark.driver.maxResultSize&quot;, &quot;0&quot;) \
...     .getOrCreate()
... df = spark.read.csv(&apos;taxi.csv&apos;)
... &quot;&quot;&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; timeit&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stmnt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; setup&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; number&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;s&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;54.51&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This example was run locally on my laptop using the default configurations for Spark. Thus, the performance benchmarks should not be taken precisely. Regardless, there clearly seems to be a huge performance boost when using Arrow. For a more detailed analysis of PyArrow&apos;s I/O performance, refer to &lt;a href=&quot;https://uwekorn.com/2019/01/27/data-science-io-a-baseline-benchmark.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Performance Benchmarks: Parquet]]></title><description><![CDATA[A Parquet file is a popular column-oriented storage format for Hadoop. For more information about column-oriented stores, refer to my previous post. A Parquet file is used for fast analytics that…]]></description><link>https://dkharazi.github.io/blog/parquet</link><guid isPermaLink="false">https://dkharazi.github.io/blog/parquet</guid><pubDate>Thu, 26 Mar 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A Parquet file is a popular column-oriented storage format for Hadoop. For more information about column-oriented stores, refer to my &lt;a href=&quot;/blog/columnar/&quot;&gt;previous post&lt;/a&gt;. A Parquet file is used for fast analytics that often reads and writes columns, rather than rows. Originally, Parquet files were designed to be used in MapReduce problems. Meaning, most of its development went towards &lt;a href=&quot;https://github.com/apache/parquet-mr&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;parquet-mr&lt;/a&gt;, which is a Java implementation.&lt;/p&gt;
&lt;p&gt;As of 2020, there has been development towards &lt;a href=&quot;https://github.com/apache/parquet-cpp&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;parquet-cpp&lt;/a&gt;, which is a native C++ implementation of Parquet. Eventually, this implementation of Parquet will provide native read and write support for pandas DataFrames, which will improve the performance of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reading Parquet files into DataFrames&lt;/li&gt;
&lt;li&gt;Writing DataFrames to Parquet files&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-benefits-of-parquet&quot;&gt;The Benefits of Parquet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#compression-and-io-optimization&quot;&gt;Compression and I/O Optimization in Parquet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-format-of-a-parquet-file&quot;&gt;The Format of a Parquet File&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#setting-up-performance-tests&quot;&gt;Setting Up Performance Tests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#performance-of-parquet-engines&quot;&gt;Performance of Parquet Engines&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Benefits of Parquet&lt;/h2&gt;
&lt;p&gt;The Apache Parquet project was originally initiated to create an open-standard columnar file format. In the beginning, Parquet files were only used in the Hadoop ecosystem. Today, they are used in Apache Spark and by cloud vendors to fill many data warehousing needs.&lt;/p&gt;
&lt;p&gt;A Parquet file is used for storing columnar data on disk. As such, it focuses on data compression, which refers to reducing the size of a file. In Parquet, data compression is performed column-by-column. This enables different encoding schemes to be used for different data types. As a result, Parquet files are able to reduce the time for each query by reducing the overall I/O, such as by reading the data for each column in a compressed format.&lt;/p&gt;
&lt;h2&gt;Compression and I/O Optimization&lt;/h2&gt;
&lt;p&gt;Since an entire column is stored in blocks, compression can be optimized by deducing the exact number of bits needed for each data value. For example, a column of integers could be compressed into a smaller data type by inferring the maximum integer value. So, if a column consists of integers that range from 0 to 100, then the column doesn&apos;t need to be any larger than int8.&lt;/p&gt;
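&lt;p&gt;The saving from a narrower type is easy to check with Python&apos;s standard &lt;code class=&quot;language-text&quot;&gt;array&lt;/code&gt; module. The sketch below stores the same hypothetical column as 8-byte and as 1-byte integers. It only illustrates the bit-width idea, since the encodings Parquet actually applies are more involved.&lt;/p&gt;

```python
from array import array

# A hypothetical column whose values all fall between 0 and 100
values = [0, 17, 42, 99, 100] * 1000

wide = array("q", values)    # 8 bytes per value (int64)
narrow = array("b", values)  # 1 byte per value (int8, range -128 to 127)

wide_bytes = wide.itemsize * len(wide)
narrow_bytes = narrow.itemsize * len(narrow)
print(wide_bytes, narrow_bytes)  # 40000 5000
```

&lt;p&gt;The values are unchanged, yet the narrow column takes one eighth of the space, which also means one eighth of the bytes to read from disk.&lt;/p&gt;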
&lt;p&gt;I/O is optimized by focusing on projection pushdown and predicate pushdown. Here, a predicate refers to a filter with a &lt;code class=&quot;language-text&quot;&gt;where&lt;/code&gt; clause, and a projection refers to selected columns using a &lt;code class=&quot;language-text&quot;&gt;select&lt;/code&gt; clause. Projection pushdown involves column pruning. This happens automatically, since Parquet is formatted as a columnar file.&lt;/p&gt;
&lt;p&gt;In Parquet, predicate pushdown involves moving any filtering to an earlier phase of query execution. To support this, Parquet maintains statistics, such as minimum and maximum values, for groups of rows, so that groups that cannot satisfy a predicate can be skipped without being read. In summary, predicate pushdown in Parquet provides significant performance improvements. For more details about predicate pushdown in Parquet, refer to &lt;a href=&quot;https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_ig_predicate_pushdown_parquet.html#concept_pgs_plb_mgb&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/c60617b3604322fc8d79a880f807475d/parquetpushdown.svg&quot; alt=&quot;parquetpushdown&quot;&gt;&lt;/p&gt;
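&lt;p&gt;The effect of row-group statistics on a filter can be sketched in plain Python. Here, the &lt;code class=&quot;language-text&quot;&gt;row_groups&lt;/code&gt; data, the &lt;code class=&quot;language-text&quot;&gt;stats&lt;/code&gt; list, and the &lt;code class=&quot;language-text&quot;&gt;read_fares_above&lt;/code&gt; helper are all hypothetical, and a real Parquet reader does considerably more than this.&lt;/p&gt;

```python
# Hypothetical row groups, each holding one "fare" column chunk
row_groups = [
    [3.5, 4.0, 7.5],
    [12.0, 18.5, 25.0],
    [52.0, 60.0, 75.5],
]

# Statistics that would be written once, alongside the data
stats = [{"min": min(g), "max": max(g)} for g in row_groups]

def read_fares_above(threshold):
    matches, groups_read = [], 0
    for group, stat in zip(row_groups, stats):
        # Skip the whole group when its maximum rules out any match
        if threshold >= stat["max"]:
            continue
        groups_read += 1
        matches.extend(v for v in group if v > threshold)
    return matches, groups_read

print(read_fares_above(50.0))  # ([52.0, 60.0, 75.5], 1)
```

&lt;p&gt;For a threshold of 50.0, two of the three groups are skipped based on their statistics alone, so only one group is scanned.&lt;/p&gt;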
&lt;h2&gt;The Format of a Parquet File&lt;/h2&gt;
&lt;p&gt;A Parquet file is organized into three general sections:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Header&lt;/li&gt;
&lt;li&gt;Data Blocks&lt;/li&gt;
&lt;li&gt;Footer&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each Parquet file has one header, one or many data blocks, and one footer. Within these components, a Parquet file stores two different types of information: metadata and data. Specifically, the metadata is stored in the header and footer, whereas the data is stored in the data blocks.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/b7c46cd4f740fd192cc36f11f030bcab/parquetlayout.svg&quot; alt=&quot;parquetgenerallayout&quot;&gt;&lt;/p&gt;
&lt;p&gt;In particular, the header contains a 4-byte magic number (&lt;code class=&quot;language-text&quot;&gt;PAR1&lt;/code&gt;), which identifies the file as being in Parquet format. The remaining metadata about the file is stored in the footer section. It contains metadata about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Row groups&lt;/li&gt;
&lt;li&gt;Columns&lt;/li&gt;
&lt;li&gt;Version of its Parquet format&lt;/li&gt;
&lt;li&gt;4-byte magic number&lt;/li&gt;
&lt;/ul&gt;
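&lt;p&gt;Since the magic number appears in both the header and the footer, a quick sanity check on a file only needs its first and last four bytes. The sketch below assumes the magic number is &lt;code class=&quot;language-text&quot;&gt;PAR1&lt;/code&gt; and runs against a stand-in file that is not a valid Parquet file. The &lt;code class=&quot;language-text&quot;&gt;has_parquet_magic&lt;/code&gt; helper is invented for this example.&lt;/p&gt;

```python
import os
import tempfile

MAGIC = b"PAR1"  # the 4-byte magic number of the Parquet format

def has_parquet_magic(path):
    # Compare the first and last four bytes of the file to the magic number
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == MAGIC and tail == MAGIC

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in file: magic, placeholder bytes, magic (not real Parquet content)
    path = os.path.join(tmp, "fake.parquet")
    with open(path, "wb") as f:
        f.write(MAGIC + b"placeholder body and footer" + MAGIC)
    looks_like_parquet = has_parquet_magic(path)

print(looks_like_parquet)  # True
```

&lt;p&gt;A check like this only confirms the markers are present. Reading the row group and column metadata still requires a Parquet library.&lt;/p&gt;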
&lt;p&gt;In a Parquet file, each data block is stored as a collection of row groups. These row groups are stored as a collection of column chunks. A row group corresponds to a set of rows, whereas a column chunk corresponds to an individual column in the dataset. The data in column chunks are organized into pages, which correspond to column values.&lt;/p&gt;
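&lt;p&gt;The nesting of row groups, column chunks, and pages can be pictured by splitting a small table by hand. The &lt;code class=&quot;language-text&quot;&gt;to_row_groups&lt;/code&gt; function below is a hypothetical helper that uses plain lists, and the group and page sizes are chosen arbitrarily. It only mirrors the hierarchy, not the binary layout.&lt;/p&gt;

```python
def to_row_groups(table, rows_per_group, values_per_page):
    """Split a dict of equal-length columns into row groups of paged column chunks."""
    n_rows = len(next(iter(table.values())))
    groups = []
    for start in range(0, n_rows, rows_per_group):
        group = {}
        for name, column in table.items():
            # One column chunk: this column's values for the rows in this group
            chunk = column[start:start + rows_per_group]
            # Pages: consecutive slices of the column chunk
            group[name] = [chunk[i:i + values_per_page]
                           for i in range(0, len(chunk), values_per_page)]
        groups.append(group)
    return groups

table = {"trip_id": [1, 2, 3, 4, 5], "fare": [7.0, 14.0, 4.5, 3.5, 52.0]}
layout = to_row_groups(table, rows_per_group=4, values_per_page=2)
print(layout[0]["trip_id"])  # [[1, 2], [3, 4]]
print(layout[1])             # {'trip_id': [[5]], 'fare': [[52.0]]}
```

&lt;p&gt;Each row group keeps the values of a column together, which is what allows a reader to fetch a single column without touching the others.&lt;/p&gt;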
&lt;p&gt;At a high-level, the graphic below illustrates sample data formatted as a Parquet file. For more details about the layout of a Parquet file, refer to the Apache Parquet &lt;a href=&quot;https://parquet.apache.org/documentation/latest/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/22ea2f576c7f67964c31b28fab3691f2/parquetexample.svg&quot; alt=&quot;parquetformat&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Use of Parquet in Pandas&lt;/h2&gt;
&lt;p&gt;As of June 2020, the pandas library provides wrapper functions that use a Parquet engine for reading and writing Parquet files. These two functions are &lt;code class=&quot;language-text&quot;&gt;pandas.read_parquet&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;pandas.DataFrame.to_parquet&lt;/code&gt;. At the time of writing, there are two choices of Parquet engine: &lt;code class=&quot;language-text&quot;&gt;pyarrow&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;fastparquet&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;According to the &lt;a href=&quot;https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_parquet.html#pandas.read_parquet&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;pandas documentation&lt;/a&gt;, an engine parameter can be specified, which refers to the Parquet library to use. Its default behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.&lt;/p&gt;
&lt;p&gt;Seeing as the development towards PyArrow and parquet-cpp is still progressing, we may be interested in performance benchmarks for reading from and writing to Parquet files, while using the above functions in their current state. For a more detailed analysis of performance benchmarks, refer to &lt;a href=&quot;https://wesmckinney.com/blog/python-parquet-update/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Setting Up Performance Tests&lt;/h2&gt;
&lt;p&gt;The New York City Taxi &amp;#x26; Limousine Commission records each taxi and limousine trip in NYC. They report these trips to the public each month, and include information about pick-up and drop-off destinations and times. This data is used in many Data Engineering projects for reasons mentioned &lt;a href=&quot;https://uwekorn.com/2019/08/22/why-the-nyc-trd-is-a-nice-training-dataset.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This exercise will only use the dataset containing trips from January of 2019 completed in yellow taxis. More details about the commission and their datasets can be found on &lt;a href=&quot;https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the site&lt;/a&gt;. This particular dataset is 687.1 MB and contains 7.6 million rows. Before performing any benchmarks, let&apos;s include any setup code:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; timeit &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; timeit
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; itr &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;timeit&lt;/code&gt; module is commonly used for testing the performance of code segments in Python. Specifically, it returns the total seconds taken to run a given code segment, excluding the execution of any specified setup code. The &lt;code class=&quot;language-text&quot;&gt;itr&lt;/code&gt; variable is included to test each segment 100 times. After running the &lt;code class=&quot;language-text&quot;&gt;timeit()&lt;/code&gt; function, the total seconds is divided by the number of runs to determine the average time per run.&lt;/p&gt;
&lt;h2&gt;Performance of Parquet Engines&lt;/h2&gt;
&lt;p&gt;Now, let&apos;s test the performance of reading in the same dataset in the following formats:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A csv file&lt;/li&gt;
&lt;li&gt;An hdf file&lt;/li&gt;
&lt;li&gt;A parquet file using the &lt;code class=&quot;language-text&quot;&gt;fastparquet&lt;/code&gt; engine&lt;/li&gt;
&lt;li&gt;A parquet file using the &lt;code class=&quot;language-text&quot;&gt;pyarrow&lt;/code&gt; engine&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Prior to executing the tests below, the csv file was converted into an HDF file and a Parquet file. Specifically, the &lt;code class=&quot;language-text&quot;&gt;pandas.DataFrame.to_hdf()&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;pandas.DataFrame.to_parquet()&lt;/code&gt; functions were used to store the data in each respective format.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# Read csv&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; stmnt &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;pd.read_csv(&apos;taxi.csv&apos;)&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; setup &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;import pandas as pd&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; timeit&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stmnt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; setup&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; number&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;s&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;12.92&lt;/span&gt;

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# Read hdf&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; stmnt &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;pd.read_hdf(&apos;taxi.h5&apos;)&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; setup &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;import pandas as pd&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; timeit&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stmnt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; setup&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; number&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;s&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;6.57&lt;/span&gt;

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# Read parquet using fastparquet&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; stmnt &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;pd.read_parquet(&apos;taxi.parquet&apos;, engine=&apos;fastparquet&apos;)&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; setup &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;import pandas as pd&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; timeit&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stmnt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; setup&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; number&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;s&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;6.64&lt;/span&gt;

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# Read parquet using pyarrow&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; stmnt &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;pd.read_parquet(&apos;taxi.parquet&apos;, engine=&apos;pyarrow&apos;)&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; setup &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;import pandas as pd&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; timeit&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stmnt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; setup&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; number&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;s&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;3.62&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After reading in each file in the various formats, the final test delivered the best performance. Using the PyArrow Parquet engine, the taxi trips dataset, formatted as a Parquet file, only took an average of 3.62 seconds for reading in the entire dataset.&lt;/p&gt;
&lt;p&gt;Now, let&apos;s test the performance of writing to similar files.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# Write hdf&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; stmnt &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;df.to_hdf(&apos;taxi.h5&apos;, key=&apos;df&apos;)&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; setup &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token triple-quoted-string string&quot;&gt;&quot;&quot;&quot;
... import pandas as pd
... df = pd.read_csv(&apos;taxi.csv&apos;)
... &quot;&quot;&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; timeit&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stmnt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; setup&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; number&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;s&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;6.18&lt;/span&gt;

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# Write parquet using fastparquet&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; stmnt &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;df.to_parquet(&apos;taxi.parquet&apos;, engine=&apos;fastparquet&apos;)&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; setup &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token triple-quoted-string string&quot;&gt;&quot;&quot;&quot;
... import pandas as pd
... df = pd.read_csv(&apos;taxi.csv&apos;)
... &quot;&quot;&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; timeit&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stmnt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; setup&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; number&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;s&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;9.12&lt;/span&gt;

&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;# Write parquet using pyarrow&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; stmnt &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;df.to_parquet(&apos;taxi.parquet&apos;, engine=&apos;pyarrow&apos;)&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; setup &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token triple-quoted-string string&quot;&gt;&quot;&quot;&quot;
... import pandas as pd
... df = pd.read_csv(&apos;taxi.csv&apos;)
... &quot;&quot;&quot;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; timeit&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;stmnt&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; setup&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; number&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;round&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;s&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;itr&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token number&quot;&gt;4.96&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This example was run locally on my laptop, without testing other datasets or other compression styles, such as uncompressed, snappy, and gzip. Thus, the benchmarks should be treated as rough estimates rather than precise measurements. Regardless, the PyArrow engine was clearly the fastest option here: roughly 1.8 times faster than fastparquet for both reading (3.62 vs. 6.64 seconds) and writing (4.96 vs. 9.12 seconds). To learn more about the other use cases for PyArrow, refer to &lt;a href=&quot;/blog/pyarrow/&quot;&gt;my next post&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Basics of Database Internals]]></title><description><![CDATA[A data store is a place used for storing data. This includes a database, repository, file system, etc. There are two ways of storing data in a database, which are the following: Row-oriented data…]]></description><link>https://dkharazi.github.io/blog/columnar</link><guid isPermaLink="false">https://dkharazi.github.io/blog/columnar</guid><pubDate>Sat, 14 Mar 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A data store is a place used for storing data. This includes a database, repository, file system, etc. There are two ways of storing data in a database, which are the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Row-oriented data stores&lt;/li&gt;
&lt;li&gt;Column-oriented data stores&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both layouts also appear in many of today&apos;s popular in-memory data structures, such as those in pandas and Apache Arrow. To see an example of a column-oriented data store, refer to my &lt;a href=&quot;/blog/parquet/&quot;&gt;next post&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Comparing Row and Column Databases&lt;/h2&gt;
&lt;p&gt;Most of us are familiar with a row-oriented data store, which stores data row-by-row. Thus, reading and writing data happens one row at a time. As a result, row-oriented data stores are used for transactional systems, which include systems that manage sales, users, airline reservations, etc. Typically, these systems read and write individual records from a row-oriented data store one at a time.&lt;/p&gt;
&lt;p&gt;A column-oriented data store, often referred to as a columnar database, serializes its data column-by-column. Meaning, reading and writing data happens one column at a time. As a result, column-oriented data stores are used for analytical systems, such as dashboards. Typically, these systems read and write individual columns from a column-oriented data store one at a time. Ultimately, the choice of a particular data store depends on the business use case.&lt;/p&gt;
&lt;p&gt;A form for outputting user information is an example of a system that would query a row-oriented data store. It usually involves querying a database for a specified user or record, rather than an entire column. On the other hand, a dashboard outputting a graphic illustrating sales over time is an example of a system that would query a column-oriented data store. It usually involves querying a database for a &lt;em&gt;sales&lt;/em&gt; column, rather than the sales of an individual.&lt;/p&gt;
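&lt;p&gt;To make the two layouts concrete, here is a minimal Python sketch. The records, field names, and variable names are hypothetical and only serve as an illustration; a real database serializes these layouts to disk rather than keeping them in Python objects.&lt;/p&gt;

```python
# Three hypothetical sales records, used only for illustration.
records = [
    {"user": "ann", "region": "east", "sales": 120},
    {"user": "bob", "region": "west", "sales": 75},
    {"user": "cat", "region": "east", "sales": 200},
]

# Row-oriented layout: the values of one record sit next to each other.
row_store = [tuple(r.values()) for r in records]

# Column-oriented layout: the values of one column sit next to each other.
column_store = {key: [r[key] for r in records] for key in records[0]}

# Transactional access (a user form): fetch one complete record.
second_user = row_store[1]

# Analytical access (a sales dashboard): aggregate a single column.
total_sales = sum(column_store["sales"])
```

&lt;p&gt;The user form only touches one entry of the row layout, whereas the dashboard only touches one entry of the column layout.&lt;/p&gt;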
&lt;h2&gt;Defining Data Blocks on Disk&lt;/h2&gt;
&lt;p&gt;Most databases store data on commodity hardware, where data is stored in blocks. A block is the smallest unit of storage on an HDD, and it can&apos;t be partially read or written. Instead, disks read and write entire blocks of data at once. Most blocks hold anywhere between 512 bytes and 64KB.&lt;/p&gt;
&lt;p&gt;In practice, a block typically contains multiple rows from a database. However, they are not necessarily stored consecutively. In particular, they are organized using &lt;a href=&quot;https://www.freecodecamp.org/news/database-indexing-at-a-glance-bb50809d48bd/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;indexing&lt;/a&gt;, such as &lt;a href=&quot;https://en.wikipedia.org/wiki/B%2B_tree&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;B-tree&lt;/a&gt; indexing. Typically, indexing will physically organize data on a disk based on the logical order of the index key.&lt;/p&gt;
&lt;p&gt;In other words, database records can be stored on a data block in an arbitrary order. New records are added to any available space. When records are updated, the operating system decides their new block position. Most blocks are structured as nodes of a linked list, each of which contains the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A section for the data&lt;/li&gt;
&lt;li&gt;A pointer to the location of the next block&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;/f789f0b5d983ec4af62c64bf8c950446/diskblock.svg&quot; alt=&quot;DiskBlock&quot;&gt;&lt;/p&gt;
&lt;p&gt;Notice, each block can be scattered anywhere on the disk. Also, each block contains a pointer to the block containing the next records. The disk comes equipped with a starting and stopping index for the file as well. Refer to this &lt;a href=&quot;https://stackoverflow.com/a/1130/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;post&lt;/a&gt; for a more detailed explanation about how database indexing works internally.&lt;/p&gt;
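&lt;p&gt;The linked-list structure described above can be sketched in a few lines of Python. The &lt;code&gt;Block&lt;/code&gt; class, the addresses, and the &lt;code&gt;read_file&lt;/code&gt; helper are hypothetical names chosen for this illustration, and the sketch ignores indexing entirely.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """A simplified disk block: a data section plus a pointer to the next block."""
    address: int                 # hypothetical position of the block on disk
    records: list                # the data section
    next_address: Optional[int]  # pointer to the next block, or None at the end


# Blocks are scattered across the disk, so their addresses are not consecutive.
disk = {
    40: Block(40, ["row 1", "row 2"], next_address=7),
    7: Block(7, ["row 3", "row 4"], next_address=93),
    93: Block(93, ["row 5"], next_address=None),
}


def read_file(disk, start_address):
    """Follow the pointers from the first block and collect every record."""
    rows, address = [], start_address
    while address is not None:
        block = disk[address]    # a whole block is read at once
        rows.extend(block.records)
        address = block.next_address
    return rows


all_rows = read_file(disk, start_address=40)
```

&lt;p&gt;Reading the file visits the blocks in pointer order (40, 7, 93), regardless of where they physically sit on the disk.&lt;/p&gt;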
&lt;h2&gt;Comparing Disk I/O&lt;/h2&gt;
&lt;p&gt;As stated previously, a row-oriented database stores data row-by-row. Meaning, individual rows are stored on a block, rather than individual columns. Storing data in this manner is performant when users are querying individual rows, rather than columns, which can be seen in the following illustration:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/5a53904296583fcfd4942f6a7007a7c8/rowblock.svg&quot; alt=&quot;rowblock&quot;&gt;&lt;/p&gt;
&lt;p&gt;A column-oriented database stores data column-by-column. Meaning, individual columns are stored on a block, rather than individual rows. Storing data in this manner is performant when users are querying individual columns, rather than rows, which can be seen in the following illustration:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/1b9ae931478dc832507abf92139a2c08/colblock.svg&quot; alt=&quot;colblock&quot;&gt;&lt;/p&gt;
&lt;p&gt;Querying an individual record from a row-oriented database is performant, since each row is stored in a block in its entirety. Meaning, the disk only needs to read from or write to the record in one place. On the other hand, querying an individual record from a column-oriented database will not be as performant. This is because it takes &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(n)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; to read from or write to each block for a piece of the row.&lt;/p&gt;
&lt;p&gt;Querying an individual column from a column-oriented database is performant, since each column is stored in a block in its entirety. Meaning, the disk only needs to read from or write to the column in one place.&lt;/p&gt;
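&lt;p&gt;As a rough illustration of this trade-off, the following Python sketch counts how many blocks must be touched to read one record or one column under each layout. The block size, the table dimensions, and all function names are assumptions made for this example; real storage engines add headers, compression, and indexes on top of this.&lt;/p&gt;

```python
BLOCK_SIZE = 4  # hypothetical number of values that fit in one block

# A hypothetical table with 8 rows and 4 columns (id, name, region, sales).
n_rows, n_cols = 8, 4


def blocks_touched(positions):
    """Count the distinct blocks that hold the given value positions."""
    return len({pos // BLOCK_SIZE for pos in positions})


def row_major(row, col):
    """Position of a value when rows are stored one after another."""
    return row * n_cols + col


def col_major(row, col):
    """Position of a value when columns are stored one after another."""
    return col * n_rows + row


# Read one full record (row 5) from each layout.
record_row_store = blocks_touched(row_major(5, c) for c in range(n_cols))
record_col_store = blocks_touched(col_major(5, c) for c in range(n_cols))

# Read one full column (column 3, e.g. sales) from each layout.
column_row_store = blocks_touched(row_major(r, 3) for r in range(n_rows))
column_col_store = blocks_touched(col_major(r, 3) for r in range(n_rows))
```

&lt;p&gt;Under these assumptions, the record costs 1 block read in the row layout and 4 in the column layout, whereas the column costs 8 block reads in the row layout and only 2 in the column layout.&lt;/p&gt;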
&lt;p&gt;In the images above, each sector represents a single block for illustrative purposes only. Making this oversimplification hopefully provides better intuition for the purpose of blocks, since the process of reading from and writing to blocks becomes simpler. For a more in-depth explanation about some of these concepts, refer to &lt;a href=&quot;https://www.youtube.com/watch?v=uMkVi4SDLbM&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this video&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[NoSQL Basics: Graph Databases]]></title><description><![CDATA[In a previous post about NoSQL databases, graph stores were described at a fairly high-level. In this post, we'll dive into more low-level details, which includes features, behavior, and use cases…]]></description><link>https://dkharazi.github.io/blog/nosql-graph</link><guid isPermaLink="false">https://dkharazi.github.io/blog/nosql-graph</guid><pubDate>Sun, 01 Mar 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In a &lt;a href=&quot;/blog/nosql/&quot;&gt;previous post&lt;/a&gt; about NoSQL databases, graph stores were described at a fairly high level. In this post, we&apos;ll dive into more low-level details, including features, behavior, and use cases.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;NoSQL Distilled&lt;/em&gt; is a terrific resource for learning about both high-level and low-level details of NoSQL databases. This post is meant to summarize my experience with these databases, along with particular segments from the book. Again, refer to the book for a deeper dive of relational and NoSQL databases.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#defining-a-graph-store&quot;&gt;Defining a Graph Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#features-of-graph-databases&quot;&gt;Features of Graph Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#using-graph-databases&quot;&gt;Using Graph Databases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Defining a Graph Store&lt;/h2&gt;
&lt;p&gt;Graph databases store two types of objects: nodes and relationships. Nodes represent data entities, and relationships represent a relation between two entities. Visually, these relationships are represented as edges between two nodes. Each edge between entities has a direction and properties as well. In the world of graph databases, edges can be thought of as a series of joins.&lt;/p&gt;
&lt;p&gt;Graph databases were designed to handle data with many relationships. Roughly, relational data involving many different joins may be better suited to a graph database, since relationships are stored more efficiently. Graph databases intuitively manage relationships better than relational databases, since relationships are stored as objects themselves. As a result, lookups between tables for person ID and department ID don&apos;t need to be performed constantly, in order to find which person connects to which department. In other words, the relationships don&apos;t need to be inferred anymore.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/02bc51e581d8abd24d61d78e6e6a361d/graphjoin.jpeg&quot; alt=&quot;GraphJoin&quot;&gt;&lt;/p&gt;
&lt;p&gt;This ability to pre-materialize relationships into the database structure allows Neo4j to provide performance of several orders of magnitude above others, especially for join-heavy queries, allowing users to leverage a minutes-to-milliseconds advantage.&lt;/p&gt;
&lt;p&gt;Neo4j offers a declarative graph query language called Cypher, which is built on the basic concepts and clauses of SQL. Additionally, there are many other functions to simplify the process of working with graph models. Here is a SQL query:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; Person&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name &lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; Person
&lt;span class=&quot;token keyword&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;JOIN&lt;/span&gt; Person_Department
  &lt;span class=&quot;token keyword&quot;&gt;ON&lt;/span&gt; Person&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Id &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Person_Department&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;PersonId
&lt;span class=&quot;token keyword&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;JOIN&lt;/span&gt; Department
  &lt;span class=&quot;token keyword&quot;&gt;ON&lt;/span&gt; Department&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Id &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; Person_Department&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DepartmentId
&lt;span class=&quot;token keyword&quot;&gt;WHERE&lt;/span&gt; Department&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;IT Department&apos;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And this is the corresponding Cypher query:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;MATCH&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;p:Person&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;:WORKS_AT&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;d:Dept&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;WHERE&lt;/span&gt; d&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;IT Department&quot;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;RETURN&lt;/span&gt; p&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice, the Cypher query replaces the heavy use of joins with a single pattern. Not only does Neo4j simplify this process, it also improves the performance of join-heavy queries compared to SQL. The relationship between nodes is not calculated at query time; instead, it is persisted to disk. In relational databases, adding another relationship involves intricate changes to the schema. However, this becomes much simpler in Neo4j.&lt;/p&gt;
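&lt;p&gt;The difference between inferring a relationship at query time and following a stored relationship can be sketched in plain Python. The data and function names below are hypothetical, and the dictionaries only stand in for tables and stored edges; this is not how Neo4j is implemented internally.&lt;/p&gt;

```python
# Relational-style storage: relationships are inferred through a join table.
person = {1: "Ann", 2: "Bob", 3: "Cat"}
department = {10: "IT Department", 20: "Sales"}
person_department = [(1, 10), (2, 20), (3, 10)]  # (PersonId, DepartmentId)


def people_in_relational(dept_name):
    """Resolve the relationship at query time by scanning the join table."""
    dept_ids = {d for d, name in department.items() if name == dept_name}
    return sorted(person[p] for p, d in person_department if d in dept_ids)


# Graph-style storage: each department keeps its incoming WORKS_AT edges.
works_at = {"IT Department": ["Ann", "Cat"], "Sales": ["Bob"]}


def people_in_graph(dept_name):
    """Follow the stored edges; no join table is consulted."""
    return sorted(works_at[dept_name])


relational_result = people_in_relational("IT Department")
graph_result = people_in_graph("IT Department")
```

&lt;p&gt;Both functions return the same people, but the relational version walks every row of the join table, whereas the graph version only reads the edges attached to one node.&lt;/p&gt;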
&lt;p&gt;For a more detailed comparison between graph and relational databases, refer to the &lt;a href=&quot;https://neo4j.com/developer/graph-db-vs-rdbms/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Neo4j docs&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Features of Graph Databases&lt;/h2&gt;
&lt;p&gt;Similar to relational databases, most graph databases are &lt;strong&gt;ACID-compliant&lt;/strong&gt;. Meaning, only valid transactions are individually committed to the database, even in the event of a failure. If a transaction isn&apos;t marked as finished or successful, then it will be rolled back.&lt;/p&gt;
&lt;p&gt;Most graph database solutions aren&apos;t able to be distributed across multiple servers, favoring consistency and availability. A few graph databases, such as Infinite Graph, support node distribution across a cluster of servers. Neo4j, on the other hand, is implemented using a master-worker architecture that is fully ACID-compliant.&lt;/p&gt;
&lt;p&gt;As emphasized already, graph databases handle data with &lt;strong&gt;complex relationships&lt;/strong&gt; quite efficiently, since relationships are indexed themselves. Therefore, an RDBMS should be preferred if we&apos;re mainly interested in filtering individual entities. Graph databases should mainly be used if we&apos;re querying relationships.&lt;/p&gt;
&lt;p&gt;Graph databases are useful for workloads that read relationships often, rather than workloads that mostly write relationships and rarely read them. Specifically, other databases should be preferred if we&apos;re looking to write data and don&apos;t expect to query our stored entities or relationships often.&lt;/p&gt;
&lt;p&gt;As opposed to relational databases, graph databases are fairly robust for data that is constantly changing. Specifically, new relationships and properties can be added easily, since traversal requirements can change without having to change the existing nodes or edges.&lt;/p&gt;
&lt;p&gt;In other words, relational databases may be a preferred choice if the columns of a table aren&apos;t expected to change. Otherwise, graph databases may be a preferred choice if the requirements are expected to morph over time.&lt;/p&gt;
&lt;p&gt;Graph databases are useful when searching throughout a graph for a particular relationship. A graph database is optimized for traversing the graph for a relationship. If we&apos;re mainly interested in querying entities without a specific relationship in mind, a relational database may be better suited for our needs.&lt;/p&gt;
&lt;p&gt;As an example, a graph database is useful when we&apos;re wondering &lt;em&gt;what people Jennifer knows&lt;/em&gt;. On the other hand, a graph database may not be very useful when we&apos;re wondering &lt;em&gt;which entities are named Jennifer&lt;/em&gt;. A good query for a graph database is:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;MATCH&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;n:Person {name: &lt;span class=&quot;token string&quot;&gt;&apos;Jennifer&apos;&lt;/span&gt;}&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;r:KNOWS&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;p:Person&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;RETURN&lt;/span&gt; p&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since the graph would need to be traversed entirely, a poor query for a graph database is:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;MATCH&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;n&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;WHERE&lt;/span&gt; n&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;name &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;Jennifer&apos;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;RETURN&lt;/span&gt; n&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It should go without saying, but graph databases don&apos;t perform well when used as key-value stores. A standard lookup operation is best used for a key-value database or even a relational store. Graph databases are more useful for narrowed-down relationship lookups of a key.&lt;/p&gt;
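&lt;p&gt;The contrast between the two queries above can be approximated with a small Python sketch. The graph, the names, and both helper functions are hypothetical, and the sketch assumes no index exists on the name property.&lt;/p&gt;

```python
# A small hypothetical social graph: node id to name, plus KNOWS edges per node.
names = {1: "Jennifer", 2: "Michael", 3: "Ann", 4: "Jennifer", 5: "Dan"}
knows = {1: [2, 3], 2: [5], 3: [], 4: [], 5: []}


def known_by(node_id):
    """Relationship lookup: follow only the KNOWS edges of one starting node."""
    return [names[n] for n in knows[node_id]]


def find_by_name(name):
    """Property lookup without a relationship: every node has to be checked."""
    visited = 0
    matches = []
    for node_id, node_name in names.items():
        visited += 1
        if node_name == name:
            matches.append(node_id)
    return matches, visited


friends = known_by(1)
matches, visited = find_by_name("Jennifer")
```

&lt;p&gt;The relationship lookup only reads the two edges attached to the starting node, whereas the name lookup visits all five nodes in the graph.&lt;/p&gt;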
&lt;p&gt;Graph databases are performant when reading and writing smaller nodes. Graphs are not suited for storing many properties on a single node, nor for storing large values within those properties. This is because a query can hop from entity to entity quickly, but the database needs extra processing to pull out the details of each entity along a search path. For additional summaries about features within graph databases, refer to &lt;a href=&quot;https://medium.com/neo4j/how-do-you-know-if-a-graph-database-solves-the-problem-a7da10393f5&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Using Graph Databases&lt;/h2&gt;
&lt;p&gt;The table below outlines a few particular use cases for graph databases, using Neo4j as the representative graph database. Read more details about the Neo4j use cases in &lt;a href=&quot;https://neo4j.com/use-cases/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the docs&lt;/a&gt;. For more details about individual use cases, refer to the &lt;em&gt;NoSQL Distilled&lt;/em&gt; text.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use-Case&lt;/th&gt;
&lt;th&gt;Good or Bad?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Connected Data&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Social Network Data&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delivery Routing&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Location-Based Services&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recommendation Engine&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fraud Detection System&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Updating all or some entities&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complicated operations&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large data&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content:encoded></item><item><title><![CDATA[NoSQL Basics: Column-Family Databases]]></title><description><![CDATA[In a previous post about NoSQL databases, column-family stores were described at a fairly high-level. In this post, we'll dive into more low-level details, which includes features, behavior, and use…]]></description><link>https://dkharazi.github.io/blog/nosql-column</link><guid isPermaLink="false">https://dkharazi.github.io/blog/nosql-column</guid><pubDate>Thu, 27 Feb 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In a &lt;a href=&quot;/blog/nosql/&quot;&gt;previous post&lt;/a&gt; about NoSQL databases, column-family stores were described at a fairly high-level. In this post, we&apos;ll dive into more low-level details, which includes features, behavior, and use cases.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;NoSQL Distilled&lt;/em&gt; is a terrific resource for learning about both high-level and low-level details of NoSQL databases. This post is meant to summarize my experience with these databases, along with particular segments from the book. Again, refer to the book for a deeper dive of relational and NoSQL databases.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#defining-a-column-family-store&quot;&gt;Defining a Column-Family Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#introduction-to-the-cassandra-architecture&quot;&gt;Introduction to the Cassandra Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#features-of-column-family-databases&quot;&gt;Features of Column-Family Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#using-column-family-databases&quot;&gt;Using Column-Family Databases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Defining a Column-Family Store&lt;/h2&gt;
&lt;p&gt;Compared to key-value and document databases, column-family stores impose more limitations on the structure of an aggregate. Specifically, column-family databases organize their columns into column families. Each column must be assigned to a particular column-family. Then, each column can be accessed via a column-family. In particular, accessing a column-family will return each of the columns associated with that column family.&lt;/p&gt;
&lt;p&gt;From a high-level point of view, a column-family database represents a map consisting of smaller maps, where the first map is a column family and the second map is a row. In the CQL API, column families are referred to as tables. To gain some additional intuition about the data model, let&apos;s look at how data is stored in Cassandra:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;sql&quot;&gt;&lt;pre class=&quot;language-sql&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;-- create keyspace (database)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; KEYSPACE hotel &lt;span class=&quot;token keyword&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;replication&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;
    {&lt;span class=&quot;token string&quot;&gt;&apos;class&apos;&lt;/span&gt;: &lt;span class=&quot;token string&quot;&gt;&apos;SimpleStrategy&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;token string&quot;&gt;&apos;replication_factor&apos;&lt;/span&gt;: &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;}&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;-- create table&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; hotel&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;employees &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
    id &lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;KEY&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    name &lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
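    roles &lt;span class=&quot;token keyword&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;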
    salary &lt;span class=&quot;token keyword&quot;&gt;smallint&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;-- insert row&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;INTO&lt;/span&gt; hotel&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;employees
       JSON &lt;span class=&quot;token string&quot;&gt;&apos;{&quot;id&quot;: &quot;10F-S53&quot;,
              &quot;name&quot;: &quot;Susan&quot;,
              &quot;roles&quot;: [&quot;accountant&quot;, &quot;auditor&quot;]}&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
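&lt;p&gt;The map-of-maps view described above can be sketched with plain Python dictionaries. This is only an illustration of the data model, not how Cassandra lays out data on disk. The second row and the &lt;code&gt;get_row&lt;/code&gt; helper are made up for the example.&lt;/p&gt;

```python
# A column family (table) modeled as a map of row keys to rows,
# where each row is itself a map of column names to values.
employees = {
    "10F-S53": {"name": "Susan", "roles": {"accountant", "auditor"}},
    # Hypothetical second row, added only for illustration.
    "22B-K10": {"name": "Omar", "roles": {"manager"}, "salary": 5200},
}


def get_row(column_family, row_key):
    """Return all columns stored under one row key (empty if absent)."""
    return column_family.get(row_key, {})


# Rows in the same column family may hold different sets of columns.
susan = get_row(employees, "10F-S53")
print(sorted(susan))     # column names present for this row
print(susan["roles"])    # a set-valued column
```

&lt;p&gt;Looking up a row key returns every column stored for that row, which mirrors how accessing a column family returns its associated columns.&lt;/p&gt;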
&lt;p&gt;CQL is a &lt;strong&gt;typed language&lt;/strong&gt; used for querying column-family databases in Cassandra. This means Cassandra isn&apos;t really schemaless anymore, since data types and primary keys are required. In many ways, CQL feels like SQL, but it deliberately excludes certain functionality that would violate its data model. In particular, CQL doesn&apos;t support join operations, among others, and recent versions support group by operations only on primary key columns.&lt;/p&gt;
&lt;p&gt;On the flip side, CQL supports certain functionalities that aren&apos;t supported in relational databases, since they violate the relational data model and comply with the column-family data model. For example, CQL supports the use of tuples and sets as data types defined within a schema.&lt;/p&gt;
&lt;p&gt;To learn more about the details behind data manipulation with CQL, refer to &lt;a href=&quot;https://cassandra.apache.org/doc/latest/cql/dml.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the docs&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Introduction to the Cassandra Architecture&lt;/h2&gt;
&lt;p&gt;Whereas Apache HBase was created based on &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Google&apos;s BigTable&lt;/a&gt;, Apache Cassandra relies on a number of techniques from &lt;a href=&quot;https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Amazon&apos;s Dynamo&lt;/a&gt;. For a brief comparison between HBase and BigTable, refer to &lt;a href=&quot;https://stackoverflow.com/a/24860743/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt;. Each node in the Dynamo system has three main components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Request coordination for each partitioned dataset&lt;/li&gt;
&lt;li&gt;Ring membership and failure detection&lt;/li&gt;
&lt;li&gt;Local storage engine&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cassandra uses most of these features, but uses a storage engine based on a log-structured merge (LSM) tree instead. To go one level deeper, Cassandra takes the following from Dynamo:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dataset partitioning using consistent hashing&lt;/li&gt;
&lt;li&gt;Multi-master replication&lt;/li&gt;
&lt;li&gt;Tunable levels of replication and consistency&lt;/li&gt;
&lt;li&gt;Distributed cluster management&lt;/li&gt;
&lt;li&gt;Distributed failure detection&lt;/li&gt;
&lt;li&gt;Incremental horizontal scaling on commodity hardware&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cassandra partitions (shards) data across nodes using consistent hashing. In naive data hashing, keys are allocated to buckets by hashing the key modulo the number of buckets. Cassandra takes a different approach by first hashing each node to one or more values on a continuous hash ring. These hash values representing each node are referred to as &lt;em&gt;tokens&lt;/em&gt; in Cassandra. Once tokens are created, Cassandra is able to map data points to tokens on that same hash ring. Specifically, Cassandra will receive rows, hash the primary key of each row, and map those hash values to the hash ring. Lastly, Cassandra will map those data points to nodes by walking clockwise on the ring from each mapped hash value to the nearest token.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/6f1a2565e67c58fa0e75d7394ab3acd1/cassandrahash.svg&quot; alt=&quot;CassandraHashRing&quot;&gt;&lt;/p&gt;
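&lt;p&gt;The ring lookup described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Cassandra&apos;s implementation: it uses MD5 and a small 32-bit ring for readability, and the &lt;code&gt;HashRing&lt;/code&gt; class, node names, and &lt;code&gt;tokens_per_node&lt;/code&gt; parameter are hypothetical.&lt;/p&gt;

```python
import bisect
import hashlib


def hash_to_ring(value, ring_size=2**32):
    """Map a string to a position on the hash ring."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % ring_size


class HashRing:
    def __init__(self, nodes, tokens_per_node=4):
        # Each node is hashed to several tokens on the ring.
        self.tokens = sorted(
            (hash_to_ring(f"{node}-{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self.positions = [position for position, _ in self.tokens]

    def node_for(self, primary_key):
        """Walk clockwise from the key's hash to the next token."""
        position = hash_to_ring(primary_key)
        index = bisect.bisect_left(self.positions, position)
        # Wrap around to the first token after passing the last one.
        index = index % len(self.tokens)
        return self.tokens[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
for key in ["10F-S53", "22B-K10", "31C-Z07"]:
    print(key, "maps to", ring.node_for(key))
```

&lt;p&gt;Because existing tokens never move, adding a node only reassigns the keys that fall just before one of the new node&apos;s tokens. With modulo hashing, changing the number of buckets would instead remap most keys.&lt;/p&gt;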
&lt;p&gt;The use of consistent hashing for partitioning makes Cassandra a &lt;strong&gt;scalable&lt;/strong&gt; and &lt;strong&gt;available&lt;/strong&gt; column-family store. Other features complement the hashing scheme to mitigate potential issues with consistency, such as virtual nodes, quorums, and &lt;a href=&quot;https://cassandra.apache.org/doc/latest/operating/compaction/index.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;compaction&lt;/a&gt;. For more details about the architecture of Cassandra and its more specific hashing features, refer to &lt;a href=&quot;https://cassandra.apache.org/doc/latest/architecture/dynamo.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the docs&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Features of Column-Family Databases&lt;/h2&gt;
&lt;p&gt;Unlike some NoSQL databases and most relational databases, Cassandra does not support transactions. Similar to key-value and document stores, writes are atomic at the row level in Cassandra. This means any change to multiple columns of a single row is treated as a single write operation. For more information about atomicity in Cassandra, refer to &lt;a href=&quot;https://docs.datastax.com/en/cassandra-oss/2.1/cassandra/dml/dml_atomicity_c.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;the docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Returning to the CAP theorem, Cassandra focuses on high availability and scalability. Consistency can be increased with the use of quorums. Each quorum is derived from a replication factor, which can be adjusted to tune the level of availability within our cluster.&lt;/p&gt;
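&lt;p&gt;As a rough worked example of this tradeoff, a quorum is a majority of the replicas, and a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names below are hypothetical and only illustrate the arithmetic.&lt;/p&gt;

```python
def quorum_size(replication_factor):
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def reads_overlap_writes(read_replicas, write_replicas, replication_factor):
    """Reads and writes share at least one replica when R + W > N."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum_size(rf)
print("quorum size:", q)
print("replicas that may be down:", rf - q)
print("quorum read + quorum write overlap:", reads_overlap_writes(q, q, rf))
print("single read + single write overlap:", reads_overlap_writes(1, 1, rf))
```

&lt;p&gt;Raising the replication factor lets more nodes fail while a quorum is still reachable, at the cost of contacting more replicas per request.&lt;/p&gt;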
&lt;p&gt;Whereas both key-value and document databases are designed using a master-worker architecture, Cassandra uses a peer-to-peer architecture. This choice reaffirms the idea of Cassandra favoring availability and scalability, rather than consistency.&lt;/p&gt;
&lt;p&gt;Cassandra effortlessly scales with the addition of nodes. Since Cassandra has no master node, it doesn&apos;t need to worry about master failures or spend time promoting a new master node.&lt;/p&gt;
&lt;p&gt;Lastly, Cassandra supports a fairly robust querying language called CQL. Since Cassandra isn&apos;t a relational database, CQL still has its own limitations compared to SQL. Thus, column families need to be designed carefully, so they are optimized for reading data.&lt;/p&gt;
&lt;h2&gt;Using Column-Family Databases&lt;/h2&gt;
&lt;p&gt;The table below outlines a few particular use cases for column-family databases. In particular, Cassandra serves as the representative column-family database for the following use cases. Read more details about the Cassandra use cases in &lt;a href=&quot;https://stackoverflow.com/a/30964048/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt;. For more details about individual use cases, refer to the &lt;em&gt;NoSQL Distilled&lt;/em&gt; text. For a more straightforward comparison between MongoDB and Cassandra, read &lt;a href=&quot;https://phoenixnap.com/kb/cassandra-vs-mongodb&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use-Case&lt;/th&gt;
&lt;th&gt;Good or Bad?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Event logging&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blogging sites&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content management systems&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real time analytics&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page counters&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Systems requiring ACID&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Early prototypes&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content:encoded></item><item><title><![CDATA[NoSQL Basics: Document Databases]]></title><description><![CDATA[In a previous post about NoSQL databases, document stores were introduced at a fairly high-level. In this post, we'll dive into more low-level details, which includes features, behavior, and use cases…]]></description><link>https://dkharazi.github.io/blog/nosql-document</link><guid isPermaLink="false">https://dkharazi.github.io/blog/nosql-document</guid><pubDate>Sat, 22 Feb 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In a &lt;a href=&quot;/blog/nosql/&quot;&gt;previous post&lt;/a&gt; about NoSQL databases, document stores were introduced at a fairly high-level. In this post, we&apos;ll dive into more low-level details, which includes features, behavior, and use cases.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;NoSQL Distilled&lt;/em&gt; is a terrific resource for learning about both high-level and low-level details of NoSQL databases. This post is meant to summarize my experience with these databases, along with particular segments from the book. For a deeper dive into relational and NoSQL databases, refer to the book.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#defining-a-document-store&quot;&gt;Defining a Document Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#introduction-to-mongodb&quot;&gt;Introduction to MongoDB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#features-of-document-databases&quot;&gt;Features of Document Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#using-document-databases&quot;&gt;Using Document Databases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Defining a Document Store&lt;/h2&gt;
&lt;p&gt;A document database is a strongly aggregate-oriented database. Meaning, it consists of many aggregates. In particular, document databases generally enforce fewer restrictions and offer increased flexibility, compared to both relational databases and other NoSQL databases.&lt;/p&gt;
&lt;p&gt;Document databases generally enforce similar constraints as key-value databases. However, they can sometimes enforce a few additional limitations, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Size limits&lt;/li&gt;
&lt;li&gt;What we can place in them&lt;/li&gt;
&lt;li&gt;Data types&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additionally, the fields in document databases can be queried, which contrasts with key-value databases. In other words, we can query specific segments of an aggregate in document databases, whereas we can only query the entire aggregate (belonging to a key) in key-value databases. Lastly, indices can be created based on the contents of an aggregate in a document database.&lt;/p&gt;
&lt;p&gt;Obviously, a document database stores documents, which are hierarchical tree data structures. These data structures can be XML, JSON, etc., and they can consist of scalar values, collections, and maps.&lt;/p&gt;
&lt;p&gt;From a high-level point of view, document databases can be thought of as &lt;strong&gt;queryable&lt;/strong&gt; key-value databases. In key-value databases, aggregates are only accessible by means of a key. As a result, values can&apos;t be queried, and any access requires a lookup of the entire aggregate. In contrast, document databases allow their values to be queried.&lt;/p&gt;
&lt;p&gt;Each document in a document database &lt;strong&gt;can include different attribute names&lt;/strong&gt;. In other words, the schema can differ from one document to the next. In a relational database, by contrast, every row in a table must follow that table&apos;s schema.&lt;/p&gt;
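&lt;p&gt;The difference between the two access patterns can be sketched with plain Python data structures. This is an illustration only: the sample records and the &lt;code&gt;find&lt;/code&gt; helper are made up, and a real document database would use indices rather than scanning every document.&lt;/p&gt;

```python
# Key-value view: the store only understands keys; values are opaque.
kv_store = {
    "user:1": '{"name": "Ana", "city": "Columbus"}',
    "user:2": '{"name": "Raj", "city": "Dayton"}',
}

# Document view: the store understands the fields inside each document,
# and documents in one collection may carry different attributes.
documents = [
    {"_id": 1, "name": "Ana", "city": "Columbus"},
    {"_id": 2, "name": "Raj", "city": "Dayton", "tags": ["admin"]},
]


def find(collection, **criteria):
    """Return documents whose fields match every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(kv_store["user:1"])               # whole aggregate, fetched by key
print(find(documents, city="Dayton"))   # filtered by a field value
```

&lt;p&gt;In the key-value case, finding everyone in a given city would require fetching and parsing every value in the application, since the store cannot look inside the blobs.&lt;/p&gt;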
&lt;h2&gt;Introduction to MongoDB&lt;/h2&gt;
&lt;p&gt;MongoDB is a document database that stores data in JSON-like documents. Again, document databases have many similarities with key-value databases. In particular, key-value databases store two objects: a key as a string and a value as a string, list, etc. Document databases, on the other hand, store a single object: a JSON-like document.&lt;/p&gt;
&lt;p&gt;Not only does MongoDB allow developers to store their in-memory data structures in a straightforward way, but it also provides a comprehensive suite of tools useful for analytics. For one, MongoDB automatically creates charts of any MongoDB data stored inside the database. It also leverages external BI tools via connectors, such as Tableau, Qlik, and others.&lt;/p&gt;
&lt;p&gt;MongoDB also provides features, such as data searches and visualizations, via an intuitive GUI. By doing this, developers are able to manipulate data with visual editing tools.&lt;/p&gt;
&lt;h2&gt;Features of Document Databases&lt;/h2&gt;
&lt;p&gt;In the MongoDB world, tables are referred to as collections. Thus, a document is stored in a collection within a database. In an earlier post, the benefits of document stores were briefly mentioned, which include consistency and flexibility benefits. Specifically, the database doesn&apos;t care what data each document stores as its values. The values can be JSON, XML, etc. The structure of the &lt;strong&gt;data model is flexible&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://en.wikipedia.org/wiki/CAP_theorem&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;CAP Theorem&lt;/a&gt; implies relational databases ensure strong consistency, which means any reads sent to a relational database will always return the most recently written value. Strong consistency is simpler to achieve with relational databases, since they typically run on a single node.&lt;/p&gt;
&lt;p&gt;On the other hand, document databases ensure &lt;strong&gt;eventual consistency&lt;/strong&gt;. Recall, if two reads happen after a write in a distributed system, they may access data from two different nodes. Eventual consistency implies those reads will receive the same values shortly after the write. However, eventual consistency also implies those reads may not receive the same values if the reads happen too soon after the write. In this scenario, replica nodes need more time to receive the most-recent value written to the master node.&lt;/p&gt;
&lt;p&gt;Document databases support workarounds to increase availability. One approach for improving availability involves the use of &lt;a href=&quot;https://docs.mongodb.com/manual/replication/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;replica sets&lt;/a&gt;. A replica set is a group of processes that maintain the same dataset. Replica sets replicate data to provide redundancy and high availability.&lt;/p&gt;
&lt;p&gt;By using replica sets, consistency can also be tuned by waiting for writes to be replicated to a certain number of replicas. Then, every write will ensure that number of replicas have been written before it is reported as successful. Similarly to key-value databases, increasing the consistency will slow down writes, since each write must propagate to more nodes.&lt;/p&gt;
&lt;p&gt;Document databases, such as MongoDB, are implemented using a master-worker architecture, which means all requests are sent to an individual master node. Then, the data from that master node is replicated to its worker nodes. If the master node ever experiences downtime, the worker nodes will vote for a new master node among themselves.&lt;/p&gt;
&lt;p&gt;Whereas key-value stores can only be queried via their key, document databases support limited querying capabilities based on attribute values in a document. As an example, MongoDB provides very simple querying operations for filtering and ordering.&lt;/p&gt;
&lt;p&gt;Lastly, many document databases are &lt;strong&gt;scalable using sharding&lt;/strong&gt;. Sharding introduces its own set of issues involving availability and complexity. However, these tradeoffs can be tuned using a parameter representing the number of node failures.&lt;/p&gt;
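&lt;p&gt;The interplay between eventual consistency and waiting on replicas can be simulated with a toy model. The &lt;code&gt;ReplicaSet&lt;/code&gt; class and its &lt;code&gt;wait_for&lt;/code&gt; parameter below are hypothetical and are not MongoDB&apos;s API. They only mimic the idea of acknowledging a write after a chosen number of replicas have it.&lt;/p&gt;

```python
class ReplicaSet:
    """Toy model: one primary plus replicas that may lag behind it."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet applied to every replica

    def write(self, key, value, wait_for=0):
        """Write to the primary, then sync `wait_for` replicas before returning."""
        self.primary[key] = value
        for replica in self.replicas[:wait_for]:
            replica[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Background replication: apply all pending writes everywhere."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)


rs = ReplicaSet(replica_count=2)
rs.write("color", "blue", wait_for=1)
print(rs.read("color", 0))   # replica synced during the write
print(rs.read("color", 1))   # lagging replica has not seen the write yet
rs.replicate()
print(rs.read("color", 1))   # after replication, the replicas agree
```

&lt;p&gt;Setting &lt;code&gt;wait_for&lt;/code&gt; to the number of replicas removes the stale read in this model, but every write then has to touch every replica before returning.&lt;/p&gt;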
&lt;h2&gt;Using Document Databases&lt;/h2&gt;
&lt;p&gt;The table below outlines a few particular use cases for document databases. In particular, MongoDB serves as the representative document database for the following use cases. Read more details about the MongoDB use cases in the &lt;a href=&quot;https://www.mongodb.com/use-cases&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;MongoDB docs&lt;/a&gt;. For more details about individual use cases, refer to the &lt;em&gt;NoSQL Distilled&lt;/em&gt; text.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use-Case&lt;/th&gt;
&lt;th&gt;Good or Bad?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Event logging&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blogging sites&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content management systems&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analytics&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Applications needing a flexible schema&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multioperation requests&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Many differently structured documents&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content:encoded></item><item><title><![CDATA[NoSQL Basics: Key-Value Databases]]></title><description><![CDATA[In a previous post about NoSQL databases, key-value stores were described at a fairly high-level. In this post, we'll dive into more low-level details, which includes features, behavior, and use cases…]]></description><link>https://dkharazi.github.io/blog/nosql-keyvalue</link><guid isPermaLink="false">https://dkharazi.github.io/blog/nosql-keyvalue</guid><pubDate>Sun, 16 Feb 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In a &lt;a href=&quot;/blog/nosql/&quot;&gt;previous post&lt;/a&gt; about NoSQL databases, key-value stores were described at a fairly high-level. In this post, we&apos;ll dive into more low-level details, which includes features, behavior, and use cases.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;NoSQL Distilled&lt;/em&gt; is a terrific resource for learning about both high-level and low-level details of NoSQL databases. This post is meant to summarize my experience with these databases, along with particular segments from the book. For a deeper dive into relational and NoSQL databases, refer to the book.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#defining-a-key-value-store&quot;&gt;Defining a Key-Value Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#introduction-to-redis&quot;&gt;Introduction to Redis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#features-of-key-value-databases&quot;&gt;Features of Key-Value Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#using-key-value-databases&quot;&gt;Using Key-Value Databases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Defining a Key-Value Store&lt;/h2&gt;
&lt;p&gt;A key-value database is a strongly aggregate-oriented database. Meaning, it consists of many aggregates. In particular, key-value databases generally enforce fewer restrictions and offer increased flexibility, compared to both relational databases and other NoSQL databases. In certain situations, they enforce minor constraints, such as size limits. However, they generally offer more freedom comparatively.&lt;/p&gt;
&lt;p&gt;In key-value databases, aggregates are only accessible by means of a key. As a result, values can&apos;t be queried, and any access requires a lookup of the entire aggregate. Without the use of a strict schema, any data can be stored under a key in a key-value model.&lt;/p&gt;
&lt;p&gt;Essentially, key-value databases are just hash tables. They are useful for storing data that interacts with an API. That is, key-value stores are useful for clients who only need to do the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add a key-value combination&lt;/li&gt;
&lt;li&gt;Get the value of a key&lt;/li&gt;
&lt;li&gt;Store a value for a key&lt;/li&gt;
&lt;li&gt;Delete a key&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In key-value databases, values are stored as blobs, which implies these values can be of roughly any data type. Accessing values associated with a key is both scalable and performant, since there is only one way to access the aggregate. Moreover, understanding what value is stored falls on the shoulders of the application developer.&lt;/p&gt;
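&lt;p&gt;The handful of operations listed above fits in a very small interface. The sketch below is a toy in-memory store backed by a Python dictionary, not the API of Redis or Riak. The class and method names are hypothetical.&lt;/p&gt;

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a hash table (dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value is an opaque blob; the store never inspects it.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", b'{"user": "ana", "cart": [3, 7]}')
print(store.get("session:42"))   # the stored blob, returned as-is
store.delete("session:42")
print(store.get("session:42"))   # None once the key is deleted
```

&lt;p&gt;Interpreting the bytes (here, JSON) is left to the application, which matches the point that the developer is responsible for knowing what each value contains.&lt;/p&gt;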
&lt;h2&gt;Introduction to Redis&lt;/h2&gt;
&lt;p&gt;Redis doesn&apos;t only accept primitive data types for its values. It supports data structures such as strings, hashes, lists, sets, sorted sets, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams.&lt;/p&gt;
&lt;p&gt;Redis supports operations that include appending values to a string, pushing elements to lists, unions, etc. These features make Redis a popular choice when deciding between the many key-value databases. Redis achieves its performance benefits by working with in-memory &lt;em&gt;datasets&lt;/em&gt;. Datasets can be persisted to disk by either occasionally saving them to disk or logging any executed commands.&lt;/p&gt;
&lt;p&gt;For many of the reasons listed above, Redis is mostly used for storing in-memory data structures, which happens often in caching and messaging systems. Its flexibility and atomic operations separate Redis from the others. For a more detailed explanation of Redis, refer to &lt;a href=&quot;https://redis.io/topics/introduction&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;their site&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Features of Key-Value Databases&lt;/h2&gt;
&lt;p&gt;In an earlier post, the benefits of key-value stores were briefly mentioned, which include consistency and flexibility benefits. Specifically, the database doesn&apos;t care what data is stored as the value of each key. The values can be JSON, XML, text, etc. The structure of the &lt;strong&gt;data model is flexible&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://en.wikipedia.org/wiki/CAP_theorem&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;CAP Theorem&lt;/a&gt; implies relational databases ensure strong consistency, which means any reads sent to a relational database will always return the most recently written value. Strong consistency is simpler to achieve with relational databases, since they typically run on a single node.&lt;/p&gt;
&lt;p&gt;On the other hand, key-value databases ensure &lt;strong&gt;eventual consistency&lt;/strong&gt;. Recall, if two reads happen after a write in a distributed system, they may access data from two different nodes. Eventual consistency implies those reads will receive the same values shortly after the write. However, eventual consistency also implies those reads may not receive the same values if the reads happen too soon after the write. In this scenario, replica nodes need more time to receive the most-recent value written to the master node.&lt;/p&gt;
&lt;p&gt;Key-value databases support workarounds to increase consistency, but some of these solutions decrease write performance. One approach for improving consistency and write tolerance involves the use of a &lt;a href=&quot;https://redis.io/topics/sentinel&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;quorum&lt;/a&gt;. Quorums involve setting a replication factor, which tolerates nodes being down for write operations.&lt;/p&gt;
&lt;p&gt;As stated previously, key-value stores can &lt;strong&gt;only be queried via their key&lt;/strong&gt;. This constraint is one of the reasons for their excellent performance, but may be problematic if we don&apos;t know the key. Therefore, how keys are designed is an important consideration when creating them.&lt;/p&gt;
&lt;p&gt;Lastly, many key-value databases are &lt;strong&gt;scalable using sharding&lt;/strong&gt;. Sharding introduces its own set of issues involving availability and complexity. However, these tradeoffs can be tuned using a parameter representing the number of node failures.&lt;/p&gt;
&lt;h2&gt;Using Key-Value Databases&lt;/h2&gt;
&lt;p&gt;The table below outlines a few particular use cases for key-value databases. In particular, Redis and Riak serve as the representative key-value databases for these use cases. However, these two key-value stores have their own separate use cases from each other. Read more details about the Redis use cases in the &lt;a href=&quot;https://redis.io/documentation&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Redis docs&lt;/a&gt;, and Riak use cases in the &lt;a href=&quot;https://docs.riak.com/index.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Riak docs&lt;/a&gt;. For more details about individual use cases, refer to the &lt;em&gt;NoSQL Distilled&lt;/em&gt; text.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use-Case&lt;/th&gt;
&lt;th&gt;Good or Bad?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Session caching&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User profiles&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Configurations&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shopping carts&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Relationships&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multioperation requests&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Querying by values&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operations applied to many keys&lt;/td&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</content:encoded></item><item><title><![CDATA[Relational and Non-Relational Databases]]></title><description><![CDATA[Ensuring a stable form of data storage is an important decision for any business. The data in an organization can last much longer than many of its applications. Unfortunately, there isn't a single…]]></description><link>https://dkharazi.github.io/blog/nosql</link><guid isPermaLink="false">https://dkharazi.github.io/blog/nosql</guid><pubDate>Sun, 09 Feb 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Ensuring a stable form of data storage is an important decision for any business. The data in an organization can last much longer than many of its applications. Unfortunately, there isn&apos;t a single database that solves the needs for every business. Each database has its own set of properties, and an organization can utilize those properties to find the database (or databases) that best solves their distinctive problem.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;NoSQL Distilled&lt;/em&gt; is a terrific resource for learning about both high-level and low-level details of NoSQL databases. This post is meant to summarize my experience with these databases, along with particular segments from the book. For a deeper dive into relational and NoSQL databases, refer to the book.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#evaluating-databases-with-cap-theorem&quot;&gt;Evaluating Databases with CAP Theorem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#distribution-and-replication&quot;&gt;Distribution and Replication&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#power-of-relational-databases&quot;&gt;Power of Relational Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#disadvantages-of-relational-databases&quot;&gt;Disadvantages of Relational Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-is-nosql&quot;&gt;What is NoSQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#purpose-of-nosql-databases&quot;&gt;Purpose of NoSQL Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#properties-of-nosql-databases&quot;&gt;Properties of NoSQL Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#introducing-data-models-in-databases&quot;&gt;Introducing Data Models in Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#types-of-aggregate-data-models&quot;&gt;Types of Aggregate Data Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#use-cases-of-database-types&quot;&gt;Use Cases of Database Types&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Evaluating Databases with CAP Theorem&lt;/h2&gt;
&lt;p&gt;At a high level, there are a few categories that a database can fall under. First, there is the relational database, which is one we&apos;re all most likely familiar with. Relational databases are organized into tables with columns and rows. They ensure transactions are &lt;em&gt;ACID&lt;/em&gt; and enforce many restrictions using a schema. ACID is a somewhat contrived acronym that will be explained in greater detail later.&lt;/p&gt;
&lt;p&gt;Additionally, there are the non-relational databases, which offer increased flexibility by embracing schemaless data. Unlike relational databases, there are many different flavors of non-relational databases, such as key-value, document, column-family, and graph databases. As stated previously, each of these databases is schemaless, and the graph database is the only one ensuring ACID transactions.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/8778ffc6901851624944df6dd012b385/sqlnosql.svg&quot; alt=&quot;SqlvsNoSQLDatabases&quot;&gt;&lt;/p&gt;
&lt;p&gt;After introducing a few modern-day database options, we now should think about how to evaluate which databases are best suited for our organization&apos;s needs. To do this, we&apos;ll consider the CAP theorem, which assesses the tradeoffs between three metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;C&lt;/strong&gt;onsistency&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A&lt;/strong&gt;vailability&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;P&lt;/strong&gt;artition-tolerance.  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At a high level, the CAP theorem states we can&apos;t choose two of the above metrics without sacrificing the third metric. A partition refers to an interruption in communication within a distributed system. Thus, partition-tolerance means the cluster keeps working even if a node goes down or messages between nodes are lost. Availability ensures a request to any node that is up will return a valid response. Consistency implies a request to any node that is up will return the same response. For more information about the CAP theorem, refer to &lt;a href=&quot;https://www.ibm.com/cloud/learn/cap-theorem&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this IBM article&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By observing the image below, we&apos;ll notice relational databases maintain both availability and consistency. On the other hand, NoSQL databases can maintain either consistency and partition-tolerance, or they can maintain availability and partition-tolerance. If we&apos;re certain that our organization only ever will need one server, then we won&apos;t need to worry about partition-tolerance, and we can prioritize both availability and consistency. On the other hand, if we&apos;re certain our database will require more than a single server, then we may need to choose between consistency and availability.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/e5f161a2604ed8a7a49814f933b75fe1/captheorem.svg&quot; alt=&quot;CAPTheorem&quot;&gt;&lt;/p&gt;
&lt;p&gt;The CAP theorem has become contested over the years. In practice, these tradeoffs have become looser for some of these databases. Using Cassandra as an example, quorum-based reads and writes make the level of consistency almost configurable. Over time, the lines separating the databases from each other are becoming blurred. Many of these databases can be made to work in any situation, but we should choose a database that is built for our requirements. A more detailed explanation about other considerations can be found &lt;a href=&quot;https://www.youtube.com/watch?v=v5e_PasMdXc&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
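&lt;p&gt;To make the quorum idea concrete, here is a minimal sketch of the usual overlap rule for replicated stores: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function and parameter names are hypothetical and are not taken from any Cassandra API.&lt;/p&gt;

```python
def quorum_size(replication_factor):
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_acks, read_acks):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor


# Hypothetical cluster with 3 replicas per row.
n = 3
strong = reads_see_latest_write(n, quorum_size(n), quorum_size(n))
weak = reads_see_latest_write(n, 1, 1)
print(strong, weak)
```

&lt;p&gt;Requiring fewer acknowledgements favors availability and latency, while requiring a quorum on both reads and writes favors consistency.&lt;/p&gt;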
&lt;h2&gt;Distribution and Replication&lt;/h2&gt;
&lt;p&gt;In NoSQL databases, there are two basic styles of distributing data. First, there is sharding, which has been mentioned in a &lt;a href=&quot;/blog/shard/&quot;&gt;previous post&lt;/a&gt;. Sharding distributes different data across multiple servers, so each server becomes the single source for its subset of data. In other words, sharding involves splitting up a dataset into sections, then storing each section on its own server.&lt;/p&gt;
&lt;p&gt;Replication is the second form of distributing data. Replication copies data across multiple servers, so there are multiple copies of the same dataset stored in more than one place. In other words, replication involves duplicating a dataset on another server. A system may use neither of these techniques, one of these techniques, or both of these techniques.&lt;/p&gt;
&lt;p&gt;To go one level deeper, there are two forms of replication. First, a master-worker form of replication promotes one node as the authoritative copy. In this architecture, the master typically handles writes, while the workers handle reads. Consequently, this choice causes master-worker architectures to be eventually consistent, since it takes time for the written values to be updated on the workers for reading.&lt;/p&gt;
&lt;p&gt;Secondly, a peer-to-peer form of replication doesn&apos;t use a master. Instead, it allows writes to any node, so the nodes coordinate with each other to synchronize their copies of data. In general, master-worker replication reduces the chance of update conflicts, but peer-to-peer replication avoids loading every write onto a single node, which would otherwise become a single point of failure.&lt;/p&gt;
&lt;p&gt;Most systems will need to choose one form of replication over the other form. Many distributed systems promote a combination of both sharding and replication. Also, most of the terminology mentioned above is defined in the NoSQL Distilled textbook, so please refer to it for more details.&lt;/p&gt;
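&lt;p&gt;The master-worker form described above can be sketched with a toy in-memory store. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, and replication is triggered by hand rather than over a network.&lt;/p&gt;

```python
class MasterWorkerStore:
    """Toy master-worker replication: writes go to the master, reads go to workers."""

    def __init__(self, worker_count):
        self.master = {}
        self.workers = [{} for _ in range(worker_count)]
        self.pending = []  # writes accepted by the master but not yet replicated

    def write(self, key, value):
        self.master[key] = value
        self.pending.append((key, value))

    def read(self, key, worker=0):
        # May return a stale value (or None) until replicate() has run.
        return self.workers[worker].get(key)

    def replicate(self):
        # Ship every pending write to every worker, in the order received.
        for key, value in self.pending:
            for replica in self.workers:
                replica[key] = value
        self.pending.clear()


store = MasterWorkerStore(worker_count=2)
store.write("x", 1)
before = store.read("x")   # the workers have not caught up yet
store.replicate()
after = store.read("x", worker=1)
```

&lt;p&gt;The gap between &lt;code&gt;before&lt;/code&gt; and &lt;code&gt;after&lt;/code&gt; is the window in which a master-worker system is only eventually consistent.&lt;/p&gt;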
&lt;h2&gt;Power of Relational Databases&lt;/h2&gt;
&lt;p&gt;Before introducing a few significant properties and use cases for relational databases, we should briefly review why we&apos;re interested in writing to a database in the first place. Most architectures have two general ways of storing data: writing to memory or writing to disk. Keep in mind, data remains persistent by writing to disk.&lt;/p&gt;
&lt;p&gt;In other words, any data stored away in memory is lost if we lose power or observe any hardware issues. For these reasons, data needs to be written to disk in order to ensure persistence. Any data written to disk is commonly accessible to us via files in a file system on our operating system, or via a database.&lt;/p&gt;
&lt;p&gt;Since both approaches are ways to access data on disk, one of the first questions to be asked is &lt;em&gt;can we just keep our data in the file system, rather than writing it to a database?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is a valid question. However, an organization will typically require a database, since a database offers more flexibility than a file system when storing and querying large amounts of data.&lt;/p&gt;
&lt;p&gt;In relational databases, a request sent to the database is known as a transaction. Relational databases control how users access their data via transactions. With the use of transactions, a relational database can undo a change if that change causes an error, without creating issues for other transactions being performed simultaneously. In particular, transactions are run in an isolated environment, allowing for other transactions to execute concurrently.&lt;/p&gt;
&lt;p&gt;Relational databases support manipulation of multiple rows in a single transaction. For relational databases, these transactions are ACID transactions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A&lt;/strong&gt;tomic&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;C&lt;/strong&gt;onsistent&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;I&lt;/strong&gt;solated&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;D&lt;/strong&gt;urable&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Atomicity implies an entire transaction will fail even if only one part of a transaction fails, and there won&apos;t be any changes committed to the database afterwards. Consistency implies any successful transaction will take the database from one valid state to another valid state. Isolation implies any concurrent operations would have the same results if they were executed serially. Durability implies any transaction must remain committed even in the event of a crash or power loss.&lt;/p&gt;
&lt;p&gt;The major takeaway from the &lt;strong&gt;ACID&lt;/strong&gt; acronym is atomicity. Meaning, updates to multiple rows from different tables are applied as a single operation, which either succeeds or fails in its entirety. By enforcing this rule, concurrent operations are isolated from each other, so they&apos;ll never create a partial update.&lt;/p&gt;
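&lt;p&gt;As a small illustration of atomicity, the sketch below uses Python&apos;s built-in &lt;code&gt;sqlite3&lt;/code&gt; module with an in-memory database. The &lt;code&gt;accounts&lt;/code&gt; table, its rows, and the &lt;code&gt;transfer&lt;/code&gt; helper are made up for this example. The transfer updates two rows, and if the second update violates the constraint, the first update is rolled back as well.&lt;/p&gt;

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two rows as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

failed = transfer(conn, "alice", "bob", 500)   # would overdraw alice
after_failure = balances(conn)                 # bob was not credited either
```

&lt;p&gt;Because the credit to one row and the debit from the other row share a transaction, the failed transfer leaves both balances exactly as they were.&lt;/p&gt;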
&lt;p&gt;Across many different types of relational databases, they maintain a fairly &lt;strong&gt;standard model&lt;/strong&gt;. In general, transactions operate fairly uniformly, regardless of which relational database is selected. In addition, the SQL dialects for most relational databases are fairly interchangeable. For example, Microsoft SQL Server and MySQL both include fairly similar querying languages.&lt;/p&gt;
&lt;h2&gt;Disadvantages of Relational Databases&lt;/h2&gt;
&lt;p&gt;Technically, a relational database consists of tuples. A tuple is synonymous with a row, but is organized as a set of name-value pairs. A relation is a set of tuples. All SQL operations in relational databases consume and return relations. These relations represent the output of mathematically elegant relational algebra.&lt;/p&gt;
&lt;p&gt;This restriction creates &lt;em&gt;impedance mismatch&lt;/em&gt;. Meaning, the choice of using algebraic operations introduces &lt;strong&gt;limitations in flexibility&lt;/strong&gt;, such as rows being unable to contain certain data structures (e.g. lists and sets). To retrieve a collection of values in a relational database, we need to perform joins and apply complicated operations to tuples. In other words, algebraic operations provide simplicity and elegance, but also create limitations.&lt;/p&gt;
&lt;p&gt;Most importantly, relational databases can produce &lt;strong&gt;limitations in usable resources&lt;/strong&gt;. At a high level, handling data growth requires scaling either upwards or outwards. Needless to say, scaling upwards has an actual cut-off point, since a machine can only grow so large.&lt;/p&gt;
&lt;p&gt;On the other hand, scaling outwards is possible, depending on the type of database. For relational databases, scaling outwards can become difficult in the long run, since relational databases aren&apos;t built for scaling outwards. Specifically, they aren&apos;t built to run on clusters. Methods that feel like workarounds, like sharding, can be performed, but these can become very difficult to manage in the long run.&lt;/p&gt;
&lt;h2&gt;What is NoSQL&lt;/h2&gt;
&lt;p&gt;The notion of NoSQL databases has been around since the 90s. Since then, databases have come a long way and almost seem unrecognizable. Thus, NoSQL databases were more easily distinguishable when they were first introduced, but the line separating SQL and NoSQL databases has blurred.&lt;/p&gt;
&lt;p&gt;Today, the term &lt;em&gt;NoSQL&lt;/em&gt; has mostly become a buzzword, since it has become harder to classify than it was at its origin. Obviously, NoSQL databases are mostly driven by the &lt;strong&gt;absence of SQL&lt;/strong&gt;. To illustrate my previous point regarding a more blurred line, a NoSQL database like Cassandra has CQL. Although CQL is far from supporting the flexibility of standard SQL, a basic querying language still is offered in Cassandra nonetheless.&lt;/p&gt;
&lt;p&gt;Even today, most NoSQL databases are &lt;strong&gt;schemaless&lt;/strong&gt;. Meaning, they&apos;re given the flexibility and freedom to specify data values without types in most cases. As stated earlier, NoSQL databases are mostly driven by the need to run on clusters. On the other hand, relational databases favor consistency and availability, rather than partition-tolerance.&lt;/p&gt;
&lt;h2&gt;Purpose of NoSQL Databases&lt;/h2&gt;
&lt;p&gt;As hinted at previously, the clearest benefits of NoSQL databases are scalability and productivity related to application development. In most cases, developers use in-memory data structures that aren&apos;t naturally designed as a relational data model. If these developers want to store their data in a relational database, they&apos;ll need to map their in-memory data structures to a relational data model.&lt;/p&gt;
&lt;p&gt;Too much effort on application development can be spent during this step. This mapping process can be eliminated entirely by introducing a NoSQL database offering a more comparable data model. Making this replacement &lt;strong&gt;simplifies any interaction between the application and database&lt;/strong&gt;, which results in less code to write, debug, and change over time. In summary, developers can more easily store their in-memory data structures in NoSQL databases, since many of them don&apos;t enforce as strict layouts.&lt;/p&gt;
&lt;p&gt;In comparison to relational databases, NoSQL databases are &lt;strong&gt;built for scaling large data&lt;/strong&gt;. For example, the architecture of Cassandra was designed around concepts such as consistent hashing, partitioning, and replication. Furthermore, quickly capturing increased amounts of data can be more expensive with relational databases, since relational databases are designed to run on a single machine. As a result, the additional hardware needed to hold this data becomes quite expensive.&lt;/p&gt;
&lt;p&gt;On the other hand, a more economical alternative involves storing and computing large amounts of data on large clusters of smaller and cheaper machines. Since the majority of NoSQL databases explicitly are designed to run on clusters, they become the preferred option.&lt;/p&gt;
&lt;h2&gt;Properties of NoSQL Databases&lt;/h2&gt;
&lt;p&gt;Each NoSQL database has its own unique set of properties. However, they generally share a few high-level properties with each other. As mentioned earlier, NoSQL databases &lt;strong&gt;don&apos;t support ACID&lt;/strong&gt; transactions. The exception to this is a graph database. To use another example from earlier, NoSQL databases introduce flexibility from being &lt;strong&gt;schemaless&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;To go into greater detail, schemas are defined in the initial stages of a relational database, whereas they are not defined in NoSQL databases. By doing this, NoSQL sacrifices data integrity for increased flexibility and speed of development. Schemaless databases are generally useful when we&apos;re not entirely certain about the structure of the data being stored, which happens often in the early stages of a project.&lt;/p&gt;
&lt;p&gt;Schemaless databases also are generally useful when we&apos;re dealing with nonuniform data. Meaning, certain rows may be associated with their own set of columns. Potentially, this can happen when storing a sparse matrix. Lastly, schemaless databases are generally useful when there isn&apos;t enough time (or a high enough priority) to think about the exact structure of data. Specifically, this may occur when creating ad hoc reports.&lt;/p&gt;
&lt;p&gt;Without using a strict schema, any data can be stored under a key in a key-value model. A document model essentially emulates this approach, since it doesn&apos;t enforce any additional (major) restrictions on the document. Column-family databases also ensure any data can be stored in their model, but force data values to be stored under a column family. Alternatively, graph databases store any data as nodes and edges with properties.&lt;/p&gt;
&lt;p&gt;There is also a &lt;strong&gt;greater range of relationships&lt;/strong&gt; offered by NoSQL databases, compared to relational databases. Most of these relationships are better suited for application developers.&lt;/p&gt;
&lt;p&gt;When choosing one NoSQL database over the other, a developer needs to recognize the distinctive set of requirements for his or her application. Some of these requirements will align with a key-value database, and others will align with a graph database. Generally, the type of application can guide the developer to a NoSQL database. For example, recommendation systems are better suited for graph databases, whereas session caching is better suited for key-value databases.&lt;/p&gt;
&lt;p&gt;Lastly, most NoSQL databases offer some form of &lt;strong&gt;materialized views&lt;/strong&gt;, which are supported in relational databases as &lt;em&gt;virtual tables&lt;/em&gt;. They aren&apos;t handled the same way as views in relational databases, but they essentially achieve the same goal. In NoSQL databases, materialized views refer to precomputed and cached queries. These queries are typically assembled using a Map-Reduce or Spark job, since many of these NoSQL databases don&apos;t support a querying language.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/2cf6ba418904c3ee19704f6d40f215d7/nosqlproperties.svg&quot; alt=&quot;NoSQLProperties&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Introducing Data Models in Databases&lt;/h2&gt;
&lt;p&gt;Regardless of whether a database is relational or non-relational, each database supports a &lt;em&gt;data model&lt;/em&gt;. A data model refers to the low-level structure of data being stored in a database. The structure of a data model impacts how we interact with the data in a database.&lt;/p&gt;
&lt;p&gt;A relational database is known to use a relational data model. As hinted at earlier, a relational data model consists of relations and tuples. Again, relations intuitively represent tables, and tuples represent rows. In other words, a relational model stores its data as tuples.&lt;/p&gt;
&lt;p&gt;In comparison to NoSQL data models, tuples are rather inflexible data structures, since one tuple can&apos;t be nested within another tuple. To retrieve nested records, algebraic operations need to be performed on the tuples.&lt;/p&gt;
&lt;p&gt;At a high level, a relational model doesn&apos;t allow lists of values to be stored in a field, since tuples can&apos;t be nested. This restriction is the reason for the relational model&apos;s inflexibility. For example, application developers have a difficult time storing their in-memory data structures as a relational model, since they need to go through extra effort to convert their data structures.&lt;/p&gt;
&lt;p&gt;Alternatively, &lt;em&gt;aggregates&lt;/em&gt; support nesting, meaning they can store lists of values. In the book NoSQL Distilled, the term aggregate is used to refer to:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A collection of related objects that we wish to treat as a unit. In particular, it is a unit for data manipulation and management of consistency.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Aggregate-oriented databases have their obvious consequences. In general, aggregate-oriented databases don’t have ACID transactions across multiple aggregates. Instead, each aggregate is manipulated individually. Meaning, atomic manipulation of multiple aggregates must be handled in the actual application code.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/e7160c766ea62c6e1d29ee1e394c14aa/relationaggregate.svg&quot; alt=&quot;RelationAggregate&quot;&gt;&lt;/p&gt;
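&lt;p&gt;The difference between an aggregate and a set of flat tuples can be sketched in a few lines. The order, its fields, and the helper names below are invented for illustration. The nested structure is what an application usually holds in memory, and the two functions show the conversion work that a relational model requires in each direction.&lt;/p&gt;

```python
# One aggregate: an order with its line items nested inside it.
order_aggregate = {
    "order_id": 1001,
    "customer": "ana",
    "items": [
        {"sku": "pen", "quantity": 3},
        {"sku": "ink", "quantity": 1},
    ],
}


def to_relational_rows(order):
    """Flatten one aggregate into an `orders` tuple and `order_items` tuples."""
    order_row = (order["order_id"], order["customer"])
    item_rows = [
        (order["order_id"], item["sku"], item["quantity"])
        for item in order["items"]
    ]
    return order_row, item_rows


def to_aggregate(order_row, item_rows):
    """Rebuild the nested aggregate, which is the work a join would do."""
    order_id, customer = order_row
    items = [
        {"sku": sku, "quantity": quantity}
        for item_order_id, sku, quantity in item_rows
        if item_order_id == order_id
    ]
    return {"order_id": order_id, "customer": customer, "items": items}


order_row, item_rows = to_relational_rows(order_aggregate)
rebuilt = to_aggregate(order_row, item_rows)
```

&lt;p&gt;An aggregate-oriented database would store &lt;code&gt;order_aggregate&lt;/code&gt; as a single unit, so neither conversion step is needed.&lt;/p&gt;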
&lt;h2&gt;Types of Aggregate Data Models&lt;/h2&gt;
&lt;p&gt;After introducing the overlapping properties of NoSQL databases, we&apos;ll now explore the specifics of certain types of NoSQL databases. Various forms of NoSQL databases were briefly mentioned in the earlier segments of this post, which include key-value, document, column-family, and graph databases.&lt;/p&gt;
&lt;p&gt;Key-value and document databases are strongly aggregate-oriented, meaning they consist of many aggregates. In particular, key-value databases generally enforce fewer restrictions and offer increased flexibility, compared to both relational databases and other NoSQL databases. In certain situations, there are very minor constraints, such as size limits. However, they generally offer more freedom comparatively.&lt;/p&gt;
&lt;p&gt;In key-value databases, aggregates are only accessible by means of a key. As a result, individual values can&apos;t be queried, and reading any part of an aggregate requires a lookup of the entire aggregate.&lt;/p&gt;
&lt;p&gt;Document databases generally enforce similar constraints as key-value databases. However, they can sometimes enforce a few additional limitations, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Size limits&lt;/li&gt;
&lt;li&gt;What we can place in them&lt;/li&gt;
&lt;li&gt;Data types&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Document databases remain schemaless. Additionally, the fields in document databases can be queried, which contrasts with key-value databases. In other words, we can query specific segments of an aggregate in document databases, whereas we can only query the entire aggregate (belonging to a key) in key-value databases. Lastly, indices can be created based on the contents of an aggregate in a document database.&lt;/p&gt;
&lt;p&gt;As mentioned previously, there are workarounds that blur the line between document and key-value databases. For example, certain key-value databases, such as Redis, can hack together something that feels like indexing by separating aggregates into lists. However, we generally expect to access aggregates using a key in a key-value database. Whereas, we generally expect to access aggregates using some form of querying based on values in a document database. Keep in mind, this query may or may not include a key.&lt;/p&gt;
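&lt;p&gt;The contrast between the two access patterns can be sketched with plain Python data structures. Everything here is a toy model with hypothetical names, not the API of Redis, MongoDB, or any other product. The key-value store treats each value as opaque bytes, while the document collection can look inside its aggregates and build an index on a field.&lt;/p&gt;

```python
from collections import defaultdict

# Key-value store: the value is opaque, so the only access path is the key.
key_value_store = {"user:1": b'{"name": "ana", "city": "Columbus"}'}


class DocumentCollection:
    """Toy document store that can query and index fields inside aggregates."""

    def __init__(self):
        self.documents = {}
        self.indexes = {}  # field name -> {field value -> set of document ids}

    def insert(self, doc_id, document):
        self.documents[doc_id] = document
        for field, index in self.indexes.items():
            if field in document:
                index[document[field]].add(doc_id)

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, document in self.documents.items():
            if field in document:
                index[document[field]].add(doc_id)
        self.indexes[field] = index

    def find(self, field, value):
        if field in self.indexes:
            ids = self.indexes[field].get(value, set())  # index lookup
        else:
            ids = [i for i, d in self.documents.items() if d.get(field) == value]
        return [self.documents[i] for i in sorted(ids)]


users = DocumentCollection()
users.insert("u1", {"name": "ana", "city": "Columbus"})
users.insert("u2", {"name": "bo", "city": "Austin"})
users.create_index("city")
users.insert("u3", {"name": "cy", "city": "Columbus"})
names = [doc["name"] for doc in users.find("city", "Columbus")]
```

&lt;p&gt;Finding every user in a given city with only &lt;code&gt;key_value_store&lt;/code&gt; would require fetching and decoding each value in application code.&lt;/p&gt;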
&lt;p&gt;Column-family databases impose even more limitations on the structure of an aggregate. Specifically, column-family databases organize their columns into column families, which was mentioned earlier as well. Each column needs to be assigned to a particular column family. Then, each column can be accessed via a column family. In particular, accessing a column-family will return each of the columns associated with that column family.&lt;/p&gt;
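&lt;p&gt;One way to picture this layout is as a nested map from row key to column family to column. The sketch below uses made-up row keys, family names, and helper functions, and it ignores details such as timestamps and on-disk storage.&lt;/p&gt;

```python
# row key -> column family -> column -> value
rows = {
    "user:1": {
        "profile": {"name": "ana", "city": "Columbus"},
        "activity": {"last_login": "2020-02-02", "visits": 12},
    },
}


def read_column_family(table, row_key, family):
    """Return all columns stored under one column family of one row."""
    return dict(table[row_key][family])


def read_column(table, row_key, family, column):
    """Return a single column, which is always addressed through its family."""
    return table[row_key][family][column]


profile = read_column_family(rows, "user:1", "profile")
visits = read_column(rows, "user:1", "activity", "visits")
```

&lt;p&gt;Grouping columns that are read together into the same family is the main modeling decision in this kind of database.&lt;/p&gt;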
&lt;p&gt;The above details about NoSQL databases only scratch the surface of their structure and behavior. For a more detailed explanation about key-value databases, refer to &lt;a href=&quot;/blog/nosql-keyvalue/&quot;&gt;this post&lt;/a&gt;. For a more detailed explanation about document databases, refer to &lt;a href=&quot;/blog/nosql-document/&quot;&gt;this post&lt;/a&gt;. Then, refer to &lt;a href=&quot;/blog/nosql-column/&quot;&gt;this post&lt;/a&gt; for details about column-family databases. Finally, for a deeper dive into graph databases, refer to &lt;a href=&quot;/blog/nosql-graph/&quot;&gt;this post&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Use Cases of Database Types&lt;/h2&gt;
&lt;p&gt;The table below consists of a few types of databases, along with their use cases. For even more use cases, refer to &lt;a href=&quot;https://aws.amazon.com/products/databases/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this AWS article&lt;/a&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Database Type&lt;/th&gt;
&lt;th&gt;Use Cases&lt;/th&gt;
&lt;th&gt;Database&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Relational&lt;/td&gt;
&lt;td&gt;Traditional applications; basic relationships; general reporting&lt;/td&gt;
&lt;td&gt;MySQL; PostgreSQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Key-Value&lt;/td&gt;
&lt;td&gt;High-traffic applications; storing configurations; session caching&lt;/td&gt;
&lt;td&gt;Redis; Riak&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document&lt;/td&gt;
&lt;td&gt;Content management; user profiles; prototypes; big data analytics&lt;/td&gt;
&lt;td&gt;MongoDB; CouchDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph&lt;/td&gt;
&lt;td&gt;Complex relationships; fraud detection; recommendation engines&lt;/td&gt;
&lt;td&gt;Neo4j&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For a more detailed description about the differences between column-oriented databases and column-family databases, refer to &lt;a href=&quot;https://stackoverflow.com/a/38793956/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt;. For a useful point about the behavior of join operations in NoSQL databases, refer to &lt;a href=&quot;https://stackoverflow.com/a/1996579/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt;. For a more detailed comparison between NoSQL databases, refer to &lt;a href=&quot;https://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;. For informative presentations about NoSQL databases, refer to &lt;a href=&quot;https://www.youtube.com/watch?v=qI_g07C_Q5I&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this video&lt;/a&gt; and &lt;a href=&quot;https://www.youtube.com/watch?v=Y6Ev8GIlbxc&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this video&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[File System and Database Differences]]></title><description><![CDATA[In most cases, database storage is implemented using file system files, where databases are usually stored in files, which exist in filesystems. The data within a database are usually stored in files…]]></description><link>https://dkharazi.github.io/blog/fs</link><guid isPermaLink="false">https://dkharazi.github.io/blog/fs</guid><pubDate>Sun, 02 Feb 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In most cases, database storage is implemented using file system files, where databases are usually stored in files, which exist in filesystems. The data within a database are usually stored in files, which can be found in a directory structure and generally follow naming conventions assigned by the manufacturer of the DBMS.&lt;/p&gt;
&lt;h2&gt;DBMS and File Systems&lt;/h2&gt;
&lt;p&gt;The major differences between a database and a file system are outlined in &lt;a href=&quot;https://qr.ae/pvgfHr&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this useful post&lt;/a&gt;. First, database engines organize data into files containing records as their data model. These records are obviously represented as byte sequences at the lowest level, but abstracting these sequences as records allows the DBMS to include additional querying functionality. On the other hand, file systems organize data into files and manage them as byte sequences, leading to less querying functionality since there is little awareness of the file contents beyond the byte level.&lt;/p&gt;
&lt;p&gt;Not only does managing records introduce additional querying functionality, but it also introduces search functionality in most cases. Since databases create a layer of abstraction by using records as their data model, they can retrieve and modify those records more easily using search capabilities managed by the database engine. For example, a DBMS may allow users to search for records via SQL or key-value pairs, whereas most basic file systems don&apos;t come equipped with search functionality that is as detailed.&lt;/p&gt;
&lt;p&gt;In the end, storing data is just storing data. A file just represents some bytes at the lowest level, but it is a storage abstraction. A database is a storage abstraction, and a file system is a storage abstraction. Databases ultimately store their data as files behind the scenes, and file systems store data as files too. Databases and basic file systems usually differ by how much additional functionality is offered, where databases typically offer more querying and search functionalities. More details about the differences in functionalities can be found &lt;a href=&quot;https://stackoverflow.com/a/69118380/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;in this post&lt;/a&gt;.&lt;/p&gt;
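&lt;p&gt;The record abstraction can be illustrated with a short sketch. The fixed-width layout below is an assumption made purely for the example and does not mirror the page format of any real DBMS. The same bytes are meaningless to a file system, but once they are interpreted as records they can be searched by field.&lt;/p&gt;

```python
import struct

# Hypothetical record layout: 4-byte unsigned id followed by a 16-byte name.
RECORD = struct.Struct("!I16s")


def encode(records):
    """Serialize (id, name) records into one flat byte sequence, as a file holds them."""
    return b"".join(
        RECORD.pack(record_id, name.encode("utf-8")) for record_id, name in records
    )


def scan(raw):
    """Reinterpret the bytes as records; this is the abstraction a DBMS adds."""
    for record_id, name in RECORD.iter_unpack(raw):
        yield record_id, name.rstrip(b"\x00").decode("utf-8")


def find_by_name(raw, name):
    """Search on a field, which only makes sense at the record level."""
    return [record_id for record_id, found in scan(raw) if found == name]


raw = encode([(1, "ana"), (2, "bo"), (3, "ana")])
matches = find_by_name(raw, "ana")
```

&lt;p&gt;A file system would only know that &lt;code&gt;raw&lt;/code&gt; is a run of bytes, whereas the record-aware functions can answer a question about its contents.&lt;/p&gt;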
&lt;h2&gt;File Storage and Object Storage&lt;/h2&gt;
&lt;p&gt;Both file and object storage are different types of storage used for saving and managing data as files with metadata. For example, HDFS is a distributed file storage and can use Hive for storing metadata, and most object stores maintain metadata on their objects as well. Traditional object storage typically stores data as binary files and uses HTTP for accessing the data, whereas file stores hold data formatted in many different file types and can be accessed using SFTP in most cases. For specific examples of open-source object stores (that aren&apos;t AWS S3), refer to &lt;a href=&quot;https://github.com/okhosting/awesome-storage&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this repository on GitHub&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;Most databases today use file storage by default, although some databases can be configured to use block storage. Again, most databases are file stores with some extra functionality, such as querying capabilities, search functionality, tracking more extensive metadata, inclusion of a GUI, etc. Relational and NoSQL databases physically store data in a semi-similar fashion, meaning they both chop up data into partitions and store these partitions as binary files. They really just differ on how they abstract data, or how they organize data into logical data models internally.&lt;/p&gt;
&lt;p&gt;Relational databases, such as SQL server, partition data files into &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;8&lt;/mn&gt;&lt;mi&gt;K&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;8K&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07153em;&quot;&gt;K&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; binary files with some metadata by default. An RDBMS logically stores data into tables. On the other hand, file system storage, such as HDFS, partitions data files into &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;128&lt;/mn&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mi&gt;B&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;128MB&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05017em;&quot;&gt;B&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; binary files with some 
metadata by default. A file system logically stores data into files in a fairly standard, hierarchical file system with directories in most cases. For HDFS, the Hive add-on can provide a SQL-like interface by creating additional metadata for the files. Technically, HDFS isn&apos;t considered a database, since it&apos;s a distributed file system.&lt;/p&gt;
&lt;p&gt;One example of a non-relational database is a document database. MongoDB is a common document database that partitions data files into BSON (or binarized JSON files) with some metadata. In this case, a document database logically stores data as documents of field-value pairs in physical binary files. Another example of a non-relational database is a columnar store. Cassandra is a common columnar database that partitions data files into binary files with some metadata. In this case, a columnar non-relational database logically stores data as tables. Other examples of columnar databases include Google&apos;s Bigtable, Snowflake, and others.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Database Sharding]]></title><description><![CDATA[Query optimization, indexing, and NoSQL solutions are all popular scalability strategies when designing server-side systems. If those options aren't enough, then sharding may be the next best strategy…]]></description><link>https://dkharazi.github.io/blog/shard</link><guid isPermaLink="false">https://dkharazi.github.io/blog/shard</guid><pubDate>Mon, 06 Jan 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Query optimization, indexing, and NoSQL solutions are all popular scalability strategies when designing server-side systems. If those options aren&apos;t enough, then sharding may be the next best strategy for optimizing or scaling a monolithic RDBMS. For more information about techniques used in indexing, refer to the &lt;a href=&quot;/blog/hash/&quot;&gt;previous post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sharding refers to the process of separating a large table into smaller subsets, which are spread across multiple servers. These smaller chunks are called &lt;em&gt;shards&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In particular, sharding is commonly used for horizontally scaling a monolithic RDBMS. That is, a shard is a horizontal data partition, where each partition contains a subset of the rows in the larger table. When the data within a table grows too large for a single server, we can use sharding to store subsets of data on multiple nodes.&lt;/p&gt;
&lt;p&gt;Other reasons for implementing sharding relate to limitations in memory capacity and compute power. When data grows very large in an unsharded database, both maintenance and query performance become slow. Note that vertical scaling has its own limitations, since only so many resources can be added to a single server to support database operations.&lt;/p&gt;
&lt;p&gt;As stated previously, shards are stored on database nodes within a cluster. A &lt;a href=&quot;https://www.mysql.com/products/cluster/scalability.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;MySQL cluster&lt;/a&gt; automatically partitions tables (or shards) across nodes. By doing this, MySQL enables the database to scale horizontally on cheap, commodity hardware.&lt;/p&gt;
&lt;p&gt;With the improvements of sharding mentioned previously, we can generally say that sharding improves the scalability and availability of an RDBMS. Specifically, separating a much larger table into smaller shards will increase the amount of compute capacity for serving incoming queries. Therefore, we&apos;ll end up with faster query response times and index builds. Similarly, creating a strategy involving horizontal scaling increases the storage capacity within the database. With respect to cost, a network of smaller and cheaper servers may be more cost effective in the long term than maintaining one big server.&lt;/p&gt;
&lt;p&gt;Additionally, sharding can offer increased availability. During downtime, the data within an unsharded database is inaccessible. In a sharded database, a node experiencing downtime only makes its own shards unavailable. Meaning, any nodes that remain online will still be available for read and write operations.&lt;/p&gt;
&lt;p&gt;Essentially, sharding can be implemented by taking partitions and storing them on separate nodes. Then, we need to think about how to shard our data, which is usually performed on a column. In most cases, we shard a table on the primary key. For example, we could shard a dataset on a &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;userid&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{userid}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;userid&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; or &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mtext&gt;locid&lt;/mtext&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\text{locid}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord text&quot;&gt;&lt;span class=&quot;mord&quot;&gt;locid&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.&lt;/p&gt;
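&lt;p&gt;As a minimal sketch of the idea above, the Python snippet below routes rows to shards by hashing a shard key. The shard count, the example rows, and the helper name &lt;code&gt;shard_for&lt;/code&gt; are assumptions made for illustration. A real sharded database would handle this routing internally.&lt;/p&gt;

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size, chosen for illustration


def shard_for(userid, num_shards=NUM_SHARDS):
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.md5(str(userid).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Group a few made-up rows by the shard that would store them.
rows = [{"userid": uid, "locid": uid % 3} for uid in range(10)]
shards = {i: [] for i in range(NUM_SHARDS)}
for row in rows:
    shards[shard_for(row["userid"])].append(row)

for index, stored in shards.items():
    print(index, [row["userid"] for row in stored])
```

&lt;p&gt;Every row with the same userid lands on the same shard, so a lookup by userid only needs to contact one node. Note, a plain modulo like this remaps most keys whenever the number of shards changes.&lt;/p&gt;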
&lt;p&gt;Typically, join operations become a problem once we separate our data into shards. Specifically, a join performs poorly when it needs to gather data from different shards across the network, which is costly.&lt;/p&gt;
&lt;p&gt;In addition, there needs to be a flexible way of adding and removing shards within a cluster. Consistent hashing, which is mentioned in the &lt;a href=&quot;/blog/hash/&quot;&gt;previous post&lt;/a&gt;, can help solve this problem. Specifically, consistent hashing can be used for indexing data into shards. Alternatively, a &lt;a href=&quot;https://www.youtube.com/watch?v=5faMjKuB9bc&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;hierarchical sharding protocol&lt;/a&gt; can also help with this problem. Lastly, we can optimize the performance of queries on shards by creating an index on each shard.&lt;/p&gt;
&lt;p&gt;For a more detailed explanation about sharding, refer to &lt;a href=&quot;https://www.youtube.com/watch?v=5faMjKuB9bc&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this video&lt;/a&gt; or &lt;a href=&quot;https://blog.yugabyte.com/how-data-sharding-works-in-a-distributed-sql-database/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;. For a more detailed analysis of sharding techniques, refer to &lt;a href=&quot;http://blog.gaurav.im/2016/11/17/sharding-databases-a-quick-trick/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Consistent Hashing for Load Balancing]]></title><description><![CDATA[As you know, a hash function maps key values to index values. Typically, these functions are used to determine the location (i.e. index) of a record within a table. They have other applications, such…]]></description><link>https://dkharazi.github.io/blog/hash</link><guid isPermaLink="false">https://dkharazi.github.io/blog/hash</guid><pubDate>Fri, 03 Jan 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;As you know, a hash function maps key values to index values. Typically, these functions are used to determine the location (i.e. index) of a record within a table. They have other applications, such as mapping an image to some hash value. Hash functions are used on a much broader scale for password verification and detection of changes to data. The following image illustrates a few use cases for hash functions.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/69b4f11b67ed2152ab9c04e5b65300d3/hash1.svg&quot; alt=&quot;HashUseCases&quot;&gt;&lt;/p&gt;
&lt;p&gt;A hash function operates on the binary representation of whatever medium is being hashed. For example, a song can be represented as binary code, which can be input into a hash function. Similarly, an image can be represented as binary code, which can be input into a hash function as well. Likewise, strings, integers, etc. can all be represented as binary code, which is the input of a hash function.&lt;/p&gt;
&lt;p&gt;In terms of load balancing, a load balancer receives a request from a client and returns a response from a designated server on our network. If we make multiple servers available to the client, then the load balancer needs to determine which server will provide the client with the fastest possible response time. Typically, this process involves equally distributing the number of requests to the servers within our network. The traditional method for solving this problem involves the use of the modulo operator. For a more detailed explanation about basic hashing, refer to &lt;a href=&quot;https://www.youtube.com/watch?v=tHEyzVbl4bg&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this video&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/678d7d0050ddc7fae5505768c60fceb3/hash2.svg&quot; alt=&quot;BasicHashing&quot;&gt;&lt;/p&gt;
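&lt;p&gt;The modulo approach can be sketched in a few lines of Python. The server names, the request ids, and the use of MD5 as the hash function are assumptions made for illustration. The snippet also counts how many requests would be sent to a different server after a fifth server is added.&lt;/p&gt;

```python
import hashlib


def stable_hash(key):
    """Hash a string to a large integer that is stable across runs."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def pick_server(request_id, servers):
    """Classic hashing: hash the request id, then take it modulo the server count."""
    return servers[stable_hash(request_id) % len(servers)]


servers = ["s0", "s1", "s2", "s3"]  # hypothetical server names
requests = ["req-%d" % i for i in range(1000)]

before = {r: pick_server(r, servers) for r in requests}
after = {r: pick_server(r, servers + ["s4"]) for r in requests}

moved = sum(1 for r in requests if before[r] != after[r])
print("requests remapped:", moved, "of", len(requests))
```

&lt;p&gt;With uniformly distributed hash values, going from four to five servers remaps about four in five requests in expectation, since a request keeps its server only when its hash has the same remainder for both server counts. This large reshuffle is the weakness that consistent hashing addresses.&lt;/p&gt;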
&lt;p&gt;Consistent hashing is commonly used for caching and load balancing, as in &lt;a href=&quot;https://eng.uber.com/ringpop-open-source-nodejs-library/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Uber&apos;s open-sourced RingPop project&lt;/a&gt;. Broadly speaking, consistent hashing maps objects to the same cache machine as far as possible. When a machine used for caching is added to the network, it takes its share of objects from the other cache machines. When it is removed, its objects are shared among the remaining machines.&lt;/p&gt;
&lt;p&gt;The primary notion behind consistent hashing involves associating each cache with one or more hash value intervals, where the interval boundaries are determined by calculating the hash of each cache identifier. Once a cache is removed, its interval is taken over by a cache with an adjacent interval, while all the remaining caches are unchanged.&lt;/p&gt;
&lt;p&gt;More specifically, this process can be defined using the illustration below, and in &lt;a href=&quot;https://blog.carlosgaldino.com/consistent-hashing.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;these illustrations&lt;/a&gt;. First, we implement a hash ring containing an &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M-1&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.76666em;vertical-align:-0.08333em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.64444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; amount of hashed request ids. Then, we map the available servers in our network on the ring uniformly. Each server is associated with its own hashed request id, which represents a boundary. Once a load balancer receives a request, it maps the hashed request onto the hash ring. Then, this hashed request is mapped to a server immediately clockwise to it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/c2ec4a9f9b20fa59da6b56b34382206c/hash3.svg&quot; alt=&quot;ConsistentHashing&quot;&gt;&lt;/p&gt;
&lt;p&gt;Since a request is mapped to the nearest server in the clockwise direction, the addition or removal of a server only affects the requests that fall within that server&apos;s interval, rather than nearly all requests. In theory, we can also expect the load to be equally distributed on average, since the hash values are assigned randomly and uniformly.&lt;/p&gt;
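&lt;p&gt;The ring lookup described above can be sketched with a sorted list and a binary search. This is a minimal illustration, not a production implementation. The ring size, the class name &lt;code&gt;HashRing&lt;/code&gt;, the server names, and the use of MD5 are all assumptions, and the sketch ignores the unlikely case of two servers hashing to the same position.&lt;/p&gt;

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # hypothetical number of positions on the ring


def ring_hash(key):
    """Hash a string to a position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


class HashRing:
    def __init__(self, servers=()):
        self._points = []  # sorted server positions on the ring
        self._owner = {}   # maps a position to its server name
        for server in servers:
            self.add(server)

    def add(self, server):
        point = ring_hash(server)
        bisect.insort(self._points, point)
        self._owner[point] = server

    def remove(self, server):
        point = ring_hash(server)
        self._points.remove(point)
        del self._owner[point]

    def lookup(self, request_id):
        # Find the first server position clockwise from the request,
        # wrapping around to the start of the ring when needed.
        idx = bisect.bisect_left(self._points, ring_hash(request_id))
        return self._owner[self._points[idx % len(self._points)]]


ring = HashRing(["s0", "s1", "s2", "s3"])
requests = ["req-%d" % i for i in range(1000)]

before = {r: ring.lookup(r) for r in requests}
ring.remove("s2")
after = {r: ring.lookup(r) for r in requests}

moved = [r for r in requests if before[r] != after[r]]
print("requests remapped:", len(moved), "of", len(requests))
```

&lt;p&gt;Only the requests that were previously served by the removed server are remapped. Every other request keeps its server, since the first server clockwise from it hasn&apos;t changed.&lt;/p&gt;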
&lt;p&gt;In practice, there is a greater chance of having a non-uniform distribution of requests between servers, since we often have a small number of servers on our hash ring. The image below illustrates a scenario where nodes are removed from our ring, creating a non-uniform distribution.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/6ed023279411a47aef49508b2619fc66/hash4.svg&quot; alt=&quot;ConsistentHashingRemoval&quot;&gt;&lt;/p&gt;
&lt;p&gt;To prevent non-uniform distribution from occurring in our hash ring, we can introduce the idea of virtual nodes. Virtual nodes refer to multiple instances of a server on the hash ring. By using &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;k&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;k&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03148em;&quot;&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; hash functions, each server corresponds to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;k&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;k&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.69444em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03148em;&quot;&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; hash values on the ring. This spreads each server&apos;s share of the ring across several smaller intervals, which tends to even out the load. 
For a more detailed understanding of consistent hashing and its use cases, refer to &lt;a href=&quot;https://www.youtube.com/watch?v=zaRkONvyGr8&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this video&lt;/a&gt; and &lt;a href=&quot;https://tom-e-white.com/2007/11/consistent-hashing.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article by Tom White&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/c6ec68e93f83b772ccbd4cebd40a6a4a/hash5.svg&quot; alt=&quot;ConsistentHashingVirtual&quot;&gt;&lt;/p&gt;
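&lt;p&gt;Virtual nodes can be sketched as follows. Instead of using several separate hash functions, this sketch derives each server&apos;s ring positions by hashing the server name with a numeric suffix, which is a common simplification and an assumption on my part. The helper names, server names, and request ids are also hypothetical.&lt;/p&gt;

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(key):
    """Hash a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def build_ring(servers, k):
    """Place k virtual nodes per server on the ring, sorted by position."""
    pairs = sorted(
        (ring_hash("%s#%d" % (server, i)), server)
        for server in servers
        for i in range(k)
    )
    points = [point for point, _ in pairs]
    owners = [server for _, server in pairs]
    return points, owners


def lookup(ring, request_id):
    """Return the server owning the first virtual node clockwise from the request."""
    points, owners = ring
    idx = bisect.bisect_left(points, ring_hash(request_id)) % len(points)
    return owners[idx]


def load_counts(servers, k, n_requests):
    """Count how many of n_requests made-up requests each server receives."""
    ring = build_ring(servers, k)
    return Counter(lookup(ring, "req-%d" % i) for i in range(n_requests))


servers = ["s0", "s1", "s2", "s3"]
for k in (1, 10, 100):
    print(k, dict(load_counts(servers, k, 10000)))
```

&lt;p&gt;Comparing the printed counts for different values of k shows how the number of virtual nodes changes the balance between servers. Larger values of k typically give a more even split, at the cost of a larger ring to store and search.&lt;/p&gt;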
&lt;p&gt;Consistent hashing has a few advantages over standard hashing. In particular, it involves a minimal amount of data transfer between machines, which has been proven to be optimal. However, there are still a few areas for improvement, such as the client needing to know the number of nodes. The client also must know the location of each node on the circle. To improve upon the method mentioned above, we can implement sharding techniques, which are mentioned &lt;a href=&quot;http://blog.gaurav.im/2016/11/17/sharding-databases-a-quick-trick/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The table below illustrates the asymptotic &lt;a href=&quot;https://en.wikipedia.org/wiki/Consistent_hashing&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;time complexities&lt;/a&gt; for &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;N&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; nodes and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;K&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;K&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.68333em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07153em;&quot;&gt;K&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; keys. 
The &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;K&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;/&lt;/mi&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(K/N)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07153em;&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; complexity refers to the average cost for redistribution of keys, whereas the &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(logN)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; 
style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; complexity occurs from a binary search among nodes, in order to find the next node on the ring.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Classic Hashing&lt;/th&gt;
&lt;th&gt;Consistent Hashing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Adding a node&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;K&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(K)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07153em;&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;K&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;/&lt;/mi&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(K/N + logN)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07153em;&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; 
style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Removing a node&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;K&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(K)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07153em;&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;K&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;/&lt;/mi&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(K/N + logN)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07153em;&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222222222222222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; 
style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Adding a key&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(1)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(logN)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Removing a key&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(1)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;O&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;l&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;g&lt;/mi&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;O(logN)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.02778em;&quot;&gt;O&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.01968em;&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.03588em;&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Consistent Hashing isn&apos;t only used in load balancing. It can also be found in database indexing. Generally, there are a few ways to improve the performance of an RDBMS database system. First, we may be interested in very particular performance improvements, so optimizing queries may be the extent of the problem. For broader improvements to the performance of a database, we may want to perform indexing on certain tables.&lt;/p&gt;
&lt;p&gt;For database indexing, a B-tree index is used for column comparisons in expressions that use the &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mo&gt;&amp;gt;&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;&amp;gt;&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.5782em;vertical-align:-0.0391em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mo&gt;≤&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\le&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.7719400000000001em;vertical-align:-0.13597em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;≤&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; operators. A B-tree index can also be used for LIKE comparisons. 
On the other hand, hash indexes are used only for equality comparisons, which involve the &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;=&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.36687em;vertical-align:0em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and &lt;code class=&quot;language-text&quot;&gt;&amp;lt;=&amp;gt;&lt;/code&gt; (NULL-safe equal) operators. For more information about the use cases of B-tree indexing and hash indexing, refer to the &lt;a href=&quot;https://dev.mysql.com/doc/refman/8.0/en/index-btree-hash.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;MySQL docs&lt;/a&gt;.&lt;/p&gt;
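&lt;p&gt;To make the difference between the two index types concrete, here is a minimal Python sketch. It is not how MySQL implements its indexes. It assumes a sorted list searched with &lt;code class=&quot;language-text&quot;&gt;bisect&lt;/code&gt; as a stand-in for a B-tree and a plain dictionary as a stand-in for a hash index. The column data and the function names are hypothetical.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;from bisect import bisect_right

# Hypothetical column of ages, keyed by row id.
rows = {1: 31, 2: 25, 3: 47, 4: 25, 5: 38}

# Stand-in for a B-tree index: (value, row id) pairs kept in sorted order.
sorted_index = sorted((age, row_id) for row_id, age in rows.items())

# Stand-in for a hash index: maps each value to a list of row ids.
hash_index = {}
for row_id, age in rows.items():
    hash_index.setdefault(age, []).append(row_id)

def rows_greater_than(value):
    # Range lookup: binary search for the first entry above value.
    start = bisect_right(sorted_index, (value, float(&quot;inf&quot;)))
    return [row_id for _, row_id in sorted_index[start:]]

def rows_equal_to(value):
    # Equality lookup: a single dictionary access.
    return sorted(hash_index.get(value, []))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The sorted structure answers a range query by returning one contiguous slice, which the dictionary cannot do without scanning every key. The dictionary, in turn, answers an equality query with a single lookup.&lt;/p&gt;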
&lt;p&gt;There are a few other improvements we could implement. For example, we may be interested in translating our SQL database over to a NoSQL database, &lt;a href=&quot;https://softwareengineering.stackexchange.com/a/175546&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;depending on the structure&lt;/a&gt; of data being saved. In most cases, it will not be worthwhile to make this change. Lastly, we can implement a data sharding strategy, which is discussed more in the &lt;a href=&quot;/blog/shard/&quot;&gt;next post&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Scaling a Pizza Chain]]></title><description><![CDATA[In computer science, systems design refers to the process of defining and developing a system that satisfies certain requirements made by the user. Obviously, this involves a detailed understanding of…]]></description><link>https://dkharazi.github.io/blog/pizza</link><guid isPermaLink="false">https://dkharazi.github.io/blog/pizza</guid><pubDate>Wed, 01 Jan 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In computer science, systems design refers to the process of defining and developing a system that satisfies certain requirements made by the user. Obviously, this involves a detailed understanding of the components within a system and how they interact with each other. In particular, this process usually involves knowing the input requirements, output requirements, storage requirements, processing requirements, etc. Specifically, the following concepts can help improve a particular system being managed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#running-a-single-pizza-shop&quot;&gt;Vertical scaling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#preparing-pizzas-in-advance&quot;&gt;Scheduling batch jobs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#recovering-from-failures-quickly&quot;&gt;Assigning backup servers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#hiring-additional-chefs&quot;&gt;Horizontal scaling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#adding-additional-pizza-shops&quot;&gt;Developing a distributed system&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#efficiently-managing-pizza-orders&quot;&gt;Load balancing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#monitoring-the-performance-of-our-shops&quot;&gt;Logging and calculating metrics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Interestingly, these same concepts can be applied to many problems outside of computer science, such as running a pizza chain efficiently. This post was motivated by &lt;a href=&quot;https://www.youtube.com/watch?v=SqcXvc3ZmRU&amp;#x26;list=PLMCXHnjXnTnvo6alSjVkgxV-VH6EPyvoX&amp;#x26;index=2&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this great video&lt;/a&gt;, so please refer to it for more details.&lt;/p&gt;
&lt;h2&gt;Running a Single Pizza Shop&lt;/h2&gt;
&lt;p&gt;In this example, we&apos;ll consider a single pizza parlor as a cluster, whereas a single chef is a single computer. Then, we can think of an order as a request to our server. In the technical sense, horizontal scaling would imply hiring additional chefs for our pizza parlor. On the other hand, vertical scaling would imply purchasing additional technology for an individual chef to help them handle more work.&lt;/p&gt;
&lt;p&gt;In a distributed system, our goal is to increase the throughput of incoming requests by optimizing resources and running processes. One way of achieving this in our pizza shop analogy is through vertical scaling, or expanding the limits of a chef so they can handle more work. This improvement to the system is analogous to the change to the pizza shop illustrated below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/b655727c787f514d991e0f2491057c1d/pizza1.svg&quot; alt=&quot;VerticalScaling&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Preparing Pizzas in Advance&lt;/h2&gt;
&lt;p&gt;In most cases, the peak hours of a pizza shop are during the day. To ease the workload on our chef during these peak hours, he or she can prepare pizzas at night, after hours, when there are few orders to handle.&lt;/p&gt;
&lt;p&gt;In computer science, this could look like setting up batch jobs to process large amounts of data overnight. Usually, these take the form of cron jobs scheduled for off-peak hours, such as 3AM-4AM. Specifically, we wouldn&apos;t want to run these batch jobs during the day, since this could harm performance for users querying our database.&lt;/p&gt;
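&lt;p&gt;As a rough sketch of the scheduling idea, the Python function below computes when the next nightly run should start. The function name &lt;code class=&quot;language-text&quot;&gt;next_batch_run&lt;/code&gt; and the default start hour of 3 are assumptions made for illustration. In practice, a scheduler such as cron would usually handle this.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;from datetime import datetime, timedelta

def next_batch_run(now, hour=3):
    # Return the next datetime at which the nightly batch job should start.
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if now &gt;= candidate:
        # The start time for today has already passed, so wait until tomorrow.
        candidate += timedelta(days=1)
    return candidate&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For example, a call made at 2:30PM returns 3AM on the following day, whereas a call made at 1AM returns 3AM on the same day.&lt;/p&gt;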
&lt;h2&gt;Recovering from Failures Quickly&lt;/h2&gt;
&lt;p&gt;In any system, we want to avoid having a single point of failure. In other words, our system should be resilient. When managing a pizza store with a single chef, there should be one or more backup chefs on call, in case the chef calls in sick. Otherwise, the business would shut down for the day, making the chef a single point of failure. In our system, we would want to create backup servers for this case. Then, a backup server would fill in for our worker server if it fails.&lt;/p&gt;
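&lt;p&gt;A failover of this kind can be sketched in a few lines of Python. This is a simplified illustration, assuming each server is a callable that raises &lt;code class=&quot;language-text&quot;&gt;ConnectionError&lt;/code&gt; when it is unavailable. The names &lt;code class=&quot;language-text&quot;&gt;handle_order&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;sick_chef&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;backup_chef&lt;/code&gt; are hypothetical.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;def handle_order(order, servers):
    # Try the primary first, then each backup, in the order given.
    for server in servers:
        try:
            return server(order)
        except ConnectionError:
            # This server is unavailable, so move on to the next one.
            continue
    raise RuntimeError(&quot;no server available for order: &quot; + order)

def sick_chef(order):
    # Hypothetical primary that is always unavailable.
    raise ConnectionError(&quot;chef called in sick&quot;)

def backup_chef(order):
    # Hypothetical backup that always succeeds.
    return &quot;backup prepared &quot; + order&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Calling &lt;code class=&quot;language-text&quot;&gt;handle_order&lt;/code&gt; with the sick chef listed before the backup chef still returns a prepared order, since the backup fills in after the primary fails.&lt;/p&gt;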
&lt;h2&gt;Hiring Additional Chefs&lt;/h2&gt;
&lt;p&gt;Clearly, employing only a single chef at our pizza shop becomes a greater problem if we need to handle a much larger number of pizza orders. Hiring additional chefs can also prevent a single point of failure, since our shop can still function if one chef calls off. In other words, we can hire more pizza chefs to scale out our pizza shop. &lt;/p&gt;
&lt;p&gt;In a distributed system, this concept translates to purchasing additional servers to handle more requests. This improvement to the system is analogous to the change to the pizza shop illustrated below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/969017e03a7bd929f63f2136cbe87eeb/pizza2.svg&quot; alt=&quot;HorizontalScaling&quot;&gt;&lt;/p&gt;
&lt;p&gt;Vertical scaling and horizontal scaling are typically used together in a system. When a system faces a problem requiring either horizontal or vertical scaling, we can make the appropriate changes based on the properties of the problem. In particular, we can use the general properties listed below to determine which type of scaling should be applied.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Horizontal&lt;/th&gt;
&lt;th&gt;Vertical&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Load Balancing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resilient&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Single point of failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed of Communication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Slower (RPC)&lt;/td&gt;
&lt;td&gt;Faster (IPC)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Inconsistent&lt;/td&gt;
&lt;td&gt;Consistent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scalability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scales well as users increase&lt;/td&gt;
&lt;td&gt;Hardware limits&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Adding Additional Pizza Shops&lt;/h2&gt;
&lt;p&gt;For creating a truly scalable pizza chain, there should be more than one pizza shop. To illustrate this point, suppose our shop experiences a power outage for an entire day. In this case, our pizza shop couldn&apos;t accept any orders and would need to close down for the day. Expanding our pizza shop to other locations could help solve this problem.&lt;/p&gt;
&lt;p&gt;By building additional shops, we can still handle orders in case a single shop experiences a power outage. Also, pizzas can be delivered at a faster rate if orders are handled at a shop closer to the customer. These pizza shops need to be able to communicate with each other, but should be able to handle orders on their own in case one shop isn&apos;t available.&lt;/p&gt;
&lt;p&gt;Adding additional shops is analogous to creating new servers or even clusters in a distributed system. Similarly, these servers need to be able to communicate with each other. Creating new servers or clusters leads to a more fault-tolerant and responsive distributed system. For example, AWS will include multiple servers from different datacenters within their clusters, which leads to faster response times.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/f8cc1001e31820cc519de9a06d57a8a1/pizza3.svg&quot; alt=&quot;DistributedPizzaStores&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Efficiently Managing Pizza Orders&lt;/h2&gt;
&lt;p&gt;Ideally, customers would submit an order for delivery to an entry point for our delivery services, which could be a site, app, or specific number. Then, a manager would determine a store that efficiently prepares the pizza and delivers it in the fastest possible time frame. The image below illustrates a load balancer as a central office for our delivery services. &lt;/p&gt;
&lt;p&gt;In a distributed system, the component responsible for managing the load and forwarding of a request is known as a &lt;em&gt;load balancer&lt;/em&gt;. Specifically, a load balancer is responsible for dispatching requests based on statistics saved by it or some other external agent. In other words, a load balancer will receive a request and determine the optimal node that should process the request.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/1784658686eb5e369b490df8ccabefaf/pizza4.svg&quot; alt=&quot;LoadBalancer&quot;&gt;&lt;/p&gt;
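&lt;p&gt;The dispatching logic described above can be sketched in Python. This is only one possible policy, assuming the statistic saved by the load balancer is the number of open orders per shop and that each new order goes to the least busy shop. The class and shop names are hypothetical.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;class LoadBalancer:
    def __init__(self, shops):
        # Track the number of open orders per shop.
        self.open_orders = {shop: 0 for shop in shops}

    def dispatch(self, order):
        # Pick the shop with the fewest open orders (ties go to the first listed).
        shop = min(self.open_orders, key=self.open_orders.get)
        self.open_orders[shop] += 1
        return shop

    def complete(self, shop):
        # Record that a shop finished an order.
        self.open_orders[shop] -= 1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Other policies, such as round robin or routing by distance to the customer, would only change the body of &lt;code class=&quot;language-text&quot;&gt;dispatch&lt;/code&gt;.&lt;/p&gt;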
&lt;h2&gt;Monitoring the Performance of our Shops&lt;/h2&gt;
&lt;p&gt;As we build even more shops and hire additional chefs to handle the demand for our delicious pizza, monitoring the performance, decisions, and actions of our employees and shops becomes difficult. As a result, logging the decisions and calculating metrics on these components becomes critical. Automating behavior based on these logs becomes even more critical if we want to build a truly scalable pizza shop.&lt;/p&gt;
&lt;p&gt;Similarly, we should log information, warnings, and errors within our system so that we can measure its behavior. In other words, creating a system for logging is important to ensure nodes and clusters are behaving as expected.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Testing Spark Applications with Mesos]]></title><description><![CDATA[This post walks through an example of running a cluster using a Mesos cluster manager on Mac OS. In the coming posts, we'll explore other examples, including clusters running a standalone cluster…]]></description><link>https://dkharazi.github.io/blog/spark-mesos</link><guid isPermaLink="false">https://dkharazi.github.io/blog/spark-mesos</guid><pubDate>Thu, 04 Jul 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This post walks through an example of running a cluster using a Mesos cluster manager on Mac OS. In the coming posts, we&apos;ll explore other examples, including clusters running a &lt;a href=&quot;/blog/spark-standalone/&quot;&gt;standalone&lt;/a&gt; cluster manager and a cluster manager in &lt;a href=&quot;/blog/spark-yarn/&quot;&gt;YARN&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#describing-the-mesos-architecture&quot;&gt;Describing the Mesos Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#comparing-mesos-and-standalone-architectures&quot;&gt;Comparing Mesos and Standalone Architectures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#setting-up-mesos&quot;&gt;Setting up Mesos&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#setting-up-a-sparksession&quot;&gt;Setting up a SparkSession&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#launching-mesos-daemons&quot;&gt;Launching Mesos Daemons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#launching-spark-daemons&quot;&gt;Launching Spark Daemons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#accessing-web-ui-for-daemons&quot;&gt;Accessing Web UI for Daemons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#launching-applications-in-client-mode&quot;&gt;Launching Applications in Client Mode&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Describing the Mesos Architecture&lt;/h2&gt;
&lt;p&gt;The Mesos architecture is arguably closer to the standalone architecture than to the YARN architecture. This is because the essential components of the Mesos architecture are just a master and workers. In the Mesos architecture, a master schedules worker resources for applications that need to use them. Then, workers launch executors for applications, which execute tasks.&lt;/p&gt;
&lt;p&gt;Mesos can run Docker containers. As a result, Mesos essentially can run any application that can be set up in a Docker container, which includes Spark applications.&lt;/p&gt;
&lt;p&gt;Generally, Mesos is more powerful than a cluster with a Spark standalone cluster manager. Mesos can be used for applications other than Spark, as well. Specifically, it can be used for Java, Scala, Python, and other applications. It can also do more than schedule CPU and RAM resources. In particular, it is capable of scheduling disk space, network ports, etc.&lt;/p&gt;
&lt;p&gt;Spark applications running on Mesos consist of two components: a scheduler and an executor. The scheduler accepts or rejects CPU and RAM resources offered by the Mesos master. Then, the master asks Mesos workers to start the tasks, and the workers launch executors. Lastly, Mesos executors run tasks as requested by the scheduler.&lt;/p&gt;
&lt;h2&gt;Comparing Mesos and Standalone Architectures&lt;/h2&gt;
&lt;p&gt;Resource scheduling is the most notable difference between the Mesos and standalone architectures. Specifically, a standalone cluster manager &lt;em&gt;automatically&lt;/em&gt; assigns resources to applications. A Mesos cluster manager &lt;em&gt;optionally&lt;/em&gt; offers resources to applications. In this case, an application can accept or refuse the resources.&lt;/p&gt;
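&lt;p&gt;The offer-based model can be illustrated with a toy Python function. This is not the Mesos scheduler API. It assumes each offer is a dictionary with hypothetical &lt;code class=&quot;language-text&quot;&gt;id&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;cpus&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;mem_mb&lt;/code&gt; keys, and that the application accepts offers until its request is covered.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;def choose_offers(offers, cpus_needed, mem_needed_mb):
    # Accept offers until the request is covered, then decline the rest.
    accepted, declined = [], []
    for offer in offers:
        if cpus_needed &gt; 0 or mem_needed_mb &gt; 0:
            accepted.append(offer[&quot;id&quot;])
            cpus_needed -= offer[&quot;cpus&quot;]
            mem_needed_mb -= offer[&quot;mem_mb&quot;]
        else:
            declined.append(offer[&quot;id&quot;])
    return accepted, declined&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In contrast, a standalone cluster manager would make this decision itself, and the application would have no chance to refuse.&lt;/p&gt;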
&lt;h2&gt;Setting up Mesos&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Install Mesos&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;brew &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt; mesos&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the ZooKeeper dependency&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;brew &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt; zookeeper&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optionally, assign IP addresses for masters in &lt;code class=&quot;language-text&quot;&gt;/etc/mesos/zk&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;zk://192.0.2.1:2181,192.0.2.2:2181/mesos&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Setting up a SparkSession&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Download &lt;a href=&quot;https://apache.claz.org/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Spark 2.4.6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Add the path for Spark in &lt;code class=&quot;language-text&quot;&gt;.bash_profile&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;export SPARK_HOME=./spark-2.4.6-bin-hadoop2.7&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Create the file &lt;code class=&quot;language-text&quot;&gt;./conf/spark-defaults.conf&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;spark.master=mesos://localhost:5050
spark.executor.memory=512m
spark.eventLog.enabled=true
spark.eventLog.dir=./tmp/spark-events/
spark.history.fs.logDirectory=./tmp/spark-events/
spark.driver.memory=5g&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Create a Spark application:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;# test.py&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; pyspark &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; SparkContext
&lt;span class=&quot;token builtin&quot;&gt;file&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;~/data.txt&quot;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;# path of data&lt;/span&gt;
masterurl &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;mesos://localhost:5050&apos;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;# Mesos master, matching --master in spark-submit&lt;/span&gt;
sc &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; SparkContext&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;masterurl&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;myapp&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
data &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;textFile&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;cache&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
num_a &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;lambda&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;a&apos;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;count&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;num_a&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;stop&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Launching Mesos Daemons&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Start the master:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;sudo&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;service&lt;/span&gt; mesos-master start&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start a worker:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;sudo&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;service&lt;/span&gt; mesos-slave start&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stop the master:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;sudo&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;service&lt;/span&gt; mesos-master stop&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stop a worker:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;sudo&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;service&lt;/span&gt; mesos-slave stop&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Launching Spark Daemons&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Start a master daemon in standalone mode&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/start-master.sh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start a worker daemon&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/start-slave.sh spark://localhost:7077&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start a history daemon&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/start-history-server.sh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start a Spark application&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./bin/spark-submit &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
--master mesos://localhost:5050 &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
test.py&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stop the daemons&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/stop-master.sh
$ ./sbin/stop-slave.sh
$ ./sbin/stop-history-server.sh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Accessing Web UI for Daemons&lt;/h2&gt;
&lt;p&gt;Mesos provides a web UI for each initialized daemon. By default, Mesos serves the web UI for the master on port &lt;code class=&quot;language-text&quot;&gt;5050&lt;/code&gt;. The workers can take on different ports and can be accessed via the master web UI. The Spark history server can be accessed on port &lt;code class=&quot;language-text&quot;&gt;18080&lt;/code&gt; by default. The table below summarizes the default ports for each web UI.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Daemon&lt;/th&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mesos Master&lt;/td&gt;
&lt;td&gt;&lt;code class=&quot;language-text&quot;&gt;5050&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spark History&lt;/td&gt;
&lt;td&gt;&lt;code class=&quot;language-text&quot;&gt;18080&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Launching Applications in Client Mode&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Mesos workers offer their resources to the master&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Mesos scheduler registers with the Mesos master&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Mesos master is located in the cluster&lt;/li&gt;
&lt;li&gt;The Mesos scheduler is Spark&apos;s Mesos-specific scheduler&lt;/li&gt;
&lt;li&gt;The Mesos scheduler runs in the driver&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Mesos master offers available resources to the Mesos scheduler&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This happens continuously&lt;/li&gt;
&lt;li&gt;The offer is sent out every second while the master is alive&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The Mesos scheduler accepts some of the resources&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Mesos scheduler sends metadata about the resources to the Mesos master&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This metadata includes information about these resources and the tasks that will run on them&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The Mesos master asks the workers to start the tasks with the specified resources&lt;/li&gt;
&lt;li&gt;The Mesos workers launch Mesos executors&lt;/li&gt;
&lt;li&gt;The Mesos executors launch Spark executors, which run tasks&lt;/li&gt;
&lt;li&gt;The Spark executors communicate with the Spark driver&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;/ba2baf777014cc5fa9fcf0f80c6fffbc/mesos-client.svg&quot; alt=&quot;MesosClient&quot;&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Testing Spark Applications with YARN]]></title><description><![CDATA[This post walks through an example of running a cluster using a YARN cluster manager on Mac OS. In the coming posts, we'll explore other examples, including clusters running a standalone cluster…]]></description><link>https://dkharazi.github.io/blog/spark-yarn</link><guid isPermaLink="false">https://dkharazi.github.io/blog/spark-yarn</guid><pubDate>Mon, 17 Jun 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This post walks through an example of running a cluster using a YARN cluster manager on Mac OS. In the coming posts, we&apos;ll explore other examples, including clusters running a &lt;a href=&quot;/blog/spark-standalone/&quot;&gt;standalone&lt;/a&gt; cluster manager and &lt;a href=&quot;/blog/spark-mesos/&quot;&gt;Mesos&lt;/a&gt; cluster manager.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#describing-the-yarn-architecture&quot;&gt;Describing the YARN Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#comparing-yarn-and-standalone-architectures&quot;&gt;Comparing YARN and Standalone Architectures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#setting-up-hadoop&quot;&gt;Setting up Hadoop&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#setting-up-a-sparksession&quot;&gt;Setting up a SparkSession&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#launching-yarn-daemons&quot;&gt;Launching YARN Daemons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#launching-spark-daemons&quot;&gt;Launching Spark Daemons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#accessing-web-ui-for-daemons&quot;&gt;Accessing Web UI for Daemons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#launching-applications-in-client-mode&quot;&gt;Launching Applications in Client Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#launching-applications-in-cluster-mode&quot;&gt;Launching Applications in Cluster Mode&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Describing the YARN Architecture&lt;/h2&gt;
&lt;p&gt;Many components of the standard YARN architecture overlap with those of the standalone architecture in Spark. There are a few additional components in YARN that replace some of the daemons in the standalone architecture:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Resource Manager&lt;/li&gt;
&lt;li&gt;Node Manager&lt;/li&gt;
&lt;li&gt;Containers&lt;/li&gt;
&lt;li&gt;Application Master&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The resource manager plays the same role as the master process in Spark&apos;s standalone mode, and the node manager plays the same role as the worker process. There is a single resource manager per cluster and a single node manager per node in the cluster.&lt;/p&gt;
&lt;p&gt;Rather than representing executors and other processes as JVM instances, YARN represents them as containers. However, each container still runs as a JVM with a requested heap size. One of an application&apos;s containers hosts the application master, which is responsible for requesting application resources from the resource manager.&lt;/p&gt;
&lt;p&gt;When a Spark application runs on YARN in cluster mode, the driver process runs inside the YARN application master. In client mode, the driver stays in the client&apos;s process and the application master only requests resources. In both modes, node managers monitor the CPU and RAM used by containers and report this usage to the resource manager.&lt;/p&gt;
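&lt;p&gt;To make these relationships concrete, here is a toy Python model of the components. The class names, host names, and heap sizes are made up for illustration and are not part of any YARN API. The sketch only mirrors the bookkeeping described above: one resource manager per cluster, one node manager per node, and containers with a requested heap size.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A YARN container: a JVM started with a requested heap size."""
    role: str       # "application_master" or "executor" (hypothetical labels)
    heap_mb: int    # requested heap size in MiB


@dataclass
class NodeManager:
    """One node manager per node; it tracks the containers on its node."""
    host: str
    containers: list = field(default_factory=list)

    def launch(self, role, heap_mb):
        container = Container(role, heap_mb)
        self.containers.append(container)
        return container

    def used_mb(self):
        # The figure this node would report to the resource manager
        return sum(c.heap_mb for c in self.containers)


@dataclass
class ResourceManager:
    """One resource manager per cluster; it sees every node manager."""
    nodes: list = field(default_factory=list)

    def usage_report(self):
        return {node.host: node.used_mb() for node in self.nodes}


node_a = NodeManager("node-a")
node_b = NodeManager("node-b")
rm = ResourceManager([node_a, node_b])

node_a.launch("application_master", 512)
node_a.launch("executor", 512)
node_b.launch("executor", 512)

print(rm.usage_report())
```

&lt;p&gt;A real node manager measures actual usage rather than summing requests, so this is only a mental model of who reports to whom.&lt;/p&gt;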
&lt;h2&gt;Comparing YARN and Standalone Architectures&lt;/h2&gt;
&lt;p&gt;Although the standalone Spark cluster manager and the YARN cluster manager have a lot in common, some of the responsibilities change and JVM instances behave differently. Primarily, resource scheduling is performed by the master JVM in standalone mode, whereas it is performed by the resource manager in YARN.&lt;/p&gt;
&lt;p&gt;Executors are asked to start by the master JVM in standalone mode, whereas they are asked to start by the application master in YARN. Job scheduling is still performed by the Spark scheduler in both modes. When Spark runs on YARN in cluster mode, the Spark driver process runs inside the YARN application master. Additionally, YARN refers to its processes as containers, whereas standalone mode refers to them as JVM instances.&lt;/p&gt;
&lt;h2&gt;Setting up Hadoop&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Install Hadoop:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;brew &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt; hadoop&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Install a Java version supported by Hadoop 3.0:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;brew cask &lt;span class=&quot;token function&quot;&gt;install&lt;/span&gt; java8&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Configure the path of the Java installation used by Hadoop:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;# /usr/local/Cellar/hadoop/3.2.1_1/libexec/etc/hadoop-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_231.jdk/Contents/Home&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Configure the HDFS address:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;xml&quot;&gt;&lt;pre class=&quot;language-xml&quot;&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;&amp;lt;!--/usr/local/Cellar/hadoop/3.2.1_1/libexec/etc/core-site.xml--&gt;&lt;/span&gt;
&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;configuration&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;property&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
      hadoop.tmp.dir
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
      /usr/local/Cellar/hadoop/hdfs/tmp
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;description&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
      A base for other temporary directories
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;description&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;             
  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;property&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;property&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
      fs.default.name
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
      hdfs://localhost:8020
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;property&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;configuration&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
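&lt;p&gt;If we would rather generate these site files than edit them by hand, a small script can build the same structure. The sketch below uses only Python&apos;s standard library; the helper name &lt;code class=&quot;language-text&quot;&gt;build_site_xml&lt;/code&gt; is hypothetical, and the property values are copied from the file above.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Return a Hadoop-style configuration document as a string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


core_site = build_site_xml({
    "hadoop.tmp.dir": "/usr/local/Cellar/hadoop/hdfs/tmp",
    "fs.default.name": "hdfs://localhost:8020",
})

# Parse the document back to confirm each property round-trips
parsed = {
    prop.findtext("name"): prop.findtext("value")
    for prop in ET.fromstring(core_site).iter("property")
}
print(parsed["fs.default.name"])
```

&lt;p&gt;The same helper can produce the other site files in this section by passing a different dictionary of properties.&lt;/p&gt;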
&lt;ol start=&quot;5&quot;&gt;
&lt;li&gt;Configure the MapReduce JobTracker address:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;xml&quot;&gt;&lt;pre class=&quot;language-xml&quot;&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;&amp;lt;!--/usr/local/Cellar/hadoop/3.2.1_1/libexec/etc/mapred-site.xml--&gt;&lt;/span&gt;
&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;configuration&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;property&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
      mapred.job.tracker
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
      localhost:8021
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;property&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;configuration&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;6&quot;&gt;
&lt;li&gt;Configure the HDFS properties:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;xml&quot;&gt;&lt;pre class=&quot;language-xml&quot;&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;&amp;lt;!--/usr/local/Cellar/hadoop/3.2.1_1/libexec/etc/hdfs-site.xml--&gt;&lt;/span&gt;
&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;configuration&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;property&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
      dfs.replication
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;name&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
      1
    &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;value&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;property&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token tag&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;/&lt;/span&gt;configuration&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;7&quot;&gt;
&lt;li&gt;Configure SSH keys:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;ssh-keygen -t rsa -P &lt;span class=&quot;token string&quot;&gt;&apos;&apos;&lt;/span&gt; -f ~/.ssh/id_rsa
$ &lt;span class=&quot;token function&quot;&gt;cat&lt;/span&gt; ~/.ssh/id_rsa.pub &gt;&gt; ~/.ssh/authorized_keys
$ &lt;span class=&quot;token function&quot;&gt;chmod&lt;/span&gt; 0600 ~/.ssh/authorized_keys&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Setting up a SparkSession&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Download &lt;a href=&quot;https://apache.claz.org/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Spark 2.4.6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Add the path for Spark in &lt;code class=&quot;language-text&quot;&gt;.bash_profile&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;export SPARK_HOME=./spark-2.4.6-bin-hadoop2.7&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Create the file &lt;code class=&quot;language-text&quot;&gt;./conf/spark-defaults.conf&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;spark.master=yarn
spark.yarn.am.memory=512m
spark.executor.memory=512m
spark.eventLog.enabled=true
spark.eventLog.dir=./tmp/spark-events/
spark.history.fs.logDirectory=./tmp/spark-events/
spark.driver.memory=5g&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
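&lt;p&gt;The memory settings above are heap sizes, but YARN is asked for slightly more than the heap. As a rough sketch, the function below adds the default overhead described in Spark&apos;s configuration documentation (10% of the heap, with a minimum of 384 MiB). The function names are hypothetical, and the sketch ignores the rounding that YARN applies to each request afterwards.&lt;/p&gt;

```python
MIN_OVERHEAD_MIB = 384          # documented minimum overhead
UNITS_TO_MIB = {"m": 1, "g": 1024}


def to_mib(size):
    """Convert a size such as '512m' or '5g' to MiB."""
    return int(size[:-1]) * UNITS_TO_MIB[size[-1].lower()]


def container_request_mib(heap):
    """Heap plus default overhead, before YARN rounds the request up."""
    heap_mib = to_mib(heap)
    # Default overhead: 10% of the heap, but never less than 384 MiB
    overhead = max(MIN_OVERHEAD_MIB, heap_mib // 10)
    return heap_mib + overhead


print(container_request_mib("512m"))  # executor with a 512m heap
print(container_request_mib("5g"))    # driver container in cluster mode
```

&lt;p&gt;With these defaults, a 512m executor asks for 896 MiB, and a 5g driver running in cluster mode asks for 5632 MiB.&lt;/p&gt;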
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Create a Spark application:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;# test.py&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; pyspark &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; SparkContext
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;file&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;~/data.txt&quot;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;# path of data&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; masterurl &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;yarn&apos;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; sc &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; SparkContext&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;masterurl&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;myapp&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; data &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;textFile&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;cache&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; num_a &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;lambda&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;a&apos;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;count&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;num_a&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;stop&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
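&lt;p&gt;Before submitting the application to a cluster, it can help to check what it should print. The plain Python sketch below applies the same filter-and-count logic to a small in-memory sample; the function name and the sample lines are made up for illustration.&lt;/p&gt;

```python
def count_lines_containing(lines, letter):
    """Count the lines that contain the given letter."""
    return sum(1 for line in lines if letter in line)


# Hypothetical stand-in for the contents of data.txt
sample = ["spark on yarn", "hdfs", "node manager"]

print(count_lines_containing(sample, "a"))
```

&lt;p&gt;Two of the three sample lines contain the letter a, so the Spark job should report the same count for a file with these lines.&lt;/p&gt;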
&lt;h2&gt;Launching YARN Daemons&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Start YARN from the NameNode:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/start-yarn.sh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The YARN cluster manager needs to be started on the NameNode. The command above starts the ResourceManager and the NodeManagers, so we should see the following lines after running it:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Starting resourcemanager
Starting nodemanagers&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;
&lt;p&gt;Stop the daemons&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/stop-yarn.sh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Launching Spark Daemons&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Start a master daemon in standalone mode&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/start-master.sh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start a worker daemon&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/start-slave.sh spark://localhost:7077&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start a history daemon&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/start-history-server.sh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start a Spark application&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./bin/spark-submit &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
--master &lt;span class=&quot;token function&quot;&gt;yarn&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
test.py&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stop the daemons&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/stop-master.sh
$ ./sbin/stop-slave.sh
$ ./sbin/stop-history-server.sh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Accessing Web UI for Daemons&lt;/h2&gt;
&lt;p&gt;Spark provides a web UI for each initialized daemon. By default, Spark creates a web UI for the master on port &lt;code class=&quot;language-text&quot;&gt;8080&lt;/code&gt;. The workers can take on different ports and can be accessed via the master web UI. The history server can be accessed on port &lt;code class=&quot;language-text&quot;&gt;18080&lt;/code&gt; by default. The table below summarizes the default locations for each web UI.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Daemon&lt;/th&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Spark Master&lt;/td&gt;
&lt;td&gt;&lt;code class=&quot;language-text&quot;&gt;8080&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spark Worker&lt;/td&gt;
&lt;td&gt;&lt;code class=&quot;language-text&quot;&gt;8081&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spark History&lt;/td&gt;
&lt;td&gt;&lt;code class=&quot;language-text&quot;&gt;18080&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HDFS NameNode&lt;/td&gt;
&lt;td&gt;&lt;code class=&quot;language-text&quot;&gt;9870&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YARN Resource Manager&lt;/td&gt;
&lt;td&gt;&lt;code class=&quot;language-text&quot;&gt;8088&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
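&lt;p&gt;To check which of these web UIs are actually reachable, we can probe each port from Python. This is a minimal sketch that assumes the daemons run on &lt;code class=&quot;language-text&quot;&gt;localhost&lt;/code&gt; with their default ports unchanged; the daemon labels and function names are my own.&lt;/p&gt;

```python
import socket

# Default web UI ports (assumed unchanged from the defaults)
DEFAULT_UI_PORTS = {
    "Spark standalone master": 8080,
    "Spark standalone worker": 8081,
    "Spark history server": 18080,
    "HDFS NameNode": 9870,
    "YARN ResourceManager": 8088,
}


def ui_urls(host="localhost"):
    """Map each daemon name to the URL of its web UI."""
    return {
        name: f"http://{host}:{port}"
        for name, port in DEFAULT_UI_PORTS.items()
    }


def is_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, url in ui_urls().items():
        status = "up" if is_port_open("localhost", DEFAULT_UI_PORTS[name]) else "down"
        print(f"{name:26} {url:26} {status}")
```

&lt;p&gt;A port that answers only shows that something is listening there, so it is still worth opening the URL in a browser to confirm it is the expected daemon.&lt;/p&gt;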
&lt;h2&gt;Launching Applications in Client Mode&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;The client&apos;s JVM process submits the application to the resource manager&lt;/li&gt;
&lt;li&gt;The driver is launched inside the client&apos;s JVM process&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The resource manager instructs a node manager to start a container with an application master&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The container includes the application master&lt;/li&gt;
&lt;li&gt;The resource manager represents the master&lt;/li&gt;
&lt;li&gt;The node manager represents the worker&lt;/li&gt;
&lt;li&gt;The application master requests resources&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The node manager launches a container with an application master&lt;/li&gt;
&lt;li&gt;The application master asks the resource manager to allocate resources for the application&lt;/li&gt;
&lt;li&gt;The application master asks node managers to start executor containers&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Node managers launch executors&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This is on behalf of the Spark application master&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The driver and executors communicate independently&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This doesn&apos;t involve the master or workers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;/fa40e4b908ccfcea54f0e62ad42c1688/yarn-client.svg&quot; alt=&quot;ClientModeYARN&quot;&gt;&lt;/p&gt;
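&lt;p&gt;The deploy mode is chosen when the application is submitted, using the &lt;code class=&quot;language-text&quot;&gt;--deploy-mode&lt;/code&gt; flag of &lt;code class=&quot;language-text&quot;&gt;spark-submit&lt;/code&gt;. As a small sketch, the hypothetical helper below builds the argument list for either mode, so the command can be printed for inspection or handed to &lt;code class=&quot;language-text&quot;&gt;subprocess.run&lt;/code&gt;.&lt;/p&gt;

```python
import shlex


def spark_submit_args(app, master="yarn", deploy_mode="client"):
    """Build the argument list for spark-submit."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode}")
    return [
        "./bin/spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        app,
    ]


args = spark_submit_args("test.py")
print(shlex.join(args))
```

&lt;p&gt;Passing &lt;code class=&quot;language-text&quot;&gt;deploy_mode=&quot;cluster&quot;&lt;/code&gt; produces the equivalent command for cluster mode.&lt;/p&gt;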
&lt;h2&gt;Launching Applications in Cluster Mode&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;The client&apos;s JVM process submits a driver to the resource manager&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The resource manager instructs a node manager to start a container with an application master&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The container includes the application master&lt;/li&gt;
&lt;li&gt;The resource manager represents the master&lt;/li&gt;
&lt;li&gt;The node manager represents the worker&lt;/li&gt;
&lt;li&gt;The application master requests resources&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The node manager launches a container with an application master&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The application master contains the Spark driver&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The application master asks the resource manager to allocate resources for the application&lt;/li&gt;
&lt;li&gt;The application master asks node managers to start executor containers&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Node managers launch executors&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This is on behalf of the Spark application master&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The driver and executors communicate independently&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This doesn&apos;t involve the master or workers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
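&lt;p&gt;One practical consequence of cluster mode is that the driver&apos;s output, such as the result of &lt;code class=&quot;language-text&quot;&gt;print(num_a)&lt;/code&gt;, ends up in the application master&apos;s container logs rather than in the client&apos;s terminal. One way to retrieve it is the &lt;code class=&quot;language-text&quot;&gt;yarn logs&lt;/code&gt; command, which generally requires log aggregation to be enabled. The sketch below builds that command; the helper name and the example application ID are hypothetical.&lt;/p&gt;

```python
import re

# YARN application IDs have the form application_TIMESTAMP_SEQUENCE
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def yarn_logs_args(application_id):
    """Build the argument list for fetching an application's logs."""
    if APP_ID_PATTERN.fullmatch(application_id) is None:
        raise ValueError(f"not a YARN application ID: {application_id}")
    return ["yarn", "logs", "-applicationId", application_id]


# Hypothetical ID; the real one is shown in the spark-submit output
print(yarn_logs_args("application_1410450250506_0003"))
```

&lt;p&gt;Validating the ID first avoids passing an unrelated string to the command by mistake.&lt;/p&gt;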
&lt;p&gt;&lt;img src=&quot;/54b031f131be5adf52293f1de3c6c813/yarn-cluster.svg&quot; alt=&quot;ClusterModeYARN&quot;&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Testing Spark Applications in Standalone]]></title><description><![CDATA[This post walks through an example of a cluster running in standalone mode. In the coming posts, we'll explore other examples, including clusters running a YARN cluster manager and Mesos cluster…]]></description><link>https://dkharazi.github.io/blog/spark-standalone</link><guid isPermaLink="false">https://dkharazi.github.io/blog/spark-standalone</guid><pubDate>Thu, 06 Jun 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This post walks through an example of a cluster running in standalone mode. In the coming posts, we&apos;ll explore other examples, including clusters running a &lt;a href=&quot;/blog/spark-yarn/&quot;&gt;YARN&lt;/a&gt; cluster manager and &lt;a href=&quot;/blog/spark-mesos/&quot;&gt;Mesos&lt;/a&gt; cluster manager.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#setting-up-a-sparksession&quot;&gt;Setting up a SparkSession&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#launching-daemons&quot;&gt;Launching Daemons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#accessing-web-ui-for-daemons&quot;&gt;Accessing Web UI for Daemons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#caveat-about-pyspark-applications&quot;&gt;Caveat about PySpark Applications&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#launching-applications-in-client-mode&quot;&gt;Launching Applications in Client Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#launching-applications-in-cluster-mode&quot;&gt;Launching Applications in Cluster Mode&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Setting up a SparkSession&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Download &lt;a href=&quot;https://apache.claz.org/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Spark 2.4.6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Create the file &lt;code class=&quot;language-text&quot;&gt;./conf/spark-defaults.conf&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;spark.master=spark://localhost:7077
spark.eventLog.enabled=true
spark.eventLog.dir=./tmp/spark-events/
spark.history.fs.logDirectory=./tmp/spark-events/
spark.driver.memory=5g&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
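&lt;p&gt;A quick way to catch mistakes in this file is to load it and compare related settings. The parser below is a simplified sketch that handles keys and values separated by &lt;code class=&quot;language-text&quot;&gt;=&lt;/code&gt; or whitespace and skips comment lines; it is not Spark&apos;s own loader, and the function name is made up.&lt;/p&gt;

```python
import re


def parse_spark_defaults(text):
    """Parse key/value lines separated by '=' or whitespace."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split once, at the first '=' or run of whitespace
        parts = re.split(r"\s*[=\s]\s*", line, maxsplit=1)
        settings[parts[0]] = parts[1] if len(parts) == 2 else ""
    return settings


conf = parse_spark_defaults("""
spark.master=spark://localhost:7077
spark.eventLog.enabled=true
spark.eventLog.dir=./tmp/spark-events/
spark.history.fs.logDirectory=./tmp/spark-events/
spark.driver.memory=5g
""")

# The history server reads from the directory that applications write to
print(conf["spark.eventLog.dir"] == conf["spark.history.fs.logDirectory"])
```

&lt;p&gt;If the two directories differ, applications still run, but their events never show up in the history server.&lt;/p&gt;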
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Create a Spark application:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;# test.py&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; pyspark &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; SparkContext
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;file&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;~/data.txt&quot;&lt;/span&gt;  &lt;span class=&quot;token comment&quot;&gt;# path of data&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; masterurl &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;spark://localhost:7077&apos;&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; sc &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; SparkContext&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;masterurl&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;myapp&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; data &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;textFile&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;cache&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; num_a &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;lambda&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;a&apos;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;count&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;num_a&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;stop&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Launching Daemons&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Start a master daemon in standalone mode&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/start-master.sh &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start a worker daemon&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/start-slave.sh spark://localhost:7077&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start a history daemon&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/start-history-server.sh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start a Spark application&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./bin/spark-submit &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
--master spark://localhost:7077 &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
test.py&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stop the daemons&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./sbin/stop-master.sh
$ ./sbin/stop-slave.sh
$ ./sbin/stop-history-server.sh &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Accessing Web UI for Daemons&lt;/h2&gt;
&lt;p&gt;Spark provides a web UI for each initialized daemon. By default, Spark creates a web UI for the master on port &lt;code class=&quot;language-text&quot;&gt;8080&lt;/code&gt;. The workers can take on different ports and can be accessed via the master web UI. The history server can be accessed on port &lt;code class=&quot;language-text&quot;&gt;18080&lt;/code&gt; by default. The table below summarizes the default locations for each web UI.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Daemon&lt;/th&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Master&lt;/td&gt;
&lt;td&gt;&lt;code class=&quot;language-text&quot;&gt;8080&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worker&lt;/td&gt;
&lt;td&gt;&lt;code class=&quot;language-text&quot;&gt;8081&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;History&lt;/td&gt;
&lt;td&gt;&lt;code class=&quot;language-text&quot;&gt;18080&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Caveat about PySpark Applications&lt;/h2&gt;
&lt;p&gt;Notice, launching an application in client mode doesn&apos;t seem to trigger a driver according to the master&apos;s web UI. This doesn&apos;t mean a driver isn&apos;t launched in client mode. The driver is still launched within the spark-submit process. However, the master&apos;s web UI omits driver information if the application is running in client mode.&lt;/p&gt;
&lt;p&gt;So, we may want to launch the application in cluster mode instead. However, submitting a Python application in cluster mode to a standalone cluster gives us the following error:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;./bin/spark-submit &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
    --master spark://localhost:7077 &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
    --deploy-mode cluster &lt;span class=&quot;token punctuation&quot;&gt;\&lt;/span&gt;
    test.py
Exception &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; thread &lt;span class=&quot;token string&quot;&gt;&quot;main&quot;&lt;/span&gt; org.apache.spark.SparkException: Cluster deploy mode is currently not supported &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; python applications on standalone clusters.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As of Spark 2.4.6, we can&apos;t run Python applications in cluster mode when running a standalone cluster manager. This is a good opportunity for us to experiment with other resource managers in the &lt;a href=&quot;&quot;&gt;next post&lt;/a&gt;. For now, we will run &lt;code class=&quot;language-text&quot;&gt;JavaSparkPi.java&lt;/code&gt; found in the examples directory.&lt;/p&gt;
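&lt;p&gt;As a rough sketch, the submission command for that example can be assembled in Python before handing it to a shell. The class name &lt;code class=&quot;language-text&quot;&gt;org.apache.spark.examples.JavaSparkPi&lt;/code&gt; and the jar path below are assumptions based on the usual layout of a Spark 2.4.6 distribution, and &lt;code class=&quot;language-text&quot;&gt;build_submit_command&lt;/code&gt; is a hypothetical helper, so both may need adjusting for a local installation.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;def build_submit_command(deploy_mode, master="spark://localhost:7077"):
    """Return the spark-submit argument list for the bundled Java Pi example."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be client or cluster")
    return [
        "./bin/spark-submit",
        # Assumed main class of the bundled example.
        "--class", "org.apache.spark.examples.JavaSparkPi",
        "--master", master,
        "--deploy-mode", deploy_mode,
        # Assumed jar location inside the Spark home directory.
        "examples/jars/spark-examples_2.11-2.4.6.jar",
    ]

command = build_submit_command("cluster")
print(" ".join(command))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Joining the list with spaces gives a command line that can be pasted into a terminal, while the list itself can be passed to &lt;code class=&quot;language-text&quot;&gt;subprocess.run&lt;/code&gt; unchanged.&lt;/p&gt;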
&lt;h2&gt;Launching Applications in Client Mode&lt;/h2&gt;
&lt;p&gt;In a &lt;a href=&quot;&quot;&gt;previous post&lt;/a&gt;, we defined the components associated with a driver program and cluster, while illustrating the interaction between a driver program and cluster components. Specifically, we defined this interaction when applications are launched in client mode. Now, we can execute an application and verify these steps using the logs.&lt;/p&gt;
&lt;p&gt;Note, the timestamps and logged messages were slightly modified for clarity. However, the order and substance of each message remain the same.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;14:43:01 INFO
SparkContext: Submitted application: Spark Pi

14:43:02 INFO
Utils: Successfully started service &amp;#39;sparkDriver&amp;#39;

14:43:03 INFO
StandaloneAppClient: Connecting to master

14:43:04 INFO
StandaloneSchedulerBackend: Connected to Spark cluster

14:43:05 INFO
Master: Registered app Spark Pi

14:43:06 INFO
Master: Launching executor on worker

14:43:07 INFO
Worker: Asked to launch executor

14:43:08 INFO
ExecutorRunner: Launched

14:43:09 INFO
StandaloneAppClient: Executor added on worker

14:43:10 INFO
StandaloneSchedulerBackend: Granted executor ID

14:43:11 INFO
StandaloneAppClient: Executor is now RUNNING

14:43:12 INFO
SparkContext: Starting job

...

14:43:13 INFO
DAGScheduler: Job finished

14:43:14 INFO
StandaloneSchedulerBackend: Shutting down all executors

14:43:15 INFO
Worker: Asked to kill executor

14:43:16 INFO
ExecutorRunner: Killing process!

14:43:17 INFO
Master: Removing app

14:43:18 INFO
SparkContext: Successfully stopped SparkContext&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
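&lt;p&gt;One way to check this ordering without reading the log by eye is to parse it and compare the sequence of components against the expected steps. The snippet below is only a sketch: it assumes the simplified two-line entry format shown above (not the raw Spark log format), uses a four-entry excerpt of that log, and &lt;code class=&quot;language-text&quot;&gt;parse_entries&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;LOG = """14:43:01 INFO
SparkContext: Submitted application: Spark Pi

14:43:03 INFO
StandaloneAppClient: Connecting to master

14:43:05 INFO
Master: Registered app Spark Pi

14:43:08 INFO
ExecutorRunner: Launched
"""

# Components we expect to see, in order, for a client-mode launch.
EXPECTED_ORDER = ["SparkContext", "StandaloneAppClient", "Master", "ExecutorRunner"]

def parse_entries(text):
    """Split the simplified log into (timestamp, component, message) tuples."""
    entries = []
    for block in text.strip().split("\n\n"):
        header, body = block.split("\n", 1)
        timestamp = header.split()[0]
        component, message = body.split(": ", 1)
        entries.append((timestamp, component, message))
    return entries

entries = parse_entries(LOG)
components = [component for _, component, _ in entries]
timestamps = [timestamp for timestamp, _, _ in entries]

# The steps are verified if the components match and time never goes backwards.
print(components == EXPECTED_ORDER and timestamps == sorted(timestamps))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;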
&lt;h2&gt;Launching Applications in Cluster Mode&lt;/h2&gt;
&lt;p&gt;In a &lt;a href=&quot;&quot;&gt;previous post&lt;/a&gt;, we defined the interaction between a driver program and cluster components when applications are launched in cluster mode. Now, we can execute an application in cluster mode to verify these steps using the logs.&lt;/p&gt;
&lt;p&gt;Note, the timestamps and logged messages were slightly modified for clarity. However, the order and substance of each message remain the same.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;14:43:01 INFO
Master: Driver submitted

14:43:02 INFO
Master: Launching driver

14:43:03 INFO
Worker: Asked to launch driver

14:43:04 INFO
DriverRunner: Launched

14:43:05 INFO
Utils: Successfully started service &amp;#39;driverClient&amp;#39;

14:43:06 INFO
ClientEndpoint: Driver successfully submitted

14:43:07 INFO
SparkContext: Submitted application: Spark Pi

14:43:08 INFO
Utils: Successfully started service &amp;#39;sparkDriver&amp;#39;

14:43:09 INFO
StandaloneAppClient: Connecting to master

14:43:10 INFO
StandaloneSchedulerBackend: Connected to Spark cluster

14:43:12 INFO
Master: Registered app Spark Pi

14:43:13 INFO
Master: Launching executor on worker

14:43:14 INFO
Worker: Asked to launch executor

14:43:15 INFO
ExecutorRunner: Launched

14:43:16 INFO
StandaloneAppClient: Executor added on worker

14:43:17 INFO
StandaloneSchedulerBackend: Granted executor ID

14:43:18 INFO
StandaloneAppClient: Executor is now RUNNING

14:43:19 INFO
SparkContext: Starting job

...

14:43:20 INFO
DAGScheduler: Job finished

14:43:21 INFO
StandaloneSchedulerBackend: Shutting down all executors

14:43:22 INFO
Worker: Asked to kill executor

14:43:23 INFO
ExecutorRunner: Killing process!

14:43:24 INFO
Worker: Driver exited successfully

14:43:25 INFO
Master: Removing app

14:43:26 INFO
SparkContext: Successfully stopped SparkContext&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Data Locality in Spark]]></title><description><![CDATA[This post provides an overview of different types of data locality in Spark. In the coming posts, we'll dive deeper into more low-level concepts. Meaning, we'll explore the Spark internals using…]]></description><link>https://dkharazi.github.io/blog/spark-locality</link><guid isPermaLink="false">https://dkharazi.github.io/blog/spark-locality</guid><pubDate>Sun, 12 May 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This post provides an overview of different types of data locality in Spark. In the coming posts, we&apos;ll dive deeper into more low-level concepts. Meaning, we&apos;ll explore the Spark internals using examples. Then, we&apos;ll explore some examples of running spark applications in a cluster with a &lt;a href=&quot;/blog/spark-standalone/&quot;&gt;standalone&lt;/a&gt; cluster manager, &lt;a href=&quot;/blog/spark-yarn/&quot;&gt;YARN&lt;/a&gt; cluster manager, and &lt;a href=&quot;/blog/spark-mesos/&quot;&gt;Mesos&lt;/a&gt; cluster manager.&lt;/p&gt;
&lt;h2&gt;Defining Data Locality&lt;/h2&gt;
&lt;p&gt;In Spark, tasks are run as close to the location of data as possible. Meaning, executors are selected based on their proximity to requested data within the cluster. This notion is referred to as &lt;em&gt;data locality&lt;/em&gt;. Since the selection of executors is affected, data locality influences job scheduling as a consequence. To find optimal executors closest to the data, Spark maintains a list of preferred executors for each partition.&lt;/p&gt;
&lt;p&gt;Data locality can have a major impact on the performance of Spark jobs. If data and the code operating on it are together, then computation tends to be fast. The goal of data locality is to minimize the time spent moving data to the code that processes it. Data locality is achieved if each HDFS block is loaded in the RAM of the same node where the HDFS block lives. Specifically, data transfer can be avoided if the Spark scheduler runs tasks on executors where these blocks are present.&lt;/p&gt;
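&lt;p&gt;The idea of preferring executors that sit next to the data can be illustrated with a toy model. This is not the Spark scheduler: the executor names, node names, and block placements are made up for illustration, and &lt;code class=&quot;language-text&quot;&gt;preferred_executors&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;pick_executor&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical cluster layout: the node each executor runs on.
EXECUTOR_NODES = {"exec-1": "node-a", "exec-2": "node-b", "exec-3": "node-b"}

# Hypothetical block placement: the nodes holding the block of each partition.
BLOCK_NODES = {"part-0": ["node-a"], "part-1": ["node-b"], "part-2": ["node-c"]}

def preferred_executors(partition):
    """Executors running on a node that already holds the block."""
    nodes = BLOCK_NODES[partition]
    return sorted(e for e, node in EXECUTOR_NODES.items() if node in nodes)

def pick_executor(partition):
    """Prefer a co-located executor, otherwise fall back to any executor."""
    preferred = preferred_executors(partition)
    if preferred:
        return preferred[0], "local"
    # No executor shares a node with the data, so the block must be transferred.
    return sorted(EXECUTOR_NODES)[0], "remote"

for partition in sorted(BLOCK_NODES):
    print(partition, pick_executor(partition))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this toy layout, only the last partition has no executor on a node holding its block, so it is the only one that would require a data transfer.&lt;/p&gt;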
&lt;h2&gt;Levels of Data Locality&lt;/h2&gt;
&lt;p&gt;The locality level indicates which type of data access is performed. There are five levels of data locality:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;PROCESS_LOCAL&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Execute a task on the same executor as the cached data&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;NODE_LOCAL&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Execute a task on the same node as the cached data, but on a different executor&lt;/li&gt;
&lt;li&gt;Generally, this level is slower than the previous one&lt;/li&gt;
&lt;li&gt;However, the scheduler sometimes has to wait for an executor to become available&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;RACK_LOCAL&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Execute a task on the same rack as the cached data, but on a different node&lt;/li&gt;
&lt;li&gt;Generally, this level is even slower&lt;/li&gt;
&lt;li&gt;This is because even more data is moved through the network&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;NO_PREF&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There isn&apos;t any preference for data locality&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;ANY&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Execute a task on any executor, regardless of its rack or node&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item><item><title><![CDATA[Spark Deployment Modes]]></title><description><![CDATA[This post provides an overview of the different deployment modes in Spark and how each deployment mode changes the behavior of Spark components. In the coming posts, we'll dive deeper into more low…]]></description><link>https://dkharazi.github.io/blog/spark-deployment</link><guid isPermaLink="false">https://dkharazi.github.io/blog/spark-deployment</guid><pubDate>Thu, 28 Mar 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This post provides an overview of the different deployment modes in Spark and how each deployment mode changes the behavior of Spark components. In the coming posts, we&apos;ll dive deeper into more low-level concepts. Meaning, we&apos;ll explore the Spark internals, along with some examples.&lt;/p&gt;
&lt;p&gt;An application can be deployed to a cluster in one of two modes: &lt;em&gt;cluster&lt;/em&gt; or &lt;em&gt;client&lt;/em&gt; mode. These modes determine the location of the driver process. By default, Spark uses client mode, so the driver runs in the client JVM. Python applications can&apos;t run in cluster mode on a standalone cluster.&lt;/p&gt;
&lt;p&gt;To walk through an example demonstrating the interaction between a driver program and cluster components in standalone mode, refer to &lt;a href=&quot;/blog/spark-standalone/&quot;&gt;my other post&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Client-Deploy Mode&lt;/h2&gt;
&lt;p&gt;As stated previously, the deployment mode determines the location of the driver process. In client-deploy mode, the driver program runs on the client&apos;s JVM process. Meaning, the driver program runs on the client&apos;s machine. This is the same machine as the one that called the &lt;code class=&quot;language-text&quot;&gt;spark-submit&lt;/code&gt; command, which implies the driver process sits outside of the cluster. Generally, applications deployed in client mode perform the following steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Client&apos;s JVM process submits a driver to the master&lt;/li&gt;
&lt;li&gt;The driver is launched&lt;/li&gt;
&lt;li&gt;Master instructs workers to start executor processes for the driver&lt;/li&gt;
&lt;li&gt;Workers launch executor JVMs&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The driver and executors communicate independently&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Doesn&apos;t involve the master or workers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;/f4b936a138f19424b9bb26e6b5f5a039/standalone-client.svg&quot; alt=&quot;standaloneclient&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Cluster-Deploy Mode&lt;/h2&gt;
&lt;p&gt;In cluster-deploy mode, the driver program runs in its own JVM process located inside the cluster. That is, the driver program doesn&apos;t run on the client&apos;s machine. Instead, it runs on a node within the cluster. The driver program is started by a worker JVM, but runs in a separate JVM of its own. Generally, applications deployed in cluster mode perform the following steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Client&apos;s JVM process submits a driver to the master&lt;/li&gt;
&lt;li&gt;Master instructs one of its workers to launch a driver&lt;/li&gt;
&lt;li&gt;That worker launches a driver JVM in the cluster&lt;/li&gt;
&lt;li&gt;Master instructs any workers to start executors for the driver&lt;/li&gt;
&lt;li&gt;Those workers launch executor JVMs&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The driver and executors communicate independently&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Doesn&apos;t involve the master or workers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
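&lt;p&gt;The difference between the two step lists comes down to where the driver is launched. The sketch below only restates the lists from this post in code so the two modes can be compared side by side. The helper &lt;code class=&quot;language-text&quot;&gt;launch_steps&lt;/code&gt; is hypothetical and is not part of any Spark API.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;def launch_steps(deploy_mode):
    """Ordered launch steps on a standalone cluster, mirroring the lists above."""
    if deploy_mode == "cluster":
        # The driver is started by a worker, inside the cluster.
        driver_steps = [
            "master instructs a worker to launch the driver",
            "worker launches a driver JVM inside the cluster",
        ]
    elif deploy_mode == "client":
        # The driver lives in the JVM that called spark-submit.
        driver_steps = ["driver is launched in the client JVM"]
    else:
        raise ValueError("unknown deploy mode: " + deploy_mode)
    return (
        ["client submits the driver to the master"]
        + driver_steps
        + [
            "master instructs workers to start executors",
            "workers launch executor JVMs",
            "driver and executors communicate directly",
        ]
    )

for mode in ("client", "cluster"):
    print(mode, len(launch_steps(mode)), "steps")&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;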
&lt;p&gt;&lt;img src=&quot;/e5cd11d221bdb2be2c917f04bddb8313/standalone-cluster.svg&quot; alt=&quot;standalonecluster&quot;&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Runtime Architecture in Spark]]></title><description><![CDATA[This post provides a high-level introduction to generic objects in the Spark API, along with the responsibilities for each object. In the coming posts, we'll dive deeper into more low-level concepts…]]></description><link>https://dkharazi.github.io/blog/spark-architecture</link><guid isPermaLink="false">https://dkharazi.github.io/blog/spark-architecture</guid><pubDate>Mon, 11 Mar 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This post provides a high-level introduction to generic objects in the Spark API, along with the responsibilities for each object. In the coming posts, we&apos;ll dive deeper into more low-level concepts. Meaning, we&apos;ll explore the Spark internals, along with some examples.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-two-common-questions&quot;&gt;Two Common Questions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#defining-the-driver-program&quot;&gt;Driver Program&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#defining-a-spark-context&quot;&gt;Spark Context&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#defining-the-spark-scheduler&quot;&gt;Spark Scheduler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#defining-the-cluster-manager&quot;&gt;Cluster Manager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#defining-workers-and-executors&quot;&gt;Workers and Executors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Two Common Questions&lt;/h2&gt;
&lt;p&gt;A Spark application runs on a cluster with either a standalone cluster manager, YARN cluster manager, or Mesos cluster manager. Although these clusters use very different components, they still need to answer the same two questions regardless of how each one is set up. Specifically, these questions relate to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Job scheduling&lt;/li&gt;
&lt;li&gt;Resource scheduling&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Job scheduling refers to determining which executors run a given set of tasks. To do this, we must know what resources are available, which includes CPU and RAM resources. Job scheduling is typically performed by Spark&apos;s scheduler, which will be defined in detail later.&lt;/p&gt;
&lt;p&gt;As stated previously, we must know the available resources in order to perform job scheduling. In other words, job scheduling depends on resource scheduling in order to function properly. Resource scheduling refers to determining which executors receive available resources. Specifically, this involves assigning CPU and RAM resources to executors, which represent units of processing in a cluster.&lt;/p&gt;
&lt;h2&gt;Defining the Driver Program&lt;/h2&gt;
&lt;p&gt;A driver program represents a Spark program. During its execution, a driver program requests executor processes from the cluster manager. In particular, it requests CPU and memory resources. Then, it organizes its application components into stages and tasks. The driver program is responsible for defining the tasks sent to executors of a Spark cluster. Then, it collects results from the executors. A driver program consists of the following objects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A Spark &lt;code class=&quot;language-text&quot;&gt;Scheduler&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;A Spark Context&lt;/li&gt;
&lt;li&gt;A Spark application&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Defining a Spark Context&lt;/h2&gt;
&lt;p&gt;One of the most essential responsibilities of the driver program is to define and initialize a spark context. A spark context acts as the entry point for a driver program to interact with a Spark cluster. In other words, it connects to a spark cluster from an application. A spark context is responsible for configuring a spark session, which is not associated with workers or masters.&lt;/p&gt;
&lt;p&gt;Configurations related to a spark session include managing job execution, loading files, and saving files. This also includes assigning the number of executors allocated to an individual application, and includes assigning the number of cores per executor.&lt;/p&gt;
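&lt;p&gt;As a small sketch, such settings can be written down once and turned into &lt;code class=&quot;language-text&quot;&gt;--conf&lt;/code&gt; arguments for &lt;code class=&quot;language-text&quot;&gt;spark-submit&lt;/code&gt;. The property names &lt;code class=&quot;language-text&quot;&gt;spark.cores.max&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;spark.executor.cores&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;spark.executor.memory&lt;/code&gt; are standard Spark properties, but the values are only illustrative and &lt;code class=&quot;language-text&quot;&gt;conf_flags&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative values only; tune them for the actual cluster.
SETTINGS = {
    "spark.cores.max": "4",         # total cores the application may use
    "spark.executor.cores": "2",    # cores per executor
    "spark.executor.memory": "1g",  # memory per executor
}

def conf_flags(settings):
    """Turn a settings mapping into spark-submit --conf arguments."""
    flags = []
    for key in sorted(settings):
        flags += ["--conf", key + "=" + settings[key]]
    return flags

print(" ".join(conf_flags(SETTINGS)))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;On a standalone cluster, a cap of four cores with two cores per executor would be expected to yield at most two executors for the application.&lt;/p&gt;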
&lt;h2&gt;Defining the Spark Scheduler&lt;/h2&gt;
&lt;p&gt;Each driver program comes equipped with a Spark &lt;code class=&quot;language-text&quot;&gt;Scheduler&lt;/code&gt;, which runs in the background of a driver program. It can be configured via the Spark context. The &lt;code class=&quot;language-text&quot;&gt;Scheduler&lt;/code&gt; performs &lt;em&gt;job scheduling&lt;/em&gt; for its application after communicating with the cluster manager of a cluster. Specifically, it is responsible for the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sending tasks to executors&lt;/li&gt;
&lt;li&gt;Requesting memory and CPU resources from a cluster manager&lt;/li&gt;
&lt;li&gt;Deciding which executors run which tasks&lt;/li&gt;
&lt;li&gt;Monitoring execution of tasks sent to executors&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Notice, the decision of which executors run which tasks depends solely on Spark, rather than the cluster manager. In other words, the &lt;code class=&quot;language-text&quot;&gt;Scheduler&lt;/code&gt; is responsible for choosing which executors run its tasks, while the cluster manager is only responsible for initializing the executors and keeping them up and running.&lt;/p&gt;
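&lt;p&gt;To make the idea of job scheduling concrete, here is a deliberately simplified sketch that hands tasks to executors in round-robin order. Round-robin is a stand-in chosen for brevity: the real Spark scheduler also weighs data locality and free cores. The task and executor names and the &lt;code class=&quot;language-text&quot;&gt;assign_tasks&lt;/code&gt; helper are hypothetical.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;from itertools import cycle

def assign_tasks(tasks, executors):
    """Assign tasks to executors in round-robin order."""
    assignment = {executor: [] for executor in executors}
    # cycle() repeats the executor list for as long as there are tasks left.
    for task, executor in zip(tasks, cycle(executors)):
        assignment[executor].append(task)
    return assignment

plan = assign_tasks(["t0", "t1", "t2", "t3", "t4"], ["exec-1", "exec-2"])
for executor, tasks in plan.items():
    print(executor, tasks)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The cluster manager plays no part in this decision. It only needs to have started the two executors beforehand.&lt;/p&gt;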
&lt;h2&gt;Defining the Cluster Manager&lt;/h2&gt;
&lt;p&gt;Each cluster comes equipped with a cluster manager, which runs on a node in the cluster. In Spark, there are different types of cluster managers that demonstrate their own unique behavior in a cluster. By default, a Spark cluster runs a standalone cluster manager, which is also referred to as a Spark master. The popular cluster managers and their master processes include the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;Standalone:&lt;/code&gt; Spark Master&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;YARN:&lt;/code&gt; Resource Manager&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;Mesos:&lt;/code&gt; Mesos Master&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A cluster manager instructs workers to start executor processes when a driver program requests resources from the cluster. In order to do this, the cluster manager provides workers with information about the cluster&apos;s CPU and memory resources. The cluster manager performs &lt;em&gt;resource scheduling&lt;/em&gt; for its cluster. Specifically, it is responsible for the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Starting and stopping processes in its cluster&lt;/li&gt;
&lt;li&gt;Assigning the maximum number of CPUs used by executors&lt;/li&gt;
&lt;li&gt;Dividing cluster resources among several applications&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In particular, resource scheduling involves distributing the resources of a cluster requested by several applications. Meaning, a cluster manager will make its cluster&apos;s resources available to applications via executors. In other words, the cluster manager allocates its cluster&apos;s CPU and memory resources to its executors for any applications to use.&lt;/p&gt;
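&lt;p&gt;Resource scheduling can be pictured with a toy allocator that grants cores to applications in submission order until the cluster runs out. This is only a sketch of the idea, not the actual logic of any cluster manager. The application names, core counts, and &lt;code class=&quot;language-text&quot;&gt;allocate_cores&lt;/code&gt; helper are hypothetical.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;def allocate_cores(total_cores, requests):
    """Grant cores to applications in submission (FIFO) order."""
    remaining = total_cores
    grants = {}
    for app, wanted in requests:
        # An application never receives more than what is still free.
        granted = min(wanted, remaining)
        grants[app] = granted
        remaining -= granted
    return grants

grants = allocate_cores(8, [("app-1", 4), ("app-2", 6), ("app-3", 2)])
print(grants)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With eight cores in total, the first application receives everything it asked for, the second is capped at the four cores that remain, and the third has to wait until cores are released.&lt;/p&gt;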
&lt;h2&gt;Defining Workers and Executors&lt;/h2&gt;
&lt;p&gt;Each worker runs on a node in its cluster. A worker is responsible for launching executors and monitoring executors for the cluster manager, in case of any failures. Additionally, it is used for launching the driver process when &lt;code class=&quot;language-text&quot;&gt;--deploy-mode cluster&lt;/code&gt; is set. Notice, certain components behave differently depending on the deployment mode of an application. For more information about these behaviors, refer to my &lt;a href=&quot;/blog/spark-deployment/&quot;&gt;next post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Similarly, each executor runs on a node in its cluster. By default, a standalone worker launches one executor per application. An executor is responsible for accepting tasks from the driver program, executing those tasks, and returning any results to the driver. Each executor has several &lt;em&gt;task slots&lt;/em&gt;. These slots are used to run tasks in parallel on the executor. Specifically, these slots are implemented as threads, rather than individual CPU cores. As a result, the number of slots does not have to correspond to the number of CPU cores on a machine.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Datasets and DataFrames]]></title><description><![CDATA[Describing Spark SQL Unlike the basic Spark  API, the Spark SQL API provides additional data structures used for holding data and performing computations. As a result, Spark SQL is able to perform…]]></description><link>https://dkharazi.github.io/blog/spark-dataset</link><guid isPermaLink="false">https://dkharazi.github.io/blog/spark-dataset</guid><pubDate>Sun, 03 Mar 2019 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Describing Spark SQL&lt;/h2&gt;
&lt;p&gt;Unlike the basic Spark &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; API, the Spark SQL API provides additional data structures used for holding data and performing computations. As a result, Spark SQL is able to perform improved optimizations. Specifically, Spark SQL is used for executing SQL queries. Spark SQL can also be used to read data from an existing Hive installation. When running SQL from within another programming language, the results are returned as a &lt;code class=&quot;language-text&quot;&gt;Dataset&lt;/code&gt; or &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Datasets and DataFrames&lt;/h2&gt;
&lt;p&gt;Similar to an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt;, a &lt;code class=&quot;language-text&quot;&gt;Dataset&lt;/code&gt; is a distributed collection of data that can be cached in memory. It provides the following benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Strong typing&lt;/li&gt;
&lt;li&gt;Ability to use powerful lambda functions&lt;/li&gt;
&lt;li&gt;Optimized execution engine&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt; is a &lt;code class=&quot;language-text&quot;&gt;Dataset&lt;/code&gt; organized into named columns. It is conceptually equivalent to a table in a relational database or a pandas &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;. However, it has improved internal optimizations. For a more detailed description of the benefits of &lt;code class=&quot;language-text&quot;&gt;Datasets&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;DataFrames&lt;/code&gt;, please refer to this &lt;a href=&quot;https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;article&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Visualizing DAGs in Spark]]></title><description><![CDATA[The goal of this post is to provide a general introduction to the  API. Each example has a snippet of PySpark code with explanations. Another goal is to provide a general introduction to Spark's web…]]></description><link>https://dkharazi.github.io/blog/spark-dag</link><guid isPermaLink="false">https://dkharazi.github.io/blog/spark-dag</guid><pubDate>Wed, 27 Feb 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The goal of this post is to provide a general introduction to the &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; API. Each example has a snippet of PySpark code with explanations. Another goal is to provide a general introduction to Spark&apos;s web UI. Certain examples have &lt;code class=&quot;language-text&quot;&gt;DAG&lt;/code&gt; visualizations for jobs and stages. Spark starts a web UI for each &lt;code class=&quot;language-text&quot;&gt;SparkContext&lt;/code&gt; that is initialized.&lt;/p&gt;
&lt;p&gt;This will only include rudimentary examples of methods in the &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; API. For more detailed illustrations and explanations of these concepts, refer to this &lt;a href=&quot;https://stackoverflow.com/a/37529233/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;post&lt;/a&gt; and this &lt;a href=&quot;https://databricks.com/blog/2015/06/22/understanding-your-spark-application-through-visualization.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;article&lt;/a&gt;. In the coming posts, we&apos;ll dive deeper into more generic objects in the Spark API. Then, we&apos;ll explore low-level concepts, including the Spark internals.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#setting-up-a-sparksession&quot;&gt;Setting up a SparkSession&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#using-an-rdd-with-python&quot;&gt;Using an RDD with Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#counting-words-from-files&quot;&gt;Counting Words from Files&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#visualizing-the-dag&quot;&gt;Visualizing the DAG&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Setting up a SparkSession&lt;/h2&gt;
&lt;p&gt;Before walking through any code examples, we need to create a &lt;code class=&quot;language-text&quot;&gt;SparkSession&lt;/code&gt;. Each &lt;code class=&quot;language-text&quot;&gt;SparkSession&lt;/code&gt; acts as an entry point into Spark programming with &lt;code class=&quot;language-text&quot;&gt;RDDs&lt;/code&gt;. After executing the setup code, we&apos;ll be able to use the session in our examples below.&lt;/p&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;builder&lt;/code&gt; object of a SparkSession provides access to the associated application name, associated master URL, and configuration options. It also provides access to a &lt;code class=&quot;language-text&quot;&gt;getOrCreate&lt;/code&gt; method, which initializes a SparkSession after setting options in the &lt;code class=&quot;language-text&quot;&gt;builder&lt;/code&gt; object.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; pyspark&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sql &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; SparkSession
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; spark &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; SparkSession \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;builder \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;appName&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;TxtRdr&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;         &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;getOrCreate&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; sc &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; spark&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sparkContext&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Some examples use a set of files named &lt;code class=&quot;language-text&quot;&gt;hello.txt&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;order.txt&lt;/code&gt;. A SparkContext will read these files using the &lt;code class=&quot;language-text&quot;&gt;textFile&lt;/code&gt; method. After creating an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt;, we&apos;ll call methods to transform and filter the data in the files listed below.&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;hello.txt&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Hello world
This is a file&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;order.txt&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Hello friend
This file is my order
Burger, fries, soda&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice that four words appear in both files when the lines are split on spaces: &lt;em&gt;Hello&lt;/em&gt;, &lt;em&gt;This&lt;/em&gt;, &lt;em&gt;is&lt;/em&gt;, and &lt;em&gt;file&lt;/em&gt;. In the upcoming examples, we&apos;ll run some standard Python code and Spark code to find these words. Then, we&apos;ll count the occurrences of each word using Spark.&lt;/p&gt;
&lt;h2&gt;Using an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; with Python&lt;/h2&gt;
&lt;p&gt;Before performing any transformations, the data needs to be read using the &lt;code class=&quot;language-text&quot;&gt;textFile&lt;/code&gt; method. Specifically, the SparkContext will read the &lt;code class=&quot;language-text&quot;&gt;hello.txt&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;order.txt&lt;/code&gt; files using this method, which returns an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; object. After calling this method, the returned &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; essentially contains a list of strings, where each string represents a line from the file.&lt;/p&gt;
&lt;p&gt;The &lt;code class=&quot;language-text&quot;&gt;flatMap&lt;/code&gt; method maps a function that is defined using the lambda keyword to our &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt;. In this example, our function separates each line into individual strings based on any spaces. Although it isn&apos;t used here, the &lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt; method is very similar to the &lt;code class=&quot;language-text&quot;&gt;flatMap&lt;/code&gt; method. Specifically, the &lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt; method returns exactly one element per input element, so it would produce a separate list of words for each line. The &lt;code class=&quot;language-text&quot;&gt;flatMap&lt;/code&gt; method performs an extra step, which involves flattening those per-line lists into a single &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; of individual words.&lt;/p&gt;
&lt;p&gt;Ultimately, the &lt;code class=&quot;language-text&quot;&gt;collect&lt;/code&gt; method is called, which returns all the elements from the &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; as a list to the driver program. This method is an action. In other words, the transformations prior to the &lt;code class=&quot;language-text&quot;&gt;collect&lt;/code&gt; method aren&apos;t computed until the &lt;code class=&quot;language-text&quot;&gt;collect&lt;/code&gt; method is called. The code snippet below creates an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; and performs the described operations on the &lt;code class=&quot;language-text&quot;&gt;hello.txt&lt;/code&gt; file.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; hello &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;textFile&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;hello.txt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;           &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;flatMap&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;lambda&lt;/span&gt; c&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; c&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;split&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos; &apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;           &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;collect&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
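&lt;p&gt;To make the difference between &lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;flatMap&lt;/code&gt; concrete, here is a rough plain-Python analogue that runs without Spark. It is only a sketch: the &lt;code class=&quot;language-text&quot;&gt;lines&lt;/code&gt; list is an assumption that stands in for the &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; returned by &lt;code class=&quot;language-text&quot;&gt;textFile&lt;/code&gt;, and list comprehensions stand in for the Spark methods.&lt;/p&gt;

```python
# Plain-Python analogue of map vs. flatMap (no Spark needed).
# `lines` is a hypothetical stand-in for the RDD of lines read from hello.txt.
lines = ["Hello world", "This is a file"]

# map-like: one output element per input line, so each line becomes its own list
mapped = [line.split(" ") for line in lines]

# flatMap-like: the per-line lists are flattened into one sequence of words
flat_mapped = [word for line in lines for word in line.split(" ")]

print(mapped)
print(flat_mapped)
```

&lt;p&gt;The first result is nested (one list per line), whereas the second is a single flat list of words, which is the shape we want before counting.&lt;/p&gt;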
&lt;p&gt;For this example, there&apos;s still a second file with words that should be counted by the same Spark program.
Through a similar process as before, the &lt;code class=&quot;language-text&quot;&gt;textFile&lt;/code&gt; method is called for reading in the &lt;code class=&quot;language-text&quot;&gt;order.txt&lt;/code&gt; file. Then, the &lt;code class=&quot;language-text&quot;&gt;flatMap&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;collect&lt;/code&gt; methods are called for function mapping and data collection.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; order &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;textFile&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;order.txt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;           &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;flatMap&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;lambda&lt;/span&gt; c&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; c&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;split&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos; &apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;           &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;collect&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After executing these two code snippets, &lt;code class=&quot;language-text&quot;&gt;hello&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;order&lt;/code&gt; are ordinary Python lists containing each word from their respective files. Since the &lt;code class=&quot;language-text&quot;&gt;collect&lt;/code&gt; method outputs a list from the &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt;, we can perform standard Python functions on it. As an example, we can use Python sets to find the common words between the two lists. Note that the order of the output may vary, since sets are unordered.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;hello&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;order&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;Hello&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;This&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;is&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&apos;file&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2&gt;Counting Words from Files&lt;/h2&gt;
&lt;p&gt;Now, we may prefer to compute some of these operations in Spark. By performing filters in CPython on the driver, we lose the benefit of distributed computation in Spark. If &lt;code class=&quot;language-text&quot;&gt;hello.txt&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;order.txt&lt;/code&gt; are much larger files, or if we instead count the words of thousands of files, filtering in Spark becomes much faster compared to filtering in ordinary CPython on a single machine.&lt;/p&gt;
&lt;p&gt;To illustrate this point, let&apos;s go through an example that is very similar to the previous one, but ultimately performs counting and filtering.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;from&lt;/span&gt; operator &lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; add
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; hello &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;textFile&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;hello.txt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; order &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;textFile&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;order.txt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; counts &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; hello&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;union&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;order&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;               &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;flatMap&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;lambda&lt;/span&gt; w&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; w&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;split&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos; &apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;               &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;lambda&lt;/span&gt; c&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;c&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;               &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;reduceByKey&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;add&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;               &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;lambda&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;               &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;collect&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Again, calling the &lt;code class=&quot;language-text&quot;&gt;flatMap&lt;/code&gt; method returns an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; containing individual words as strings. The &lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt; method converts each word into a key-value tuple, where the key is the word and the value is an initial count of one. Then, the &lt;code class=&quot;language-text&quot;&gt;reduceByKey&lt;/code&gt; operation groups the key-value pairs by key and adds up the values for any repeated keys. Lastly, the &lt;code class=&quot;language-text&quot;&gt;filter&lt;/code&gt; method keeps only the key-value pairs where the value is greater than one.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; counts
&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;Hello&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;This&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;is&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;file&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The program outputs one key-value pair for each word that appears more than once across the two files. For example, the first key-value pair is &lt;code class=&quot;language-text&quot;&gt;(&amp;#39;Hello&amp;#39;, 2)&lt;/code&gt;, since this word appears once in each file. Similarly, the last key-value pair is &lt;code class=&quot;language-text&quot;&gt;(&amp;#39;file&amp;#39;, 2)&lt;/code&gt;, since this word also appears once in each file. The order of the pairs returned by &lt;code class=&quot;language-text&quot;&gt;collect&lt;/code&gt; may vary.&lt;/p&gt;
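&lt;p&gt;As a rough cross-check that doesn&apos;t need a Spark cluster, the same &lt;code class=&quot;language-text&quot;&gt;flatMap&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;reduceByKey&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;filter&lt;/code&gt; steps can be mimicked in plain Python. This is only a single-machine sketch: the &lt;code class=&quot;language-text&quot;&gt;word_counts&lt;/code&gt; helper and the hard-coded lines (copied from the two files above) are assumptions made for illustration, and nothing here is distributed.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical in-memory copies of hello.txt and order.txt
hello_lines = ["Hello world", "This is a file"]
order_lines = ["Hello friend", "This file is my order", "Burger, fries, soda"]


def word_counts(lines):
    """Mimic flatMap -> map -> reduceByKey on a plain list of lines."""
    # flatMap + map: split every line into words and pair each word with 1
    pairs = [(word, 1) for line in lines for word in line.split(" ")]
    # reduceByKey(add): sum the values that share the same key
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


# List concatenation plays the role of union
counts = word_counts(hello_lines + order_lines)

# filter: keep only the words that occur more than once
repeated = {word: n for word, n in counts.items() if n > 1}
print(repeated)
```

&lt;p&gt;Running a small check like this on sample data is a cheap way to confirm what a Spark pipeline should return before submitting it to a cluster.&lt;/p&gt;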
&lt;h2&gt;Visualizing the DAG&lt;/h2&gt;
&lt;p&gt;The number of jobs for an application equals the number of &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; actions. In our example, we can see there is a single job. This is because there is only one action, which is &lt;code class=&quot;language-text&quot;&gt;collect&lt;/code&gt;. Instead of having a single job, we would have two jobs with this code:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; hello &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;textFile&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;hello.txt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;collect&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; order &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;textFile&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;order.txt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;collect&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The number of additional stages equals the number of wide transformations in an application. In our example, we can see there are two stages in total, but only a single additional stage. This is because there is only one wide transformation, which is &lt;code class=&quot;language-text&quot;&gt;reduceByKey&lt;/code&gt;. Notice, the web UI in Spark provides a nice visualization of the &lt;code class=&quot;language-text&quot;&gt;DAG&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/fc9c48f25fb6bb781bc41b4773ca7fa5/pysparkwordcounts.svg&quot; alt=&quot;wordcounts&quot;&gt;&lt;/p&gt;
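&lt;p&gt;Recall, the number of tasks within a stage equals the number of partitions in an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt;. By default, &lt;code class=&quot;language-text&quot;&gt;textFile&lt;/code&gt; typically splits a small file into two partitions. Meaning, each file contributes two tasks to the first stage. The number of partitions is assigned to an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; when it is initialized. Thus, this parameter can be adjusted when calling &lt;code class=&quot;language-text&quot;&gt;textFile(file, num_partitions)&lt;/code&gt;, where the second argument is treated as a minimum. Therefore, our first stage will have roughly eight tasks (instead of four) if we change this line of code, since the union keeps the six partitions of &lt;code class=&quot;language-text&quot;&gt;hello.txt&lt;/code&gt; alongside the two partitions of &lt;code class=&quot;language-text&quot;&gt;order.txt&lt;/code&gt;:&lt;/p&gt;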
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;&gt;&lt;/span&gt; hello &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; sc&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;textFile&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&apos;hello.txt&apos;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since Spark assigns the number of partitions of an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; during initialization, the number of tasks is determined again after shuffling as well. This is because Spark creates a new &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; after shuffling. This object is called a &lt;code class=&quot;language-text&quot;&gt;ShuffledRDD&lt;/code&gt;. For a more detailed description of how tasks are separated and organized in Spark, refer to this &lt;a href=&quot;https://stackoverflow.com/a/37759913/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;post&lt;/a&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Spark RDD Fundamentals]]></title><description><![CDATA[This post provides a high-level introduction to the RDD object in the Spark API. In the coming posts, we'll dive deeper into more generic objects in the Spark API. Then, we'll explore low-level…]]></description><link>https://dkharazi.github.io/blog/spark-rdd</link><guid isPermaLink="false">https://dkharazi.github.io/blog/spark-rdd</guid><pubDate>Fri, 15 Feb 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This post provides a high-level introduction to the RDD object in the Spark API. In the coming posts, we&apos;ll dive deeper into more generic objects in the Spark API. Then, we&apos;ll explore low-level concepts, including the Spark internals.&lt;/p&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#resilient-distributed-datasets&quot;&gt;Resilient Distributed Datasets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#defining-a-dagscheduler&quot;&gt;Defining a DAGScheduler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#types-of-transformations&quot;&gt;Types of Transformations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#lifecycle-of-a-spark-program&quot;&gt;Lifecycle of a Spark Program&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Resilient Distributed Datasets&lt;/h2&gt;
&lt;p&gt;Most likely, we&apos;ve all worked with pandas &lt;code class=&quot;language-text&quot;&gt;DataFrames&lt;/code&gt; before. They&apos;re in-memory, &lt;em&gt;single-server&lt;/em&gt; data structures that offer many user-friendly functions for data processing. Functionally, Spark provides a data structure that is very similar in this sense, but can be used across multiple servers. This data structure is called a &lt;em&gt;resilient distributed dataset&lt;/em&gt;, or an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; for short. Put simply, an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; is an in-memory data structure that is distributed across many servers within a Spark cluster.&lt;/p&gt;
&lt;p&gt;Roughly, we can think of an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; as a distributed version of a pandas &lt;code class=&quot;language-text&quot;&gt;DataFrame&lt;/code&gt;. I&apos;m making this comparison because RDDs offer many pandas-like functions that are focused around data processing. These functions are called &lt;a href=&quot;https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Transformations&lt;/a&gt; and &lt;a href=&quot;https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Actions&lt;/a&gt;. Specifically, transformations create a new dataset from an existing one. Contrastingly, actions return non-dataset values, which generally relate to some aggregation.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/b6e026fce430f4b5b89e1bab73bac406/sparkaction.svg&quot; alt=&quot;diagram of rdd to transf to action&quot;&gt;&lt;/p&gt;
&lt;p&gt;Transformations are &lt;em&gt;lazy&lt;/em&gt;. Meaning, an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; isn&apos;t computed until it receives an action. Actions return values to the driver program or write data to external storage. Lazy evaluation gives Spark a performance boost, since it only computes what an action actually needs. However, an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; is recomputed each time an action is called on it, which becomes a problem if users repeatedly reuse the same transformations. As a result, Spark allows us to persist an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; to memory using the &lt;code class=&quot;language-text&quot;&gt;persist&lt;/code&gt; method. To summarize, transformations and actions have the following properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Transformations are lazy by default&lt;/li&gt;
&lt;li&gt;Actions aren&apos;t lazy&lt;/li&gt;
&lt;li&gt;Transformations return a new &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Actions return a non-&lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; value, which is often an aggregate of the &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
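&lt;p&gt;The properties above can be illustrated with a toy, single-machine analogue. This is a sketch only: the &lt;code class=&quot;language-text&quot;&gt;LazyDataset&lt;/code&gt; class and its &lt;code class=&quot;language-text&quot;&gt;runs&lt;/code&gt; counter are hypothetical names invented for this example, not part of the Spark API, and the real &lt;code class=&quot;language-text&quot;&gt;persist&lt;/code&gt; method works on distributed partitions rather than a Python list.&lt;/p&gt;

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on collect()."""

    def __init__(self, source, funcs=()):
        self._source = list(source)   # the original data
        self._funcs = tuple(funcs)    # pending element-wise transformations
        self._persist = False         # whether results should be kept after an action
        self._cache = None            # filled by the first action if persisted
        self.runs = 0                 # how many times the pipeline was computed

    def map(self, func):
        # Transformation: returns a new dataset and computes nothing yet
        return LazyDataset(self._source, self._funcs + (func,))

    def persist(self):
        # Only marks the dataset; the cache is filled by the next action
        self._persist = True
        return self

    def collect(self):
        # Action: triggers the computation (or reuses the cached result)
        if self._cache is not None:
            return list(self._cache)
        self.runs += 1
        result = self._source
        for func in self._funcs:
            result = [func(x) for x in result]
        if self._persist:
            self._cache = result
        return list(result)


squares = LazyDataset([1, 2, 3]).map(lambda x: x * x)
print(squares.runs)   # 0: the transformation alone computed nothing
squares.collect()
squares.collect()
print(squares.runs)   # 2: recomputed for every action

cached = LazyDataset([1, 2, 3]).map(lambda x: x * x).persist()
cached.collect()
cached.collect()
print(cached.runs)    # 1: computed once, then served from the cache
```

&lt;p&gt;The same trade-off applies in Spark: persisting costs memory, so it tends to pay off only for intermediate results that several actions reuse.&lt;/p&gt;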
&lt;p&gt;As stated previously, an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; receives a huge boost in performance by keeping data in memory. However, Spark supports data persistence to disk as well. Spark also supports data persistence to databases. To summarize, an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; has the following properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Distributed&lt;/li&gt;
&lt;li&gt;Fault-tolerant&lt;/li&gt;
&lt;li&gt;Flexible functions such as &lt;code class=&quot;language-text&quot;&gt;map(func)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Optionally in-memory on the executors&apos; JVMs&lt;/li&gt;
&lt;li&gt;Parallelizable using &lt;code class=&quot;language-text&quot;&gt;sc.parallelize(data)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Immutable (more on this later)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Defining a &lt;code class=&quot;language-text&quot;&gt;DAGScheduler&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Recall, a transformation is a special type of &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; method that returns another &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; object. Again, these methods aren&apos;t computed until the &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; receives an action. In addition, &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; objects are immutable, so a transformation never modifies an existing &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt;, but defines a new one that remembers its parent. Since &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; objects can&apos;t &lt;em&gt;change&lt;/em&gt; once they are created, Spark represents this chain of dependencies as a directed acyclic graph, or a &lt;code class=&quot;language-text&quot;&gt;DAG&lt;/code&gt;, which is submitted for execution when an action is called. In a &lt;code class=&quot;language-text&quot;&gt;DAG&lt;/code&gt;, each node is an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; (made up of its partitions), and an edge is a transformation.&lt;/p&gt;
&lt;p&gt;Spark breaks the &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; into smaller chunks of data called &lt;em&gt;partitions&lt;/em&gt;. In Spark, a partition is a chunk of data on a node within the cluster. At a high level, Spark breaks transformations and actions into &lt;code class=&quot;language-text&quot;&gt;Tasks&lt;/code&gt;, which are mapped to partitions. Essentially, a &lt;code class=&quot;language-text&quot;&gt;Task&lt;/code&gt; represents a unit of work on a partition of a distributed dataset.&lt;/p&gt;
&lt;p&gt;Assuming there are no sequential dependencies, &lt;code class=&quot;language-text&quot;&gt;Tasks&lt;/code&gt; are executed in parallel on partitions. Thus, an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; should have at least as many partitions as there are CPU cores within a cluster (the Spark documentation suggests 2-4 partitions per CPU core). Theoretically, increasing the number of partitions would increase the amount of parallelism for a system, assuming there are available CPU cores. If a persisted &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt; can&apos;t fit an entire set of data into memory, then, depending on its storage level, the remaining data will either be recomputed when needed or stored to and read from disk.&lt;/p&gt;
&lt;p&gt;Now, let&apos;s return to our previous discussion about the &lt;code class=&quot;language-text&quot;&gt;DAG&lt;/code&gt; object. When an action is called, each &lt;code class=&quot;language-text&quot;&gt;DAG&lt;/code&gt; is submitted to a &lt;code class=&quot;language-text&quot;&gt;DAGScheduler&lt;/code&gt; object for execution. A &lt;code class=&quot;language-text&quot;&gt;DAGScheduler&lt;/code&gt; organizes operations into &lt;code class=&quot;language-text&quot;&gt;Stages&lt;/code&gt;, and a &lt;code class=&quot;language-text&quot;&gt;Stage&lt;/code&gt; is organized into &lt;code class=&quot;language-text&quot;&gt;Tasks&lt;/code&gt;. Each &lt;code class=&quot;language-text&quot;&gt;Task&lt;/code&gt; is scheduled separately. It represents a unit of work on a partition of an &lt;code class=&quot;language-text&quot;&gt;RDD&lt;/code&gt;, and is executed as a thread in an executor&apos;s JVM. For each stage, the &lt;code class=&quot;language-text&quot;&gt;DAGScheduler&lt;/code&gt; produces a &lt;code class=&quot;language-text&quot;&gt;TaskSet&lt;/code&gt; object, which is passed to a &lt;code class=&quot;language-text&quot;&gt;TaskScheduler&lt;/code&gt;. The &lt;code class=&quot;language-text&quot;&gt;TaskScheduler&lt;/code&gt; launches tasks through a cluster manager.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/28e6454b64974a357969e1e53fbe616c/sparktasks.svg&quot; alt=&quot;SparkTaskLifecycle&quot;&gt;&lt;/p&gt;
&lt;p&gt;Multiple tasks can be executed in parallel for any stage. In addition, any two stages can be executed in parallel if they aren&apos;t sequentially dependent on each other. This implies that tasks from one stage can be executed in parallel with tasks from a separate stage, as long as neither stage depends on the other. Refer to &lt;a href=&quot;https://stackoverflow.com/a/41340858/12777044&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt; for an illustration of how &lt;code class=&quot;language-text&quot;&gt;Tasks&lt;/code&gt; and stages run in parallel.&lt;/p&gt;
&lt;p&gt;Within a stage, the number of tasks is equal to the number of partitions. The number of stages is equal to the number of wide transformations plus one, since each wide transformation starts a new stage. For examples that may help illustrate these concepts, refer to my &lt;a href=&quot;/blog/spark-dag/&quot;&gt;next post&lt;/a&gt;.&lt;/p&gt;
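&lt;p&gt;The stage-counting rule can be written down as a short sketch. It is a deliberate simplification, not how the &lt;code class=&quot;language-text&quot;&gt;DAGScheduler&lt;/code&gt; is implemented: the &lt;code class=&quot;language-text&quot;&gt;WIDE&lt;/code&gt; set is an assumed, non-exhaustive list of transformations that shuffle data, and &lt;code class=&quot;language-text&quot;&gt;count_stages&lt;/code&gt; is a hypothetical helper that only handles a linear chain of transformations.&lt;/p&gt;

```python
# Assumed, non-exhaustive set of RDD transformations that trigger a shuffle
WIDE = {"reduceByKey", "groupByKey", "sortByKey", "distinct", "repartition"}


def count_stages(transformations):
    """One initial stage, plus one more for every wide transformation."""
    return 1 + sum(1 for name in transformations if name in WIDE)


# A word-count style pipeline with a single wide transformation
pipeline = ["union", "flatMap", "map", "reduceByKey", "filter"]
print(count_stages(pipeline))  # 2
```

&lt;p&gt;A pipeline made up of narrow transformations only, such as &lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt; followed by &lt;code class=&quot;language-text&quot;&gt;filter&lt;/code&gt;, stays within a single stage under this rule.&lt;/p&gt;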
&lt;h2&gt;Types of Transformations&lt;/h2&gt;
&lt;p&gt;There are two types of transformations that can be applied to &lt;code class=&quot;language-text&quot;&gt;RDDs&lt;/code&gt;: narrow transformations and wide transformations. Narrow transformations refer to transformations where each partition contributes to one stage only. These include transformations like &lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;filter&lt;/code&gt;, etc. Wide transformations refer to transformations where each partition contributes to many stages. In Spark, this concept is called &lt;em&gt;shuffling&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Shuffling is used for regrouping data between partitions. It is necessary in situations where a result requires information from every partition. As a result, wide transformations are more expensive than narrow transformations. For example, the &lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt; transformation doesn&apos;t require shuffling, since it applies element-wise transformations to each partition. In other words, an element in one partition doesn&apos;t need any information from other partitions. This allows consecutive narrow transformations to be chained together within a single stage, which is a technique called pipelining. On the other hand, the &lt;code class=&quot;language-text&quot;&gt;groupByKey&lt;/code&gt; wide transformation needs information from each partition. Specifically, a narrow transformation keeps its results in memory, whereas a wide transformation writes its results to disk. This &lt;a href=&quot;https://0x0fff.com/spark-architecture-shuffle/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;post&lt;/a&gt; defines optimized shuffling algorithms in detail.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/dc7f75a874f8b22bdacf7fbcffe7758e/sparktransformation.svg&quot; alt=&quot;SparkNarrowAndWideTransformation&quot;&gt;&lt;/p&gt;
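&lt;p&gt;As a rough illustration of the difference between the two, here is a minimal PySpark sketch. It assumes a local Spark installation with the &lt;code class=&quot;language-text&quot;&gt;pyspark&lt;/code&gt; package available. The sample data, variable names, and app name are made up for this example.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;python&quot;&gt;&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;from pyspark import SparkContext

# Assumption: run locally with two worker threads; the app name is arbitrary
sc = SparkContext(&quot;local[2]&quot;, &quot;narrow-vs-wide&quot;)

# Create an input RDD of key-value pairs split across two partitions
pairs = sc.parallelize([(&quot;a&quot;, 1), (&quot;b&quot;, 2), (&quot;a&quot;, 3), (&quot;b&quot;, 4)], 2)

# Narrow: map works on each partition independently, so no shuffle is needed
doubled = pairs.map(lambda kv: (kv[0], kv[1] * 2))

# Wide: groupByKey must bring all values for a key together, forcing a shuffle
grouped = doubled.groupByKey().mapValues(list)

# Nothing has been computed so far; collect() is the action that starts the job
print(grouped.collect())

# Print the lineage, which includes the shuffle dependency
print(grouped.toDebugString())

sc.stop()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this sketch, the &lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt; call should be pipelined into the same stage that reads the input, while &lt;code class=&quot;language-text&quot;&gt;groupByKey&lt;/code&gt; introduces a shuffle and therefore a second stage.&lt;/p&gt;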
&lt;h2&gt;Lifecycle of a Spark Program&lt;/h2&gt;
&lt;p&gt;Now, we have a high-level understanding of the core Spark data structures. This &lt;a href=&quot;https://www.youtube.com/watch?v=7ooZ4S7Ay6Y&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;lecture&lt;/a&gt; defines the lifecycle of a Spark program in detail. Generally, a common lifecycle of a Spark program looks like the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create some input RDDs from external data&lt;/li&gt;
&lt;li&gt;Lazily transform them to define new RDDs using transformations&lt;/li&gt;
&lt;li&gt;Ask Spark to &lt;code class=&quot;language-text&quot;&gt;cache()&lt;/code&gt; any intermediate RDDs that will be reused&lt;/li&gt;
&lt;li&gt;Launch actions to start parallel computation&lt;/li&gt;
&lt;li&gt;Spark optimizes and executes its computations&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item><item><title><![CDATA[Hadoop as a Distributed OS]]></title><description><![CDATA[Before investigating Spark in detail, we should develop a core intuition behind Hadoop. This post compares Hadoop to a traditional computer operating system. In the coming posts, we'll begin exploring…]]></description><link>https://dkharazi.github.io/blog/hadoop-os</link><guid isPermaLink="false">https://dkharazi.github.io/blog/hadoop-os</guid><pubDate>Mon, 22 Oct 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Before investigating Spark in detail, we should develop a core intuition behind Hadoop. This post compares Hadoop to a traditional computer operating system. In the coming posts, we&apos;ll begin exploring generic objects in the Spark API. Then, we&apos;ll dive deeper into more low-level concepts, including the Spark internals.&lt;/p&gt;
&lt;p&gt;Recall that a basic computer operating system consists of two essential components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A file system&lt;/li&gt;
&lt;li&gt;A scheduler&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As a reminder, a file system manages user data. A scheduler manages any running process or program in the system. These programs involve storing, retrieving, or updating the data in the file system.&lt;/p&gt;
&lt;p&gt;Roughly, Hadoop can be seen as a &lt;em&gt;distributed&lt;/em&gt; operating system. In this comparison, YARN represents a distributed scheduler. Similar to a scheduler in a basic operating system, YARN does the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Monitors computing resources&lt;/li&gt;
&lt;li&gt;Schedules jobs involving processing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, the key difference is that YARN performs these functions across many different machines. In a similar fashion, HDFS differs from a standard file system, because it manages user data across many different machines. Lastly, MapReduce programs are the distributed form of programs in a traditional computer system.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Relevance of Hadoop]]></title><description><![CDATA[It's been 14 years since the initial release of Apache Hadoop, which is a long time for any software. Unsurprisingly, the internet is flooded with clickbait articles about Hadoop being replaced by…]]></description><link>https://dkharazi.github.io/blog/hadoop-relevance</link><guid isPermaLink="false">https://dkharazi.github.io/blog/hadoop-relevance</guid><pubDate>Thu, 17 May 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It&apos;s been 14 years since the initial release of &lt;a href=&quot;https://hadoop.apache.org/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Apache Hadoop&lt;/a&gt;, which is a long time for any software. Unsurprisingly, the internet is flooded with &lt;a href=&quot;https://www.google.com/search?q=is+hadoop+dead&amp;#x26;rlz=1C5CHFA_enUS832US832&amp;#x26;oq=is+hadoop+dead&amp;#x26;aqs=chrome..69i57j0l4j69i60l3.254j0j7&amp;#x26;sourceid=chrome&amp;#x26;ie=UTF-8&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;clickbait articles&lt;/a&gt; about Hadoop being replaced by newer and improved alternatives. This isn&apos;t to say these opinions don&apos;t have any merit. Even now, Hadoop remains essential for many of today&apos;s largest enterprises with their own data centers. However, other companies have the liberty to outsource sensitive data to the cloud. 
In this case, cloud storage services, such as &lt;a href=&quot;https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;AWS S3&lt;/a&gt;, can easily replace HDFS in the cloud.&lt;/p&gt;
&lt;p&gt;Recall, Hadoop has two other frameworks in its ecosystem: the &lt;code class=&quot;language-text&quot;&gt;YARN&lt;/code&gt; resource manager and the &lt;code class=&quot;language-text&quot;&gt;MapReduce&lt;/code&gt; computing paradigm. Recently, some companies have replaced &lt;code class=&quot;language-text&quot;&gt;YARN&lt;/code&gt; with Kubernetes for scheduling, and most have replaced &lt;code class=&quot;language-text&quot;&gt;MapReduce&lt;/code&gt; with Apache Spark for computing. Specifically, Apache Spark excels both in the cloud and in data centers, contributing to its wild popularity. Hopefully, this serves as motivation to learn more about Spark.&lt;/p&gt;
&lt;p&gt;This post hopefully motivates the use cases and benefits of Spark in today&apos;s world. The coming posts will provide a high-level introduction to generic objects in the Spark API. Then, we&apos;ll dive deeper into more low-level concepts, including the Spark internals.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Running Hugo on GitHub]]></title><description><![CDATA[In this post, we walk through the steps of running a site on GitHub created by a static site generator. This post assumes a directory containing Hugo source files has already been created. For more…]]></description><link>https://dkharazi.github.io/blog/hugo</link><guid isPermaLink="false">https://dkharazi.github.io/blog/hugo</guid><pubDate>Tue, 10 Apr 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In this post, we walk through the steps of running a site on GitHub created by a static site generator. This post assumes a directory containing Hugo source files has already been created. For more information about hosting a project on GitHub as a submodule, refer to &lt;a href=&quot;https://gohugo.io/hosting-and-deployment/hosting-on-github/#step-by-step-instructions&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this article&lt;/a&gt;.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Access the local Hugo directory&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We&apos;ll refer to this directory as &lt;code class=&quot;language-text&quot;&gt;./mysite&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;This directory contains the source files for the Hugo site&lt;/li&gt;
&lt;li&gt;The directory should look like this:&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;mysite/
├── archetypes/
├── config.toml
├── content/
├── data/
├── layouts/
├── public/
├── resources/
├── static/
└── themes/&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;
&lt;p&gt;Create a new repository in GitHub&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Call this repository &lt;code class=&quot;language-text&quot;&gt;&amp;lt;username&amp;gt;.github.io&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Here, &lt;code class=&quot;language-text&quot;&gt;&amp;lt;username&amp;gt;&lt;/code&gt; is your GitHub username&lt;/li&gt;
&lt;li&gt;This repository will contain the rendered version of the site&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Call &lt;code class=&quot;language-text&quot;&gt;hugo&lt;/code&gt; to create a public directory&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This command builds the site into the &lt;code class=&quot;language-text&quot;&gt;public&lt;/code&gt; directory&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;In the new &lt;code class=&quot;language-text&quot;&gt;public&lt;/code&gt; directory, initialize a Git repository and push it to GitHub&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;bash&quot;&gt;&lt;pre class=&quot;language-bash&quot;&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;token operator&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;token builtin class-name&quot;&gt;cd&lt;/span&gt; public
$ &lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; init
$ &lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;token builtin class-name&quot;&gt;.&lt;/span&gt;
$ &lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; remote &lt;span class=&quot;token function&quot;&gt;add&lt;/span&gt; origin https://github.com/&amp;lt;username&gt;/&amp;lt;username&gt;.github.io.git
$ &lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; commit -m &lt;span class=&quot;token string&quot;&gt;&quot;Commit to site&quot;&lt;/span&gt;
$ &lt;span class=&quot;token function&quot;&gt;git&lt;/span&gt; push origin master&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content:encoded></item></channel></rss>